Elenchus: Building Eval Systems for LLMs
My notes on building robust evaluation systems for large language models. Learn how to generate synthetic test data and use LLM-as-a-judge.
Where I share my thoughts, experiments, and discoveries in machine learning and software engineering.
A comprehensive guide to building LLM agents from the ground up. This post covers agent architecture, tool integration, memory systems, planning strategies, and how to coordinate multiple agents working together on complex tasks. We explore ReAct, function calling, and chain-of-thought prompting patterns.
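ReAct interleaves model reasoning ("Thought"), tool calls ("Action"), and tool results ("Observation") until the model commits to an answer. Below is a minimal sketch of that loop in Python. It is not the post's implementation. It assumes a plain-text `Action: tool[argument]` format, and `scripted_llm`, the `add` tool, and `TOOLS` are hypothetical stand-ins. The scripted function replays a fixed trajectory in place of a real model call.

```python
import re
from typing import Callable, Optional

# Hypothetical tool registry: each tool maps a string argument to a string observation.
TOOLS: dict[str, Callable[[str], str]] = {
    "add": lambda arg: str(sum(int(x) for x in arg.split(","))),
}

def scripted_llm(transcript: str) -> str:
    """Hypothetical stand-in for a real model call: replays a fixed two-step trajectory."""
    if "Observation:" not in transcript:
        return "Thought: I need to add the numbers.\nAction: add[2,3,4]"
    return "Thought: I have the sum.\nFinal Answer: 9"

# Assumed action format: "Action: tool_name[argument]" on a single line.
ACTION_RE = re.compile(r"Action:\s*(\w+)\[(.*)\]")

def react_loop(question: str, llm: Callable[[str], str],
               tools: dict[str, Callable[[str], str]], max_steps: int = 5) -> Optional[str]:
    """Alternate model steps and tool observations until a final answer or the step budget runs out."""
    transcript = f"Question: {question}\n"
    for _ in range(max_steps):
        step = llm(transcript)           # model emits a Thought plus an Action or a Final Answer
        transcript += step + "\n"
        if "Final Answer:" in step:
            return step.split("Final Answer:", 1)[1].strip()
        match = ACTION_RE.search(step)
        if match is None:
            transcript += "Observation: could not parse an action.\n"
            continue
        name, arg = match.groups()
        tool = tools.get(name)
        observation = tool(arg) if tool else f"unknown tool {name!r}"
        transcript += f"Observation: {observation}\n"  # fed back to the model on the next step
    return None  # no answer within max_steps
```

Calling `react_loop("What is 2 + 3 + 4?", scripted_llm, TOOLS)` returns `"9"` after one tool call. With a function-calling API, the regex parsing would typically be replaced by structured tool-call objects, while the loop itself stays much the same.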

A step-by-step tutorial on implementing the LLaMA3 language model from scratch using pure JAX.

Building an image captioning model from scratch: a CNN encoder extracts visual features from the image, an LSTM decoder generates the caption word by word, and an attention mechanism lets the decoder focus on the relevant image regions as it generates each word. The project shows how computer vision and natural language processing can be combined in a single model.
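The attention step can be sketched in a few lines of plain Python. This is a minimal illustration, not the project's code. It assumes the CNN feature map has been flattened into a list of region vectors and that the LSTM hidden state has already been projected to the same dimension. It also uses plain dot-product scoring as a simplification, since additive scoring with learned weights is another common choice. `attend` and `softmax` are hypothetical helper names.

```python
import math

def softmax(scores: list[float]) -> list[float]:
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(regions: list[list[float]], hidden: list[float]) -> tuple[list[float], list[float]]:
    """Soft attention over image regions (dot-product scoring, an assumed simplification).

    regions: L feature vectors, one per cell of the flattened CNN feature map, each of size D
    hidden:  decoder hidden state, assumed already projected to size D
    Returns (weights over the L regions, context vector of size D).
    """
    # Score each region by its similarity to the current decoder state.
    scores = [sum(r_d * h_d for r_d, h_d in zip(r, hidden)) for r in regions]
    weights = softmax(scores)
    # Context vector: attention-weighted average of the region features.
    context = [sum(w * r[d] for w, r in zip(weights, regions)) for d in range(len(hidden))]
    return weights, context

regions = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
weights, context = attend(regions, hidden=[4.0, 0.0])
```

Here most of the weight lands on the first region, whose features best match the hidden state. In a typical attention-based captioner, the hidden state changes after every generated word, so the weights shift across regions from step to step, and the resulting context vector is fed into the LSTM along with the previous word's embedding.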
