From language models to autonomous agents that reason, plan, and take actions.
Seminar led by Prof. Uri Yechiali
Part 1
Understanding how LLMs work
Core Principle
An LLM is a neural network trained to predict the next token given all previous tokens.
This simple objective, scaled to billions of parameters and trillions of tokens, produces remarkably capable models.
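A minimal PyTorch sketch of this objective (tensor names are illustrative): the loss at each position is cross-entropy against the token one step ahead.

import torch
import torch.nn.functional as F

# tokens: batch of token ids, shape (batch, seq_len)
# logits: model output, shape (batch, seq_len, vocab_size)
def next_token_loss(logits: torch.Tensor, tokens: torch.Tensor) -> torch.Tensor:
    vocab_size = logits.size(-1)
    return F.cross_entropy(
        logits[:, :-1, :].reshape(-1, vocab_size),  # predictions for positions 0..n-2
        tokens[:, 1:].reshape(-1),                  # targets are the next tokens
    )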
Modern LLMs (GPT-4, Claude, Llama) are decoder-only transformers, trained in three stages:
01
Pretraining
Next-token prediction on trillions of tokens from the web
02
Fine-tuning (SFT)
Supervised learning on curated instruction-response pairs
03
RLHF
Reinforcement learning from human preferences
💡 Scale
GPT-3: 175B parameters trained on ~300B tokens
GPT-4: Estimated 1.8T parameters
Inference
How LLMs produce output
Temperature controls randomness by scaling logits before softmax. Common decoding strategies follow, with a code sketch after the list:
Prompt: "The capital of France is ___"
Greedy
Always pick highest probability token
Top-k
Sample from k most likely tokens
Top-p (Nucleus)
Sample from tokens covering probability p
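A minimal NumPy sketch of temperature plus all three strategies; the logits array is an illustrative placeholder for four candidate tokens:

import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

logits = np.array([3.2, 2.9, 0.5, -1.0])       # illustrative scores for 4 tokens
temperature = 0.8
probs = softmax(logits / temperature)           # temperature rescales logits before softmax

greedy = int(np.argmax(probs))                  # greedy: highest-probability token

k = 2                                           # top-k: sample among the k most likely
top_k = np.argsort(probs)[-k:]
p_k = probs[top_k] / probs[top_k].sum()
sample_k = int(np.random.choice(top_k, p=p_k))

p = 0.9                                         # top-p: smallest set with cumulative prob >= p
order = np.argsort(probs)[::-1]
cum = np.cumsum(probs[order])
nucleus = order[: int(np.searchsorted(cum, p)) + 1]
p_n = probs[nucleus] / probs[nucleus].sum()
sample_p = int(np.random.choice(nucleus, p=p_n))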
💡 Context Window
Maximum tokens the model can process. Attention is O(n²), so longer contexts are expensive.
Prompting
How to get the best results from LLMs
No examples provided. The model relies on its pretraining knowledge.
Classify the sentiment as positive, negative, or neutral: "The food was amazing but the service was slow." Sentiment:
The model outputs: "neutral" or "mixed"
Provide examples that demonstrate the pattern:
Review: "Best purchase ever!" → positive
Review: "Broke after one day." → negative
Review: "The food was amazing but the service was slow." →
The model learns the format from context and outputs: "mixed"
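A few-shot prompt is just string assembly; here is a minimal sketch that builds the prompt above from a list of (review, label) pairs:

examples = [
    ("Best purchase ever!", "positive"),
    ("Broke after one day.", "negative"),
]
query = "The food was amazing but the service was slow."

prompt = "\n".join(f'Review: "{text}" → {label}' for text, label in examples)
prompt += f'\nReview: "{query}" →'
# Send `prompt` to any completion endpoint; the model continues with a label.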
Encourage step-by-step reasoning before answering:
# Prompt with "Let's think step by step"
1. "food was amazing" → positive signal
2. "service was slow" → negative signal
3. "amazing" is stronger than "slow"
4. Overall: positive with caveats
💡 Emergent Ability
CoT only works reliably in sufficiently large models.
Performance scales predictably as a power law:
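One widely cited form, from Kaplan et al. (2020), relates test loss L to parameter count N; the constants below are approximate values from that paper:

L(N) \approx \left(\frac{N_c}{N}\right)^{\alpha_N}, \qquad \alpha_N \approx 0.076,\ N_c \approx 8.8 \times 10^{13}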
Part 2
From text generation to autonomous action
Definition
LLM Agent = LLM (reasoning) + Tools (actions) + Memory (state) + Loop (orchestration)
1
Perceive
Receive input from user/environment
2
Think
LLM reasons about what to do
3
Act
Execute tool or respond
4
Observe
Process action result
5
Repeat
Until task complete
♻
Loop
Continuous reasoning cycle
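A bare-bones version of this loop in Python; llm and execute_tool are hypothetical helpers standing in for a model call and a tool dispatcher:

def run_agent(task: str, max_steps: int = 10) -> str:
    history = [{"role": "user", "content": task}]   # Perceive: initial input
    for _ in range(max_steps):                      # Repeat until done
        action = llm(history)                       # Think: model decides the next step
        if action["type"] == "final_answer":
            return action["content"]                # Respond to the user
        result = execute_tool(action)               # Act: run the requested tool
        history.append({"role": "tool", "content": result})  # Observe
    return "Step limit reached."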
LLMs output structured requests that external code executes:
# Define tools
tools = [
    {"name": "search_web", "description": "Search for info"},
    {"name": "calculate", "description": "Do math"},
]

# User: "What's 15% of 847?"
# LLM outputs:
# {"name": "calculate", "args": {"expr": "847 * 0.15"}}
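The external code then routes the request to a real function. A hypothetical dispatcher (perform_search stands in for an actual search backend):

import json

TOOL_IMPLS = {
    "search_web": lambda args: perform_search(args["query"]),
    "calculate": lambda args: str(eval(args["expr"])),  # demo only: never eval untrusted input
}

call = json.loads('{"name": "calculate", "args": {"expr": "847 * 0.15"}}')
observation = TOOL_IMPLS[call["name"]](call["args"])    # "127.05"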
Web Search
Calculator
Code Execution
File Operations
API Calls
Database
Complex tasks are decomposed across specialized agents; a sketch of the hierarchical pattern follows this list:
Hierarchical
Manager delegates to specialists
Collaborative
Peers share information
Debate
Agents argue, judge decides
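A sketch of the hierarchical pattern, assuming each agent is a callable and the manager's plan names a specialist per subtask (all names here are hypothetical):

specialists = {
    "research": research_agent,    # hypothetical specialist agents
    "writing": writing_agent,
}

def manager(task: str) -> str:
    plan = planner_llm(task)       # manager decomposes the task into steps
    results = [specialists[step["agent"]](step["subtask"]) for step in plan]
    return synthesizer_llm(task, results)  # manager merges specialist outputs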
LangChain
General purpose, extensive integrations
PydanticAI
Type-safe, clean API
CrewAI
Role-based multi-agent
AutoGen
Multi-agent chat
LlamaIndex
RAG & retrieval
Your Own!
Custom implementation
Workshop
Hands-on with PydanticAI
from pydantic_ai import Agent

agent = Agent('gemini-2.5-flash')  # Free!

@agent.tool_plain  # plain tool: takes no RunContext argument (@agent.tool expects one)
def search_web(query: str) -> str:
    return perform_search(query)  # placeholder search backend

result = agent.run_sync("Find latest AI news")