What are Large Language Models?
A Large Language Model (LLM) is a neural network trained to understand and generate human language. Built on the Transformer architecture (covered in the previous lecture), LLMs learn patterns from vast amounts of text data.
Core Principle: Next-Token Prediction
At its heart, an LLM is trained to predict the next token given all previous tokens. This simple objective, when scaled to billions of parameters and trillions of tokens, produces remarkably capable models.
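As a minimal sketch of what this objective looks like (toy token ids and random stand-in logits, not a real model or tokenizer):

```python
import numpy as np

np.random.seed(0)
vocab_size = 50_000
# Hypothetical token ids for a short sentence.
tokens = np.array([464, 3139, 286, 4881, 318, 6342])

# Inputs are every token except the last; targets are the same sequence shifted by one.
inputs, targets = tokens[:-1], tokens[1:]

# Stand-in for the model: random logits of shape (sequence_length, vocab_size).
logits = np.random.randn(len(inputs), vocab_size)

# Softmax over the vocabulary at each position.
probs = np.exp(logits - logits.max(axis=-1, keepdims=True))
probs /= probs.sum(axis=-1, keepdims=True)

# Next-token loss: average negative log-probability assigned to the true next token.
loss = -np.log(probs[np.arange(len(targets)), targets]).mean()
print(f"next-token loss: {loss:.2f}")
```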
Decoder-Only Architecture
Modern LLMs like GPT-4, Claude, and Llama use a decoder-only architecture. Unlike the original encoder-decoder Transformer, they process input and generate output in a single unified model using causal (masked) attention.
Key Characteristics
- Autoregressive: Generates one token at a time, feeding each output back as input
- Causal masking: Each position can only attend to previous positions (see the mask sketch after this list)
- Unified representation: Same model handles both "understanding" and "generation"
- Context window: Fixed maximum sequence length (e.g., 8K, 128K, 1M+ tokens)
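To make the causal-masking bullet concrete, here is a minimal numpy sketch (toy attention scores, no learned weights): future positions are set to negative infinity before the softmax, so each row of attention weights covers only the current and earlier tokens.

```python
import numpy as np

np.random.seed(0)
seq_len = 5
scores = np.random.randn(seq_len, seq_len)   # toy attention scores (query x key)

# Lower-triangular mask: position i may attend only to positions 0..i.
mask = np.tril(np.ones((seq_len, seq_len), dtype=bool))
scores = np.where(mask, scores, -np.inf)     # block attention to future tokens

# Row-wise softmax; masked entries receive exactly zero weight.
weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
weights /= weights.sum(axis=-1, keepdims=True)
print(np.round(weights, 2))                  # upper triangle is all zeros
```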
Scale Matters
The "Large" in LLM refers to both model size and training data. GPT-3 has 175 billion parameters trained on ~500 billion tokens. GPT-4 is estimated to be even larger.
💡 Key Insight
Scaling laws show that model performance improves predictably with compute, data, and parameters. This empirical finding drove the race to build ever-larger models.
Training LLMs
Training an LLM is a multi-stage process: self-supervised pretraining on web-scale text, followed by supervised fine-tuning and preference-based alignment that shape specific capabilities and behavior.
Stage 1: Pretraining
The model learns language patterns by predicting the next token on massive text corpora:
Training Data Sources
- Common Crawl: Petabytes of web pages
- Books: Literature, textbooks, technical manuals
- Wikipedia: Encyclopedic knowledge
- Code: GitHub repositories, documentation
- Scientific papers: arXiv, PubMed, etc.
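To connect these sources to the next-token objective, here is a rough sketch of how raw text becomes fixed-length training examples. The whitespace "tokenizer" and tiny corpus are stand-ins; real pipelines use subword tokenizers (e.g. BPE) over terabytes of text.

```python
# Toy corpus and whitespace tokenizer, for illustration only.
corpus = "the cat sat on the mat . the dog sat on the rug ."
vocab = {word: i for i, word in enumerate(sorted(set(corpus.split())))}
token_ids = [vocab[word] for word in corpus.split()]

context_length = 4
examples = []
for start in range(len(token_ids) - context_length):
    chunk = token_ids[start : start + context_length + 1]
    # Input is the chunk minus its last token; the target is the chunk shifted by one.
    examples.append((chunk[:-1], chunk[1:]))

print(len(examples), "training examples, e.g.", examples[0])
```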
Stage 2: Supervised Fine-Tuning (SFT)
After pretraining, models are fine-tuned on curated datasets of high-quality examples demonstrating desired behaviors:
Instruction-following example:

```json
{
  "instruction": "Explain quantum entanglement simply",
  "response": "Quantum entanglement is when two particles become connected in such a way that measuring one instantly affects the other, no matter how far apart they are..."
}
```
Stage 3: Reinforcement Learning from Human Feedback (RLHF)
To align model outputs with human preferences, RLHF trains a reward model on human comparisons, then optimizes the LLM to maximize this reward:
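A common formulation of the reward-model step (a sketch, not the exact recipe of any particular system) is a pairwise loss that pushes the score of the human-preferred response above the rejected one:

```python
import math

def pairwise_reward_loss(reward_chosen: float, reward_rejected: float) -> float:
    """Bradley-Terry style loss for training a reward model on human comparisons:
    it is small when the preferred response already scores higher than the rejected one."""
    return -math.log(1.0 / (1.0 + math.exp(-(reward_chosen - reward_rejected))))

# Toy scores for one comparison pair.
print(pairwise_reward_loss(reward_chosen=2.1, reward_rejected=0.3))  # ~0.15 (correct ordering)
print(pairwise_reward_loss(reward_chosen=0.3, reward_rejected=2.1))  # ~1.95 (wrong ordering)
```

The fine-tuned LLM is then optimized, typically with a policy-gradient method such as PPO, to produce responses that this reward model scores highly.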
💡 Why RLHF?
RLHF helps models be helpful, harmless, and honest. It teaches models to refuse harmful requests, admit uncertainty, and follow complex instructions.
Inference: Generating Text
At inference time, the model generates text token-by-token. Several parameters control this process:
Temperature
Temperature controls the "randomness" of generation by scaling the logits before softmax:
\[ p_i = \frac{\exp(z_i / T)}{\sum_j \exp(z_j / T)} \]
Where \(z_i\) are the logits and \(T\) is temperature. Low temperature (→ 0) makes output deterministic; high temperature (→ ∞) makes it uniform random.
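A short numpy sketch of this effect, with made-up logits for three candidate tokens:

```python
import numpy as np

def softmax_with_temperature(logits, temperature):
    scaled = np.asarray(logits, dtype=float) / temperature
    exps = np.exp(scaled - scaled.max())
    return exps / exps.sum()

logits = [4.0, 2.0, 1.0]   # toy logits, e.g. for "Paris", "Lyon", "London"
for t in (0.2, 1.0, 2.0):
    print(t, np.round(softmax_with_temperature(logits, t), 3))
# Low T concentrates probability on the top token; high T spreads it toward uniform.
```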
Sampling Strategies
Common Methods
- Greedy: Always pick the highest probability token
- Top-k: Sample from the k most likely tokens
- Top-p (nucleus): Sample from the smallest set of top tokens whose cumulative probability exceeds p (see the sketch after this list)
- Beam search: Maintain multiple candidate sequences
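A minimal sketch of greedy, top-k, and top-p selection over a toy next-token distribution (numpy only; in practice this filtering is applied to the model's probabilities at every generation step):

```python
import numpy as np

rng = np.random.default_rng(0)
probs = np.array([0.45, 0.25, 0.15, 0.10, 0.05])  # toy next-token distribution

def sample_top_k(probs, k):
    """Keep only the k most likely tokens, renormalize, then sample."""
    keep = np.argsort(probs)[::-1][:k]
    filtered = np.zeros_like(probs)
    filtered[keep] = probs[keep]
    return rng.choice(len(probs), p=filtered / filtered.sum())

def sample_top_p(probs, p_threshold):
    """Keep the smallest set of top tokens whose cumulative probability reaches p_threshold."""
    order = np.argsort(probs)[::-1]
    cutoff = np.searchsorted(np.cumsum(probs[order]), p_threshold) + 1
    keep = order[:cutoff]
    filtered = np.zeros_like(probs)
    filtered[keep] = probs[keep]
    return rng.choice(len(probs), p=filtered / filtered.sum())

print("greedy:", int(np.argmax(probs)))        # always token 0
print("top-k (k=2):", sample_top_k(probs, 2))
print("top-p (p=0.9):", sample_top_p(probs, 0.9))
```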
Context Window
The context window is the maximum number of tokens the model can process. Attention cost grows as O(n²) with sequence length, so longer contexts are expensive. Modern techniques like sliding window attention, sparse attention, and RoPE scaling extend context lengths.
Prompting Techniques
Prompting is the art of instructing LLMs through natural language. Different techniques unlock different capabilities.
```
Classify the sentiment of this review as positive, negative, or neutral:

"The food was amazing but the service was slow."

Sentiment:
```
Zero-shot: No examples provided. The model relies entirely on its pretraining knowledge.
```
Classify the sentiment of reviews:

Review: "Best purchase I ever made!"
Sentiment: positive

Review: "Terrible quality, broke after one day."
Sentiment: negative

Review: "The food was amazing but the service was slow."
Sentiment:
```
Few-shot: Provide examples that demonstrate the pattern. The model learns from context.
```
Classify the sentiment of this review. Think step by step:

Review: "The food was amazing but the service was slow."

Let me analyze this:
1. "food was amazing" - this is positive
2. "service was slow" - this is negative
3. Mixed signals, but "amazing" is strong positive
4. Overall leaning positive with a caveat

Sentiment: positive (mixed)
```
Chain-of-Thought: Encourage the model to reason step-by-step before answering.
Why Chain-of-Thought Works
CoT prompting improves performance on reasoning tasks by:
- Breaking complex problems into manageable steps
- Making intermediate reasoning explicit (auditable)
- Reducing compounding errors in multi-step reasoning
- Leveraging the model's ability to follow demonstrated patterns
💡 Emergent Ability
Chain-of-thought reasoning is an emergent ability: it only works reliably in sufficiently large models. Smaller models may produce incoherent chains.
Scaling Laws
Empirical research has revealed predictable relationships between model performance and three key factors:
\[ L(N) = \left(\frac{N_c}{N}\right)^{\alpha_N}, \qquad L(D) = \left(\frac{D_c}{D}\right)^{\alpha_D}, \qquad L(C) = \left(\frac{C_c}{C}\right)^{\alpha_C} \]
Where \(N\) = parameters, \(D\) = dataset size, \(C\) = compute, and \(\alpha_N \approx 0.076\) for parameters.
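As a back-of-the-envelope sketch, plugging model sizes into the parameter law with constants roughly as reported by Kaplan et al. (2020) (approximately \(N_c \approx 8.8 \times 10^{13}\)) shows predicted loss falling slowly but steadily as \(N\) grows:

```python
# Rough illustration of L(N) = (N_c / N) ** alpha_N with constants approximately
# as reported in Kaplan et al. (2020); exact values depend on the data and setup.
N_C = 8.8e13
ALPHA_N = 0.076

def predicted_loss(num_parameters: float) -> float:
    return (N_C / num_parameters) ** ALPHA_N

for n in (1e8, 1e9, 1e10, 175e9):  # 100M, 1B, 10B, GPT-3-sized
    print(f"N = {n:.0e} parameters -> predicted loss ~ {predicted_loss(n):.2f}")
```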
Key Findings (Kaplan et al., 2020)
- Performance scales as a power law with compute, data, and parameters
- Larger models are more sample efficient
- Optimal allocation: scale model size faster than dataset size
- No signs of diminishing returns within the range of scales studied
Emergent Abilities
Some capabilities appear suddenly at certain scales. Examples include:
- Multi-step arithmetic
- Word unscrambling
- Following complex instructions
- In-context learning from few examples
What are Agents?
An LLM Agent is a system that uses a language model as its core reasoning engine, combined with the ability to take actions, use tools, and maintain memory across interactions.
Definition: LLM Agent
An LLM Agent = LLM (reasoning) + Tools (actions) + Memory (state) + Loop (orchestration)
The Agent Loop
Agents operate in a continuous loop:
- Perceive: Receive input from user or environment
- Think: LLM reasons about what to do next
- Act: Execute a tool or generate a response
- Observe: Process the result of the action
- Repeat: Continue until the task is complete (a minimal version of this loop is sketched below)
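A minimal sketch of this loop in plain Python; `llm_decide`, the toy tool, and the stopping rule are hypothetical stand-ins for a real model and real tools:

```python
def llm_decide(goal: str, history: list) -> dict:
    """Stand-in for the LLM's reasoning step: pick a tool until there is an
    observation to work with, then finish."""
    if history:
        return {"action": "finish", "answer": f"Answer based on: {history[-1]}"}
    return {"action": "search_web", "input": goal}

TOOLS = {"search_web": lambda query: f"(toy search results for {query!r})"}

def run_agent(goal: str, max_steps: int = 5) -> str:
    history = []                                      # memory carried across steps
    for _ in range(max_steps):                        # the loop / orchestration
        decision = llm_decide(goal, history)          # think
        if decision["action"] == "finish":
            return decision["answer"]                 # act: respond to the user
        observation = TOOLS[decision["action"]](decision["input"])  # act: run a tool
        history.append(observation)                   # observe
    return "Stopped after max_steps without finishing."

print(run_agent("latest news on AI"))
```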
💡 Agents vs. Chatbots
A chatbot generates text responses. An agent can take actions in the world: search the web, execute code, call APIs, modify files, and more.
Tool Use & Function Calling
Modern LLMs can be taught to use external tools through function calling. The model outputs structured requests that are executed by external code.
How Function Calling Works
```python
# Define available tools
tools = [
    {
        "name": "search_web",
        "description": "Search the web for information",
        "parameters": {
            "query": {"type": "string", "description": "Search query"}
        },
    },
    {
        "name": "calculate",
        "description": "Perform mathematical calculations",
        "parameters": {
            "expression": {"type": "string", "description": "Math expression"}
        },
    },
]

# LLM decides to call a tool
response = llm.chat(
    messages=[{"role": "user", "content": "What's 15% of 847?"}],
    tools=tools,
)
# Output: {"name": "calculate", "arguments": {"expression": "847 * 0.15"}}
```
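The application code, not the model, is responsible for running the requested tool and returning the result. A sketch of that execution side, reusing the tool-call output shown above (the exact message shapes vary between provider SDKs):

```python
import json

def calculate(expression: str) -> str:
    # Toy calculator; a real implementation would validate or sandbox the expression.
    allowed = set("0123456789.+-*/() ")
    if not set(expression) <= allowed:
        raise ValueError("unsupported expression")
    return str(eval(expression))

TOOL_IMPLEMENTATIONS = {"calculate": calculate}

# The structured request emitted by the model (same shape as the output above).
tool_call = json.loads('{"name": "calculate", "arguments": {"expression": "847 * 0.15"}}')

result = TOOL_IMPLEMENTATIONS[tool_call["name"]](**tool_call["arguments"])
print(result)  # "127.05" -- sent back to the model as a tool message so it can answer in prose
```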
Common Tool Types
Web Search
Query search engines for real-time information
Calculator
Precise mathematical computations
Code Execution
Run Python/JavaScript in sandboxed environments
File Operations
Read, write, and manipulate files
API Calls
Interact with external services
Database
Query and update databases
Structured Outputs
Beyond tool calls, LLMs can output any structured format (JSON, XML, etc.). This enables reliable parsing and integration with downstream systems.
```python
from typing import Literal

from pydantic import BaseModel

class MovieReview(BaseModel):
    title: str
    rating: float
    sentiment: Literal["positive", "negative", "neutral"]
    summary: str

# LLM outputs validated, typed data
review = llm.generate(MovieReview, prompt="Review: The Matrix...")
```
Multi-Agent Systems
Complex tasks can be decomposed across multiple specialized agents that collaborate, debate, or supervise each other.
Common Patterns
Hierarchical
A "manager" agent delegates subtasks to specialist agents and synthesizes results.
Collaborative
Peer agents work together, sharing information and building on each other's outputs.
Adversarial / Debate
Agents argue different positions; a judge synthesizes the best answer.
Example: Research Team
- Researcher Agent: Searches for information, reads papers
- Analyst Agent: Synthesizes findings, identifies patterns
- Writer Agent: Produces the final report
- Critic Agent: Reviews for accuracy and clarity
Agent Frameworks
Several frameworks simplify building LLM agents. Each has different design philosophies and trade-offs.
| Framework | Focus | Key Feature |
|---|---|---|
| LangChain | General purpose | Extensive integrations, chains |
| PydanticAI | Type safety | Pydantic-based structured outputs |
| CrewAI | Multi-agent | Role-based agent teams |
| AutoGen | Conversations | Multi-agent chat orchestration |
| LlamaIndex | RAG / Data | Document indexing & retrieval |
Why PydanticAI?
For our workshop, we'll use PydanticAI because:
- Clean, Pythonic API with type hints
- Built-in structured output validation
- Easy tool definition with decorators
- Provider-agnostic (OpenAI, Anthropic, etc.)
- Lightweight and easy to understand
```python
from pydantic_ai import Agent

agent = Agent(
    'gemini-2.5-flash',  # Free tier!
    system_prompt="You are a helpful research assistant.",
)

@agent.tool_plain  # plain tool: no run context needed
def search_web(query: str) -> str:
    """Search the web for information."""
    return perform_search(query)  # placeholder search implementation

result = agent.run_sync("What's the latest news on AI?")
print(result.output)
```
Ready to Build Your Own Agent?
Join the hands-on workshop where we'll build a research assistant agent step-by-step using PydanticAI.
Further Reading
Papers
- Vaswani et al., "Attention Is All You Need" (2017) – the Transformer
- Kaplan et al., "Scaling Laws for Neural Language Models" (2020)
- Brown et al., "Language Models are Few-Shot Learners" (2020) – GPT-3
- Wei et al., "Chain-of-Thought Prompting Elicits Reasoning in Large Language Models" (2022)
- Ouyang et al., "Training language models to follow instructions with human feedback" (2022) – InstructGPT
- Yao et al., "ReAct: Synergizing Reasoning and Acting in Language Models" (2023)