Lecture 2

LLMs & Agents

From language models to autonomous agents that reason, plan, and take actions.

Nir Naim
Tel Aviv University
Queueing Theory Seminar

Part 1

Large Language Models

Understanding how LLMs work

What is an LLM?

Core Principle

An LLM is a neural network trained to predict the next token given all previous tokens.

\[P(x_{t+1} | x_1, x_2, \ldots, x_t)\]

This simple objective, scaled to billions of parameters and trillions of tokens, produces remarkably capable models.
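
A minimal sketch of this objective at inference time, assuming the Hugging Face transformers library with GPT-2 as a small stand-in model (not named in the lecture): feed a prefix, take the logits at the last position, and softmax them into a distribution over the next token.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

prompt = "The capital of France is"
inputs = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits          # shape: (1, seq_len, vocab_size)

# P(x_{t+1} | x_1, ..., x_t): distribution over the next token
next_token_probs = torch.softmax(logits[0, -1], dim=-1)
top = torch.topk(next_token_probs, k=5)
for p, idx in zip(top.values, top.indices):
    print(f"{tokenizer.decode(idx)!r}: {p:.3f}")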

Decoder-Only Architecture

Modern LLMs (GPT-4, Claude, Llama) use decoder-only transformers:

  • Autoregressive: Generate one token at a time
  • Causal masking: Each position only sees previous positions
  • Context window: Fixed max sequence length
[Diagram: Decoder Block × N, consisting of Masked Self-Attention and a Feed-Forward Network, each with Residual + LayerNorm]
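
To make the causal masking concrete, here is a compact PyTorch sketch of one decoder block (a simplified pre-LayerNorm variant, not any particular model's exact layer):

import torch
import torch.nn as nn

class DecoderBlock(nn.Module):
    """Masked self-attention + feed-forward, each wrapped with a residual and LayerNorm."""
    def __init__(self, d_model: int = 512, n_heads: int = 8):
        super().__init__()
        self.ln1 = nn.LayerNorm(d_model)
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ln2 = nn.LayerNorm(d_model)
        self.ff = nn.Sequential(
            nn.Linear(d_model, 4 * d_model), nn.GELU(), nn.Linear(4 * d_model, d_model)
        )

    def forward(self, x):
        n = x.size(1)
        # Causal mask: position i may only attend to positions <= i
        mask = torch.triu(torch.ones(n, n, dtype=torch.bool), diagonal=1)
        h = self.ln1(x)
        attn_out, _ = self.attn(h, h, h, attn_mask=mask)
        x = x + attn_out                  # residual around attention
        x = x + self.ff(self.ln2(x))      # residual around feed-forward
        return x

x = torch.randn(2, 16, 512)               # (batch, seq_len, d_model)
print(DecoderBlock()(x).shape)             # stack N of these blocks for the full model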

Training Pipeline

01 Pretraining: Next-token prediction on trillions of tokens from the web
02 Fine-tuning (SFT): Supervised learning on curated instruction-response pairs
03 RLHF: Reinforcement learning from human preferences

\[\mathcal{L} = -\sum_{t=1}^{T} \log P(x_t | x_1, \ldots, x_{t-1}; \theta)\]
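
This loss is ordinary cross-entropy on targets shifted by one position; a minimal PyTorch sketch with toy tensors standing in for real model outputs:

import torch
import torch.nn.functional as F

logits = torch.randn(2, 16, 1000)          # (batch, seq_len, vocab): model output at every position
tokens = torch.randint(0, 1000, (2, 16))   # the training sequence itself

# Predict token t from everything before it: shift targets left by one
pred = logits[:, :-1, :].reshape(-1, 1000)
target = tokens[:, 1:].reshape(-1)

loss = F.cross_entropy(pred, target)       # mean of -log P(x_t | x_1, ..., x_{t-1})
print(loss.item())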

Training Data Sources

  • Common Crawl: Petabytes of web pages
  • Books: Literature, textbooks
  • Wikipedia: Encyclopedic knowledge
  • Code: GitHub repositories
  • Scientific papers: arXiv, PubMed

💡 Scale

GPT-3: 175B parameters, trained on ~300B tokens sampled from a ~500B-token dataset
GPT-4: Estimated ~1.8T parameters

Inference

Generating Text

How LLMs produce output

Temperature

Controls randomness by scaling logits before softmax:

\[P(x_i) = \frac{\exp(z_i / T)}{\sum_j \exp(z_j / T)}\]

Prompt: "The capital of France is ___"

[Interactive demo: a temperature slider from deterministic (low T) to random (high T), shown at T = 1.0]
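
A small NumPy sketch of temperature scaling on made-up logits (the three candidate tokens are illustrative), showing how T reshapes the distribution:

import numpy as np

def softmax_with_temperature(logits, T):
    z = np.asarray(logits, dtype=float) / T
    z -= z.max()                            # numerical stability
    e = np.exp(z)
    return e / e.sum()

logits = [5.0, 2.0, 1.0]                    # hypothetical scores for ["Paris", "Lyon", "Berlin"]
for T in (0.1, 1.0, 2.0):
    print(T, np.round(softmax_with_temperature(logits, T), 3))
# T=0.1 puts almost all mass on "Paris"; T=2.0 gives a noticeably flatter distribution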

Sampling Strategies

  • Greedy: Always pick the highest-probability token
  • Top-k: Sample from the k most likely tokens
  • Top-p (Nucleus): Sample from the smallest set of tokens whose cumulative probability reaches p (all three sketched below)
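
A NumPy sketch of the three strategies on a toy next-token distribution (token IDs and probabilities are made up):

import numpy as np

rng = np.random.default_rng(0)

def top_k_sample(probs, k):
    idx = np.argsort(probs)[-k:]                 # k most likely tokens
    p = probs[idx] / probs[idx].sum()            # renormalize
    return rng.choice(idx, p=p)

def top_p_sample(probs, p):
    order = np.argsort(probs)[::-1]              # most likely first
    cdf = np.cumsum(probs[order])
    cutoff = np.searchsorted(cdf, p) + 1         # smallest nucleus covering probability p
    nucleus = order[:cutoff]
    q = probs[nucleus] / probs[nucleus].sum()
    return rng.choice(nucleus, p=q)

probs = np.array([0.5, 0.2, 0.15, 0.1, 0.05])
print(int(np.argmax(probs)))                     # greedy: always token 0
print(top_k_sample(probs, k=3))                  # one of tokens {0, 1, 2}
print(top_p_sample(probs, p=0.8))                # nucleus {0, 1, 2} covers 0.85 >= 0.8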

💡 Context Window

Maximum tokens the model can process. Attention is O(n²), so longer contexts are expensive.
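
A back-of-envelope illustration of that quadratic growth, counting attention scores for a single head in one layer (if the full matrix were materialized in fp16):

for n in (1_024, 8_192, 128_000):
    scores = n * n                               # O(n^2) pairwise attention scores
    print(f"{n:>7} tokens -> {scores:>15,} scores (~{scores * 2 / 1e6:,.0f} MB)")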

Prompting

The Art of Instruction

How to get the best results from LLMs

Zero-Shot Prompting

No examples provided. The model relies on its pretraining knowledge.

Classify the sentiment as positive, negative, or neutral:

"The food was amazing but the service was slow."

Sentiment:

The model outputs: "neutral" or "mixed"

Few-Shot Prompting

Provide examples that demonstrate the pattern:

Review: "Best purchase ever!" โ†’ positive
Review: "Broke after one day." โ†’ negative
Review: "The food was amazing but the service was slow." โ†’

The model learns the format from context and outputs: "mixed"

Chain-of-Thought

Encourage step-by-step reasoning before answering:

# Prompt with "Let's think step by step"

1. "food was amazing" โ†’ positive signal
2. "service was slow" โ†’ negative signal  
3. "amazing" is stronger than "slow"
4. Overall: positive with caveats

💡 Emergent Ability

CoT only works reliably in sufficiently large models

Scaling Laws

Performance scales predictably as a power law:

\[L(N) \propto N^{-0.076}, \quad L(D) \propto D^{-0.095}\]
  • Larger models are more sample efficient
  • No signs of diminishing returns at current scales
  • Some abilities emerge suddenly at certain scales
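
For intuition, plugging the exponents above into a quick calculation of what a 10x scale-up buys (this applies to the power-law term of the loss, not any irreducible floor):

print(10 ** -0.076)   # ~0.84: 10x more parameters -> roughly 16% lower loss
print(10 ** -0.095)   # ~0.80: 10x more data       -> roughly 20% lower loss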

Part 2

LLM Agents

From text generation to autonomous action

What is an Agent?

Definition

LLM Agent = LLM (reasoning) + Tools (actions) + Memory (state) + Loop (orchestration)

[Diagram: the LLM at the center, connected to Input, Output, Tools, and Memory]

The Agent Loop

1. Perceive: Receive input from the user or environment
2. Think: The LLM reasons about what to do
3. Act: Execute a tool or respond
4. Observe: Process the action's result
5. Repeat: Until the task is complete

↻ Loop: Continuous reasoning cycle (sketched in code below)
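
A stripped-down version of this loop in plain Python. Here call_llm is a hypothetical placeholder for a chat-model API call, and the convention that tool calls arrive as JSON is an assumption for the sketch:

import json

def call_llm(messages: list[dict]) -> str:
    """Hypothetical helper: send the conversation to a chat model, return its reply."""
    raise NotImplementedError

TOOLS = {
    "calculate": lambda expr: str(eval(expr)),       # demo only; never eval untrusted input
}

def run_agent(task: str, max_steps: int = 10) -> str:
    messages = [{"role": "user", "content": task}]   # memory = the message history
    for _ in range(max_steps):
        reply = call_llm(messages)                   # Think
        messages.append({"role": "assistant", "content": reply})
        try:
            action = json.loads(reply)               # e.g. {"name": "calculate", "args": {...}}
        except json.JSONDecodeError:
            return reply                             # plain text means a final answer
        result = TOOLS[action["name"]](**action["args"])            # Act
        messages.append({"role": "tool", "content": str(result)})   # Observe, then loop
    return "Stopped: step limit reached."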

Tool Use / Function Calling

LLMs output structured requests that external code executes:

# Define tools
tools = [
    {"name": "search_web", "description": "Search for info"},
    {"name": "calculate", "description": "Do math"}
]

# User: "What's 15% of 847?"
# LLM outputs: {"name": "calculate", "args": {"expr": "847 * 0.15"}}
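
On the application side, that structured output gets parsed and routed to real code. A minimal dispatch sketch (the exact schema and parsing differ by provider, so treat this as illustrative):

import json

def calculate(expr: str) -> str:
    return str(eval(expr))                # demo only; never eval untrusted input

TOOL_IMPLS = {"calculate": calculate}

llm_output = '{"name": "calculate", "args": {"expr": "847 * 0.15"}}'
call = json.loads(llm_output)
result = TOOL_IMPLS[call["name"]](**call["args"])
print(result)                             # 127.05, which is sent back to the LLM as the tool result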

Common Tool Types

  • 🔍 Web Search
  • 🧮 Calculator
  • 💻 Code Execution
  • 📁 File Operations
  • 🌐 API Calls
  • 🗄️ Database

Multi-Agent Systems

Complex tasks decomposed across specialized agents:

  • Hierarchical: A manager delegates to specialists (sketched below)
  • Collaborative: Peers share information
  • Debate: Agents argue, a judge decides
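
As a sketch of the hierarchical pattern only, a manager that routes sub-tasks to two specialist prompts (call_llm is again a hypothetical placeholder, and the prompts are illustrative):

def call_llm(prompt: str) -> str:
    """Hypothetical helper: send a prompt to a chat model, return its reply."""
    raise NotImplementedError

SPECIALISTS = {
    "research": "You are a research agent. Gather key facts for: {task}",
    "writing":  "You are a writing agent. Draft a concise summary of: {task}",
}

def manager(task: str) -> str:
    # The manager decomposes the task, then hands each piece to a specialist
    plan = call_llm(f"List the research questions needed to answer: {task}")
    notes = call_llm(SPECIALISTS["research"].format(task=plan))
    return call_llm(SPECIALISTS["writing"].format(task=notes))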

Agent Frameworks

  • 🦜 LangChain: General purpose, extensive integrations
  • ⚡ PydanticAI: Type-safe, clean API
  • 👥 CrewAI: Role-based multi-agent
  • 💬 AutoGen: Multi-agent chat
  • 📚 LlamaIndex: RAG & retrieval
  • 🔧 Your Own: Custom implementation

Workshop

Build Your Own Agent

Hands-on with PydanticAI

from pydantic_ai import Agent

# pip install pydantic-ai; Gemini models read the GEMINI_API_KEY environment variable
agent = Agent('gemini-2.5-flash')  # Free!

@agent.tool_plain  # tools that need the run context use @agent.tool instead
def search_web(query: str) -> str:
    """Search the web and return a short summary of the results."""
    return perform_search(query)  # placeholder: plug in your own search backend

result = agent.run_sync("Find latest AI news")
print(result.output)  # the agent's final answer (.data in older PydanticAI versions)

Questions?

Thank you for your attention.