Agent vs Run session
Agents and run sessions are two halves of the same workflow. The agent describes the capability you want to expose. The run session is the stateful executor that turns that description into actual model calls, tool usage, and telemetry for a specific request or tenant.
An agent is a reusable blueprint. It keeps the shared configuration for a capability—name, default model, instructions, tools, and optional toolkits—and remains safe to reuse across users because it never captures request-specific state.
It exposes three entry points: run, run_stream, and create_session (spelled runStream and createSession in the TypeScript signatures below). The first two spin up a temporary run session, execute exactly once, and always close it so you do not leak resources. Reach for create_session when you need to reuse initialized toolkits (e.g. MCP Toolkit) or run several calls back to back; you then become responsible for closing that session afterwards.
    export class Agent<TContext> {
      readonly name: string;
      constructor(params: AgentParams<TContext>);
      run(request: AgentRequest<TContext>): Promise<AgentResponse>;
      runStream(request: AgentRequest<TContext>): AsyncGenerator<AgentStreamEvent, AgentResponse>;
      createSession(context: TContext): Promise<RunSession<TContext>>;
    }

Because the agent stays stateless, every request must provide its own context. That value feeds dynamic instructions, toolkit factories, and tool executions without leaking across users or tenants.
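The one-shot behaviour of run (open a temporary session for the request's context, execute once, always close) can be sketched as follows. This is a minimal illustration, not the library's implementation: `runOnce`, `SessionLike`, and the simplified `AgentItem`, `AgentResponse`, and `AgentRequest` shapes are hypothetical stand-ins introduced only for this example.

```typescript
// Hypothetical, simplified stand-ins for the library's types (assumptions).
interface AgentItem {
  role: "user" | "assistant";
  text: string;
}
interface AgentResponse {
  output: AgentItem[];
}
interface AgentRequest<TContext> {
  input: AgentItem[];
  context: TContext;
}
interface SessionLike {
  run(request: { input: AgentItem[] }): Promise<AgentResponse>;
  close(): Promise<void>;
}

// One-shot helper: create a session bound to this request's context,
// run exactly once, and close the session whether the run succeeds or throws.
async function runOnce<TContext>(
  createSession: (context: TContext) => Promise<SessionLike>,
  request: AgentRequest<TContext>,
): Promise<AgentResponse> {
  const session = await createSession(request.context);
  try {
    return await session.run({ input: request.input });
  } finally {
    await session.close();
  }
}
```

Because the context arrives with the request and the session is discarded afterwards, nothing request-specific survives on the shared agent.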
Run session
A run session binds an agent to a specific context. When it is created, the session applies parameter defaults, resolves all context-aware instructions, and asks each toolkit for a per-session instance with personalized prompts and tools. Static tools from the agent configuration are combined with toolkit-provided tools into a single roster for the run. Each call to run or run_stream starts from a clean RunState made from the AgentItem[] you pass in.
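The setup steps above (resolve instructions against the context, ask each toolkit for a per-session instance, merge the tools) might look roughly like this. It is a sketch under assumptions: `prepareSession`, `Instruction`, `Tool`, `Toolkit`, `ToolkitSession`, and the `systemPrompt` field are hypothetical names chosen for illustration, and the real session may do more, such as applying parameter defaults.

```typescript
// Hypothetical shapes for illustration only (assumptions, not the library's types).
type Instruction<TContext> = string | ((context: TContext) => string);
interface Tool {
  name: string;
}
interface ToolkitSession {
  systemPrompt?: string;
  tools: Tool[];
}
interface Toolkit<TContext> {
  createSession(context: TContext): Promise<ToolkitSession>;
}
interface SessionParams<TContext> {
  instructions: Instruction<TContext>[];
  tools: Tool[];
  toolkits: Toolkit<TContext>[];
}
interface SessionSetup {
  instructions: string[];
  tools: Tool[];
}

async function prepareSession<TContext>(
  params: SessionParams<TContext>,
  context: TContext,
): Promise<SessionSetup> {
  // Resolve context-aware instructions into plain strings.
  const instructions = params.instructions.map((item) =>
    typeof item === "function" ? item(context) : item,
  );
  // Start the roster with the agent's static tools.
  const tools = [...params.tools];
  // Ask each toolkit for a per-session instance and merge its prompt and tools.
  for (const toolkit of params.toolkits) {
    const toolkitSession = await toolkit.createSession(context);
    if (toolkitSession.systemPrompt !== undefined) {
      instructions.push(toolkitSession.systemPrompt);
    }
    tools.push(...toolkitSession.tools);
  }
  return { instructions, tools };
}
```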
    class RunSession<TContext> {
      run(request: RunSessionRequest): Promise<AgentResponse>;
      runStream(request: RunSessionRequest): AsyncGenerator<AgentStreamEvent, AgentResponse>;
      close(): Promise<void>;
    }

    interface RunSessionRequest {
      input: AgentItem[];
    }

Within a run session the bound context is passed to every toolkit session and tool invocation, keeping the runtime consistent for the lifetime of that session. When your workflow ends, call close to release toolkit resources and clear cached prompts. If you only need a single answer or do not have any stateful components, stick with agent.run or agent.run_stream; otherwise reuse the session for as many runs as you need before closing it.
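When you manage the session yourself, the reuse-then-close pattern described above could be wrapped in a small helper. The sketch below assumes only the createSession, run, and close signatures shown earlier; `answerAll`, `AgentLike`, `RunSessionLike`, and the simplified `AgentItem` and `AgentResponse` shapes are hypothetical and exist only to keep the example self-contained.

```typescript
// Simplified stand-ins mirroring the signatures shown above (assumptions).
interface AgentItem {
  role: "user" | "assistant";
  text: string;
}
interface AgentResponse {
  output: AgentItem[];
}
interface RunSessionLike {
  run(request: { input: AgentItem[] }): Promise<AgentResponse>;
  close(): Promise<void>;
}
interface AgentLike<TContext> {
  createSession(context: TContext): Promise<RunSessionLike>;
}

// Run several inputs through one session so toolkits are initialized once,
// then close the session even if one of the runs throws.
async function answerAll<TContext>(
  agent: AgentLike<TContext>,
  context: TContext,
  inputs: AgentItem[][],
): Promise<AgentResponse[]> {
  const session = await agent.createSession(context);
  try {
    const responses: AgentResponse[] = [];
    for (const input of inputs) {
      // Each run starts from the items passed in, not from the previous run.
      responses.push(await session.run({ input }));
    }
    return responses;
  } finally {
    await session.close();
  }
}
```

Running the inputs sequentially keeps the example simple; whether a session tolerates concurrent runs is not stated in this section, so the sketch does not assume it.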
The flow looks like this:
sequenceDiagram
participant Client
participant Agent
participant RunSession
participant ToolkitSession as Toolkit sessions
participant Tools
participant Model
Client->>Agent: run({ input, context })
Agent->>RunSession: create(context + params)
RunSession->>ToolkitSession: create_session(context)
ToolkitSession-->>RunSession: prompts + tools
RunSession->>Model: generate()
Model-->>RunSession: response/tool calls
RunSession->>Tools: execute(args, context, state)
Tools-->>RunSession: tool results
RunSession->>Model: append tool results
Model-->>RunSession: response
RunSession-->>Client: AgentResponse
Agent->>RunSession: close()
run_stream follows the same lifecycle but emits partial deltas the moment the language model produces them. Internally the session uses a stream accumulator to turn those deltas into the final response before yielding tool events and the closing payload back to you.
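The accumulator idea can be illustrated with an async generator that forwards each delta as soon as it arrives while building up the final value it returns. This is a deliberately reduced sketch: `PartialEvent`, `FinalResponse`, `streamWithAccumulator`, and `collect` are hypothetical names, the deltas are plain strings, and tool events are omitted.

```typescript
// Hypothetical event and response shapes for illustration (assumptions).
interface PartialEvent {
  type: "partial";
  delta: string;
}
interface FinalResponse {
  text: string;
}

// Forward every delta immediately and accumulate them into the final response,
// which becomes the generator's return value.
async function* streamWithAccumulator(
  deltas: AsyncIterable<string>,
): AsyncGenerator<PartialEvent, FinalResponse> {
  let text = "";
  for await (const delta of deltas) {
    text += delta;
    yield { type: "partial", delta };
  }
  return { text };
}

// Consumer: read partial events until the generator finishes, then pick up
// the returned final response from the last iterator result.
async function collect(
  stream: AsyncGenerator<PartialEvent, FinalResponse>,
): Promise<{ seen: string[]; final: FinalResponse }> {
  const seen: string[] = [];
  while (true) {
    const step = await stream.next();
    if (step.done) {
      return { seen, final: step.value };
    }
    seen.push(step.value.delta);
  }
}
```

Note that a plain `for await` loop discards a generator's return value, which is why the consumer calls `next()` directly to reach the final response.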
A typical web request creates the agent once, derives a context for the user, opens a run session, supplies the current conversation as AgentItem[], gathers the response (streaming or not), and finally closes the session. This separation keeps per-user data explicit while letting the same agent power every tenant without risk of leakage.