Agent vs Run session
Agents and run sessions are two halves of the same workflow. The agent describes the capability you want to expose. The run session is the stateful executor that turns that description into actual model calls, tool usage, and telemetry for a specific request or tenant.
An agent is a reusable blueprint. It keeps the shared configuration for a capability—name, default model, instructions, tools, and optional toolkits—and remains safe to reuse across users because it never captures request-specific state.
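For illustration, here is a minimal sketch of such a blueprint in TypeScript. The `SupportContext` type, the constructor field names, and `lookupOrderTool` are assumptions for the example, not the SDK's exact `AgentParams` shape:

```ts
// Sketch only: assumes Agent and a tool value are imported from your SDK package.
interface SupportContext {
  userId: string;
  plan: "free" | "pro";
}

const supportAgent = new Agent<SupportContext>({
  // Field names here are illustrative; check AgentParams for the exact shape.
  name: "support",
  model: "gpt-4o",
  instructions: ["You are a concise, friendly support assistant."],
  tools: [lookupOrderTool], // hypothetical tool defined elsewhere
});
```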
Every implementation exposes three entry points: `run`, `run_stream`, and `create_session`. The first two spin up a temporary run session, execute exactly once, and always close it so you do not leak resources. Reach for `create_session` when you need to reuse initialized toolkits or run several calls back to back; you become responsible for closing that session afterwards. (In TypeScript the latter two are named `runStream` and `createSession`, but they follow the same contract.)
```ts
export class Agent<TContext> {
  readonly name: string;

  constructor(params: AgentParams<TContext>);

  run(request: AgentRequest<TContext>): Promise<AgentResponse>;
  runStream(request: AgentRequest<TContext>): AsyncGenerator<AgentStreamEvent, AgentResponse>;
  createSession(context: TContext): Promise<RunSession<TContext>>;
}
```
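As a rough usage sketch (reusing the hypothetical `supportAgent` from above, with `conversation` and `context` assumed to exist), the one-shot call manages the session for you, while `createSession` hands you the cleanup responsibility:

```ts
// One-shot: a temporary run session is created, used exactly once, and closed for you.
const oneShot = await supportAgent.run({ input: conversation, context });

// Reusable: you own the session and must close it, even if a run throws.
const session = await supportAgent.createSession(context);
try {
  const answer = await session.run({ input: conversation });
} finally {
  await session.close();
}
```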
Because the agent stays stateless, every request must provide its own `context`. That value feeds dynamic instructions, toolkit factories, and tool executions without leaking across users or tenants.
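Because context arrives with each call rather than living on the agent, one instance can serve several tenants side by side. A sketch under the same assumptions as above, where `aliceConversation` and `bobConversation` are per-user `AgentItem[]` values you maintain yourself:

```ts
// Same agent instance, two isolated requests: no per-user state is shared.
const aliceReply = await supportAgent.run({
  input: aliceConversation,
  context: { userId: "u_alice", plan: "pro" },
});

const bobReply = await supportAgent.run({
  input: bobConversation,
  context: { userId: "u_bob", plan: "free" },
});
```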
Run session
A run session binds an agent to a specific context. When it is created, the session applies parameter defaults, resolves all context-aware instructions, and asks each toolkit for a per-session instance so tools and prompts are ready to go. Static tools from the agent configuration are combined with toolkit-provided tools into a single roster for the run. Each call to `run` or `run_stream` starts from a clean `RunState` made from the `AgentItem[]` you pass in. The session never remembers prior inputs once the call finishes, so persist any conversation history you care about and include it explicitly on the next run.
```ts
class RunSession<TContext> {
  run(request: RunSessionRequest): Promise<AgentResponse>;
  runStream(request: RunSessionRequest): AsyncGenerator<AgentStreamEvent, AgentResponse>;
  close(): Promise<void>;
}

interface RunSessionRequest {
  input: AgentItem[];
}
```
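Because each run starts from a clean `RunState`, history only carries over if you pass it back in. A sketch of that pattern, where `userMessage` is a hypothetical helper that builds an `AgentItem` from text:

```ts
const session = await supportAgent.createSession({ userId: "u_123", plan: "pro" });
try {
  let history: AgentItem[] = [];

  // First turn: the run only sees what we pass as input.
  history.push(userMessage("Where is my order?")); // hypothetical helper
  const first = await session.run({ input: history });
  history = [...history, ...first.output];

  // Second turn: include the prior turn explicitly, or the model will not see it.
  history.push(userMessage("And when will it arrive?"));
  const second = await session.run({ input: history });
  history = [...history, ...second.output];
} finally {
  await session.close(); // release toolkit resources and cached prompts
}
```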
```ts
interface AgentRequest<TContext> {
  /**
   * The input items for this run, such as LLM messages.
   */
  input: AgentItem[];
  /**
   * The context used to resolve instructions and passed to tool executions.
   */
  context: TContext;
}

interface AgentResponse {
  /**
   * The items generated during the run, such as new tool and assistant messages.
   */
  output: AgentItem[];
  /**
   * The final output content generated by the agent.
   */
  content: Part[];
}
```
```rust
pub struct AgentRequest<TCtx> {
    /// The input items for this run, such as LLM messages.
    pub input: Vec<AgentItem>,
    /// The context used to resolve instructions and passed to tool executions.
    pub context: TCtx,
}

pub struct AgentResponse {
    /// The items generated during the run, such as new tool and assistant
    /// messages.
    pub output: Vec<AgentItem>,
    /// The last assistant output content generated by the agent.
    pub content: Vec<Part>,
}
```
```go
type AgentRequest[C any] struct {
	// Input contains the items for this run, such as LLM messages.
	Input []AgentItem `json:"input"`
	// Context is the value used to resolve instructions and passed to tool executions.
	Context C `json:"context"`
}

type AgentResponse struct {
	// Output contains the items generated during the run, such as new tool and assistant messages.
	Output []AgentItem `json:"output"`
	// Content is the final output content generated by the agent.
	Content []llmsdk.Part `json:"content"`
}
```
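In every language the response separates the full trail of generated items (`output`) from the final content (`content`). A TypeScript sketch of consuming both, where `history`, `context`, and `renderPart` are assumptions for the example:

```ts
const response = await supportAgent.run({ input: history, context });

// output is everything the run produced: tool calls, tool results, assistant messages.
// Append it to your own history so the next run can see this turn.
history = [...history, ...response.output];

// content is only the final assistant content, ready to display.
for (const part of response.content) {
  renderPart(part); // hypothetical rendering helper
}
```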
Within a run session the bound context is passed to every toolkit session and tool invocation, keeping the runtime consistent for the lifetime of that session. When your workflow ends, call `close` to release toolkit resources and clear cached prompts. If you only need a single answer, stick with `agent.run` or `agent.run_stream`; otherwise reuse the session for as many runs as you need before closing it.
The flow looks like this:
```mermaid
sequenceDiagram
    participant Client
    participant Agent
    participant RunSession
    participant ToolkitSession as Toolkit sessions
    participant Tools
    participant Model
    Client->>Agent: run({ input, context })
    Agent->>RunSession: create(context + params)
    RunSession->>ToolkitSession: create_session(context)
    ToolkitSession-->>RunSession: prompts + tools
    RunSession->>Model: generate()
    Model-->>RunSession: response/tool calls
    RunSession->>Tools: execute(args, context, state)
    Tools-->>RunSession: tool results
    RunSession-->>Client: AgentResponse
    Agent->>RunSession: close()
```
`run_stream` follows the same lifecycle but emits partial deltas the moment the language model produces them. Internally the session uses a stream accumulator to turn those deltas into the final response before yielding tool events and the closing payload back to you.
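Since `runStream` is typed as `AsyncGenerator<AgentStreamEvent, AgentResponse>`, the accumulated response arrives as the generator's return value. A sketch of consuming it, with `handleEvent` standing in for your own delta and tool-event handling:

```ts
const stream = supportAgent.runStream({ input: conversation, context });

let next = await stream.next();
while (!next.done) {
  handleEvent(next.value); // hypothetical: render deltas / tool events as they arrive
  next = await stream.next();
}

// Once done is true, value holds the final accumulated AgentResponse.
const finalResponse: AgentResponse = next.value;
```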
A typical web request creates the agent once, derives a context for the user, opens a run session, supplies the current conversation as `AgentItem[]`, gathers the response (streaming or not), and finally closes the session. This separation keeps per-user data explicit while letting the same agent power every tenant without risk of leakage.
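Put together, a hedged sketch of that request flow; the handler shape, `loadHistory`, `saveHistory`, and `userMessage` are assumptions for the example, not SDK APIs:

```ts
async function handleChatRequest(userId: string, message: string): Promise<Part[]> {
  // The agent itself is created once at startup; only the context is per request.
  const context: SupportContext = { userId, plan: "pro" };

  const session = await supportAgent.createSession(context);
  try {
    const history = await loadHistory(userId);                     // hypothetical persistence helper
    const input: AgentItem[] = [...history, userMessage(message)]; // hypothetical message builder

    const response = await session.run({ input });

    await saveHistory(userId, [...input, ...response.output]);     // keep the trail for the next request
    return response.content;
  } finally {
    await session.close();
  }
}
```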