Language model

A language model instance satisfies the LanguageModel interface, which includes the following:

  • provider: The LLM provider name.
  • model_id: The model identifier.
  • metadata: Metadata about the model, such as pricing information or capabilities.
  • generate(LanguageModelInput) -> ModelResponse: Generate a non-streaming response from the model.
  • stream(LanguageModelInput) -> AsyncIterable<PartialModelResponse>: Generate a streaming response from the model.

All models in the library implement the LanguageModel interface and can be used interchangeably.
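Because every model satisfies the same interface, calling code only needs to depend on LanguageModel. As an illustration (the types below are minimal local re-declarations of the shapes documented on this page, not imports from the SDK), a stub model can stand in anywhere a real provider model would:

```typescript
// Minimal local re-declarations of the documented shapes, for illustration only.
interface StubPart {
  type: "text";
  text: string;
}
interface StubModelResponse {
  content: StubPart[];
}
interface StubLanguageModelInput {
  messages: { role: string; content: StubPart[] }[];
}
interface StubLanguageModel {
  provider: string;
  modelId: string;
  generate(input: StubLanguageModelInput): Promise<StubModelResponse>;
}

// A stub that satisfies the interface; any real provider model slots into
// the same call sites.
const stubModel: StubLanguageModel = {
  provider: "stub",
  modelId: "stub-1",
  async generate(input) {
    const last = input.messages[input.messages.length - 1];
    return { content: [{ type: "text", text: `echo: ${last.content[0].text}` }] };
  },
};

// A helper that works with any model implementing the interface.
async function ask(model: StubLanguageModel, text: string): Promise<string> {
  const res = await model.generate({
    messages: [{ role: "user", content: [{ type: "text", text }] }],
  });
  return res.content[0].text;
}
```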

get-model.ts
import type { LanguageModel } from "@hoangvvo/llm-sdk";
import { AnthropicModel } from "@hoangvvo/llm-sdk/anthropic";
import { CohereModel } from "@hoangvvo/llm-sdk/cohere";
import { GoogleModel } from "@hoangvvo/llm-sdk/google";
import { MistralModel } from "@hoangvvo/llm-sdk/mistral";
import { OpenAIChatModel, OpenAIModel } from "@hoangvvo/llm-sdk/openai";
import assert from "node:assert";

export function getModel(provider: string, modelId: string): LanguageModel {
  switch (provider) {
    case "openai":
      assert(process.env["OPENAI_API_KEY"]);
      return new OpenAIModel({
        apiKey: process.env["OPENAI_API_KEY"],
        modelId,
      });
    case "openai-chat-completion":
      assert(process.env["OPENAI_API_KEY"]);
      return new OpenAIChatModel({
        apiKey: process.env["OPENAI_API_KEY"],
        modelId,
      });
    case "anthropic":
      assert(process.env["ANTHROPIC_API_KEY"]);
      return new AnthropicModel({
        apiKey: process.env["ANTHROPIC_API_KEY"],
        modelId,
      });
    case "google":
      assert(process.env["GOOGLE_API_KEY"]);
      return new GoogleModel({
        apiKey: process.env["GOOGLE_API_KEY"],
        modelId,
      });
    case "cohere":
      assert(process.env["CO_API_KEY"]);
      return new CohereModel({ apiKey: process.env["CO_API_KEY"], modelId });
    case "mistral":
      assert(process.env["MISTRAL_API_KEY"]);
      return new MistralModel({
        apiKey: process.env["MISTRAL_API_KEY"],
        modelId,
      });
    default:
      throw new Error(`Unsupported provider: ${provider}`);
  }
}

LanguageModelInput is a unified format to represent the input for generating responses from the language model, applicable to both non-streaming and streaming requests. The library converts these inputs into corresponding properties for each LLM provider, if applicable. This allows specifying:

  • The conversation history, which includes UserMessage, AssistantMessage, and ToolMessage.
  • Sampling parameters: max_tokens, temperature, top_p, top_k, presence_penalty, frequency_penalty, and seed.
  • Tool definitions and tool selection.
  • The response format, used to require the model to return structured objects instead of plain text.
  • The modalities the model should generate, such as text, images, or audio.
  • Part-specific output options, such as audio and reasoning options.
types.ts
interface LanguageModelInput {
  /**
   * A system prompt is a way of providing context and instructions to the model.
   */
  system_prompt?: string;
  /**
   * A list of messages comprising the conversation so far.
   */
  messages: Message[];
  /**
   * Definitions of tools that the model may use.
   */
  tools?: Tool[];
  tool_choice?: ToolChoiceOption;
  response_format?: ResponseFormatOption;
  /**
   * The maximum number of tokens that can be generated in the chat completion.
   */
  max_tokens?: number;
  /**
   * Amount of randomness injected into the response. Ranges from 0.0 to 1.0.
   */
  temperature?: number;
  /**
   * An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. Ranges from 0.0 to 1.0.
   */
  top_p?: number;
  /**
   * Only sample from the top K options for each subsequent token. Used to remove "long tail" low-probability responses. Must be a non-negative integer.
   */
  top_k?: number;
  /**
   * Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.
   */
  presence_penalty?: number;
  /**
   * Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.
   */
  frequency_penalty?: number;
  /**
   * The seed (integer), if set and supported by the model, to enable deterministic results.
   */
  seed?: number;
  /**
   * The modalities that the model should generate.
   */
  modalities?: Modality[];
  /**
   * Options for audio generation.
   */
  audio?: AudioOptions;
  /**
   * Options for reasoning generation.
   */
  reasoning?: ReasoningOptions;
  /**
   * A set of key/value pairs that store additional information about the request. This is forwarded to the model provider if supported.
   */
  metadata?: Record<string, string>;
  /**
   * Extra options that the model may support.
   */
  extra?: Record<string, unknown>;
}
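A request combining several of these fields can be written as a plain object. The field names follow the interface above; the values are illustrative:

```typescript
// An illustrative LanguageModelInput literal; only messages is required,
// all other fields are optional tuning knobs.
const input = {
  system_prompt: "You are a concise assistant.",
  messages: [
    { role: "user", content: [{ type: "text", text: "What is 2 + 2?" }] },
  ],
  max_tokens: 256,
  temperature: 0.2,
  metadata: { request_source: "docs-example" },
};
```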

Messages are primitives that make up the conversation history, and Parts are the building blocks of each message. The library converts them into a format suitable for the underlying LLM provider and maps those from different providers to the unified format.

Three message types are defined in the SDK: UserMessage, AssistantMessage, and ToolMessage.

types.ts
type Message = UserMessage | AssistantMessage | ToolMessage;

interface UserMessage {
  role: "user";
  content: Part[];
}

interface AssistantMessage {
  role: "assistant";
  content: Part[];
}

interface ToolMessage {
  role: "tool";
  content: Part[];
}
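A tool-use round trip, for example, is three messages: the user asks, the assistant emits a ToolCallPart, and a tool message carries the matching ToolResultPart. The tool name and arguments here are made up for illustration:

```typescript
// Hypothetical weather-lookup round trip; tool_call_id ties the result
// in the tool message back to the call in the assistant message.
const history = [
  {
    role: "user",
    content: [{ type: "text", text: "What's the weather in Hanoi?" }],
  },
  {
    role: "assistant",
    content: [
      {
        type: "tool-call",
        tool_call_id: "call_1",
        tool_name: "get_weather",
        args: { city: "Hanoi" },
      },
    ],
  },
  {
    role: "tool",
    content: [
      {
        type: "tool-result",
        tool_call_id: "call_1",
        tool_name: "get_weather",
        content: [{ type: "text", text: "31°C, sunny" }],
      },
    ],
  },
];
```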

The following Part types are implemented in the SDK: TextPart, ImagePart, AudioPart, SourcePart (for citations), ToolCallPart, ToolResultPart, and ReasoningPart.

types.ts
type Part =
  | TextPart
  | ImagePart
  | AudioPart
  | SourcePart
  | ToolCallPart
  | ToolResultPart
  | ReasoningPart;
types.ts
interface TextPart {
  type: "text";
  text: string;
  citations?: Citation[];
}
types.ts
interface ImagePart {
  type: "image";
  /**
   * The MIME type of the image. E.g. "image/jpeg", "image/png".
   */
  mime_type: string;
  /**
   * The base64-encoded image data.
   */
  image_data: string;
  /**
   * The width of the image in pixels.
   */
  width?: number;
  /**
   * The height of the image in pixels.
   */
  height?: number;
  /**
   * ID of the image part, if applicable.
   */
  id?: string;
}
types.ts
interface AudioPart {
  type: "audio";
  /**
   * The base64-encoded audio data.
   */
  audio_data: string;
  format: AudioFormat;
  /**
   * The sample rate of the audio. E.g. 44100, 48000.
   */
  sample_rate?: number;
  /**
   * The number of channels of the audio. E.g. 1, 2.
   */
  channels?: number;
  /**
   * The transcript of the audio.
   */
  transcript?: string;
  /**
   * ID of the audio part, if applicable.
   */
  id?: string;
}

type AudioFormat =
  | "wav"
  | "mp3"
  | "linear16"
  | "flac"
  | "mulaw"
  | "alaw"
  | "aac"
  | "opus";
types.ts
interface SourcePart {
  type: "source";
  /**
   * The source URL or identifier of the document.
   */
  source: string;
  /**
   * The title of the document.
   */
  title: string;
  /**
   * The content of the document.
   */
  content: Part[];
}
types.ts
interface ToolCallPart {
  type: "tool-call";
  /**
   * The ID of the tool call, used to match the tool result with the tool call.
   */
  tool_call_id: string;
  /**
   * The name of the tool to call.
   */
  tool_name: string;
  /**
   * The arguments to pass to the tool.
   */
  args: Record<string, unknown>;
  /**
   * The ID of the tool call part, if applicable.
   * This is different from tool_call_id, which is used to match tool results.
   */
  id?: string;
}
types.ts
interface ToolResultPart {
  type: "tool-result";
  /**
   * The ID of the tool call from the previous assistant message.
   */
  tool_call_id: string;
  /**
   * The name of the tool that was called.
   */
  tool_name: string;
  /**
   * The content of the tool result.
   */
  content: Part[];
  /**
   * Marks the tool result as an error.
   */
  is_error?: boolean;
}
types.ts
interface ReasoningPart {
  type: "reasoning";
  /**
   * The reasoning text content.
   */
  text: string;
  /**
   * The reasoning internal signature.
   */
  signature?: string;
  /**
   * The ID of the reasoning part, if applicable.
   */
  id?: string;
}

The response from the language model is represented as a ModelResponse that includes:

  • content: An array of Part that represents the generated content, which usually comes from the AssistantMessage.
  • usage: Token usage information, if available.
  • cost: The estimated cost of the request, if the model’s pricing information is provided.
types.ts
interface ModelResponse {
  content: Part[];
  usage?: ModelUsage;
  /**
   * The cost of the response.
   */
  cost?: number;
}

interface ModelUsage {
  input_tokens: number;
  output_tokens: number;
  input_tokens_details?: ModelTokensDetails;
  output_tokens_details?: ModelTokensDetails;
}

interface ModelTokensDetails {
  text_tokens?: number;
  cached_text_tokens?: number;
  audio_tokens?: number;
  cached_audio_tokens?: number;
  image_tokens?: number;
  cached_image_tokens?: number;
}
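How cost can be derived from usage is straightforward to sketch, assuming per-million-token prices are available from the model's pricing metadata. The pricing field names below are assumptions for the example, not the SDK's actual metadata layout:

```typescript
// Hypothetical pricing shape; the real metadata layout may differ.
interface Pricing {
  input_cost_per_million_tokens: number;
  output_cost_per_million_tokens: number;
}
interface Usage {
  input_tokens: number;
  output_tokens: number;
}

// Estimate request cost from token counts and per-million-token prices.
function estimateCost(usage: Usage, pricing: Pricing): number {
  return (
    (usage.input_tokens / 1e6) * pricing.input_cost_per_million_tokens +
    (usage.output_tokens / 1e6) * pricing.output_cost_per_million_tokens
  );
}
```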

For streaming calls, the response is represented as a series of PartialModelResponse objects that include:

  • delta: A PartDelta and its index in the eventual content array.
  • usage: Token usage information, if available.
types.ts
interface ContentDelta {
  index: number;
  part: PartDelta;
}

interface PartialModelResponse {
  delta?: ContentDelta;
  usage?: ModelUsage;
  cost?: number;
}

All SDKs provide the StreamAccumulator utility to help build the final ModelResponse from a stream of PartialModelResponse.
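The accumulation it performs can be sketched by hand: collect each delta at its index, concatenating deltas that target the same part. This simplified sketch handles text deltas only; the real utility also merges tool calls, audio, reasoning, usage, and cost:

```typescript
// Local simplified shapes for the sketch (text deltas only).
interface TextDelta {
  type: "text";
  text: string;
}
interface Delta {
  index: number;
  part: TextDelta;
}
interface Partial {
  delta?: Delta;
}

// Merge a stream of partial responses into the final content array by
// appending each delta's text to the part at its index.
function accumulate(partials: Partial[]): TextDelta[] {
  const content: TextDelta[] = [];
  for (const partial of partials) {
    if (!partial.delta) continue;
    const { index, part } = partial.delta;
    const existing = content[index];
    if (existing) {
      existing.text += part.text;
    } else {
      content[index] = { ...part };
    }
  }
  return content;
}
```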