* This document is a memo researched and formatted with the help of ChatGPT/Gemini in order to organize my own understanding.

OpenAI Harmony Format Specification

0. Purpose of This Document

This document is a detailed specification of OpenAI's Harmony response format. Rather than relying on the official documentation alone, it organizes the specification by cross-referencing publicly available materials: the gpt-oss chat template, tokenizer configuration, the openai-harmony library, and practical operational resources for Transformers / vLLM / Ollama and the like.

This document aims primarily to clarify the following:

The technical problems that the Harmony format solves
The role definitions of system / developer / user / assistant / tool
The separation of output into the three channels analysis / commentary / final
How reasoning (reasoning / CoT), tool calls, and structured output are represented
A complete list of the publicly documented special tokens and the criteria for their use
The treatment of additional token candidates that exist within the openai/harmony implementation but are not published in the tokenizer configuration
The actual processing logic in Hugging Face's chat_template.jinja
Concrete conversation examples, tool-use examples, and implementation examples

1. Overview of Harmony

Harmony is a prompt and response format optimized for OpenAI's gpt-oss family of models, designed to handle conversation, internal reasoning, tool calls, and structured output in a unified way. OpenAI officially states that applying the Harmony format is mandatory for gpt-oss models to operate correctly.

The format is designed to follow the mental model of the OpenAI Responses API, enabling the model to process and produce the following elements as a single continuous token sequence:

The hierarchical structure of the conversation (system / developer / user / assistant / tool)
The model's internal reasoning (analysis)
Intermediate notices to the user (commentary) and the final answer (final)
Calls to user-defined functions (function tools)
Execution of built-in tools (browser / python)
Structured output (output based on JSON Schema, etc.)

Note that manually constructing the Harmony format in your implementation is not recommended; the basic policy is to use the openai-harmony library or the official chat template.

1.1 Positioning of the Format

Harmony is a concept that extends a simple role-based chat template, and its main characteristics are as follows:

The assistant's output can be split into independent channels: analysis / commentary / final.
The end of a tool call is controlled by the <|call|> token.
The end of a final answer is controlled by the <|return|> token.
When constructing the conversation history for the next turn, the terminating <|return|> can be normalized to <|end|>, which indicates a completed state.
A reasoning process (reasoning) can be interleaved within the sequence of tool calls.
A response format (with a schema definition) can be embedded within a developer message.

2. Reference Resources

In preparing this specification, the following public information was referenced as primary and secondary sources.

2.1 Official / Semi-official Documentation

OpenAI Cookbook: OpenAI Harmony Response Format https://developers.openai.com/cookbook/articles/openai-harmony
GitHub: openai/harmony https://github.com/openai/harmony
openai/harmony Python API docs
https://github.com/openai/harmony/blob/main/docs/python.md
openai/harmony implementation code (src/encoding.rs)
https://github.com/openai/harmony/blob/main/src/encoding.rs
GitHub: openai/gpt-oss https://github.com/openai/gpt-oss
OpenAI Cookbook: How to run gpt-oss with Hugging Face Transformers https://github.com/openai/openai-cookbook/blob/main/articles/gpt-oss/run-transformers.md

2.2 Implemented Templates and Model Assets

Hugging Face openai/gpt-oss-20b chat_template.jinja
https://huggingface.co/openai/gpt-oss-20b/blob/main/chat_template.jinja
Hugging Face openai/gpt-oss-20b tokenizer_config.json
https://huggingface.co/openai/gpt-oss-20b/blob/main/tokenizer_config.json
vLLM Recipe: GPT OSS https://docs.vllm.ai/projects/recipes/en/latest/OpenAI/GPT-OSS.html
Ollama gpt-oss template blob
https://ollama.com/library/gpt-oss%3A20b/blobs/51468a0fd901

2.3 Stance on Vetting Information in This Document

The official documentation (2.1) is treated as the primary source of information.
However, because actual system behavior depends on the chat template, tokenizer configuration, and library implementation, these secondary sources (2.2) are also comprehensively evaluated as part of the specification.
Where discrepancies between the official documentation and the actual implementation / templates are confirmed, those differences are noted explicitly.
Special tokens whose purpose is not published are treated as "undefined" or "unknown," and no meaning is assigned to them by conjecture.

3. The Design Philosophy of Harmony

3.1 Basic Architectural Concept

Harmony is a format designed on the premise that "the model's output is not a single continuous string." The output of the assistant role is not a single text generation but encompasses the following staged process:

Execution of internal reasoning (CoT) via the analysis channel
Intermediate notices to the user via the commentary channel (e.g., a declaration such as "I will begin searching")
Function / tool calls (Tool Call) via the commentary or analysis channel
Reading of tool execution results
Output of the final answer via the final channel

Thus, logically separating the output paths for "internal thought," "intermediate notices," and "final answer" into independent channels is the core design philosophy of this format.

3.2 Mandatory Requirement in gpt-oss

The official documentation explicitly states that gpt-oss models should not be used without applying the Harmony format. The reason gpt-oss can run on inference frameworks such as Transformers, Ollama, and vLLM without directly being aware of the Harmony specification is that each framework applies the template and performs parsing in the background on your behalf.

4. Conceptual Model

4.1 Role Definitions

Harmony defines the following five roles.

Role	Priority	Purpose	Representative Example
system	1	Definition of the model's fixed meta-information (identity, knowledge cutoff, date, reasoning, built-in tools, valid channels)	You are ChatGPT...
developer	2	Instructions from the application developer, function tool definitions, and the response format definition	# Instructions
user	3	Input from the end user	Questions / requests
assistant	4	The model's output (using the analysis / commentary / final channels)	Internal thought, intermediate explanations, final answer, tool calls
tool	5	Returning the execution result for a tool call	Weather API JSON, browser output, python execution result

4.2 Instruction Hierarchy

In Harmony, the hierarchy of roles itself functions directly as the Instruction Hierarchy.

Priority: system > developer > user > assistant > tool

When an instruction specified in a lower role (e.g., a developer instruction to "answer in Japanese") conflicts with the definition of a higher role (e.g., system), the definition of the higher role is always applied with precedence.

4.3 The Special Nature of the system Role

In typical chat model implementations, "application-specific instructions" are often treated as the system prompt, but in Harmony the principle is to write application-specific instructions in the developer role.

The system role in Harmony is strictly reserved for defining "foundational meta-information" such as the following:

A fixed identity (e.g., You are ChatGPT, a large language model trained by OpenAI.)
The knowledge cutoff (Knowledge cutoff: ...)
The current date (Current date: ...)
The reasoning level (Reasoning: low | medium | high)
Definitions of built-in tools
Declaration of the valid channels

4.4 Differences from Implementation Frameworks (the Transformers Case)

In the official chat_template.jinja provided by Hugging Face, whether the leading role of the input message array is system or developer, it is internally reinterpreted as a Harmony developer message. The proper Harmony system message is automatically generated by that template.

Example input configuration in Transformers:

messages = [
    {"role": "system", "content": "Always answer in Japanese."},
    {"role": "user", "content": "Hello"},
]

Conceptual diagram of the internal expansion in the Harmony format:

<|start|>system<|message|>You are ChatGPT, a large language model trained by OpenAI.
Knowledge cutoff: 2024-06
Current date: 2026-04-04

Reasoning: medium

# Valid channels: analysis, commentary, final. Channel must be included for every message.<|end|>
<|start|>developer<|message|># Instructions

Always answer in Japanese.

<|end|>
<|start|>user<|message|>Hello<|end|>
<|start|>assistant

As shown, a system specification in a general inference API and a pure system message in the Harmony standard are different entities, so the two must be clearly distinguished in implementation and operation.
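
The expansion above can be sketched in plain Python. This is a minimal illustration, not the actual logic of chat_template.jinja: the helper name render_prompt, the fixed knowledge cutoff, and the choice to join messages without separators (treating the line breaks between messages in the diagram as display-only) are all assumptions of this sketch, and only user messages are handled after the leading instruction.

```python
from datetime import date

# System text copied from the diagram above; the cutoff value is illustrative.
SYSTEM_TEMPLATE = (
    "You are ChatGPT, a large language model trained by OpenAI.\n"
    "Knowledge cutoff: 2024-06\n"
    "Current date: {today}\n"
    "\n"
    "Reasoning: {effort}\n"
    "\n"
    "# Valid channels: analysis, commentary, final. "
    "Channel must be included for every message."
)


def render_prompt(messages, effort="medium", today=None):
    """Render generic chat messages as a Harmony-style prompt (hypothetical helper)."""
    today = today or date.today().isoformat()
    parts = [
        "<|start|>system<|message|>"
        + SYSTEM_TEMPLATE.format(today=today, effort=effort)
        + "<|end|>"
    ]
    rest = list(messages)
    # A leading "system" or "developer" entry becomes the Harmony developer message.
    if rest and rest[0]["role"] in ("system", "developer"):
        instructions = rest.pop(0)["content"]
        parts.append(
            "<|start|>developer<|message|># Instructions\n\n"
            + instructions
            + "\n\n<|end|>"
        )
    for message in rest:
        if message["role"] != "user":
            raise ValueError("this sketch only renders user messages")
        parts.append("<|start|>user<|message|>" + message["content"] + "<|end|>")
    # Generation prompt: the model continues from the assistant header.
    parts.append("<|start|>assistant")
    return "".join(parts)


prompt = render_prompt(
    [
        {"role": "system", "content": "Always answer in Japanese."},
        {"role": "user", "content": "Hello"},
    ],
    today="2026-04-04",
)
```

The point of the sketch is that the caller's "system" entry never lands in the Harmony system message; it is moved into the developer message, while the system message is generated from fixed fields.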

5. Channels

An assistant message uses one of the following three channels depending on its purpose.

Channel	Purpose	Shown to User	Typical Example
analysis	Internal reasoning, CoT (Chain of Thought), calls to built-in tools	Hidden	Working through formulas, planning a search, Python code
commentary	Intermediate notices to the user (preamble), calls to ordinary function tools	Optional (depends on requirements)	Notices such as "I'll search first"
final	The final answer	Shown	The final response content

5.1 Minimal Configuration Examples for Each Channel

analysis

<|start|>assistant<|channel|>analysis<|message|>Need to verify the date before answering.<|end|>

commentary

<|start|>assistant<|channel|>commentary<|message|>I'll answer after confirming the latest information.<|end|>

final

<|start|>assistant<|channel|>final<|message|>The conclusion is 42.<|return|>

5.2 Difference Between User-defined Functions (Function Tool) and Built-in Tools (Built-in Tool)

As a principle, user-defined function tools are called on the commentary channel, and built-in tools (browser / python) are called on the analysis channel. However, since the official documentation also mentions cases where built-in tools are emitted on commentary, these are not absolute constraints but are defined as the standard behavior.

5.3 Differences Between the Specification and the Implementation

A standard system message includes the specification # Valid channels: analysis, commentary, final. Channel must be included for every message. However, in the official reference templates and API implementations, channels are never attached to system, developer, or user messages. In practice, treat it as "make the channel explicit only for messages that the assistant generates."

6. Basic Message Syntax

6.1 Basic Structure

The basic syntax of a message in Harmony is as follows.

<|start|>{header}<|message|>{content}<|end|>

The terminating token when generating an assistant message is chosen from the following three, depending on context.

An ordinary completed message: <|end|>
Completion of generating a final answer (decoding stops): <|return|>
Completion of generating a tool call (decoding stops): <|call|>

6.2 Pseudo-BNF Definition

A practical pseudo-BNF definition of this syntax is shown below.

message      := "<|start|>" header "<|message|>" content terminator
header       := author [recipient] [channel] [content_type]
author       := role | role ":" name | tool_name
recipient    := " to=" recipient_name
channel      := "<|channel|>" channel_name
content_type := " " plain_type | " <|constrain|>" constrained_type
terminator   := "<|end|>" | "<|return|>" | "<|call|>"

6.3 Structural Analysis of a Tool Call Header

Using the following tool call message as an example, each element is explained.

<|start|>assistant to=functions.get_current_weather<|channel|>commentary <|constrain|>json<|message|>{"location":"Tokyo"}<|call|>

<|start|>: The token indicating the start of the message.
assistant: The role of the sender.
to=functions.get_current_weather: The specification of the function being called.
<|channel|>commentary: The channel being used.
<|constrain|>json: The data type of the payload.
<|message|>: The boundary token between the header and the body (content).
{"location":"Tokyo"}: The body (JSON-formatted arguments).
<|call|>: The terminating token for a tool call. Decoding stops here and control passes to the tool execution process.
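
The breakdown above can be reproduced with a small parser. The following is a sketch written against the pseudo-BNF in 6.2, not the parser used by openai-harmony; parse_message and its return shape are hypothetical. It accepts the recipient on either side of <|channel|>, since both orderings appear in this document, and it treats plain json and <|constrain|>json as the same content type.

```python
import re

# One complete message: header, body, and one of the three terminators.
MESSAGE_RE = re.compile(
    r"<\|start\|>(?P<header>.*?)<\|message\|>(?P<content>.*?)"
    r"(?P<terminator><\|end\|>|<\|return\|>|<\|call\|>)",
    re.DOTALL,
)


def parse_message(text):
    """Split one Harmony message into header fields and body (hypothetical helper)."""
    match = MESSAGE_RE.fullmatch(text)
    if match is None:
        raise ValueError("not a single complete Harmony message")
    before, sep, after = match["header"].partition("<|channel|>")
    fields = before.split()
    if not fields:
        raise ValueError("header has no author")
    channel = None
    if sep:
        # The first word after <|channel|> is the channel name;
        # anything after it is treated like the other header fields.
        channel, *extra = after.split() or [None]
        fields += extra
    recipient = None
    content_type = None
    constrained = False
    for field in fields[1:]:
        if field.startswith("to="):
            recipient = field[len("to="):]
        elif field.startswith("<|constrain|>"):
            content_type = field[len("<|constrain|>"):]
            constrained = True
        else:
            content_type = field
    return {
        "author": fields[0],
        "recipient": recipient,
        "channel": channel,
        "content_type": content_type,
        "constrained": constrained,
        "content": match["content"],
        "terminator": match["terminator"],
    }


call = parse_message(
    "<|start|>assistant to=functions.get_current_weather<|channel|>commentary "
    '<|constrain|>json<|message|>{"location":"Tokyo"}<|call|>'
)
```

Here call["recipient"] is functions.get_current_weather, call["channel"] is commentary, and call["terminator"] is <|call|>, matching the element-by-element explanation.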

7. Specification of the "assistant Prefill" at Generation Time

In a typical inference request, the end of the input prompt is terminated with <|start|>assistant as follows.

<|start|>user<|message|>What is 2 + 2?<|end|>
<|start|>assistant

Because generation begins from this state, there is no need to re-declare the assistant role in the model's first output. As a result, as seen in the official examples, the model's generated string begins with the channel specification, as follows.

<|channel|>analysis<|message|>...

This is not a missing or corrupted output; it is a normal continuation of the <|start|>assistant already presented at the end of the prompt, as designed.
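
When parsing raw completions by hand, one way to account for the prefill is to re-attach the header before splitting. The sketch below assumes the completion is available as a decoded string with special tokens preserved; split_completion is a hypothetical helper, not part of any library.

```python
import re

PREFILL = "<|start|>assistant"

# A message runs from <|start|> to the first terminator token.
MESSAGE_RE = re.compile(
    r"<\|start\|>.*?(?:<\|end\|>|<\|return\|>|<\|call\|>)", re.DOTALL
)


def split_completion(completion):
    """Re-attach the prefilled header, then split the text into whole messages."""
    if not completion.startswith("<|start|>"):
        completion = PREFILL + completion
    return MESSAGE_RE.findall(completion)


raw = (
    '<|channel|>analysis<|message|>User asks: "What is 2 + 2?" Simple arithmetic.<|end|>'
    "<|start|>assistant<|channel|>final<|message|>2 + 2 = 4.<|return|>"
)
parts = split_completion(raw)
```

With this input, parts holds two messages, and the first one again begins with <|start|>assistant<|channel|>analysis.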

8. Specification of the system Message

8.1 Composition Requirements

It is recommended that the system message include the following elements:

A fixed identity: for example, specify something like "You are ChatGPT, a large language model trained by OpenAI."
Knowledge cutoff: specify the knowledge cutoff date.
Current date: specify the current date.
Reasoning: specify the degree of reasoning as one of low, medium, or high.
built-in tools: define built-in tools as needed.
valid channels: declare the valid channels (analysis, commentary, final).
Specification of function tools: when using function tools, add a statement instructing that those tools be called on the commentary channel.

8.2 Minimal Configuration Example

<|start|>system<|message|>You are ChatGPT, a large language model trained by OpenAI.
Knowledge cutoff: 2024-06
Current date: 2026-04-04

Reasoning: high

# Valid channels: analysis, commentary, final. Channel must be included for every message.<|end|>

8.3 Configuration Example When Defining function tools

When using function tools, explicitly state that calls to those tools are made on the commentary channel.

<|start|>system<|message|>You are ChatGPT, a large language model trained by OpenAI.
Knowledge cutoff: 2024-06
Current date: 2026-04-04

Reasoning: high

# Valid channels: analysis, commentary, final. Channel must be included for every message.
Calls to these tools must go to the commentary channel: 'functions'.<|end|>

8.4 Configuration Example When Defining built-in tools

<|start|>system<|message|>You are ChatGPT, a large language model trained by OpenAI.
Knowledge cutoff: 2024-06
Current date: 2026-04-04

Reasoning: medium

# Tools

## browser

// Tool for browsing.
// The `cursor` appears in brackets before each browsing display: `[{cursor}]`.
// Cite information from the tool using the following format:
// `〖{cursor}†L{line_start}(-L{line_end})?〗`, for example: `〖6†L9-L11〗` or `〖8†L3〗`.
// Do not quote more than 10 words directly from the tool output.
// sources=web (default: web)
namespace browser {

type search = (_: {
query: string,
topn?: number, // default: 10
source?: string,
}) => any;

type open = (_: {
id?: number | string, // default: -1
cursor?: number, // default: -1
loc?: number, // default: -1
num_lines?: number, // default: -1
view_source?: boolean, // default: false
source?: string,
}) => any;

type find = (_: {
pattern: string,
cursor?: number, // default: -1
}) => any;

} // namespace browser

## python

Use this tool to execute Python code in your chain of thought. The code will not be shown to the user.
When you send a message containing Python code to python, it will be executed in a stateful Jupyter notebook environment...

# Valid channels: analysis, commentary, final. Channel must be included for every message.<|end|>

8.5 Note on the Specification Regarding Identity Changes

The official documentation recommends keeping the identity fixed and using developer messages (developer instructions) when changing the model's persona or behavior. Although changing model_identity is technically possible at the API or template argument level, note that the recommended operation per the specification is to keep it "fixed."

9. Specification of the developer Message

The developer message functions as the area for writing what would traditionally be a general system prompt (application-specific instructions). It is typically composed of the following three sections.

# Instructions
# Tools
# Response Formats

9.1 Minimal Configuration Example Using Only Instructions

<|start|>developer<|message|># Instructions

Please answer in Japanese, keeping bullet points to a minimum.<|end|>

9.2 Configuration Example Including function tools

<|start|>developer<|message|># Instructions

Please answer in a friendly tone.

# Tools

## functions

namespace functions {

// Returns the current location
type get_location = () => any;

// Returns the current weather for the specified city
type get_current_weather = (_: {
// City name. Example: "Tokyo"
location: string,
// Unit
format?: "celsius" | "fahrenheit", // default: celsius
}) => any;

} // namespace functions<|end|>

9.3 Configuration Example Including structured output

<|start|>developer<|message|># Instructions

You are a shopping-list creation assistant.

# Response Formats

## shopping_list

{"type":"object","properties":{"items":{"type":"array","items":{"type":"string"},"description":"shopping items"}},"required":["items"]}<|end|>

10. Specification of the user Message

The user message is the most basic message area for holding input from the end user.

10.1 Basic Syntax Example

<|start|>user<|message|>Tell me today's temperature in Tokyo.<|end|>

10.2 Extended Syntax (Named author)

In the openai/harmony implementation, a syntax for attaching a named author (role:name) to user and assistant is supported. It can be written as in the following example.

<|start|>user:alice<|message|>Hello<|end|>

However, in the general use cases of the published official documentation, named authors are not used. Therefore, for ordinary purposes, a configuration specifying an anonymous user or assistant is sufficient to meet the requirements.

11. Specification of the assistant Message

The assistant message plays a central role in the Harmony format. Depending on the purpose, it takes one of the following four forms.

analysis: internal reasoning
commentary: intermediate explanations to the user and preambles
final: the final answer
Tool calls (tool call) with a to=... attribute

11.1 Basic Syntax of final

<|start|>assistant<|channel|>final<|message|>2 + 2 = 4.<|return|>

11.2 Basic Syntax of analysis

<|start|>assistant<|channel|>analysis<|message|>The user is asking simple arithmetic. Answer directly.<|end|>

11.3 Basic Syntax of commentary

<|start|>assistant<|channel|>commentary<|message|>First, I'll confirm the official information.<|end|>

11.4 Basic Syntax of a function tool call

A standard tool call example conforming to the Cookbook is shown below.

<|start|>assistant<|channel|>analysis<|message|>Need to use function get_current_weather.<|end|>
<|start|>assistant<|channel|>commentary to=functions.get_current_weather <|constrain|>json<|message|>{"location":"San Francisco"}<|call|>

11.5 Implementation Difference: Plain-text json Output in the Hugging Face Template

In the chat_template.jinja provided by Hugging Face, the content_type of a tool call is by default placed in the header as plain-text json. As a result, the actual rendered output may look like the following.

<|start|>assistant to=functions.get_current_weather<|channel|>commentary json<|message|>{"location":"San Francisco"}<|call|>

On the other hand, the official Cookbook shows examples using <|constrain|>json. In the openai/harmony implementation, the content_type accepts both of the following forms.

The plain-text "json"
The special-token form "<|constrain|>json"

Therefore, although there is a surface-level difference between the Cookbook's description and the Hugging Face template's output format, the system is designed to accept both representations.

12. Specification of the tool Message

The tool message is a message for returning the result of an executed tool to the model.

12.1 Treatment of the role Attribute During Serialization

Note that during serialization, the literal string tool is not placed in the role position; instead, the specific "tool name" is placed there. An example of serialization conforming to the official Cookbook specification is shown below.

<|start|>functions.get_current_weather to=assistant<|channel|>commentary<|message|>{"sunny":true,"temperature":20}<|end|>

As shown above, the leading element of the header is not tool but functions.get_current_weather.
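
A minimal sketch of this serialization follows. The function name render_tool_result and its parameters are hypothetical; the compact JSON encoding is chosen only so the output matches the example above.

```python
import json


def render_tool_result(tool_name, result, channel="commentary"):
    """Serialize a tool result with the tool name in the author position."""
    # Strings are passed through; other values are encoded as compact JSON.
    if isinstance(result, str):
        body = result
    else:
        body = json.dumps(result, separators=(",", ":"))
    return (
        "<|start|>" + tool_name + " to=assistant"
        + "<|channel|>" + channel
        + "<|message|>" + body + "<|end|>"
    )


weather = render_tool_result(
    "functions.get_current_weather", {"sunny": True, "temperature": 20}
)
```

The literal word tool never appears in the rendered string; the header starts with the namespaced function name.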

12.2 Message Example of the built-in python Tool

<|start|>python to=assistant<|channel|>analysis<|message|>55<|end|>

12.3 Message Example of the built-in browser Tool

<|start|>browser.search to=assistant<|channel|>analysis<|message|>[12] Bank of Japan - Policy Interest Rate<|end|>

13. The Lifecycle of reasoning / CoT

13.1 Basic Principles

The basic principles of the reasoning process are as follows.

gpt-oss functions as a reasoning model.
The reasoning effort is specified as one of low, medium, or high.
The raw reasoning process (CoT: Chain of Thought) is emitted on the analysis channel.
The content of the analysis channel is, by specification, hidden from the end user.
The final answer to the user is emitted via the final channel.

13.2 A Standard Reasoning Example

Input

<|start|>user<|message|>What is 2 + 2?<|end|>
<|start|>assistant

Output

<|channel|>analysis<|message|>User asks: "What is 2 + 2?" Simple arithmetic. Provide answer.<|end|>
<|start|>assistant<|channel|>final<|message|>2 + 2 = 4.<|return|>

13.3 Choosing Between <|return|> and <|end|>

The treatment of the terminating token must be chosen according to context, as follows.

The final answer at the point generation stops: <|return|>
When storing as conversation history: <|end|>

When storing an assistant final message into the conversation history for the next turn, rather than keeping <|return|> as is, a normalization to <|end|> is required, as shown below.

Before normalization (immediately after generation):

<|start|>assistant<|channel|>final<|message|>2 + 2 = 4.<|return|>

After normalization (when storing to history):

<|start|>assistant<|channel|>final<|message|>2 + 2 = 4.<|end|>

13.4 On the Necessity of Normalization

<|return|> functions as a decode-time stop token that instructs "the generation of this response is complete." Therefore, per the official Cookbook specification, the proper treatment is to unify the termination of completed messages in the conversation history to <|end|>.
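
On raw strings, the normalization amounts to swapping the trailing token. The sketch below is a hypothetical helper for hand-rolled history handling.

```python
RETURN = "<|return|>"
END = "<|end|>"


def normalize_for_history(message):
    """Replace a trailing <|return|> with <|end|> before storing the message."""
    if message.endswith(RETURN):
        return message[: -len(RETURN)] + END
    return message


stored = normalize_for_history(
    "<|start|>assistant<|channel|>final<|message|>2 + 2 = 4.<|return|>"
)
```

Messages that already end in <|end|> or <|call|> pass through unchanged.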

13.5 Retaining CoT in the Middle of a Tool Call

Normally, the CoT of past turns can (and generally should) be excluded from the history. A tool call in progress is the exception. In that case the assistant proceeds in the following flow.

Output of analysis
Output of the tool call
Receipt of the tool execution result
Continuation of reasoning, or output of final

In this sequence of processes, the analysis immediately preceding the tool call must be retained in the history as context for continuing reasoning generation.

Example:

<|start|>assistant<|channel|>analysis<|message|>Need to use function get_current_weather.<|end|>
<|start|>assistant<|channel|>commentary to=functions.get_current_weather <|constrain|>json<|message|>{"location":"San Francisco"}<|call|>
<|start|>functions.get_current_weather to=assistant<|channel|>commentary<|message|>{"sunny": true, "temperature": 20}<|end|>
<|start|>assistant

13.6 Auto-drop Behavior in openai-harmony and the Hugging Face Template

The history-handling policy of each public implementation is as follows.

openai-harmony implements RenderConversationConfig(auto_drop_analysis=True).
The Hugging Face chat_template.jinja contains logic that discards preceding tool-CoT and analysis when a subsequent final exists.

In practical system implementations, a design that does not excessively accumulate unnecessary analysis messages in the conversation history is standard.
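
The drop policy can be approximated as follows. This sketch works on a simplified list of dicts with role and channel keys, which is an assumption made for illustration; it is not the data model of openai-harmony or of the Hugging Face template, and the real implementations may differ in detail.

```python
def drop_stale_analysis(messages):
    """Remove assistant analysis messages that precede the last final answer."""
    last_final = max(
        (
            i
            for i, m in enumerate(messages)
            if m.get("role") == "assistant" and m.get("channel") == "final"
        ),
        default=-1,
    )
    kept = []
    for i, m in enumerate(messages):
        is_analysis = m.get("role") == "assistant" and m.get("channel") == "analysis"
        # Analysis after the last final is kept: a tool call may still be in progress.
        if is_analysis and i < last_final:
            continue
        kept.append(m)
    return kept


history = [
    {"role": "user", "content": "What is 2 + 2?"},
    {"role": "assistant", "channel": "analysis", "content": "Simple arithmetic."},
    {"role": "assistant", "channel": "final", "content": "2 + 2 = 4."},
    {"role": "user", "content": "Weather in SF?"},
    {"role": "assistant", "channel": "analysis", "content": "Need to use function."},
    {"role": "assistant", "channel": "commentary", "content": '{"location":"San Francisco"}'},
]
trimmed = drop_stale_analysis(history)
```

In this example the first analysis message is removed, while the analysis that precedes the pending tool call is retained, consistent with 13.5.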

13.7 Constraints When Applying the Hugging Face Template

chat_template.jinja operates on the following premises.

It assumes at most one tool call per assistant message.
The tool name in a tool-role message is inferred from the name of the immediately preceding assistant tool call.
An assistant tool call message that contains both content and thinking at the same time is rejected: the template raises an exception.

Because of this specification, when constructing the message array manually, you must conform to the structure the template allows.
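
A pre-flight check along these lines can catch violations before the template is applied. The sketch assumes a Transformers-style message array in which assistant tool calls live under a tool_calls key and reasoning under a thinking key; validate_messages is a hypothetical helper and only covers the three premises listed above.

```python
def validate_messages(messages):
    """Return a list of problems that would conflict with the template's premises."""
    errors = []
    pending_tool_call = False
    for i, message in enumerate(messages):
        role = message.get("role")
        if role == "assistant":
            calls = message.get("tool_calls") or []
            if len(calls) > 1:
                errors.append(f"message {i}: more than one tool call")
            if calls and message.get("content") and message.get("thinking"):
                errors.append(f"message {i}: content and thinking both set on a tool call")
            pending_tool_call = bool(calls)
        elif role == "tool":
            # The tool name is inferred from the preceding assistant tool call.
            if not pending_tool_call:
                errors.append(f"message {i}: tool result without a preceding tool call")
            pending_tool_call = False
        else:
            pending_tool_call = False
    return errors


problems = validate_messages(
    [
        {"role": "user", "content": "Weather?"},
        {"role": "tool", "content": '{"sunny": true}'},
    ]
)
```

Here problems contains one entry, because the tool message has no assistant tool call in front of it.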

14. Specification of Function Tool Definitions

14.1 Basic Rules

The Function Tool definition syntax recommended in the official Cookbook is as follows.

Declare namespaces in the form namespace functions { ... }.
Define a function with no arguments as type foo = () => any;.
Define a function with arguments as type foo = (_: { ... }) => any;.
Write a description (comment) above each field using //.
Always specify the return type as any.

14.2 Complete Example

# Tools

## functions

namespace functions {

// Returns the user's location information
type get_location = () => any;

// Returns the current weather for the specified city
type get_current_weather = (_: {
// City name. Example: "Tokyo"
location: string,
// Temperature unit
format?: "celsius" | "fahrenheit", // default: celsius
}) => any;

// Retrieves the weather for multiple cities at once
type get_multiple_weathers = (_: {
// List of cities
locations: string[],
format?: "celsius" | "fahrenheit", // default: celsius
}) => any;

} // namespace functions

14.3 Conversion from JSON Schema to a TypeScript-like Signature

In Hugging Face's chat_template.jinja and the openai/harmony implementation, tool definitions are internally rendered from the original JSON Schema (or an OpenAI-type tool spec) into a TypeScript-like string for Harmony.

The representative conversion rules at that time are as follows.

Array types are converted to string[], number[], boolean[], etc.
Enum types are converted to union types like "a" | "b" | "c".
Properties that are not required have ? appended.
Default values are noted as comments in the form // default: ....
Complex union types and object types may fall back to the any type.
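
The rules above can be illustrated with a small converter. This is a simplified sketch, not the code in chat_template.jinja or openai/harmony: ts_type and render_function are hypothetical names, integer is mapped to number as an assumption, and anything beyond primitives, arrays, and enums falls back to any.

```python
import json


def ts_type(schema):
    """Map a JSON Schema fragment to a TypeScript-like type string."""
    if "enum" in schema:
        return " | ".join(json.dumps(value) for value in schema["enum"])
    kind = schema.get("type")
    if kind == "array":
        item = ts_type(schema.get("items", {}))
        # Parenthesize unions so that [] applies to the whole item type.
        return ("(" + item + ")" if " | " in item else item) + "[]"
    if kind in ("string", "number", "boolean"):
        return kind
    if kind == "integer":
        return "number"
    return "any"


def render_function(name, description, parameters=None):
    """Render one function tool as a TypeScript-like signature."""
    lines = ["// " + description]
    properties = (parameters or {}).get("properties", {})
    if not properties:
        lines.append("type " + name + " = () => any;")
        return "\n".join(lines)
    required = set(parameters.get("required", []))
    lines.append("type " + name + " = (_: {")
    for key, schema in properties.items():
        if "description" in schema:
            lines.append("// " + schema["description"])
        field = key + ("" if key in required else "?") + ": " + ts_type(schema) + ","
        if "default" in schema:
            default = schema["default"]
            shown = default if isinstance(default, str) else json.dumps(default)
            field += " // default: " + shown
        lines.append(field)
    lines.append("}) => any;")
    return "\n".join(lines)


signature = render_function(
    "get_current_weather",
    "Returns the current weather for the specified city",
    {
        "type": "object",
        "properties": {
            "location": {"type": "string", "description": 'City name. Example: "Tokyo"'},
            "format": {
                "type": "string",
                "enum": ["celsius", "fahrenheit"],
                "description": "Temperature unit",
                "default": "celsius",
            },
        },
        "required": ["location"],
    },
)
```

For this input, signature reproduces the get_current_weather definition shown in 14.2, including the optional marker on format and the default comment.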

15. Specification of Built-in Tools

In the public materials for Harmony (gpt-oss), the following two built-in tools (Built-in Tools) are primarily defined.

browser
python

15.1 browser

15.1.1 Role

Web search
Expanding (opening) web pages
In-page search
Citing information with line numbers

15.1.2 Definition in the Template

## browser

// Tool for browsing.
// The `cursor` appears in brackets before each browsing display: `[{cursor}]`.
// Cite information from the tool using the following format:
// `〖{cursor}†L{line_start}(-L{line_end})?〗`
// Do not quote more than 10 words directly from the tool output.
// sources=web (default: web)
namespace browser {

type search = (_: {
query: string,
topn?: number, // default: 10
source?: string,
}) => any;

type open = (_: {
id?: number | string, // default: -1
cursor?: number, // default: -1
loc?: number, // default: -1
num_lines?: number, // default: -1
view_source?: boolean, // default: false
source?: string,
}) => any;

type find = (_: {
pattern: string,
cursor?: number, // default: -1
}) => any;

} // namespace browser

15.1.3 Call Channel

Per the official specification, the browser tool is normally called within the analysis channel.

15.1.4 Concrete Example

<|start|>assistant<|channel|>analysis<|message|>Need to verify the latest policy rate from an official source.<|end|>
<|start|>assistant to=browser.search<|channel|>analysis <|constrain|>json<|message|>{"query":"site:boj.or.jp policy rate","topn":5,"source":"web"}<|call|>
<|start|>browser.search to=assistant<|channel|>analysis<|message|>[12] Bank of Japan - Monetary Policy<|end|>
<|start|>assistant to=browser.open<|channel|>analysis <|constrain|>json<|message|>{"cursor":12,"id":0,"loc":120,"num_lines":20,"source":"web"}<|call|>
<|start|>browser.open to=assistant<|channel|>analysis<|message|>[13]
L120: ...
L121: ...
L122: ...<|end|>
<|start|>assistant<|channel|>final<|message|>According to the official material, the policy rate is stated as 0.5%. 〖13†L120-L122〗<|return|>

15.2 python

15.2.1 Role

Computation within the internal reasoning process
Data formatting
Execution of small pieces of code
Preparation of final deliverables (tables, figures, files, etc.)

15.2.2 Provisions in the Template

The specification of this tool in the template is defined as follows.

Python code can be executed as part of the Chain-of-Thought (CoT) process.
The executed code is, by premise, hidden from the end user.
A stateful, Jupyter-notebook-like environment is provided.
A timeout constraint exists on execution.
Access to the /mnt/data directory is permitted.
Whether internet access is available depends on the environment settings of the hosting cluster and is not uniform.

15.2.3 Call Channel

Per the official specification, the python tool is also normally called within the analysis channel.

15.2.4 Concrete Example

<|start|>assistant<|channel|>analysis<|message|>Need exact computation for the sum of squares from 1 to 5.<|end|>
<|start|>assistant to=python<|channel|>analysis<|message|>sum(i*i for i in range(1, 6))<|call|>
<|start|>python to=assistant<|channel|>analysis<|message|>55<|end|>
<|start|>assistant<|channel|>final<|message|>The sum from 1^2 to 5^2 is 55.<|return|>

16. Specification of Structured Output

In the Harmony format, by declaring a # Response Formats section at the end of the developer message, the target JSON Schema can be embedded.

16.1 Syntax

# Response Formats

## {format_name}

// Optional description
{json_schema}
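
The syntax above can be generated from a schema object. The helper below is a hypothetical sketch; the compact JSON encoding is an assumption chosen to match the single-line schema style used in this document's examples.

```python
import json


def render_response_format(name, schema, description=None):
    """Render a '# Response Formats' section for a developer message."""
    lines = ["# Response Formats", "", "## " + name, ""]
    if description:
        lines.append("// " + description)
    lines.append(json.dumps(schema, separators=(",", ":"), ensure_ascii=False))
    return "\n".join(lines)


section = render_response_format(
    "shopping_list",
    {
        "type": "object",
        "properties": {
            "items": {
                "type": "array",
                "items": {"type": "string"},
                "description": "shopping items",
            }
        },
        "required": ["items"],
    },
)
```

The resulting section is meant to be appended after the # Instructions (and any # Tools) part of the developer message body.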

16.2 Concrete Example

<|start|>developer<|message|># Instructions

You are a shopping assistant.

# Response Formats

## shopping_list

{"type":"object","properties":{"items":{"type":"array","items":{"type":"string"},"description":"entries on the shopping list"}},"required":["items"]}<|end|>
<|start|>user<|message|>I want to buy coffee, eggs, and milk.<|end|>
<|start|>assistant

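Expected output example (shown as a complete message; as described in Section 7, the generated text itself begins at <|channel|> because <|start|>assistant is already at the end of the prompt):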

<|start|>assistant<|channel|>final<|message|>{"items":["coffee","eggs","milk"]}<|return|>

17. List of Public Special Tokens

This chapter lists all the added special tokens included in the tokenizer_config.json of openai/gpt-oss-20b published on Hugging Face.

17.1 Special Tokens Whose Specification Is Public

A list of special tokens whose operational meaning is explicitly stated and which are actually used is shown below.

Token ID	Token	Purpose / Specification	Concrete Example
199998	<|startoftext|>	The BOS token. It is prepended by the tokenizer, so it is normally not entered manually.	-
199999	<|endoftext|>	The Pad token. It is used for padding in batch processing, so it is normally not entered manually.	-
200002	<|return|>	The decode-stop token when generating a final answer.	<|start|>assistant<|channel|>final<|message|>This is the answer.<|return|>
200003	<|constrain|>	Explicit indication of the content type and format constraint.	... <|constrain|>json<|message|>{"x":1}<|call|>
200005	<|channel|>	The delimiter for the channel.	assistant<|channel|>analysis
200006	<|start|>	Explicit indication of the start position of a message.	<|start|>user<|message|>Hello<|end|>
200007	<|end|>	Explicit indication of the end position of a completed message.	<|start|>user<|message|>Hello<|end|>
200008	<|message|>	The delimiter between the header and the message body.	user<|message|>Hello
200012	<|call|>	The decode-stop token when generating a tool call.	... {"location":"Tokyo"}<|call|>
200018	<|endofprompt|>	A special token in the tokenizer. No ordinary use is defined in the published Harmony grammar specification.	-

17.2 Reserved Special Tokens

The following tokens are reserved; the public specification does not define their meaning. Manually inserting them into a prompt is not recommended.

<|reserved_200000|>
<|reserved_200001|>
<|reserved_200004|>
<|reserved_200009|>
<|reserved_200010|>
<|reserved_200011|>
<|reserved_200013|>
<|reserved_200014|>
<|reserved_200015|>
<|reserved_200016|>
<|reserved_200017|>
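
The reserved list can be cross-checked against the table in 17.1. The sketch below assumes that the added special tokens occupy the contiguous ID range 199998 to 200018 (an assumption drawn from the two lists in this chapter, not read from the tokenizer itself) and derives the reserved names as the IDs that have no named token.

```python
# Named special tokens and their IDs, copied from the table in 17.1.
NAMED_TOKENS = {
    "<|startoftext|>": 199998,
    "<|endoftext|>": 199999,
    "<|return|>": 200002,
    "<|constrain|>": 200003,
    "<|channel|>": 200005,
    "<|start|>": 200006,
    "<|end|>": 200007,
    "<|message|>": 200008,
    "<|call|>": 200012,
    "<|endofprompt|>": 200018,
}

# Assumption: the added special tokens fill this contiguous ID range.
FIRST_ID, LAST_ID = 199998, 200018

named_ids = set(NAMED_TOKENS.values())
reserved = [
    f"<|reserved_{i}|>" for i in range(FIRST_ID, LAST_ID + 1) if i not in named_ids
]

print(len(NAMED_TOKENS), "named,", len(reserved), "reserved")
print(reserved)
```

Under that assumption the result is 10 named tokens and 11 reserved tokens, 21 in total, and the derived names match the list above.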

17.3 Supplementary Notes on the Tokenizer Configuration

The basic token assignments in tokenizer_config.json are as follows.

bos_token = "<|startoftext|>"
eos_token = "<|return|>"
pad_token = "<|endoftext|>"

Note that gpt-oss uses <|return|>, not <|endoftext|>, as its end-of-sequence (EOS) token; <|endoftext|> serves only as the pad token.

17.4 Notes on Handling Reserved Tokens

Because their meaning is not guaranteed in the public specification, reserved tokens should not be composed manually within a prompt.

Example of a discouraged input:

<|start|>user<|message|>Hello<|reserved_200015|><|end|>

18. Additional Format Tokens Dependent on the openai/harmony Implementation

The FormattingToken enum of the openai/harmony library defines the following tokens, which are not included in the special-token list of the previous chapter. In addition, the enum's MetaSep entry is mapped to the existing <|channel|> token rather than to a new token.

<|refusal|>
<|untrusted|>
<|end_untrusted|>
<|meta_end|>

18.1 Handling of the Additional Format Tokens

These tokens exist in the library's internal implementation, but they are not included in the tokenizer configuration (added special tokens) of the published gpt-oss-20b. Therefore, there is no guarantee that they function correctly as ordinary prompt input to the published model.

18.2 Example of a Discouraged Input

Manually composing a string containing the following token and using it as input to the published model is discouraged.

<|start|>assistant<|channel|>final<|refusal|><|message|>...
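
A hand-built prompt can be screened for the tokens that chapters 17 and 18 advise against. This is a minimal sketch under stated assumptions: the function name find_discouraged_tokens is hypothetical, and the token lists are copied from this document rather than read from the tokenizer or the library.

```python
import re

# Reserved tokens (17.2) and the library-only tokens listed in chapter 18.
RESERVED_RE = re.compile(r"<\|reserved_\d+\|>")
NON_PUBLIC = ("<|refusal|>", "<|untrusted|>", "<|end_untrusted|>", "<|meta_end|>")

def find_discouraged_tokens(prompt):
    """Return reserved tokens found in the prompt, then any library-only tokens."""
    found = RESERVED_RE.findall(prompt)
    found += [tok for tok in NON_PUBLIC if tok in prompt]
    return found

print(find_discouraged_tokens("<|start|>user<|message|>Hello<|reserved_200015|><|end|>"))
print(find_discouraged_tokens("<|start|>assistant<|channel|>final<|refusal|><|message|>..."))
print(find_discouraged_tokens("<|start|>user<|message|>Hello<|end|>"))
```

A check like this fits best in tests or debugging tools for manual prompt construction; prompts produced by a standard renderer should not need it.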

19. Concrete Examples of Message Syntax by Role

Concrete examples of the definition and message syntax for each role are shown below.

Role	Definition / Purpose	Concrete Example
system	Definition of meta-information	<|start|>system<|message|>You are ChatGPT...<|end|>
developer	Instructions from the developer	<|start|>developer<|message|># Instructions\n\nAlways answer in Japanese<|end|>
user	Input from the end user	<|start|>user<|message|>What's the weather in Tokyo?<|end|>
assistant	Output from the model	<|start|>assistant<|channel|>final<|message|>It's sunny.<|return|>
tool	Returning the tool execution result	<|start|>functions.get_current_weather to=assistant<|channel|>commentary<|message|>{"sunny":true}<|end|>

19.1 Serialization Specification of the tool Role

In the serialized wire format, the abstract role name tool is replaced by the specific name of the tool that was actually called.

Serialization example:

<|start|>functions.search_docs to=assistant<|channel|>commentary<|message|>{"results":[...]}<|end|>
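
The serialization rule can be sketched as a small helper. The name render_tool_result is hypothetical, and the sketch assumes that function tools reply on the commentary channel with a compact JSON body, as in the examples in this document.

```python
import json

def render_tool_result(tool_name, result, channel="commentary"):
    """Serialize a tool result; the tool's own name takes the place of the role."""
    body = json.dumps(result, separators=(",", ":"))
    return f"<|start|>{tool_name} to=assistant<|channel|>{channel}<|message|>{body}<|end|>"

msg = render_tool_result(
    "functions.get_current_weather",
    {"sunny": True, "temperature": 20, "unit": "celsius"},
)
print(msg)
```

The channel parameter is left configurable because the built-in tools in this document reply on the analysis channel instead.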

19.2 Points to Note

This specification defines only the notation rules used in the prompt; the notation alone does not enforce strict output constraints. When strict structured output is required, combine it with the grammar constraints, constrained decoding, or structured-output mechanisms provided by the inference server.

20. Message Composition Examples for Each Channel

The composition and concrete examples of messages for each channel are shown below.

Channel	Definition / Purpose	Concrete Example
analysis	Internal reasoning (hidden from the user)	<|start|>assistant<|channel|>analysis<|message|>Need to compute exactly.<|end|>
commentary	Intermediate explanation of the process, or a function call (can be shown to the user)	<|start|>assistant<|channel|>commentary<|message|>I'll search first.<|end|>
final	The final answer	<|start|>assistant<|channel|>final<|message|>The conclusion is 55.<|return|>

20.1 Notes on the commentary Channel

The commentary channel does not carry the final answer, but it can be used to show progress to the user. A typical use is emitting a plan of action before executing several tools in succession.

Output example:

<|start|>assistant<|channel|>commentary<|message|>Work plan:
1. Search the official site
2. Open the PDF
3. Summarize the key points
I'll proceed in order.<|end|>

21. Concrete Examples of Conversation Sequences

Below are the message send/receive sequences for various use cases.

21.1 Basic Response and Reasoning Process

Input message:

<|start|>system<|message|>You are ChatGPT, a large language model trained by OpenAI.
Knowledge cutoff: 2024-06
Current date: 2026-04-04

Reasoning: high

# Valid channels: analysis, commentary, final. Channel must be included for every message.<|end|>
<|start|>user<|message|>What is 2 + 2?<|end|>
<|start|>assistant

Model output:

<|channel|>analysis<|message|>The user is asking a simple addition. Just answer directly.<|end|>
<|start|>assistant<|channel|>final<|message|>2 + 2 = 4.<|return|>

Normalization of the history data for the next turn:

<|start|>assistant<|channel|>final<|message|>2 + 2 = 4.<|end|>
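
The normalization step can be illustrated with a regex-based sketch. It is a simplified stand-in for a real parser: it assumes messages without a recipient (no tool calls), and the function names parse_completion and to_history are hypothetical.

```python
import re

# One assistant message: an optional "<|start|>assistant" prefix, a channel,
# a body, and one of the three terminators.
MESSAGE_RE = re.compile(
    r"(?:<\|start\|>assistant)?<\|channel\|>(?P<channel>\w+)"
    r"<\|message\|>(?P<body>.*?)(?P<stop><\|end\|>|<\|return\|>|<\|call\|>)",
    re.DOTALL,
)

def parse_completion(text):
    """Split raw model output into (channel, body, terminator) tuples."""
    return [(m["channel"], m["body"], m["stop"]) for m in MESSAGE_RE.finditer(text)]

def to_history(text):
    """Keep only final messages and close them with <|end|> instead of <|return|>."""
    return "".join(
        f"<|start|>assistant<|channel|>final<|message|>{body}<|end|>"
        for channel, body, _ in parse_completion(text)
        if channel == "final"
    )

completion = (
    "<|channel|>analysis<|message|>The user is asking a simple addition. "
    "Just answer directly.<|end|>"
    "<|start|>assistant<|channel|>final<|message|>2 + 2 = 4.<|return|>"
)
print(parse_completion(completion))
print(to_history(completion))
```

Dropping the analysis message here mirrors the history example above, which keeps only the final message. Production code should rely on the library's parser rather than on a regex.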

21.2 User-defined Function (Function Call) and Reasoning Process

Tool definition (placed in the developer role):

<|start|>developer<|message|># Instructions

Please answer weather questions concisely.

# Tools

## functions

namespace functions {

// Returns the current weather for the specified city
type get_current_weather = (_: {
// City name
location: string,
}) => any;

} // namespace functions<|end|>

User input:

<|start|>user<|message|>What's the weather in San Francisco?<|end|>
<|start|>assistant

First model output (tool call):

<|channel|>analysis<|message|>Need to use function get_current_weather.<|end|>
<|start|>assistant<|channel|>commentary to=functions.get_current_weather <|constrain|>json<|message|>{"location":"San Francisco"}<|call|>

Returning the tool execution result:

<|start|>functions.get_current_weather to=assistant<|channel|>commentary<|message|>{"sunny":true,"temperature":20,"unit":"celsius"}<|end|>
<|start|>assistant

Second model output (final answer):

<|channel|>analysis<|message|>Tool says sunny and 20C. Provide concise final answer.<|end|>
<|start|>assistant<|channel|>final<|message|>San Francisco is sunny, 20°C.<|return|>

Specification notes:
- The output of the analysis channel is hidden from the user.
- The tool call stops at the <|call|> token.
- After returning the execution result, <|start|>assistant is appended at the end to resume generation.
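
The round trip in this section can be sketched as follows. This is an illustration under stated assumptions, not a reference implementation: the regex only covers function calls shaped like the one above, handle_tool_call and the TOOLS registry are hypothetical names, and get_current_weather is a stub that returns fixed data.

```python
import json
import re

# Matches a function call: the recipient name, then the JSON arguments
# between <|message|> and <|call|>.
CALL_RE = re.compile(
    r"to=functions\.(?P<name>\w+).*?<\|message\|>(?P<args>.*?)<\|call\|>", re.DOTALL
)

def get_current_weather(location):
    # Hypothetical stand-in for a real weather lookup.
    return {"sunny": True, "temperature": 20, "unit": "celsius"}

TOOLS = {"get_current_weather": get_current_weather}

def handle_tool_call(completion):
    """Run the requested function and return the text to append to the prompt."""
    m = CALL_RE.search(completion)
    if m is None:
        return None  # no tool call found in the completion
    result = TOOLS[m["name"]](**json.loads(m["args"]))
    body = json.dumps(result, separators=(",", ":"))
    return (
        f"<|start|>functions.{m['name']} to=assistant<|channel|>commentary"
        f"<|message|>{body}<|end|><|start|>assistant"
    )

completion = (
    "<|channel|>analysis<|message|>Need to use function get_current_weather.<|end|>"
    "<|start|>assistant<|channel|>commentary to=functions.get_current_weather "
    '<|constrain|>json<|message|>{"location":"San Francisco"}<|call|>'
)
print(handle_tool_call(completion))
```

The returned text ends with <|start|>assistant so that appending it to the conversation resumes generation, as described in the notes above.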

21.3 Using a Built-in Tool (browser)

Requirement: retrieve the Bank of Japan's latest policy rate from the official site and answer.

Example execution sequence:

<|start|>assistant<|channel|>analysis<|message|>Need an official and recent source from the Bank of Japan website.<|end|>
<|start|>assistant to=browser.search<|channel|>analysis <|constrain|>json<|message|>{"query":"site:boj.or.jp policy interest rate","topn":5,"source":"web"}<|call|>
<|start|>browser.search to=assistant<|channel|>analysis<|message|>[12] Bank of Japan - Monetary Policy Meetings<|end|>
<|start|>assistant to=browser.open<|channel|>analysis <|constrain|>json<|message|>{"cursor":12,"id":0,"loc":120,"num_lines":10,"source":"web"}<|call|>
<|start|>browser.open to=assistant<|channel|>analysis<|message|>[13]
L120: ...
L121: The uncollateralized overnight call rate ...
L122: ...
<|end|>
<|start|>assistant<|channel|>final<|message|>According to the Bank of Japan's official material, the target for the uncollateralized overnight call rate is stated as 0.5%. 【13†L121-L122】<|return|>

Specification notes:
- The browser tool is executed on the analysis channel as a rule.
- For citations in the final answer, use the line-number format (【cursor†Lx-Ly】).
- Direct quotations from tool output are limited to 10 words or fewer by an instruction in the browser tool description.
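
Citations in this line-number format can be extracted with a short regex. The sketch below is an assumption-laden illustration: extract_citations is a hypothetical name, the end line is treated as optional, and both the 【】 and 〖〗 bracket pairs are accepted so the sketch works whichever pair appears in the text.

```python
import re

# cursor, start line, and an optional end line inside lenticular brackets.
CITATION_RE = re.compile(r"[【〖](\d+)†L(\d+)(?:-L(\d+))?[】〗]")

def extract_citations(text):
    """Return (cursor, first_line, last_line) for each citation in the text."""
    return [
        (int(cursor), int(start), int(end or start))
        for cursor, start, end in CITATION_RE.findall(text)
    ]

answer = "The target rate is stated in the official material. 【13†L121-L122】"
print(extract_citations(answer))
```

A UI could use the cursor and line range to link each citation back to the page opened by browser.open.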

21.4 Using a Built-in Tool (python)

Requirement: exactly compute the sum from 1 squared to 5 squared.

Example execution sequence:

<|start|>assistant<|channel|>analysis<|message|>Need exact calculation; using python is simplest.<|end|>
<|start|>assistant to=python<|channel|>analysis<|message|>sum(i*i for i in range(1, 6))<|call|>
<|start|>python to=assistant<|channel|>analysis<|message|>55<|end|>
<|start|>assistant<|channel|>final<|message|>The sum from 1^2 to 5^2 is 55.<|return|>

Specification notes:
- The Python code being executed appears in the serialized format, but because it is sent on the analysis channel, it is by design not displayed in the user-facing UI.

21.5 Advance Notice Before Executing Multiple Tools (Commentary Preamble)

Example execution sequence:

<|start|>assistant<|channel|>analysis<|message|>Need a multi-step plan involving file generation and server startup.<|end|>
<|start|>assistant<|channel|>commentary<|message|>Work plan:
1. Generate the HTML file
2. Generate the Node.js server code
3. Confirm startup
---
I'll execute them in order.<|end|>
<|start|>assistant to=functions.generate_file<|channel|>commentary <|constrain|>json<|message|>{"template":"basic_html","path":"index.html"}<|call|>

Specification notes:
- commentary functions as an intermediate report to the user about the work in progress, in this case the upcoming tool calls.
- final is positioned as the final deliverable after all processing is complete.
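
The channel roles translate into a simple display rule. The sketch below assumes a UI policy that shows commentary preambles and final answers while hiding analysis and tool-call payloads; the AssistantMessage class and user_visible function are hypothetical and not part of any library.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class AssistantMessage:
    channel: str
    content: str
    recipient: Optional[str] = None  # set for tool calls, e.g. "functions.generate_file"

def user_visible(messages):
    """Return the text a UI would show: commentary preambles and final answers."""
    return [
        m.content
        for m in messages
        if m.recipient is None and m.channel in ("commentary", "final")
    ]

turn = [
    AssistantMessage("analysis", "Need a multi-step plan."),
    AssistantMessage("commentary", "Work plan: ... I'll execute them in order."),
    AssistantMessage(
        "commentary",
        '{"template":"basic_html","path":"index.html"}',
        recipient="functions.generate_file",
    ),
]
print(user_visible(turn))
```

Checking the recipient as well as the channel matters because a function call and a preamble both travel on the commentary channel.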

21.6 Standalone Return of Structured Output

developer configuration example:

<|start|>developer<|message|># Instructions

Please return only the shopping list.

# Response Formats

## shopping_list

{"type":"object","properties":{"items":{"type":"array","items":{"type":"string"}}},"required":["items"]}<|end|>

User input:

<|start|>user<|message|>I need coffee, sparkling water, and eggs.<|end|>
<|start|>assistant

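Output example (shown as a complete message; the leading <|start|>assistant is the prefix already supplied at the end of the prompt):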

<|start|>assistant<|channel|>final<|message|>{"items":["coffee","sparkling water","eggs"]}<|return|>
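
Because the notation alone does not enforce the schema, the client may want to validate the body of the final message. The following is a hand-rolled sketch for this one schema only; check_shopping_list is a hypothetical name, and a general-purpose JSON Schema validator would be the usual choice in practice.

```python
import json

def check_shopping_list(final_body):
    """Parse the final-channel body and check it against the schema's shape."""
    data = json.loads(final_body)  # raises ValueError on malformed JSON
    if not isinstance(data, dict) or "items" not in data:
        raise ValueError("missing required key: items")
    items = data["items"]
    if not isinstance(items, list) or not all(isinstance(i, str) for i in items):
        raise ValueError("items must be an array of strings")
    return items

print(check_shopping_list('{"items":["coffee","sparkling water","eggs"]}'))
```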

22. Implementation Guidelines for Each Environment

22.1 Using the openai-harmony Library

With the Python API, the processing follows this basic pattern.

from openai_harmony import (
    HarmonyEncodingName,
    load_harmony_encoding,
    Conversation,
    Message,
    Role,
    SystemContent,
    DeveloperContent,
)

encoding = load_harmony_encoding(HarmonyEncodingName.HARMONY_GPT_OSS)

convo = Conversation.from_messages([
    Message.from_role_and_content(Role.SYSTEM, SystemContent.new()),
    Message.from_role_and_content(
        Role.DEVELOPER,
        DeveloperContent.new().with_instructions("Always answer in Japanese")
    ),
    Message.from_role_and_content(Role.USER, "What is 2 + 2?"),
])

prefill_ids = encoding.render_conversation_for_completion(convo, Role.ASSISTANT)
stop_ids = encoding.stop_tokens_for_assistant_actions()

Key APIs:
- render_conversation_for_completion(...)
- render_conversation_for_training(...)
- render_conversation(...)
- parse_messages_from_completion_tokens(...)
- stop_tokens()
- stop_tokens_for_assistant_actions()
- StreamableParser(...)

RenderConversationConfig settings:
- Setting auto_drop_analysis=True automatically excludes analysis messages that are no longer needed when the history is rendered.
- This keeps stale reasoning from accumulating in the conversation history.
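
How the stop tokens are used during sampling can be sketched without the library. The sketch assumes that the stop set for assistant actions consists of <|return|> and <|call|>, as the summary in chapter 25 implies, and hard-codes their IDs from the table in 17.1 instead of calling stop_tokens_for_assistant_actions(). The function name truncate_at_stop and the sample IDs 17196 and 17 are arbitrary placeholders.

```python
# Assumption: IDs copied from the table in 17.1 rather than read from the library.
RETURN_ID, CALL_ID = 200002, 200012
STOP_IDS = {RETURN_ID, CALL_ID}

def truncate_at_stop(token_ids, stop_ids=STOP_IDS):
    """Cut generated IDs at the first stop token and report which one was hit."""
    for i, tok in enumerate(token_ids):
        if tok in stop_ids:
            return token_ids[:i], tok
    return token_ids, None  # length limit reached without a stop token

# Placeholder IDs standing in for a short generated message.
generated = [200005, 17196, 200008, 17, 200002, 999]
body, stop = truncate_at_stop(generated)
print(body, stop)
```

The caller can branch on the returned stop token: <|call|> means a tool must be executed before resuming, while <|return|> means the turn is complete.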

22.2 Using Hugging Face Transformers

The official chat template for gpt-oss automatically applies the Harmony format to the input.

Implementation example:

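from transformers import AutoTokenizer, AutoModelForCausalLM

model_name = "openai/gpt-oss-20b"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype="auto", device_map="auto")

# A leading "system" message is rendered as the Harmony developer message.
messages = [
    {"role": "system", "content": "Always answer in Japanese."},
    {"role": "user", "content": "What is MXFP4?"},
]

# Render the Harmony prompt and end it with the generation trigger.
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt",
    return_dict=True,
).to(model.device)

# Generate, then decode only the newly generated tokens.
outputs = model.generate(**inputs, max_new_tokens=200)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))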

Notes on implementation:
- A system role defined at the head is automatically treated as a Harmony developer role.
- The proper Harmony system message is automatically generated by the template.
- Specifying add_generation_prompt=True appends <|start|>assistant, the generation-start trigger, at the end.
- The template provides arguments such as builtin_tools, model_identity, and reasoning_effort, and the default value of reasoning_effort is medium.
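
The role mapping described in these notes can be emulated in a few lines. This is a rough sketch and not the real Jinja template: sketch_chat_template is a hypothetical name, the system message omits the knowledge-cutoff and date lines, and tools and assistant history are not handled.

```python
def sketch_chat_template(messages, reasoning_effort="medium", add_generation_prompt=True):
    """Rough emulation of the mapping described above; not the real Jinja template."""
    # The Harmony system message is generated, not taken from the input.
    parts = [
        "<|start|>system<|message|>"
        "You are ChatGPT, a large language model trained by OpenAI.\n\n"
        f"Reasoning: {reasoning_effort}\n\n"
        "# Valid channels: analysis, commentary, final. "
        "Channel must be included for every message.<|end|>"
    ]
    rest = list(messages)
    # A leading "system" message becomes the Harmony developer message.
    if rest and rest[0]["role"] == "system":
        instructions = rest.pop(0)["content"]
        parts.append(f"<|start|>developer<|message|># Instructions\n\n{instructions}<|end|>")
    for m in rest:
        # Plain rendering, adequate for user turns only.
        parts.append(f"<|start|>{m['role']}<|message|>{m['content']}<|end|>")
    if add_generation_prompt:
        parts.append("<|start|>assistant")
    return "".join(parts)

prompt = sketch_chat_template([
    {"role": "system", "content": "Always answer in Japanese."},
    {"role": "user", "content": "What is MXFP4?"},
])
print(prompt)
```

Printing the result makes it easy to see why instructions passed as a "system" message end up in the developer message rather than in the Harmony system message.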

22.3 Using vLLM

In vLLM's GPT-OSS recipe, the extent of Harmony support differs depending on the endpoint used.

/v1/responses:
- Supports tool execution in the middle of reasoning (Chain-of-Thought).
- Can process the entire flow up to the final response.
- The openai-harmony library is used to render the input and parse the output.
- Stateful operation and full streaming processing are currently work in progress (WIP).

/v1/chat/completions:
- Provides a standard chat completion interface, but does not perform tool execution.
- Can return the reasoning data and the final text in a structured form.
- Specifying include_reasoning: false excludes the reasoning process from the response.

/v1/completions:
- Supports only plain input/output; the advanced features of Harmony must be handled on the client side.

Startup arguments to enable Function Calling:

vllm serve ... --tool-call-parser openai --enable-auto-tool-choice
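
A request to the chat completions endpoint with the reasoning excluded might be built as follows. The sketch uses only the standard library and makes several assumptions: a vLLM server listening on localhost:8000 and serving openai/gpt-oss-20b, and a response shaped like a standard chat completion. The request is constructed but not sent, so the code runs without a server.

```python
import json
import urllib.request

# Assumptions: server address and model name; adjust both for your deployment.
BASE_URL = "http://localhost:8000"

payload = {
    "model": "openai/gpt-oss-20b",
    "messages": [{"role": "user", "content": "What is 2 + 2?"}],
    "include_reasoning": False,  # exclude the reasoning process from the response
}

request = urllib.request.Request(
    f"{BASE_URL}/v1/chat/completions",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="POST",
)

# Sending is left commented out so the sketch runs without a server:
# with urllib.request.urlopen(request) as resp:
#     print(json.load(resp)["choices"][0]["message"]["content"])
print(request.full_url, request.get_method())
```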

22.4 Using Ollama

Ollama also provides a gpt-oss template that includes browser and python. Its system prompt follows an approach similar to Hugging Face's template, so Ollama is one of the implementations that apply the Harmony format behind the scenes.

23. Implementation Requirements for Manual Construction

Write application-specific instructions in the developer role
- Do not put every instruction in the system role.

Do not retain the trailing <|return|> of the final channel in the history
- When keeping the message as history for the next turn, normalize the terminating token to <|end|>.

Do not display the output of the analysis channel to the end user
- By the design of the Harmony format, analysis is treated as explicitly separated internal reasoning.

Stop tool calls at <|call|>
- After that, return the tool execution result to the model as a tool message.

Do not manually insert reserved tokens
- Even if a token is defined in the tokenizer, its meaning is not guaranteed by the public specification.

Use the standard template or renderer rather than a custom implementation
- This avoids parser errors caused by minor syntax differences.

Conform to the behavior of the Hugging Face (HF) template
- A system message is automatically converted to a Harmony developer message.
- It assumes one tool call per assistant message.
- After generating the final response, it discards the preceding CoT (Chain of Thought) history.

Be aware of the difference between <|constrain|>json and plain-text json
- The syntax examples differ between the official Cookbook and the HF template, but the library implementation accepts both forms.
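
A tolerant header parser shows what accepting both forms means in practice. The sketch below is an illustration and not the library's parser: parse_assistant_header is a hypothetical name, and it assumes the header is the text between <|start|>assistant and <|message|>, with the recipient allowed either before or after the channel.

```python
import re

def parse_assistant_header(header):
    """Parse the text between '<|start|>assistant' and '<|message|>'."""
    recipient = re.search(r"to=(\S+?)(?=<\||\s|$)", header)
    channel = re.search(r"<\|channel\|>(\w+)", header)
    # The content type may follow <|constrain|> or appear as a bare trailing word.
    constrained = re.search(r"<\|constrain\|>(\w+)", header)
    bare = re.search(r"\s(\w+)\s*$", header)
    content_type = constrained or bare
    return {
        "channel": channel.group(1) if channel else None,
        "recipient": recipient.group(1) if recipient else None,
        "content_type": content_type.group(1) if content_type else None,
    }

cookbook_style = "<|channel|>commentary to=functions.get_current_weather <|constrain|>json"
template_style = " to=functions.get_current_weather<|channel|>commentary json"
print(parse_assistant_header(cookbook_style))
print(parse_assistant_header(template_style))
```

Both headers yield the same channel, recipient, and content type, which is the behavior a compatible implementation needs when it consumes output written in either style.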

24. Recommended Usage Policy

24.1 Safe Architectural Options

When using Transformers
- Use tokenizer.apply_chat_template(...).

When implementing custom generation processing
- Apply the renderer and parser of the openai-harmony library.

When building an inference server
- Choose middleware that natively supports the Harmony format, such as vLLM, Ollama, or an official compatibility layer.

24.2 Use Cases Where Manual Prompt Construction Is Assumed

Verifying the behavior of parsers and renderers
Academic or technical research purposes
Development of compatible implementations
Debugging work at the tokenizer or protocol level

In general application development, understanding the Harmony format specification is essential, but manual construction of prompts is not recommended.

25. Summary of Key Specifications

Harmony is a format designed for gpt-oss models to handle "conversation," "reasoning," "tool use," and "structured output" in a unified way.
The system role is used to define meta-information, and application-specific instructions are defined in the developer role.
The output of the assistant role is separated into the three channels analysis, commentary, and final.
The terminating token for a tool call is <|call|>, and the terminating token for a final answer is <|return|>.
When retaining as history for the next turn, the terminating token must be replaced (normalized) from <|return|> to <|end|>.
User-defined function tools are normally called from the commentary channel, and the built-in browser and python tools are normally called from the analysis channel.
The HF chat template reinterprets the system message at the head of the input as the Harmony developer role.
The tokenizer defines 21 added special tokens, of which 11 are reserved tokens.
The openai/harmony implementation includes additional format tokens that do not exist in the public tokenizer; these should not be entered manually for gpt-oss.
For implementation, the standard is to use the openai-harmony library or the official chat template.

26. References

OpenAI Cookbook / OpenAI Harmony Response Format
https://developers.openai.com/cookbook/articles/openai-harmony
openai/harmony
https://github.com/openai/harmony
openai/harmony Python API docs
https://github.com/openai/harmony/blob/main/docs/python.md
openai/harmony src/encoding.rs
https://github.com/openai/harmony/blob/main/src/encoding.rs
openai/gpt-oss
https://github.com/openai/gpt-oss
OpenAI Cookbook / How to run gpt-oss with Hugging Face Transformers
https://github.com/openai/openai-cookbook/blob/main/articles/gpt-oss/run-transformers.md
Hugging Face openai/gpt-oss-20b chat_template.jinja
https://huggingface.co/openai/gpt-oss-20b/blob/main/chat_template.jinja
Hugging Face openai/gpt-oss-20b tokenizer_config.json
https://huggingface.co/openai/gpt-oss-20b/blob/main/tokenizer_config.json
vLLM Recipes / GPT OSS
https://docs.vllm.ai/projects/recipes/en/latest/OpenAI/GPT-OSS.html
Ollama gpt-oss template blob
https://ollama.com/library/gpt-oss%3A20b/blobs/51468a0fd901

Last Modified: April 4, 2026