Reasoning representations and chat template specifications in major LLMs
chat_template.jinja files, tokenizer_config.json files, encoder implementations, model cards, and official API documentation.
<think>...</think> or [THINK]...[/THINK], or the analysis channel in the Harmony architecture.
Overview of the target models
- gpt-oss-120b
- LLM-jp-4-thinking
- Gemma 4
- DeepSeek-V3.2
- Qwen3.5
- Kimi K2.5
- Phi-4-reasoning
- GLM-5
- Mistral 3 family (using Ministral-3-14B-Reasoning-2512 as the representative public template)
1. Architectural premise: separating the API Surface and the Prompt Surface
1.1 API Surface
{
"messages": [
{"role": "user", "content": "Look up and summarize tomorrow's weather in Tokyo"}
],
"tools": [
{
"type": "function",
"function": {
"name": "get_weather",
"description": "Returns the weather for a city",
"parameters": {
"type": "object",
"properties": {
"city": {"type": "string"}
},
"required": ["city"]
}
}
}
]
}
1.2 Prompt Surface
- <think>...</think> family: GLM, Kimi, Qwen, DeepSeek, Phi
- <|think|> or thought channel family: Gemma 4
- [THINK]...[/THINK] family: Ministral Reasoning
- Harmony channels (analysis / commentary / final) family: gpt-oss, LLM-jp-4-thinking
reasoning_effort or enable_thinking).
2. Standard input payload for evaluation
2.1 Message input
[
{"role": "user", "content": "Look up and summarize tomorrow's weather in Tokyo"}
]
2.2 Available tool definition
[
{
"type": "function",
"function": {
"name": "get_weather",
"description": "Takes a city name and returns weather information",
"parameters": {
"type": "object",
"properties": {
"city": {"type": "string"}
},
"required": ["city"]
}
}
}
]
3. Per-model specification summary
- gpt-oss-120b
  - Public artifact: openai/gpt-oss-120b/chat_template.jinja
  - Reasoning representation: analysis channel
  - Control parameter: reasoning_effort=low|medium|high
  - Tool-call representation: Harmony call syntax
- LLM-jp-4-thinking
  - Public artifact: llm-jp/llm-jp-4-8b-thinking/chat_template.jinja
  - Reasoning representation: Harmony analysis channel
  - Control parameter: reasoning_effort=low|medium|high
  - Tool-call representation: Harmony call syntax
- Gemma 4
  - Public artifact: google/gemma-4-E2B-it/chat_template.jinja
  - Reasoning representation: <|think|> / thought channel
  - Control parameter: enable_thinking=True|False
  - Tool-call representation: <|tool>, <|tool_call>, <|tool_response>
- DeepSeek-V3.2
  - Public artifact: deepseek-ai/DeepSeek-V3.2/encoding/encoding_dsv32.py
  - Reasoning representation: <think>...</think>
  - Control parameter: thinking_mode="thinking"|"chat", drop_thinking
  - Tool-call representation: DSML syntax (<|DSML|function_calls>)
- Qwen3.5
  - Public artifact: Qwen/Qwen3.5-35B-A3B/tokenizer_config.json
  - Reasoning representation: <think>...</think>
  - Control parameter: enable_thinking=False
  - Tool-call representation: <tool_call><function=...><parameter=...>
- Kimi K2.5
  - Public artifact: moonshotai/Kimi-K2.5/chat_template.jinja
  - Reasoning representation: <think>...</think>
  - Control parameter: thinking on/off (default is on)
  - Tool-call representation: tool_declare, <|tool_calls_section_begin|> family
- Phi-4-reasoning
  - Public artifact: microsoft/Phi-4-reasoning/tokenizer_config.json
  - Reasoning representation: <think>...</think> (fixed system instruction)
  - Control parameter: none applicable
  - Tool-call representation: no tool-call syntax
- GLM-5
  - Public artifact: zai-org/GLM-5-FP8/chat_template.jinja
  - Reasoning representation: <think>...</think>
  - Control parameter: enable_thinking / thinking.type=disabled / clear_thinking
  - Tool-call representation: <tools>...</tools>, <tool_call>...
- Mistral 3 family
  - Public artifact: mistralai/Ministral-3-14B-Reasoning-2512/chat_template.jinja
  - Reasoning representation: [THINK]...[/THINK]
  - Control parameter: the public template is fixed; the API side supports reasoning_effort
  - Tool-call representation: [AVAILABLE_TOOLS], [TOOL_CALLS], [TOOL_RESULTS]
4. gpt-oss-120b specification details
4.1 Reference artifacts
- Hugging Face: openai/gpt-oss-120b/chat_template.jinja
- GitHub: openai/harmony README
<think> tag for reasoning blocks; instead it adopts the analysis channel based on the Harmony protocol.
4.2 API specification
- builtin_tools (e.g., ["browser", "python"])
- model_identity
- reasoning_effort ("low", "medium", or "high")
tokenizer.apply_chat_template(
messages,
tools=tools,
builtin_tools=["browser"],
reasoning_effort="high",
)
4.3 Prompt structure
<|start|>system<|message|>
You are ChatGPT...
Knowledge cutoff: 2024-06
Current date: 2026-04-05
Reasoning: high
# Valid channels: analysis, commentary, final.
...
<|end|>
<|start|>developer<|message|>
# Tools
## functions
namespace functions {
type get_weather = (_: { city: string }) => any;
}
<|end|>
<|start|>user<|message|>
Look up and summarize tomorrow's weather in Tokyo
<|end|>
<|start|>assistant<|channel|>analysis<|message|>
Need a weather lookup.
<|end|>
<|start|>assistant to=functions.get_weather<|channel|>commentary json<|message|>
{"city":"Tokyo"}
<|call|>
<|start|>functions.get_weather to=assistant<|channel|>commentary<|message|>
"{\"weather\":\"sunny\"}"
<|end|>
<|start|>assistant<|channel|>final<|message|> Tomorrow Tokyo will be sunny. ... <|end|>
4.4 Reasoning representation specification
- Reasoning: uses the analysis channel of the assistant message.
- Tool integration: uses the commentary channel.
- Final answer: uses the final channel.
4.5 Reasoning effort and mode specification
"low", "medium", or "high" is inserted directly into the prompt text as a Reasoning: ... directive inside the synthesized system message.
4.6 Tool-call specification
- Tool definition: defined inside the developer message in TypeScript namespace form.
- Call instruction: uses the form assistant to=functions.NAME.
- Channel setting: specifies commentary json.
- Arguments: constructed as a JSON payload.
- Terminating tag: uses <|call|>.
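As a rough illustration of the TypeScript namespace form listed above, the following Python sketch renders OpenAI-style tool definitions into that shape. The helper name render_tool_namespace and the type mapping are assumptions made for this example; the real chat_template.jinja covers more cases (descriptions, nested objects, enums), so treat this as a reading aid rather than a reimplementation.

```python
# Hypothetical helper: maps a few JSON Schema primitive types to TypeScript names.
TS_TYPES = {"string": "string", "number": "number", "integer": "number", "boolean": "boolean"}


def render_tool_namespace(tools: list) -> str:
    """Render OpenAI-style tool definitions as a TypeScript-like namespace block."""
    lines = ["namespace functions {"]
    for tool in tools:
        fn = tool["function"]
        params = fn.get("parameters", {})
        required = set(params.get("required", []))
        fields = []
        for name, spec in params.get("properties", {}).items():
            optional = "" if name in required else "?"  # non-required fields get "?"
            ts_type = TS_TYPES.get(spec.get("type"), "any")
            fields.append(f"{name}{optional}: {ts_type}")
        lines.append("type " + fn["name"] + " = (_: { " + ", ".join(fields) + " }) => any;")
    lines.append("}")
    return "\n".join(lines)


tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Takes a city name and returns weather information",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]
print(render_tool_namespace(tools))
```

For the get_weather definition from section 2.2, this yields the same namespace body shown in the prompt structure of section 4.3.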
4.7 Management of reasoning history
analysis history as raw data.
4.8 Implementation notes
- Integrating this model requires a parser and data model dedicated to the Harmony protocol. A generic <think> parser cannot be applied.
- Tool calling and reasoning must not be mixed within the same assistant text block; they must be routed using the appropriate channels and destination attributes.
5. LLM-jp-4-thinking specification details
5.1 Reference artifacts
- Hugging Face: llm-jp/llm-jp-4-8b-thinking/chat_template.jinja
- The corresponding model card
5.2 API specification
- builtin_tools
- model_identity
- reasoning_effort="low"|"medium"|"high"
tokenizer.apply_chat_template(..., reasoning_effort="medium").
5.3 Prompt structure
<|start|>system<|message|>
You are LLM-jp-4...
Knowledge cutoff: 2025-12
Current date: 2026-04-05
Reasoning: medium
# Valid channels: analysis, commentary, final.
<|end|>
<|start|>developer<|message|>
# Tools
## functions
namespace functions {
type get_weather = (_: { city: string }) => any;
}
<|end|>
<|start|>user<|message|>
Look up and summarize tomorrow's weather in Tokyo
<|end|>
<|start|>assistant<|channel|>analysis<|message|>
First get the weather
<|end|>
<|start|>assistant to=functions.get_weather<|channel|>commentary json<|message|>
{"city":"Tokyo"}
<|call|>
<|start|>assistant<|channel|>final<|message|> Tomorrow Tokyo will be sunny. ... <|end|>
5.4 Reasoning representation specification
- The reasoning process is represented not by <think> tags but as the analysis channel.
- Running the tokenizer.parse_response(response) method separates the reasoning part (thinking) from the final answer (content).
- The model card explicitly states that a dedicated tokenizer must be used.
5.5 Reasoning effort and mode specification
- Allowed values: reasoning_effort="low"|"medium"|"high"
- At template evaluation time, the corresponding parameter is embedded into the Reasoning: ... line inside the synthesized system message.
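The expansion described above can be pictured with a small Python sketch. The function build_system_message and its arguments are hypothetical and are not part of the LLM-jp or gpt-oss tooling; the published chat_template.jinja remains the source of truth for the exact layout of the system message.

```python
VALID_EFFORTS = ("low", "medium", "high")


def build_system_message(identity: str, current_date: str, reasoning_effort: str = "medium") -> str:
    """Illustrative only: expand reasoning_effort into a 'Reasoning: ...' line."""
    if reasoning_effort not in VALID_EFFORTS:
        raise ValueError(f"reasoning_effort must be one of {VALID_EFFORTS}")
    lines = [
        identity,
        f"Current date: {current_date}",
        f"Reasoning: {reasoning_effort}",  # the effort level becomes plain prompt text
        "# Valid channels: analysis, commentary, final.",
    ]
    return "<|start|>system<|message|>\n" + "\n".join(lines) + "\n<|end|>"


print(build_system_message("You are LLM-jp-4...", "2026-04-05", "medium"))
```

The point of the sketch is that the effort level is not a hidden decoding switch in this family: it is visible text inside the prompt.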
5.6 Tool-call specification
- Definitions using TypeScript-style namespace functions
- Call syntax: assistant to=functions.NAME
- Channel used: commentary json
- Terminating token: <|call|>
- Response format: functions.NAME to=assistant
5.7 Handling of reasoning history
5.8 Implementation notes
- While it is compatible with the philosophy of the openai-harmony library, the tokenizer implementation is not completely identical. For safety, it is recommended to use the official tokenizer and parser provided by LLM-jp.
- Public examples show that, depending on the configuration, running the provided Reasoning parser requires enabling the trust_remote_code=True option.
5.9 Related resources
6. Gemma 4 specification details
6.1 Reference artifacts
- google/gemma-4-E2B-it/chat_template.jinja
- Official documentation: Thinking mode in Gemma
6.2 API specification
apply_chat_template method.
tokenizer.apply_chat_template(messages, tools=tools, enable_thinking=True)
6.3 Prompt structure
<|turn>system
<|think|><turn|>
<|tool>
declaration:get_weather{description:..., parameters:{city:string}}
<tool|>
<|turn>user
Look up and summarize tomorrow's weather in Tokyo<turn|>
<|turn>model
thought channel to the model side when reasoning is disabled.
6.4 Reasoning representation specification
- E2B / E4B models: when reasoning is enabled, <|think|> is inserted at the beginning.
- 26B / 31B models: a representation form is documented that opens a thought channel after the model turn.
- The provided template includes a strip_thinking macro that removes the reasoning part when reconstructing history.
6.5 Reasoning effort and mode specification
enable_thinking=True|False
6.6 Tool-call specification
- Tool definition: <|tool> ... <tool|>
- Tool call: <|tool_call>call:FUNC{arg:...}<tool_call|>
- Tool-execution result: <|tool_response>response:FUNC{...}<tool_response|>
<|tool_call>call:get_weather{city:<|"|>Tokyo<|"|>}<tool_call|>
<|tool_response>response:get_weather{weather:<|"|>sunny<|"|>}<tool_response|>
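A minimal Python sketch of extracting such a call is shown below. It assumes only the surface form in the example above, with string arguments wrapped in <|"|> markers; the function name parse_gemma_tool_calls is hypothetical, and non-string or nested arguments are deliberately left out because their encoding is not shown here.

```python
import re

# Matches one <|tool_call>call:NAME{...}<tool_call|> block.
_CALL_RE = re.compile(
    r"<\|tool_call>call:(?P<name>[\w.-]+)\{(?P<args>.*?)\}<tool_call\|>",
    re.DOTALL,
)
# Matches key:<|"|>value<|"|> pairs (string-valued arguments only).
_STRING_ARG_RE = re.compile(r'(?P<key>\w+):<\|"\|>(?P<value>.*?)<\|"\|>', re.DOTALL)


def parse_gemma_tool_calls(text: str) -> list:
    """Extract tool calls with string arguments (illustrative sketch)."""
    calls = []
    for m in _CALL_RE.finditer(text):
        arguments = {
            a.group("key"): a.group("value")
            for a in _STRING_ARG_RE.finditer(m.group("args"))
        }
        calls.append({"name": m.group("name"), "arguments": arguments})
    return calls


sample = '<|tool_call>call:get_weather{city:<|"|>Tokyo<|"|>}<tool_call|>'
print(parse_gemma_tool_calls(sample))
```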
6.7 Implementation notes
- Unlike earlier Gemma specifications, Gemma 4's public template has been updated to generate system turns for reasoning and tool calling.
- When implementing, it is strongly recommended to reference the actual chat_template.jinja file as the source of truth for the specification.
7. DeepSeek-V3.2 specification details
7.1 Reference artifacts
deepseek-ai/DeepSeek-V3.2/encoding/encoding_dsv32.py
7.2 API specification
encode_messages(
messages,
tools=tools,
thinking_mode="thinking",
drop_thinking=True,
add_default_bos_token=True,
)
- thinking_mode: specify "thinking" or "chat".
- drop_thinking: specifies whether to discard past reasoning history to save context size.
7.3 Prompt structure
<|begin▁of▁sentence|> ## Tools ... <|User|>Look up and summarize tomorrow's weather in Tokyo<|Assistant|><think>
<|DSML|function_calls>
<|DSML|invoke name="get_weather">
<|DSML|parameter name="city" string="true">Tokyo</|DSML|parameter>
</|DSML|invoke>
</|DSML|function_calls>
<function_results>
<result>{"weather":"sunny"}</result>
</function_results>
<think>
7.4 Reasoning representation specification
- The reasoning process is stored in a <think>...</think> block.
- As delimiters between user and assistant, <|User|> and <|Assistant|> are used respectively.
- When the assistant's output completes, the reasoning content, answer content, tool calls, and the EOS token are concatenated.
7.5 Reasoning effort and mode specification
- thinking_mode="thinking": enables reasoning mode.
- thinking_mode="chat": enables a standard conversation mode with reasoning suppressed.
- drop_thinking=True|False: controls whether to retain or discard past reasoning history.
7.6 Tool-call specification
- The function-call block is defined by <|DSML|function_calls>.
- Each invoke is given a name attribute specifying the target tool.
- Parameters are defined with <|DSML|parameter ...>, and for the string type, string="true" is explicitly specified.
- List-type or object-type data is embedded in JSON format.
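The rules above can be sketched as a small Python parser. It follows the tag spelling exactly as written in this document; the function name parse_dsml_calls is hypothetical, and treating any parameter without string="true" as JSON is an assumption of this sketch. encoding_dsv32.py remains the reference for the real token forms.

```python
import json
import re

_INVOKE_RE = re.compile(
    r'<\|DSML\|invoke name="(?P<name>[^"]+)">(?P<body>.*?)</\|DSML\|invoke>',
    re.DOTALL,
)
_PARAM_RE = re.compile(
    r'<\|DSML\|parameter name="(?P<key>[^"]+)"(?: string="(?P<is_string>[^"]*)")?>'
    r"(?P<value>.*?)</\|DSML\|parameter>",
    re.DOTALL,
)


def parse_dsml_calls(text: str) -> list:
    """Extract DSML-style function calls (illustrative sketch)."""
    calls = []
    for invoke in _INVOKE_RE.finditer(text):
        arguments = {}
        for param in _PARAM_RE.finditer(invoke.group("body")):
            raw = param.group("value")
            if param.group("is_string") == "true":
                arguments[param.group("key")] = raw  # string: keep verbatim
            else:
                arguments[param.group("key")] = json.loads(raw)  # assumed JSON-encoded
        calls.append({"name": invoke.group("name"), "arguments": arguments})
    return calls


sample = (
    "<|DSML|function_calls>\n"
    '<|DSML|invoke name="get_weather">\n'
    '<|DSML|parameter name="city" string="true">Tokyo</|DSML|parameter>\n'
    "</|DSML|invoke>\n"
    "</|DSML|function_calls>"
)
print(parse_dsml_calls(sample))
```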
7.7 Implementation notes
- When confirming the prompt-construction specification, you must reference encoding_dsv32.py as the source of truth, not a Jinja template.
- The client side requires a parser implementation dedicated to DeepSeek-V3.2.
8. Qwen3.5 specification details
8.1 Reference artifacts
- Qwen/Qwen3.5-35B-A3B/tokenizer_config.json
- The model card and quickstart documentation for Qwen/Qwen3.5-9B
8.2 API specification
extra_body={"chat_template_kwargs": {"enable_thinking": False}}
8.3 Prompt structure
<|im_start|>system
# Tools
You may call one or more functions...
<tools>
[{"name":"get_weather", ...}]
</tools><|im_end|>
<|im_start|>user
Look up and summarize tomorrow's weather in Tokyo<|im_end|>
<|im_start|>assistant
<think>
<tool_call> <function=get_weather> <parameter=city> Tokyo </parameter> </function> </tool_call>
tool role but as a <tool_response> block wrapped inside a user turn.
<|im_start|>user
<tool_response>
{"weather":"sunny"}
</tool_response><|im_end|>
8.4 Reasoning representation specification
<think>...</think> block. When the reasoning feature is enabled, the assistant's generated output begins with the <think> tag, and after reasoning completes, the final answer is emitted.
8.5 Reasoning effort and mode specification
- Default: reasoning feature enabled
- Disable setting: enable_thinking=False
8.6 Tool-call specification
- The schema is defined in the <tools>...</tools> block inside the system prompt.
- The assistant emits a <tool_call> block.
- Tool-execution results are returned in a <tool_response> inside a user turn.
- The reasoning block must be placed before the tool call; placing it after the call is not allowed.
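As an illustration of the <tool_call> syntax summarized above, the following Python sketch extracts function names and parameters from assistant output. The name parse_qwen_tool_calls is hypothetical, and parameter values are kept as stripped strings; converting them to the types declared in the tool schema is left out of this sketch.

```python
import re

_TOOL_CALL_RE = re.compile(r"<tool_call>(.*?)</tool_call>", re.DOTALL)
_FUNCTION_RE = re.compile(r"<function=([^>\s]+)>(.*?)</function>", re.DOTALL)
_PARAMETER_RE = re.compile(r"<parameter=([^>\s]+)>(.*?)</parameter>", re.DOTALL)


def parse_qwen_tool_calls(text: str) -> list:
    """Extract <function=...>/<parameter=...> calls (illustrative sketch)."""
    calls = []
    for block in _TOOL_CALL_RE.findall(text):
        for name, body in _FUNCTION_RE.findall(block):
            arguments = {key: value.strip() for key, value in _PARAMETER_RE.findall(body)}
            calls.append({"name": name, "arguments": arguments})
    return calls


sample = (
    "<tool_call> <function=get_weather> <parameter=city> Tokyo </parameter> "
    "</function> </tool_call>"
)
print(parse_qwen_tool_calls(sample))
```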
8.7 Implementation notes
- The template is designed to automatically compress reasoning history that precedes the last user request.
- In a self-hosting environment, you must incorporate Qwen-specific reasoning parsers and tool-call parsers.
9. Kimi K2.5 specification details
9.1 Reference artifacts
- Hugging Face: moonshotai/Kimi-K2.5/chat_template.jinja
- Official documentation: Kimi API Platform (thinking model guide)
9.2 API specification
extra_body parameter of the OpenAI-compatible API.
extra_body={"thinking": {"type": "disabled"}}
message.reasoning_content and message.content are obtained separately from the response payload.
9.3 Prompt structure
<|im_system|>tool_declare<|im_middle|>
[{"name":"get_weather", ...}]
<|im_end|>
<|im_user|>user<|im_middle|>
Look up and summarize tomorrow's weather in Tokyo
<|im_end|>
<|im_assistant|>assistant<|im_middle|>
<think>
<think>Get the weather first</think>
<|tool_calls_section_begin|>
<|tool_call_begin|>get_weather
<|tool_call_argument_begin|>{"city":"Tokyo"}
<|tool_call_end|>
<|tool_calls_section_end|>
tool role, but on the prompt they are formatted as text with the following label.
## Return of call_1
{"weather":"sunny"}
9.4 Reasoning representation specification
- The assistant's reasoning process is stored inside a <think>...</think> block.
- When Thinking mode is enabled, generation begins with <think>; when it is disabled, an empty <think></think> is placed at the beginning.
- The Kimi template incorporates compression logic that intentionally discards the assistant's Reasoning from past turns and retains only the most recent Reasoning.
9.5 Reasoning effort and mode specification
- Default state: Thinking mode enabled (on)
- Disable specification: thinking.type="disabled"
- A graduated Effort specification (low/medium/high, etc.) is not supported by the current public template or the official API documentation.
9.6 Tool-call specification
- Tool definition: declared inside a system turn using tool_declare.
- Tool call: uses a dedicated block starting from <|tool_calls_section_begin|>.
- Call arguments: constructed by a sequence of special tokens: tool_call_begin, tool_call_argument_begin, and tool_call_end.
- Tool-execution result: handled as the tool role at the API layer, but converted into plain text of the form Return of <id> inside the prompt.
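The token sequence described above can be parsed with a short Python sketch. The function name parse_kimi_tool_calls is hypothetical, and the sketch treats the text between tool_call_begin and tool_call_argument_begin as an opaque call identifier, which is an assumption based on the example in section 9.3.

```python
import json
import re

_SECTION_RE = re.compile(
    r"<\|tool_calls_section_begin\|>(.*?)<\|tool_calls_section_end\|>", re.DOTALL
)
_CALL_RE = re.compile(
    r"<\|tool_call_begin\|>(?P<ident>.*?)"
    r"<\|tool_call_argument_begin\|>(?P<args>.*?)"
    r"<\|tool_call_end\|>",
    re.DOTALL,
)


def parse_kimi_tool_calls(text: str) -> list:
    """Extract calls from a tool-calls section (illustrative sketch)."""
    calls = []
    for section in _SECTION_RE.findall(text):
        for m in _CALL_RE.finditer(section):
            calls.append({
                "name": m.group("ident").strip(),
                "arguments": json.loads(m.group("args")),  # arguments are a JSON payload
            })
    return calls


sample = (
    "<|tool_calls_section_begin|>\n"
    "<|tool_call_begin|>get_weather\n"
    '<|tool_call_argument_begin|>{"city":"Tokyo"}\n'
    "<|tool_call_end|>\n"
    "<|tool_calls_section_end|>"
)
print(parse_kimi_tool_calls(sample))
```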
9.7 Implementation notes
- When using Thinking mode, it is recommended to restore the reasoning_content of past turns into the history in a complete state.
- Because the template provided by Hugging Face itself has the property of compressing some Reasoning data in the history, take care not to conflate the history-management implementation on the API-client side with the compression policy at the template layer.
10. Phi-4-reasoning specification details
10.1 Reference artifacts
- Hugging Face: microsoft/Phi-4-reasoning/tokenizer_config.json
- Model card
10.2 API specification
messages. As an important point for configuration, a dedicated fixed system instruction is automatically inserted at the beginning when the tokenizer template is processed.
10.3 Prompt structure
<|im_start|>system<|im_sep|>
[Fixed reasoning instruction. Separates Thought and Solution,
prompting to answer in the form <think>{Thought}</think>{Solution}]
<|im_end|>
<|im_start|>user<|im_sep|>
Look up and summarize tomorrow's weather in Tokyo
<|im_end|>
<|im_start|>assistant<|im_sep|>
10.4 Reasoning representation specification
- The Reasoning content is written inside a <think>...</think> block.
- The output format is not something the user specifies as a system message in the API request; it is enforced by the system prompt automatically inserted by the tokenizer template.
- <think> and </think> are registered as dedicated tokens in the tokenizer vocabulary.
10.5 Reasoning effort and mode specification
low, medium, high, etc.) or an enable/disable switch. It always exhibits fixed behavior that outputs the "Thought + Solution" form.
10.6 Tool-call specification
- Implement the Tool Calling logic in an external agent layer.
- Position the model itself as a reasoning-only component that generates <think>...</think>.
10.7 Implementation notes
- Modifying or removing the leading fixed system prompt inserted by the template risks collapsing the model's expected output format.
- When building a parser in a self-hosting environment, the Reasoning parser component of the DeepSeek-R1 family can be repurposed.
11. GLM-5 specification details
11.1 Reference artifacts
- Hugging Face: zai-org/GLM-5-FP8/chat_template.jinja
- Official documentation: GLM-5 overview / Thinking Mode / Function Calling
11.2 API specification
messages, tools, tool_choice, tool_calls, and reasoning_content.
{"thinking": {"type": "disabled"}}
clear_thinking=false.
11.3 Prompt structure
[gMASK]<sop>
<|system|>
# Tools
You may call one or more functions...
<tools>
[{"name":"get_weather", ...}]
</tools>
<|user|>
Look up and summarize tomorrow's weather in Tokyo
<|assistant|>
<think>
<think>First use the weather tool</think> <tool_call> get_weather <arg_key>city</arg_key><arg_value>Tokyo</arg_value> </tool_call>
tool role in the following form.
<|observation|>
<tool_response>{"weather":"sunny"}</tool_response>
11.4 Reasoning representation specification
- The reasoning process is stored inside a <think>...</think> block.
- The template preferentially evaluates the presence of message.reasoning_content, and if there is no such data, it attempts to extract the <think>...</think> block from assistant.content.
- Depending on whether Thinking mode is enabled, the generation start token is <|assistant|><think> (when enabled) or <|assistant|></think> (when disabled).
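The lookup order described above (reasoning_content first, then a <think> block inside the content) can be mirrored on the client side with a sketch like the following. The function name split_glm_reasoning is hypothetical, and this is a Python approximation of the behavior, not a translation of the Jinja template.

```python
def split_glm_reasoning(message: dict) -> tuple:
    """Return (reasoning, answer) for an assistant message (illustrative sketch)."""
    content = message.get("content") or ""
    reasoning = message.get("reasoning_content")
    if reasoning is not None:
        # Preferred source: an explicit reasoning_content field.
        return reasoning.strip(), content.strip()
    if "</think>" in content:
        # Fallback: pull the reasoning out of an inline <think>...</think> block.
        head, _, tail = content.partition("</think>")
        return head.split("<think>")[-1].strip(), tail.strip()
    return "", content.strip()


print(split_glm_reasoning({"role": "assistant", "reasoning_content": "plan", "content": "answer"}))
print(split_glm_reasoning({"role": "assistant", "content": "<think>First use the weather tool</think>Sunny."}))
```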
11.5 Reasoning effort and mode specification
- Local environment / template: enable_thinking=True|False
- Hosted API: thinking.type="disabled"
- Multi-turn history retention: clear_thinking=false
11.6 Tool-call specification
- The API input layer uses the OpenAI-compatible tools / tool_calls format.
- When converted into the model input prompt, it is reconstructed into <tools>...</tools> and <tool_call>...</tool_call> blocks.
- Tool-execution results are handled as <tool_response>...</tool_response> blocks.
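For the <tool_call> block in section 11.3, a minimal Python extraction sketch might look like this. The name parse_glm_tool_calls is hypothetical, and argument values are returned as plain strings; mapping them back to schema types is outside the scope of the sketch.

```python
import re

_TOOL_CALL_RE = re.compile(
    r"<tool_call>\s*(?P<name>[^<\s]+)(?P<body>.*?)</tool_call>", re.DOTALL
)
_ARG_RE = re.compile(
    r"<arg_key>(?P<key>.*?)</arg_key>\s*<arg_value>(?P<value>.*?)</arg_value>",
    re.DOTALL,
)


def parse_glm_tool_calls(text: str) -> list:
    """Extract <arg_key>/<arg_value> style calls (illustrative sketch)."""
    calls = []
    for m in _TOOL_CALL_RE.finditer(text):
        arguments = {
            a.group("key").strip(): a.group("value").strip()
            for a in _ARG_RE.finditer(m.group("body"))
        }
        calls.append({"name": m.group("name"), "arguments": arguments})
    return calls


sample = (
    "<think>First use the weather tool</think> <tool_call> get_weather "
    "<arg_key>city</arg_key><arg_value>Tokyo</arg_value> </tool_call>"
)
print(parse_glm_tool_calls(sample))
```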
11.7 Implementation notes
- If you have a requirement to continue Reasoning across multiple turns, explicitly specify clear_thinking=false.
- When building a self-hosting environment, you need to implement dedicated Reasoning parsers and tool parsers specialized for the GLM family.
12. Mistral 3 family specification details (representative model: Ministral-3-14B-Reasoning-2512)
12.1 Reference model and selection criteria
mistralai/Ministral-3-14B-Reasoning-2512 as the reference. The Reasoning feature is also provided on the Hosted API side, but since the internal prompt is not public, verifying the detailed specification requires referencing the open-weight template.
12.2 API specification
- Open-weight model: uses the standard messages and tools parameters.
- Hosted API models:
  - mistral-small-latest: graduated adjustment via the reasoning_effort parameter is supported.
  - magistral-small-latest / magistral-medium-latest: these are native Reasoning models, and Thinking processing is always performed.
12.3 Prompt structure
[SYSTEM_PROMPT]...[/SYSTEM_PROMPT]
[AVAILABLE_TOOLS][{"name":"get_weather",...}][/AVAILABLE_TOOLS]
[INST]Look up and summarize tomorrow's weather in Tokyo[/INST]
[THINK]First get the weather[/THINK]
[TOOL_CALLS]get_weather[ARGS]{"city":"Tokyo"}
[TOOL_RESULTS]{"weather":"sunny"}[/TOOL_RESULTS]
12.4 Reasoning representation specification
- In the open-weight template, the Reasoning process is stored inside a [THINK]...[/THINK] block.
- The system prompt is structured on the premise that the model outputs Reasoning.
- The assistant content is split into thinking chunks and text chunks.
12.5 Reasoning effort and mode specification
- Open-weight model: the public chat_template.jinja has no dynamic parameters such as low|medium|high, and it functions as a fixed Reasoning mode.
- Hosted API models: mistral-small-latest allows adjustment via reasoning_effort. Native models (magistral-*) output Thinking chunks without any additional parameter specification.
12.6 Tool-call specification
- Definition: [AVAILABLE_TOOLS]...[/AVAILABLE_TOOLS]
- Call: [TOOL_CALLS]name[ARGS]{json}
- Execution result: [TOOL_RESULTS]...[/TOOL_RESULTS]
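The call form [TOOL_CALLS]name[ARGS]{json} can be read with a short Python sketch. The name parse_mistral_tool_calls is hypothetical; the sketch uses json.JSONDecoder.raw_decode so that the JSON payload is consumed without needing a closing tag, which matches the example in section 12.3.

```python
import json
import re

_TOOL_CALL_RE = re.compile(r"\[TOOL_CALLS\](?P<name>[^\[]+)\[ARGS\]")


def parse_mistral_tool_calls(text: str) -> list:
    """Extract [TOOL_CALLS]name[ARGS]{json} calls (illustrative sketch)."""
    decoder = json.JSONDecoder()
    calls = []
    for m in _TOOL_CALL_RE.finditer(text):
        start = m.end()
        while start < len(text) and text[start].isspace():
            start += 1  # raw_decode expects the JSON value to start at this index
        arguments, _ = decoder.raw_decode(text, start)
        calls.append({"name": m.group("name").strip(), "arguments": arguments})
    return calls


sample = '[THINK]First get the weather[/THINK]\n[TOOL_CALLS]get_weather[ARGS]{"city":"Tokyo"}'
print(parse_mistral_tool_calls(sample))
```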
12.7 Implementation notes
- Note that the Mistral-family tags are not the common XML-style <think>, but the bracket-based [THINK].
- Because it is not public how the Hosted API's reasoning_effort parameter is processed into the internal prompt, it is recommended to conform to the open-weight specification when implementing a strict parser.
13. Cross-comparison: major patterns
13.1 Inline-tag configuration (<think>...</think> type)
- Target models: GLM-5, Kimi K2.5, Qwen3.5, DeepSeek-V3.2, Phi-4-reasoning
- Characteristics:
  - The parser is relatively easy to implement.
  - However, the Tool Calling grammar adopts each vendor's own specification and is not standardized.
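To show why the inline-tag family is comparatively easy to parse, here is a minimal Python sketch that separates reasoning from the answer. The function name split_think is hypothetical. It tolerates a missing opening tag, since several templates described in this document place <think> in the generation prompt so that the model output starts inside the reasoning block.

```python
def split_think(text: str, open_tag: str = "<think>", close_tag: str = "</think>") -> tuple:
    """Return (reasoning, answer) from inline-tag output (illustrative sketch)."""
    head, separator, tail = text.partition(close_tag)
    if not separator:
        # No closing tag at all: treat the whole text as the answer.
        return "", text.strip()
    reasoning = head.strip()
    if reasoning.startswith(open_tag):
        reasoning = reasoning[len(open_tag):]
    return reasoning.strip(), tail.strip()


print(split_think("<think>Get the weather first</think>Tomorrow Tokyo will be sunny."))
print(split_think("Get the weather first</think>Tomorrow Tokyo will be sunny."))
```

The tag arguments make the same function usable for bracket-style delimiters such as [THINK] and [/THINK], but tool-call syntax still needs a model-specific parser.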
13.2 Dedicated delimiter-token configuration
- Target models: Ministral-3-Reasoning ([THINK]...[/THINK]), Gemma 4 (<|think|> or thought channel)
- Characteristics:
  - The delimiters are strictly defined as special tokens.
  - It may be difficult to repurpose a standard <think> parser.
13.3 Multi-channel / Harmony configuration
- Target models: gpt-oss-120b, LLM-jp-4-thinking
- Characteristics:
  - Reasoning process: uses the analysis channel
  - Tool calls: use the commentary channel
  - Final answer: uses the final channel
14. Cross-comparison: Reasoning Effort parameter specification
14.1 Prompt-exposed type (multi-level specification)
- Target models: gpt-oss-120b, LLM-jp-4-thinking
- Specification:
  - Supports explicit parameter input of low|medium|high.
  - The specified value is passed as a template argument (kwargs) and is expanded directly inside the synthesized system message in the form Reasoning: high.
  - The parser and serving environment generally assume the Harmony specification.
14.2 Binary-toggle type (enable / disable)
- Target models: GLM-5, Kimi K2.5, Qwen3.5, Gemma 4
- Specification:
  - Controls not the reasoning depth but whether the Reasoning channel itself is output.
  - Representative parameter settings use enable_thinking=False or thinking.type=disabled.
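The two toggle styles can be collected into one small helper, sketched below in Python. The function name disable_thinking_fields and the family labels are hypothetical; the returned dictionaries are the extra request fields quoted in sections 8.2, 9.2, and 11.2. Gemma 4 is omitted because this document only describes its switch as an apply_chat_template argument.

```python
def disable_thinking_fields(model_family: str) -> dict:
    """Return extra request-body fields that switch reasoning off (illustrative sketch)."""
    family = model_family.lower()
    if family == "qwen":
        # Forwarded to the chat template by the serving layer.
        return {"chat_template_kwargs": {"enable_thinking": False}}
    if family in ("kimi", "glm"):
        return {"thinking": {"type": "disabled"}}
    raise ValueError(f"no binary thinking toggle known for {model_family!r}")


print(disable_thinking_fields("qwen"))
print(disable_thinking_fields("kimi"))
```

With an OpenAI-compatible client, the returned dictionary would typically be passed as extra_body, as in the Qwen and Kimi examples earlier in this document.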
14.3 Mode-enumeration type
- Target model: DeepSeek-V3.2
- Specification:
  - On the API interface, the mode is selected as an Enum value such as thinking_mode="thinking"|"chat".
  - The compression control of reasoning history is managed by an independent parameter (drop_thinking).
14.4 Static configuration (fixed type)
- Target models: Phi-4-reasoning, open-weight Ministral-3-Reasoning
- Specification:
  - The Reasoning format is statically defined (hard-coded) inside the prompt template.
  - A use case of dynamically adjusting the reasoning depth (Effort) from external parameters is not assumed.
15. Cross-comparison: tool-call syntax specification
- XML / pseudo-XML type: defined using XML-like tags such as <tools> or <tool_call>. The applicable models are GLM-5, Qwen3.5, and DeepSeek-V3.2 (DSML).
- Dedicated special-token type: defines the structure using model-specific special tokens (e.g., <|tool>). The applicable models are Kimi K2.5 and Gemma 4.
- Delimiter-string type: delimits blocks using a specific string such as [AVAILABLE_TOOLS]. The applicable model is Ministral-3-Reasoning.
- Protocol / channel type: an advanced configuration that splits messages into logical channels for processing. The applicable models are gpt-oss-120b and LLM-jp-4-thinking.
Note: When implementing a tool-call-capable server, it is not recommended to base it solely on the OpenAI-compatible JSON format (messages, tools, tool_calls). Because the syntax actually fed to the model differs greatly from model to model, you need to implement model-specific parsers on the backend side.
16. Implementation best practices and notes
16.1 Limitations on template compatibility
<think> tag in common, but the tool-call syntax, the history-compression algorithm, and the way the generation prompt is constructed all differ.
16.2 Handling of Harmony-family models
<think> tag, they must be implemented as a dedicated, channel-aware protocol stack.
16.3 Separating the reasoning feature from the effort parameter
- Reasoning feature present, no Effort specification: Phi-4, Ministral (open-weight version), DeepSeek (thinking mode)
- Reasoning feature present, binary enable/disable control: GLM, Kimi, Qwen, Gemma
- Reasoning feature present, multi-level Effort specification: gpt-oss, LLM-jp, Mistral API (managed version)
16.4 Specification for re-submitting context history
- Discards the thinking process from history during inference: gpt-oss, LLM-jp, Kimi, Qwen, DeepSeek (when the drop_thinking=True parameter is specified)
- Retention via an explicit parameter: GLM (when the clear_thinking=false parameter is specified)
- Control via a fixed system prompt: Phi-4
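The "discard" policy can be approximated on the client side with a sketch like the one below. The function name drop_past_reasoning is hypothetical, and the rule it applies (remove reasoning_content from assistant messages that precede the last user message) is a simplification of what the individual templates do; each template remains the authority on its own compression behavior.

```python
def drop_past_reasoning(messages: list) -> list:
    """Strip reasoning_content from assistant turns before the last user turn (sketch)."""
    last_user = max(
        (i for i, m in enumerate(messages) if m.get("role") == "user"), default=-1
    )
    cleaned = []
    for i, message in enumerate(messages):
        if message.get("role") == "assistant" and i < last_user:
            # Copy the message without its reasoning; the input list is left untouched.
            message = {k: v for k, v in message.items() if k != "reasoning_content"}
        cleaned.append(message)
    return cleaned


history = [
    {"role": "user", "content": "Weather in Tokyo?"},
    {"role": "assistant", "reasoning_content": "Call the tool.", "content": "Sunny."},
    {"role": "user", "content": "And Osaka?"},
    {"role": "assistant", "reasoning_content": "Same approach.", "content": "Cloudy."},
]
print(drop_past_reasoning(history))
```

Only the most recent assistant turn keeps its reasoning here, which corresponds to the "retain only the latest Reasoning" behavior described for Kimi and Qwen.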
17. Appendix: models out of scope for this document
- Llama 4: The existence of Instruct models (such as meta-llama/Llama-4-Scout-17B-16E-Instruct) is confirmed, but no public information on a Reasoning-specific chat template, Effort specification, or tool syntax could be confirmed.
- OLMo 2: Ordinary chat templates (such as allenai/OLMo-2-1124-13B-Instruct) are provided, but no specification for a Reasoning-specific template or Effort parameter is public.
18. Summary
- Model group with the clearest text-based Reasoning representation: GLM-5, Kimi K2.5, Qwen3.5, DeepSeek-V3.2, Phi-4-reasoning
- Model group that is protocol-oriented and requires a dedicated parser implementation: gpt-oss-120b, LLM-jp-4-thinking
- Model group with its own tool syntax and high parser-implementation difficulty: DeepSeek-V3.2 (DSML), Gemma 4, Kimi K2.5
- Model group with an explicit Reasoning Effort API specification: gpt-oss-120b, LLM-jp-4-thinking, Mistral (managed API version)
19. Related resources
gpt-oss-120b
LLM-jp-4-thinking
Gemma 4
DeepSeek-V3.2
Qwen3.5
Kimi K2.5
Phi-4-reasoning
GLM-5
- Template: https://huggingface.co/zai-org/GLM-5-FP8/blob/main/chat_template.jinja
- Documentation: https://huggingface.co/zai-org/GLM-5-FP8/blob/main/README.md
- API specification: https://docs.z.ai/guides/llm/glm-5
- Thinking Mode: https://docs.z.ai/guides/capabilities/thinking-mode
- Function Calling: https://docs.z.ai/guides/capabilities/function-calling
Mistral 3 family
- Template: https://huggingface.co/mistralai/Ministral-3-14B-Reasoning-2512/blob/main/chat_template.jinja
- Reasoning specification: https://docs.mistral.ai/capabilities/reasoning
- Native Reasoning: https://docs.mistral.ai/capabilities/reasoning/native
Appendix targets
Last Modified: April 5, 2026