Chat Completions API — Streaming, Tools & JSON Mode

The Chat Completions endpoint (POST /v1/chat/completions) is the most widely used interface in the OpenAI ecosystem. Through Fhddos, you can direct the same request to OpenAI, Claude, Gemini, or other supported models without changing your application code. This page covers basic usage, streaming, tool calling, JSON mode, and Fhddos’s extended billing fields.

Endpoint

POST /v1/chat/completions

Basic request

curl -X POST "$BASE_URL/v1/chat/completions" \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o-mini",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "Explain quantum computing in simple terms."}
    ],
    "temperature": 0.7
  }'

Fhddos automatically normalises common field differences between OpenAI-compatible third-party implementations, so the same request body keeps working when you switch models.
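As a rough Python counterpart to the curl call above, the sketch below builds the same request with the standard library and reuses one body for two models. The helper name build_chat_request, the base URL, the token, and the second model ID are placeholders invented for illustration, and the request is only constructed, not sent.

```python
import json
import urllib.request


def build_chat_request(base_url, token, model, messages, temperature=0.7):
    """Build (but do not send) a POST request for /v1/chat/completions."""
    body = {"model": model, "messages": messages, "temperature": temperature}
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Explain quantum computing in simple terms."},
]

# Only the model string changes; the second ID is a placeholder, not a real model name.
for model in ("gpt-4o-mini", "your-claude-model-id"):
    req = build_chat_request(
        "https://api.example.invalid", "sk-placeholder", model, messages
    )
    print(req.get_method(), req.full_url, json.loads(req.data)["model"])
    # To actually send it: urllib.request.urlopen(req), then json.load() the response.
```

Keeping the body construction in one function means a model switch is a one-argument change, which is the property the paragraph above describes.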

Common features

Streaming

Set stream: true in the request body to receive incremental tokens as server-sent events (SSE). In curl, add the -N flag to disable output buffering so that chunks are printed as they arrive. In Python (for example with the OpenAI SDK), pass stream=True and iterate over the returned chunks. The curl form is:

curl -N -X POST "$BASE_URL/v1/chat/completions" \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o-mini",
    "messages": [{"role": "user", "content": "Tell me a joke."}],
    "stream": true
  }'
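If you read the stream without an SDK, each event has to be decoded by hand. The sketch below assumes the stream follows the usual OpenAI chunk layout (lines of the form data: {json}, text in choices[].delta.content, and a final data: [DONE] sentinel). The function name iter_content_deltas and the sample lines are hand-written for illustration and are not captured from a real response.

```python
import json


def iter_content_deltas(lines):
    """Yield text fragments from an iterable of decoded SSE lines."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators, comments, and other SSE fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # end-of-stream sentinel in OpenAI-style streams
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            text = (choice.get("delta") or {}).get("content")
            if text:
                yield text


# Hand-written sample lines standing in for a real response body.
sample = [
    'data: {"choices": [{"index": 0, "delta": {"role": "assistant"}}]}',
    "",
    'data: {"choices": [{"index": 0, "delta": {"content": "Why did "}}]}',
    "",
    'data: {"choices": [{"index": 0, "delta": {"content": "the chicken..."}}]}',
    "",
    "data: [DONE]",
]

print("".join(iter_content_deltas(sample)))
```

Because the function accepts any iterable of strings, the same code can be fed lines from an HTTP response object once each line has been decoded to text.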

Tool calling

Describe callable functions in the tools array and set tool_choice to control when the model invokes them. When the response contains tool_calls, parse each call, run the matching business logic, and send the result back in a follow-up message with role tool and the matching tool_call_id.

cURL

curl -X POST "$BASE_URL/v1/chat/completions" \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o-mini",
    "messages": [{"role": "user", "content": "What is the weather in Tokyo?"}],
    "tools": [
      {
        "type": "function",
        "function": {
          "name": "get_weather",
          "description": "Get the current weather for a city.",
          "parameters": {
            "type": "object",
            "properties": {
              "city": {"type": "string", "description": "City name"}
            },
            "required": ["city"]
          }
        }
      }
    ],
    "tool_choice": "auto"
  }'
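The request above only declares the tool; the application still has to act on the reply. The following is a minimal sketch of that step, assuming the reply uses the OpenAI tool_calls layout (an id, a function name, and arguments encoded as a JSON string). The local get_weather implementation, its return values, and the sample assistant message are hypothetical.

```python
import json


def get_weather(city):
    """Hypothetical local implementation; a real one would query a weather service."""
    return {"city": city, "forecast": "sunny", "temp_c": 21}


TOOLS = {"get_weather": get_weather}


def run_tool_calls(assistant_message):
    """Execute each requested tool call and return the follow-up 'tool' messages."""
    follow_ups = []
    for call in assistant_message.get("tool_calls") or []:
        fn = TOOLS[call["function"]["name"]]
        # In the OpenAI format, arguments arrive as a JSON-encoded string.
        args = json.loads(call["function"]["arguments"])
        follow_ups.append({
            "role": "tool",
            "tool_call_id": call["id"],
            "content": json.dumps(fn(**args)),
        })
    return follow_ups


# Hand-written example of choices[0].message when the model decides to call a tool.
assistant_message = {
    "role": "assistant",
    "content": None,
    "tool_calls": [{
        "id": "call_123",
        "type": "function",
        "function": {"name": "get_weather", "arguments": "{\"city\": \"Tokyo\"}"},
    }],
}

history = [{"role": "user", "content": "What is the weather in Tokyo?"}]
history.append(assistant_message)
history.extend(run_tool_calls(assistant_message))
print(history[-1])
```

Sending history as the messages array of a second request lets the model turn the tool result into a natural-language answer.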

JSON mode

Set response_format to {"type": "json_object"} to make the model return a syntactically valid JSON object. JSON mode does not enforce a schema by itself, so describe the expected keys in a detailed system prompt.

cURL

curl -X POST "$BASE_URL/v1/chat/completions" \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o-mini",
    "messages": [
      {
        "role": "system",
        "content": "Return a JSON object with keys: name, capital, population."
      },
      {"role": "user", "content": "Tell me about France."}
    ],
    "response_format": {"type": "json_object"}
  }'
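Since the expected keys are described only in the system prompt, it is worth checking the reply before using it. The sketch below parses the message content and verifies the three keys from the example. The function name parse_json_reply and the sample response (including its values) are hand-written for illustration.

```python
import json

EXPECTED_KEYS = {"name", "capital", "population"}


def parse_json_reply(response):
    """Decode the assistant's JSON content and check that the expected keys exist."""
    content = response["choices"][0]["message"]["content"]
    data = json.loads(content)  # raises json.JSONDecodeError on malformed output
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    missing = EXPECTED_KEYS - data.keys()
    if missing:
        raise ValueError(f"missing keys: {sorted(missing)}")
    return data


# Illustrative response body; the values are placeholders, not real API output.
sample_response = {
    "choices": [{
        "message": {
            "role": "assistant",
            "content": '{"name": "France", "capital": "Paris", "population": 68000000}',
        }
    }]
}

print(parse_json_reply(sample_response)["capital"])
```

A failed check is usually a signal to retry the request or tighten the system prompt rather than to patch the output by hand.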

Fhddos extended fields

Fhddos enriches the standard OpenAI response body with additional billing detail inside usage.prompt_tokens_details. Use these fields to reconcile costs in real time without waiting for a separate billing report.

`usage.prompt_tokens_details`

Field	Description	Supported models
`cached_tokens`	Input tokens that hit the cache	All cache-capable models
`cached_write_tokens`	Total cache write volume (= 5 min + 1 hour)	Claude only
`cached_write_5m_tokens`	Tokens written to the 5-minute cache tier	Claude only
`cached_write_1h_tokens`	Tokens written to the 1-hour cache tier	Claude only

prompt_tokens is the total input token count and includes both cache-hit tokens and cache-write tokens. The cached_write_* fields are omitted when empty (omitempty): models without cache-write support, such as GPT, leave them out of the response entirely.

{
  "usage": {
    "prompt_tokens": 58518,
    "prompt_tokens_details": {
      "cached_tokens": 11944,
      "cached_write_tokens": 46109,
      "cached_write_5m_tokens": 29762,
      "cached_write_1h_tokens": 16347
    }
  }
}
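Because some of these fields may be absent, client code should read them with defaults. The sketch below splits prompt_tokens into cache hits, cache writes, and the remainder. It assumes the remainder is ordinary uncached input, which follows from the statement that prompt_tokens includes both cached categories but is not spelled out in the field table. The function name cache_breakdown is invented for this example.

```python
def cache_breakdown(usage):
    """Split prompt_tokens into cache-hit, cache-write, and remaining tokens."""
    details = usage.get("prompt_tokens_details") or {}
    cached = details.get("cached_tokens", 0)
    write_5m = details.get("cached_write_5m_tokens", 0)
    write_1h = details.get("cached_write_1h_tokens", 0)
    # Fall back to the sum of the two tiers when the total is omitted.
    written = details.get("cached_write_tokens", write_5m + write_1h)
    return {
        "cache_hit": cached,
        "cache_write_5m": write_5m,
        "cache_write_1h": write_1h,
        "cache_write_total": written,
        # Assumption: whatever is left was neither read from nor written to cache.
        "uncached": usage["prompt_tokens"] - cached - written,
    }


# The usage object from the example above.
claude_usage = {
    "prompt_tokens": 58518,
    "prompt_tokens_details": {
        "cached_tokens": 11944,
        "cached_write_tokens": 46109,
        "cached_write_5m_tokens": 29762,
        "cached_write_1h_tokens": 16347,
    },
}

# A made-up response from a model without cache-write support: the fields are absent.
gpt_usage = {"prompt_tokens": 1000, "prompt_tokens_details": {"cached_tokens": 200}}

print(cache_breakdown(claude_usage))
print(cache_breakdown(gpt_usage))
```

For the example response, the two write tiers add up to the reported total (29762 + 16347 = 46109), leaving 465 tokens outside the cache.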

Multiply cached_write_5m_tokens and cached_write_1h_tokens by their respective per-token prices from the Fhddos console to estimate the cache-write cost of a single request instantly.
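The multiplication can be sketched as follows. The two prices are hypothetical placeholders expressed per one million tokens; replace them with the figures shown in the Fhddos console. Decimal is used instead of float so that small per-token amounts do not pick up rounding noise.

```python
from decimal import Decimal

# Hypothetical prices in currency units per one million tokens; substitute the
# real figures from the Fhddos console.
PRICE_PER_MILLION = {
    "cached_write_5m_tokens": Decimal("3.75"),
    "cached_write_1h_tokens": Decimal("6.00"),
}


def cache_write_cost(usage):
    """Estimate the cache-write cost of one request from its usage object."""
    details = usage.get("prompt_tokens_details") or {}
    total = Decimal("0")
    for field, price in PRICE_PER_MILLION.items():
        # Fields omitted from the response count as zero tokens.
        total += details.get(field, 0) * price / Decimal(1_000_000)
    return total


example_usage = {
    "prompt_tokens": 58518,
    "prompt_tokens_details": {
        "cached_tokens": 11944,
        "cached_write_tokens": 46109,
        "cached_write_5m_tokens": 29762,
        "cached_write_1h_tokens": 16347,
    },
}

print(cache_write_cost(example_usage))
```

The estimate covers cache writes only; cache hits and uncached input are billed at their own rates and would need additional price entries.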

Request parameters

model (string, required)

The model ID to use, e.g. gpt-4o-mini, gpt-4o, or any model returned by GET /v1/models.

messages (array, required)

An array of message objects. Each object must include a role (system, user, assistant, or tool) and, in most cases, content. Messages with role tool carry the result of a tool call.

temperature (number)

Sampling temperature between 0 and 2. Higher values produce more varied output; lower values produce more deterministic output. Defaults to 1.

stream (boolean)

Set to true to stream partial message deltas via SSE. Defaults to false.

tools (array)

A list of tool definitions the model may call. Each entry must include type: "function" and a function object with name, description, and parameters.

tool_choice (string | object)

Controls tool invocation. Pass "auto" to let the model decide, "none" to suppress tools, or {"type": "function", "function": {"name": "…"}} to force a specific function.

response_format (object)

Pass {"type": "json_object"} to enable JSON mode, which returns a syntactically valid JSON object. In the OpenAI format, {"type": "json_schema"} additionally requires a json_schema object that defines the schema.
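Some of the constraints listed for these parameters can be checked on the client before a request is sent. The sketch below is one possible pre-flight check, not part of the API. The function name validate_request is invented, and the accepted role set includes tool on the assumption that tool results use the OpenAI tool role.

```python
ALLOWED_ROLES = {"system", "user", "assistant", "tool"}


def validate_request(body):
    """Return a list of problems found in a request body (empty if none)."""
    problems = []
    model = body.get("model")
    if not isinstance(model, str) or not model:
        problems.append("model is required and must be a non-empty string")
    messages = body.get("messages")
    if not isinstance(messages, list) or not messages:
        problems.append("messages is required and must be a non-empty array")
    else:
        for i, msg in enumerate(messages):
            if not isinstance(msg, dict) or msg.get("role") not in ALLOWED_ROLES:
                problems.append(f"messages[{i}] has a missing or unknown role")
    temperature = body.get("temperature", 1)
    is_number = isinstance(temperature, (int, float)) and not isinstance(temperature, bool)
    if not is_number or not 0 <= temperature <= 2:
        problems.append("temperature must be a number between 0 and 2")
    if not isinstance(body.get("stream", False), bool):
        problems.append("stream must be a boolean")
    return problems


good = {
    "model": "gpt-4o-mini",
    "messages": [{"role": "user", "content": "Tell me a joke."}],
    "stream": True,
}
bad = {"messages": [{"role": "narrator", "content": "hi"}], "temperature": 3}

print(validate_request(good))
print(validate_request(bad))
```

Returning a list instead of raising on the first problem lets a caller report every issue at once.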
