The Chat Completions endpoint (POST /v1/chat/completions) is the most widely used interface in the OpenAI ecosystem. Through Fhddos, you can direct the same request to OpenAI, Claude, Gemini, or other supported models without changing your application code. This page covers basic usage, streaming, tool calling, JSON mode, and Fhddos’s extended billing fields.

Endpoint

POST /v1/chat/completions

Basic request

curl -X POST "$BASE_URL/v1/chat/completions" \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o-mini",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "Explain quantum computing in simple terms."}
    ],
    "temperature": 0.7
  }'
Fhddos automatically normalises common field differences across third-party-compatible implementations, so the same request body works correctly when you switch models.
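
The same request can be built from Python. The following is a minimal sketch using only the standard library; the helper names build_chat_request and send are illustrative and not part of any Fhddos SDK, and the base URL and token are assumed to come from your own configuration.

```python
import json
import urllib.request


def build_chat_request(base_url, token, model, messages, **params):
    """Build (but do not send) a POST request for /v1/chat/completions."""
    # Optional parameters such as temperature are merged into the JSON body.
    body = {"model": model, "messages": messages, **params}
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send(request, timeout=60):
    """Send the request and return the decoded JSON response body."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

Passing the result of build_chat_request(base_url, token, "gpt-4o-mini", messages, temperature=0.7) to send() should return a body whose reply text sits at choices[0].message.content, assuming the gateway returns the standard OpenAI response shape.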

Common features

Streaming

Set stream: true in the request body to receive incremental tokens as a server-sent events (SSE) stream. In curl, add the -N flag to disable output buffering. In Python clients such as the OpenAI SDK, pass stream=True and iterate over the chunks. The curl form is:
curl -N -X POST "$BASE_URL/v1/chat/completions" \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o-mini",
    "messages": [{"role": "user", "content": "Tell me a joke."}],
    "stream": true
  }'
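
On the client side, each SSE event has to be decoded into a text fragment. The sketch below assumes the stream follows the OpenAI chunk format (lines of the form data: {json}, text under choices[].delta.content, and a final data: [DONE] sentinel) and that Fhddos passes it through unchanged; iter_content_deltas is a hypothetical helper name.

```python
import json


def iter_content_deltas(lines):
    """Yield text fragments from an iterable of SSE lines (str or bytes)."""
    for raw in lines:
        line = raw.decode("utf-8") if isinstance(raw, bytes) else raw
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and SSE comments
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # end-of-stream sentinel (assumed OpenAI convention)
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            text = choice.get("delta", {}).get("content")
            if text:
                yield text
```

An HTTP response object returned by urllib.request.urlopen can be iterated line by line, so it can be passed to this function directly and each fragment printed as it arrives.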

Tool calling

Describe callable functions in the tools array and set tool_choice to control when the model invokes them. Parse tool_calls from the response to execute your business logic and return the result in a follow-up message.
curl -X POST "$BASE_URL/v1/chat/completions" \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o-mini",
    "messages": [{"role": "user", "content": "What is the weather in Tokyo?"}],
    "tools": [
      {
        "type": "function",
        "function": {
          "name": "get_weather",
          "description": "Get the current weather for a city.",
          "parameters": {
            "type": "object",
            "properties": {
              "city": {"type": "string", "description": "City name"}
            },
            "required": ["city"]
          }
        }
      }
    ],
    "tool_choice": "auto"
  }'
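
The parse, execute, and reply steps described above can be sketched as follows. This assumes the OpenAI response shape, in which each tool call carries an id and a function object whose arguments field is a JSON-encoded string. The get_weather body, LOCAL_TOOLS, and run_tool_calls are hypothetical names for illustration only.

```python
import json


def get_weather(city):
    """Hypothetical local tool; a real one would query a weather service."""
    return {"city": city, "forecast": "unknown"}


# Maps tool names declared in the request to local callables.
LOCAL_TOOLS = {"get_weather": get_weather}


def run_tool_calls(assistant_message):
    """Execute each requested tool and return the messages to append."""
    # The assistant turn containing tool_calls must precede the tool results.
    follow_up = [assistant_message]
    for call in assistant_message.get("tool_calls") or []:
        name = call["function"]["name"]
        # Arguments arrive as a JSON string, not as a parsed object.
        arguments = json.loads(call["function"]["arguments"])
        result = LOCAL_TOOLS[name](**arguments)
        follow_up.append({
            "role": "tool",
            "tool_call_id": call["id"],
            "content": json.dumps(result),
        })
    return follow_up
```

Appending the returned messages to the original messages array and sending a second request lets the model compose its final answer from the tool output.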

JSON mode

Set response_format to {"type": "json_object"} to have the model return valid JSON. Describe the expected keys in the system prompt, because JSON mode guarantees well-formed JSON but not a particular schema. In the OpenAI API, {"type": "json_schema"} is a separate structured-output mode that must also carry a json_schema object containing the schema.
curl -X POST "$BASE_URL/v1/chat/completions" \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o-mini",
    "messages": [
      {
        "role": "system",
        "content": "Return a JSON object with keys: name, capital, population."
      },
      {"role": "user", "content": "Tell me about France."}
    ],
    "response_format": {"type": "json_object"}
  }'
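
The reply still arrives as a string in the message content, so the client has to parse it and check the keys it asked for. A minimal sketch, assuming the standard OpenAI response shape; EXPECTED_KEYS mirrors the system prompt above and parse_json_reply is a hypothetical helper name.

```python
import json

# Keys requested in the system prompt of the example request.
EXPECTED_KEYS = {"name", "capital", "population"}


def parse_json_reply(response):
    """Parse the assistant's JSON content and verify the expected keys exist."""
    content = response["choices"][0]["message"]["content"]
    data = json.loads(content)  # raises json.JSONDecodeError on malformed output
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    missing = EXPECTED_KEYS - data.keys()
    if missing:
        raise ValueError(f"reply is missing keys: {sorted(missing)}")
    return data
```

Checking keys on the client is a deliberate belt-and-braces step: the prompt describes the shape, but nothing in this sketch assumes the model is forced to follow it.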

Fhddos extended fields

Fhddos enriches the standard OpenAI response body with additional billing detail inside usage.prompt_tokens_details. Use these fields to reconcile costs in real time without waiting for a separate billing report.

usage.prompt_tokens_details

| Field | Description | Supported models |
| --- | --- | --- |
| cached_tokens | Input tokens that hit the cache | All cache-capable models |
| cached_write_tokens | Total cache write volume (= 5 min + 1 hour) | Claude only |
| cached_write_5m_tokens | Tokens written to the 5-minute cache tier | Claude only |
| cached_write_1h_tokens | Tokens written to the 1-hour cache tier | Claude only |

prompt_tokens is the total input token count and includes both cache-hit tokens and cache-write tokens. The cached_write_* fields are serialized with omitempty, so models without cache-write support (such as GPT) omit these keys entirely rather than returning 0.
{
  "usage": {
    "prompt_tokens": 58518,
    "prompt_tokens_details": {
      "cached_tokens": 11944,
      "cached_write_tokens": 46109,
      "cached_write_5m_tokens": 29762,
      "cached_write_1h_tokens": 16347
    }
  }
}
Multiply cached_write_5m_tokens and cached_write_1h_tokens by their respective per-token prices from the Fhddos console to estimate the cache-write cost of a single request instantly.
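
The calculation can be sketched as below. The prices are placeholders passed in by the caller, not real Fhddos rates, and are expressed per million tokens for readability (divide by 1,000,000 to get the per-token price). Both function names are hypothetical. Missing cached_write_* keys are treated as zero, which matches the omitempty behaviour described above.

```python
def cache_write_cost(usage, price_5m_per_mtok, price_1h_per_mtok):
    """Estimate the cache-write cost of one request from its usage object."""
    details = usage.get("prompt_tokens_details", {})
    # Keys are absent for models without cache-write support, so default to 0.
    written_5m = details.get("cached_write_5m_tokens", 0)
    written_1h = details.get("cached_write_1h_tokens", 0)
    total = written_5m * price_5m_per_mtok + written_1h * price_1h_per_mtok
    return total / 1_000_000


def uncached_input_tokens(usage):
    """Input tokens that were neither read from nor written to the cache."""
    details = usage.get("prompt_tokens_details", {})
    # prompt_tokens includes cache hits and cache writes, so subtract both.
    return (
        usage["prompt_tokens"]
        - details.get("cached_tokens", 0)
        - details.get("cached_write_tokens", 0)
    )
```

For the sample response above, the two write tiers add up to the reported total (29762 + 16347 = 46109), and uncached_input_tokens leaves 58518 - 11944 - 46109 = 465 tokens billed at the ordinary input rate.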

Request parameters

model (string, required)
The model ID to use, e.g. gpt-4o-mini, gpt-4o, or any model returned by GET /v1/models.

messages (array, required)
An array of message objects. Each object must include role (system, user, assistant, or tool for tool results) and content.

temperature (number)
Sampling temperature between 0 and 2. Higher values produce more varied output; lower values produce more deterministic output. Defaults to 1.

stream (boolean)
Set to true to stream partial message deltas via SSE. Defaults to false.

tools (array)
A list of tool definitions the model may call. Each entry must include type: "function" and a function object with name, description, and parameters.

tool_choice (string | object)
Controls tool invocation. Pass "auto" to let the model decide, "none" to suppress tools, or {"type": "function", "function": {"name": "…"}} to force a specific function.

response_format (object)
Pass {"type": "json_object"} to enable JSON mode. In the OpenAI API, {"type": "json_schema"} selects structured outputs and must be accompanied by a json_schema object that carries the schema.