POST /v1/chat/completions) is the most widely used interface in the OpenAI ecosystem. Through Fhddos, you can direct the same request to OpenAI, Claude, Gemini, or other supported models without changing your application code. This page covers basic usage, streaming, tool calling, JSON mode, and Fhddos’s extended billing fields.
Endpoint
Basic request
Fhddos automatically normalises common field differences across third-party-compatible implementations, so the same request body works correctly when you switch models.
Common features
Streaming
Setstream: true in the request body to receive incremental tokens over a server-sent event (SSE) stream. In curl, add the -N flag to disable buffering. In Python, pass stream=True and iterate over the chunks:
Tool calling
Describe callable functions in thetools array and set tool_choice to control when the model invokes them. Parse tool_calls from the response to execute your business logic and return the result in a follow-up message.
cURL
JSON mode
Setresponse_format to {"type": "json_schema"} to instruct the model to return strictly structured JSON. Combine this with a detailed system prompt that defines the expected schema.
cURL
Fhddos extended fields
Fhddos enriches the standard OpenAI response body with additional billing detail insideusage.prompt_tokens_details. Use these fields to reconcile costs in real time without waiting for a separate billing report.
usage.prompt_tokens_details
| Field | Description | Supported models |
|---|---|---|
cached_tokens | Input tokens that hit the cache | All cache-capable models |
cached_write_tokens | Total cache write volume (= 5 min + 1 hour) | Claude only |
cached_write_5m_tokens | Tokens written to the 5-minute cache tier | Claude only |
cached_write_1h_tokens | Tokens written to the 1-hour cache tier | Claude only |
prompt_tokens is the total input token count and includes both cache-hit tokens and cache-write tokens. The cached_write_* fields use omitempty — models without cache-write support (such as GPT) omit them entirely.
Request parameters
The model ID to use, e.g.
gpt-4o-mini, gpt-4o, or any model returned by GET /v1/models.An array of message objects. Each object must include
role (system, user, or assistant) and content.Sampling temperature between
0 and 2. Higher values produce more creative output; lower values produce more deterministic output. Defaults to 1.Set to
true to stream partial message deltas via SSE. Defaults to false.A list of tool definitions the model may call. Each entry must include
type: "function" and a function object with name, description, and parameters.Controls tool invocation. Pass
"auto" to let the model decide, "none" to suppress tools, or {"type": "function", "function": {"name": "…"}} to force a specific function.Pass
{"type": "json_schema"} to enable structured JSON output mode.