Text-to-Speech — POST /v1/audio/speech
Send a text string and receive a binary audio stream. The response is a raw audio file you can save directly to disk or pipe into a player.

Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
| model | string | ✅ | TTS model to use, e.g. tts-1, tts-1-hd, gpt-4o-mini-tts, gpt-4o-audio-preview |
| input | string | ✅ | The text to synthesise into speech |
| voice | string | ✅ | Voice ID. Compatible with all available OpenAI voice options (e.g. alloy, echo, fable, onyx, nova, shimmer) |
| response_format | string | | Output audio format. Examples: mp3, opus, aac, pcm, wav, or format strings like mp3-1-32000-128000 |
| speed | number | | Playback speed multiplier from 0.25 to 4.0. Default: 1.0 |
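
The optional parameters can be combined with the required ones in a single request body. The following is a minimal sketch of one way to do that from a shell script. The helper names speed_in_range and build_speech_body are hypothetical and not part of the API, and the body builder assumes its arguments contain no double quotes or backslashes, since it performs no JSON escaping.

```shell
# Hypothetical helper: succeed only if the value lies in the documented 0.25-4.0 range.
speed_in_range() {
  awk -v s="$1" 'BEGIN { exit !(s + 0 >= 0.25 && s + 0 <= 4.0) }'
}

# Hypothetical helper: print a JSON request body for /v1/audio/speech.
# Arguments: model, voice, response_format, speed, input text.
# Assumption: no argument contains characters that need JSON escaping.
build_speech_body() {
  local model="$1" voice="$2" format="$3" speed="$4" input="$5"
  speed_in_range "$speed" || { echo "speed must be between 0.25 and 4.0" >&2; return 1; }
  printf '{"model":"%s","voice":"%s","response_format":"%s","speed":%s,"input":"%s"}' \
    "$model" "$voice" "$format" "$speed" "$input"
}

# Possible usage (not executed here): pipe the body into curl via -d @-
#   build_speech_body tts-1 alloy wav 1.25 "Hello from the gateway." |
#     curl -X POST "https://aiapi.fhddos.com/v1/audio/speech" \
#       -H "Authorization: Bearer $TOKEN" \
#       -H "Content-Type: application/json" \
#       -d @- --output speech.wav
```

Validating speed locally avoids a round trip for a value the endpoint is documented to reject; for arbitrary input text, a JSON-aware tool would be a safer way to build the body.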
Examples
curl -X POST "https://aiapi.fhddos.com/v1/audio/speech" \
-H "Authorization: Bearer $TOKEN" \
-H "Content-Type: application/json" \
-d '{
"model": "tts-1",
"voice": "alloy",
"input": "Welcome to Fhddos — your unified AI model gateway."
}' \
--output speech.mp3
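
With --output alone, curl writes whatever the server returns into the target file, so a failed request can leave an error message saved under an audio file name. The sketch below shows one way to guard against that using curl's --fail flag and a temporary file. The function name save_speech is hypothetical, and the sketch assumes the gateway reports errors with an HTTP status of 400 or above.

```shell
# Hypothetical wrapper: POST a JSON body and keep the output file only on success.
# Arguments: JSON request body, output path. Expects $TOKEN to be set.
save_speech() {
  local body="$1" out="$2" tmp
  tmp="$(mktemp)" || return 1
  # --fail makes curl exit non-zero on HTTP status >= 400
  # instead of saving the error response as if it were audio.
  if curl --fail --silent --show-error -X POST "https://aiapi.fhddos.com/v1/audio/speech" \
       -H "Authorization: Bearer $TOKEN" \
       -H "Content-Type: application/json" \
       -d "$body" --output "$tmp"; then
    mv "$tmp" "$out"
  else
    rm -f "$tmp"
    return 1
  fi
}

# Possible usage (not executed here):
#   save_speech '{"model":"tts-1","voice":"alloy","input":"Hello"}' speech.mp3
```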
High-Definition TTS
For content where audio quality matters, such as podcasts, voice-overs, or customer-facing audio, use tts-1-hd:

curl -X POST "https://aiapi.fhddos.com/v1/audio/speech" \
-H "Authorization: Bearer $TOKEN" \
-H "Content-Type: application/json" \
-d '{
"model": "tts-1-hd",
"voice": "nova",
"input": "This is a high-definition voice-over for a product video.",
"speed": 0.9
}' \
--output voiceover-hd.mp3
tts-1 is optimised for low latency and suits real-time use cases. tts-1-hd produces higher-quality audio at slightly higher cost and latency, making it better suited for pre-rendered content.
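
In a script that serves both kinds of workload, the model choice described above can be made explicit. This is only an illustrative sketch: pick_tts_model and the use-case labels realtime and prerendered are hypothetical names chosen for this example.

```shell
# Hypothetical helper: map a use case to the model recommended above.
pick_tts_model() {
  case "$1" in
    realtime)    echo "tts-1" ;;     # low latency
    prerendered) echo "tts-1-hd" ;;  # higher quality, higher cost and latency
    *) echo "unknown use case: $1" >&2; return 1 ;;
  esac
}

# Possible usage:
#   model="$(pick_tts_model prerendered)"
```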