Audio: Speech, Transcription, and Translation API

Fhddos’s /v1/audio endpoints give you three complementary capabilities: convert text to spoken audio with TTS models, transcribe audio files to text with Whisper, and translate foreign-language audio directly into English text. All three endpoints are OpenAI-compatible: point your SDK at https://aiapi.fhddos.com and replace your OpenAI API key with your Fhddos token. SDKs whose base URL includes the API version, such as the official OpenAI SDKs, should use https://aiapi.fhddos.com/v1.

Text-to-Speech — `POST /v1/audio/speech`

Send a text string and receive a binary audio stream. The response is a raw audio file you can save directly to disk or pipe into a player.

Parameters

| Parameter | Type | Required | Description |
|---|---|---|---|
| `model` | string | ✅ | TTS model to use, e.g. `tts-1`, `tts-1-hd`, `gpt-4o-mini-tts`, `gpt-4o-audio-preview` |
| `input` | string | ✅ | The text to synthesise into speech |
| `voice` | string | ✅ | Voice ID; compatible with all available OpenAI voice options (e.g. `alloy`, `echo`, `fable`, `onyx`, `nova`, `shimmer`) |
| `response_format` | string | | Output audio format. Examples: `mp3`, `opus`, `aac`, `pcm`, `wav`, or format strings like `mp3-1-32000-128000` |
| `speed` | number | | Playback speed multiplier from `0.25` to `4.0`. Default `1.0` |
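The rules in this table can be checked client-side before anything is sent. The sketch below is a minimal Python illustration that assumes only what the table states (three required fields, `speed` between 0.25 and 4.0); `build_speech_payload` is a hypothetical helper name, not part of any SDK.

```python
from typing import Optional

# Bounds taken from the parameter table above.
MIN_SPEED, MAX_SPEED = 0.25, 4.0


def build_speech_payload(
    model: str,
    voice: str,
    text: str,
    response_format: Optional[str] = None,
    speed: Optional[float] = None,
) -> dict:
    """Return a JSON-ready body for POST /v1/audio/speech.

    Optional fields are omitted when None so the server applies its own defaults.
    """
    for name, value in (("model", model), ("voice", voice), ("input", text)):
        if not value:
            raise ValueError(f"'{name}' is required")
    payload = {"model": model, "voice": voice, "input": text}
    if response_format is not None:
        payload["response_format"] = response_format
    if speed is not None:
        if not MIN_SPEED <= speed <= MAX_SPEED:
            raise ValueError(f"speed must be between {MIN_SPEED} and {MAX_SPEED}, got {speed}")
        payload["speed"] = speed
    return payload
```

For example, `build_speech_payload("tts-1-hd", "nova", "Hello", speed=0.9)` yields the same body as the high-definition curl example, while `speed=5` is rejected locally instead of by the server.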

Examples

```bash
curl -X POST "https://aiapi.fhddos.com/v1/audio/speech" \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "tts-1",
    "voice": "alloy",
    "input": "Welcome to Fhddos — your unified AI model gateway."
  }' \
  --output speech.mp3
```
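Where curl is not available, the same request can be made with Python's standard library. This is a sketch rather than an official client: `build_speech_request` and `save_speech` are hypothetical helper names, and the 60-second timeout is an arbitrary choice.

```python
import json
import shutil
import urllib.request

BASE_URL = "https://aiapi.fhddos.com"


def build_speech_request(token: str, payload: dict) -> urllib.request.Request:
    """Prepare (but do not send) the same POST as the curl example."""
    return urllib.request.Request(
        f"{BASE_URL}/v1/audio/speech",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def save_speech(token: str, payload: dict, path: str) -> None:
    """Send the request and stream the binary audio body straight to disk."""
    request = build_speech_request(token, payload)
    with urllib.request.urlopen(request, timeout=60) as response, open(path, "wb") as out:
        shutil.copyfileobj(response, out)
```

Usage: `save_speech(os.environ["TOKEN"], {"model": "tts-1", "voice": "alloy", "input": "Hello"}, "speech.mp3")`. One difference from the curl command: `urlopen` raises `urllib.error.HTTPError` on a non-2xx status, so an error body is never written into the audio file, whereas curl without `--fail` saves whatever the server returns.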

High-Definition TTS

For content where audio quality matters, such as podcasts, voice-overs, or customer-facing audio, use `tts-1-hd`:

```bash
curl -X POST "https://aiapi.fhddos.com/v1/audio/speech" \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "tts-1-hd",
    "voice": "nova",
    "input": "This is a high-definition voice-over for a product video.",
    "speed": 0.9
  }' \
  --output voiceover-hd.mp3
```

`tts-1` is optimised for low latency and suits real-time use cases. `tts-1-hd` produces higher-quality audio at higher cost and latency, which makes it better suited to pre-rendered content.

Transcription — `POST /v1/audio/transcriptions`

Upload an audio file and receive a text transcript. Fhddos routes transcription requests to Whisper, OpenAI’s speech recognition model. The endpoint uses multipart/form-data: you upload the file as a form field alongside the other parameters.
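Python's standard library has no multipart/form-data encoder, so a client that avoids third-party packages has to assemble the body itself. The sketch below assumes the field names from the parameter table (`file`, `model`, plus optional fields); `encode_multipart` and `build_transcription_request` are hypothetical helpers, and the generic `application/octet-stream` part type is an assumption you may need to adjust.

```python
import uuid
import urllib.request

BASE_URL = "https://aiapi.fhddos.com"


def encode_multipart(fields, file_field, filename, file_bytes,
                     content_type="application/octet-stream"):
    """Encode text fields plus one file part as multipart/form-data.

    Returns (body, content_type_header). A random boundary makes a clash
    with bytes inside the audio file extremely unlikely.
    """
    boundary = uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        header = f'--{boundary}\r\nContent-Disposition: form-data; name="{name}"\r\n\r\n'
        parts.append(header.encode("utf-8") + str(value).encode("utf-8") + b"\r\n")
    file_header = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{file_field}"; filename="{filename}"\r\n'
        f"Content-Type: {content_type}\r\n\r\n"
    )
    parts.append(file_header.encode("utf-8") + file_bytes + b"\r\n")
    parts.append(f"--{boundary}--\r\n".encode("utf-8"))  # closing boundary
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def build_transcription_request(token, filename, audio_bytes, model="whisper-1", **fields):
    """Prepare (but do not send) a POST /v1/audio/transcriptions request."""
    body, content_type = encode_multipart({"model": model, **fields}, "file", filename, audio_bytes)
    return urllib.request.Request(
        f"{BASE_URL}/v1/audio/transcriptions",
        data=body,
        headers={"Authorization": f"Bearer {token}", "Content-Type": content_type},
        method="POST",
    )
```

To send it, pass the request to `urllib.request.urlopen` and, with the default `json` response format, read the transcript from the `text` key of the decoded body.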

Parameters

| Parameter | Type | Required | Description |
|---|---|---|---|
| `file` | file | ✅ | Audio file to transcribe. Supported formats: `mp3`, `mp4`, `mpeg`, `mpga`, `m4a`, `ogg`, `wav`, `webm` |
| `model` | string | ✅ | Transcription model; use `whisper-1` |
| `language` | string | | ISO-639-1 language code of the audio (e.g. `en`, `fr`, `zh`). Providing this improves accuracy and speed |
| `prompt` | string | | Optional context or domain-specific vocabulary to guide the model. Use it to improve recognition of proper nouns, technical terms, or unusual words |
| `response_format` | string | | Output format: `json` (default), `text`, `srt`, `verbose_json`, or `vtt` |
| `temperature` | number | | Sampling temperature from `0` to `1`. Lower values produce more deterministic output |
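A client can catch mistakes in these fields before uploading a large file. This Python sketch mirrors the table; `transcription_fields` is a hypothetical helper, and its language check only verifies the two-letter shape of an ISO-639-1 code, not that the code actually exists.

```python
import re

# Allowed values copied from the parameter table above.
TRANSCRIPTION_FORMATS = {"json", "text", "srt", "verbose_json", "vtt"}


def transcription_fields(model="whisper-1", language=None, prompt=None,
                         response_format=None, temperature=None):
    """Build the non-file form fields for /v1/audio/transcriptions.

    Optional fields are left out entirely when not provided.
    """
    fields = {"model": model}
    if language is not None:
        # Shape check only: ISO-639-1 codes are two lowercase letters.
        if not re.fullmatch(r"[a-z]{2}", language):
            raise ValueError(f"language must be an ISO-639-1 code such as 'en', got {language!r}")
        fields["language"] = language
    if prompt:
        fields["prompt"] = prompt
    if response_format is not None:
        if response_format not in TRANSCRIPTION_FORMATS:
            raise ValueError(f"unsupported response_format {response_format!r}")
        fields["response_format"] = response_format
    if temperature is not None:
        if not 0 <= temperature <= 1:
            raise ValueError("temperature must be between 0 and 1")
        fields["temperature"] = temperature
    return fields
```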

Examples

```bash
curl -X POST "https://aiapi.fhddos.com/v1/audio/transcriptions" \
  -H "Authorization: Bearer $TOKEN" \
  -F "model=whisper-1" \
  -F "file=@sample.wav"
```

Response Formats

| Format | Use case |
|---|---|
| `json` | Default; returns `{"text": "..."}` |
| `text` | Plain transcript string, no JSON wrapper |
| `srt` | Subtitle file with timestamps; ideal for video captions |
| `vtt` | WebVTT subtitle format for HTML5 video |
| `verbose_json` | Full segment-level detail including timestamps and confidence |
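The format you request decides how the response body must be read: the two JSON formats need parsing, while `text`, `srt`, and `vtt` arrive as the finished document. A minimal Python sketch, assuming a UTF-8 body and that `verbose_json` keeps the top-level `text` key that OpenAI's Whisper API returns; `extract_transcript` is a hypothetical name.

```python
import json

JSON_FORMATS = {"json", "verbose_json"}
RAW_FORMATS = {"text", "srt", "vtt"}


def extract_transcript(body: bytes, response_format: str = "json") -> str:
    """Return the transcript text (or subtitle document) from a response body."""
    decoded = body.decode("utf-8")
    if response_format in JSON_FORMATS:
        # Both JSON shapes carry the full transcript under the top-level "text" key.
        return json.loads(decoded)["text"]
    if response_format in RAW_FORMATS:
        # These formats come back as the document itself, with no JSON wrapper.
        return decoded
    raise ValueError(f"unknown response_format {response_format!r}")
```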

Best Practices

Improve accuracy with a prompt

Pass a prompt containing domain vocabulary, proper nouns, or a partial transcript. Whisper uses it as context rather than transcribing it literally.

```bash
-F "prompt=Speakers: Alice Chen, Bob Martinez. Topics: Kubernetes, Terraform, CI/CD."
```

Specify the language when you know it

When you know the audio language, pass language explicitly. This skips Whisper’s language-detection step and reduces latency, especially for short clips.

Pre-process large files

For recordings longer than a few minutes, consider compressing them (lower bitrate) or splitting them into chunks before uploading. Smaller uploads finish sooner, are less likely to time out, and are less likely to exceed any upload size limit.
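For WAV input, splitting needs nothing beyond Python's standard `wave` module. The sketch below cuts at fixed intervals, which can split a word across two chunks; production pipelines often cut on silence instead. It handles uncompressed WAV only (compressed formats such as MP3 need an external tool), and `split_wav` is a hypothetical helper.

```python
import io
import wave


def split_wav(data: bytes, chunk_seconds: float) -> list:
    """Split WAV bytes into consecutive WAV files of at most chunk_seconds each."""
    chunks = []
    with wave.open(io.BytesIO(data), "rb") as src:
        params = src.getparams()
        frames_per_chunk = max(1, int(params.framerate * chunk_seconds))
        while True:
            frames = src.readframes(frames_per_chunk)
            if not frames:
                break  # end of the source audio
            buf = io.BytesIO()
            with wave.open(buf, "wb") as dst:
                # Copy channels, sample width and rate; the frame count in the
                # header is corrected automatically when dst is closed.
                dst.setparams(params)
                dst.writeframes(frames)
            chunks.append(buf.getvalue())
    return chunks
```

Each chunk is a standalone WAV file with its own header, so it can be uploaded as the `file` field of a separate request.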

Translation — `POST /v1/audio/translations`

Upload audio in any supported language and receive an English translation as text. The endpoint translates directly from the audio, so you do not need to transcribe it first. Like transcription, this endpoint uses multipart/form-data.

Parameters

| Parameter | Type | Required | Description |
|---|---|---|---|
| `file` | file | ✅ | Audio file to translate. Supports the same formats as transcription |
| `model` | string | ✅ | Translation model; use `whisper-1` |
| `prompt` | string | | Optional English text to guide the translation's style or vocabulary, or to continue a previous audio segment. English text from earlier in the recording, such as a prior translation, can improve fluency |
| `response_format` | string | | Output format: `json` (default), `text`, `srt`, or `vtt` |
| `temperature` | number | | Sampling temperature from `0` to `1` |

Examples

```bash
curl -X POST "https://aiapi.fhddos.com/v1/audio/translations" \
  -H "Authorization: Bearer $TOKEN" \
  -F "model=whisper-1" \
  -F "file=@interview-french.wav"
```

Output Format Guide

| Format | Best for |
|---|---|
| `json` | Programmatic processing; returns `{"text": "..."}` |
| `text` | Simple plain-text output with no wrapper |
| `srt` | Adding translated subtitles in video editing tools |
| `vtt` | HTML5 `<track>` elements in web video players |
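If you already saved `srt` output and later need WebVTT for a `<track>` element, the conversion is mechanical: add a `WEBVTT` header and change the millisecond separator in timestamps from a comma to a dot. A minimal Python sketch; it ignores SRT styling tags and other edge cases, and `srt_to_vtt` is a hypothetical name. Requesting `vtt` directly avoids the step entirely.

```python
import re

# SRT timestamps use a comma before milliseconds (00:00:01,500); WebVTT uses a dot.
_SRT_TIME = re.compile(r"(\d{2}:\d{2}:\d{2}),(\d{3})")


def srt_to_vtt(srt: str) -> str:
    """Convert an SRT subtitle document to WebVTT."""
    lines = srt.replace("\r\n", "\n").strip().split("\n")
    out = ["WEBVTT", ""]  # required header followed by a blank line
    for line in lines:
        if "-->" in line:
            line = _SRT_TIME.sub(r"\1.\2", line)
        # SRT cue numbers are kept; WebVTT treats them as cue identifiers.
        out.append(line)
    return "\n".join(out) + "\n"
```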

Best Practices

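Reuse English text as a prompt

If you already have English text for the audio, such as a transcript or the translation of a preceding segment, pass it as the prompt. Whisper uses it to keep the translation consistent and to handle domain-specific terms more accurately. OpenAI's API reference asks for translation prompts to be in English.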
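When a long recording has been split into chunks, the same idea applies chunk by chunk: pass the tail of the previous chunk's English output as the `prompt` for the next one. In this sketch `send` is a hypothetical callable you supply (for example, one that POSTs a chunk to `/v1/audio/translations` with the given `prompt` and returns the text). The 800-character cap is an arbitrary assumption, kept short because OpenAI's Whisper documentation states that only the final 224 tokens of a prompt are considered.

```python
from typing import Callable, Iterable, List


def translate_chunks(
    chunks: Iterable[bytes],
    send: Callable[[bytes, str], str],
    max_prompt_chars: int = 800,
) -> List[str]:
    """Translate chunks in order, feeding each result forward as the next prompt."""
    results: List[str] = []
    prompt = ""  # the first chunk has no preceding context
    for chunk in chunks:
        text = send(chunk, prompt)
        results.append(text)
        # Keep only the end of the previous output; text[-0:] would return the
        # whole string, so a cap of zero is handled explicitly.
        prompt = text[-max_prompt_chars:] if max_prompt_chars > 0 else ""
    return results
```

Injecting `send` keeps the chaining logic independent of the HTTP client and easy to test with a stub.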

Choose the right output format upfront

Select `srt` or `vtt` if your downstream process expects subtitles. The `text` and `json` outputs carry no timestamps, so producing subtitles later means sending the audio again.

Pre-process large files

Re-encode large files at a lower bitrate or split them into segments before uploading. This reduces upload time and improves stability for long recordings.

Authentication

All /v1/audio endpoints use Bearer token authentication:

```bash
-H "Authorization: Bearer $TOKEN"
```

Replace $TOKEN with your Fhddos API token. You can find it in the API Keys section of the Fhddos console.
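In scripts it is safer to read the token from the environment than to hard-code it. A small Python sketch, assuming the token lives in the `TOKEN` variable used by the curl examples; `auth_headers` is a hypothetical helper.

```python
import os


def auth_headers(env_var: str = "TOKEN") -> dict:
    """Build the Bearer authorization header from an environment variable."""
    token = os.environ.get(env_var, "").strip()
    if not token:
        # Fail early with a clear message instead of sending an unauthenticated request.
        raise RuntimeError(f"Set {env_var} to your Fhddos API token")
    return {"Authorization": f"Bearer {token}"}
```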
