Fhddos proxies Volcengine OpenSpeech text-to-speech (TTS) requests through the /volcark/openspeech/* path. You call the endpoint with your Fhddos API key, and the platform injects the upstream VolcArk TTS credentials automatically, so your Fhddos key never reaches Volcengine’s logs. Request and response bodies are identical to those described in the official Volcengine OpenSpeech documentation.
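The examples on this page assume the following environment variables:
export BASE_URL="https://aiapi.fhddos.com"
export TOKEN="oh-xxxxxxxxxxxxxxxx"
Send these headers with every HTTP request:
Authorization: Bearer <TOKEN>
Content-Type: application/json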
All TTS endpoints require a channel_id query parameter pointing to an enabled VolcArk channel. Your administrator must configure the channel and optionally set the custom_parameter.tts credentials within it.
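The authentication headers and the channel_id parameter can be assembled in one place. The following is a minimal sketch using only the Python standard library; the helper name build_tts_request and its parameters are hypothetical, not part of any Fhddos SDK.

```python
import json
import urllib.request
from urllib.parse import urlencode


def build_tts_request(base_url: str, token: str, path: str,
                      channel_id: int, body: dict) -> urllib.request.Request:
    """Build (but do not send) an authenticated POST for a Fhddos TTS path.

    `path` is one of the /volcark/openspeech/... paths listed on this page.
    """
    # Every TTS endpoint needs channel_id as a query parameter.
    url = f"{base_url.rstrip('/')}{path}?{urlencode({'channel_id': channel_id})}"
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",  # your Fhddos API key
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Passing the returned object to urllib.request.urlopen sends the request; building it separately makes it easy to inspect the URL and headers first.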
Supported Endpoints
HTTP Interfaces
| Version | Mode | Path |
|---|---|---|
| V1 | Non-streaming (full audio at once) | POST /volcark/openspeech/api/v1/tts?channel_id=<channel_id> |
| V3 | HTTP unidirectional streaming | POST /volcark/openspeech/api/v3/tts/unidirectional?channel_id=<channel_id> |
| V3 | Long-text async: submit | POST /volcark/openspeech/api/v3/tts/submit?channel_id=<channel_id> |
| V3 | Long-text async: query | POST /volcark/openspeech/api/v3/tts/query?channel_id=<channel_id> |
WebSocket Interfaces
| Version | Mode | Path |
|---|---|---|
| V1 | Binary unidirectional stream | GET /volcark/openspeech/api/v1/tts/ws_binary?channel_id=<channel_id> |
| V3 | Unidirectional stream | GET /volcark/openspeech/api/v3/tts/unidirectional/stream?channel_id=<channel_id> |
| V3 | Bidirectional stream | GET /volcark/openspeech/api/v3/tts/bidirection?channel_id=<channel_id> |
Credential Injection
Your administrator configures TTS credentials in the VolcArk channel’s custom_parameter.tts field:
{
"tts": {
"v1": {
"token": "<v1_access_token>"
},
"v3": {
"app_id": "<X-Api-App-Id>",
"access_key": "<X-Api-Access-Key>",
"resource_id": "seed-tts-1.1"
}
}
}
- V1 token: If Authorization is absent from your request, Fhddos auto-sets Authorization: Bearer;<token> on the upstream call.
- V3 credentials: If X-Api-App-Id, X-Api-Access-Key, or X-Api-Resource-Id are absent, Fhddos injects them from the channel config.
If you prefer to pass credentials directly in your request headers (e.g. for testing), Fhddos won’t overwrite headers you’ve already set.
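The fill-in-but-never-overwrite rule described above can be modeled in a few lines. This is an illustrative sketch of the documented behavior, not Fhddos's actual implementation; the function name inject_tts_headers is hypothetical, and the mapping from app_id, access_key, and resource_id to header names is inferred from the placeholders in the sample config.

```python
def inject_tts_headers(request_headers: dict, tts_config: dict, version: str) -> dict:
    """Return upstream headers: missing credentials are filled in from the
    channel's custom_parameter.tts config, existing ones are left untouched."""
    headers = dict(request_headers)
    # HTTP header names are case-insensitive, so compare in lowercase.
    present = {name.lower() for name in headers}

    if version == "v1":
        # V1 uses "Bearer;<token>" (with a semicolon), as stated above.
        candidates = {"Authorization": f"Bearer;{tts_config['v1']['token']}"}
    else:
        v3 = tts_config["v3"]
        candidates = {
            "X-Api-App-Id": v3["app_id"],
            "X-Api-Access-Key": v3["access_key"],
            "X-Api-Resource-Id": v3["resource_id"],
        }

    for name, value in candidates.items():
        if name.lower() not in present:  # never overwrite a caller-supplied header
            headers[name] = value
    return headers
```

For example, a V3 request that already carries its own X-Api-Resource-Id keeps that value, while the app ID and access key are taken from the channel config.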
V1 Non-Streaming HTTP
The V1 endpoint synthesizes the full audio in one shot and returns it as a base64-encoded string:
curl -X POST "$BASE_URL/volcark/openspeech/api/v1/tts?channel_id=123" \
-H "Authorization: Bearer $TOKEN" \
-H "Content-Type: application/json" \
-d '{
"app": {
"appid": "appid123",
"token": "any_non_empty_string",
"cluster": "volcano_tts"
},
"user": {
"uid": "uid123"
},
"audio": {
"voice_type": "zh_male_M392_conversation_wvae_bigtts",
"encoding": "mp3",
"speed_ratio": 1.0
},
"request": {
"reqid": "550e8400-e29b-41d4-a716-446655440000",
"text": "Hello from Volcengine TTS",
"operation": "query"
}
}'
The response body follows the official Volcengine format, containing code, message, data (base64 audio), sequence, and addition.
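Because the audio arrives as base64 text in the data field, it has to be decoded before it can be written to a file. A minimal sketch, assuming the response has already been parsed into a dictionary; the helper name decode_v1_audio is hypothetical.

```python
import base64


def decode_v1_audio(response_body: dict) -> bytes:
    """Decode the base64 `data` field of a parsed V1 response into raw audio bytes."""
    data = response_body.get("data")
    if not data:
        # Surface the upstream code and message so the failure is debuggable.
        raise ValueError(
            f"no audio in response: code={response_body.get('code')} "
            f"message={response_body.get('message')}"
        )
    return base64.b64decode(data)
```

The returned bytes can be written directly to a file whose extension matches the requested encoding, for example output.mp3.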
V3 HTTP Unidirectional Streaming
The V3 streaming endpoint delivers audio in multiple JSON chunks over an HTTP stream. Each chunk contains a base64-encoded audio segment:
curl -N "$BASE_URL/volcark/openspeech/api/v3/tts/unidirectional?channel_id=123" \
-H "Authorization: Bearer $TOKEN" \
-H "Content-Type: application/json" \
-H "X-Control-Require-Usage-Tokens-Return: text_words" \
-d '{
"user": {
"uid": "12345"
},
"req_params": {
"text": "Welcome to Fhddos, your unified AI model gateway.",
"speaker": "zh_female_shuangkuaisisi_moon_bigtts",
"audio_params": {
"format": "mp3",
"sample_rate": 24000
}
}
}'
Set X-Control-Require-Usage-Tokens-Return: text_words to receive a usage field in the final chunk that shows the billable character count. Fhddos passes X-Tt-Logid through the response headers to help with debugging.
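Reassembling the stream means decoding each chunk and concatenating the results. The sketch below makes three assumptions that you should verify against the Volcengine documentation: each chunk is a JSON object on its own line, the audio segment is in a data field, and the usage information appears under a top-level usage key. The function name collect_stream_audio is hypothetical.

```python
import base64
import json
from typing import Iterable, Optional, Tuple


def collect_stream_audio(lines: Iterable[str]) -> Tuple[bytes, Optional[dict]]:
    """Concatenate base64 audio segments from streamed JSON chunks.

    Returns the combined audio bytes and the usage object, if one was sent.
    """
    audio = bytearray()
    usage = None
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank keep-alive or separator lines
        chunk = json.loads(line)
        if chunk.get("data"):  # assumed field name for the audio segment
            audio += base64.b64decode(chunk["data"])
        if chunk.get("usage") is not None:  # assumed location of the usage field
            usage = chunk["usage"]
    return bytes(audio), usage
```

Each segment is decoded on its own before concatenation, since joining the base64 strings first would break on padding characters.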
V3 Long-Text Async Tasks
For long texts, use the two-step submit/query flow.
Step 1: Submit
curl -X POST "$BASE_URL/volcark/openspeech/api/v3/tts/submit?channel_id=123" \
-H "Authorization: Bearer $TOKEN" \
-H "Content-Type: application/json" \
-d '{
"user": {"uid": "12345"},
"unique_id": "5dad8cff-aa5d-496d-a83e-e9c902f4d460",
"req_params": {
"text": "This is a longer text that will be synthesized asynchronously by Volcengine TTS.",
"speaker": "zh_male_bvlazysheep",
"audio_params": {
"format": "mp3",
"sample_rate": 24000
}
}
}'
Response:
{
"code": 20000000,
"data": {
"req_text_length": 11,
"task_id": "e7438a29-ed47-4ef8-98a6-0a10a503d8b0",
"task_status": 1
},
"message": "ok"
}
Step 2: Query
Poll using the task_id returned from submit:
curl -X POST "$BASE_URL/volcark/openspeech/api/v3/tts/query?channel_id=123" \
-H "Authorization: Bearer $TOKEN" \
-H "Content-Type: application/json" \
-d '{
"task_id": "e7438a29-ed47-4ef8-98a6-0a10a503d8b0"
}'
When complete, the response includes:
| Field | Description |
|---|---|
| audio_url | Time-limited signed URL to download the synthesized audio file |
| sentences | Sentence-level and character-level timestamps |
| req_text_length | Original input character count |
| synthesize_text_length | Actual synthesized character count |
| task_status | 1 = Running, 2 = Success, 3 = Failure |
Fhddos returns all of these fields unmodified. Parse them as described in the official Volcengine documentation.
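The query step is typically wrapped in a polling loop driven by task_status. The following sketch assumes the query response nests its fields under data, as the submit response does; the names wait_for_task and query are hypothetical, and query stands for any callable that POSTs the task_id to the query endpoint and returns the parsed JSON.

```python
import time
from typing import Callable

RUNNING, SUCCESS, FAILURE = 1, 2, 3  # task_status values from the table above


def wait_for_task(query: Callable[[str], dict], task_id: str,
                  interval: float = 2.0, max_attempts: int = 30) -> dict:
    """Poll until the task succeeds and return its `data` object."""
    for _ in range(max_attempts):
        data = query(task_id).get("data") or {}
        status = data.get("task_status")
        if status == SUCCESS:
            return data  # expected to contain audio_url, sentences, and so on
        if status == FAILURE:
            raise RuntimeError(f"TTS task {task_id} failed")
        time.sleep(interval)  # still running (or status unknown): wait and retry
    raise TimeoutError(f"TTS task {task_id} did not finish after {max_attempts} polls")
```

Because audio_url is time-limited, download the file soon after the task reports success rather than storing the URL for later use.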
WebSocket Transparent Proxy
For WebSocket-based TTS (V1 binary or V3 streaming), Fhddos operates as a byte-level transparent proxy:
- At connection time, Fhddos uses channel_id to select the VolcArk channel and injects TTS auth headers.
- During the session, all WebSocket frames are forwarded bidirectionally without parsing or modification.
- If either side disconnects, Fhddos closes the other connection immediately.
To migrate existing Volcengine WebSocket code to Fhddos, change only the URL: replace the host, prefix the path with /volcark/openspeech, and append the channel_id query parameter:
V1 Binary
# Before (direct Volcengine)
wss://openspeech.bytedance.com/api/v1/tts/ws_binary
# After (via Fhddos)
wss://aiapi.fhddos.com/volcark/openspeech/api/v1/tts/ws_binary?channel_id=<channel_id>
V3 Unidirectional
# Before (direct Volcengine)
wss://openspeech.bytedance.com/api/v3/tts/unidirectional/stream
# After (via Fhddos)
wss://aiapi.fhddos.com/volcark/openspeech/api/v3/tts/unidirectional/stream?channel_id=<channel_id>
V3 Bidirectional
# Before (direct Volcengine)
wss://openspeech.bytedance.com/api/v3/tts/bidirection
# After (via Fhddos)
wss://aiapi.fhddos.com/volcark/openspeech/api/v3/tts/bidirection?channel_id=<channel_id>
All handshake headers, binary protocol framing (Protocol version, Message type, event, payload), and audio content remain unchanged.
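The URL rewrite shown in the before/after pairs is mechanical, so it can be done in code if your client reads the endpoint from configuration. A minimal sketch using the standard library; the function name to_fhddos_ws_url is hypothetical.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

FHDDOS_HOST = "aiapi.fhddos.com"
PATH_PREFIX = "/volcark/openspeech"


def to_fhddos_ws_url(volc_url: str, channel_id: int) -> str:
    """Rewrite a direct Volcengine WebSocket URL to its Fhddos equivalent."""
    parts = urlsplit(volc_url)
    # Keep any existing query parameters and append channel_id.
    query = parse_qsl(parts.query, keep_blank_values=True)
    query.append(("channel_id", str(channel_id)))
    return urlunsplit((
        parts.scheme,              # wss stays wss
        FHDDOS_HOST,               # swap the host
        PATH_PREFIX + parts.path,  # prefix the original path
        urlencode(query),
        "",
    ))
```

Calling it with wss://openspeech.bytedance.com/api/v3/tts/bidirection and channel 123 yields the "After" URL shown above with channel_id=123.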
Voice and Audio Configuration
Configure the voice and output format in your request body. In V1, these settings go in the audio object. In V3, set the voice in req_params.speaker and the output format in req_params.audio_params:
| Parameter | Description | Example Values |
|---|---|---|
| voice_type / speaker | Voice ID from the Volcengine voice library | zh_female_shuangkuaisisi_moon_bigtts |
| encoding / format | Audio codec | mp3, pcm, ogg_opus |
| speed_ratio | Playback speed multiplier | 0.5 – 2.0, default 1.0 |
| sample_rate | Output sample rate (Hz) | 8000, 16000, 24000 |
Refer to the Volcengine OpenSpeech documentation for the full list of available voice IDs and parameter constraints.
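To make the V1 and V3 field names easier to compare, the sketch below builds both request bodies from the same kind of inputs. The field layout is copied from the curl examples on this page; the function names v1_body and v3_body, their default values, and the choice to generate a fresh UUID for reqid on every call are assumptions.

```python
import uuid


def v1_body(text: str, voice_type: str, encoding: str = "mp3",
            speed_ratio: float = 1.0, appid: str = "appid123",
            uid: str = "uid123") -> dict:
    """V1 request body: voice settings live in the `audio` object."""
    return {
        "app": {"appid": appid, "token": "any_non_empty_string", "cluster": "volcano_tts"},
        "user": {"uid": uid},
        "audio": {"voice_type": voice_type, "encoding": encoding, "speed_ratio": speed_ratio},
        # Assumption: a new request ID per call, in the UUID form the example uses.
        "request": {"reqid": str(uuid.uuid4()), "text": text, "operation": "query"},
    }


def v3_body(text: str, speaker: str, audio_format: str = "mp3",
            sample_rate: int = 24000, uid: str = "12345") -> dict:
    """V3 request body: voice in `speaker`, output format in `audio_params`."""
    return {
        "user": {"uid": uid},
        "req_params": {
            "text": text,
            "speaker": speaker,
            "audio_params": {"format": audio_format, "sample_rate": sample_rate},
        },
    }
```

Either dictionary can be serialized with json.dumps and sent as the request body to the matching endpoint.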
Security Note
When your request reaches Fhddos, Authorization: Bearer oh-xxxxxxxx is used to authenticate you with Fhddos. Before forwarding to Volcengine, Fhddos strips this header entirely so your Fhddos key never appears in Volcengine’s access logs.