Volcengine OpenSpeech TTS via Fhddos VolcArk Gateway

Fhddos proxies Volcengine OpenSpeech text-to-speech (TTS) requests through the /volcark/openspeech/* path. You call the endpoint with your Fhddos API key, and the platform injects the upstream VolcArk TTS credentials automatically; your Fhddos key never reaches Volcengine’s logs. Request and response bodies are identical to those in the official Volcengine OpenSpeech documentation.

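The examples on this page assume the following environment variables:

export BASE_URL="https://aiapi.fhddos.com"
export TOKEN="oh-xxxxxxxxxxxxxxxx"

Each HTTP request carries these headers:

Authorization: Bearer <TOKEN>
Content-Type: application/json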

All TTS endpoints require a channel_id query parameter pointing to an enabled VolcArk channel. Your administrator must configure the channel and can optionally store TTS credentials in its custom_parameter.tts field.

Supported Endpoints

HTTP Interfaces

Version	Mode	Path
V1	Non-streaming (full audio at once)	`POST /volcark/openspeech/api/v1/tts?channel_id=<channel_id>`
V3	HTTP unidirectional streaming	`POST /volcark/openspeech/api/v3/tts/unidirectional?channel_id=<channel_id>`
V3	Long-text async: submit	`POST /volcark/openspeech/api/v3/tts/submit?channel_id=<channel_id>`
V3	Long-text async: query	`POST /volcark/openspeech/api/v3/tts/query?channel_id=<channel_id>`

WebSocket Interfaces

Version	Mode	Path
V1	Binary unidirectional stream	`GET /volcark/openspeech/api/v1/tts/ws_binary?channel_id=<channel_id>`
V3	Unidirectional stream	`GET /volcark/openspeech/api/v3/tts/unidirectional/stream?channel_id=<channel_id>`
V3	Bidirectional stream	`GET /volcark/openspeech/api/v3/tts/bidirection?channel_id=<channel_id>`
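Because every endpoint shares the same prefix and requires channel_id, it can help to build the URLs in one place. The following is a minimal sketch, not part of any Fhddos SDK: the short endpoint keys and the `tts_url` helper are hypothetical names, while the paths are copied from the tables above.

```python
from urllib.parse import urlencode

HOST = "aiapi.fhddos.com"
PREFIX = "/volcark/openspeech"

# (scheme, upstream path) per endpoint; the dictionary keys are hypothetical labels.
TTS_ENDPOINTS = {
    "v1_http": ("https", "/api/v1/tts"),
    "v3_http_stream": ("https", "/api/v3/tts/unidirectional"),
    "v3_submit": ("https", "/api/v3/tts/submit"),
    "v3_query": ("https", "/api/v3/tts/query"),
    "v1_ws_binary": ("wss", "/api/v1/tts/ws_binary"),
    "v3_ws_unidirectional": ("wss", "/api/v3/tts/unidirectional/stream"),
    "v3_ws_bidirectional": ("wss", "/api/v3/tts/bidirection"),
}


def tts_url(endpoint: str, channel_id: int, host: str = HOST) -> str:
    """Return the full gateway URL for one TTS endpoint, including channel_id."""
    scheme, path = TTS_ENDPOINTS[endpoint]
    query = urlencode({"channel_id": channel_id})
    return f"{scheme}://{host}{PREFIX}{path}?{query}"
```

For example, `tts_url("v3_submit", 123)` yields the same URL used in the long-text submit request further down this page.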

Credential Injection

Your administrator configures TTS credentials in the VolcArk channel’s custom_parameter.tts field:

{
  "tts": {
    "v1": {
      "token": "<v1_access_token>"
    },
    "v3": {
      "app_id": "<X-Api-App-Id>",
      "access_key": "<X-Api-Access-Key>",
      "resource_id": "seed-tts-1.1"
    }
  }
}

V1 token: If Authorization is absent from your request, Fhddos auto-sets Authorization: Bearer;<token> on the upstream call.
V3 credentials: If any of X-Api-App-Id, X-Api-Access-Key, or X-Api-Resource-Id is absent, Fhddos injects the missing value from the channel config.

If you prefer to pass credentials directly in your request headers (e.g. for testing), Fhddos won’t overwrite headers you’ve already set.
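The "fill in only what is missing" rule can be pictured with a short sketch. This is an illustration of the behavior described above, not Fhddos's actual implementation; the function name `inject_tts_credentials` is hypothetical, and the mapping from config keys to header names is inferred from the placeholders in the sample configuration.

```python
def inject_tts_credentials(headers: dict, tts_config: dict, version: str) -> dict:
    """Return upstream headers, adding channel credentials only where absent."""
    upstream = dict(headers)
    # HTTP header names are case-insensitive, so compare in lower case.
    present = {name.lower() for name in upstream}

    if version == "v1":
        # Note the semicolon: the V1 upstream format is "Bearer;<token>".
        defaults = {"Authorization": f"Bearer;{tts_config['v1']['token']}"}
    elif version == "v3":
        v3 = tts_config["v3"]
        defaults = {
            "X-Api-App-Id": v3["app_id"],
            "X-Api-Access-Key": v3["access_key"],
            "X-Api-Resource-Id": v3["resource_id"],
        }
    else:
        raise ValueError(f"unknown TTS version: {version!r}")

    for name, value in defaults.items():
        if name.lower() not in present:  # never overwrite caller-supplied headers
            upstream[name] = value
    return upstream
```

Here `tts_config` stands for the object stored under the "tts" key of custom_parameter.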

V1 Non-Streaming HTTP

The V1 endpoint synthesizes the full audio in one shot and returns it as a base64-encoded string:

curl -X POST "$BASE_URL/volcark/openspeech/api/v1/tts?channel_id=123" \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "app": {
      "appid": "appid123",
      "token": "any_non_empty_string",
      "cluster": "volcano_tts"
    },
    "user": {
      "uid": "uid123"
    },
    "audio": {
      "voice_type": "zh_male_M392_conversation_wvae_bigtts",
      "encoding": "mp3",
      "speed_ratio": 1.0
    },
    "request": {
      "reqid": "550e8400-e29b-41d4-a716-446655440000",
      "text": "Hello from Volcengine TTS",
      "operation": "query"
    }
  }'

The response body follows the official Volcengine format, containing code, message, data (base64 audio), sequence, and addition.
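To turn the response into a playable file, decode the data field. A minimal sketch, assuming the whole response body has already been read into a string; `decode_v1_audio` is a hypothetical helper name, and the check deliberately relies only on whether data is present rather than on any specific code value.

```python
import base64
import json


def decode_v1_audio(response_body: str) -> bytes:
    """Extract and decode the base64 audio from a V1 TTS response body."""
    payload = json.loads(response_body)
    data = payload.get("data")
    if not data:
        # Surface the upstream code and message to make failures easier to debug.
        raise ValueError(
            f"no audio in response: code={payload.get('code')} "
            f"message={payload.get('message')}"
        )
    return base64.b64decode(data)
```

The returned bytes can be written to a file whose extension matches the encoding you requested, for example `open("out.mp3", "wb").write(decode_v1_audio(body))`.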

V3 HTTP Unidirectional Streaming

The V3 streaming endpoint delivers audio in multiple JSON chunks over an HTTP stream. Each chunk contains a base64-encoded audio segment:

curl -N "$BASE_URL/volcark/openspeech/api/v3/tts/unidirectional?channel_id=123" \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -H "X-Control-Require-Usage-Tokens-Return: text_words" \
  -d '{
    "user": {
      "uid": "12345"
    },
    "req_params": {
      "text": "Welcome to Fhddos, your unified AI model gateway.",
      "speaker": "zh_female_shuangkuaisisi_moon_bigtts",
      "audio_params": {
        "format": "mp3",
        "sample_rate": 24000
      }
    }
  }'

Set X-Control-Require-Usage-Tokens-Return: text_words to receive a usage field in the final chunk that shows the billable character count. Fhddos passes X-Tt-Logid through the response headers to help with debugging.
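On the client side, the chunks need to be decoded and concatenated. The sketch below rests on assumptions that this page does not confirm: that each chunk arrives as one JSON object per line, that the audio segment sits in a data field, and that the usage information sits in a usage field. Check the Volcengine documentation for the exact chunk layout before relying on it; `collect_stream` is a hypothetical helper name.

```python
import base64
import json
from typing import Iterable, Optional, Tuple


def collect_stream(lines: Iterable[bytes]) -> Tuple[bytes, Optional[dict]]:
    """Concatenate audio from streamed JSON chunks and return (audio, usage)."""
    audio = bytearray()
    usage = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank keep-alive or separator lines
        chunk = json.loads(line)
        if chunk.get("data"):  # assumed field name for the audio segment
            audio += base64.b64decode(chunk["data"])
        if "usage" in chunk:  # assumed to appear only in the final chunk
            usage = chunk["usage"]
    return bytes(audio), usage
```

With the standard library, the response object returned by `urllib.request.urlopen` iterates line by line, so it can be passed as `lines` directly if the line-delimited assumption holds.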

V3 Long-Text Async Tasks

For long texts, use the two-step submit/query flow.

Step 1: Submit

curl -X POST "$BASE_URL/volcark/openspeech/api/v3/tts/submit?channel_id=123" \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "user": {"uid": "12345"},
    "unique_id": "5dad8cff-aa5d-496d-a83e-e9c902f4d460",
    "req_params": {
      "text": "This is a longer text that will be synthesized asynchronously by Volcengine TTS.",
      "speaker": "zh_male_bvlazysheep",
      "audio_params": {
        "format": "mp3",
        "sample_rate": 24000
      }
    }
  }'

Response:

{
  "code": 20000000,
  "data": {
    "req_text_length": 11,
    "task_id": "e7438a29-ed47-4ef8-98a6-0a10a503d8b0",
    "task_status": 1
  },
  "message": "ok"
}

Step 2: Query

Poll using the task_id returned from submit:

curl -X POST "$BASE_URL/volcark/openspeech/api/v3/tts/query?channel_id=123" \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "task_id": "e7438a29-ed47-4ef8-98a6-0a10a503d8b0"
  }'

When complete, the response includes:

Field	Description
`audio_url`	Time-limited signed URL to download the synthesized audio file
`sentences`	Sentence-level and character-level timestamps
`req_text_length`	Original input character count
`synthesize_text_length`	Actual synthesized character count
`task_status`	`1` = Running, `2` = Success, `3` = Failure

Fhddos does not modify any of these fields. Parse them as described in the official Volcengine documentation.
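The query step is usually wrapped in a polling loop that stops on task_status 2 or 3. A minimal sketch under stated assumptions: the query response nests its fields under data as the submit response does, the `query` callable is a hypothetical function you supply that performs the POST and returns the parsed JSON, and the interval and timeout values are arbitrary.

```python
import time
from typing import Callable

RUNNING, SUCCESS, FAILURE = 1, 2, 3  # task_status values from the table above


def wait_for_task(
    query: Callable[[str], dict],
    task_id: str,
    interval: float = 2.0,
    timeout: float = 300.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Poll until the task succeeds or fails; return the final data object."""
    deadline = time.monotonic() + timeout
    while True:
        data = query(task_id)["data"]  # assumes the same envelope as submit
        status = data["task_status"]
        if status == SUCCESS:
            return data
        if status == FAILURE:
            raise RuntimeError(f"TTS task {task_id} failed")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"TTS task {task_id} still running after {timeout}s")
        sleep(interval)
```

On success, the returned object carries audio_url, which should be downloaded promptly because the signed URL is time-limited.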

WebSocket Transparent Proxy

For WebSocket-based TTS (V1 binary or V3 streaming), Fhddos operates as a byte-level transparent proxy:

At connection time, Fhddos uses channel_id to select the VolcArk channel and injects TTS auth headers.
During the session, all WebSocket frames are forwarded bidirectionally without parsing or modification.
If either side disconnects, Fhddos closes the other connection immediately.
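The forwarding behavior described above can be pictured as two copy loops, one per direction, that are torn down together. This is a conceptual sketch only, not Fhddos's implementation: `client` and `upstream` stand for hypothetical connection objects that support async iteration over incoming frames and expose `send()` and `close()` coroutines.

```python
import asyncio


async def pump(source, sink):
    """Forward every frame from source to sink without inspecting it."""
    async for frame in source:
        await sink.send(frame)


async def relay(client, upstream):
    """Relay frames both ways until either side stops, then close both."""
    tasks = [
        asyncio.create_task(pump(client, upstream)),
        asyncio.create_task(pump(upstream, client)),
    ]
    # Stop as soon as one direction ends (peer disconnected or errored).
    await asyncio.wait(tasks, return_when=asyncio.FIRST_COMPLETED)
    for task in tasks:
        task.cancel()  # no-op for a task that has already finished
    await asyncio.gather(*tasks, return_exceptions=True)
    await client.close()
    await upstream.close()
```

Because the loops never parse frames, the binary TTS protocol passes through unchanged, which is why existing client code keeps working.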

To migrate existing Volcengine WebSocket code to Fhddos, change only the URL: replace the host, prefix the path with /volcark/openspeech, and append the channel_id query parameter:

V1 Binary

# Before (direct Volcengine)
wss://openspeech.bytedance.com/api/v1/tts/ws_binary

# After (via Fhddos)
wss://aiapi.fhddos.com/volcark/openspeech/api/v1/tts/ws_binary?channel_id=<channel_id>

V3 Unidirectional

# Before (direct Volcengine)
wss://openspeech.bytedance.com/api/v3/tts/unidirectional/stream

# After (via Fhddos)
wss://aiapi.fhddos.com/volcark/openspeech/api/v3/tts/unidirectional/stream?channel_id=<channel_id>

V3 Bidirectional

# Before (direct Volcengine)
wss://openspeech.bytedance.com/api/v3/tts/bidirection

# After (via Fhddos)
wss://aiapi.fhddos.com/volcark/openspeech/api/v3/tts/bidirection?channel_id=<channel_id>

All handshake headers, the binary protocol framing (protocol version, message type, event, payload), and the audio content remain unchanged.
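If your code stores the Volcengine URLs as configuration, the rewrite can be automated. A minimal sketch using only the standard library; `to_gateway_url` is a hypothetical helper name, and the host and prefix are taken from the examples above.

```python
from urllib.parse import urlencode, urlsplit, urlunsplit

GATEWAY_HOST = "aiapi.fhddos.com"
GATEWAY_PREFIX = "/volcark/openspeech"


def to_gateway_url(volcengine_url: str, channel_id: int) -> str:
    """Rewrite a direct Volcengine TTS URL so that it goes through Fhddos."""
    parts = urlsplit(volcengine_url)
    query = urlencode({"channel_id": channel_id})
    if parts.query:
        # Keep any query parameters the original URL already carried.
        query = f"{parts.query}&{query}"
    return urlunsplit(
        (parts.scheme, GATEWAY_HOST, GATEWAY_PREFIX + parts.path, query, "")
    )
```

The scheme is preserved, so the same helper works for the wss:// WebSocket URLs and for https:// URLs.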

Voice and Audio Configuration

Configure the voice and output format in the audio (V1) or req_params.audio_params (V3) fields of your request:

Parameter	Description	Example Values
`voice_type` / `speaker`	Voice ID from the Volcengine voice library	`zh_female_shuangkuaisisi_moon_bigtts`
`encoding` / `format`	Audio codec	`mp3`, `pcm`, `ogg_opus`
`speed_ratio`	Playback speed multiplier	`0.5` – `2.0`, default `1.0`
`sample_rate`	Output sample rate (Hz)	`8000`, `16000`, `24000`

Refer to the Volcengine OpenSpeech documentation for the full list of available voice IDs and parameter constraints.
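Since V1 and V3 use different field names for the same settings, small builder functions can keep request code consistent. This is a sketch with hypothetical helper names; it uses only the fields that appear in the request examples on this page, and the speed_ratio range check mirrors the table above rather than any limit confirmed by Volcengine.

```python
def v1_audio(voice_type: str, encoding: str = "mp3", speed_ratio: float = 1.0) -> dict:
    """Build the V1 "audio" object."""
    if not 0.5 <= speed_ratio <= 2.0:  # range taken from the table above
        raise ValueError("speed_ratio must be between 0.5 and 2.0")
    return {"voice_type": voice_type, "encoding": encoding, "speed_ratio": speed_ratio}


def v3_req_params(text: str, speaker: str, fmt: str = "mp3", sample_rate: int = 24000) -> dict:
    """Build the V3 "req_params" object, including nested audio_params."""
    return {
        "text": text,
        "speaker": speaker,
        "audio_params": {"format": fmt, "sample_rate": sample_rate},
    }
```

For example, `v3_req_params("Hello", "zh_female_shuangkuaisisi_moon_bigtts")` produces a req_params object with the same shape as the one in the V3 streaming request.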

Security Note

Fhddos uses the Authorization: Bearer oh-xxxxxxxx header on your request to authenticate you. Before forwarding the request to Volcengine, it strips this header entirely, so your Fhddos key never appears in Volcengine’s access logs.
