POST /v1/tts

Authorizations

X-API-KEY
string
header
required

Body

application/json
voice_id
string
required

The ID of the voice to use for text-to-speech conversion. For the list of available voices, see GET /voices/get-voices.

text
string
required

The text to convert to speech.

output_format
enum<string>
default: raw
Available options: raw, wav, mp3
encoding
enum<string>
default: pcm_f32le
Available options: pcm_f32le, pcm_s16le, mulaw, alaw, mp3
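If output_format is raw, the response body presumably contains headerless samples in the selected encoding. This page does not say so explicitly, nor does it state a channel count, so both (raw = headerless samples, mono audio) are assumptions here. Under those assumptions, here is a minimal Python sketch that wraps pcm_s16le bytes in a WAV container using the standard-library wave module. The function name is hypothetical.

```python
import io
import wave


def pcm_s16le_to_wav(pcm: bytes, sample_rate: int = 16000, channels: int = 1) -> bytes:
    """Wrap headerless 16-bit little-endian PCM in a WAV container.

    Assumption: the raw output is mono; the API reference does not state
    a channel count, so pass channels explicitly if that turns out wrong.
    """
    frame_size = 2 * channels  # 2 bytes per 16-bit sample, per channel
    if len(pcm) % frame_size:
        raise ValueError("PCM length is not a whole number of frames")
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(2)  # 16-bit integer samples
        wav.setframerate(sample_rate)
        # wave expects native byte order; on little-endian hosts the
        # s16le bytes pass through unchanged.
        wav.writeframes(pcm)
    # Closing the writer patches the header but leaves buf open.
    return buf.getvalue()
```

Of the listed encodings, pcm_s16le is the only one that maps directly onto the integer PCM that wave writes. pcm_f32le, mulaw, and alaw would need converting to integer PCM first.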
sample_rate
integer
default: 16000

The sample rate of the audio in hertz (Hz), that is, the number of audio samples per second. The minimum value is 8000 Hz, which is typically used for telephony. The default value is 16000 Hz, which balances quality against file size. The maximum value is 48000 Hz, which is used for high-quality audio recordings.

Required range: 8000 <= x <= 48000
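Together with encoding, the sample rate fixes the uncompressed data rate: bytes per second = sample_rate × bytes per sample × channels. The sketch below assumes mono output, which this page does not state. It treats 8000 Hz and 48000 Hz as valid values because the description above gives them as the minimum and maximum. The helper and constant names are hypothetical.

```python
# Bytes per sample for the fixed-width encodings listed above. The widths
# follow from the formats themselves (32-bit float, 16-bit int, 8-bit G.711);
# mp3 is compressed and has no fixed per-sample size.
BYTES_PER_SAMPLE = {"pcm_f32le": 4, "pcm_s16le": 2, "mulaw": 1, "alaw": 1}

MIN_SAMPLE_RATE = 8000   # Hz, documented minimum
MAX_SAMPLE_RATE = 48000  # Hz, documented maximum


def raw_bytes_per_second(sample_rate: int, encoding: str, channels: int = 1) -> int:
    """Uncompressed data rate in bytes per second.

    Assumption: mono audio by default; the channel count is not documented.
    """
    if not MIN_SAMPLE_RATE <= sample_rate <= MAX_SAMPLE_RATE:
        raise ValueError(
            f"sample_rate must be between {MIN_SAMPLE_RATE} and {MAX_SAMPLE_RATE} Hz"
        )
    if encoding not in BYTES_PER_SAMPLE:
        raise ValueError(f"encoding {encoding!r} has no fixed sample width")
    return sample_rate * BYTES_PER_SAMPLE[encoding] * channels
```

For example, the defaults (16000 Hz, pcm_f32le) give 64000 bytes per second of mono audio. The telephony-style combination of 8000 Hz and mulaw gives 8000 bytes per second.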
language
enum<string>
default: en
Available options: en, fr, de, es, pt, zh, ja, hi, it, ko, nl, pl, ru, sv, tr
voice_speed
number
default: 0

Adjusts the speaking speed. Values range from -1 (slowest) to 1 (fastest); 0, the default, is normal speed.

Required range: -1 <= x <= 1
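Here is a client-side check for voice_speed that can run before a request is sent. It assumes both endpoints are accepted, as the description ("from -1 ... to 1") suggests. The function name is hypothetical.

```python
def validate_voice_speed(speed: float = 0.0) -> float:
    """Return speed as a float if it lies in [-1, 1]: -1 slowest, 0 normal, 1 fastest."""
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(speed, bool) or not isinstance(speed, (int, float)):
        raise TypeError("voice_speed must be a number")
    # This comparison is also False for NaN, so NaN is rejected too.
    if not -1.0 <= speed <= 1.0:
        raise ValueError("voice_speed must be between -1 and 1")
    return float(speed)
```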
voice_emotion
enum<string>[]

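Array of voice emotions to apply. Each value names an emotion (anger, positivity, surprise, sadness, or curiosity), optionally followed by an intensity suffix: :lowest, :low, :high, or :highest. The bare name, such as anger, sits between :low and :high.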

Available options:
anger:lowest, anger:low, anger, anger:high, anger:highest,
positivity:lowest, positivity:low, positivity, positivity:high, positivity:highest,
surprise:lowest, surprise:low, surprise, surprise:high, surprise:highest,
sadness:lowest, sadness:low, sadness, sadness:high, sadness:highest,
curiosity:lowest, curiosity:low, curiosity, curiosity:high, curiosity:highest
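Each emotion value is a name plus an optional intensity suffix, so the 25 valid strings can be generated rather than typed out. The sketch below does this using hypothetical helper names. This page does not say whether the API accepts several levels of the same emotion in one array, so the check only validates membership.

```python
from typing import List, Optional

EMOTIONS = ("anger", "positivity", "surprise", "sadness", "curiosity")
# Intensity levels from lowest to highest; None stands for the bare name.
LEVELS = ("lowest", "low", None, "high", "highest")


def emotion_tag(emotion: str, level: Optional[str] = None) -> str:
    """Build a voice_emotion value such as 'anger' or 'sadness:highest'."""
    if emotion not in EMOTIONS:
        raise ValueError(f"unknown emotion {emotion!r}")
    if level not in LEVELS:
        raise ValueError(f"unknown intensity level {level!r}")
    return emotion if level is None else f"{emotion}:{level}"


# Every documented option: 5 emotions x 5 levels.
VALID_TAGS = {emotion_tag(e, lvl) for e in EMOTIONS for lvl in LEVELS}


def validate_emotions(tags: List[str]) -> List[str]:
    """Reject any value that is not one of the documented options."""
    unknown = [t for t in tags if t not in VALID_TAGS]
    if unknown:
        raise ValueError(f"unsupported voice_emotion values: {unknown}")
    return list(tags)
```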

Response

200 - audio/wav

The response body is the generated audio file.
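Putting the pieces together, here is a minimal Python sketch that builds the POST /v1/tts request with the standard library and saves the returned audio. The base URL is a hypothetical placeholder because this page gives only the path, and the helper names are also hypothetical. Optional body fields such as sample_rate or encoding are passed through as keyword arguments.

```python
import json
import urllib.request

# Hypothetical placeholder: the reference above documents only the path /v1/tts.
BASE_URL = "https://api.example.com"


def build_tts_request(api_key: str, voice_id: str, text: str, **options) -> urllib.request.Request:
    """Build (but do not send) a POST /v1/tts request with a JSON body."""
    body = {"voice_id": voice_id, "text": text, **options}
    return urllib.request.Request(
        url=f"{BASE_URL}/v1/tts",
        data=json.dumps(body).encode("utf-8"),
        headers={"X-API-KEY": api_key, "Content-Type": "application/json"},
        method="POST",
    )


def synthesize_to_file(request: urllib.request.Request, path: str, timeout: float = 30.0) -> None:
    """Send the request and write the audio bytes from a 200 response to path."""
    with urllib.request.urlopen(request, timeout=timeout) as resp, open(path, "wb") as out:
        out.write(resp.read())


# Example (not run here; needs a real host, key, and voice ID):
# req = build_tts_request("YOUR_API_KEY", "YOUR_VOICE_ID", "Hello!", output_format="wav")
# synthesize_to_file(req, "hello.wav")
```

urllib stores header names in capitalized form (X-api-key, Content-type). HTTP header names are case-insensitive, so the server receives the same fields.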