POST /voices/voice-clone
Clone a new voice from audio samples
curl --request POST \
  --url https://api.callers.ai/voices/voice-clone \
  --header 'Authorization: <api-key>' \
  --header 'Content-Type: multipart/form-data' \
  --form 'name=<string>' \
  --form 'files=@example-file' \
  --form 'description=<string>' \
  --form providerType=ELEVENLABS \
  --form supportedLocales=en-US \
  --form subAccounts=false \
  --form stability=0.5 \
  --form similarityBoost=0.5 \
  --form speed=0 \
  --form supportsAllLanguages=false
[
  {
    "id": "<string>",
    "voiceId": "<string>",
    "name": "<string>",
    "isMultilingual": true,
    "supportedLocales": [
      "<string>"
    ],
    "ttsClient": "<string>",
    "organizationId": "<string>",
    "settings": {
      "stability": 0.8,
      "style": 0.3
    }
  }
]

Authorizations

Authorization
string
header
required

Headers

x-sub-organization-id
string

Optional sub organization ID. Must belong to your main organization API key.
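
The two headers above can be assembled with a small helper. This is a minimal sketch, not an official client: the function name `build_headers` and its parameter names are hypothetical, and it assumes the API key is sent verbatim in `Authorization`, as the request sample shows, with no scheme prefix.

```python
from typing import Dict, Optional


def build_headers(api_key: str, sub_organization_id: Optional[str] = None) -> Dict[str, str]:
    """Return the HTTP headers for a voice-clone request (hypothetical helper)."""
    if not api_key:
        raise ValueError("api_key must be a non-empty string")
    # Assumption: the key is passed as-is, mirroring 'Authorization: <api-key>'.
    headers = {"Authorization": api_key}
    # The sub-organization header is optional, so send it only when supplied.
    if sub_organization_id:
        headers["x-sub-organization-id"] = sub_organization_id
    return headers
```

Calling `build_headers("my-key")` yields only the `Authorization` entry; passing a second argument adds `x-sub-organization-id`.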

Body

multipart/form-data
name
string
required

Name of the voice

Minimum string length: 5
files
file[]
required

Audio files for voice cloning

description
string

Description of the voice

providerType
enum<string>

The voice provider type

Available options: ELEVENLABS, CARTESIA, GOOGLE_CLOUD
supportedLocales
enum<string>[]

Supported locales for the voice

Available options: en-US, en-GB, en-AU, en-CA, en-IE, en-IN, en-NZ, en-ZA, en-SG, es-ES, es-MX, fr-FR, pt-BR, zh-CN, cmn-EN, nl-NL, fi-FI, de-DE, de-AT, de-CH, de-LU, de-LI, gsw-CH, gsw-LI, el-GR, hi-IN, en-TA, id-ID, it-IT, ja-JP, ko-KR, ms-MY, en-MS, ro-RO, ru-RU, uk-UA, bg-BG, cs-CZ, da-DK, pl-PL, sk-SK, sv-SE, tr-TR, ar-SA, ary-MA, arz-EG, apc-LB, apc-SY, apc-PS, apc-JO, acm-IQ, ar-AE, afb-AE, afb-BH, afb-KW, afb-QA, afb-OM, hr-HR, hr-BA, he-IL, ur-PK, hu-HU, fil-PH, th-TH
subAccounts
boolean
default: false

Flag indicating whether sub-accounts are enabled

stability
number

Stability value between 0 and 1

Required range: 0 <= x <= 1
similarityBoost
number

Similarity boost value between 0 and 1

Required range: 0 <= x <= 1
speed
number

Normalized speed between -1.0 (slowest) and 1.0 (fastest). 0 = normal, negative = slower, positive = faster

Required range: -1 <= x <= 1
supportsAllLanguages
boolean
default: false

Flag indicating whether this voice supports all languages

Response

Array of cloned Voice objects

id
string
required

The ID of the voice

voiceId
string
required

The provider-specific voice ID

name
string
required

The name of the voice

isMultilingual
boolean
required

Indicates if the voice supports multiple languages

supportedLocales
string[]
required

Array of supported locale strings

ttsClient
string
required

Identifier for the TTS (text-to-speech) client

organizationId
string

The organization ID that owns this voice

settings
object

Additional settings stored in JSON format

Example:
{ "stability": 0.8, "style": 0.3 }
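
On the client side, the response array can be mapped onto a typed structure so that the required fields are checked once at the boundary. This is a sketch under stated assumptions: the `Voice` dataclass and `parse_voices` are hypothetical names, the snake_case attribute names are a local choice, and `organizationId` and `settings` are treated as optional because the reference does not mark them as required.

```python
import json
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class Voice:
    id: str
    voice_id: str
    name: str
    is_multilingual: bool
    supported_locales: List[str]
    tts_client: str
    organization_id: Optional[str] = None
    settings: Dict[str, Any] = field(default_factory=dict)


def parse_voices(body: str) -> List[Voice]:
    """Parse the JSON response body into a list of Voice objects."""
    payload = json.loads(body)
    if not isinstance(payload, list):
        raise ValueError("expected a JSON array of voice objects")
    return [
        Voice(
            # Required fields: a missing key raises KeyError.
            id=item["id"],
            voice_id=item["voiceId"],
            name=item["name"],
            is_multilingual=item["isMultilingual"],
            supported_locales=list(item["supportedLocales"]),
            tts_client=item["ttsClient"],
            # Optional fields fall back to None or an empty dict.
            organization_id=item.get("organizationId"),
            settings=item.get("settings") or {},
        )
        for item in payload
    ]
```

Accessing required keys directly means a malformed response fails immediately with a `KeyError` rather than producing a partially filled object.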