Clone a new voice from audio samples

curl --request POST \
  --url https://api.callers.ai/voices/voice-clone \
  --header 'Authorization: <api-key>' \
  --header 'Content-Type: multipart/form-data' \
  --form 'name=<string>' \
  --form 'files=@example-file' \
  --form 'description=<string>' \
  --form supportedLocales=en-US \
  --form subAccounts=false \
  --form stability=0.5 \
  --form similarityBoost=0.5 \
  --form speed=0 \
  --form supportsAllLanguages=false

[
  {
    "id": "<string>",
    "voiceId": "<string>",
    "name": "<string>",
    "isMultilingual": true,
    "supportedLocales": [
      "<string>"
    ],
    "ttsClient": "<string>",
    "organizationId": "<string>",
    "settings": {
      "stability": 0.8,
      "style": 0.3
    }
  }
]

Authorizations

Authorization

string

header

required

Headers

x-sub-organization-id

string

Optional sub-organization ID. The sub-organization must belong to the main organization associated with your API key.
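
As a rough illustration of how these two headers combine, the sketch below always sends `Authorization` and adds `x-sub-organization-id` only when a sub-organization ID is supplied. The helper name `build_headers` is hypothetical and not part of the API.

```python
from typing import Dict, Optional


def build_headers(api_key: str, sub_organization_id: Optional[str] = None) -> Dict[str, str]:
    """Return the request headers for the voice-clone endpoint (hypothetical helper)."""
    if not api_key:
        # The Authorization header is documented as required.
        raise ValueError("an API key is required for the Authorization header")
    headers = {"Authorization": api_key}
    if sub_organization_id:
        # Optional header: only sent when acting on behalf of a sub-organization.
        headers["x-sub-organization-id"] = sub_organization_id
    return headers
```

The `Content-Type` header is deliberately left out here, since most HTTP clients set it (including the multipart boundary) themselves when files are attached.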

Body

multipart/form-data

name

string

required

Name of the voice

Minimum string length: 5

files

file[]

required

Audio files for voice cloning

description

string

Description of the voice

providerType

enum<string>

The voice provider type

Available options:

ELEVENLABS,

CARTESIA,

GOOGLE_CLOUD

supportedLocales

enum<string>[]

Supported locales for the voice

Available options:

en-US,

en-GB,

en-AU,

en-CA,

en-IE,

en-IN,

en-NZ,

en-ZA,

en-SG,

es-ES,

es-MX,

fr-FR,

pt-BR,

pt-PT,

zh-CN,

cmn-EN,

nl-NL,

fi-FI,

de-DE,

de-AT,

de-CH,

de-LU,

de-LI,

gsw-CH,

gsw-LI,

el-GR,

hi-IN,

en-TA,

id-ID,

it-IT,

ja-JP,

ko-KR,

ms-MY,

en-MS,

ro-RO,

ru-RU,

uk-UA,

ka-GE,

bg-BG,

cs-CZ,

da-DK,

pl-PL,

sk-SK,

sv-SE,

tr-TR,

ar-SA,

ary-MA,

arz-EG,

apc-LB,

apc-SY,

apc-PS,

apc-JO,

acm-IQ,

ar-AE,

afb-AE,

afb-BH,

afb-KW,

afb-QA,

afb-OM,

hr-HR,

hr-BA,

he-IL,

ur-PK,

hu-HU,

fil-PH,

th-TH,

vi-VN

subAccounts

boolean

default: false

Flag indicating whether sub-accounts are enabled

stability

number

Stability value between 0 and 1

Required range: 0 <= x <= 1

similarityBoost

number

Similarity boost value between 0 and 1

Required range: 0 <= x <= 1

speed

number

Normalized speed between -1.0 (slowest) and 1.0 (fastest). 0 = normal, negative = slower, positive = faster

Required range: -1 <= x <= 1

supportsAllLanguages

boolean

default: false

Flag indicating whether this voice supports all languages
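
The constraints listed above (minimum name length, at least one file, numeric ranges) can be checked on the client before any audio is uploaded. The following is a minimal sketch under stated assumptions: the helper `build_form_fields` and the constant `NUMERIC_RANGES` are hypothetical names, and sending array values such as `supportedLocales` as repeated form fields is an assumption about the encoding, not something this page confirms.

```python
from typing import Dict, List, Optional, Sequence, Tuple

# Documented inclusive ranges for the numeric tuning fields.
NUMERIC_RANGES: Dict[str, Tuple[float, float]] = {
    "stability": (0.0, 1.0),
    "similarityBoost": (0.0, 1.0),
    "speed": (-1.0, 1.0),
}


def build_form_fields(
    name: str,
    file_count: int,
    supported_locales: Sequence[str] = (),
    **numeric: Optional[float],
) -> List[Tuple[str, str]]:
    """Validate the documented constraints and return text form fields (hypothetical helper)."""
    if len(name) < 5:
        raise ValueError("name must be at least 5 characters long")
    if file_count < 1:
        raise ValueError("at least one audio file is required")

    fields: List[Tuple[str, str]] = [("name", name)]
    for key, value in numeric.items():
        if key not in NUMERIC_RANGES:
            raise ValueError(f"unknown numeric field: {key}")
        if value is None:
            continue  # optional field left unset
        low, high = NUMERIC_RANGES[key]
        if not low <= value <= high:
            raise ValueError(f"{key} must satisfy {low} <= x <= {high}")
        fields.append((key, str(value)))

    # Assumption: each locale is sent as its own repeated "supportedLocales" field.
    fields.extend(("supportedLocales", locale) for locale in supported_locales)
    return fields
```

A call such as `build_form_fields("Narrator", 1, ["en-US"], stability=0.5)` yields a list of `(field, value)` tuples that an HTTP client can send alongside the audio files under the `files` field.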

Response

Array of cloned Voice objects

id

string

required

The ID of the voice

voiceId

string

required

The provider-specific voice ID

name

string

required

The name of the voice

isMultilingual

boolean

required

Indicates if the voice supports multiple languages

supportedLocales

string[]

required

Array of supported locale strings

ttsClient

string

required

Identifier for the TTS (text-to-speech) client

gender

enum<string>

required

The gender of the voice

Available options:

MALE,

FEMALE

organizationId

string

The organization ID that owns this voice

settings

object

Additional settings stored in JSON format

Example:

{ "stability": 0.8, "style": 0.3 }
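
To show how the response fields map onto a typed structure, here is a minimal parsing sketch. The `Voice` dataclass and `parse_voices` function are hypothetical names. Although `gender` is documented as required, the sample response on this page omits it, so the sketch treats it as optional rather than failing on a missing key.

```python
import json
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class Voice:
    id: str
    voice_id: str
    name: str
    is_multilingual: bool
    supported_locales: List[str]
    tts_client: str
    gender: Optional[str] = None           # "MALE" or "FEMALE" when present
    organization_id: Optional[str] = None
    settings: Dict[str, Any] = field(default_factory=dict)


def parse_voices(body: str) -> List[Voice]:
    """Parse the JSON array returned by the endpoint into Voice objects."""
    voices = []
    for item in json.loads(body):
        voices.append(
            Voice(
                id=item["id"],
                voice_id=item["voiceId"],
                name=item["name"],
                is_multilingual=item["isMultilingual"],
                supported_locales=list(item["supportedLocales"]),
                tts_client=item["ttsClient"],
                gender=item.get("gender"),
                organization_id=item.get("organizationId"),
                settings=item.get("settings") or {},
            )
        )
    return voices
```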