POST /api/open-api/v1/soundCloning/audios
SoundClone - Create Audio Task
Use the modelId from a completed preview task. Poll Query SoundClone Task for the final audioUrl.

A modelId is valid for 3 days. The first successful call to this endpoint within that window permanently activates the voice for future generation.

Response envelope: { "code": 20000, "msg": "ok", "data": { ... } }.
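The envelope described above can be unwrapped with a small helper. This is a minimal sketch, not part of the official SDK: `unwrap`, `ApiError`, and `SUCCESS_CODE` are hypothetical names, and treating any code other than 20000 as a failure is an assumption, since the page documents only the success value.

```python
import json

SUCCESS_CODE = 20000  # the only envelope code shown on this page


class ApiError(Exception):
    """Raised when the envelope or the task itself reports a failure."""


def unwrap(body: str) -> dict:
    """Parse a JSON response body and return its `data` object.

    Assumption: any `code` other than 20000 signals an error.
    """
    envelope = json.loads(body)
    if envelope.get("code") != SUCCESS_CODE:
        raise ApiError(f"code={envelope.get('code')} msg={envelope.get('msg')}")
    data = envelope.get("data") or {}
    if data.get("error") is not None:
        raise ApiError(f"task {data.get('id')} failed: {data['error']}")
    return data


# The documented response for a newly created task.
sample = json.dumps({
    "code": 20000,
    "msg": "ok",
    "data": {
        "id": "audio_764fb3e7-8af2-4d2d-89a0-e58f8f3eb9c4",
        "object": "audio",
        "created": 1781777280,
        "model": "soundCloningAudio",
        "status": "queued",
        "error": None,
    },
})

task = unwrap(sample)
print(task["id"], task["status"])
```

The returned `id` is what you would keep for later polling; the polling request itself is documented under Query SoundClone Task and is not sketched here.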

Request body

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| modelId | string | Yes | Voice model ID from the preview query result. |
| contentText | string | Yes | Text to synthesize, max 10,000 characters. Insert <#x#> between words for pauses (x in seconds, 0.01–99.99). |
| soundVersion | string | No | v1 or v2. |
| language | string | No | Language, for example Chinese. Default auto. |
| emotion | string | No | Default neutral. Values: happy, sad, angry, fearful, disgusted, surprised, neutral. |
| speed | number | No | Speech rate in [0.5, 2], default 1.0. |
| vol | number | No | Volume in (0, 10], default 1.0. |
| pitch | integer | No | Pitch in [-12, 12], default 0. |
| subtitleEnable | boolean | No | Generate subtitles, default false. |
| subtitleType | string | No | When subtitles are enabled, set to word for word-level subtitles; omit for sentence-level subtitles. |
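The ranges in the table above can be checked on the client before submitting a task. The following is a sketch under stated assumptions: `validate_body` is a hypothetical helper, the server remains the authority on what it accepts, and the 10,000-character limit is checked against the raw string because the page does not say whether pause markers count toward it.

```python
EMOTIONS = {"happy", "sad", "angry", "fearful", "disgusted", "surprised", "neutral"}
MAX_CHARS = 10_000  # documented contentText limit


def _is_number(value) -> bool:
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def validate_body(body: dict) -> list[str]:
    """Return a list of problems; an empty list means the body looks valid."""
    errors = []
    for field in ("modelId", "contentText"):
        value = body.get(field)
        if not isinstance(value, str) or not value:
            errors.append(f"{field} is required and must be a non-empty string")
    text = body.get("contentText")
    if isinstance(text, str) and len(text) > MAX_CHARS:
        errors.append("contentText exceeds 10,000 characters")
    if "soundVersion" in body and body["soundVersion"] not in ("v1", "v2"):
        errors.append("soundVersion must be v1 or v2")
    if "emotion" in body and body["emotion"] not in EMOTIONS:
        errors.append("emotion is not one of the documented values")
    speed = body.get("speed", 1.0)
    if not _is_number(speed) or not 0.5 <= speed <= 2:
        errors.append("speed must be in [0.5, 2]")
    vol = body.get("vol", 1.0)
    if not _is_number(vol) or not 0 < vol <= 10:
        errors.append("vol must be in (0, 10]")
    pitch = body.get("pitch", 0)
    if not isinstance(pitch, int) or isinstance(pitch, bool) or not -12 <= pitch <= 12:
        errors.append("pitch must be an integer in [-12, 12]")
    if "subtitleEnable" in body and not isinstance(body["subtitleEnable"], bool):
        errors.append("subtitleEnable must be a boolean")
    if "subtitleType" in body and body["subtitleType"] != "word":
        errors.append("subtitleType must be word, or omitted for sentence-level")
    return errors


example = {
    "modelId": "model_123456789",
    "contentText": "Welcome to voice cloning. This is production audio text.",
    "language": "Chinese",
    "soundVersion": "v1",
    "emotion": "neutral",
    "speed": 1.0,
    "vol": 1.0,
    "pitch": 0,
    "subtitleEnable": False,
}
print(validate_body(example))
print(validate_body({**example, "vol": 0, "pitch": 13}))
```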

Billing

Production audio has two fee components:
| Model config | Description |
| --- | --- |
| sound-cloning-audio | Character fee per 10,000 characters from contentText |
| sound-cloning-voice | Voice fee: charged on every production audio submission (per task) |
Character counting matches preview: Unicode runes, excluding <#x#> markers.
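To estimate the billable length before submitting, the counting rule above can be approximated locally. This is a sketch, not the service's own counter: `billable_chars` and `pause` are hypothetical helpers, the regular expression assumes markers are written as a plain decimal number between `<#` and `#>`, and formatting pauses with two decimals is an assumption. Python's `len` on a `str` counts Unicode code points, which corresponds to counting runes.

```python
import re

# Assumed marker syntax: <#x#> with x a decimal number of seconds.
PAUSE_MARKER = re.compile(r"<#\d+(?:\.\d+)?#>")


def pause(seconds: float) -> str:
    """Build a pause marker; the documented range is 0.01 to 99.99 seconds."""
    if not 0.01 <= seconds <= 99.99:
        raise ValueError("pause must be between 0.01 and 99.99 seconds")
    return f"<#{seconds:.2f}#>"


def billable_chars(content_text: str) -> int:
    """Count code points in the text after removing pause markers."""
    return len(PAUSE_MARKER.sub("", content_text))


text = "Hello" + pause(1.5) + "世界"
print(text)                  # Hello<#1.50#>世界
print(billable_chars(text))  # 7
```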

Example

curl --request POST \
  --url 'https://www.jimmyai.cn/api/open-api/v1/soundCloning/audios' \
  --header 'Authorization: Bearer sk_xxx' \
  --header 'Content-Type: application/json' \
  --data '{
    "modelId": "model_123456789",
    "contentText": "Welcome to voice cloning. This is production audio text.",
    "language": "Chinese",
    "soundVersion": "v1",
    "emotion": "neutral",
    "speed": 1.0,
    "vol": 1.0,
    "pitch": 0,
    "subtitleEnable": false
  }'
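The same request can be issued from Python using only the standard library. This is a minimal sketch mirroring the curl call above: `build_request` and `create_audio_task` are hypothetical names, `sk_xxx` is the placeholder token from the example, and the 30-second timeout is an arbitrary choice.

```python
import json
import urllib.request

API_URL = "https://www.jimmyai.cn/api/open-api/v1/soundCloning/audios"


def build_request(token: str, body: dict) -> urllib.request.Request:
    """Prepare the POST request without sending it."""
    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def create_audio_task(token: str, body: dict, timeout: float = 30.0) -> dict:
    """Send the request and return the decoded response envelope."""
    with urllib.request.urlopen(build_request(token, body), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


body = {
    "modelId": "model_123456789",
    "contentText": "Welcome to voice cloning. This is production audio text.",
    "language": "Chinese",
    "soundVersion": "v1",
    "emotion": "neutral",
    "speed": 1.0,
    "vol": 1.0,
    "pitch": 0,
    "subtitleEnable": False,
}

# Building the request has no side effects; calling
# create_audio_task("sk_xxx", body) would actually submit (and bill) a task.
request = build_request("sk_xxx", body)
print(request.get_method(), request.full_url)
```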

Response example

{
  "code": 20000,
  "msg": "ok",
  "data": {
    "id": "audio_764fb3e7-8af2-4d2d-89a0-e58f8f3eb9c4",
    "object": "audio",
    "created": 1781777280,
    "model": "soundCloningAudio",
    "status": "queued",
    "error": null
  }
}

Authorizations

Authorization (string, header, required)

Bearer authentication header of the form Bearer <token>, where <token> is your auth token.

Body

application/json
modelId (string, required)

contentText (string, required)
Text to synthesize, max 10,000 characters.

soundVersion (enum<string>)
Available options: v1, v2

language (string, default: auto)

emotion (enum<string>, default: neutral)
Available options: happy, sad, angry, fearful, disgusted, surprised, neutral

speed (number, default: 1)
Required range: 0.5 <= x <= 2

vol (number, default: 1)
Required range: 0 < x <= 10

pitch (integer, default: 0)
Required range: -12 <= x <= 12

subtitleEnable (boolean, default: false)

subtitleType (enum<string>)
Available options: word

Response

200 - application/json

Task created

code (integer)
Example: 20000

msg (string)
Example: "ok"

data (object)
Example:
{
  "id": "audio_16b635ba-5889-4fa5-bbcc-bf67a38c353a",
  "object": "audio",
  "created": 1781777280,
  "model": "soundCloningClone",
  "status": "queued",
  "error": null
}