ElevenLabs voice AI skill. Use when:
ElevenLabs is a voice AI platform providing ultra-realistic text-to-speech, voice cloning, speech-to-text, and conversational AI agents. In the 2nth.ai stack it fills the voice layer — adding spoken interaction to Cloudflare Worker-based agents, AWS Connect contact flows, or any web/mobile surface.
Stub — full skill pending. Core patterns documented below.
| Feature | API | Models | Use case |
|---|---|---|---|
| Text-to-Speech | POST /v1/text-to-speech/:voice_id | eleven_v3, flash_v2_5, turbo_v2_5, multilingual_v2 | Narration, IVR prompts, notifications |
| Streaming TTS | POST /v1/text-to-speech/:voice_id/stream | Same | Real-time voice agents, low-latency playback |
| Speech-to-Text | POST /v1/speech-to-text | Scribe v1 | Transcription, meeting notes, call analytics |
| Voice Cloning | POST /v1/voices/add | — | Brand voice, character voices, personalisation |
| Speech-to-Speech | POST /v1/speech-to-speech/:voice_id | — | Voice conversion with emotion preservation |
| Conversational AI | Agents API + WebSocket | flash_v2_5 | Phone bots, web chat agents, WhatsApp |
| Dubbing | POST /v1/dubbing | — | Localise video/audio to other languages |
| Sound Effects | POST /v1/sound-generation | — | UI sounds, game audio, video production |
| Music | POST /v1/music | — | Background music, jingles |
| Voice Library | GET /v1/voices | — | 10,000+ pre-built voices |
# All requests use xi-api-key header
export ELEVENLABS_API_KEY="your-api-key-here"
curl -H "xi-api-key: $ELEVENLABS_API_KEY" \
https://api.elevenlabs.io/v1/voices
| Model | Latency | Languages | Best for |
|---|---|---|---|
eleven_v3 | ~500ms | 70+ | Highest expressiveness, narration, long-form |
eleven_flash_v2_5 | ~75ms | 70+ | Real-time agents, IVR, live interaction |
eleven_turbo_v2_5 | ~250ms | 70+ | Interactive use, balanced quality/speed |
eleven_multilingual_v2 | ~400ms | 32+ | Consistent multilingual quality, up to 10K chars |
import ElevenLabs from 'elevenlabs';
const client = new ElevenLabs({ apiKey: process.env.ELEVENLABS_API_KEY });
// Generate and save to file
const audio = await client.textToSpeech.convert('JBFqnCBsd6RMkjVDRZzb', {
text: 'Hello from ElevenLabs.',
model_id: 'eleven_flash_v2_5',
voice_settings: {
stability: 0.5, // 0–1: lower = more expressive variation
similarity_boost: 0.75, // 0–1: how closely to match the cloned voice
style: 0.0,
use_speaker_boost: true,
},
output_format: 'mp3_44100_128',
});
// audio is a ReadableStream — pipe to file or Response
const fs = await import('fs');
const writer = fs.createWriteStream('output.mp3');
for await (const chunk of audio) writer.write(chunk);
writer.end();
// Stream directly to the client — no buffering
export default {
async fetch(req: Request, env: Env): Promise<Response> {
const { text } = await req.json() as { text: string };
const upstream = await fetch(
'https://api.elevenlabs.io/v1/text-to-speech/JBFqnCBsd6RMkjVDRZzb/stream',
{
method: 'POST',
headers: {
'xi-api-key': env.ELEVENLABS_API_KEY,
'Content-Type': 'application/json',
'Accept': 'audio/mpeg',
},
body: JSON.stringify({
text,
model_id: 'eleven_flash_v2_5',
output_format: 'mp3_44100_128',
}),
}
);
return new Response(upstream.body, {
headers: {
'Content-Type': 'audio/mpeg',
'Transfer-Encoding': 'chunked',
'Cache-Control': 'no-store',
},
});
},
};
from elevenlabs import ElevenLabs
client = ElevenLabs(api_key="your-api-key")
with open("audio.mp3", "rb") as f:
transcript = client.speech_to_text.convert(
file=f,
model_id="scribe_v1",
language_code="en", # omit for auto-detect
diarize=True, # speaker labels
timestamps_granularity="word",
)
for utterance in transcript.utterances:
print(f"[{utterance.speaker}] {utterance.text}")
// Instant clone — 30s+ audio sample, results in seconds
const formData = new FormData();
formData.append('name', 'My Brand Voice');
formData.append('description', 'Cloned from our CEO recording');
formData.append('files', new Blob([audioBuffer], { type: 'audio/mp3' }), 'sample.mp3');
formData.append('remove_background_noise', 'true');
const response = await fetch('https://api.elevenlabs.io/v1/voices/add', {
method: 'POST',
headers: { 'xi-api-key': process.env.ELEVENLABS_API_KEY! },
body: formData,
});
const { voice_id } = await response.json();
// Use voice_id in subsequent TTS calls
import { ElevenLabs } from '@elevenlabs/react'; // React SDK
// or use raw WebSocket for non-React environments
const ws = new WebSocket(
`wss://api.elevenlabs.io/v1/convai/conversation?agent_id=${AGENT_ID}`
);
ws.onopen = () => {
// Send audio chunks (PCM 16kHz mono) or text input
ws.send(JSON.stringify({ type: 'user_audio_chunk', user_audio_chunk: base64Audio }));
};
ws.onmessage = (event) => {
const msg = JSON.parse(event.data);
if (msg.type === 'audio') {
// Play base64-encoded audio response
playAudio(msg.audio_event.audio_base_64);
}
if (msg.type === 'agent_response') {
console.log('Agent said:', msg.agent_response_event.agent_response);
}
};
| Model | Per 1,000 characters |
|---|---|
Flash / Turbo (flash_v2_5, turbo_v2_5) | $0.06 |
Standard / v3 (multilingual_v2, eleven_v3) | $0.12 |
| Speech-to-text | per audio minute (see dashboard) |
Free tier: 10,000 credits/month (~10 min high-quality TTS), no commercial rights.
remove_background_noise: true and use a clean 60s+ sample for best resultseleven_v3 10K char limit — multilingual_v2 supports up to 10,000 characters per request; v3 has a lower limit — split long content at sentence boundarieseleven_flash_v2_5 (~75ms) for Amazon Connect or Twilio IVR; standard models add perceptible delaytech/aws/connect — Amazon Connect IVR; swap Connect's Polly TTS for ElevenLabs via Lambdatech/cloudflare/workers — stream ElevenLabs audio directly from a Workertech/cisco/collaboration — add voice AI layer to Webex/CUCM contact flows