Speech to text
Speaker diarization: this endpoint transcribes audio and identifies who said what. It is also the endpoint to use for plain transcription without speaker labels.
POST /v1/audio/diarize
Request body
| Parameter | Type | Description |
|---|---|---|
| model | string | Transcription model. Use whisperx. Required. |
| file | file | Audio file to transcribe. MP3, WAV, FLAC, M4A, and OGG are supported. Required. |
| language | string | Language code (e.g., nl, en). Auto-detected if not specified. |
Example request
The diarization endpoint is not part of the OpenAI SDK, so use a direct HTTP request.
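import os
import requests

# Open the audio file in a context manager so the handle is closed after the upload.
with open("interview.mp3", "rb") as audio:
    response = requests.post(
        "https://router.appelon.ai/v1/audio/diarize",
        headers={"Authorization": f"Bearer {os.environ['APPELON_API_KEY']}"},
        files={"file": audio},
        data={"model": "whisperx", "language": "nl"},
    )

response.raise_for_status()  # Fail early on a 4xx/5xx response
result = response.json()
print(result["text"])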
Response with speaker labels
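WhisperX performs speaker diarization: it identifies the different speakers in the audio. The response contains the full transcript in text, plus a segments list in which each entry has a start and end time, its own text, and a speaker label such as SPEAKER_00.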
{
  "text": "Welkom bij dit interview. Dank je wel voor de uitnodiging.",
  "segments": [
    {
      "start": 0.0,
      "end": 2.5,
      "text": "Welkom bij dit interview.",
      "speaker": "SPEAKER_00"
    },
    {
      "start": 2.8,
      "end": 5.1,
      "text": "Dank je wel voor de uitnodiging.",
      "speaker": "SPEAKER_01"
    }
  ]
}
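To make the segment structure concrete, here is a minimal sketch of how a response shaped like the one above could be turned into a speaker-labelled transcript and a per-speaker talk time. The helper names format_transcript and speaking_time are hypothetical, and the "UNKNOWN" fallback for a segment without a speaker field is an assumption; this page does not say whether that field can be absent.

```python
from itertools import groupby


def format_transcript(result: dict) -> str:
    """Render segments as 'SPEAKER: text' lines, merging consecutive
    segments that carry the same speaker label."""
    segments = result.get("segments", [])
    lines = []
    # Assumption: a segment without a "speaker" key is labelled "UNKNOWN".
    for speaker, group in groupby(segments, key=lambda s: s.get("speaker", "UNKNOWN")):
        text = " ".join(seg["text"].strip() for seg in group)
        lines.append(f"{speaker}: {text}")
    return "\n".join(lines)


def speaking_time(result: dict) -> dict:
    """Sum (end - start) per speaker label."""
    totals = {}
    for seg in result.get("segments", []):
        speaker = seg.get("speaker", "UNKNOWN")
        totals[speaker] = totals.get(speaker, 0.0) + (seg["end"] - seg["start"])
    return totals


# Sample data copied from the example response above.
sample = {
    "text": "Welkom bij dit interview. Dank je wel voor de uitnodiging.",
    "segments": [
        {"start": 0.0, "end": 2.5, "text": "Welkom bij dit interview.", "speaker": "SPEAKER_00"},
        {"start": 2.8, "end": 5.1, "text": "Dank je wel voor de uitnodiging.", "speaker": "SPEAKER_01"},
    ],
}

print(format_transcript(sample))
# SPEAKER_00: Welkom bij dit interview.
# SPEAKER_01: Dank je wel voor de uitnodiging.
```

Merging consecutive segments keeps the transcript readable when one speaker's turn is split across several segments.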
Supported formats
- MP3
- WAV
- FLAC
- M4A
- OGG
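A client can check the file type before uploading to avoid a wasted request. The sketch below assumes that the file extension reliably indicates the format, which this page does not state. The SUPPORTED_EXTENSIONS set and the is_supported_audio helper are hypothetical names, not part of the API.

```python
from pathlib import Path

# Extensions derived from the list above. Matching by file suffix is an
# assumption; the server may inspect the actual file contents instead.
SUPPORTED_EXTENSIONS = {".mp3", ".wav", ".flac", ".m4a", ".ogg"}


def is_supported_audio(path: str) -> bool:
    """Return True if the file's extension is in the supported list."""
    return Path(path).suffix.lower() in SUPPORTED_EXTENSIONS


print(is_supported_audio("interview.mp3"))  # True
print(is_supported_audio("notes.txt"))      # False
```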
Language support
WhisperX supports 90+ languages, including Dutch, English, German, French, and Spanish. The language is auto-detected when the language parameter is omitted, but specifying it improves accuracy.
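As a small illustration of the optional language parameter, the sketch below builds the form fields for the request and leaves language out entirely when none is given, so that auto-detection applies. The build_form_fields helper is a hypothetical name; only the model and language field names come from the request body table.

```python
from typing import Optional


def build_form_fields(language: Optional[str] = None, model: str = "whisperx") -> dict:
    """Build the form fields for a diarization request.

    The language field is omitted when no code is given, so the service
    falls back to auto-detection.
    """
    fields = {"model": model}
    if language:
        fields["language"] = language
    return fields


print(build_form_fields())      # {'model': 'whisperx'}
print(build_form_fields("nl"))  # {'model': 'whisperx', 'language': 'nl'}
```

The returned dictionary can be passed as the data argument of requests.post, alongside the files argument that carries the audio.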