
Whisper is a state-of-the-art model for automatic speech recognition (ASR) and speech translation, proposed in the paper "Robust Speech Recognition via Large-Scale Weak Supervision" by Alec Radford et al. from OpenAI. Trained on >5M hours of labeled data, Whisper demonstrates a strong ability to generalise to many datasets and domains in a zero-shot setting. Whisper large-v3-turbo is a fine-tuned version of a pruned Whisper large-v3: it is the same model, except that the number of decoder layers has been reduced from 32 to 4. As a result, the model is significantly faster, at the cost of a minor degradation in quality.
This document specifies the API for transcribing audio files using a hosted Whisper model, mimicking the OpenAI Whisper API. The endpoint handles multipart form data for both file uploads and URL-based audio transcription.
| Method | URL | Summary |
|---|---|---|
| POST | /v1/audio/transcriptions | Transcribe audio using the Whisper model. |
The API uses Bearer Token authentication via the Authorization header. If an API_KEY is set on the server, a valid token must be provided.
| Header | Example | Description |
|---|---|---|
| Authorization | Bearer sk-xxxxxxxxxxxxxxxxxxxxxxxx | The API_KEY provided by the server. |
| x-request-id | UUID_string | An optional unique identifier for the request, for logging and tracking. If not provided, the server will generate one. |
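The two headers above can be assembled with a small helper. The following is a minimal sketch in Python; the function name is illustrative, and generating the `x-request-id` client-side simply mirrors what the server does when the header is absent:

```python
import uuid

def build_headers(api_key, request_id=None):
    """Build the Authorization and x-request-id headers for a transcription call.

    If no request id is supplied, generate a UUID locally so the request can
    still be traced end to end.
    """
    return {
        "Authorization": f"Bearer {api_key}",
        "x-request-id": request_id or str(uuid.uuid4()),
    }

headers = build_headers("sk-xxxxxxxxxxxxxxxxxxxxxxxx")
```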
The request must be of type multipart/form-data. It requires a file (or a URL) and a model name, and accepts several optional parameters.
| Parameter | Type | Required | Description |
|---|---|---|---|
| file | File or string | Yes | The audio file to transcribe. It can be a direct file upload or a URL to an audio file. |
| model | string | Yes | The name of the transcription model. Must match the model name served by the API (e.g., "openai/whisper-large-v3-turbo"). |
| response_format | string | No | The format of the response. Supported formats are json, text, srt, vtt, and verbose_json. Defaults to json. |
| temperature | number | No | A value from 0.0 to 1.0 that controls randomness. Defaults to 0.0. |
| language | string | No | The language of the audio to assist with transcription. |
| prompt | string | No | An optional initial prompt to guide the model. |
| condition_on_previous_text | boolean | No | Whether to condition the transcription on previous text. Defaults to true. |
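Client-side validation of these parameters can catch 400-class errors before a request is sent. The sketch below is illustrative (the helper name and the choice to pre-validate are assumptions, not part of the API); the allowed values come from the table above:

```python
# Allowed values taken from the parameter table above.
ALLOWED_FORMATS = {"json", "text", "srt", "vtt", "verbose_json"}

def build_form_fields(model, response_format="json", temperature=0.0,
                      language=None, prompt=None, condition_on_previous_text=True):
    """Validate the optional parameters and return the form fields as strings."""
    if response_format not in ALLOWED_FORMATS:
        raise ValueError(f"Unsupported response_format: {response_format!r}")
    if not 0.0 <= temperature <= 1.0:
        raise ValueError("temperature must be between 0.0 and 1.0")
    fields = {
        "model": model,
        "response_format": response_format,
        "temperature": str(temperature),
        "condition_on_previous_text": str(condition_on_previous_text).lower(),
    }
    if language:
        fields["language"] = language
    if prompt:
        fields["prompt"] = prompt
    return fields

fields = build_form_fields("openai/whisper-large-v3-turbo",
                           response_format="verbose_json")
```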
```shell
curl -s \
  -H "Authorization: Bearer $API_KEY" \
  -F "model=openai/whisper-large-v3-turbo" \
  -F "file=@/path/to/your/audio.mp3" \
  -F "response_format=verbose_json" \
  "https://api.bytecompute.ai/v1/audio/transcriptions"
```
```shell
curl -s \
  -H "Authorization: Bearer $API_KEY" \
  -F "model=openai/whisper-large-v3-turbo" \
  -F "file=https://cdn-global.hellotalk8.com/ht-global-1312929133/mmnt/2/250418/1/148239271/0/0/10e8a25a44ad37efd36155c1f447b0b5.aac" \
  -F "response_format=verbose_json" \
  "https://gateway.staging.oke-us-1.bytecompute.ai/v1/audio/transcriptions"
```
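The same request can be built without curl. Below is a minimal stdlib-only Python sketch that assembles the multipart body by hand; the function name, the placeholder key, and the example audio URL are all illustrative. It builds the request but deliberately does not send it:

```python
import urllib.request
import uuid

def build_transcription_request(endpoint, api_key, audio_url, model,
                                response_format="verbose_json"):
    """Build (but do not send) a multipart/form-data POST matching the curl
    examples above; the audio is passed by URL in the 'file' field."""
    boundary = uuid.uuid4().hex
    fields = {"model": model, "file": audio_url,
              "response_format": response_format}
    parts = []
    for name, value in fields.items():
        parts.append(
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
            f"{value}\r\n"
        )
    body = ("".join(parts) + f"--{boundary}--\r\n").encode()
    return urllib.request.Request(
        endpoint,
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
        method="POST",
    )

# The key and audio URL below are placeholders.
req = build_transcription_request(
    "https://api.bytecompute.ai/v1/audio/transcriptions",
    "sk-xxxxxxxxxxxxxxxxxxxxxxxx",
    "https://example.com/audio.aac",
    "openai/whisper-large-v3-turbo",
)
# urllib.request.urlopen(req) would send it.
```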
200 OK
json format: Returns a JSON object with a single text field.

```json
{"text": "This is a good story, I like it very much."}
```

text format: Returns a plain text string.

```
This is a good story, I like it very much.
```
srt or vtt format: Returns a plain text string in the specified subtitle format.
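For reference, SRT cues can be reconstructed from verbose_json segments as well. A sketch (helper names are illustrative, not part of the API):

```python
def srt_timestamp(seconds):
    """Format an offset in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    ms = round(seconds * 1000)
    h, rem = divmod(ms, 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, ms = divmod(rem, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

def segments_to_srt(segments):
    """Render verbose_json-style segments as an SRT document."""
    blocks = []
    for i, seg in enumerate(segments, start=1):
        blocks.append(
            f"{i}\n"
            f"{srt_timestamp(seg['start'])} --> {srt_timestamp(seg['end'])}\n"
            f"{seg['text'].strip()}\n"
        )
    return "\n".join(blocks)

srt = segments_to_srt([{"start": 0.0, "end": 2.5,
                        "text": " This is a good story, I like it very much."}])
```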
verbose_json format: Returns a detailed JSON object.
```json
{
  "task": "transcribe",
  "language": "english",
  "text": "This is a good story, I like it very much.",
  "segments": [
    {
      "start": 0.0,
      "end": 2.5,
      "text": " This is a good story, I like it very much."
    }
  ],
  "duration": 2.5,
  "usage": {
    "type": "duration",
    "seconds": 2.5
  }
}
```
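A verbose_json response can be consumed with plain `json.loads`. The helper below is a sketch for extracting the billed duration; it assumes (not confirmed by the spec) that `usage.seconds` and the top-level `duration` field are interchangeable when `usage.type` is "duration":

```python
import json

def billed_seconds(payload):
    """Return the billed audio duration, preferring usage.seconds and
    falling back to the top-level duration field."""
    usage = payload.get("usage") or {}
    if usage.get("type") == "duration":
        return float(usage["seconds"])
    return float(payload["duration"])

# The sample response from the docs above.
payload = json.loads(
    '{"task": "transcribe", "language": "english",'
    ' "text": "This is a good story, I like it very much.",'
    ' "segments": [{"start": 0.0, "end": 2.5,'
    ' "text": " This is a good story, I like it very much."}],'
    ' "duration": 2.5, "usage": {"type": "duration", "seconds": 2.5}}'
)
```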
stream format: Returns a Server-Sent Event (SSE) stream. Each event contains a JSON object for a segment or word. The stream concludes with a final event containing usage information and a [DONE] message.
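A minimal consumer for such a stream only needs to split off the `data:` prefix and stop at the `[DONE]` sentinel. The sketch below assumes each event is a single `data:` line (the event payloads shown are illustrative, not taken from a real response):

```python
import json

def parse_sse_events(lines):
    """Yield decoded JSON payloads from SSE 'data:' lines, stopping at [DONE]."""
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank keep-alives and non-data fields
        data = line[5:].strip()
        if data == "[DONE]":
            break
        yield json.loads(data)

sample = [
    'data: {"start": 0.0, "end": 2.5, "text": " This is a good story, I like it very much."}',
    'data: {"usage": {"type": "duration", "seconds": 2.5}}',
    "data: [DONE]",
]
events = list(parse_sse_events(sample))
```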
| Status Code | Description | Detail |
|---|---|---|
| 400 Bad Request | The request is malformed or invalid. | "No header part in the request", "No file or url in the request", "Invalid 'model'", or "Unsupported response_format". |
| 401 Unauthorized | The API key is missing or invalid. | "Authorization header is missing or invalid." or "Invalid API key." |
| 500 Internal Server Error | A server-side issue occurred during transcription. | |
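Clients typically map these statuses to a single exception type. A minimal sketch (the exception class and helper are illustrative, not provided by the API):

```python
class TranscriptionError(Exception):
    """Raised for any of the documented non-200 statuses."""
    def __init__(self, status, detail=""):
        super().__init__(f"{status}: {detail}" if detail else str(status))
        self.status = status
        self.detail = detail

def check_response(status, detail=""):
    """Pass through on 200 OK; raise TranscriptionError otherwise."""
    if status != 200:
        raise TranscriptionError(status, detail)
```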