openai/whisper-large-v3-turbo

Whisper is a state-of-the-art model for automatic speech recognition (ASR) and speech translation, proposed in the paper "Robust Speech Recognition via Large-Scale Weak Supervision" by Alec Radford et al. from OpenAI. Trained on more than 5M hours of labeled data, Whisper generalises well to many datasets and domains in a zero-shot setting. Whisper large-v3-turbo is a fine-tuned version of a pruned Whisper large-v3: the architecture is otherwise the same, except that the number of decoder layers has been reduced from 32 to 4. As a result, the model is considerably faster, at the cost of a minor degradation in quality.

Price: $0.0004 / min


API Documentation: Whisper Audio Transcription 🎙️

This document specifies the API for transcribing audio files using a hosted Whisper model, mimicking the OpenAI Whisper API. The endpoint handles multipart form data for both file uploads and URL-based audio transcription.


Endpoint

| Method | URL | Summary |
| --- | --- | --- |
| POST | /v1/audio/transcriptions | Transcribe audio using the Whisper model. |

Authentication

The API uses Bearer Token authentication via the Authorization header. If an API_KEY is set on the server, a valid token must be provided.

| Header | Example | Description |
| --- | --- | --- |
| Authorization | Bearer sk-xxxxxxxxxxxxxxxxxxxxxxxx | The API_KEY provided by the server. |
| x-request-id | UUID_string | An optional unique identifier for the request, for logging and tracking. If not provided, the server will generate one. |
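
As a rough illustration of the two headers above, the following Python sketch builds them for a request. The function name `build_headers` is hypothetical, and generating the `x-request-id` on the client side is an optional choice made here so the same ID can be found in both client and server logs.

```python
import uuid


def build_headers(api_key, request_id=None):
    """Return the HTTP headers described in the authentication table."""
    # Assumption: when the caller passes no ID, we create a UUID locally.
    # The server would generate one anyway if the header were omitted.
    if request_id is None:
        request_id = str(uuid.uuid4())
    return {
        "Authorization": f"Bearer {api_key}",
        "x-request-id": request_id,
    }


headers = build_headers("sk-xxxxxxxxxxxxxxxxxxxxxxxx")
print(headers["x-request-id"])
```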

Request

The request must be of type multipart/form-data. It requires the audio (as a file upload or a URL) and the model name, and accepts several optional parameters.

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| file | File or string | Yes | The audio file to transcribe. It can be a direct file upload or a URL to an audio file. |
| model | string | Yes | The name of the transcription model. Must match the model name served by the API (e.g., "openai/whisper-large-v3-turbo"). |
| response_format | string | No | The format of the response. Supported formats are json, text, srt, vtt, and verbose_json. Defaults to json. |
| temperature | number | No | A value from 0.0 to 1.0 that controls randomness. Defaults to 0.0. |
| language | string | No | The language of the audio to assist with transcription. |
| prompt | string | No | An optional initial prompt to guide the model. |
| condition_on_previous_text | boolean | No | Whether to condition the transcription on previous text. Defaults to True. |
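
The parameter rules above can be checked on the client before a request is sent. The sketch below is one possible way to do that in Python; `build_form_fields` and `SUPPORTED_FORMATS` are hypothetical names, the `file` field is left out because it is attached separately, and the lowercase "true"/"false" encoding of the boolean is an assumption about what the server accepts.

```python
SUPPORTED_FORMATS = {"json", "text", "srt", "vtt", "verbose_json"}


def build_form_fields(model, response_format="json", temperature=0.0,
                      language=None, prompt=None,
                      condition_on_previous_text=True):
    """Validate the documented parameters and return them as form fields."""
    if not model:
        raise ValueError("model is required")
    if response_format not in SUPPORTED_FORMATS:
        raise ValueError(f"unsupported response_format: {response_format!r}")
    if not 0.0 <= temperature <= 1.0:
        raise ValueError("temperature must be between 0.0 and 1.0")

    fields = {
        "model": model,
        "response_format": response_format,
        "temperature": str(temperature),
        # Assumption: booleans are sent as the strings "true" or "false".
        "condition_on_previous_text": str(condition_on_previous_text).lower(),
    }
    # Optional text parameters are only included when the caller sets them.
    if language is not None:
        fields["language"] = language
    if prompt is not None:
        fields["prompt"] = prompt
    return fields
```

Rejecting an unsupported `response_format` locally avoids a round trip that would otherwise end in a 400 response.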

Example Requests

1. Transcribing a File Upload

```bash
# API_KEY is an environment variable holding your token.
curl --raw -s \
     -H "Authorization: Bearer $API_KEY" \
     -F "model=openai/whisper-large-v3-turbo" \
     -F "file=@/path/to/your/audio.mp3" \
     -F "response_format=verbose_json" \
     "https://api.bytecompute.ai/v1/audio/transcriptions"
```

2. Transcribing a URL

```bash
# API_KEY is an environment variable holding your token.
curl --raw -s \
     -H "Authorization: Bearer $API_KEY" \
     -F "model=openai/whisper-large-v3-turbo" \
     -F "file=https://cdn-global.hellotalk8.com/ht-global-1312929133/mmnt/2/250418/1/148239271/0/0/10e8a25a44ad37efd36155c1f447b0b5.aac" \
     -F "response_format=verbose_json" \
     "https://api.bytecompute.ai/v1/audio/transcriptions"
```

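For readers who prefer Python to curl, the two requests above can be approximated with the standard library alone. This is a minimal sketch, not an official client: `encode_multipart` and `transcribe` are hypothetical helper names, the endpoint URL is taken from the first curl example, and the response is assumed to be JSON (that is, `response_format` is json or verbose_json).

```python
import json
import urllib.request
import uuid

API_URL = "https://api.bytecompute.ai/v1/audio/transcriptions"


def encode_multipart(fields, file_name=None, file_bytes=None):
    """Encode text fields and an optional file as multipart/form-data."""
    boundary = uuid.uuid4().hex
    lines = []
    for name, value in fields.items():
        lines.append(f"--{boundary}".encode())
        lines.append(f'Content-Disposition: form-data; name="{name}"'.encode())
        lines.append(b"")
        lines.append(str(value).encode())
    if file_bytes is not None:
        lines.append(f"--{boundary}".encode())
        lines.append(
            f'Content-Disposition: form-data; name="file"; '
            f'filename="{file_name}"'.encode()
        )
        lines.append(b"Content-Type: application/octet-stream")
        lines.append(b"")
        lines.append(file_bytes)
    # Closing boundary, followed by a final CRLF.
    lines.append(f"--{boundary}--".encode())
    lines.append(b"")
    body = b"\r\n".join(lines)
    return body, f"multipart/form-data; boundary={boundary}"


def transcribe(api_key, fields, file_name=None, file_bytes=None):
    """POST the form to the endpoint and return the decoded JSON response."""
    body, content_type = encode_multipart(fields, file_name, file_bytes)
    request = urllib.request.Request(
        API_URL,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": content_type,
        },
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

For a file upload, pass the file name and raw bytes as `file_name` and `file_bytes`. For a URL, put the address in `fields["file"]` and leave the file arguments unset, which mirrors the plain `-F "file=https://..."` form field in the second curl example.
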
Response

Successful Responses

  • 200 OK
    • json format: Returns a JSON object with a single text field.

      ```json
      {"text": "This is a good story, I like it very much."}
      ```
    • text format: Returns a plain text string.

      ```text
      This is a good story, I like it very much.
      ```
    • srt or vtt format: Returns a plain text string in the specified subtitle format.

    • verbose_json format: Returns a detailed JSON object.

      ```json
      {
        "task": "transcribe",
        "language": "english",
        "text": "This is a good story, I like it very much.",
        "segments": [
          {
            "start": 0.0,
            "end": 2.5,
            "text": " This is a good story, I like it very much."
          }
        ],
        "duration": 2.5,
        "usage": {
          "type": "duration",
          "seconds": 2.5
        }
      }
      ```
    • stream format: Returns a Server-Sent Event (SSE) stream. Each event contains a JSON object for a segment or word. The stream concludes with a final event containing usage information and a [DONE] message.
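
To show how the verbose_json response above might be consumed, here is a small Python sketch that turns each segment into a timestamped line. The helper names `format_timestamp` and `summarize` are hypothetical, and the sample payload is the example response from this section.

```python
import json


def format_timestamp(seconds):
    """Format a number of seconds as HH:MM:SS.mmm."""
    millis = round(seconds * 1000)
    hours, millis = divmod(millis, 3_600_000)
    minutes, millis = divmod(millis, 60_000)
    secs, millis = divmod(millis, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{millis:03d}"


def summarize(verbose_json_text):
    """Return one 'start --> end  text' line per segment."""
    payload = json.loads(verbose_json_text)
    lines = []
    for segment in payload.get("segments", []):
        start = format_timestamp(segment["start"])
        end = format_timestamp(segment["end"])
        # Segment text carries a leading space in the example, so strip it.
        lines.append(f"{start} --> {end}  {segment['text'].strip()}")
    return lines


sample = """
{
  "task": "transcribe",
  "language": "english",
  "text": "This is a good story, I like it very much.",
  "segments": [
    {"start": 0.0, "end": 2.5,
     "text": " This is a good story, I like it very much."}
  ],
  "duration": 2.5,
  "usage": {"type": "duration", "seconds": 2.5}
}
"""

for line in summarize(sample):
    print(line)
```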


Error Responses

| Status Code | Description | Detail |
| --- | --- | --- |
| 400 Bad Request | The request is malformed or invalid. | "No header part in the request", "No file or url in the request", "Invalid 'model'", or "Unsupported response_format". |
| 401 Unauthorized | The API key is missing or invalid. | "Authorization header is missing or invalid." or "Invalid API key." |
| 500 Internal Server Error | A server-side issue occurred during transcription. | |
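
A client could branch on these status codes along the following lines. This is only a sketch: `classify_error` is a hypothetical helper, and treating server errors as retryable while treating 400 and 401 as permanent is an assumption about sensible client behaviour, not something the API specifies.

```python
def classify_error(status_code):
    """Map an HTTP status code to a (category, retryable) pair."""
    if status_code == 400:
        # Malformed request: fix the parameters before sending again.
        return ("bad_request", False)
    if status_code == 401:
        # Missing or invalid API key: retrying unchanged will not help.
        return ("unauthorized", False)
    if status_code >= 500:
        # Assumption: server-side failures may be transient.
        return ("server_error", True)
    return ("unexpected", False)


for code in (400, 401, 500):
    print(code, classify_error(code))
```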
