
Whisper is a state-of-the-art model for automatic speech recognition (ASR) and speech translation, proposed in the paper "Robust Speech Recognition via Large-Scale Weak Supervision" by Alec Radford et al. from OpenAI. Trained on >5M hours of labeled data, Whisper demonstrates a strong ability to generalise to many datasets and domains in a zero-shot setting. Whisper large-v3-turbo is a fine-tuned version of a pruned Whisper large-v3: it is the same model, except that the number of decoder layers has been reduced from 32 to 4. As a result, the model is significantly faster, at the cost of a minor degradation in quality.
This document specifies the API for transcribing audio files using a hosted Whisper model, mimicking the OpenAI Whisper API. The endpoint handles multipart form data for both file uploads and URL-based audio transcription.
| Method | URL | Summary |
|---|---|---|
| POST | /v1/audio/transcriptions | Transcribe audio using the Whisper model. |
The API uses Bearer Token authentication via the Authorization header. If an API_KEY is set on the server, a valid token must be provided.
| Header | Example | Description |
|---|---|---|
| Authorization | Bearer sk-xxxxxxxxxxxxxxxxxxxxxxxx | The API_KEY provided by the server. |
| x-request-id | UUID_string | An optional unique identifier for the request, for logging and tracking. If not provided, the server will generate one. |
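The two headers above can be assembled with a small helper. The following is a minimal sketch in Python; the function name is illustrative, and generating the `x-request-id` client-side simply mirrors what the server does when the header is absent:

```python
import uuid

def build_headers(api_key, request_id=None):
    """Build the Authorization and x-request-id headers for a transcription call.

    If no request id is supplied, generate a UUID locally so the request can
    still be traced end to end.
    """
    return {
        "Authorization": f"Bearer {api_key}",
        "x-request-id": request_id or str(uuid.uuid4()),
    }

headers = build_headers("sk-xxxxxxxxxxxxxxxxxxxxxxxx")
```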
The request must be of type multipart/form-data. It requires a file (or a URL) and a model name, and accepts several optional parameters.
| Parameter | Type | Required | Description |
|---|---|---|---|
| file | File or string | Yes | The audio file to transcribe. It can be a direct file upload or a URL to an audio file. |
| model | string | Yes | The name of the transcription model. Must match the model name served by the API (e.g., "openai/whisper-large-v3-turbo"). |
| response_format | string | No | The format of the response. Supported formats are json, text, srt, vtt, and verbose_json. Defaults to json. |
| temperature | number | No | A value from 0.0 to 1.0 that controls randomness. Defaults to 0.0. |
| language | string | No | The language of the audio to assist with transcription. |
| prompt | string | No | An optional initial prompt to guide the model. |
| condition_on_previous_text | boolean | No | Whether to condition the transcription on previous text. Defaults to true. |
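Client-side validation of these parameters can catch 400-class errors before a request is sent. The sketch below is illustrative (the helper name and the choice to pre-validate are assumptions, not part of the API); the allowed values come from the table above:

```python
# Allowed values taken from the parameter table above.
ALLOWED_FORMATS = {"json", "text", "srt", "vtt", "verbose_json"}

def build_form_fields(model, response_format="json", temperature=0.0,
                      language=None, prompt=None, condition_on_previous_text=True):
    """Validate the optional parameters and return the form fields as strings."""
    if response_format not in ALLOWED_FORMATS:
        raise ValueError(f"Unsupported response_format: {response_format!r}")
    if not 0.0 <= temperature <= 1.0:
        raise ValueError("temperature must be between 0.0 and 1.0")
    fields = {
        "model": model,
        "response_format": response_format,
        "temperature": str(temperature),
        "condition_on_previous_text": str(condition_on_previous_text).lower(),
    }
    if language:
        fields["language"] = language
    if prompt:
        fields["prompt"] = prompt
    return fields

fields = build_form_fields("openai/whisper-large-v3-turbo",
                           response_format="verbose_json")
```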
```shell
curl -s \
  -H "Authorization: Bearer $API_KEY" \
  -F "model=openai/whisper-large-v3-turbo" \
  -F "file=@/path/to/your/audio.mp3" \
  -F "response_format=verbose_json" \
  "https://api.bytecompute.ai/v1/audio/transcriptions"
```
```shell
curl -s \
  -H "Authorization: Bearer $API_KEY" \
  -F "model=openai/whisper-large-v3-turbo" \
  -F "file=https://cdn-global.hellotalk8.com/ht-global-1312929133/mmnt/2/250418/1/148239271/0/0/10e8a25a44ad37efd36155c1f447b0b5.aac" \
  -F "response_format=verbose_json" \
  "https://gateway.staging.oke-us-1.bytecompute.ai/v1/audio/transcriptions"
```
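The same request can be built without curl. Below is a minimal stdlib-only Python sketch that assembles the multipart body by hand; the function name, the placeholder key, and the example audio URL are all illustrative. It builds the request but deliberately does not send it:

```python
import urllib.request
import uuid

def build_transcription_request(endpoint, api_key, audio_url, model,
                                response_format="verbose_json"):
    """Build (but do not send) a multipart/form-data POST matching the curl
    examples above; the audio is passed by URL in the 'file' field."""
    boundary = uuid.uuid4().hex
    fields = {"model": model, "file": audio_url,
              "response_format": response_format}
    parts = []
    for name, value in fields.items():
        parts.append(
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
            f"{value}\r\n"
        )
    body = ("".join(parts) + f"--{boundary}--\r\n").encode()
    return urllib.request.Request(
        endpoint,
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
        method="POST",
    )

# The key and audio URL below are placeholders.
req = build_transcription_request(
    "https://api.bytecompute.ai/v1/audio/transcriptions",
    "sk-xxxxxxxxxxxxxxxxxxxxxxxx",
    "https://example.com/audio.aac",
    "openai/whisper-large-v3-turbo",
)
# urllib.request.urlopen(req) would send it.
```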
200 OK
json format: Returns a JSON object with a single text field.

```json
{"text": "This is a good story, I like it very much."}
```

text format: Returns a plain text string.

```
This is a good story, I like it very much.
```
srt or vtt format: Returns a plain text string in the specified subtitle format.
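For reference, SRT cues can be reconstructed from verbose_json segments as well. A sketch (helper names are illustrative, not part of the API):

```python
def srt_timestamp(seconds):
    """Format an offset in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    ms = round(seconds * 1000)
    h, rem = divmod(ms, 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, ms = divmod(rem, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

def segments_to_srt(segments):
    """Render verbose_json-style segments as an SRT document."""
    blocks = []
    for i, seg in enumerate(segments, start=1):
        blocks.append(
            f"{i}\n"
            f"{srt_timestamp(seg['start'])} --> {srt_timestamp(seg['end'])}\n"
            f"{seg['text'].strip()}\n"
        )
    return "\n".join(blocks)

srt = segments_to_srt([{"start": 0.0, "end": 2.5,
                        "text": " This is a good story, I like it very much."}])
```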
verbose_json format: Returns a detailed JSON object.
```json
{
  "task": "transcribe",
  "language": "english",
  "text": "This is a good story, I like it very much.",
  "segments": [
    {
      "start": 0.0,
      "end": 2.5,
      "text": " This is a good story, I like it very much."
    }
  ],
  "duration": 2.5,
  "usage": {
    "type": "duration",
    "seconds": 2.5
  }
}
```
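A verbose_json response can be consumed with plain `json.loads`. The helper below is a sketch for extracting the billed duration; it assumes (not confirmed by the spec) that `usage.seconds` and the top-level `duration` field are interchangeable when `usage.type` is "duration":

```python
import json

def billed_seconds(payload):
    """Return the billed audio duration, preferring usage.seconds and
    falling back to the top-level duration field."""
    usage = payload.get("usage") or {}
    if usage.get("type") == "duration":
        return float(usage["seconds"])
    return float(payload["duration"])

# The sample response from the docs above.
payload = json.loads(
    '{"task": "transcribe", "language": "english",'
    ' "text": "This is a good story, I like it very much.",'
    ' "segments": [{"start": 0.0, "end": 2.5,'
    ' "text": " This is a good story, I like it very much."}],'
    ' "duration": 2.5, "usage": {"type": "duration", "seconds": 2.5}}'
)
```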
stream format: Returns a Server-Sent Event (SSE) stream. Each event contains a JSON object for a segment or word. The stream concludes with a final event containing usage information and a [DONE] message.
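A minimal consumer for such a stream only needs to split off the `data:` prefix and stop at the `[DONE]` sentinel. The sketch below assumes each event is a single `data:` line (the event payloads shown are illustrative, not taken from a real response):

```python
import json

def parse_sse_events(lines):
    """Yield decoded JSON payloads from SSE 'data:' lines, stopping at [DONE]."""
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank keep-alives and non-data fields
        data = line[5:].strip()
        if data == "[DONE]":
            break
        yield json.loads(data)

sample = [
    'data: {"start": 0.0, "end": 2.5, "text": " This is a good story, I like it very much."}',
    'data: {"usage": {"type": "duration", "seconds": 2.5}}',
    "data: [DONE]",
]
events = list(parse_sse_events(sample))
```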
| Status Code | Description | Detail |
|---|---|---|
| 400 Bad Request | The request is malformed or invalid. | "No header part in the request", "No file or url in the request", "Invalid 'model'", or "Unsupported response_format". |
| 401 Unauthorized | The API key is missing or invalid. | "Authorization header is missing or invalid." or "Invalid API key." |
| 500 Internal Server Error | A server-side issue occurred during transcription. | |
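Clients typically map these statuses to a single exception type. A minimal sketch (the exception class and helper are illustrative, not provided by the API):

```python
class TranscriptionError(Exception):
    """Raised for any of the documented non-200 statuses."""
    def __init__(self, status, detail=""):
        super().__init__(f"{status}: {detail}" if detail else str(status))
        self.status = status
        self.detail = detail

def check_response(status, detail=""):
    """Pass through on 200 OK; raise TranscriptionError otherwise."""
    if status != 200:
        raise TranscriptionError(status, detail)
```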