End-to-end platform for developing your AI applications

No matter where you start, build and scale your AI with ByteCompute.

Explore AI Models Directory

All the categories and models you can try out and seamlessly integrate into your projects

Featured

openai/whisper-large-v3-turbo

automatic-speech-recognition

Whisper is a state-of-the-art model for automatic speech recognition (ASR) and speech translation, proposed in the paper "Robust Speech Recognition via Large-Scale Weak Supervision" by Alec Radford et al. from OpenAI. Trained on more than 5M hours of labeled data, Whisper generalizes well to many datasets and domains in a zero-shot setting. Whisper large-v3-turbo is a fine-tuned version of a pruned Whisper large-v3: the architecture is the same, except that the number of decoder layers has been reduced from 32 to 4. As a result, the model is much faster, at the cost of a minor degradation in quality.

$0.0004 / min

Qwen3-VL-235B-A22B-Instruct-AWQ

TEXT

235GB

235B-A22B

$0.0025 / 1k

MiniMax-M2.5-NVFP4

TEXT

Featured

LTX-2.3

VIDEO

LTX-2.3 is a 22B-parameter DiT-based audio-video foundation model designed to generate synchronized video and audio within a single model.

32B

22B

$0.02/s, $0.04/s

Qwen3.5-27B-FP8

TEXT

$0.295/M input tokens; $2.27/M output tokens
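
To make the split between input and output pricing concrete, here is a minimal sketch of a cost estimate for the rates listed above. It assumes simple linear billing per token with no minimum charge, rounding, or volume discount, none of which this page confirms. The function name `estimate_token_cost` and the sample workload are hypothetical.

```python
from decimal import Decimal

# Prices copied from the listing above (USD per one million tokens).
INPUT_PRICE_PER_M = Decimal("0.295")
OUTPUT_PRICE_PER_M = Decimal("2.27")
TOKENS_PER_M = Decimal(1_000_000)


def estimate_token_cost(input_tokens: int, output_tokens: int) -> Decimal:
    """Estimate a cost in USD, assuming linear per-token billing (an assumption)."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    input_cost = INPUT_PRICE_PER_M * input_tokens / TOKENS_PER_M
    output_cost = OUTPUT_PRICE_PER_M * output_tokens / TOKENS_PER_M
    return input_cost + output_cost


if __name__ == "__main__":
    # Hypothetical workload: 2M input tokens and 500k output tokens.
    print(estimate_token_cost(2_000_000, 500_000))
```

Because output tokens are priced roughly 7.7 times higher than input tokens here, workloads with long generations may cost noticeably more than prompt-heavy ones of the same total size.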

Featured

FLUX[schnell]

IMAGE

A fast text-to-image model optimized for rapid image generation. FLUX.1 [schnell] delivers high-quality visual results with low latency, making it ideal for real-time creative workflows, quick prototyping, and interactive image generation.

25GB

19B

Starting from $0.003 per image

Featured

FLUX[klein]

IMAGE

The FLUX.2 [klein] models are the fastest FLUX image models to date. FLUX.2 [klein] unifies generation and editing in a single compact architecture, delivering state-of-the-art quality with end-to-end inference that can complete in under a second.

21GB

4B

Starting from $0.003 per image

Featured

Higgs Audio V2.5

AUDIO

Higgs Audio V2.5 is a 1B-parameter autoregressive audio transformer distilled from the 3B V2 model, featuring the DualFFN architecture for efficient acoustic token modeling. It uses a unified audio tokenizer running at 25 FPS with 12 codebooks at 2000 bps and outputs 24 kHz audio. It was trained on 10M+ hours of audio data (the AudioVerse dataset) and aligned with GRPO for naturalness.

1B

Audio generation: $0.045 per audio minute

Featured

Higgs-Audio-v3-Speech-to-Text

automatic-speech-recognition

Higgs-Audio-v3-Speech-to-Text is a high-performance automatic speech recognition (ASR) model developed by BosonAI. Built on a 1.7B-parameter architecture, it delivers accurate transcription across 60+ languages through an OpenAI Whisper-compatible API.

$0.006 per minute
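
The two speech-recognition models in this directory are both priced per minute of audio, so their costs can be compared directly. The sketch below uses the per-minute rates listed on this page and assumes billing is linear in audio duration, with no rounding or minimum charge, which the page does not confirm. The function name `transcription_cost` and the 10-hour workload are hypothetical.

```python
from decimal import Decimal

# Per-minute prices as listed on this page (USD per audio minute).
ASR_PRICE_PER_MIN = {
    "openai/whisper-large-v3-turbo": Decimal("0.0004"),
    "Higgs-Audio-v3-Speech-to-Text": Decimal("0.006"),
}


def transcription_cost(model: str, audio_minutes: int) -> Decimal:
    """Estimate a transcription cost in USD, assuming linear per-minute billing."""
    if audio_minutes < 0:
        raise ValueError("audio_minutes must be non-negative")
    return ASR_PRICE_PER_MIN[model] * audio_minutes


if __name__ == "__main__":
    # Hypothetical workload: 10 hours (600 minutes) of audio.
    for name in ASR_PRICE_PER_MIN:
        print(name, transcription_cost(name, 600))
```

Price is only one factor: the listings describe different model sizes and language coverage, so the cheaper rate may not be the better fit for every workload.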

Ready to Accelerate AI in Your Organization?

Contact our sales team to discuss your enterprise needs and deployment options.
