End-to-end platform for developing your AI applications

No matter where you start, build and scale your AI with ByteCompute.

Explore AI Models Directory

All categories and models you can try out and seamlessly integrate into your projects

openai/whisper-large-v3-turbo (Featured)

automatic-speech-recognition

Whisper is a state-of-the-art model for automatic speech recognition (ASR) and speech translation, proposed in the paper "Robust Speech Recognition via Large-Scale Weak Supervision" by Alec Radford et al. from OpenAI. Trained on more than 5M hours of labeled data, Whisper generalises well to many datasets and domains in a zero-shot setting. Whisper large-v3-turbo is a fine-tuned version of a pruned Whisper large-v3: the same model, except that the number of decoder layers has been reduced from 32 to 4. As a result, it is much faster, at the cost of a minor degradation in quality.

Memory: - | Setting: - | Price: $0.0004 / min

Qwen3-VL-235B-A22B-Instruct-AWQ

TEXT

Memory: 235GB | Setting: 235B-A22B | Price: $0.0025 / 1k

MiniMax-M2.5-NVFP4

TEXT

Memory: - | Setting: -

Lightricks/LTX-2.3 (Featured)

LTX-2.3

VIDEO

LTX-2.3 is a 22B-parameter DiT-based audio-video foundation model designed to generate synchronized video and audio within a single model.

Memory: 32B | Setting: 22B | Price: $0.02/s, $0.04/s

Qwen3.5-27B-FP8

TEXT

Memory: - | Setting: - | Price: $0.295/M input tokens; $2.27/M output tokens

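As a rough illustration of how the per-token prices above translate into a bill, the sketch below estimates the cost of a single request. The function name `estimate_cost` and the example token counts are hypothetical; only the two rates come from the listing, and actual metering or rounding may differ.

```python
from decimal import Decimal

# Rates copied from the Qwen3.5-27B-FP8 listing (USD per million tokens).
INPUT_PRICE_PER_M = Decimal("0.295")
OUTPUT_PRICE_PER_M = Decimal("2.27")
TOKENS_PER_M = Decimal(1_000_000)


def estimate_cost(input_tokens: int, output_tokens: int) -> Decimal:
    """Return an estimated USD cost for one request (hypothetical helper)."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    # Decimal avoids binary floating-point drift when summing small amounts.
    input_cost = INPUT_PRICE_PER_M * input_tokens / TOKENS_PER_M
    output_cost = OUTPUT_PRICE_PER_M * output_tokens / TOKENS_PER_M
    return input_cost + output_cost


if __name__ == "__main__":
    # Example: a 2,000-token prompt with a 500-token completion.
    print(estimate_cost(2_000, 500))
```

Because output tokens cost roughly eight times as much as input tokens at these rates, long completions dominate the estimate far more than long prompts do.
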
flux1-schnell (Featured)

FLUX[schnell]

IMAGE

A fast text-to-image model optimized for rapid image generation. FLUX.1 [schnell] delivers high-quality visual results with low latency, making it ideal for real-time creative workflows, quick prototyping, and interactive image generation.

Memory: 25GB | Setting: 19B | Price: starting from $0.003 per image

flux2-klein-4b (Featured)

FLUX[klein]

IMAGE

The FLUX.2 [klein] model family is our fastest set of image models to date. FLUX.2 [klein] unifies generation and editing in a single compact architecture, delivering state-of-the-art quality with end-to-end inference that can finish in under a second.

Memory: 21GB | Setting: 4b | Price: starting from $0.003 per image

boson-audio-multimodal-checkpoint-1200 (Featured)

Higgs Audio V2.5

AUDIO

Higgs Audio V2.5 is a 1B-parameter autoregressive audio transformer distilled from the 3B V2 model, featuring the DualFFN architecture for efficient acoustic token modeling. It uses a unified audio tokenizer running at 25 FPS with 12 codebooks at 2000 bps and outputs 24 kHz audio. It was trained on 10M+ hours of audio data (the AudioVerse dataset), with GRPO alignment for naturalness.

Memory: - | Setting: 1B | Price: audio generation: $0.045 / audio min

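To put the tokenizer figures in the description above into perspective, the following sketch works through the arithmetic. It assumes that "2000 bps" means bits per second and that each frame carries exactly one token per codebook; neither assumption is confirmed by the listing, and the function names are hypothetical.

```python
# Figures quoted in the Higgs Audio V2.5 description.
FRAMES_PER_SECOND = 25
CODEBOOKS = 12
BITRATE_BPS = 2000  # assumed to mean bits per second


def tokens_per_second() -> int:
    """Tokens emitted per second, assuming one token per codebook per frame."""
    return FRAMES_PER_SECOND * CODEBOOKS


def bits_per_token() -> float:
    """Average number of bits each token carries at the quoted bitrate."""
    return BITRATE_BPS / tokens_per_second()


def tokens_for_audio(seconds: float) -> int:
    """Approximate token count for a clip of the given duration."""
    return round(seconds * tokens_per_second())


if __name__ == "__main__":
    print(tokens_per_second())   # 25 frames x 12 codebooks
    print(bits_per_token())
    print(tokens_for_audio(60))  # tokens in one minute of audio
```

Under these assumptions, one minute of audio corresponds to 18,000 acoustic tokens, each carrying a little under 7 bits.
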
higgs-asr (Featured)

Higgs-Audio-v3-Speech-to-Text

automatic-speech-recognition

Higgs-Audio-v3-Speech-to-Text is a high-performance automatic speech recognition (ASR) model developed by BosonAI. Built on a 1.7B parameter architecture, it delivers accurate transcription across 60+ languages with an OpenAI Whisper-compatible API interface.

Memory: - | Setting: - | Price: $0.006 per minute
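
Since the listing describes the interface as OpenAI Whisper-compatible, a request would plausibly follow the shape of OpenAI's audio transcription endpoint: a multipart POST to `/audio/transcriptions` with a `model` field and the audio `file`. The sketch below only assembles such a request without sending it. The base URL is a placeholder, the use of the listing slug `higgs-asr` as the model identifier is an assumption, and `build_transcription_request` is a hypothetical helper; check the provider's documentation for the real values.

```python
from dataclasses import dataclass
from pathlib import Path

# Placeholder only; the real base URL must come from the provider's docs.
BASE_URL = "https://api.example.com/v1"


@dataclass(frozen=True)
class TranscriptionRequest:
    """The pieces of a Whisper-style multipart transcription request."""
    url: str
    headers: dict
    form_fields: dict
    file_path: Path


def build_transcription_request(
    audio_path: str,
    api_key: str,
    model: str = "higgs-asr",  # assumed model id, taken from the listing slug
) -> TranscriptionRequest:
    """Assemble (but do not send) an OpenAI-style transcription request."""
    if not api_key:
        raise ValueError("api_key must not be empty")
    return TranscriptionRequest(
        url=f"{BASE_URL}/audio/transcriptions",
        headers={"Authorization": f"Bearer {api_key}"},
        form_fields={"model": model},
        file_path=Path(audio_path),
    )


if __name__ == "__main__":
    request = build_transcription_request("meeting.wav", api_key="YOUR_API_KEY")
    print(request.url, request.form_fields)
```

Keeping request construction separate from the network call makes it easy to swap in whichever HTTP client you prefer and to point the same code at any other Whisper-compatible endpoint.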

Ready to Accelerate AI in Your Organization?

Contact our sales team to discuss your enterprise needs and deployment options.