No matter where you start, build and scale your AI with ByteCompute.
All the categories and models you can try out and seamlessly integrate into your projects.

automatic-speech-recognition
Whisper is a state-of-the-art model for automatic speech recognition (ASR) and speech translation, proposed in the paper "Robust Speech Recognition via Large-Scale Weak Supervision" by Alec Radford et al. from OpenAI. Trained on >5M hours of labeled data, Whisper demonstrates a strong ability to generalise to many datasets and domains in a zero-shot setting. Whisper large-v3-turbo is a finetuned version of a pruned Whisper large-v3. In other words, it is the same model except that the number of decoder layers has been reduced from 32 to 4. As a result, the model is significantly faster, at the cost of a minor degradation in quality.
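A minimal sketch of running this model through the Hugging Face transformers ASR pipeline (the model id "openai/whisper-large-v3-turbo" and the file name "sample.wav" are assumptions for illustration, not ByteCompute-specific values):

```python
MODEL_ID = "openai/whisper-large-v3-turbo"  # assumed Hugging Face model id

def build_asr(device="cpu"):
    # Lazy import so the helper can be defined without transformers installed.
    from transformers import pipeline

    # chunk_length_s windows long audio into 30-second segments for
    # long-form transcription.
    return pipeline(
        "automatic-speech-recognition",
        model=MODEL_ID,
        chunk_length_s=30,
        device=device,
    )

if __name__ == "__main__":
    asr = build_asr()
    result = asr("sample.wav")  # placeholder audio file
    print(result["text"])
```

The pipeline downloads model weights on first use, so a GPU (`device=0`) is advisable for anything beyond short clips.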

TEXT
Qwen3-VL-235B-A22B-Instruct-AWQ

TEXT
MiniMax-M2.5-NVFP4

VIDEO
LTX-2.3 is a 22B-parameter DiT-based audio-video foundation model designed to generate synchronized video and audio within a single model.

TEXT
Qwen3.5-27B-FP8

IMAGE
A fast text-to-image model optimized for rapid image generation. FLUX.1 [schnell] delivers high-quality visual results with low latency, making it ideal for real-time creative workflows, quick prototyping, and interactive image generation.

IMAGE
The FLUX.2 [klein] models are our fastest image models to date. FLUX.2 [klein] unifies generation and editing in a single compact architecture, delivering state-of-the-art quality with end-to-end inference in under a second.

AUDIO
Higgs Audio V2.5 is a 1B-parameter autoregressive audio transformer distilled from the 3B V2 model, featuring the DualFFN architecture for efficient acoustic token modeling. It uses a unified audio tokenizer running at 25 FPS with 12 codebooks at 2000 bps, outputting 24 kHz audio. It was trained on 10M+ hours of audio data (the AudioVerse dataset) with GRPO alignment for naturalness.
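The tokenizer figures above imply some useful back-of-the-envelope rates; a minimal sketch (the derived numbers are our own arithmetic, not taken from the model card):

```python
# Figures stated in the Higgs Audio V2.5 description.
FRAME_RATE = 25        # tokenizer frames per second
CODEBOOKS = 12         # codebooks per frame
BITRATE = 2000         # total bits per second
SAMPLE_RATE = 24_000   # output audio sample rate (Hz)

# Discrete codes the model must predict per second of audio.
codes_per_second = FRAME_RATE * CODEBOOKS            # 300

# Average bits carried by each codebook entry.
bits_per_code = BITRATE / codes_per_second           # ~6.67

# Compression relative to 16-bit PCM at the same sample rate.
pcm_bitrate = SAMPLE_RATE * 16                       # 384,000 bps
compression_ratio = pcm_bitrate / BITRATE            # 192x

print(codes_per_second, round(bits_per_code, 2), compression_ratio)
```

So the language-model side only has to emit 300 discrete tokens per second, while the tokenizer absorbs a roughly 192x compression versus raw 16-bit PCM.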

automatic-speech-recognition
Higgs-Audio-v3-Speech-to-Text is a high-performance automatic speech recognition (ASR) model developed by BosonAI. Built on a 1.7B-parameter architecture, it delivers accurate transcription across 60+ languages through an OpenAI Whisper-compatible API.
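Because the model exposes an OpenAI Whisper-compatible API, the standard OpenAI Python client can be pointed at it. A hedged sketch: the base URL, model name, and file path below are placeholders, not documented ByteCompute values.

```python
BASE_URL = "https://api.example.com/v1"   # placeholder endpoint
MODEL = "higgs-audio-v3-speech-to-text"   # placeholder model name

def transcribe(path, api_key):
    # Lazy import; requires the `openai` package at call time.
    from openai import OpenAI

    # Reusing the OpenAI client against a Whisper-compatible endpoint.
    client = OpenAI(base_url=BASE_URL, api_key=api_key)
    with open(path, "rb") as f:
        resp = client.audio.transcriptions.create(model=MODEL, file=f)
    return resp.text

if __name__ == "__main__":
    print(transcribe("sample.wav", "YOUR_API_KEY"))
```

Compatibility at the API level means existing Whisper integrations can switch models by changing only the base URL and model name.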
Contact our sales team to discuss your enterprise needs and deployment options.
Get Started