API reference / Models

Models

GRONINGEN · NL

One API, multiple capabilities. Chat, embeddings, image generation, and speech. All running on GPUs in Groningen.

Models

GET /v1/models

Lists all available models. Returns model IDs and their capabilities.

Chat models

Text generation and conversation. Use with /v1/chat/completions.

qwen

Qwen 3.6 35B (MoE, 3B active). Fast interactive model for conversations, analysis, and text generation.

  • ~85 tok/s
  • 128K context
  • A40 GPU

Recommended for most use cases.

gemma

Gemma 4 31B (Dense). Deep analysis model with strong reasoning. Best for complex tasks.

  • ~85 tok/s
  • 128K context
  • Blackwell GPU

Embedding models

Convert text to vectors for search, similarity, and RAG. Use with /v1/embeddings.

bge-m3

BGE-M3 multilingual embeddings. State-of-the-art for retrieval, supports 100+ languages including Dutch.

  • 1024 dimensions
  • 8K tokens max

Image generation

Generate images from text. Use with /v1/images/generations.

flux-schnell

FLUX Schnell. Fast generation (~2s per image) for rapid iteration and prototyping.

  • 1024×1024
  • ~2s

flux-dev

FLUX Dev. Higher quality output with more detail and better prompt adherence.

  • 1024×1024
  • ~8s

Speech to text

Transcription with speaker diarization. Use with /v1/audio/diarize.

whisperx

WhisperX with speaker diarization. Transcribes audio and identifies who said what. Supports Dutch and 90+ other languages.

  • MP3, WAV, FLAC
  • Speaker labels

Model aliases

Use short aliases or full model names interchangeably.

Alias Model
qwen Qwen/Qwen3.6-35B-A3B-FP8
qwen-fast Qwen/Qwen3.6-35B-A3B-FP8
gemma RedHatAI/gemma-4-31B-it-FP8-block
gemma-4 RedHatAI/gemma-4-31B-it-FP8-block
flux-schnell schnell
flux-dev dev
diarize whisperx

Need a different model? We can deploy additional models on request. Contact us at support@appelon.ai