API v1 — live now
Real-time, production-grade speech-to-text.
Stream microphone input over WebSockets with sub-second latency, transcribe audio files, and translate speech — all through a small, OpenAI-compatible REST + WebSocket API. Plug it into any stack in minutes, no SDK required.
Prefer Python? The official SDK wraps every endpoint with a fully-typed, async-friendly client.
Quickstart
Two env vars, one curl call. You're live.
# 1. Set your key + base URL once
export USF_API_KEY="<the key we sent you>"
export USF_BASE_URL="https://api-prod-usf.us.inc"
# 2. Transcribe a file
curl -sS "$USF_BASE_URL/v1/audio/transcriptions" \
-H "Authorization: Bearer $USF_API_KEY" \
-F file=@audio.wav \
-F model=usf-mini-asr
Everything you need, nothing in the way.
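The same request can be made from Python with nothing but the standard library. The multipart encoder below is our own sketch; the endpoint path, the `file` and `model` field names, and the `usf-mini-asr` model ID come from the curl call above.

```python
import io
import os
import urllib.request
import uuid


def encode_multipart(fields, file_field, filename, file_bytes):
    """Encode plain form fields plus one file as multipart/form-data."""
    boundary = uuid.uuid4().hex
    buf = io.BytesIO()
    for name, value in fields.items():
        buf.write(f"--{boundary}\r\n".encode())
        buf.write(
            f'Content-Disposition: form-data; name="{name}"\r\n\r\n{value}\r\n'.encode()
        )
    buf.write(f"--{boundary}\r\n".encode())
    buf.write(
        (
            f'Content-Disposition: form-data; name="{file_field}"; '
            f'filename="{filename}"\r\n'
            "Content-Type: application/octet-stream\r\n\r\n"
        ).encode()
    )
    buf.write(file_bytes)
    buf.write(f"\r\n--{boundary}--\r\n".encode())
    return boundary, buf.getvalue()


def transcription_request(base_url, api_key, audio_path, model="usf-mini-asr"):
    """Build the POST /v1/audio/transcriptions request (does not send it)."""
    with open(audio_path, "rb") as f:
        audio = f.read()
    boundary, body = encode_multipart(
        {"model": model}, "file", os.path.basename(audio_path), audio
    )
    return urllib.request.Request(
        f"{base_url}/v1/audio/transcriptions",
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )


# To actually send it:
# with urllib.request.urlopen(transcription_request(base, key, "audio.wav")) as resp:
#     print(resp.read().decode())
```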
A handful of REST endpoints and a single WebSocket — documented in detail, with copy-pastable examples in curl, Python, and JavaScript.
File & real-time transcription
Multipart uploads for batch jobs, raw WebSockets for live audio — same model, same accuracy across both paths.
Built for production
Hosted on dedicated GPUs in Stockholm with autoscaling. Typical end-to-end latency for a short utterance is 100–200 ms.
Per-tester keys, audited
Every key is scoped to one tester and revocable in seconds. Every request is logged with user, IP, route, and status.
Transcribe + translate
Two endpoints, one model: /v1/audio/transcriptions for source-language text, /v1/audio/translations for English.
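Because the two endpoints differ only in the final path segment, a tiny helper is enough to switch between source-language transcription and English translation (paths taken from the text above):

```python
def audio_endpoint(base_url: str, translate: bool = False) -> str:
    """Return the transcription or translation endpoint URL.

    /v1/audio/transcriptions -> text in the source language
    /v1/audio/translations   -> English text
    """
    path = "translations" if translate else "transcriptions"
    return f"{base_url.rstrip('/')}/v1/audio/{path}"
```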
VAD + diarization
Opt-in voice activity detection (file + streaming) and speaker diarization — each toggleable per request. Standalone audio enhancement is on the roadmap.
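Per-request toggles would be sent as extra form fields alongside `model`. The field names `vad` and `diarize` below are placeholders we invented for illustration — the real parameter names are in the API reference.

```python
def transcription_fields(model="usf-mini-asr", vad=False, diarize=False):
    """Form fields for a transcription request with opt-in features.

    NOTE: "vad" and "diarize" are hypothetical field names, not
    confirmed by this page; consult the endpoint docs for the real ones.
    """
    fields = {"model": model}
    if vad:
        fields["vad"] = "true"        # hypothetical field name
    if diarize:
        fields["diarize"] = "true"    # hypothetical field name
    return fields
```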
OpenAI-compatible
Endpoints under /v1/audio/* mirror the Whisper API. Point existing client code at this base URL and it usually just works.
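Drop-in compatibility also means existing response-parsing code keeps working. Assuming the default JSON response mirrors the Whisper API's shape (a top-level `"text"` field), extracting the transcript is one line:

```python
import json


def parse_transcription(body: bytes) -> str:
    """Extract the transcript from a Whisper-style JSON response.

    Assumes the OpenAI-compatible default response object,
    e.g. {"text": "..."} — verify against the API reference.
    """
    return json.loads(body)["text"]
```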
Start transcribing in 60 seconds.
Get an API key by email, copy the curl from the docs, and ship transcription into your product today. Real-time streaming, VAD, and diarization are one parameter away.
