API Reference
Real-time, production-grade speech-to-text. Stream microphone input over WebSockets with sub-second latency, transcribe audio files, and translate speech — over a small, OpenAI-compatible REST + WebSocket surface. Plug it into any stack in minutes, no SDK required.
Prefer Python? The official SDK wraps every endpoint below with a fully-typed, async-friendly client — same model, same accuracy, same guarantees. Its full guide lives further down this same page.
Get an API key
You'll need an API key to use the API. Request one from your UltraSafe AI contact. Keys look like:
usf_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
Keep your key secret — anyone with it can make transcription requests against your account. Don't commit it to source control or paste it into client-side code.
Base URL
All endpoints are served under a single base URL. For the current beta:
https://api-prod-usf.us.inc
For WebSocket endpoints (only /v1/audio/transcriptions/stream so far) replace
https:// with wss://:
wss://api-prod-usf.us.inc
Authentication
Every authenticated REST request needs an API key in the Authorization
header:
Authorization: Bearer usf_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
The WebSocket endpoint accepts the key as a query-string parameter instead — see Real-time streaming below.
A few endpoints are public (no key needed): GET /health,
GET /ready, and GET /v1/capabilities.
Quick start
# 1. Set your key + base URL once
export USF_API_KEY="usf_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
export USF_BASE_URL="https://api-prod-usf.us.inc"
# 2. Transcribe a file
curl -sS "$USF_BASE_URL/v1/audio/transcriptions" \
-H "Authorization: Bearer $USF_API_KEY" \
-F file=@audio.wav \
-F model=usf-mini-asr
Response:
{
"text": "Hello, this is a test of the UltraSafe ASR transcription service.",
"duration": 4.13,
"processing_time_s": 0.31,
"model_inference_s": 0.142,
"timing": {
"total_s": 0.31,
"model_inference_ms": 141.8,
"audio_decode_ms": 1.0
}
}
That's the whole flow for a basic call. The rest of this document covers every parameter and every endpoint in detail.
Endpoints at a glance
| Method | Path | Auth | Status | Purpose |
|---|---|---|---|---|
| GET | /health | none | Live | Liveness — is the proxy alive? |
| GET | /ready | none | Live | Readiness — is upstream reachable? |
| GET | /v1/capabilities | none | Live | Feature flags and model info for this deployment |
| GET | /v1/models | bearer | Live | List available models |
| GET | /v1/models/{id} | bearer | Live | Inspect a single model |
| POST | /v1/audio/transcriptions | bearer | Live | Transcribe an audio file |
| POST | /v1/audio/translations | bearer | Live | Translate audio to English text |
| POST | /v1/audio/enhance | bearer | Under development | Standalone audio enhancement (denoise, BWE) with optional transcript — currently returns 503 |
| WS | /v1/audio/transcriptions/stream | query-string key | Live | Real-time streaming transcription |
Status legend. Live — fully working in production today. Under development — the endpoint is reachable and the request shape is stable, but the underlying capability is not yet enabled on the production deployment. Sections below flag any individual fields that share the same status.
Capabilities
Not every deployment has every feature enabled. Call this first to see what's available. Public — no key required.
curl -sS "$USF_BASE_URL/v1/capabilities"
Response (truncated):
{
"model": { "id": "usf-mini-asr", "language": "en" },
"features": {
"vad": { "enabled": true },
"noise_reduction": { "enabled": false, "level": "medium" },
"audio_enhancement": { "enabled": false },
"diarization": { "enabled": false, "method": "clustering" },
"speaker_identification": { "enabled": false }
},
"streaming": { "websocket": true, "sse": false },
"response_formats": ["json", "verbose_json", "text"]
}
If features.diarization.enabled is false, passing diarization=clustering
to a transcription call will return a 400 error. Always check first.
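That check is cheap enough to run once at client startup. A minimal sketch in Python with requests — the helper names here are ours, not part of the API:

```python
import requests

def fetch_features(base_url):
    """GET /v1/capabilities (public, no key) and return the features map."""
    r = requests.get(f"{base_url}/v1/capabilities", timeout=10)
    r.raise_for_status()
    return r.json().get("features", {})

def transcription_fields(features):
    """Build optional multipart fields, sending only what this deployment supports."""
    fields = {"model": "usf-mini-asr"}
    if features.get("vad", {}).get("enabled"):
        fields["enable_vad"] = "true"
    if features.get("diarization", {}).get("enabled"):
        fields["diarization"] = "clustering"
    return fields
```

Gating every optional parameter on the flags means a deployment with diarization disabled never sees the field — and never returns the 400.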
Models
curl -sS "$USF_BASE_URL/v1/models" \
-H "Authorization: Bearer $USF_API_KEY"
{
"data": [
{
"id": "usf-mini-asr",
"object": "model",
"owned_by": "ultrasafe",
"created": 1714000000
}
],
"object": "list"
}
Or fetch a single model by id:
curl -sS "$USF_BASE_URL/v1/models/usf-mini-asr" \
-H "Authorization: Bearer $USF_API_KEY"
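The same calls in Python (requests assumed installed, env vars set as in the quick start; the helper names are ours):

```python
import os
import requests

def model_ids(listing):
    """Pull the ids out of a /v1/models response body."""
    return [m["id"] for m in listing.get("data", [])]

def list_models():
    """GET /v1/models and return the available ids."""
    base = os.environ["USF_BASE_URL"]
    headers = {"Authorization": f"Bearer {os.environ['USF_API_KEY']}"}
    r = requests.get(f"{base}/v1/models", headers=headers, timeout=10)
    r.raise_for_status()
    return model_ids(r.json())
```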
File transcription
POST /v1/audio/transcriptions — converts an audio file into text.
Request
| Field | Type | Where | Default | Description |
|---|---|---|---|---|
| file | binary | multipart | required | The audio file. WAV, MP3, FLAC, M4A, OGG. Buffered into memory before sending. |
| model | string | multipart | usf-mini-asr | Model id (see /v1/models). |
| response_format | string | multipart | json | One of json, verbose_json, text. |
| language | string | multipart | auto-detect | ISO 639-1 code, e.g. en, es, fr. |
| prompt | string | multipart | — | Bias the decoder with domain words ("Medical terminology."). |
| temperature | number | multipart | 0 | Sampling temperature, 0 to 1. |
| timestamp_granularities | json array | multipart | — | Subset of ["segment", "word"] (JSON-encoded). Under development — the parameter is accepted but segments / words are not yet returned in the response. |
Voice activity detection (when features.vad.enabled is true)
| Field | Type | Default | Description |
|---|---|---|---|
| enable_vad | boolean | server | Force VAD on or off. |
| vad_threshold | number | 0.5 | Probability threshold for "speech." |
| vad_min_speech_duration_ms | int | 250 | Drop speech segments shorter than this. |
| vad_min_silence_duration_ms | int | 300 | Merge across silences shorter than this. |
Audio enhancement (when features.audio_enhancement.enabled is true)
Under development
noise_reduction, enable_background_suppression, enable_voice_extraction, and enable_audio_upscale are listed as capabilities by /v1/capabilities but are not yet wired up on the production deployment. The fields are accepted by the API for forward compatibility — currently they have no effect. VAD (above) and diarization (below) are live.
| Field | Type | Description |
|---|---|---|
| noise_reduction | string | off, low, medium, high. |
| enable_background_suppression | boolean | Drop background noise before ASR. |
| enable_voice_extraction | boolean | Isolate the dominant voice. |
| enable_audio_upscale | boolean | Bandwidth extension before ASR. |
Diarization (when features.diarization.enabled is true)
| Field | Type | Description |
|---|---|---|
| diarization | string | off, pyannote, clustering, spectral. |
| num_speakers | int | Hint: exact number of speakers. |
| min_speakers | int | Hint: minimum. |
| max_speakers | int | Hint: maximum. |
Speaker separation
| Field | Type | Description |
|---|---|---|
| enable_speaker_separation | boolean | Split overlapping speakers before ASR. |
| speaker_similarity_threshold | number | 0–1; lower = more aggressive splits. |
Examples
Minimal:
curl -sS "$USF_BASE_URL/v1/audio/transcriptions" \
-H "Authorization: Bearer $USF_API_KEY" \
-F file=@audio.wav \
-F model=usf-mini-asr
With language hint and a prompt:
curl -sS "$USF_BASE_URL/v1/audio/transcriptions" \
-H "Authorization: Bearer $USF_API_KEY" \
-F file=@meeting.wav \
-F model=usf-mini-asr \
-F language=en \
-F prompt="Quarterly earnings call. Acme Corp." \
-F response_format=verbose_json
With VAD enabled:
curl -sS "$USF_BASE_URL/v1/audio/transcriptions" \
-H "Authorization: Bearer $USF_API_KEY" \
-F file=@audio.wav \
-F model=usf-mini-asr \
-F response_format=verbose_json \
-F enable_vad=true
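With diarization hints, sketched in Python. Check /v1/capabilities first — the deployment shown in the Capabilities example has diarization disabled, and the field would 400 there. The helper names are ours:

```python
import os
import requests

def diarization_fields(method="clustering", min_speakers=None, max_speakers=None):
    """Multipart fields for speaker labels; only send these when the
    deployment reports features.diarization.enabled == true."""
    fields = {"diarization": method}  # or "pyannote" / "spectral"
    if min_speakers is not None:
        fields["min_speakers"] = str(min_speakers)
    if max_speakers is not None:
        fields["max_speakers"] = str(max_speakers)
    return fields

def transcribe_meeting(path):
    """POST a file with speaker-count hints (2-4 speakers assumed here)."""
    url = f"{os.environ['USF_BASE_URL']}/v1/audio/transcriptions"
    headers = {"Authorization": f"Bearer {os.environ['USF_API_KEY']}"}
    data = {"model": "usf-mini-asr", "response_format": "verbose_json"}
    data.update(diarization_fields(min_speakers=2, max_speakers=4))
    with open(path, "rb") as fh:
        r = requests.post(url, headers=headers, data=data,
                          files={"file": (path, fh, "application/octet-stream")},
                          timeout=300)
    r.raise_for_status()
    return r.json()
```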
Word- and segment-level timestamps are under development. The timestamp_granularities parameter is accepted, but the batch endpoint does not yet emit segments / words in the response. Per-segment timestamps are available today on the real-time streaming endpoint (the transcript event includes segment.start / segment.end / confidence).
Response (response_format=json)
{
"text": "Hello, this is a test of the UltraSafe ASR transcription service.",
"duration": 4.13,
"processing_time_s": 0.31,
"model_inference_s": 0.142,
"timing": {
"total_s": 0.31,
"model_inference_ms": 141.8
}
}
Response (response_format=verbose_json)
Adds the task and the detected language to the json response:
{
"task": "transcribe",
"text": "Hello, this is a test of the UltraSafe ASR transcription service.",
"language": "en",
"duration": 4.13,
"processing_time_s": 0.31,
"model_inference_s": 0.142,
"timing": {
"total_s": 0.31,
"model_inference_ms": 141.8
}
}
segments and words are under development for the batch endpoint and will be added in a future release. If you need per-segment timestamps today, use the real-time streaming endpoint — every transcript event includes a segment block with start, end, and confidence.
Response (response_format=text)
Plain-text body, no JSON wrapping:
Hello, this is a test of the UltraSafe ASR transcription service.
Python (no SDK, just requests)
import os, requests

url = f"{os.environ['USF_BASE_URL']}/v1/audio/transcriptions"
headers = {"Authorization": f"Bearer {os.environ['USF_API_KEY']}"}

with open("audio.wav", "rb") as fh:
    r = requests.post(
        url,
        headers=headers,
        files={"file": ("audio.wav", fh, "application/octet-stream")},
        data={"model": "usf-mini-asr", "response_format": "json"},
        timeout=300,
    )
r.raise_for_status()
print(r.json()["text"])
JavaScript / Node 18+ (fetch + FormData)
import fs from "node:fs";

const form = new FormData();
form.append("file", new Blob([fs.readFileSync("audio.wav")]), "audio.wav");
form.append("model", "usf-mini-asr");
form.append("response_format", "json");

const r = await fetch(`${process.env.USF_BASE_URL}/v1/audio/transcriptions`, {
  method: "POST",
  headers: { Authorization: `Bearer ${process.env.USF_API_KEY}` },
  body: form,
});
if (!r.ok) throw new Error(`HTTP ${r.status}: ${await r.text()}`);
console.log((await r.json()).text);
Browser (fetch from a <input type="file">)
const file = document.querySelector("#audio").files[0];
const form = new FormData();
form.append("file", file);
form.append("model", "usf-mini-asr");

const r = await fetch("/api-proxy/v1/audio/transcriptions", { // call YOUR backend
  method: "POST",
  body: form,
});
const { text } = await r.json();
Don't put the API key in browser code. Front your API with your own backend that injects the Authorization header.
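A stdlib-only sketch of such a backend, assuming the /api-proxy path convention from the browser snippet above. This is an illustration, not a hardened proxy — add TLS, request-size limits, and path allow-listing before exposing it:

```python
import os
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

UPSTREAM = "https://api-prod-usf.us.inc"  # base URL from this page

def forward_headers(incoming):
    """Copy the browser's Content-Type (it carries the multipart boundary)
    and inject the server-side API key."""
    out = {"Authorization": f"Bearer {os.environ.get('USF_API_KEY', '')}"}
    if "Content-Type" in incoming:
        out["Content-Type"] = incoming["Content-Type"]
    return out

class Proxy(BaseHTTPRequestHandler):
    def do_POST(self):
        # Read the browser's multipart body and relay it verbatim upstream.
        body = self.rfile.read(int(self.headers.get("Content-Length", 0)))
        req = urllib.request.Request(
            UPSTREAM + "/v1/audio/transcriptions",
            data=body,
            headers=forward_headers(dict(self.headers)),
            method="POST",
        )
        with urllib.request.urlopen(req, timeout=300) as resp:
            payload = resp.read()
        self.send_response(resp.status)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(payload)

if __name__ == "__main__":
    HTTPServer(("127.0.0.1", 8080), Proxy).serve_forever()
```

The key only ever lives in the backend's environment; the browser talks to your origin and never sees a usf_… credential.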
Translation
POST /v1/audio/translations — transcribes audio in any supported language
and returns English text.
Same request shape as transcription, but the language parameter is ignored
(the model auto-detects the source language).
curl -sS "$USF_BASE_URL/v1/audio/translations" \
-H "Authorization: Bearer $USF_API_KEY" \
-F file=@french_audio.wav \
-F model=usf-mini-asr
{
"text": "Hello everyone, welcome to today's meeting.",
"duration": 5.12
}
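The same call from Python with requests; translate() mirrors the curl above, and both helper names are ours:

```python
import os
import requests

def endpoint(base, task):
    """Build an audio endpoint URL; task is 'transcriptions' or 'translations'."""
    return f"{base.rstrip('/')}/v1/audio/{task}"

def translate(path):
    """Translate any-language speech to English text."""
    url = endpoint(os.environ["USF_BASE_URL"], "translations")
    headers = {"Authorization": f"Bearer {os.environ['USF_API_KEY']}"}
    with open(path, "rb") as fh:
        r = requests.post(url, headers=headers,
                          files={"file": (path, fh, "application/octet-stream")},
                          data={"model": "usf-mini-asr"}, timeout=300)
    r.raise_for_status()
    return r.json()["text"]
```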
Audio enhancement
Under development
POST /v1/audio/enhance is documented for reference but is not currently enabled on the production API — calls return 503 Service Unavailable until the upstream upscale service is turned on. For voice activity detection, use enable_vad on /v1/audio/transcriptions; for live VAD events, use the streaming endpoint (it emits speech_activity events).
POST /v1/audio/enhance — standalone audio enhancement pipeline. Returns
enhanced audio (base64-encoded WAV by default) with optional transcripts of
the before/after.
| Field | Type | Description |
|---|---|---|
| file | binary | Source audio (multipart). |
| model | string | Model id; usf-mini-asr works for the default pipeline. |
| enable_denoise | boolean | Run the denoiser. |
| enable_bwe | boolean | Bandwidth extension (upscale narrowband audio to 16 kHz/24 kHz). |
| output_format | string | wav (default) or pcm. |
| output_sample_rate | int | Target sample rate of the enhanced audio. |
| include_transcription | boolean | Also transcribe the enhanced audio and return both texts. |
curl -sS "$USF_BASE_URL/v1/audio/enhance" \
-H "Authorization: Bearer $USF_API_KEY" \
-F file=@noisy.wav \
-F model=usf-mini-asr \
-F enable_denoise=true \
-F enable_bwe=true \
-F include_transcription=true
{
"audio_base64": "UklGRi…",
"audio_format": "wav",
"transcription": {
"text": "Hello everyone."
}
}
To save the enhanced audio:
curl -sS "$USF_BASE_URL/v1/audio/enhance" \
-H "Authorization: Bearer $USF_API_KEY" \
-F file=@noisy.wav \
-F enable_denoise=true \
-F enable_bwe=true \
| jq -r .audio_base64 | base64 -d > clean.wav
Real-time streaming
WSS /v1/audio/transcriptions/stream — open a WebSocket, push raw PCM
audio chunks, and receive live transcripts as they're produced.
Connect URL
The streaming endpoint takes the API key in the query string (most browser/CLI WebSocket clients can't send custom headers easily).
wss://api-prod-usf.us.inc/v1/audio/transcriptions/stream
?api_key=usf_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
&model=usf-mini-asr
&sample_rate=16000
&audio_format=pcm_s16le
| Query param | Default | Description |
|---|---|---|
| api_key | required | Your usf_… key. |
| model | usf-mini-asr | Model id. |
| sample_rate | 16000 | Audio sample rate in Hz (8000–48000 typical). |
| audio_format | pcm_s16le | One of pcm_s16le, pcm_f32le, wav. |
Privacy note: the API key appears in the URL, which means it'll show up in any reverse-proxy access log between you and the server. Treat your key as a secret and don't paste streaming URLs into shared logs / chat / etc.
Frame protocol
- Client → server: raw audio bytes (binary frames). Each frame is one chunk of PCM at the rate/format you negotiated in the URL. ~100 ms per chunk is a good default.
- Server → client: JSON text frames, one event per frame, each tagged with a type field.
When you're done sending audio, send the JSON control message:
{ "type": "control", "action": "stop" }
The server finalises, emits a done event, and (a few hundred ms later) a
higher-quality retranscribe_result event before closing the connection.
Event types
| type | Sent when | Useful fields |
|---|---|---|
| ready | Immediately after the handshake. | — |
| speech_activity | Voice-activity detector flips on/off. | is_speech |
| transcript | A new interim or finalised segment. | is_final, segment.text, segment.start, segment.end |
| done | All audio processed (real-time pass complete). | response.text, response.duration |
| retranscribe_result | Best-quality full-context re-pass complete. | response.full_context_text |
| error | Something went wrong. | error.message, error.code |
The lifecycle is normally:
ready
↓ (one or more)
speech_activity → transcript (is_final=false) → transcript (is_final=true)
↓
done
↓
retranscribe_result ← prefer this for the final transcript
Python (no SDK, raw websockets)
import asyncio, json, wave, websockets

URL = (
    "wss://api-prod-usf.us.inc/v1/audio/transcriptions/stream"
    "?api_key=usf_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
    "&model=usf-mini-asr&sample_rate=16000&audio_format=pcm_s16le"
)

async def stream(path: str) -> None:
    with wave.open(path, "rb") as w:
        sr, sw = w.getframerate(), w.getsampwidth()
        pcm = w.readframes(w.getnframes())

    async with websockets.connect(URL, max_size=None) as ws:
        chunk = int(sr * 0.1) * sw  # 100 ms chunks
        for i in range(0, len(pcm), chunk):
            await ws.send(pcm[i:i + chunk])
        await ws.send(json.dumps({"type": "control", "action": "stop"}))

        async for raw in ws:
            evt = json.loads(raw)
            t = evt.get("type")
            if t == "transcript" and evt.get("is_final"):
                print("FINAL:", evt["segment"]["text"])
            elif t == "retranscribe_result":
                print("BEST:", evt["response"]["full_context_text"])
                break

asyncio.run(stream("audio.wav"))
JavaScript / Node (ws package)
import WebSocket from "ws";
import fs from "node:fs";

const url = `wss://api-prod-usf.us.inc/v1/audio/transcriptions/stream`
  + `?api_key=${process.env.USF_API_KEY}`
  + `&model=usf-mini-asr&sample_rate=16000&audio_format=pcm_s16le`;

const ws = new WebSocket(url);

ws.on("open", () => {
  const pcm = fs.readFileSync("audio.pcm"); // raw 16-bit mono PCM @ 16 kHz
  const chunkBytes = 16000 * 2 * 0.1;       // 100 ms
  for (let i = 0; i < pcm.length; i += chunkBytes) {
    ws.send(pcm.slice(i, i + chunkBytes));
  }
  ws.send(JSON.stringify({ type: "control", action: "stop" }));
});

ws.on("message", (raw) => {
  const evt = JSON.parse(raw.toString());
  if (evt.type === "retranscribe_result") {
    console.log("FINAL:", evt.response.full_context_text);
    ws.close();
  }
});
Health & readiness
curl -sS "$USF_BASE_URL/health"
# → {"status":"healthy"}
curl -sS "$USF_BASE_URL/ready"
# → {"status":"ready","upstream_status":200}
/health reflects the proxy itself; /ready additionally probes the upstream.
Both are public (no key) and safe to use in load-balancer health checks.
Errors
The API returns conventional HTTP status codes and an OpenAI-style error envelope. Inspect the body to tell apart "you sent something wrong" from "the server is having a bad day."
| HTTP | Meaning | Common causes |
|---|---|---|
| 400 | Validation error | Unsupported model, missing file, bad query params, feature disabled in this deployment. |
| 401 | Missing or invalid key | Wrong Authorization header, key disabled, key revoked. |
| 403 | Forbidden | Auth header missing on a protected endpoint. |
| 413 | Payload too large | File exceeds the deployment's per-request limit. Split it. |
| 429 | Rate-limited | Slow down and retry with backoff. |
| 5xx | Server-side failure | Upstream broken, GPU OOM, transient. Retry with exponential backoff. |
Example error body:
{
"detail": {
"error": {
"message": "The specified model is not available.",
"type": "invalid_request_error",
"param": "model",
"code": "model_not_found"
}
}
}
For some endpoints the envelope is flatter ({"error": "..."}); always
check the HTTP status code first, then look at the body for context.
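A small helper that copes with both envelope shapes (pure illustration; the field names are taken from the examples above):

```python
def error_message(status, body):
    """Extract a human-readable message from either error envelope shape."""
    if isinstance(body, dict):
        # Nested OpenAI-style: {"detail": {"error": {"message": ...}}}
        detail = body.get("detail")
        if isinstance(detail, dict):
            err = detail.get("error")
            if isinstance(err, dict) and "message" in err:
                return err["message"]
        # Flat variant: {"error": "..."}
        if isinstance(body.get("error"), str):
            return body["error"]
    # Fall back to the status code when the body is empty or unrecognised.
    return f"HTTP {status}"
```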
Suggested retry policy
- Transport-level errors (TCP reset, TLS handshake failed, DNS): retry up to 3 times with exponential backoff (250 ms, 500 ms, 1 s).
- HTTP 5xx: same retry budget. Use jitter to avoid thundering-herd retries when a brief upstream blip recovers.
- HTTP 429: read Retry-After if present; otherwise back off ≥ 1 s.
- HTTP 4xx other than 429: do not retry. Fix the request.
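The policy above, sketched as two helpers (the constants are ours, matching the suggested 250 ms / 500 ms / 1 s schedule):

```python
import random

RETRYABLE = {429, 500, 502, 503, 504}

def backoff_delay(attempt, retry_after=None):
    """Exponential backoff with jitter: 0.25 s, 0.5 s, 1 s (+ up to 25% jitter).
    A Retry-After value, when present, takes precedence (min 1 s per the policy)."""
    if retry_after is not None:
        return max(retry_after, 1.0)
    base = 0.25 * (2 ** attempt)  # 0.25, 0.5, 1.0
    return base + random.uniform(0, base * 0.25)

def should_retry(status, attempt, max_attempts=3):
    """Retry only retryable statuses, and only within the budget."""
    return status in RETRYABLE and attempt < max_attempts
```

In a request loop: on a retryable status, sleep backoff_delay(attempt, retry_after) and try again; on anything else, surface the error body.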
Rate limits
There is no hard rate limit during the beta; we rely on per-key quotas
configured server-side. If you see sustained 429s, contact us and we'll
raise your quota or investigate.
Privacy & retention
- Audio is processed in-memory and not stored on the server.
- Transcripts are not persisted.
- Logs record user_label, IP, path, status, and duration on every request — no audio, no transcript content. We use these for billing and debugging.
OpenAI compatibility
The endpoints under /v1/audio/* are intentionally close to OpenAI's
Whisper API surface (same paths, mostly the same form fields, very similar
response shapes). Most existing OpenAI Whisper code can be pointed at this
API by changing the base_url and api_key.
Differences worth knowing:
- The streaming endpoint (/v1/audio/transcriptions/stream) is WebSocket-based, not SSE.
- Some optional parameters (enable_vad, diarization, enable_denoise, …) are UltraSafe-specific and have no OpenAI equivalent.
- model ids are different (usf-mini-asr, …).
Support
For API-key requests, increased quotas, or any issue with the API, contact your UltraSafe AI account manager.
