UltraSafe ASR

API Reference

Real-time, production-grade speech-to-text. Stream microphone input over WebSockets with sub-second latency, transcribe audio files, and translate speech — over a small, OpenAI-compatible REST + WebSocket surface. Plug it into any stack in minutes, no SDK required.

Prefer Python? The official SDK wraps every endpoint below with a fully-typed, async-friendly client — same model, same accuracy, same guarantees. Its full guide lives further down this same page.

Get an API key

You'll need an API key to use the API. Request one from your UltraSafe AI contact. Keys look like:

usf_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx

Keep your key secret — anyone with it can make transcription requests against your account. Don't commit it to source control or paste it into client-side code.

Base URL

All endpoints are served under a single base URL. For the current beta:

https://api-prod-usf.us.inc

For WebSocket endpoints (only /v1/audio/transcriptions/stream so far) replace https:// with wss://:

wss://api-prod-usf.us.inc

Authentication

Every authenticated REST request needs an API key in the Authorization header:

Authorization: Bearer usf_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx

The WebSocket endpoint accepts the key as a query-string parameter instead — see Real-time streaming below.

A few endpoints are public (no key needed): GET /health, GET /ready, and GET /v1/capabilities.

Quick start

# 1. Set your key + base URL once
export USF_API_KEY="usf_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
export USF_BASE_URL="https://api-prod-usf.us.inc"

# 2. Transcribe a file
curl -sS "$USF_BASE_URL/v1/audio/transcriptions" \
  -H "Authorization: Bearer $USF_API_KEY" \
  -F file=@audio.wav \
  -F model=usf-mini-asr

Response:

{
  "text": "Hello, this is a test of the UltraSafe ASR transcription service.",
  "duration": 4.13,
  "processing_time_s": 0.31,
  "model_inference_s": 0.142,
  "timing": {
    "total_s": 0.31,
    "model_inference_ms": 141.8,
    "audio_decode_ms": 1.0
  }
}

That's the whole flow for a basic call. The rest of this document covers every parameter and every endpoint in detail.


Endpoints at a glance

Method | Path | Auth | Status | Purpose
GET | /health | none | Live | Liveness — is the proxy alive?
GET | /ready | none | Live | Readiness — is upstream reachable?
GET | /v1/capabilities | none | Live | Feature flags and model info for this deployment
GET | /v1/models | bearer | Live | List available models
GET | /v1/models/{id} | bearer | Live | Inspect a single model
POST | /v1/audio/transcriptions | bearer | Live | Transcribe an audio file
POST | /v1/audio/translations | bearer | Live | Translate audio to English text
POST | /v1/audio/enhance | bearer | Under development | Standalone audio enhancement (denoise, BWE) with optional transcript — currently returns 503
WS | /v1/audio/transcriptions/stream | query-string key | Live | Real-time streaming transcription

Status legend. Live — fully working in production today. Under development — the endpoint is reachable and the request shape is stable, but the underlying capability is not yet enabled on the production deployment. Sections below flag any individual fields that share the same status.


Capabilities

Not every deployment has every feature enabled. Call this first to see what's available. Public — no key required.

curl -sS "$USF_BASE_URL/v1/capabilities"

Response (truncated):

{
  "model": { "id": "usf-mini-asr", "language": "en" },
  "features": {
    "vad": { "enabled": true },
    "noise_reduction": { "enabled": false, "level": "medium" },
    "audio_enhancement": { "enabled": false },
    "diarization": { "enabled": false, "method": "clustering" },
    "speaker_identification": { "enabled": false }
  },
  "streaming": { "websocket": true, "sse": false },
  "response_formats": ["json", "verbose_json", "text"]
}

If features.diarization.enabled is false, passing diarization=clustering to a transcription call will return a 400 error. Always check first.
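That check is easy to do programmatically before building a request. A minimal sketch, pure Python over the capabilities payload (`check_request` is an illustrative helper, not part of the API):

```python
def enabled_features(caps: dict) -> set:
    """Names of the features flagged enabled in a /v1/capabilities payload."""
    return {name for name, flag in caps.get("features", {}).items()
            if flag.get("enabled")}

def check_request(caps: dict, wants: set) -> list:
    """Requested features this deployment has disabled (these would 400)."""
    return sorted(wants - enabled_features(caps))

caps = {"features": {"vad": {"enabled": True},
                     "diarization": {"enabled": False, "method": "clustering"}}}
print(check_request(caps, {"vad", "diarization"}))  # ['diarization']
```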


Models

curl -sS "$USF_BASE_URL/v1/models" \
  -H "Authorization: Bearer $USF_API_KEY"

Response:

{
  "data": [
    {
      "id": "usf-mini-asr",
      "object": "model",
      "owned_by": "ultrasafe",
      "created": 1714000000
    }
  ],
  "object": "list"
}

Or fetch a single model by id:

curl -sS "$USF_BASE_URL/v1/models/usf-mini-asr" \
  -H "Authorization: Bearer $USF_API_KEY"

File transcription

POST /v1/audio/transcriptions — converts an audio file into text.

Request

Field | Type | Where | Default | Description
file | binary | multipart | required | The audio file. WAV, MP3, FLAC, M4A, OGG. Buffered into memory before sending.
model | string | multipart | usf-mini-asr | Model id (see /v1/models).
response_format | string | multipart | json | One of json, verbose_json, text.
language | string | multipart | auto-detect | ISO 639-1 code, e.g. en, es, fr.
prompt | string | multipart | (none) | Bias the decoder with domain words ("Medical terminology.").
temperature | number | multipart | 0 | Sampling temperature, 0 to 1.
timestamp_granularities | json array | multipart | (none) | Subset of ["segment", "word"] (JSON-encoded). Under development — the parameter is accepted but segments / words are not yet returned in the response.

Voice activity detection (when features.vad.enabled is true)

Field | Type | Default | Description
enable_vad | boolean | server | Force VAD on or off.
vad_threshold | number | 0.5 | Probability threshold for "speech."
vad_min_speech_duration_ms | int | 250 | Drop speech segments shorter than this.
vad_min_silence_duration_ms | int | 300 | Merge across silences shorter than this.

Audio enhancement (when features.audio_enhancement.enabled is true)

Under development: noise_reduction, enable_background_suppression, enable_voice_extraction, and enable_audio_upscale are listed as capabilities by /v1/capabilities but are not yet wired up on the production deployment. The fields are accepted by the API for forward compatibility — currently they have no effect. VAD (above) and diarization (below) are live.

Field | Type | Description
noise_reduction | string | off, low, medium, high.
enable_background_suppression | boolean | Drop background noise before ASR.
enable_voice_extraction | boolean | Isolate the dominant voice.
enable_audio_upscale | boolean | Bandwidth extension before ASR.

Diarization (when features.diarization.enabled is true)

Field | Type | Description
diarization | string | off, pyannote, clustering, spectral.
num_speakers | int | Hint: exact number of speakers.
min_speakers | int | Hint: minimum.
max_speakers | int | Hint: maximum.

Speaker separation

Field | Type | Description
enable_speaker_separation | boolean | Split overlapping speakers before ASR.
speaker_similarity_threshold | number | 0–1; lower = more aggressive splits.

Examples

Minimal:

curl -sS "$USF_BASE_URL/v1/audio/transcriptions" \
  -H "Authorization: Bearer $USF_API_KEY" \
  -F file=@audio.wav \
  -F model=usf-mini-asr

With language hint and a prompt:

curl -sS "$USF_BASE_URL/v1/audio/transcriptions" \
  -H "Authorization: Bearer $USF_API_KEY" \
  -F file=@meeting.wav \
  -F model=usf-mini-asr \
  -F language=en \
  -F prompt="Quarterly earnings call. Acme Corp." \
  -F response_format=verbose_json

With VAD enabled:

curl -sS "$USF_BASE_URL/v1/audio/transcriptions" \
  -H "Authorization: Bearer $USF_API_KEY" \
  -F file=@audio.wav \
  -F model=usf-mini-asr \
  -F response_format=verbose_json \
  -F enable_vad=true

Word- and segment-level timestamps are under development. The timestamp_granularities parameter is accepted, but the batch endpoint does not yet emit segments / words in the response. Per-segment timestamps are available today on the real-time streaming endpoint (the transcript event includes segment.start / segment.end / confidence).

Response (response_format=json)

{
  "text": "Hello, this is a test of the UltraSafe ASR transcription service.",
  "duration": 4.13,
  "processing_time_s": 0.31,
  "model_inference_s": 0.142,
  "timing": {
    "total_s": 0.31,
    "model_inference_ms": 141.8
  }
}

Response (response_format=verbose_json)

Adds detected language, audio duration, and timing breakdown:

{
  "task": "transcribe",
  "text": "Hello, this is a test of the UltraSafe ASR transcription service.",
  "language": "en",
  "duration": 4.13,
  "processing_time_s": 0.31,
  "model_inference_s": 0.142,
  "timing": {
    "total_s": 0.31,
    "model_inference_ms": 141.8
  }
}

segments and words are under development for the batch endpoint and will be added in a future release. If you need per-segment timestamps today, use the real-time streaming endpoint — every transcript event includes a segment block with start, end, and confidence.

Response (response_format=text)

Plain-text body, no JSON wrapping:

Hello, this is a test of the UltraSafe ASR transcription service.

Python (no SDK, just requests)

import os, requests

url = f"{os.environ['USF_BASE_URL']}/v1/audio/transcriptions"
headers = {"Authorization": f"Bearer {os.environ['USF_API_KEY']}"}

with open("audio.wav", "rb") as fh:
    r = requests.post(
        url,
        headers=headers,
        files={"file": ("audio.wav", fh, "application/octet-stream")},
        data={"model": "usf-mini-asr", "response_format": "json"},
        timeout=300,
    )
r.raise_for_status()
print(r.json()["text"])

JavaScript / Node 18+ (fetch + FormData)

import fs from "node:fs";

const form = new FormData();
form.append("file", new Blob([fs.readFileSync("audio.wav")]), "audio.wav");
form.append("model", "usf-mini-asr");
form.append("response_format", "json");

const r = await fetch(`${process.env.USF_BASE_URL}/v1/audio/transcriptions`, {
  method: "POST",
  headers: { Authorization: `Bearer ${process.env.USF_API_KEY}` },
  body: form,
});
if (!r.ok) throw new Error(`HTTP ${r.status}: ${await r.text()}`);
console.log((await r.json()).text);

Browser (fetch from a <input type="file">)

const file = document.querySelector("#audio").files[0];
const form = new FormData();
form.append("file", file);
form.append("model", "usf-mini-asr");

const r = await fetch("/api-proxy/v1/audio/transcriptions", { // call YOUR backend
  method: "POST",
  body: form,
});
const { text } = await r.json();

Don't put the API key in browser code. Front your API with your own backend that injects the Authorization header.


Translation

POST /v1/audio/translations — transcribes audio in any supported language and returns English text.

Same request shape as transcription, but the language parameter is ignored (the model auto-detects the source language).

curl -sS "$USF_BASE_URL/v1/audio/translations" \
  -H "Authorization: Bearer $USF_API_KEY" \
  -F file=@french_audio.wav \
  -F model=usf-mini-asr

Response:

{
  "text": "Hello everyone, welcome to today's meeting.",
  "duration": 5.12
}

Audio enhancement

Under development: POST /v1/audio/enhance is documented for reference but is not currently enabled on the production API — calls return 503 Service Unavailable until the upstream upscale service is turned on. For voice activity detection, use enable_vad on /v1/audio/transcriptions; for live VAD events, use the streaming endpoint (it emits speech_activity events).

POST /v1/audio/enhance — standalone audio enhancement pipeline. Returns enhanced audio (base64-encoded WAV by default) with optional transcripts of the before/after.

Field | Type | Description
file | binary | Source audio (multipart).
model | string | Model id; usf-mini-asr works for the default pipeline.
enable_denoise | boolean | Run the denoiser.
enable_bwe | boolean | Bandwidth extension (upscale narrowband audio to 16 kHz/24 kHz).
output_format | string | wav (default) or pcm.
output_sample_rate | int | Target sample rate of the enhanced audio.
include_transcription | boolean | Also transcribe the enhanced audio and return both texts.

curl -sS "$USF_BASE_URL/v1/audio/enhance" \
  -H "Authorization: Bearer $USF_API_KEY" \
  -F file=@noisy.wav \
  -F model=usf-mini-asr \
  -F enable_denoise=true \
  -F enable_bwe=true \
  -F include_transcription=true

Response:

{
  "audio_base64": "UklGRi…",
  "audio_format": "wav",
  "transcription": {
    "text": "Hello everyone."
  }
}

To save the enhanced audio:

curl -sS "$USF_BASE_URL/v1/audio/enhance" \
  -H "Authorization: Bearer $USF_API_KEY" \
  -F file=@noisy.wav \
  -F enable_denoise=true \
  -F enable_bwe=true \
  | jq -r .audio_base64 | base64 -d > clean.wav
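The same decode step in Python, as a sketch (the stub response below stands in for a real payload, since the endpoint currently returns 503; `save_enhanced` is an illustrative helper):

```python
import base64, pathlib

def save_enhanced(resp: dict, path: str) -> int:
    """Decode audio_base64 from an enhance response and write it to disk;
    returns the number of bytes written."""
    audio = base64.b64decode(resp["audio_base64"])
    pathlib.Path(path).write_bytes(audio)
    return len(audio)

# Stub payload; real responses carry a full RIFF/WAVE file here.
stub = {"audio_base64": base64.b64encode(b"RIFFxxxxWAVE").decode(),
        "audio_format": "wav"}
print(save_enhanced(stub, "clean.wav"))  # 12
```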

Real-time streaming

WSS /v1/audio/transcriptions/stream — open a WebSocket, push raw PCM audio chunks, and receive live transcripts as they're produced.

Connect URL

The streaming endpoint takes the API key in the query string (most browser/CLI WebSocket clients can't send custom headers easily).

wss://api-prod-usf.us.inc/v1/audio/transcriptions/stream
    ?api_key=usf_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
    &model=usf-mini-asr
    &sample_rate=16000
    &audio_format=pcm_s16le

Query param | Default | Description
api_key | required | Your usf_… key.
model | usf-mini-asr | Model id.
sample_rate | 16000 | Audio sample rate in Hz (8000–48000 typical).
audio_format | pcm_s16le | One of pcm_s16le, pcm_f32le, wav.

Privacy note: the API key appears in the URL, which means it'll show up in any reverse-proxy access log between you and the server. Treat your key as a secret and don't paste streaming URLs into shared logs / chat / etc.
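Assembling the connect URL with proper percent-encoding is a one-liner with the stdlib. A sketch over the query parameters above (the key and host are placeholders):

```python
from urllib.parse import urlencode

def stream_url(base_ws: str, api_key: str, model: str = "usf-mini-asr",
               sample_rate: int = 16000, audio_format: str = "pcm_s16le") -> str:
    """Build the WebSocket connect URL from the documented query params."""
    qs = urlencode({"api_key": api_key, "model": model,
                    "sample_rate": sample_rate, "audio_format": audio_format})
    return f"{base_ws}/v1/audio/transcriptions/stream?{qs}"

url = stream_url("wss://api-prod-usf.us.inc", "usf_xxx")
print(url)
```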

Frame protocol

  • Client → server: raw audio bytes (binary frames). Each frame is one chunk of PCM at the rate/format you negotiated in the URL. ~100 ms per chunk is a good default.
  • Server → client: JSON text frames, one event per frame, each tagged with a type field.
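The ~100 ms chunk size in bytes depends on the negotiated rate and sample format. A quick helper (assumes mono audio, as the examples below do):

```python
BYTES_PER_SAMPLE = {"pcm_s16le": 2, "pcm_f32le": 4}

def chunk_bytes(sample_rate: int, audio_format: str, chunk_ms: int = 100) -> int:
    """Bytes in one chunk_ms frame of mono PCM at the given rate/format."""
    return int(sample_rate * chunk_ms / 1000) * BYTES_PER_SAMPLE[audio_format]

print(chunk_bytes(16000, "pcm_s16le"))   # 3200
print(chunk_bytes(48000, "pcm_f32le"))   # 19200
```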

When you're done sending audio, send the JSON control message:

{ "type": "control", "action": "stop" }

The server finalises, emits a done event, and (a few hundred ms later) a higher-quality retranscribe_result event before closing the connection.

Event types

type | Sent when | Useful fields
ready | Immediately after the handshake. | (none)
speech_activity | Voice-activity detector flips on/off. | is_speech
transcript | A new interim or finalised segment. | is_final, segment.text, segment.start, segment.end
done | All audio processed (real-time pass complete). | response.text, response.duration
retranscribe_result | Best-quality full-context re-pass complete. | response.full_context_text
error | Something went wrong. | error.message, error.code

The lifecycle is normally:

ready
  ↓ (one or more)
speech_activity → transcript (is_final=false) → transcript (is_final=true)
  ↓
done
  ↓
retranscribe_result   ← prefer this for the final transcript
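Folding a completed event stream into one transcript can follow that preference order. A sketch over plain event dicts:

```python
def final_transcript(events: list) -> str:
    """Prefer the full-context re-pass; fall back to the real-time done event."""
    best = done = None
    for evt in events:
        if evt.get("type") == "retranscribe_result":
            best = evt["response"]["full_context_text"]
        elif evt.get("type") == "done":
            done = evt["response"]["text"]
    return best if best is not None else done

events = [
    {"type": "done", "response": {"text": "helo world"}},
    {"type": "retranscribe_result", "response": {"full_context_text": "hello world"}},
]
print(final_transcript(events))  # hello world
```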

Python (no SDK, raw websockets)

import asyncio, json, wave, websockets

URL = (
    "wss://api-prod-usf.us.inc/v1/audio/transcriptions/stream"
    "?api_key=usf_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
    "&model=usf-mini-asr&sample_rate=16000&audio_format=pcm_s16le"
)

async def stream(path: str) -> None:
    with wave.open(path, "rb") as w:
        sr, sw = w.getframerate(), w.getsampwidth()
        pcm = w.readframes(w.getnframes())

    async with websockets.connect(URL, max_size=None) as ws:
        chunk = int(sr * 0.1) * sw  # 100ms chunks
        for i in range(0, len(pcm), chunk):
            await ws.send(pcm[i:i+chunk])
        await ws.send(json.dumps({"type": "control", "action": "stop"}))

        async for raw in ws:
            evt = json.loads(raw)
            t = evt.get("type")
            if t == "transcript" and evt.get("segment", {}).get("is_final"):
                print("FINAL:", evt["segment"]["text"])
            elif t == "retranscribe_result":
                print("BEST:", evt["response"]["full_context_text"])
                break

asyncio.run(stream("audio.wav"))

JavaScript / Node (ws package)

import WebSocket from "ws";
import fs from "node:fs";

const url = `wss://api-prod-usf.us.inc/v1/audio/transcriptions/stream`
  + `?api_key=${process.env.USF_API_KEY}`
  + `&model=usf-mini-asr&sample_rate=16000&audio_format=pcm_s16le`;

const ws = new WebSocket(url);

ws.on("open", () => {
  const pcm = fs.readFileSync("audio.pcm"); // raw 16-bit mono PCM @ 16 kHz
  const chunkBytes = 16000 * 2 * 0.1; // 100ms
  for (let i = 0; i < pcm.length; i += chunkBytes) {
    ws.send(pcm.slice(i, i + chunkBytes));
  }
  ws.send(JSON.stringify({ type: "control", action: "stop" }));
});

ws.on("message", (raw) => {
  const evt = JSON.parse(raw.toString());
  if (evt.type === "retranscribe_result") {
    console.log("FINAL:", evt.response.full_context_text);
    ws.close();
  }
});

Health & readiness

curl -sS "$USF_BASE_URL/health"
# → {"status":"healthy"}

curl -sS "$USF_BASE_URL/ready"
# → {"status":"ready","upstream_status":200}

/health reflects the proxy itself; /ready additionally probes the upstream. Both are public (no key) and safe to use in load-balancer health checks.


Errors

The API returns conventional HTTP status codes and an OpenAI-style error envelope. Inspect the body to tell apart "you sent something wrong" from "the server is having a bad day."

HTTP | Meaning | Common causes
400 | Validation error | Unsupported model, missing file, bad query params, feature disabled in this deployment.
401 | Missing or invalid key | Wrong Authorization header, key disabled, key revoked.
403 | Forbidden | Auth header missing on a protected endpoint.
413 | Payload too large | File exceeds the deployment's per-request limit. Split it.
429 | Rate-limited | Slow down and retry with backoff.
5xx | Server-side failure | Upstream broken, GPU OOM, transient. Retry with exponential backoff.

Example error body:

{
  "detail": {
    "error": {
      "message": "The specified model is not available.",
      "type": "invalid_request_error",
      "param": "model",
      "code": "model_not_found"
    }
  }
}

For some endpoints the envelope is flatter ({"error": "..."}); always check the HTTP status code first, then look at the body for context.
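A parser that copes with both envelope shapes might look like this (a sketch, not part of any SDK):

```python
def error_message(body: dict) -> str:
    """Pull a human-readable message out of either error envelope."""
    detail = body.get("detail")
    if isinstance(detail, dict) and isinstance(detail.get("error"), dict):
        return detail["error"].get("message", str(body))  # nested envelope
    if isinstance(body.get("error"), str):
        return body["error"]                              # flat envelope
    return str(body)

nested = {"detail": {"error": {"message": "The specified model is not available.",
                               "type": "invalid_request_error"}}}
print(error_message(nested))                         # The specified model is not available.
print(error_message({"error": "upstream timeout"}))  # upstream timeout
```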

Suggested retry policy

  • Transport-level errors (TCP reset, TLS handshake failed, DNS): retry up to 3 times with exponential backoff (250 ms, 500 ms, 1 s).
  • HTTP 5xx: same retry budget. Use jitter to avoid thundering-herd retries when a brief upstream blip recovers.
  • HTTP 429: read Retry-After if present; otherwise back off ≥ 1 s.
  • HTTP 4xx other than 429: do not retry. Fix the request.
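The policy above condenses into two small functions (a sketch; the 100 ms jitter cap is an illustrative choice, not a server requirement):

```python
import random

def should_retry(status, attempt: int, max_attempts: int = 3) -> bool:
    """Retry transport failures (status None), 5xx, and 429; never other 4xx."""
    if attempt >= max_attempts:
        return False
    return status is None or status >= 500 or status == 429

def backoff_ms(attempt: int, base_ms: int = 250, jitter=random.random) -> float:
    """250 ms, 500 ms, 1 s, ... doubling, plus up to 100 ms of jitter."""
    return base_ms * 2 ** attempt + jitter() * 100

print(should_retry(503, attempt=0))  # True
print(should_retry(400, attempt=0))  # False
```

For 429 responses, check the Retry-After header first and fall back to this schedule only when it is absent.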

Rate limits

There is no hard rate limit during the beta; we rely on per-key quotas configured server-side. If you see sustained 429s, contact us and we'll raise your quota or investigate.

Privacy & retention

  • Audio is processed in-memory and not stored on the server.
  • Transcripts are not persisted.
  • Logs record user_label, IP, path, status, and duration on every request — no audio, no transcript content. We use these for billing and debugging.

OpenAI compatibility

The endpoints under /v1/audio/* are intentionally close to OpenAI's Whisper API surface (same paths, mostly the same form fields, very similar response shapes). Most existing OpenAI Whisper code can be pointed at this API by changing the base_url and api_key.

Differences worth knowing:

  • The streaming endpoint (/v1/audio/transcriptions/stream) is WebSocket-based, not SSE.
  • Some optional parameters (enable_vad, diarization, enable_denoise, …) are UltraSafe-specific and have no OpenAI equivalent.
  • model ids are different (usf-mini-asr, …).

Support

For API-key requests, increased quotas, or any issue with the API, contact your UltraSafe AI account manager.


Python SDK

BETA

The Python SDK is a typed convenience wrapper around the same endpoints documented above. It's currently in beta — public method signatures may still change before v2.0, so pin a specific version in your requirements. If you need a fully stable surface today, use the HTTP API directly.


The official Python client for the UltraSafe AI ASR API. Transcribe audio files, stream microphone audio in real time, and translate speech — all with a small, fully-typed client.

  • Synchronous REST client for file transcription, translation, and enhancement
  • Async WebSocket client for real-time streaming
  • Typed responses and a clean exception hierarchy
  • OpenAI-compatible under the hood

Get an API key

You'll need an API key to use this SDK. Request one from your UltraSafe AI account manager or sign in to the dashboard. Keys look like:

sk-usf-xxxxxxxxxxxxxxxxxxxxxxxxxxxx

Keep your key secret — anyone with it can make transcription requests against your account. Don't commit it to source control.

Installation

The SDK is published as a wheel served from a CloudFront-backed S3 bucket. Install (or upgrade) in one line:

pip install https://d3a9v9y9w3meag.cloudfront.net/wheels/1.1.0/ultrasafe_asr-1.1.0-py3-none-any.whl

If you've been sent the wheel file directly (e.g. for an air-gapped environment) or you built it locally from dist/, install from the path:

pip install ./ultrasafe_asr-1.1.0-py3-none-any.whl

Requires Python 3.9+. The only runtime dependencies are httpx and websockets, both installed automatically.

Usage

from ultrasafe_asr import UsfClient

client = UsfClient(
    api_key="sk-usf-...",                       # your API key
    base_url="https://api-prod-usf.us.inc",     # your ASR server URL
)

result = client.transcribe("audio.wav")
print(result.text)

Or configure via environment variables and skip the constructor arguments:

export USF_API_KEY="sk-usf-..."
export USF_BASE_URL="https://api-prod-usf.us.inc"

from ultrasafe_asr import UsfClient

client = UsfClient()
print(client.transcribe("audio.wav").text)

The client can be used as a context manager to ensure the HTTP connection is closed cleanly:

with UsfClient() as client:
    print(client.transcribe("audio.wav").text)

File transcription

result = client.transcribe(
    "audio.wav",
    model="usf-mini-asr",
    response_format="verbose_json",     # json | text | verbose_json
    language="en",
    prompt="Medical terminology.",
    temperature=0.0,
)

print(result.text)           # "Hello, this is a test..."
print(result.language)       # "en"
print(result.duration)       # 4.13 (seconds)
print(result.timing)         # {"total_s": 0.31, "model_inference_ms": 308.7, ...}
print(result.raw)            # full server payload

The file argument accepts a path (str / pathlib.Path) or an open binary file-like object:

with open("audio.wav", "rb") as fh:
    result = client.transcribe(fh)

Advanced transcription options

All optional server parameters are exposed as keyword arguments. Check client.capabilities() first — most of these require the feature to be enabled on the deployment.

result = client.transcribe(
    "audio.wav",
    # Voice-activity detection
    enable_vad=True,
    vad_threshold=0.5,
    vad_min_speech_duration_ms=250,
    vad_min_silence_duration_ms=300,
    # Enhancement
    noise_reduction="medium",            # off | low | medium | high
    enable_background_suppression=True,
    enable_voice_extraction=True,
    enable_audio_upscale=True,
    # Diarization
    diarization="clustering",            # off | pyannote | clustering | spectral
    num_speakers=2,
    # Speaker separation
    enable_speaker_separation=True,
    speaker_similarity_threshold=0.7,
)

Under development: The four enhancement parameters (noise_reduction, enable_background_suppression, enable_voice_extraction, enable_audio_upscale) are accepted by the API for forward compatibility but are not yet wired up on the production deployment — passing them today is a no-op. VAD and diarization are live.

Translation

Translate audio in any supported language to English text:

result = client.translate("french_audio.wav")
print(result.text)

Audio enhancement

Under development: client.enhance(...) calls POST /v1/audio/enhance, which currently returns 503 Service Unavailable on the production deployment. The SDK method below is documented for reference — it will start working as soon as the upstream upscale service is enabled. Until then, use VAD and diarization parameters on client.transcribe(...) for the closest in-pipeline equivalent.

Run the standalone enhancement pipeline. Returns enhanced audio (base64-encoded WAV) with an optional transcription of the before/after:

result = client.enhance(
    "noisy.wav",
    enable_denoise=True,
    enable_bwe=True,
    include_transcription=True,
)

result.save("cleaned.wav")
print(result.transcription.text if result.transcription else "(no transcription)")

Real-time streaming

Stream audio to the ASR server over a WebSocket and receive interim and final transcripts as they are produced. Uses asyncio.

import asyncio
from ultrasafe_asr import UsfClient, TranscriptEvent, RetranscribeEvent

async def main():
    client = UsfClient()
    async with client.stream() as session:
        async for event in session.send_file("audio.wav"):
            if isinstance(event, TranscriptEvent) and event.is_final:
                print("FINAL:", event.text)
            elif isinstance(event, RetranscribeEvent):
                print("Best-quality:", event.text)

asyncio.run(main())

Push your own PCM chunks (e.g. from a microphone):

async with client.stream(sample_rate=16000, audio_format="pcm_s16le") as session:
    for chunk in mic_chunks:             # bytes of 16-bit mono PCM
        await session.send_bytes(chunk)
    await session.stop()
    final_transcript = await session.final_text()
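The mic_chunks iterable above is whatever your capture layer produces. For testing without a microphone, a WAV file can be sliced into ~100 ms PCM chunks with the stdlib wave module (a sketch; assumes mono 16-bit audio):

```python
import io, wave

def pcm_chunks(wav_file, chunk_ms: int = 100):
    """Yield ~chunk_ms slices of raw PCM from a mono WAV path or file object."""
    with wave.open(wav_file, "rb") as w:
        frames_per_chunk = int(w.getframerate() * chunk_ms / 1000)
        while True:
            data = w.readframes(frames_per_chunk)
            if not data:
                break
            yield data

# Demo on an in-memory 0.5 s silent WAV (16 kHz, 16-bit mono).
buf = io.BytesIO()
with wave.open(buf, "wb") as w:
    w.setnchannels(1)
    w.setsampwidth(2)
    w.setframerate(16000)
    w.writeframes(b"\x00\x00" * 8000)
buf.seek(0)
chunks = list(pcm_chunks(buf))
print(len(chunks), len(chunks[0]))  # 5 3200
```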

Streaming event types

Event | Emitted when | Key fields
ReadyEvent | Immediately after the WebSocket handshake | (none)
SpeechActivityEvent | VAD detects speech start/stop | is_speech
TranscriptEvent | Interim or finalized transcript segment | is_final, text, committed_text, full_text
DoneEvent | All audio processed (real-time path) | text, duration
RetranscribeEvent | Best-quality full-context re-pass completed | text
ErrorEvent | Server-side error | message, code

Prefer RetranscribeEvent.text as the final transcript — it's a full-context re-pass with higher accuracy than the real-time DoneEvent.text.

Capabilities

Not every deployment has every feature enabled. Check before you build:

caps = client.capabilities()

if caps.feature_enabled("diarization"):
    result = client.transcribe("audio.wav", diarization="clustering")

print(caps.model)           # {"id": "usf-mini-asr", "language": "en"}
print(caps.streaming)       # {"websocket": True, "sse": False, ...}
print(caps.response_formats)   # ["json", "verbose_json", "text"]

Error handling

All SDK exceptions inherit from UsfError:

from ultrasafe_asr import (
    UsfClient,
    UsfError,
    AuthenticationError,
    RateLimitError,
    ValidationError,
    ServerError,
    UsfConnectionError,   # canonical name; preferred in new code
)

client = UsfClient()

try:
    client.transcribe("audio.wav")
except AuthenticationError:
    print("Invalid or missing API key.")
except RateLimitError:
    print("Capacity exceeded — back off and retry.")
except ValidationError as e:
    print(f"Bad request: {e}")
except ServerError as e:
    print(f"Server failure (HTTP {e.status_code}): {e}")
except UsfConnectionError as e:
    print(f"Network issue: {e}")
except UsfError as e:
    print(f"Other SDK error: {e}")

UsfConnectionError was introduced in 1.1.0. The older name ConnectionError still works as a backward-compatible alias, but it shadows the Python builtin of the same name — prefer UsfConnectionError in new code.

Configuration

Timeouts

The default per-request timeout is 300 seconds (generous, to accommodate long audio files). Override globally on the client:

client = UsfClient(timeout=60)           # 60 second timeout

Custom HTTP client

Pass your own httpx.Client to control proxies, TLS, retries, or connection pooling:

import httpx
from ultrasafe_asr import UsfClient

http = httpx.Client(
    proxy="http://proxy.internal:8080",   # httpx ≥ 0.26 name; proxies= was removed in 0.28
    transport=httpx.HTTPTransport(retries=3),
    timeout=httpx.Timeout(60.0, connect=5.0),
)

client = UsfClient(http_client=http)

When you pass your own http_client, the SDK will not close it for you — you own its lifecycle.

Environment variables

Variable | Effect
USF_API_KEY | Default API key. Used when api_key= is not passed explicitly.
USF_BASE_URL | Default base URL. Used when base_url= is not passed explicitly.

Requirements

  • Python 3.9+
  • httpx >= 0.27
  • websockets >= 12.0

Licence

Proprietary. See LICENSE.