A detailed technical overview describing the architecture, objectives, workflows, and downloadable documentation of the R2 Mechanics offline transcription system.
Per-project isolation with role-based access (owner/contributor/viewer) and optional 2FA/MFA. No subprocessors. All handovers are logged and signed for provenance.
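As an illustration of per-project, role-based access, the following minimal sketch maps the three roles named above to sets of allowed actions. The action names (`read`, `upload`, `manage_members`, `delete`), the `PERMISSIONS` table, and the `is_allowed` helper are hypothetical and chosen only for this example; the actual permission model is not described on this page.

```python
from enum import Enum


class Role(Enum):
    OWNER = "owner"
    CONTRIBUTOR = "contributor"
    VIEWER = "viewer"


# Hypothetical action sets per role; the real mapping is an assumption.
PERMISSIONS = {
    Role.OWNER: frozenset({"read", "upload", "manage_members", "delete"}),
    Role.CONTRIBUTOR: frozenset({"read", "upload"}),
    Role.VIEWER: frozenset({"read"}),
}


def is_allowed(memberships, project, user, action):
    """Return True if `user` may perform `action` inside `project`.

    `memberships` maps project name -> {user name -> Role}. A user with no
    role in a project is denied everything, which models project isolation.
    """
    role = memberships.get(project, {}).get(user)
    return role is not None and action in PERMISSIONS[role]


memberships = {
    "interviews-2025": {"anna": Role.OWNER, "ben": Role.VIEWER},
    "lectures": {"ben": Role.CONTRIBUTOR},
}
```

With this data, `is_allowed(memberships, "lectures", "anna", "read")` is false because the lookup is scoped to a single project: a role held in one project grants nothing in another.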
A defined lifecycle governs all materials: ingest → offline processing → review → delivery → retention/deletion. Retention windows (30, 60, or 90 days, or client-defined) and secure deletion on request support compliance with institutional data policies.
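A retention window of this kind can be expressed as a simple date calculation. The sketch below assumes that the retention clock starts at delivery, which is an assumption made for illustration; the `RetentionPolicy` class and its method names are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

# Standard windows named in the text; client-defined values are also allowed.
STANDARD_WINDOWS_DAYS = (30, 60, 90)


@dataclass(frozen=True)
class RetentionPolicy:
    days: int

    def __post_init__(self):
        if self.days <= 0:
            raise ValueError("retention window must be a positive number of days")

    def deletion_due(self, delivered_at: datetime) -> datetime:
        """Date on which the material becomes due for deletion.

        Assumption: the window is counted from the delivery timestamp.
        """
        return delivered_at + timedelta(days=self.days)

    def is_expired(self, delivered_at: datetime, now: datetime) -> bool:
        """True once `now` has reached the deletion date."""
        return now >= self.deletion_due(delivered_at)


policy = RetentionPolicy(days=30)
delivered = datetime(2025, 1, 1, tzinfo=timezone.utc)
print(policy.deletion_due(delivered).date())  # 2025-01-31
```

Using timezone-aware timestamps throughout avoids ambiguity when the deletion date is compared against log entries.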
Each project run records exact model versions (WhisperX large-v3, pyannote.audio 4.x), CUDA/Torch stack, configuration hashes and timestamps in a run manifest, guaranteeing deterministic re-runs and audit-grade traceability.
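A run manifest of the kind described here could look roughly like the following sketch. The field names, the `config_hash` and `build_manifest` helpers, and the example configuration keys are assumptions for illustration and do not reflect the actual manifest schema. The configuration is serialized canonically (sorted keys, fixed separators) so that the same settings always produce the same SHA-256 hash.

```python
import hashlib
import json
from datetime import datetime, timezone


def config_hash(config: dict) -> str:
    """SHA-256 over a canonical JSON form, independent of key order."""
    canonical = json.dumps(config, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def build_manifest(config: dict, versions: dict, started_at=None) -> dict:
    """Collect versions, configuration hash, and timestamp for one run."""
    started_at = started_at or datetime.now(timezone.utc)
    return {
        "started_at": started_at.isoformat(),
        "versions": dict(sorted(versions.items())),
        "config_sha256": config_hash(config),
    }


# Hypothetical example values; version strings are taken from the text above.
manifest = build_manifest(
    config={"language": "de", "diarization": True},
    versions={"whisperx_model": "large-v3", "pyannote.audio": "4.x"},
)
print(json.dumps(manifest, indent=2))
```

Comparing `config_sha256` between two manifests is a quick way to confirm that a re-run used identical settings.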
Inputs: WAV/BWF, FLAC, MP3, MP4/MKV/ProRes. Outputs: HTML, DOCX, Markdown (optional PDF or WARC). Supported languages: English (EN), German (DE), French (FR), and Polish (PL); others available on request.
All processing takes place exclusively on R2 Mechanics infrastructure within the European Union (Poland), with no data transfers to third parties or external cloud providers. Data Processing Agreements (AVV/DPA) and Technical & Organizational Measures (TOMs) are available upon request for institutional partners.
Quarterly releases ensure controlled evolution of the system. Each project version remains frozen until explicitly approved for upgrade, maintaining full reproducibility and audit continuity.
R2 Mechanics supports fully offline workflows and can operate under a Non-Disclosure Agreement (NDA) for sensitive or unpublished materials. All processing stages are verifiable and audit-ready, while the operational code remains private for security and integrity reasons.
All downloads and documentation are intended for transparency and auditability. No personal data is collected or processed via this site.
R2 Mechanics operates on a fully offline, modular AI infrastructure built for precision, transparency, and verifiable data sovereignty. Each processing step — from Nextcloud intake to HTML export — runs locally, air-gapped and telemetry-free.
The current architecture combines WhisperX (large-v3), pyannote.audio (4.x), and local LLM-based analysis (LM Studio / Ollama) within the r2_asr4 environment, powered by CUDA 12.x / Torch 2.4 for maximum GPU throughput, reproducibility, and long-term stability across platforms.
Our technology is not a product — it is the backbone of our service.
All projects begin within a secure Nextcloud environment. Integrity checks and metadata mapping verify input before processing. Optional intake forms pre-define project parameters and automatically trigger the offline workflow after encrypted upload.
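One common form of integrity check compares a checksum computed after upload against a checksum supplied with the file. The sketch below assumes SHA-256 and a client-provided expected digest; both are assumptions, since the page does not state which mechanism is used, and the function names are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large media files need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_upload(path: Path, expected_sha256: str) -> bool:
    """True if the file on disk matches the checksum supplied at intake."""
    return sha256_file(path) == expected_sha256.strip().lower()
```

A file that fails this check would be rejected before any processing starts, so a corrupted upload never reaches the transcription stage.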
GPU-accelerated WhisperX (large-v3) and pyannote.audio (4.x) perform transcription and diarization, followed by local LLM-based semantic analysis for topics, entities, summaries, and multilingual context layers — all within a version-locked environment ensuring full auditability and deterministic re-runs.
Structured, navigable outputs (HTML, DOCX, Markdown) include timestamps, speaker labels, and chapter-based navigation. Optional SDXL image generation enriches chapters with contextual visuals — entirely offline.
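To make the output structure concrete, the following sketch renders one chapter of diarized segments as Markdown with timestamps and speaker labels. The `Segment` type, the `HH:MM:SS` timestamp format, and the line layout are assumptions for illustration, not the actual export format.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    start: float   # segment start in seconds
    speaker: str   # diarization label, e.g. "SPEAKER_00"
    text: str


def format_timestamp(seconds: float) -> str:
    """Format seconds as HH:MM:SS, truncating fractional seconds."""
    total = int(seconds)
    hours, rem = divmod(total, 3600)
    minutes, secs = divmod(rem, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


def render_markdown(chapter_title: str, segments: list) -> str:
    """Render one chapter: a heading followed by one line per segment."""
    lines = [f"## {chapter_title}", ""]
    for seg in segments:
        stamp = format_timestamp(seg.start)
        lines.append(f"**[{stamp}] {seg.speaker}:** {seg.text}")
    return "\n".join(lines)


print(render_markdown("Introduction", [
    Segment(0.0, "SPEAKER_00", "Welcome."),
    Segment(3725.9, "SPEAKER_01", "Thank you."),
]))
```

The same segment list could feed HTML or DOCX writers, so that all formats share identical timestamps and speaker labels.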
Every run produces timestamped logs, WARC-standard archives, and reproducible performance reports — enabling institutional auditability and long-term provenance tracking.
Every system is energy-autonomous, powered by renewable sources with UPS-buffered redundancy for 24/7 operation. Components are modular, field-repairable, and designed for 10-year duty cycles.
Every system uses enterprise-grade NVMe storage for sustained throughput, low latency, and maximum reliability during continuous 24/7 workloads.
Engineering precision, energy independence, and data ethics — the foundation of R2 Mechanics.