Skip to content

Overview

Capabilities

  • ISM-based static and dynamic RIR simulation for 2D/3D shoebox rooms.
  • Directivity patterns (omni, cardioid, hypercardioid, subcardioid, bidir) with per-source/mic orientation handling.
  • Acoustic parameters via beta or t60 (Sabine), optional diffuse tail via tdiff.
  • Dynamic convolution via torchrir.signal.DynamicConvolver (trajectory or hop modes).
  • Explicit scene models via torchrir.models.StaticScene and torchrir.models.DynamicScene.
  • CPU/CUDA/MPS execution with optional torch.compile acceleration for ISM accumulation (when enabled; MPS disables LUT).
  • Standard array geometries (linear, circular, polyhedron, binaural, Eigenmike) and trajectory sampling utilities.
  • Dataset utilities (CMU ARCTIC, LibriSpeech, template stubs) plus DataLoader collate helpers. See Datasets for accepted options, directory layouts, and invalid-input handling.
  • Plotting utilities for static/dynamic scenes and GIF animation.
  • Metadata export helpers for time axis, DOA, array attributes, and trajectories (JSON-ready).
  • Explicit audio metadata I/O container via torchrir.io.AudioData (load_audio_data / save_audio_data).
  • Explicit split between WAV-only and backend-format audio I/O:
    • wav-only: torchrir.io.load_wav / save_wav / info_wav
    • backend-supported formats: torchrir.io.load_audio / save_audio / info_audio
  • Dataset examples can emit per-source reference audio (RIR-convolved premix) and record it in metadata.
  • Unified CLI example with JSON/YAML config and deterministic flag support.

Module layout

  • torchrir.sim: Simulation engines and configuration for RIR generation.
  • torchrir.signal: Signal processing utilities for static and dynamic RIR convolution.
  • torchrir.geometry: Geometry helpers for arrays, trajectories, and sampling.
  • torchrir.viz: Visualization helpers for scenes and trajectories.
  • torchrir.models: Core data models for rooms, sources, microphones, scenes, and results.
  • torchrir.io: I/O helpers for audio files and metadata serialization (wav-only load_wav/save_wav/info_wav, backend-format load_audio/save_audio/info_audio; explicit metadata-preserving audio I/O via torchrir.io.AudioData and torchrir.io.load_audio_data).
  • torchrir.util: General-purpose math, device, and tensor utilities for TorchRIR.
  • torchrir.logging: Logging configuration and helpers.
  • torchrir.config: Simulation configuration objects.
  • torchrir.datasets: Dataset helpers and collate utilities. See Datasets for practical usage guidance.
  • torchrir.experimental: Work-in-progress APIs (ray tracing, FDTD, template datasets).

Device selection

  • device="cpu": CPU execution
  • device="mps": Apple Silicon GPU via Metal (MPS) if available, otherwise fallback to CPU
  • device="cuda": CUDA execution (validated in CI on CUDA runners; requires a CUDA-enabled PyTorch environment)
  • device="auto": backend is selected by internal priority
from torchrir.util import DeviceSpec

device, dtype = DeviceSpec(device="auto").resolve()

Limitations and Failure Modes

  • Experimental ray tracing and FDTD simulators (torchrir.experimental) are placeholders and raise NotImplementedError.
  • Experimental dataset stubs (torchrir.experimental) are not implemented and raise NotImplementedError.
  • torchrir.models.Scene is deprecated; use StaticScene/DynamicScene.
  • DynamicScene normalizes tensor-like trajectories to tensors during initialization.
  • Scene.validate() does not re-emit deprecation warnings.
  • ISMSimulator raises ValueError if max_order or tmax conflicts with the provided SimulationConfig.
  • torchrir.load/save and torchrir.io.load/save/info are deprecated aliases.
  • torchrir.sim.simulate_rir/torchrir.sim.simulate_dynamic_rir require max_order (or torchrir.config.SimulationConfig.max_order) and either nsample or tmax.
  • Non-omni directivity requires orientation; mismatched shapes raise ValueError.
  • beta must have 4 (2D) or 6 (3D) elements; invalid sizes raise ValueError.
  • simulate_dynamic_rir requires src_traj and mic_traj to have matching time steps.
  • torchrir.signal.DynamicConvolver with 3D dynamic RIR input ((T, n_mic, rir_len)) is treated as single-source only; multi-source dynamic convolution must use 4D RIR input ((T, n_src, n_mic, rir_len)).
  • Dynamic simulation currently loops per time step; very long trajectories can be slow.
  • MPS disables the sinc LUT path (falls back to direct sinc), which can be slower and slightly different numerically.
  • HPF requires SciPy and currently applies filtering via CPU-domain processing, which can add host/device transfer overhead on CUDA/MPS runs.
  • Deterministic mode is best-effort; some backends may still be non-deterministic.
  • YAML configs require PyYAML; otherwise a ModuleNotFoundError is raised.
  • Downloading CMU ARCTIC requires network access when download=True; local preloaded datasets are supported with download=False.
  • Dataset option validation and error behavior are documented in Datasets.
  • GIF output requires Pillow (via Matplotlib's animation writer).
  • Dataclass models are frozen but hold mutable tensors (shallow immutability).

Specification (current)

Purpose

  • Provide room impulse response (RIR) simulation on PyTorch with CPU/CUDA/MPS support.
  • Support static and dynamic scenes with a maintainable, modern API.

Room model

  • Shoebox (rectangular) room model.
  • 2D or 3D.
  • Image Source Method (ISM) implementation.

Inputs

  • Room size: [Lx, Ly, Lz] (2D uses [Lx, Ly]).
  • Source positions: (n_src, dim).
  • Microphone positions: (n_mic, dim).
  • Reflection order: max_order.
  • Sample rate: fs.
  • Speed of sound: c (default 343.0 m/s).
  • Wall reflection coefficients: beta (4 faces for 2D, 6 for 3D) or t60 (Sabine).
  • Output length: nsample or tmax.

Outputs

  • Static RIR shape: (n_src, n_mic, nsample).
  • Dynamic RIR shape: (T, n_src, n_mic, nsample).
  • Preserves dtype/device.