Examples

Choosing a dynamic time reference

Motion	`time_reference`	RIR frame boundary
Moving sources, fixed microphones	`"emission"`	Input/emission time
Fixed sources, moving microphones	`"observation"`	Output/observation time
Moving sources and microphones	Not supported	Requires a two-time retarded propagation model
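
The table can be read as a small decision rule. The sketch below encodes it in a hypothetical helper, `choose_time_reference`, which is not part of TorchRIR. Returning `None` for a fully static scene is an assumption made here, since the table only covers scenes with motion.

```python
from typing import Optional


def choose_time_reference(sources_move: bool, mics_move: bool) -> Optional[str]:
    """Map a motion pattern to the time_reference value listed in the table.

    Hypothetical helper for illustration; not a TorchRIR function.
    """
    if sources_move and mics_move:
        # The table lists this combination as unsupported.
        raise ValueError(
            "moving sources and moving microphones are not supported; "
            "this case requires a two-time retarded propagation model"
        )
    if sources_move:
        return "emission"      # RIR frames switch on input/emission time
    if mics_move:
        return "observation"   # RIR frames switch on output/observation time
    return None                # assumption: a static scene needs no dynamic reference


print(choose_time_reference(sources_move=True, mics_move=False))   # emission
print(choose_time_reference(sources_move=False, mics_move=True))   # observation
```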

DynamicConvolver requires the time reference to be given explicitly. Frame boundaries are described separately by a FrameSchedule, built with uniform, fixed_hop, from_samples, or from_seconds. Attach the schedule as DynamicScene.schedule so that the resulting RIRResult provides it automatically, or omit it from the scene and pass it to convolve(..., schedule=...). A schedule made with from_seconds must use the same sample rate as the room. When an RIRResult is supplied, TorchRIR also rejects incompatible motion/time-reference combinations and conflicting schedules given on both the scene and the call. A raw RIR tensor carries no scene metadata, so those checks are unavailable. See Dynamic convolution time conventions for the equations and schedule contract.

Every dynamic example creates the schedule before its geometry. It derives progress = schedule.normalized_progress(stop_sample=...) and passes that explicit grid to linear_trajectory(..., progress=progress). Thus geometry frame i belongs to the exact boundary schedule.starts[i]; the nominal path endpoint at progress one is not applied one frame early. In observation-time examples, the last sampled microphone geometry is held through the convolution tail.
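
The relationship between frame boundaries and trajectory progress can be illustrated without TorchRIR. The sketch below is plain Python and rests on assumptions: `fixed_hop_starts`, `normalized_progress`, and `lerp` are hypothetical stand-ins, and progress is assumed to be each frame start divided by `stop_sample`. The library's own definitions may differ in detail.

```python
def fixed_hop_starts(hop, num_frames):
    """Frame start samples 0, hop, 2*hop, ... (stand-in for a fixed-hop schedule)."""
    return [i * hop for i in range(num_frames)]


def normalized_progress(starts, stop_sample):
    """Assumed definition: each frame start as a fraction of stop_sample."""
    return [s / stop_sample for s in starts]


def lerp(a, b, t):
    """Point at fraction t on the straight line from a to b."""
    return tuple(x + t * (y - x) for x, y in zip(a, b))


starts = fixed_hop_starts(hop=4000, num_frames=4)
progress = normalized_progress(starts, stop_sample=16000)
print(progress)  # [0.0, 0.25, 0.5, 0.75]

# Geometry frame i is evaluated at the progress of its own boundary starts[i].
path = [lerp((1.0, 1.0, 1.5), (3.0, 1.0, 1.5), t) for t in progress]
print(path[-1])  # (2.5, 1.0, 1.5)

# A grid spaced evenly from 0 to 1 would end at 1.0, placing the path endpoint
# at the start of the last frame, one frame before the motion nominally ends.
naive = [i / (len(starts) - 1) for i in range(len(starts))]
print(naive[-1])  # 1.0
```

With the schedule-derived grid, the last frame starts at progress 0.75, so the endpoint at progress one is never assigned to a frame that begins earlier.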

Static CMU ARCTIC (fixed sources, fixed microphones)

This example mixes multiple CMU ARCTIC utterances using a static ISM RIR and produces a multi-microphone output (default: binaural).

Key arguments

--num-sources: number of source speakers to mix.
--num-mics: number of microphones in the fixed array.
--duration: length (seconds) of each source signal.
--order: ISM reflection order.
--tmax: RIR length in seconds.
--room: room size (Lx Ly Lz).
--plot: save layout plots and GIFs.
--out-dir: output directory for WAV/metadata/plots/GIFs.

Example runs

uv run python examples/static.py --num-sources 1 --duration 5 --plot

Expected outputs:

static.wav
static_ref01.wav, static_ref02.wav, ...
static_metadata.json
ATTRIBUTION.txt
static_static_2d.png (and static_static_3d.png if 3D)
static.gif (and static_3d.gif if 3D)

uv run python examples/static.py --order 12 --tmax 0.6 --device auto

Expected outputs:

static.wav
static_ref01.wav, static_ref02.wav, ...
static_metadata.json
ATTRIBUTION.txt

Dynamic CMU ARCTIC (moving sources, fixed microphones)

This example generates moving source trajectories while the microphones remain fixed. It uses time_reference="emission": each schedule boundary selects the RIR for input samples emitted in that interval, and each segment's convolution tail continues into later output intervals.
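
To make the emission-time convention concrete, here is a minimal pure-Python sketch of piecewise convolution by overlap-add. It is not TorchRIR's implementation; `convolve_emission` and its arguments are hypothetical, and it assumes equal-length RIRs with frame starts that lie inside the input.

```python
def convolve_emission(x, starts, rirs):
    """Piecewise convolution indexed by input (emission) time.

    x:      input samples
    starts: frame start samples, ascending, with starts[0] == 0
    rirs:   one impulse response per frame, all of the same length
    """
    rir_len = len(rirs[0])
    y = [0.0] * (len(x) + rir_len - 1)
    bounds = list(starts) + [len(x)]
    for i, h in enumerate(rirs):
        # Input samples emitted during frame i are filtered with RIR i.
        for n in range(bounds[i], bounds[i + 1]):
            for k, hk in enumerate(h):
                # The tail of frame i may land in later output intervals.
                y[n + k] += x[n] * hk
    return y


x = [0, 1, 0, 1]
rirs = [[1.0, 0.5, 0.25], [2.0, 0.0, 1.0]]
print(convolve_emission(x, starts=[0, 2], rirs=rirs))
# [0.0, 1.0, 0.5, 2.25, 0.0, 1.0]
```

The impulse emitted at sample 1 belongs to frame 0, so its tail (0.5 and 0.25) is still added at output samples 2 and 3 even though frame 1 has already started.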

The script uses save_scene_plots and save_scene_gifs for visualization output.

Key arguments

--steps: number of RIR time steps for the trajectory.
--order: ISM reflection order.
--tmax: RIR length in seconds.
--out-dir: output directory for WAV/metadata/plots/GIFs.

Example runs

uv run python examples/dynamic_src.py --steps 24 --plot

Expected outputs:

dynamic_src.wav
dynamic_src_ref01.wav, dynamic_src_ref02.wav, ...
dynamic_src_metadata.json
ATTRIBUTION.txt
dynamic_src_static_2d.png / dynamic_src_dynamic_2d.png
dynamic_src.gif (and dynamic_src_3d.gif if 3D)

uv run python examples/dynamic_src.py --num-sources 3 --duration 8 --order 10

Expected outputs:

dynamic_src.wav
dynamic_src_ref01.wav, dynamic_src_ref02.wav, ...
dynamic_src_metadata.json
ATTRIBUTION.txt

Dynamic CMU ARCTIC (fixed sources, moving microphones)

This example keeps sources fixed and moves the microphone array along a linear path. It uses DynamicConvolver(time_reference="observation"), which selects the piecewise RIR by output/observation time. The final frame remains active through the complete convolution tail. Emission-time overlap-add is not valid for a moving receiver.
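
A minimal pure-Python sketch of the observation-time convention follows. It is an illustration under assumptions, not TorchRIR's implementation: `convolve_observation` is a hypothetical name, the RIRs have equal length, and the frame for each output sample is found with `bisect`.

```python
import bisect


def convolve_observation(x, starts, rirs):
    """Piecewise convolution indexed by output (observation) time.

    x:      input samples
    starts: frame start samples, ascending, with starts[0] == 0
    rirs:   one impulse response per frame, all of the same length
    """
    rir_len = len(rirs[0])
    y = [0.0] * (len(x) + rir_len - 1)
    for n in range(len(y)):
        # Frame active when output sample n is observed. Past the last
        # boundary this stays at the final frame, so the last RIR is held
        # through the whole convolution tail.
        i = bisect.bisect_right(starts, n) - 1
        for k, hk in enumerate(rirs[i]):
            if 0 <= n - k < len(x):
                y[n] += hk * x[n - k]
    return y


x = [0, 1, 0, 1]
rirs = [[1.0, 0.5, 0.25], [2.0, 0.0, 1.0]]
print(convolve_observation(x, starts=[0, 2], rirs=rirs))
# [0.0, 1.0, 0.0, 3.0, 0.0, 1.0]
```

Every output sample from index 2 onward is computed with the second RIR alone, including the contribution of the impulse that entered at sample 1.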

Key arguments

--steps: number of RIR time steps for the trajectory.
--plot: save layout plots and GIFs.
--out-dir: output directory for WAV/metadata/plots/GIFs.

Example runs

uv run python examples/dynamic_mic.py --steps 20 --plot

Expected outputs:

dynamic_mic.wav
dynamic_mic_ref01.wav, dynamic_mic_ref02.wav, ...
dynamic_mic_metadata.json
ATTRIBUTION.txt
dynamic_mic_static_2d.png / dynamic_mic_dynamic_2d.png
dynamic_mic.gif (and dynamic_mic_3d.gif if 3D)

uv run python examples/dynamic_mic.py --order 12 --tmax 0.6 --device auto

Expected outputs:

dynamic_mic.wav
dynamic_mic_ref01.wav, dynamic_mic_ref02.wav, ...
dynamic_mic_metadata.json
ATTRIBUTION.txt

Unified CLI (static/dynamic)

The unified CLI wraps the three scenarios above and supports JSON/YAML configuration files. JSON works with the core installation; YAML requires pip install "torchrir[cli]".
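
The split between JSON and YAML support can be pictured as a loader that imports the YAML parser only when needed. The sketch below is a guess at the pattern, not the CLI's actual code: `load_config` is hypothetical, the `"mode"` key simply mirrors the `--mode` flag, and it assumes the `cli` extra provides PyYAML.

```python
import json
import os
import tempfile
from pathlib import Path


def load_config(path):
    """Load a settings dictionary from a JSON or YAML file (hypothetical helper)."""
    path = Path(path)
    suffix = path.suffix.lower()
    text = path.read_text(encoding="utf-8")
    if suffix == ".json":
        return json.loads(text)  # standard library only
    if suffix in {".yaml", ".yml"}:
        try:
            import yaml  # assumption: PyYAML is what the "cli" extra installs
        except ImportError as exc:
            raise RuntimeError(
                'YAML configs need the optional extra: pip install "torchrir[cli]"'
            ) from exc
        return yaml.safe_load(text)
    raise ValueError(f"unsupported config extension: {suffix!r}")


# Demo with a temporary JSON file; the key name is illustrative only.
with tempfile.TemporaryDirectory() as tmp:
    cfg_path = os.path.join(tmp, "settings.json")
    with open(cfg_path, "w", encoding="utf-8") as fh:
        json.dump({"mode": "dynamic_src"}, fh)
    print(load_config(cfg_path))  # {'mode': 'dynamic_src'}
```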

Key arguments

--mode: static, dynamic_src, or dynamic_mic.
--config-in: load settings from JSON/YAML.
--config-out: write current settings to JSON/YAML.
--deterministic: enable deterministic kernels (best-effort).
--out-dir: output directory for WAV/metadata/plots/GIFs.

Example runs

uv run python examples/cli.py --mode static --plot

Expected outputs:

static_binaural.wav
static_binaural_metadata.json
ATTRIBUTION.txt
static_static_2d.png (and 3D variant if room is 3D)

uv run python examples/cli.py --mode dynamic_src --gif --steps 24

Expected outputs:

dynamic_src_binaural.wav
dynamic_src_binaural_metadata.json
ATTRIBUTION.txt
dynamic_src.gif (and 3D variant if room is 3D)

uv run python examples/cli.py --mode dynamic_mic --gif --steps 24

Expected outputs:

dynamic_mic_binaural.wav
dynamic_mic_binaural_metadata.json
ATTRIBUTION.txt
dynamic_mic.gif (and 3D variant if room is 3D)

Benchmark (CPU vs GPU)

This script benchmarks static ISM and, optionally, a fixed-source, moving-microphone simulation with observation-time convolution.

Key arguments

--repeats: number of iterations to average.
--gpu: cuda, mps, or auto.
--dynamic: benchmark the moving-microphone simulation and observation-time convolution as well.

Note

CUDA paths are validated in CI on CUDA runners. Runtime and numerical behavior still depend on your local CUDA/PyTorch environment.

MP4 output requires a system ffmpeg. If audio is muxed into a video, also install torchrir[audio].

Example runs

uv run python examples/benchmark_device.py --repeats 10 --gpu auto

Expected output (logs):

cpu avg: ... ms
<device> avg: ... ms
speedup: ...x

uv run python examples/benchmark_device.py --dynamic --repeats 5 --gpu mps

Expected output (logs):

cpu dynamic avg: ... ms
mps dynamic avg: ... ms
speedup: ...x
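
The averaged timings and the speedup ratio in these logs can be reproduced with a small timing loop. The sketch below is a generic pattern, not the script's code: `average_ms` and `speedup` are hypothetical names, the warm-up call is an assumption, and the workloads are placeholders.

```python
import time


def average_ms(fn, repeats):
    """Average wall-clock time of fn() in milliseconds over `repeats` runs."""
    fn()  # warm-up call, excluded from the average (assumed here)
    start = time.perf_counter()
    for _ in range(repeats):
        fn()
    return (time.perf_counter() - start) * 1000.0 / repeats


def speedup(cpu_ms, device_ms):
    """How many times faster the device run is than the CPU run."""
    return cpu_ms / device_ms


def slow_workload():
    return sum(i * i for i in range(20000))  # placeholder for the CPU run


def fast_workload():
    return sum(i * i for i in range(2000))   # placeholder for the device run


cpu_ms = average_ms(slow_workload, repeats=10)
dev_ms = average_ms(fast_workload, repeats=10)
print(f"cpu avg: {cpu_ms:.3f} ms")
print(f"device avg: {dev_ms:.3f} ms")
print(f"speedup: {speedup(cpu_ms, dev_ms):.2f}x")
```

GPU kernels launch asynchronously, so a real device benchmark needs to synchronize before reading the clock (for CUDA, `torch.cuda.synchronize()`); otherwise the loop measures launch time rather than compute time.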

Dynamic dataset builder (fixed room, fixed mic, moving sources)

This example generates a small dynamic dataset inspired by Cross3D: the room and mic array are fixed, while source positions and trajectories are randomized per scene. Each scene produces a convolved mixture and metadata. You can choose CMU ARCTIC or LibriSpeech from the command line.

What it does

Uses CMU ARCTIC or LibriSpeech utterances as source signals.
Samples random source trajectories (linear or zigzag) within a fixed room.
Keeps the microphone array fixed across all scenes.
Simulates dynamic RIRs and convolves the sources.
Saves mixture + per-source reference audio, plus JSON metadata (and plots/GIFs if enabled).

Output files

For each scene index k:

scene_k.wav — multi-microphone mixture
scene_k_refXX.wav — per-source reference audio after RIR convolution (premix)
scene_k_metadata.json — room size, trajectories, DOA, array attributes, etc.
ATTRIBUTION.txt — dataset attribution and redistribution note for the run
scene_k_static_2d.png / scene_k_dynamic_2d.png — layout plots (3D variants are saved when the room is 3D)
scene_k.gif — animation (and scene_k_3d.gif when the room is 3D)
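
When checking a finished run, it can help to generate the expected file names per scene. The helper below is hypothetical and assumes a three-digit scene index (as in scene_000.wav) and two-digit reference indices starting at 01 (as in the ref01 files of the other examples). It lists only the 2D plot names.

```python
def scene_output_names(k, num_sources, plot=False):
    """Expected per-scene file names for scene index k (hypothetical helper)."""
    stem = f"scene_{k:03d}"  # assumption: zero-padded to three digits
    names = [f"{stem}.wav"]
    # Assumption: reference files are numbered from 01, one per source.
    names += [f"{stem}_ref{j:02d}.wav" for j in range(1, num_sources + 1)]
    names.append(f"{stem}_metadata.json")
    if plot:
        names += [f"{stem}_static_2d.png", f"{stem}_dynamic_2d.png", f"{stem}.gif"]
    return names


print(scene_output_names(0, num_sources=2))
# ['scene_000.wav', 'scene_000_ref01.wav', 'scene_000_ref02.wav', 'scene_000_metadata.json']
```

ATTRIBUTION.txt is written once per run, so it is left out of the per-scene list.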

Run (CMU ARCTIC)

uv run python examples/build_dynamic_dataset.py \
  --dataset cmu_arctic \
  --num-scenes 4 \
  --num-sources 2 \
  --duration 6

Run (LibriSpeech)

uv run python examples/build_dynamic_dataset.py \
  --dataset librispeech \
  --subset train-clean-100 \
  --num-scenes 4 \
  --num-sources 2 \
  --duration 6

Additional examples

# CMU ARCTIC: only 1 moving source, plotting enabled
uv run python examples/build_dynamic_dataset.py \
  --dataset cmu_arctic \
  --num-scenes 2 \
  --num-sources 3 \
  --num-moving-sources 1 \
  --plot

# LibriSpeech: more steps, fewer scenes
uv run python examples/build_dynamic_dataset.py \
  --dataset librispeech \
  --subset dev-clean \
  --num-scenes 2 \
  --num-sources 2 \
  --steps 96

Key arguments

--dataset: dataset backend (cmu_arctic / librispeech).
--subset: LibriSpeech subset (e.g., train-clean-100).
--num-scenes: number of scenes to generate.
--num-sources: number of sources per scene.
--num-moving-sources: number of sources that move (others stay fixed).
--num-mics: number of microphones in the fixed array.
--duration: target duration (seconds) of each source signal.
--steps: number of RIR steps (trajectory resolution).
--order: ISM reflection order.
--tmax: RIR length in seconds.
--seed: RNG seed for reproducibility.
--dataset-dir: dataset root path.
--out-dir: output directory for per-scene WAV/JSON/plots/GIFs.
--plot: enable plotting + GIFs (default: off).
--download: explicitly authorize dataset download when files are missing.
--device: cpu/cuda/mps/auto.

Dataset option validity and error handling

--dataset accepts only cmu_arctic or librispeech (argparse choices).
--subset is only used for librispeech; unsupported values raise ValueError in LibriSpeechDataset.
--dataset-dir is the dataset root passed to the loader.
CMU ARCTIC expects ARCTIC/cmu_us_<speaker>_arctic/... under that root.
LibriSpeech expects LibriSpeech/<subset>/<speaker>/<chapter>/... under that root.
Without --download, a missing or incomplete dataset raises FileNotFoundError and no network request is made. For offline runs, pre-populate --dataset-dir with at least one canonical transcript/audio pair per selected speaker.
In LibriSpeech mode, malformed utterance IDs passed to load_audio (not in speaker-chapter-utterance format) raise ValueError.
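
The layout and error rules above can be checked before a long offline run. The sketch below uses only `pathlib`; the function names are hypothetical, and the ID check only tests for three non-empty dash-separated fields, which may be looser or stricter than the loader's own validation.

```python
from pathlib import Path


def expected_speaker_dir(root, dataset, speaker, subset=None):
    """Directory where a speaker's files are expected under the dataset root."""
    root = Path(root)
    if dataset == "cmu_arctic":
        return root / "ARCTIC" / f"cmu_us_{speaker}_arctic"
    if dataset == "librispeech":
        if subset is None:
            raise ValueError("librispeech needs a subset, e.g. train-clean-100")
        return root / "LibriSpeech" / subset / speaker
    raise ValueError(f"unknown dataset: {dataset!r}")


def require_local(path):
    """Fail early, without any network access, if the directory is missing."""
    path = Path(path)
    if not path.is_dir():
        raise FileNotFoundError(f"dataset files not found: {path}")
    return path


def split_librispeech_id(utt_id):
    """Split a speaker-chapter-utterance ID into its three fields."""
    parts = utt_id.split("-")
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"malformed utterance id: {utt_id!r}")
    return tuple(parts)


print(expected_speaker_dir("data", "cmu_arctic", "bdl"))
print(expected_speaker_dir("data", "librispeech", "1272", subset="dev-clean"))
print(split_librispeech_id("1272-128104-0000"))  # ('1272', '128104', '0000')
```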

Note

CUDA is supported and validated in CI. Actual runtime behavior still depends on your local CUDA/PyTorch environment.

Implementation notes

The example is implemented in examples/build_dynamic_dataset.py and uses:

torchrir.datasets.load_dataset_sources to build fixed-length signals from multiple utterances.
torchrir.sim.simulate with a DynamicScene to generate an RIRResult.
torchrir.signal.DynamicConvolver(time_reference="emission") with one reusable integer-sample FrameSchedule, attached to DynamicScene, to produce the final mixture.
save_scene_audio + save_result_metadata to store audio and result metadata. Metadata includes a reference_audio list describing the saved scene_k_refXX.wav files (each entry corresponds to a single source convolved with its dynamic RIR), plus dataset_license and modifications fields for attribution tracking.

Additional example

uv run python examples/build_dynamic_dataset.py --dataset cmu_arctic --num-scenes 2 --out-dir outputs/ds_small

Expected outputs:

outputs/ds_small/scene_000.wav, scene_001.wav
outputs/ds_small/scene_000_metadata.json, scene_001_metadata.json
outputs/ds_small/ATTRIBUTION.txt