Public API

This reference contains implemented public modules only. Planned backends and dataset integrations are documented as future work rather than exposed as placeholder classes.

For the physical meaning of dynamic convolution time references, frame schedules, and simulation settings, read Overview. Released behavior changes are recorded in the Changelog.

Modules

`torchrir`

The top-level facade re-exports Room, Source, MicrophoneArray, StaticScene, DynamicScene, and RIRResult. Their definitions appear once under torchrir.models below.

`torchrir.sim`


Scene-oriented image-source simulation and directivity utilities.

directivity_gain

directivity_gain(pattern, cos_theta)

Compute directivity gain for a pattern given cos(theta).

simulate

simulate(scene, config)

Simulate a static or dynamic scene with the image-source method.

`torchrir.signal`


Signal processing utilities for static and dynamic RIR convolution.

DynamicConvolver `dataclass`

Convolve signals with piecewise time-varying RIRs.

time_reference="emission" selects an RIR frame from the input sample's emission time. It models moving sources observed by fixed microphones. time_reference="observation" selects one RIR frame from each output sample's observation time. It models fixed sources observed by moving microphones.

Frame segmentation is independent of this physical convention and is supplied to each convolve call as a FrameSchedule. A dynamic RIRResult whose dynamic scene contains a frame schedule supplies it directly; passing another schedule in that case is rejected.
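The difference between the two time references can be sketched in plain Python. This is an illustration of the convention described above, not TorchRIR's implementation: `frame_index`, `convolve_emission`, and `convolve_observation` are hypothetical names, and lists stand in for tensors with a single source and microphone.

```python
from bisect import bisect_right


def frame_index(starts, sample):
    """Index of the last frame whose start sample is <= sample."""
    return bisect_right(starts, sample) - 1


def convolve_emission(signal, rirs, starts):
    """Each input sample is spread with the RIR active when it was emitted."""
    out = [0.0] * (len(signal) + len(rirs[0]) - 1)
    for n, x in enumerate(signal):
        rir = rirs[frame_index(starts, n)]
        for k, tap in enumerate(rir):
            out[n + k] += x * tap
    return out


def convolve_observation(signal, rirs, starts):
    """Each output sample is gathered with the RIR active when it is observed."""
    out = [0.0] * (len(signal) + len(rirs[0]) - 1)
    for m in range(len(out)):
        rir = rirs[frame_index(starts, m)]
        for k, tap in enumerate(rir):
            n = m - k
            if 0 <= n < len(signal):
                out[m] += tap * signal[n]
    return out


# Two frames starting at samples 0 and 2, one impulse at input sample 1.
rirs = [[1.0, 0.5], [2.0, 0.25]]
starts = [0, 2]
signal = [0.0, 1.0, 0.0, 0.0]
emitted = convolve_emission(signal, rirs, starts)
observed = convolve_observation(signal, rirs, starts)
```

The impulse is emitted during frame 0, so the emission variant applies the whole frame-0 RIR to it. The observation variant switches to frame 1 for output samples 2 and later, so the reflected tap differs.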

Attributes:

| Name | Type | Description |
| --- | --- | --- |
| `time_reference` | `TimeReference` | `"emission"` or `"observation"`. There is no implicit default because choosing the wrong reference changes the physical model. |

convolve

convolve(signal, rirs, *, schedule=None)

Convolve dry signals with a dynamic RIR sequence.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `signal` | `Tensor` | Dry signal with shape `(samples,)` or `(n_sources, samples)`. | required |
| `rirs` | `Tensor \| RIRResult` | Dynamic RIR tensor with shape `(frames, rir_samples)`, `(frames, n_microphones, rir_samples)` for one source, or `(frames, n_sources, n_microphones, rir_samples)`. A dynamic `RIRResult` may be passed instead. | required |
| `schedule` | `FrameSchedule \| None` | One start sample per RIR frame. It is required for a raw tensor and for an `RIRResult` without a frame schedule. It must be omitted when the result's scene already contains one. | `None` |

Returns:

| Type | Description |
| --- | --- |
| `Tensor` | Tensor with shape `(n_microphones, output_samples)`, where `output_samples = signal_samples + rir_samples - 1`. The output keeps the input signal's dtype and device, including for one microphone. |

FrameSchedule `dataclass`

Immutable start samples for a sequence of dynamic RIR frames.

One start is stored for each frame. The first start is zero and subsequent starts increase strictly. `starts` returns a new CPU int64 tensor on every access, so callers cannot mutate the schedule through the returned tensor.

starts `property`

starts

Return a mutation-safe CPU int64 snapshot.

fixed_hop `classmethod`

fixed_hop(*, stop_sample, hop_size)

Start frames every hop_size samples before stop_sample.

from_samples `classmethod`

from_samples(starts)

Build a schedule from exact frame starts.

from_seconds `classmethod`

from_seconds(times, *, sample_rate)

Floor second-based frame times once into exact sample starts.

normalized_progress

normalized_progress(*, stop_sample, dtype=torch.float32, device=None)

Return exact frame starts divided by stop_sample, giving progress values in [0, 1).

This is the canonical interpolation grid for geometry whose nominal endpoint occurs at stop_sample. The endpoint itself is not a frame start, so every returned value is strictly smaller than one.

uniform `classmethod`

uniform(*, frame_count, stop_sample)

Partition [0, stop_sample) uniformly into frame_count frames.
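The constructors above differ only in how the start samples are derived. The following plain-Python sketch mirrors the documented behavior with lists instead of tensors. All function names are hypothetical, and the integer rounding used for the uniform partition is an assumption, since the reference does not state it.

```python
import math


def fixed_hop_starts(stop_sample, hop_size):
    """Start a frame every hop_size samples before stop_sample."""
    return list(range(0, stop_sample, hop_size))


def uniform_starts(frame_count, stop_sample):
    """Split [0, stop_sample) into frame_count frames.

    Assumption: boundaries are floored to integers. Starts are strictly
    increasing only when frame_count <= stop_sample.
    """
    return [(i * stop_sample) // frame_count for i in range(frame_count)]


def starts_from_seconds(times, sample_rate):
    """Floor second-based frame times once into sample starts."""
    return [math.floor(t * sample_rate) for t in times]


def check_starts(starts):
    """Enforce the documented invariants: first start 0, strictly increasing."""
    if not starts or starts[0] != 0:
        raise ValueError("the first frame must start at sample 0")
    if any(b <= a for a, b in zip(starts, starts[1:])):
        raise ValueError("frame starts must increase strictly")
    return list(starts)


def normalized_progress(starts, stop_sample):
    """Divide starts by stop_sample; every value is strictly below one."""
    return [s / stop_sample for s in starts]


hop_starts = check_starts(fixed_hop_starts(stop_sample=10, hop_size=4))
progress = normalized_progress(hop_starts, stop_sample=10)
```

Because the endpoint `stop_sample` is never a frame start, the last progress value stays below one, which matches the interpolation-grid note for `normalized_progress`.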

convolve_rir

convolve_rir(signal, rirs)

Convolve signals with static RIRs (supports multi-source/mic).

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `signal` | `Tensor` | `(n_src, n_samples)` or `(n_samples,)` tensor. | required |
| `rirs` | `Tensor` | `(rir_len,)`, `(n_mic, rir_len)`, or `(n_src, n_mic, rir_len)` tensor. | required |

Returns:

| Type | Description |
| --- | --- |
| `Tensor` | `(n_mic, n_samples + rir_len - 1)` tensor. The microphone axis is retained when there is only one microphone. |

Examples:

y = convolve_rir(signal, rirs)
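The shape contract can be illustrated without tensors. The sketch below assumes that contributions from all sources are summed at each microphone. That is inferred from the source axis being absent in the documented output shape, not confirmed by this reference. `convolve_1d` and `convolve_rir_sketch` are hypothetical names.

```python
def convolve_1d(signal, rir):
    """Direct linear convolution of two 1D sequences."""
    out = [0.0] * (len(signal) + len(rir) - 1)
    for n, x in enumerate(signal):
        for k, tap in enumerate(rir):
            out[n + k] += x * tap
    return out


def convolve_rir_sketch(signals, rirs):
    """signals: [n_src][n_samples]; rirs: [n_src][n_mic][rir_len].

    Returns [n_mic][n_samples + rir_len - 1], summing over sources
    (an assumption about the mixing rule).
    """
    n_mic = len(rirs[0])
    out_len = len(signals[0]) + len(rirs[0][0]) - 1
    out = [[0.0] * out_len for _ in range(n_mic)]
    for src_signal, src_rirs in zip(signals, rirs):
        for mic, rir in enumerate(src_rirs):
            for i, value in enumerate(convolve_1d(src_signal, rir)):
                out[mic][i] += value
    return out


# Two sources, one microphone, two-tap RIRs.
signals = [[1.0, 0.0], [0.0, 1.0]]
rirs = [[[1.0, 0.5]], [[2.0, 0.0]]]
mixed = convolve_rir_sketch(signals, rirs)
```

The result keeps a microphone axis of length one, matching the note that the axis is retained for a single microphone.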

fft_convolve

fft_convolve(signal, rir)

Convolve a 1D signal with a 1D RIR using FFT.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `signal` | `Tensor` | 1D signal tensor. | required |
| `rir` | `Tensor` | 1D impulse response. | required |

Returns:

| Type | Description |
| --- | --- |
| `Tensor` | 1D tensor of length `len(signal) + len(rir) - 1`. |

Examples:

y = fft_convolve(signal, rir)
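For readers unfamiliar with the technique, FFT convolution zero-pads both inputs to at least the full output length, multiplies their spectra, and inverts the product. The sketch below is a standalone pure-Python illustration with a small radix-2 FFT. It is not how `fft_convolve` is implemented, and `fft_convolve_sketch` is a hypothetical name.

```python
import cmath


def _fft(values, invert=False):
    """Recursive radix-2 FFT; len(values) must be a power of two."""
    n = len(values)
    if n == 1:
        return list(values)
    even = _fft(values[0::2], invert)
    odd = _fft(values[1::2], invert)
    sign = 1.0 if invert else -1.0
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def fft_convolve_sketch(signal, rir):
    """Linear convolution through zero-padded FFTs."""
    out_len = len(signal) + len(rir) - 1
    size = 1
    while size < out_len:
        size *= 2  # pad so the circular convolution cannot wrap around
    spec_signal = _fft([complex(v) for v in signal] + [0j] * (size - len(signal)))
    spec_rir = _fft([complex(v) for v in rir] + [0j] * (size - len(rir)))
    product = [a * b for a, b in zip(spec_signal, spec_rir)]
    full = _fft(product, invert=True)
    # The unnormalized inverse needs a 1/size factor; drop the padding tail.
    return [v.real / size for v in full[:out_len]]


y = fft_convolve_sketch([1.0, 2.0, 3.0], [1.0, 0.5])
```

The result agrees with direct convolution up to floating-point rounding and has length `len(signal) + len(rir) - 1`.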

FrameSchedule stores exact CPU int64 starts. A scene-owned schedule is consumed automatically from a dynamic RIRResult; otherwise pass it to DynamicConvolver.convolve. FrameSchedule.from_seconds retains its conversion sample rate and rejects use with a different room sample rate.

`torchrir.geometry`


Geometry helpers for arrays, trajectories, and sampling.

Includes standard array layouts (linear, circular, polyhedron, binaural, Eigenmike) plus position sampling utilities.

binaural_array

binaural_array(center, *, offset=0.08, device=None, dtype=None)

Create a two-mic binaural layout around a center point.

circular_array

circular_array(center, *, num, radius, plane='xy', normal=None, device=None, dtype=None)

Create an equally spaced circular microphone array.

clamp_positions

clamp_positions(positions, room_size, margin=0.1)

Clamp positions to remain inside the room with a margin.

eigenmike_em32

eigenmike_em32(center, *, radius=0.042, azimuth_offset_deg=0.0, device=None, dtype=None)

Create the mh acoustics Eigenmike em32 geometry (3D only).

eigenmike_em64

eigenmike_em64(center, *, radius=0.042, azimuth_offset_deg=0.0, device=None, dtype=None)

Create the mh acoustics Eigenmike em64 geometry (3D only).

linear_array

linear_array(center, *, num, spacing, axis=0, direction=None, device=None, dtype=None)

Create an equally spaced linear microphone array.

linear_trajectory

linear_trajectory(start, end, *, progress)

Interpolate a line at explicit normalized progress values.

progress makes the trajectory's time grid explicit. For a dynamic RIR scene, obtain it from FrameSchedule.normalized_progress so every geometry frame is evaluated at its exact sample start.
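A minimal sketch of the interpolation, assuming plain linear blending between the two endpoints. `linear_trajectory_sketch` is a hypothetical stand-in that uses lists instead of tensors, and the progress values are computed by hand as `start / stop_sample`.

```python
def linear_trajectory_sketch(start, end, progress):
    """Evaluate start + p * (end - start) for every progress value p."""
    return [
        [s + p * (e - s) for s, e in zip(start, end)]
        for p in progress
    ]


# Frame starts 0 and 8000 with stop_sample 16000 give progress 0.0 and 0.5.
frame_starts = [0, 8000]
stop_sample = 16000
progress = [s / stop_sample for s in frame_starts]
positions = linear_trajectory_sketch([1.0, 2.0, 1.5], [5.0, 2.0, 1.5], progress)
```

Since progress never reaches one, the nominal endpoint itself is not among the evaluated positions. It corresponds to `stop_sample`, which is not a frame start.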

polyhedron_array

polyhedron_array(center, *, kind='tetrahedron', radius=0.1, device=None, dtype=None)

Create a regular polyhedron microphone array (3D only).

sample_positions

sample_positions(*, num, room_size, rng, margin=0.5, device=None, dtype=None)

Sample random positions within a room with a safety margin.

sample_positions_min_distance

sample_positions_min_distance(*, num, room_size, rng, center, min_distance, z_range=(1.5, 1.8), margin=0.5, max_attempts=1000, device=None, dtype=None)

Sample random positions with a minimum distance from a center point.

`torchrir.viz`


Visualization helpers for scenes and trajectories.

Provides static/dynamic plotting plus GIF/MP4 animation utilities.

animate_scene_gif

animate_scene_gif(*, out_path, room, sources, mics, src_traj=None, mic_traj=None, step=1, fps=None, signal_len=None, fs=None, duration_s=None, plot_2d=True, plot_3d=False, annotate_sources=True, annotation_lines=None)

Render a GIF showing source/mic trajectories.

animate_scene_mp4

animate_scene_mp4(*, out_path, room, sources, mics, src_traj=None, mic_traj=None, step=1, fps=None, signal_len=None, fs=None, duration_s=None, plot_2d=True, plot_3d=False, annotate_sources=True, annotation_lines=None, mixture_path=None, mux_audio=True, audio_channels=(0, 1))

Render an MP4 showing source/mic trajectories.

When mux_audio is enabled and mixture_path is given, a stereo track is added with ffmpeg using the requested channel indices. The video canvas defaults to HD (1280x720).

plot_scene_dynamic

plot_scene_dynamic(*, room, src_traj, mic_traj, step=1, src_pos=None, mic_pos=None, ax=None, title=None, show=False, annotate_sources=True, annotation_lines=None)

Plot source and mic trajectories within a room.

If trajectories are static, only positions are plotted.

Examples:

ax = plot_scene_dynamic(
    room=[6.0, 4.0, 3.0],
    src_traj=src_traj,
    mic_traj=mic_traj,
)

plot_scene_static

plot_scene_static(*, room, sources, mics, ax=None, title=None, show=False, annotate_sources=True, annotation_lines=None)

Plot a static room with source and mic positions.

Examples:

ax = plot_scene_static(
    room=[6.0, 4.0, 3.0],
    sources=[[1.0, 2.0, 1.5]],
    mics=[[2.0, 2.0, 1.5]],
)

render_scene_plots

render_scene_plots(*, out_dir, room, sources, mics, src_traj=None, mic_traj=None, prefix='scene', step=1, show=False, plot_2d=True, plot_3d=True, annotate_sources=True, annotation_lines=None)

Plot static and dynamic scenes and save images to disk.

save_scene_gifs

save_scene_gifs(*, out_dir, room, sources, mics, src_traj, mic_traj, prefix, signal_len, fs, gif_fps, logger, annotate_sources=True, annotation_lines=None)

Render trajectory GIFs.

save_scene_layout_images

save_scene_layout_images(*, out_dir, room, sources, mics, logger, src_traj=None, mic_traj=None, save_2d=True, save_3d=True, annotate_sources=True, annotation_lines=None, show=False)

Save static layout images with explicit 2D/3D filenames.

save_scene_plots

save_scene_plots(*, out_dir, room, sources, mics, src_traj=None, mic_traj=None, prefix, show, logger, plot_2d=True, plot_3d=True, annotate_sources=True, annotation_lines=None)

Plot and save scene images.

save_scene_videos

save_scene_videos(*, out_dir, room, sources, mics, src_traj, mic_traj, signal_len, fs, logger, mp4_fps=None, save_3d=True, mixture_path=None, mux_audio=True, annotate_sources=True, annotation_lines=None)

Render trajectory MP4 videos.

Output names follow oobss-compatible conventions:

- room_layout_2d.mp4
- room_layout_3d.mp4 (3D rooms when save_3d is enabled)

`torchrir.models`


Core data models for rooms, sources, microphones, scenes, and results.

Examples:

from torchrir import DynamicScene
from torchrir.config import SimulationConfig
from torchrir.sim import simulate
scene = DynamicScene(room=room, sources=sources, mics=mics, src_traj=src_traj, mic_traj=mic_traj)
result = simulate(scene, SimulationConfig(max_order=4, tmax=0.3))

Room `dataclass`

Room geometry and acoustic parameters.

Reflection coefficients use wall order [x-low, x-high, y-low, y-high] in 2D and append [z-low, z-high] in 3D.

Examples:

room = Room.shoebox(size=[6.0, 4.0, 3.0], fs=16000, beta=[0.9] * 6)

replace

replace(**kwargs)

Return a new Room with updated fields.

shoebox `staticmethod`

shoebox(size, *, fs, c=343.0, beta=None, t60=None, device=None, dtype=None)

Create a rectangular (shoebox) room.

Examples:

room = Room.shoebox(size=[6.0, 4.0, 3.0], fs=16000, beta=[0.9] * 6)

Source `dataclass`

Source geometry, orientation, and directivity.

Orientations are normalized to unit vectors with the same (n, dim) shape as positions. 2D angles use a scalar or (n, 1) radians; vectors use (2,) or (n, 2). 3D angles use azimuth/elevation pairs.

Examples:

sources = Source.from_positions([[1.0, 2.0, 1.5]])

from_positions `classmethod`

from_positions(positions, *, orientation=None, directivity='omni', device=None, dtype=None)

Convert positions/orientation to tensors and build a Source.

replace

replace(**kwargs)

Return a new Source with updated fields.

MicrophoneArray `dataclass`

Microphone-array geometry, orientation, and directivity.

Orientations follow the same canonical unit-vector contract as Source.

Examples:

mics = MicrophoneArray.from_positions([[2.0, 2.0, 1.5]])

from_positions `classmethod`

from_positions(positions, *, orientation=None, directivity='omni', device=None, dtype=None)

Convert positions/orientation to tensors and build a MicrophoneArray.

replace

replace(**kwargs)

Return a new MicrophoneArray with updated fields.

StaticScene `dataclass`

Container for static scene simulation inputs.

Examples:

scene = StaticScene(room=room, sources=sources, mics=mics)

validate

validate()

Revalidate geometry tensors after possible external mutation.

DynamicScene `dataclass`

Container for dynamic scene simulation inputs.

An optional FrameSchedule stores exact sample-domain frame starts. It can be consumed directly by dynamic convolution without a lossy seconds-to-samples round trip.

Examples:

scene = DynamicScene(
    room=room,
    sources=sources,
    mics=mics,
    src_traj=src_traj,
    mic_traj=mic_traj,
    schedule=schedule,
)

validate

validate()

Revalidate geometry and trajectory tensors after external mutation.

RIRResult `dataclass`

Container for RIRs with metadata.

Examples:

from torchrir.config import SimulationConfig
from torchrir.sim import simulate

config = SimulationConfig(max_order=6, tmax=0.3)
result = simulate(scene, config)
rirs = result.rirs

validate

validate()

Revalidate this shallow-immutable result at a consumer boundary.

`torchrir.io`


I/O helpers for audio files and metadata serialization.

AudioData `dataclass`

Channel-preserving audio plus sample-rate and file metadata.

save_audio_data reuses subtype only when the destination container matches format. format describes a loaded file, while the destination path selects the saved container format.
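The subtype rule can be read as a small decision function. The sketch below is an interpretation under stated assumptions: `choose_subtype` is a hypothetical helper, the container is guessed from the file suffix, an explicit subtype is assumed to take precedence, and the non-WAV fallback (`None`, meaning "let the writer pick") is not specified by this reference.

```python
from pathlib import Path


def choose_subtype(dest_path, loaded_format, stored_subtype, explicit_subtype=None):
    """Pick the subtype for a save call.

    loaded_format and stored_subtype describe the file the audio came from.
    """
    if explicit_subtype is not None:
        return explicit_subtype  # assumption: the caller's choice wins
    dest_format = Path(dest_path).suffix.lstrip(".").upper()
    if stored_subtype is not None and dest_format == loaded_format.upper():
        return stored_subtype  # same container: reuse the loaded subtype
    if dest_format == "WAV":
        return "FLOAT"  # documented WAV default
    return None  # assumption: defer to the writer's default elsewhere


same_container = choose_subtype("out.flac", "FLAC", "PCM_16")
changed_container = choose_subtype("out.wav", "FLAC", "PCM_16")
```

Reusing a subtype across containers is avoided because a subtype that is valid in one container may not exist in another.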

AudioInfo `dataclass`

Basic audio file metadata.

build_metadata

build_metadata(*, room, sources, mics, rirs, src_traj=None, mic_traj=None, schedule=None, time_reference=None, signal_len=None, source_info=None, extra=None)

Build JSON-serializable metadata for a simulation output.

Examples:

from pathlib import Path
from torchrir.io import build_metadata, save_metadata_json

metadata = build_metadata(
    room=room,
    sources=sources,
    mics=mics,
    rirs=rirs,
    src_traj=src_traj,
    mic_traj=mic_traj,
    signal_len=signal.shape[-1],
)
save_metadata_json(Path("outputs/scene_metadata.json"), metadata)

build_result_metadata

build_result_metadata(result, *, schedule=None, time_reference=None, signal_len=None, source_info=None, extra=None)

Build metadata from a scene-oriented simulation result.

info_audio

info_audio(path)

Return metadata for an audio file (wav/flac/other supported by soundfile).

info_wav

info_wav(path)

Return metadata for a wav file.

This entry point is wav-only. For non-wav formats, use torchrir.io.info_audio.

load_audio

load_audio(source)

Load an audio file (wav/flac/other supported by soundfile).

Notes

Multichannel input is reduced to channel 0, and a warning is emitted.
A caller-owned binary stream remains open after this function returns.

load_audio_data

load_audio_data(source)

Load audio and metadata without closing a caller-owned stream.

load_wav

load_wav(path)

Load a wav file and return mono audio and sample rate.

This entry point is wav-only. For non-wav formats, use torchrir.io.load_audio.

save_attribution_file

save_attribution_file(*, out_dir, dataset_attribution, modifications, attribution_name='ATTRIBUTION.txt', logger=None)

Save dataset attribution and modification notes to a text file.

save_audio

save_audio(path, audio, sample_rate, *, normalize=False, peak=1.0, subtype=None)

Save a mono or multi-channel audio file without changing its gain.

Use normalize=True only when independent peak normalization is intended. WAV destinations default to the FLOAT subtype. Other formats reject out-of-range samples before their integer/compressed encoding can clip. For related stems and mixtures, apply one common scale before saving.
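The advice about related stems and mixtures can be sketched as follows. `common_scale` is a hypothetical helper, not part of `torchrir.io`. It derives one gain from the largest peak across all tracks, so the scaled stems still sum to the scaled mixture.

```python
def common_scale(tracks, peak=1.0):
    """Return one gain that brings the largest absolute sample to peak."""
    largest = max((abs(v) for track in tracks for v in track), default=0.0)
    if largest == 0.0:
        return 1.0  # silence: nothing to scale
    return peak / largest


stems = [[0.5, -2.0], [1.0, 0.25]]
mixture = [a + b for a, b in zip(*stems)]

# One scale for everything, instead of normalizing each file independently.
scale = common_scale(stems + [mixture])
scaled_stems = [[v * scale for v in stem] for stem in stems]
scaled_mixture = [v * scale for v in mixture]
```

Independent peak normalization (`normalize=True` per file) would give each stem its own gain, and the saved stems would no longer add up to the saved mixture.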

save_audio_data

save_audio_data(path, data, *, normalize=False, peak=1.0, subtype=None)

Save audio, preserving subtype only when the container is unchanged.

save_metadata_json

save_metadata_json(path, metadata)

Save metadata as JSON to the given path.

Examples:

save_metadata_json(Path("outputs/scene_metadata.json"), metadata)

save_result_metadata

save_result_metadata(*, out_dir, result, metadata_name='metadata.json', schedule=None, time_reference=None, signal_len=None, source_info=None, extra=None, logger=None)

Build and save metadata from an RIRResult.

save_scene_audio

save_scene_audio(*, out_dir, audio, fs, audio_name, logger=None)

Save scene audio without applying implicit gain normalization.

save_scene_metadata

save_scene_metadata(*, out_dir, metadata_name, room, sources, mics, rirs, src_traj=None, mic_traj=None, schedule=None, time_reference=None, signal_len=None, source_info=None, extra=None, logger=None)

Build and save scene metadata JSON to the output directory.

save_wav

save_wav(path, audio, sample_rate, *, normalize=False, peak=1.0, subtype=None)

Save a wav file without changing its gain.

With no explicit subtype, WAV output uses 32-bit floating-point samples so values outside [-1, 1] are not clipped. Set normalize=True explicitly to peak-normalize. This entry point is wav-only; for non-wav formats, use torchrir.io.save_audio.

Audio save functions preserve gain by default (normalize=False).

load_audio and load_audio_data accept either a Path or a caller-owned open, seekable binary stream. Strings and arbitrary objects raise TypeError; closed or non-seekable streams raise ValueError. One SoundFile handle supplies both metadata and samples, and a supplied stream remains open after the call.

AudioInfo normalizes Python/NumPy integer metadata: sample rate is limited to 1..2**31-1, frame count to non-negative int64, and channel count to positive int32.

save_audio_data reuses its stored subtype only when the destination container matches the loaded format; otherwise WAV output without an explicit subtype uses FLOAT. Non-floating subtypes reject samples outside [-1, 1].

Metadata builders use explicit schedule and time_reference arguments, emit torchrir.scene schema version 1 with generator provenance and compact RIR/sample axes, and reject a time reference inconsistent with scene motion; JSON publication is finite-only and atomic. See Metadata schema version 1.
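The source rules for load_audio and load_audio_data stated above can be sketched as a validation step. `check_audio_source` is a hypothetical helper, and treating "stream" as "any `io.IOBase` instance" is an assumption. The sketch does not try to detect text-mode streams.

```python
import io
from pathlib import Path


def check_audio_source(source):
    """Classify a load source or raise, following the documented rules."""
    if isinstance(source, Path):
        return "path"
    if isinstance(source, (str, bytes)) or not isinstance(source, io.IOBase):
        raise TypeError("source must be a Path or an open binary stream")
    if source.closed:
        raise ValueError("stream is closed")
    if not source.seekable():
        raise ValueError("stream is not seekable")
    return "stream"  # the caller keeps ownership and closes it later


kind_path = check_audio_source(Path("speech.wav"))
stream = io.BytesIO(b"")
kind_stream = check_audio_source(stream)
```

Rejecting plain strings forces callers to be explicit about paths, and leaving the stream open keeps ownership with the caller.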

`torchrir.logging`


Logging helpers for torchrir.

LoggingConfig `dataclass`

Configuration for torchrir logging.

Examples:

config = LoggingConfig(level="INFO")
logger = setup_logging(config)

replace

replace(**kwargs)

Return a new config with updated fields.

resolve_level

resolve_level()

Resolve level to a logging integer constant.

get_logger

get_logger(name=None)

Return a torchrir logger, namespaced under the torchrir root.

Examples:

logger = get_logger("examples.static")

setup_logging

setup_logging(config)

Configure and return the root torchrir logger.

Examples:

logger = setup_logging(LoggingConfig(level="DEBUG"))
logger.info("ready")

`torchrir.config`


Validated configuration objects for TorchRIR simulation.

RIRHighPassConfig `dataclass`

Optional IIR high-pass filter applied after RIR generation.

High-pass filtering is opt-in. Use SimulationConfig(high_pass=...) to enable it. phase="causal" preserves the physical time origin; phase="zero_phase" is an explicit forward-backward operation that can introduce pre-ringing and depends on the finite RIR endpoint.

filter_family accepts "bessel", "butter", "cheby1", "cheby2", or "ellip". Install torchrir[hpf] to enable this SciPy CPU post-process, which detaches the RIR from PyTorch autograd.

ResolvedSimulationConfig `dataclass`

Validated effective settings shared by a kernel and result metadata.

This record is produced internally by torchrir.sim.simulate. tmax is exactly nsample / fs after resolution.

SimulationConfig `dataclass`

Complete request for one image-source simulation.

Exactly one image limit (max_order or nb_img) and exactly one output limit (tmax or nsample) are required at construction. Room sampling rate, endpoint directivity, positions, and orientations belong to the scene rather than this algorithm configuration.

Attributes:

| Name | Type | Description |
| --- | --- | --- |
| `max_order` | `int \| None` | Maximum L1 reflection order, mutually exclusive with `nb_img`. |
| `nb_img` | `Tensor \| tuple[int, ...] \| None` | Non-negative image-grid half-width per room dimension, mutually exclusive with `max_order`. |
| `tmax` | `float \| None` | Requested duration in seconds, mutually exclusive with `nsample`. Resolution uses `ceil(tmax * fs)`. |
| `nsample` | `int \| None` | Exact output sample count, mutually exclusive with `tmax`. |
| `tdiff` | `float \| None` | Optional strictly positive diffuse-tail handoff time. |
| `device` | `device \| str \| None` | Execution device. `None` inherits the scene; `"auto"` selects CUDA, then compatible MPS, then CPU. |
| `dtype` | `dtype \| None` | Execution dtype. `None` inherits the scene dtype. |
| `min_source_mic_distance` | `float` | Minimum permitted endpoint separation in metres. |
| `seed` | `int \| None` | Optional diffuse-tail random seed. |
| `use_lut` | `bool` | Use sinc lookup-table interpolation where supported. |
| `frac_delay_length` | `int` | Positive odd fractional-delay tap count. |
| `sinc_lut_granularity` | `int` | Lookup-table subdivisions per sample. |
| `image_chunk_size` | `int` | Images processed per geometry chunk. |
| `accumulate_chunk_size` | `int` | Image contributions accumulated per chunk. |
| `use_compile` | `bool` | Compile accelerator accumulation with `torch.compile`. |
| `high_pass` | `RIRHighPassConfig \| None` | Optional explicit post-simulation high-pass configuration. |
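The output-limit rule combines two documented facts: exactly one of `tmax` and `nsample` is given, and a requested duration resolves with `ceil(tmax * fs)`, after which the effective `tmax` is `nsample / fs`. A minimal sketch, with `resolve_output_length` as a hypothetical helper rather than the library's resolver:

```python
import math


def resolve_output_length(fs, tmax=None, nsample=None):
    """Return (nsample, effective_tmax) from exactly one output limit."""
    if (tmax is None) == (nsample is None):
        raise ValueError("specify exactly one of tmax or nsample")
    if nsample is None:
        nsample = math.ceil(tmax * fs)  # never shorter than requested
    return nsample, nsample / fs


exact = resolve_output_length(16000, tmax=0.25)
rounded_up = resolve_output_length(16000, tmax=0.10001)
```

A requested duration that does not fall on a sample boundary is rounded up, so the resolved `tmax` can be slightly longer than the requested one.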

replace

replace(**kwargs)

Return a validated configuration with selected fields replaced.

SimulationConfig contains requested values. Scene-oriented simulation stores the fully resolved ResolvedSimulationConfig in RIRResult.config, including backend-effective LUT and compile flags.

`torchrir.util`


General-purpose math, device, and tensor utilities for torchrir.

DeviceSpec `dataclass`

Resolve device + dtype defaults consistently.

Examples:

spec = DeviceSpec(device="auto", dtype=torch.float32)
device, dtype = spec.resolve(tensor)

resolve

resolve(*values)

Resolve device/dtype from inputs with overrides.

add_output_args

add_output_args(parser, *, out_dir_default, plot_default=False, include_plot=True, include_show=True, include_gif=False)

Add common output/plot/GIF arguments to a parser.

as_float_tensor

as_float_tensor(value, *, device=None, dtype=None, name='value')

Convert numeric input to a real floating-point tensor.

Integer inputs are promoted to PyTorch's default floating dtype when no dtype is requested. Explicit non-floating and complex dtypes are rejected because the geometry and acoustic kernels require real floating values.

as_tensor

as_tensor(value, *, device=None, dtype=None)

Convert a value to a tensor while preserving device/dtype when possible.

attenuation_db_to_time_sabine

attenuation_db_to_time_sabine(att_db, t60)

Convert attenuation (dB) to time based on T60.

Note

This function corresponds to gpuRIR's att2t_SabineEstimation. TorchRIR uses snake_case naming for consistency.

Examples:

t = attenuation_db_to_time_sabine(att_db=60.0, t60=0.4)

ensure_dim

ensure_dim(size)

Validate room size dimensionality (2D or 3D).

estimate_beta_from_t60

estimate_beta_from_t60(size, t60, *, c=_DEF_SPEED_OF_SOUND, device=None, dtype=None)

Estimate uniform wall reflection coefficients with Sabine's formula.

The 2D model uses 12 ln(10) / c with room area and perimeter. The 3D model uses 24 ln(10) / c with room volume and surface area. A requested T60 that would require an energy absorption coefficient greater than one is physically infeasible and raises ValueError instead of being clipped.
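A worked sketch of the uniform-wall case, assuming the constants above enter as `alpha = K * V / (c * S * t60)` in 3D and `alpha = K * A / (c * P * t60)` in 2D, with `beta = sqrt(1 - alpha)`. The function names and the default speed of sound of 343 m/s are assumptions for illustration, not the library's code.

```python
import math


def sabine_beta_from_t60(size, t60, c=343.0):
    """Uniform reflection coefficient for a shoebox room (2D or 3D)."""
    if len(size) == 3:
        lx, ly, lz = size
        volume = lx * ly * lz
        surface = 2.0 * (lx * ly + ly * lz + lx * lz)
        alpha = 24.0 * math.log(10.0) * volume / (c * surface * t60)
    elif len(size) == 2:
        lx, ly = size
        alpha = 12.0 * math.log(10.0) * (lx * ly) / (c * 2.0 * (lx + ly) * t60)
    else:
        raise ValueError("room size must be 2D or 3D")
    if alpha > 1.0:
        # Infeasible request: report it instead of clipping.
        raise ValueError("requested T60 needs an absorption coefficient above one")
    return math.sqrt(1.0 - alpha)


def sabine_t60_from_beta(size, beta, c=343.0):
    """Inverse of the 3D case for one uniform reflection coefficient."""
    lx, ly, lz = size
    alpha = 1.0 - beta ** 2
    if alpha == 0.0:
        return math.inf  # perfectly reflecting walls never decay
    volume = lx * ly * lz
    surface = 2.0 * (lx * ly + ly * lz + lx * lz)
    return 24.0 * math.log(10.0) * volume / (c * surface * alpha)


beta = sabine_beta_from_t60([6.0, 4.0, 3.0], t60=0.4)
t60_back = sabine_t60_from_beta([6.0, 4.0, 3.0], beta)
```

For the 6 x 4 x 3 m room, a T60 of 0.4 s gives an absorption coefficient of about 0.27 and a reflection coefficient of about 0.855.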

Note

This function corresponds to gpuRIR's beta_SabineEstimation. TorchRIR uses snake_case naming for consistency.

Examples:

beta = estimate_beta_from_t60(torch.tensor([6.0, 4.0, 3.0]), t60=0.4)

estimate_image_counts_from_tmax

estimate_image_counts_from_tmax(tmax, room_size, c=_DEF_SPEED_OF_SOUND)

Estimate image counts per dimension needed to cover tmax.

Note

This function uses TorchRIR's per-axis image-index half-width. gpuRIR's t2n returns a total count whose rough equivalent is 2 * n + 1; the two helpers intentionally do not return the same values.

Examples:

nb_img = estimate_image_counts_from_tmax(0.3, torch.tensor([6.0, 4.0, 3.0]))

estimate_t60_from_beta

estimate_t60_from_beta(size, beta, *, c=_DEF_SPEED_OF_SOUND, device=None, dtype=None)

Estimate T60 from reflection coefficients using Sabine's formula.

Reflection coefficients are pressure-amplitude ratios in [0, 1]. The corresponding energy absorption is 1 - beta**2. Perfectly reflecting walls therefore produce an infinite T60.

Examples:

t60 = estimate_t60_from_beta(torch.tensor([6.0, 4.0, 3.0]), beta=torch.full((6,), 0.9))

extend_size

extend_size(size, dim)

Extend 2D room size to 3D by adding a dummy z dimension.

normalize_orientation

normalize_orientation(orientation, *, eps=1e-08)

Normalize non-zero orientation vectors.

orientation_to_unit

orientation_to_unit(orientation, dim)

Convert unambiguous angle/vector representations to unit vectors.

In 2D, angles are scalar or have shape (..., 1) and vectors have shape (..., 2). In 3D, azimuth/elevation pairs have shape (..., 2) and vectors have shape (..., 3). A one-dimensional 2D tensor with two elements is therefore always one vector, never two per-entity angles.
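The conversions can be sketched with scalar math. The function names are hypothetical, and the 3D convention shown (azimuth in the x-y plane, elevation measured up from that plane) is an assumption, since this reference does not define it.

```python
import math


def angle_to_unit_2d(theta):
    """2D angle in radians to a unit vector."""
    return (math.cos(theta), math.sin(theta))


def azel_to_unit_3d(azimuth, elevation):
    """Azimuth/elevation pair to a unit vector (assumed convention)."""
    cos_el = math.cos(elevation)
    return (
        cos_el * math.cos(azimuth),
        cos_el * math.sin(azimuth),
        math.sin(elevation),
    )


def normalize(vector, eps=1e-8):
    """Scale a non-zero vector to unit length."""
    norm = math.sqrt(sum(v * v for v in vector))
    if norm < eps:
        raise ValueError("orientation vector must be non-zero")
    return tuple(v / norm for v in vector)


facing_x = angle_to_unit_2d(0.0)
tilted = azel_to_unit_3d(math.pi / 4, math.pi / 6)
unit = normalize((3.0, 4.0))
```

The shape rule in the text removes the ambiguity this sketch sidesteps by using separate functions: a two-element 2D input is always one vector, so per-entity angles must carry an explicit trailing axis of length one.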

resolve_device

resolve_device(device, *, prefer=('cuda', 'mps', 'cpu'))

Resolve a device string (including 'auto') into a torch.device.

Falls back to CPU when the requested backend is unavailable.

Examples:

device = resolve_device("auto")

DeviceSpec and resolve_device are the canonical device utilities.

`torchrir.datasets`


Dataset helpers for torchrir.

Includes CMU ARCTIC and LibriSpeech dataset wrappers plus collate utilities for DataLoader usage. Use load_dataset_sources to build fixed-length source signals from random utterances. Dynamic CMU ARCTIC scene generation is available via build_dynamic_cmu_arctic.

Examples:

from torch.utils.data import DataLoader
from torchrir.datasets import CmuArcticDataset, collate_dataset_items
dataset = CmuArcticDataset("datasets/cmu_arctic", speaker="bdl", download=True)
loader = DataLoader(dataset, batch_size=4, collate_fn=collate_dataset_items)

from pathlib import Path
from torchrir.datasets import LibriSpeechDataset
librispeech = LibriSpeechDataset(Path("datasets/librispeech"), subset="train-clean-100")

BaseDataset

Bases: Dataset[DatasetItem], ABC

Base dataset class compatible with torch.utils.data.Dataset.

attribution_info `abstractmethod`

attribution_info()

Return attribution and license information for this dataset.

available_sentences `abstractmethod`

available_sentences()

Return sentence entries that have audio available.

list_speakers `abstractmethod`

list_speakers()

Return available speaker IDs.

load_audio `abstractmethod`

load_audio(utterance_id)

Load audio for an utterance and return (audio, sample_rate).

CmuArcticDataset

Bases: BaseDataset

CMU ARCTIC dataset loader.

Examples:

from pathlib import Path
from torchrir.datasets import CmuArcticDataset
dataset = CmuArcticDataset(Path("datasets/cmu_arctic"), speaker="bdl", download=True)
audio, fs = dataset.load_audio("arctic_a0001")

audio_dir `property`

audio_dir

Return the directory containing audio files.

text_path `property`

text_path

Return the path to txt.done.data.

attribution_info

attribution_info()

Return attribution and license information for CMU ARCTIC.

audio_path

audio_path(utterance_id)

Return the audio path for an utterance ID.

available_sentences

available_sentences()

Return sentences that have a corresponding wav file.

list_speakers

list_speakers()

Return speaker IDs with a usable local dataset directory.

load_audio

load_audio(utterance_id)

Load mono audio for the given utterance ID.

sentences

sentences()

Parse all sentence metadata.

CmuArcticSentence `dataclass`

Sentence metadata from CMU ARCTIC.

CollateBatch `dataclass`

Collated batch of dataset items.

Fields

audio: Padded audio tensor of shape (batch, max_len).
lengths: Original lengths for each item.
sample_rate: Sample rate shared across the batch.
utterance_ids: Utterance IDs per item.
texts: Optional text per item.
speakers: Optional speaker IDs per item.
metadata: Optional per-item metadata (pass-through).

DatasetAttribution `dataclass`

Structured attribution info used for redistribution notices.

to_dict

to_dict()

Return a JSON-serializable mapping.

DatasetItem `dataclass`

Validated mono dataset item for DataLoader consumption.

validate

validate()

Revalidate the shallow-mutable tensor payload.

DynamicCmuArcticBuildConfig `dataclass`

Validated, immutable request accepted by build_dynamic_cmu_arctic.

Paths are normalized to pathlib.Path. Every sequence field is copied into a tuple, so mutating a caller-owned list after construction cannot change a build request.

DynamicDatasetBuildResult `dataclass`

Published artifacts and dimensions of a completed dataset build.

LibriSpeechDataset

Bases: BaseDataset

LibriSpeech dataset loader.

Examples:

from pathlib import Path
from torchrir.datasets import LibriSpeechDataset
dataset = LibriSpeechDataset(Path("datasets/librispeech"), subset="train-clean-100", download=True)
audio, fs = dataset.load_audio("103-1240-0000")

attribution_info

attribution_info()

Return attribution and license information for LibriSpeech.

available_sentences

available_sentences()

Return sentences that have a corresponding audio file.

list_speakers

list_speakers()

Return available speaker IDs.

load_audio

load_audio(utterance_id)

Load mono audio for the given utterance ID.

LibriSpeechSentence `dataclass`

Sentence metadata from LibriSpeech.

SentenceLike

Bases: Protocol

Minimal sentence interface for dataset entries.

attribution_for

attribution_for(dataset, subset=None)

Return attribution info for a supported dataset key.

build_dynamic_cmu_arctic

build_dynamic_cmu_arctic(config, *, logger=None)

Build and publish a staged dynamic CMU ARCTIC dataset with rollback.

Builds for the same target are serialized before any expensive scene work. Every scene is written below an owned sibling workspace, and the target is created or replaced only after the complete build succeeds. If publication fails, the previous target is restored before the error is reported.
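The stage-then-publish pattern described above can be sketched with the standard library. This is a simplified illustration, not the library's implementation: `publish_staged` is a hypothetical helper, it omits the per-target serialization (locking), and it relies on directory renames within one filesystem.

```python
import os
import shutil
import tempfile
from pathlib import Path


def publish_staged(target, build):
    """Run build(workspace) beside target, then swap the result into place."""
    target = Path(target)
    target.parent.mkdir(parents=True, exist_ok=True)
    # The sibling workspace keeps the final rename on the same filesystem.
    workspace = Path(tempfile.mkdtemp(prefix=target.name + ".stage-", dir=target.parent))
    backup = None
    try:
        build(workspace)  # expensive work never touches the live target
        if target.exists():
            backup = Path(tempfile.mkdtemp(prefix=target.name + ".old-", dir=target.parent))
            backup.rmdir()  # keep the unique name, free it for the rename
            os.rename(target, backup)
        try:
            os.rename(workspace, target)
        except OSError:
            if backup is not None:
                os.rename(backup, target)  # roll back before reporting
            raise
    except BaseException:
        shutil.rmtree(workspace, ignore_errors=True)
        raise
    if backup is not None:
        shutil.rmtree(backup, ignore_errors=True)


def write_scene(text):
    """Return a toy build step that writes one file into the workspace."""
    def build(workspace):
        (workspace / "scene.txt").write_text(text)
    return build
```

A failed build leaves the previous target untouched, because nothing is renamed until `build` returns.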

choose_speakers

choose_speakers(speakers, num_sources, rng)

Select unique speakers for the requested number of sources.

Examples:

import random
from torchrir.datasets import choose_speakers
rng = random.Random(0)
speakers = choose_speakers(available_speakers, num_sources=2, rng=rng)

cmu_arctic_speakers

cmu_arctic_speakers(root=None)

Return supported speakers, or locally installed speakers below root.

collate_dataset_items

collate_dataset_items(items, *, pad_value=0.0, keep_metadata=False)

Collate DatasetItem entries into a padded batch.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `items` | `Iterable[DatasetItem]` | Iterable of DatasetItem. | required |
| `pad_value` | `float` | Value used for padding. | `0.0` |
| `keep_metadata` | `bool` | Preserve item-level metadata field if present. | `False` |

Returns:

| Type | Description |
| --- | --- |
| `CollateBatch` | CollateBatch with padded audio and immutable metadata tuples. |
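The padding step can be sketched with plain lists. `collate_sketch` is a hypothetical stand-in that takes `(audio, sample_rate)` pairs instead of `DatasetItem` objects and returns a tuple instead of a `CollateBatch`. It shows right-padding to the longest item and the single-sample-rate requirement.

```python
def collate_sketch(items, pad_value=0.0):
    """Pad (audio, sample_rate) pairs to a common length.

    Returns (padded_audio, lengths, sample_rate).
    """
    items = list(items)
    if not items:
        raise ValueError("cannot collate an empty batch")
    rates = {rate for _, rate in items}
    if len(rates) != 1:
        raise ValueError("all items must share one sample rate")
    lengths = [len(audio) for audio, _ in items]
    max_len = max(lengths)
    padded = [
        list(audio) + [pad_value] * (max_len - len(audio))
        for audio, _ in items
    ]
    return padded, lengths, rates.pop()


batch_audio, batch_lengths, batch_rate = collate_sketch(
    [([0.1, 0.2, 0.3], 16000), ([0.5], 16000)]
)
```

Keeping the original lengths alongside the padded audio lets downstream code mask out the padding.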

default_modification_notes

default_modification_notes(*, dynamic)

Return concise modification notes for generated outputs.

load_dataset_sources

load_dataset_sources(*, dataset_factory, speakers, num_sources, duration_s, rng)

Load and concatenate utterances for each speaker into fixed-length signals.

Examples:

import random
from pathlib import Path
from torchrir.datasets import CmuArcticDataset, cmu_arctic_speakers, load_dataset_sources
rng = random.Random(0)
root = Path("datasets/cmu_arctic")
signals, fs, info = load_dataset_sources(
    dataset_factory=lambda spk: CmuArcticDataset(root, speaker=spk, download=True),
    speakers=cmu_arctic_speakers(),
    num_sources=2,
    duration_s=10.0,
    rng=rng,
)

DatasetItem is a keyword-only validated mono-audio record with an explicit metadata payload. collate_dataset_items revalidates each item, requires one sample rate/dtype/device, and returns immutable metadata sequences.

Dataset downloads verify pinned checksums and extract only regular files/directories.

Secure corpus filesystem operations are available only on Linux and macOS with the required POSIX descriptor-walk and atomic rename primitives; unsupported platforms/filesystems raise NotImplementedError.
