**********
User Guide
**********

This user guide is intended to give a quick overview of the main features of
**audiotoolbox**, as well as how to use them. For more details, please see the
Reference Manual.

Working with Stimuli in the Time Domain
=======================================

**audiotoolbox** uses the :class:`audiotoolbox.Signal` class to represent
stimuli in the time domain. This class provides easy-to-use methods for
modifying and analyzing signals.

Creating Signals
----------------

An empty, 1-second long signal with two channels at 48 kHz is initialized by
calling:

>>> import audiotoolbox as audio
>>> import numpy as np
>>>
>>> signal = audio.Signal(n_channels=2, duration=1, fs=48000)

**audiotoolbox** supports an arbitrary number of channels, which can also be
arranged across multiple dimensions. For example:

>>> signal = audio.Signal(n_channels=(2, 3), duration=1, fs=48000)

By default, modifications are applied to all channels simultaneously. The
following two lines add 1 to all samples in all channels:

>>> signal = audio.Signal(n_channels=2, duration=1, fs=48000)
>>> signal += 1

Individual channels can be addressed easily using the
:attr:`audiotoolbox.Signal.ch` indexer:

>>> signal = audio.Signal(n_channels=(2, 3), duration=1, fs=48000)
>>> signal.ch[0] += 1

This will add 1 only to the first channel group. The ``ch`` indexer also
allows for slicing:

>>> signal = audio.Signal(n_channels=3, duration=1, fs=48000)
>>> signal.ch[1:] += 1

This will add 1 to all but the first channel.

Internally, the :class:`audiotoolbox.Signal` class is a ``numpy.ndarray``
where the first dimension is the time axis (number of samples). The
subsequent dimensions define the channels:

>>> signal = audio.Signal(n_channels=(2, 3), duration=1, fs=48000)
>>> signal.shape
(48000, 2, 3)

The number of samples and the number of channels can be accessed through
properties of the :class:`audiotoolbox.Signal` class:

>>> signal = audio.Signal(n_channels=(2, 3), duration=1, fs=48000)
>>> print(f'No. of samples: {signal.n_samples}, No. of channels: {signal.n_channels}')
No. of samples: 48000, No. of channels: (2, 3)

The time axis can be accessed directly using the
:attr:`audiotoolbox.Signal.time` property:

>>> signal = audio.Signal(n_channels=1, duration=1, fs=48000)
>>> signal.time
array([0.00000000e+00, 2.08333333e-05, 4.16666667e-05, ...,
       9.99937500e-01, 9.99958333e-01, 9.99979167e-01])

It's important to understand that all modifications are in-place, meaning that
calling a method does not return a changed copy of the signal but directly
changes the signal's data:

>>> signal = audio.Signal(n_channels=1, duration=1, fs=48000)
>>> signal.add_tone(frequency=500)
>>> signal.var()
0.49999999999999994

Creating a copy of a Signal requires the explicit use of the
:meth:`audiotoolbox.Signal.copy` method. The
:meth:`audiotoolbox.Signal.copy_empty` method can be used to create an empty
copy with the same shape as the original:

>>> signal = audio.Signal(n_channels=1, duration=1, fs=48000)
>>> signal2 = signal.copy_empty()
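
A full copy that also duplicates the signal's data is created with the
:meth:`audiotoolbox.Signal.copy` method named above. The following is a
minimal sketch; the equality check simply confirms that the copied data
matches the original:

>>> signal = audio.Signal(n_channels=1, duration=1, fs=48000)
>>> signal.add_tone(frequency=500)
>>> signal3 = signal.copy()
>>> bool(np.all(signal3 == signal))
True
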
Basic Signal Modifications
==========================

Basic signal modifications, such as adding a tone or noise, are directly
available as methods. Tones are easily added through the
:meth:`audiotoolbox.Signal.add_tone` method. A signal with two antiphasic
500 Hz tones in its two channels is created by running:

.. plot::
   :include-source:

   import audiotoolbox as audio
   import numpy as np
   import matplotlib.pyplot as plt

   sig = audio.Signal(n_channels=2, duration=20e-3, fs=48000)
   sig.ch[0].add_tone(frequency=500, amplitude=1, start_phase=0)
   sig.ch[1].add_tone(frequency=500, amplitude=1, start_phase=np.pi)

   plt.plot(sig.time * 1e3, sig)
   plt.xlabel('Time / ms')
   plt.ylabel('Amplitude')
   plt.title('Antiphasic 500Hz Tones')
   plt.grid(True)
   plt.show()

Fade-in and fade-out ramps with different shapes can be applied using the
:meth:`audiotoolbox.Signal.add_fade_window` method:

.. plot::
   :include-source:

   import audiotoolbox as audio
   import matplotlib.pyplot as plt

   sig = audio.Signal(n_channels=1, duration=100e-3, fs=48000)
   sig.add_tone(frequency=500, amplitude=1, start_phase=0)
   sig.add_fade_window(rise_time=30e-3, type='cos')

   plt.plot(sig.time * 1e3, sig)
   plt.xlabel('Time / ms')
   plt.ylabel('Amplitude')
   plt.title('Tone with Raised Cosine Fade-in and -out')
   plt.grid(True)
   plt.show()

Similarly, a cosine modulator can be added through the
:meth:`audiotoolbox.Signal.add_cos_modulator` method:

.. plot::
   :include-source:

   import audiotoolbox as audio
   import matplotlib.pyplot as plt

   sig = audio.Signal(n_channels=1, duration=500e-3, fs=48000)
   sig.add_tone(1000)
   sig.add_cos_modulator(frequency=30, m=1)
   sig.add_fade_window(100e-3)

   plt.plot(sig.time * 1e3, sig)
   plt.xlabel('Time / ms')
   plt.ylabel('Amplitude')
   plt.title('1kHz Tone with 30Hz Modulator')
   plt.grid(True)
   plt.show()

Generating Noise
================

**audiotoolbox** provides multiple functions to generate noise. The following
example creates white, pink, and brown Gaussian noise and plots the
third-octave-band spectrogram of each. The noise variance and a seed for the
random number generator can be defined by passing the respective arguments
(see :meth:`audiotoolbox.Signal.add_noise`).

.. plot::
   :include-source:

   import audiotoolbox as audio
   import matplotlib.pyplot as plt

   white_noise = audio.Signal(1, 1, 48000).add_noise()
   pink_noise = audio.Signal(1, 1, 48000).add_noise(ntype='pink')
   brown_noise = audio.Signal(1, 1, 48000).add_noise(ntype='brown')

   wspec, fc = white_noise.time_frequency.octave_band_specgram(oct_fraction=3)
   pspec, fc = pink_noise.time_frequency.octave_band_specgram(oct_fraction=3)
   bspec, fc = brown_noise.time_frequency.octave_band_specgram(oct_fraction=3)

   norm = plt.Normalize(
       vmin=min([wspec.min(), pspec.min(), bspec.min()]),
       vmax=max([wspec.max(), pspec.max(), bspec.max()])
   )

   fig, ax = plt.subplots(2, 2, sharex='all', sharey='all', figsize=(8, 8))
   ax[0, 0].set_title('White Noise')
   ax[0, 0].pcolormesh(wspec.time, fc, wspec.T, norm=norm)
   ax[0, 1].set_title('Pink Noise')
   ax[0, 1].pcolormesh(pspec.time, fc, pspec.T, norm=norm)
   ax[1, 0].set_title('Brown Noise')
   ax[1, 0].pcolormesh(bspec.time, fc, bspec.T, norm=norm)
   ax[1, 0].set_xlabel("Time / s")

   for a in ax[:, 0]:
       a.set_ylabel('Frequency / Hz')
   for a in ax.flatten():
       a.set_yscale('log')

   ax[1, 1].set_visible(False)
   plt.tight_layout()
   plt.show()
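
As a brief sketch of the arguments mentioned above, a noise token with a fixed
variance and a reproducible random state could be created as follows. The
keyword names ``variance`` and ``seed`` are assumptions here; please check
:meth:`audiotoolbox.Signal.add_noise` for the exact signature:

>>> # 'variance' and 'seed' are assumed keyword names, see the reference entry
>>> noise = audio.Signal(1, 1, 48000).add_noise(ntype='pink', variance=2, seed=1)
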
Uncorrelated noise can be generated using the
:meth:`audiotoolbox.Signal.add_uncorr_noise` method. This method uses the
Gram-Schmidt process to orthogonalize the noise tokens, which minimizes the
variance of the resulting correlation:

>>> noise = audio.Signal(3, 1, 48000).add_uncorr_noise(corr=0.2, ntype='white')
>>> np.cov(noise.T)
array([[1.00002083, 0.20000417, 0.20000417],
       [0.20000417, 1.00002083, 0.20000417],
       [0.20000417, 0.20000417, 1.00002083]])

There is also the option to create band-limited, partly correlated, or
uncorrelated noise by defining low-, high-, or band-pass filters that are
applied before the Gram-Schmidt process. For more details, please refer to the
documentation of :meth:`audiotoolbox.Signal.add_uncorr_noise`.

Playback
========

The :meth:`audiotoolbox.Signal.play` method can be used to quickly listen to
the signal using the default device.

>>> sig = audio.Signal(1, 1, 48000).add_tone(500).add_fade_window(30e-3)
>>> sig.play()

Resampling
==========

Resampling is done using the :meth:`audiotoolbox.Signal.resample` method.

.. plot::
   :include-source:

   import audiotoolbox as audio
   import matplotlib.pyplot as plt

   fig, ax = plt.subplots(2, 2, sharex='all', sharey='all')

   sig = audio.Signal(1, 100e-3, fs=2000).add_tone(100).add_fade_window(30e-3)
   ax[0, 0].plot(sig.time, sig, 'x-')
   ax[0, 0].set_title('Signal at $f_s$=2kHz')

   sig.resample(4000)
   ax[0, 1].plot(sig.time, sig, 'x-')
   ax[0, 1].set_title('Signal upsampled to $f_s$=4kHz')

   sig.resample(1000)
   ax[1, 0].plot(sig.time, sig, 'x-')
   ax[1, 0].set_title('Signal downsampled to $f_s$=1kHz')

   ax[1, 1].set_visible(False)
   ax[1, 0].set_xlabel("Time / s")
   ax[0, 0].set_ylabel("Amplitude")
   ax[1, 0].set_ylabel("Amplitude")
   fig.tight_layout()
   plt.show()

Trimming Signals
================

The :meth:`audiotoolbox.Signal.trim` method can be used to shorten a signal by
"trimming" it to a specified start and end time. This is useful for extracting
a segment of interest from a longer signal. The method modifies the signal
in-place.

For example, to extract the segment between 0.2 and 0.8 seconds from a
1-second signal:

>>> import audiotoolbox as audio
>>> # Create a 1-second noise signal
>>> signal = audio.Signal(1, 1, 48000).add_noise()
>>> print(f'Original duration: {signal.duration:.2f}s')
Original duration: 1.00s
>>>
>>> # Trim the signal to the segment between 0.2s and 0.8s
>>> signal.trim(0.2, 0.8)
>>> print(f'New duration: {signal.duration:.2f}s')
New duration: 0.60s

You can also specify only a start time to trim the beginning of the signal, or
use negative values to trim from the end.

>>> # Create another 1-second signal
>>> signal = audio.Signal(1, 1, 48000).add_noise()
>>>
>>> # Trim the first 200ms
>>> signal.trim(0.2)
>>> print(f'Duration after trimming start: {signal.duration:.2f}s')
Duration after trimming start: 0.80s
>>>
>>> # Trim the last 100ms of the remaining signal
>>> signal.trim(0, -0.1)
>>> print(f'Duration after trimming end: {signal.duration:.2f}s')
Duration after trimming end: 0.70s

Convolution
===========

Signals can be convolved with a kernel, which is itself another
:class:`audiotoolbox.Signal`. This is commonly used for filtering or to apply
an impulse response to a signal (e.g., a Room Impulse Response or a
Head-Related Impulse Response). The toolbox uses the fast, FFT-based
convolution from ``scipy.signal.fftconvolve``.

The :meth:`audiotoolbox.Signal.convolve` method performs this operation. Its
behavior with multi-dimensional signals can be controlled with the
``overlap_dimensions`` keyword.

Channel-Wise Convolution
------------------------

By default, convolution is performed only along overlapping dimensions between
the signal and the kernel (``overlap_dimensions=True``). This means that if
the channel shapes match, the first channel of the signal is convolved with
the first channel of the kernel, the second with the second, and so on. This
is useful for applying multi-channel impulse responses to a multi-channel
signal.

For example, to simulate a stereo audio signal being played in a room, you
could convolve the 2-channel signal with a 2-channel Room Impulse Response
(RIR).

>>> # Assume 'stereo_signal.wav' is a 2-channel audio file
>>> signal = audio.Signal('stereo_signal.wav')
>>>
>>> # Assume 'stereo_rir.wav' is a 2-channel impulse response
>>> rir = audio.Signal('stereo_rir.wav')
>>>
>>> # Convolve the signal with the RIR
>>> signal.convolve(rir)
>>>
>>> # The resulting signal is still 2 channels
>>> signal.n_channels
2

Full Multi-Channel Convolution
------------------------------

If you need to convolve every channel of the signal with every channel of the
kernel, you can set ``overlap_dimensions=False``. In this case, convolving a
two-channel signal with a two-channel kernel will result in a
``(2, 2)``-shaped channel output, where each element represents one of the
possible signal-kernel convolution pairs.

>>> signal = audio.Signal(n_channels=2, duration=1, fs=48000)
>>> kernel = audio.Signal(n_channels=2, duration=100e-3, fs=48000)
>>> signal.convolve(kernel, overlap_dimensions=False)
>>> signal.n_channels
(2, 2)

Convolution Mode
----------------

The ``mode`` parameter (one of ``{'full', 'valid', 'same'}``) controls the
size of the output signal, corresponding directly to the ``mode`` argument in
``scipy.signal.fftconvolve``. The default is ``'full'``.
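
The sketch below illustrates the effect of ``mode`` on the signal length. The
expected sample counts assume the standard ``scipy.signal.fftconvolve``
behavior described above, i.e. a ``'full'`` convolution grows by the kernel
length minus one sample, while ``'same'`` keeps the original signal length:

>>> signal = audio.Signal(n_channels=1, duration=1, fs=48000).add_noise()
>>> kernel = audio.Signal(n_channels=1, duration=100e-3, fs=48000).add_noise()
>>> signal.convolve(kernel)               # default mode='full'
>>> signal.n_samples                      # 48000 + 4800 - 1
52799
>>> signal = audio.Signal(n_channels=1, duration=1, fs=48000).add_noise()
>>> signal.convolve(kernel, mode='same')  # output trimmed to the signal length
>>> signal.n_samples
48000
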
.. include:: user_guide/stats.rst
.. include:: user_guide/input_output.rst
.. include:: user_guide/set_level.rst
.. include:: user_guide/time_frequency.rst
.. include:: user_guide/filters.rst