The Math Behind Audio Equipment

I purchased a USB mic years ago for video conferencing and so forth, but this mic turned out to be insufficient for recording and broadcasting the full voice that I am obtaining with my opera training. A surprising bit of math is regularly involved in rating and judging audio equipment, and I haven’t seen it fully expounded elsewhere, so I attempt to here.

1 Sound Pressure Level and the Logarithmic Scale

Sound is an acoustic vibration (mostly) transmitted through the ambient air. It is modeled as adiabatic pressure perturbations; thus we assume that the total mass and temperature of the air remain constant while we monitor for sound.1

Sound is most commonly measured with a diaphragm, which oscillates in response to the sound pressure: the instantaneous difference between the pressure in a sound wave and the mean pressure of the air “at rest,” as it were. There is quite a difference in order of magnitude between these. Static air pressure is about \( 100 \) kPa (\( 1 \times 10^5 \) Pa), while the reference threshold for human hearing is a sound pressure of about \( 20 \) μPa (\( 2 \times 10^{-5} \) Pa). The logarithmic measure of sound pressure relative to that threshold is the sound pressure level (SPL).

The power carried per unit area by a sound wave is called its sound intensity \( I \). The sound intensity is proportional to the square of the sound pressure \( p \). \[ I \propto p^2 \] I omit the full derivation, but it is basically the same as the derivation of the fact that the power dissipation of a resistor is proportional to the square of the current.

The human ear generally perceives loudness as being roughly proportional to the log of the intensity of the sound, so we measure the loudness of sound in decibels of the square of the sound pressure, relative to the square of the reference threshold \( p_r = 20 \) μPa. \[ L_p = 10 \log_{10} \left( \frac{p^2}{p_r^2} \right) = 20 \log_{10} \frac{p}{p_r}. \] The units of \( L_p \) are called dB SPL, for decibels, Sound Pressure Level. Thus, every 10 dB SPL increase is an order of magnitude increase in sound intensity, but only a factor of \( \sqrt{10} \approx 3.16 \) in raw sound pressure. It takes a 20 dB SPL increase to raise the sound pressure by a full order of magnitude.
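
As a quick check on the formula for \( L_p \), here is a minimal Python sketch. The function names `spl_db` and `pressure_from_spl` are my own, chosen for illustration, and I assume the pressure passed in is an RMS value in pascals, which is the usual convention.

```python
import math

P_REF = 20e-6  # reference pressure p_r in pascals (20 μPa), as given in the text


def spl_db(p: float) -> float:
    """Sound pressure level L_p in dB SPL for a sound pressure p in pascals.

    Assumes p is an RMS pressure (an assumption, not stated in the text).
    """
    return 20.0 * math.log10(p / P_REF)


def pressure_from_spl(level_db: float) -> float:
    """Invert the formula: sound pressure in pascals for a level in dB SPL."""
    return P_REF * 10.0 ** (level_db / 20.0)


if __name__ == "__main__":
    # The reference threshold itself sits at 0 dB SPL by construction.
    print(f"{spl_db(P_REF):.2f} dB SPL at the reference threshold")
    # A sound pressure of 1 Pa comes out at roughly 94 dB SPL.
    print(f"{spl_db(1.0):.2f} dB SPL at 1 Pa")
    # Round trip back to pascals.
    print(f"{pressure_from_spl(spl_db(0.5)):.3f} Pa recovered from its level")
```

Working in the other direction with `pressure_from_spl` is handy when a spec sheet quotes a maximum SPL and you want the corresponding pressure.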

1.1 Acoustic Impedance and Attenuation

In the ideal, lossless case, acoustic intensity obeys an inverse square law: the intensity at a point away from the sound source is proportional to the inverse of the square of the distance \( r \) from the source. Because air is viscous and thus not totally lossless, the attenuation with distance also depends in part on the frequency (higher frequencies are attenuated more quickly). The equations for this are very complicated in the general case, but we can use a few rough relations to describe it. Let \( f \) be the frequency of the sound. \[ I \propto r^{-2 \alpha(f)} \qquad \alpha(f) \propto f \qquad \alpha(f) \geq 1 \]

Thus, for every doubling of the distance between the source and the sensor, the SPL drops by at least about 6 dB: in the lossless case \( \alpha = 1 \), the change in level is \( 10 \times \log_{10} ( 1 / 4) \approx -6 \) dB.
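
The distance relation can be sketched numerically as follows. This assumes the simplified model \( I \propto r^{-2\alpha} \) from above, with \( \alpha \) passed in as a plain number for one frequency. The function name `level_change_db` and its parameters are hypothetical, and the sketch makes no attempt to model how \( \alpha \) actually varies with \( f \).

```python
import math


def level_change_db(r1: float, r2: float, alpha: float = 1.0) -> float:
    """Change in level (dB) when moving from distance r1 to distance r2.

    Assumes the simplified model I ∝ r^(-2 * alpha); alpha = 1 is the
    lossless inverse square law. alpha is a caller-supplied constant here.
    """
    intensity_ratio = (r2 / r1) ** (-2.0 * alpha)
    return 10.0 * math.log10(intensity_ratio)


if __name__ == "__main__":
    # Doubling the distance in the lossless case: about -6 dB.
    print(f"{level_change_db(1.0, 2.0):.2f} dB for a doubling of distance")
    # Ten times the distance in the lossless case: -20 dB.
    print(f"{level_change_db(1.0, 10.0):.2f} dB for ten times the distance")
    # A larger (hypothetical) alpha attenuates faster.
    print(f"{level_change_db(1.0, 2.0, alpha=1.2):.2f} dB with alpha = 1.2")
```

For a microphone, this is why moving from 10 cm to 20 cm from the mouth costs about 6 dB of level, all else being equal.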

1.2 The Response of the Human Ear

The human ear cannot differentiate loudness well above about 130 dB SPL. (Thus its dynamic range is about 130 dB.) Furthermore, this is about the level where hearing damage sets in, and hearing damage can set in at much lower SPL over long periods of time.

The human ear can reliably differentiate frequencies from about \( 20 \) Hz to about \( 20 000 \) Hz (\( 20 \) kHz). However, it doesn’t perceive them equally. The ear will perceive a 2 kHz sound as being about 10 dB louder than a 200 Hz sound at the same SPL. A-weighting was conceived as a way of compensating for this non-constant response of the human ear in the frequency domain. Age also significantly affects the ability of the ear to hear higher frequencies, as does damage or hearing loss from other sources.
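
To make the A-weighting idea concrete, here is a small sketch of the standard analog A-weighting curve as I understand it from IEC 61672. The constants (20.6, 107.7, 737.9, and 12194 Hz, plus the 2.00 dB offset that puts 1 kHz at about 0 dB) are the commonly quoted ones, and the function name `a_weighting_db` is my own. Treat it as an illustration rather than a calibrated implementation.

```python
import math


def a_weighting_db(f: float) -> float:
    """Approximate A-weighting gain in dB at frequency f (in Hz).

    Uses the commonly quoted analog formula with corner frequencies
    20.6, 107.7, 737.9 and 12194 Hz, normalized to about 0 dB at 1 kHz.
    """
    f2 = f * f
    ra = (12194.0**2 * f2 * f2) / (
        (f2 + 20.6**2)
        * math.sqrt((f2 + 107.7**2) * (f2 + 737.9**2))
        * (f2 + 12194.0**2)
    )
    return 20.0 * math.log10(ra) + 2.00


if __name__ == "__main__":
    for freq in (200.0, 1000.0, 2000.0):
        print(f"{freq:7.0f} Hz: {a_weighting_db(freq):+.1f} dB")
```

Under this curve a 200 Hz tone is weighted down by roughly 11 dB while a 2 kHz tone is weighted up by about 1 dB, a gap of around 12 dB, which is in line with the rough 10 dB figure quoted above.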

2 The Fourier Transform

Because so many of these responses are frequency-dependent, it is useful to take a signal in the time domain and translate it into the frequency domain. This is done with the Fourier transform. Suppose a signal \( s(t) \) is sampled over a period \( 0 \leq t \leq T \). Then the constituent frequencies of the “averaged” sound are given by \[ \hat{s} (f) = \int_0^T s(t) e^{-2\pi i t f} \mathrm{d}t. \] Thus if we restrict the time sample domain to a small period of time where the sound pressure is relatively periodic (a single tone from a single instrument), it is easy to break it down into its constituent frequencies.
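
Here is a minimal sketch of that integral, approximated by a plain Riemann sum over evenly spaced samples. The sample rate, duration, and 440 Hz test tone are values I picked for illustration, and `fourier_transform` is a hypothetical helper name. In practice one would use a fast Fourier transform routine rather than this direct sum.

```python
import cmath
import math


def fourier_transform(samples, dt, f):
    """Approximate s_hat(f) = integral over [0, T] of s(t) exp(-2 pi i t f) dt.

    samples: values s(n * dt) for n = 0, 1, ..., N - 1, so that T = N * dt.
    Uses a left Riemann sum, which is a crude but simple approximation.
    """
    return dt * sum(
        s * cmath.exp(-2j * math.pi * f * n * dt) for n, s in enumerate(samples)
    )


# Illustrative (assumed) parameters: 0.1 s of a pure 440 Hz sine at 8 kHz.
SAMPLE_RATE = 8000  # samples per second
DURATION = 0.1      # T, in seconds
TONE_HZ = 440.0

dt = 1.0 / SAMPLE_RATE
n_samples = round(SAMPLE_RATE * DURATION)
samples = [math.sin(2 * math.pi * TONE_HZ * n * dt) for n in range(n_samples)]

if __name__ == "__main__":
    for freq in (220.0, 440.0, 880.0):
        magnitude = abs(fourier_transform(samples, dt, freq))
        print(f"{freq:6.1f} Hz: |s_hat| = {magnitude:.4f}")
```

The magnitude at 440 Hz comes out near \( T/2 = 0.05 \), while the other two probe frequencies are essentially zero, because the window holds a whole number of cycles of each.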

3 Signal Processing

4 The Source of Harmonic Distortion

Footnotes:

1

Of course, the temperature and mass of the air do change, but we can easily assume that those changes are orders of magnitude slower than the general frequencies of audible sound. In general, we can assume these changes affect both sides of the diaphragm measuring the sound (the diaphragm in a microphone, or the ear drum as the case may be) at roughly the same time, relative to the fast perturbations of audible sound.