Microphone Clipping, The Three Types Explained
Can More Bit-depth Fix It?
This essay is part of my ongoing quest to debunk the recent marketing hype around 32-bit float recording: the claim that it prevents microphone clipping (I won’t name the manufacturers here).
So many people believe bit-depth, of any size, can prevent clipping because they have limited knowledge of electronic circuits. It’s easy to assume that louder sounds need bigger numbers; that is, that 32-bit can hold louder sounds than 16-bit, and that if one doesn’t have big enough numbers, clipping will result.
That assumption is false because the loudness of sound isn’t recorded as single values, but as fixed-scale pairs. Those who don’t understand that relationship are susceptible to the argument that greater bit-depth can fix clipping.
Every amplitude of sound comes with a fixed range (scale) of values which give it its relative loudness. There is no absolute number for loudness, or color, or anything else in the Universe for that matter.
Every sound is relative to another sound. Before we can know what’s “loud” we must know what’s quiet or what’s loudest.
One can’t change the source or recording scale in “post”, and there’s no point in doing so during recording. Yes, later you can place the number pair into a larger range (scale) of computer memory (which is what these devices do internally: they write the 24-bit values to 32-bit float), but that doesn’t change the sound’s original scaled value.
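Here is a minimal sketch of that last point, assuming signed 24-bit samples and standard IEEE 754 single-precision floats. The helper names (`to_float32`, `int24_to_float32`, `float32_to_int24`) are made up for illustration and are not any recorder’s actual firmware. It copies 24-bit values into 32-bit float and back to show that the container changes while the values do not.

```python
import struct

FULL_SCALE_24 = 2 ** 23  # signed 24-bit samples span -2**23 .. 2**23 - 1


def to_float32(x):
    """Round a Python float to the nearest IEEE 754 single-precision value."""
    return struct.unpack("<f", struct.pack("<f", float(x)))[0]


def int24_to_float32(sample):
    """Store a 24-bit sample as a 32-bit float in the range -1.0 .. just under 1.0."""
    return to_float32(sample / FULL_SCALE_24)


def float32_to_int24(value):
    """Recover the 24-bit sample from its 32-bit float representation."""
    return int(round(value * FULL_SCALE_24))


# Hypothetical samples, including both extremes of the 24-bit range.
samples = [-(2 ** 23), -1, 0, 1, 12345, 2 ** 23 - 1]
round_trip = [float32_to_int24(int24_to_float32(s)) for s in samples]

print(round_trip == samples)
```

A sample that was already pinned at the top of the 24-bit range is still pinned at the top after the copy. The bigger container adds nothing to it.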
When recording using electrical systems, loudness can only become as large as the maximum voltage possible in the voltage range (scale) set aside to represent it. That maximum voltage gives every amplitude its proportionality. (I explain below why we don’t use zero voltage).
Every part of the recording “chain” has a limit. In microphones, we measure the point at which the vibration is so strong the diaphragm converting the air vibrations into electrical vibrations is not able to move far enough to represent them.
When we play a string on the guitar we will get a tone. If we touch above the string where it reaches maximum amplitude, to a point that it can’t reach fully, we will have physically clipped the string from producing its natural wave.
If we think about it, every physical sound device has a physical limit on how strong a sound it can produce. Strings can vibrate only so much before snapping or hitting against a fret-board. A speaker cone can “explode” if too much energy is applied. Most have experienced headphones or earbuds not working quite as well after having to produce very loud sounds.
Generally, when audio clips it doesn’t clip because it runs out of numbers. It clips because it runs out of voltage (range) set aside to measure the sound in the first place.
Since the 1950s, if recording equipment is used within the known physical constraints of the microphone and electronics, clipping is almost non-existent.
Good engineers, of every type, never lose sight of source scale.
These days, many people who work with sound recording live in a digital fantasy world disconnected from the physical realities of physical scale. They work in their Plato’s cave of false knowledge because that’s the cave created by unscrupulous equipment manufacturers and those selling some technique they say will bring fame and riches. Creators’ ignorance about their equipment isn’t a new phenomenon.
In the classic scene from the movie “This Is Spinal Tap” (1984), guitarist Nigel Tufnel demonstrates an amplifier whose volume knobs are marked from zero to eleven. He hand-scribbled “11” onto the knob! Did it make his amplifier louder than the same amplifier without an “11” written on the knob? Of course not.
You can’t change an amplifier’s real amplification by extending the scale. Why do people believe you can make room for sounds that will clip by doing the same digitally?
Before we take a deep dive into how numbers are assigned to different levels of loudness we must look at the voltages that represent them after transduction, the process of converting vibrating strings or vibrating air molecules into electricity.
Unfortunately, the world of audio engineers tries to get away from the measurements of voltage as soon as humanly possible. There are good reasons for this once you understand how everything fits together.
What makes the working audio engineer’s life easier makes things difficult to understand for those starting out.
You’ll find the following easier to understand if you keep reminding yourself: it isn’t a loudness of 7, it’s an amplitude of 7 that is either 3 down from a maximum of 10, or 7 up from 0 once we count down to 0 from 10. Or think of it as 10 − 3, or 7/10. Seven never exists on its own.
Generally speaking, it is agreed that 1 volt will equal 0 dB (there are exceptions). You’d think ‘0’ would be no sound, but it actually represents maximum amplitude (loudness): the ‘10’ in the above example. Why do something as unintuitive as that?!
The answer is that, again, there will always be some voltage that represents the largest amplitude. Why? Because you CANNOT run any voltage you want into a circuit without damaging it, and even if you could, it’s easier to measure something if you have a general idea of its strength. You don’t bring your kid’s school ruler to measure your backyard.
Why not take the intuitive approach and measure from 0 loudness? The simple answer is 20 times 0 is 0. You can’t amplify 0. Sure, they could have said, let’s make it a really small voltage, 0.000001, and amplify from that. That approach would be much easier to understand, but then how would you deal with voltages that become too large for some circuits to handle? From experience, audio engineers have found that 1 volt is a low enough requirement to work with most electronics (especially those powered by batteries) but high enough that the signal (sound) embedded in it can be easily moved and copied (re-amplified).
Best of all, if everyone agrees that some maximum voltage equals 0 dB, then that voltage does two things at once: (1) it gives proportionality to the sound, and (2) it represents the safe maximum voltage everyone is expected to work with.
Again, 1 volt is just an arbitrary “line level” amount. The equipment might be moving it around internally as 3.3 or 5 volts. What’s important in the practical world of electrical engineering is that it isn’t too few volts that will kill your expensive box of electronics, it’s too many.
In short, all audio is measured from a known maximum (loudness) to anything quieter below. Keep in mind, these values are relative. 0 dB only marks the loudest sound; it does not say what that sound would sound like to you in real life. It doesn’t even say what voltage you will match that number to. As you’re well aware, you can use your volume gain to get your microphone to produce whatever loudness you want (that your system can handle).
What can happen when you set your gain too high, which you might do because you want the quieter sounds louder? The preamp might amplify a voltage to a voltage GREATER than the maximum it can output.
Because we perceive loudness roughly logarithmically, not linearly, we don’t use linear numbers like 1, 2, 3. We use a logarithmic unit called the bel, split into tenths called decibels (dB). As a rule of thumb, every 10 dB is heard as a doubling of loudness, so 10 dB, 20 dB, and 30 dB are roughly twice as loud, four times as loud, and eight times as loud. Again, the choice of unit is arbitrary!
And because we’re counting down from the maximum level of loudness, it’s -10 dB, -20 dB, -30 dB, etc.
Decibels are built on logarithms because a logarithm turns repeated multiplication into simple addition. So yeah, if your head isn’t spinning now I envy you! We measure from the loudest sound down. Then we use a number system that isn’t natural to most people.
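If a worked example helps, here is a small sketch of counting down from the maximum. It assumes the 1 volt reference used in this essay and the usual 20 × log10 convention for voltage ratios. The function names are mine, chosen for illustration. Note that it converts voltages, not perceived loudness, which is a separate matter.

```python
import math

REFERENCE_VOLTS = 1.0  # assumed 0 dB reference, as in the text


def volts_to_db(volts, reference_volts=REFERENCE_VOLTS):
    """Level of a voltage relative to the reference, in decibels."""
    return 20.0 * math.log10(volts / reference_volts)


def db_to_volts(db, reference_volts=REFERENCE_VOLTS):
    """Inverse of volts_to_db: the voltage that sits `db` decibels from the reference."""
    return reference_volts * 10.0 ** (db / 20.0)


# Everything at or below the reference comes out as zero or a negative number.
for volts in (1.0, 0.7, 0.5, 0.1, 0.001):
    print(f"{volts:>6} V is {volts_to_db(volts):8.2f} dB relative to {REFERENCE_VOLTS} V")
```

The reference itself lands on 0 dB and everything quieter is negative. A voltage of exactly 0 has no decibel value at all, because the logarithm of zero is undefined, which is one more reason the scale is anchored at the top and not at the bottom.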
Now that we’ve been hit in the head with a brick, it’s time to step on the proverbial rake.
Microphones don’t output in dB! They output very small changes in voltage. And those voltages are linear! When they are ultimately quantized they will be stored as linearly scaled numbers, not logarithmically scaled decibels (though most people will work with them represented as such).
For most real-world audio engineers, all that matters is that 1-volt “line/reference” voltage (signal). How the preamp gets it there doesn’t matter. In order to understand why bit-depth has nothing to do with clipping, we must understand the transition from microphone voltages to the amplified voltages we can do something with.
Let’s recap.
- The maximum voltage coming out of a microphone is always less than 1 volt; in fact, it never comes near 1 volt and is measured in thousandths of a volt (millivolts).
- The maximum “loudness”, in voltage, coming out of a preamp, is, let’s assume, 1 volt.
- The maximum loudness an analog-to-digital converter (ADC) will convert to the largest number it has available cannot be larger than 1 volt (or whatever maximum voltage it’s designed to work with).
- Therefore, there will be no number, in whatever bit-depth we’re using, that is assigned to a loudness level that is greater than what the preamp outputs (1 volt in this case)!
- The corollary is that if a microphone outputs a voltage which the preamp amplifies past 1 volt, say to 1.1 volts, that microphone voltage level will not reach the ADC! It physically can’t. It would clip before quantization.
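The recap can be sketched as a toy signal chain. Everything here is an assumption for illustration: an idealized preamp that hard-limits at exactly 1 volt, an 11 millivolt microphone peak, a gain of 100, and made-up function names. Real circuits distort more gradually, but the order of events is the point: the preamp flattens the wave first, and the ADC only ever sees the flattened result.

```python
import math

PREAMP_MAX_VOLTS = 1.0  # assumed output ceiling of the preamp (and input ceiling of the ADC)


def preamp(mic_volts, gain):
    """Amplify, then hard-limit to what the analog stage can physically output."""
    amplified = mic_volts * gain
    return max(-PREAMP_MAX_VOLTS, min(PREAMP_MAX_VOLTS, amplified))


def adc(volts, bits):
    """Map -1 V .. +1 V onto signed integer codes of the given bit depth."""
    max_code = 2 ** (bits - 1) - 1
    return round(volts / PREAMP_MAX_VOLTS * max_code)


def record(mic_peak_volts, gain, bits, n=48):
    """One cycle of a sine wave through microphone, preamp, and ADC."""
    wave = [mic_peak_volts * math.sin(2 * math.pi * i / n) for i in range(n)]
    return [adc(preamp(v, gain), bits) for v in wave]


def clipped_fraction(codes, bits):
    """Share of samples pinned at the largest code the converter has."""
    max_code = 2 ** (bits - 1) - 1
    return sum(abs(c) == max_code for c in codes) / len(codes)


# An 11 mV peak with a gain of 100 asks the preamp for 1.1 V, which it cannot deliver.
for bits in (12, 16, 24):
    codes = record(mic_peak_volts=0.011, gain=100, bits=bits)
    print(f"{bits:2d} bits: clipped fraction {clipped_fraction(codes, bits):.3f}")
```

In this sketch the share of flattened samples is the same at every bit depth, because the damage is done before any numbers are assigned. Only lowering the gain (or using a quieter source) brings it to zero.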
You might still be thinking that if we extend the voltage range of the amplifier we can fix it. Seems logical. But remember, we must work with a source voltage from the microphone. Are there limitations on what we can do with it? YES! It’s about the range, the scale. Let’s talk resolution.
Resolution is about how many differences in loudness we can record/hear.
To understand resolution, let’s do a thought experiment.
Imagine that we had the ability to look at a sound wave and hear the sound’s loudness (like the one in the graphic above). Let’s say that sound wave represents a voice saying “hello world”. Of course, there isn’t enough resolution in that visualization to represent the frequencies of our sound, but enough for the basic changes in loudness.
What if we want to record that wave onto a new sheet of paper and send it to a friend so they could hear what we (see) hear? We’d probably use a ruler, right? I placed rulers at various resolutions (number of hatch marks) to the right.
It looks possible to me that I could use the first three rulers to duplicate that wave, each one successively better than the one before it. However, I can’t imagine using the last ruler because its hatch marks are too close together for me to see.
I’d need to use maybe every second hatch mark of the fourth ruler. That would be the same as using the third ruler.
Microphone preamps face the same problem. Even if they could use the fourth ruler to measure the amplitude, it doesn’t mean you could hear the difference in amplitude. If you can’t hear the difference, why use a super-high-resolution ruler, which is actually more difficult to use because in the end it mostly ends up “rounding up or down” to a loudness we can distinguish?
What if we scale everything up? Here’s a visualization of that problem.
Now it’s difficult to read the difference in source! And now the ruler with the least resolution looks like a good match.
The bottom line is the source is always tied to a scale that is tied to our physical ability to experience a difference in energy levels. You can’t move either one independently of the other.
There are many opinions on what resolution we need to capture sound with perfect fidelity. I haven’t read any audio engineer claim that number is above 16 bits. Indeed, I would say 12 bits is the consensus.
There is a fix for clipping, but it comes with trade-offs.
Audio engineers have been using it for over 60 years. It goes by names like “limiter”, “auto gain control”, “gain assist”, etc. It pulls loud sounds down (squishes them), before amplification, to fit into the scale the device is working in.
The process removes the overt clipping from those sounds, but our ears can hear that the fidelity is off. There’s something not quite right about squished sounds, much like HDR photography.
Is it better to suffer some clipping or listen to mangled sound? It’s subjective.
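To make the trade-off concrete, here is a rough sketch comparing a hard clip with a very simple limiter. The threshold of 0.8 and the ratio of 10 are arbitrary values I picked for illustration, and real limiters also have attack and release timing that this ignores. For simplicity, the levels are expressed on the same 1 volt scale used throughout.

```python
import math


def hard_clip(volts, ceiling=1.0):
    """What the circuit does on its own: everything past the ceiling becomes the ceiling."""
    return max(-ceiling, min(ceiling, volts))


def limit(volts, threshold=0.8, ratio=10.0):
    """A bare-bones limiter: above the threshold, only 1/ratio of the excess gets through."""
    magnitude = abs(volts)
    if magnitude <= threshold:
        return volts  # quiet material passes untouched
    squeezed = threshold + (magnitude - threshold) / ratio
    return math.copysign(squeezed, volts)


# Hypothetical peak levels the source is asking for.
for wanted in (0.5, 0.9, 1.1, 1.3):
    print(f"wanted {wanted:.1f} V  clipped {hard_clip(wanted):.3f} V  limited {limit(wanted):.3f} V")
```

The clip turns 1.1 and 1.3 into the same number, so the difference between them is gone for good. The limiter keeps them in order but squeezes them much closer together than they were, which is the “squished” quality our ears pick up.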
Let’s go back to our axioms and conclusions, and add some more.
- The maximum voltage of a microphone is always known.
- The maximum voltage of a preamp output is always known.
- An ADC has a known maximum voltage input.
- We can’t create a number for a voltage the ADC will not accept as input, which is what happens when the preamp “amplifies” to an output voltage beyond the ADC’s maximum input.
- The corollary is that if a microphone outputs a voltage which the preamp amplifies past 1 volt, say to 1.1 volts, that microphone voltage level would clip before quantization.
- We cannot widen a scale without reducing its resolution. Remember, each value of loudness is scaled (proportional to some value)!
- We cannot add resolution to a scale that records differences in loudness we cannot perceive (or our circuits cannot measure).
- Therefore, IF our scale can represent all differences in perceptible loudness (or, more specifically, all the distinct voltages we can measure), nothing can be gained by adding more scale (bit-depth).
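The scale-versus-resolution point in the list above can be put into numbers. This is a sketch under simple assumptions: an ideal signed converter, a full scale of plus or minus the stated voltage, and function names invented for the example. The decibel figure is the theoretical ratio between the largest and smallest code, not what any real circuit achieves once noise is counted.

```python
import math


def step_size_volts(full_scale_volts, bits):
    """Smallest voltage difference an ideal signed converter of this depth can tell apart."""
    return full_scale_volts / 2 ** (bits - 1)


def dynamic_range_db(bits):
    """Theoretical ratio between the largest and smallest code, in decibels."""
    return 20.0 * math.log10(2 ** bits)


for bits in (12, 16, 24):
    step_uv = step_size_volts(1.0, bits) * 1e6
    print(f"{bits:2d} bits: step {step_uv:10.4f} microvolts, range {dynamic_range_db(bits):6.1f} dB")

# Widening the scale from 1 V to 2 V with the same number of bits doubles every step.
print(step_size_volts(2.0, 16) / step_size_volts(1.0, 16))
```

With the bit count held fixed, stretching the voltage range makes every hatch mark on the ruler coarser. Adding bits makes the hatch marks finer, but only down to the point where the circuit’s own noise is larger than the step.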
Our range of data, bit depth, is always tied to a maximum reading of voltage. All electronic circuits, all physical devices we use, are generally designed to be used within a range of values. We cannot put whatever voltage we want into our electric lights, toaster, washing machine, etc., etc. They’re all designed to work within a range of energy suitable to the limitations, or optimized configuration of the materials inside.
Amplifiers are NOT abstract things like bit-depth. They are physical objects. All electronic circuits exist in the physical realm studied and explained by Newton, Coulomb, Volta, Tesla and Einstein (to name a few)!
Any manufacturer who says you should pay extra money for “32-bit float” because it prevents clipping is (sorry to be grumpy) ignoring the hard-won knowledge acquired by many scientists and disrespecting the hard-won techniques of today’s engineers.
Finally, let’s look at digital clipping. When you see “proof” from some waveform graphic that 32-bit float fixes clipping, what you are witnessing is a trick being played with how data is displayed. I’m not going to go into that here. Suffice it to say, this graphic shows how clipping can arise in the digital domain. This type of clipping is easily preventable. Indeed, it would always be user error.
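As one hedged illustration of clipping that happens purely in the digital domain, consider mixing two tracks of 16-bit samples. The sample values and function names below are invented for the example, and saturating at the limits is only one way software might handle the overflow. Neither track clips on its own, but their sum does unless the user leaves headroom.

```python
INT16_MAX = 32767
INT16_MIN = -32768


def saturate(value):
    """Pin a value to the range a signed 16-bit sample can hold."""
    return max(INT16_MIN, min(INT16_MAX, value))


def mix_naive(track_a, track_b):
    """Add two tracks sample by sample with no headroom: loud passages hit the limits."""
    return [saturate(a + b) for a, b in zip(track_a, track_b)]


def mix_with_headroom(track_a, track_b):
    """Halve the sum so that two full-scale tracks still fit in 16 bits."""
    return [saturate((a + b) // 2) for a, b in zip(track_a, track_b)]


# Hypothetical samples: each track is comfortably inside the 16-bit range.
track_a = [20000, 30000, -25000]
track_b = [15000, 10000, -20000]

print(mix_naive(track_a, track_b))
print(mix_with_headroom(track_a, track_b))
```

The fix is turning the level down before the sum, which is a decision made by the person mixing. Nothing about it reaches back to the microphone or the preamp.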
In measuring anything with a known maximum value it makes no sense to add numbers (scale) beyond that maximum value. Sure, one can make hypothetical arguments about what if we could get amplifiers to do this or that. That speculation should have no place in the actual work of recording sound.
In short:
Almost no one can tell the difference between 12 bits (if not 10 bits) and anything greater. 16 bits gives some space for moving data around. 24 bits? What the heck, computer memory is cheap now. 32-bit float? First, it doesn’t have more precision than 24-bit. Second, its larger scale is useless if you accept that 12-bit fixed-point source recording is at the limit of both your hearing and your audio equipment’s ability to record without excessive noise.
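The precision claim is easy to check. A 32-bit float in the standard IEEE 754 format carries 24 significant bits (23 stored plus one implied), with the remaining bits spent on sign and exponent. The sketch below uses Python’s `struct` module to round values to single precision; the helper name `to_float32` is mine.

```python
import struct


def to_float32(x):
    """Round a Python float to the nearest IEEE 754 single-precision value."""
    return struct.unpack("<f", struct.pack("<f", float(x)))[0]


limit_24_bits = 2 ** 24

# Every whole number up to 2**24 survives the trip through a 32-bit float...
print(to_float32(limit_24_bits - 1) == limit_24_bits - 1)

# ...but the very next one does not: there is no 25th bit to hold it.
print(to_float32(limit_24_bits + 1) == limit_24_bits + 1)
```

The extra bits in the format buy range (the exponent), not finer hatch marks on the ruler.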