ESS Articles Page

On The Acoustics of Control Rooms:
Two Decades On

Michael Rettinger published a groundbreaking paper in 1977, 'On the Acoustics of Control Rooms', in an attempt to bring some degree of uniformity to the industry.
Some twenty years later we seem hardly any better off.

The design of an acoustically accurate monitoring environment, consistent with the need for operator comfort, has been an area of conflicting opinion since the dawn of recorded sound.
The very dead control cubicles of the fifties and early sixties quickly gave way to more human-friendly spaces with the advent of multitrack recording and protracted mixing sessions.
Since then, debate has continued as to what exactly constitutes a perfect control room, or even an adequate one.

Rettinger(1) described a set of criteria for recording control rooms in his paper, at a time when programmable computing was, by current standards, almost non-existent.
His paper dealt specifically with the shape of rooms, noise level, reverberation time, frequency response and rear wall treatment, as well as the LF excursion requirements of loudspeakers and the geometry of mixing consoles.

He set out, in particular, a relationship between room volume and decay time which represented a summary of surveyed room performances and subjective judgements on those rooms.
He also discussed means by which to maximize the sound level in the room, in particular the use of hard, exponentially splayed surfaces near the loudspeakers, at a time when the most powerful quality amplifiers could deliver no more than about 200 Watts and loudspeaker efficiencies were modest.
He made very little mention of the resonant behaviour of medium-sized enclosures, except to say that splaying the frontal sidewall surfaces 'tends to avoid coincidental reinforcement of the normal modes in the room and will certainly prevent flutter echoes when these sidewall sections are made reflective.'

Since 1977, certain of the criteria Rettinger set out have shifted, while others seem to have remained almost unchanged.

Amplifier and transducer technology, certainly, have since evolved to the point that many systems now have ample power available, some of which can be traded against efficiency for improved quality.

The dynamic range of the best recording systems has not greatly changed, although performance once available only to top studios is now affordable for all.
Accordingly, the noise floor requirements for a control room are still, as he suggested, in the region of 30dB(A).

While Rettinger eschewed the use of the NC and NR methods of defining background noise levels, their usefulness cannot be denied.
If a level of 30dB(A) significantly transgresses the NR25 curve, this indicates that the energy is concentrated in a relatively narrow band, which would be unacceptable. For instance, a pure tone at 5kHz at 30dB(A) would be most intrusive, whilst technically satisfying Rettinger's noise criterion, whereas a broadband noise which closely followed the NR26 contour would be quite tolerable. A more valid measure would combine the two methods, specifying both a broadband level and a limiting contour. Beyond this discrepancy between methods, NC and NR are well understood by air-conditioning engineers in particular, and adhering to convention when delivering a specification tends to save on costly misunderstandings.
Economic factors have conspired to shift the compromise on noise floor, and most installations will now tolerate about NR35/40dBA, rather than spend the extra thousands to squeeze the last few dB.
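
As a sketch of the combined criterion suggested above, the following Python computes both the overall A-weighted level and the NR rating from octave-band levels, and passes a spectrum only if both limits are met. The A-weighting corrections are the standard octave-band values; the NR coefficients (band limit L = a + b·NR) are commonly tabulated values included here as an assumption, to be checked against the relevant standard before use. The function names and the 30dB(A)/NR25 defaults are illustrative, and placing a 5kHz tone in the 4kHz octave band with the band-centre weighting is an approximation.

```python
import math

# A-weighting corrections at octave-band centre frequencies (dB).
A_WEIGHT = {63: -26.2, 125: -16.1, 250: -8.6, 500: -3.2,
            1000: 0.0, 2000: 1.2, 4000: 1.0, 8000: -1.1}

# NR contour per octave band: L = a + b * NR (assumed tabulated values).
NR_COEFF = {63: (35.5, 0.790), 125: (22.0, 0.870), 250: (12.0, 0.930),
            500: (4.8, 0.974), 1000: (0.0, 1.000), 2000: (-3.5, 1.015),
            4000: (-6.1, 1.025), 8000: (-8.0, 1.030)}


def dba(bands):
    """Overall A-weighted level from {octave centre Hz: band level dB}."""
    total = sum(10 ** ((level + A_WEIGHT[f]) / 10) for f, level in bands.items())
    return 10 * math.log10(total)


def nr_rating(bands):
    """Lowest NR number whose contour is not exceeded in any band."""
    return max((level - NR_COEFF[f][0]) / NR_COEFF[f][1] for f, level in bands.items())


def meets_spec(bands, max_dba=30.0, max_nr=25.0):
    """Combined criterion: broadband level AND limiting contour must both pass."""
    return dba(bands) <= max_dba and nr_rating(bands) <= max_nr


# A 5 kHz tone, treated as 29 dB in the 4 kHz octave band, reads 30 dB(A).
tone = {4000: 29.0}
print(f"tone: {dba(tone):.1f} dB(A), NR{nr_rating(tone):.0f}, passes={meets_spec(tone)}")
```

The tone reads exactly 30dB(A) yet rates at about NR34, so it fails on the contour alone, which is the case the single dB(A) figure misses.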

Bolt(2) studied the behaviour of sound in enclosures with dimensions comparable to the wavelength, and suggested a family of ratios of height to length to width such that the modal frequencies would be well distributed. Sepmeyer(3) went further, suggesting three ratios intended to give a perfect distribution of resonances in typical sized rooms, but neither approach was universally applicable to rooms of all sizes.

Bolt gives a pair of formulae which, for medium-sized rooms, should give a response which is "good enough", without defining any criteria for "good enough":



where the ratios of height to length to width are 1:x:y (in any order).

Sepmeyer gives three ideal ratios, one of which does not satisfy Bolt's formulae:

$$1:1.14:1.39 \qquad 1:1.28:1.54 \qquad 1:1.60:2.33$$

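Bolt's own formulae have not survived in this copy of the text. As an assumption, the sketch below uses the criterion most often quoted for the "Bolt area", 1.1·w/h ≤ l/h ≤ 4.5·w/h − 4, with the smallest dimension taken as h and the largest as l, and tests Sepmeyer's three ratios (1:1.14:1.39, 1:1.28:1.54, 1:1.60:2.33) against it. The function name is hypothetical.

```python
# Sepmeyer's three ratios, smallest dimension normalised to 1.
SEPMEYER = [(1.0, 1.14, 1.39), (1.0, 1.28, 1.54), (1.0, 1.60, 2.33)]


def in_bolt_area(d1, d2, d3):
    """Assumed Bolt-area test: 1.1 * w/h <= l/h <= 4.5 * w/h - 4.

    Dimensions may be given in any order; the smallest is taken as the
    height h, the middle one as the width w and the largest as the length l.
    """
    height, width, length = sorted((d1, d2, d3))
    x, y = width / height, length / height
    return 1.1 * x <= y <= 4.5 * x - 4


for ratio in SEPMEYER:
    print(ratio, "inside" if in_bolt_area(*ratio) else "outside")
```

Under this assumed criterion only 1:1.14:1.39 falls outside (its l/h of 1.39 exceeds 4.5 × 1.14 − 4 = 1.13), which is consistent with the remark that one of Sepmeyer's ratios fails Bolt's test.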

Bonello(4) highlighted this limitation of Bolt's and Sepmeyer's approaches, and suggested an altogether different method of predicting room performance at low frequencies.

Instead of trying to find a single formula to describe the overall performance, individual mode frequencies are calculated and graphed.
Evenness of distribution may then be determined by eye.

In order to reduce the problem to one which could be solved by manual calculation, Bonello made a simplifying assumption, namely that all modes would exhibit similar decay times, and so be equally damped, and therefore of uniform bandwidth and intensity.
It would therefore be sufficient to know only the number of modes falling within any third octave band in order to know the steady state response in that band.
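
A minimal sketch of the mode-counting step, under Bonello's simplifying assumption that every mode counts equally. It enumerates rectangular-room mode frequencies using the standard formula f = (c/2)·√((nx/Lx)² + (ny/Ly)² + (nz/Lz)²) with c = 344m/s, counts them per third-octave band on a base-2 series of centre frequencies, and applies the check usually cited as Bonello's criterion, that the count should not fall from one band to the next. The room dimensions and function names are illustrative.

```python
import itertools
import math


def mode_frequencies(lx, ly, lz, f_max, c=344.0):
    """All rectangular-room mode frequencies (Hz) up to f_max, sorted."""
    n_max = [int(2 * f_max * d / c) + 1 for d in (lx, ly, lz)]
    freqs = []
    for nx, ny, nz in itertools.product(*(range(k + 1) for k in n_max)):
        if nx == ny == nz == 0:
            continue
        f = (c / 2) * math.sqrt((nx / lx) ** 2 + (ny / ly) ** 2 + (nz / lz) ** 2)
        if f <= f_max:
            freqs.append(f)
    return sorted(freqs)


def third_octave_counts(freqs, k_range=range(-18, -8)):
    """(centre Hz, mode count) per third-octave band, centres 1000 * 2**(k/3).

    The default k_range covers bands centred from about 16 Hz to 125 Hz.
    """
    counts = []
    for k in k_range:
        fc = 1000 * 2 ** (k / 3)
        lo, hi = fc * 2 ** (-1 / 6), fc * 2 ** (1 / 6)
        counts.append((round(fc, 1), sum(lo <= f < hi for f in freqs)))
    return counts


def bonello_ok(counts):
    """Bonello-style check: the count never falls from one band to the next."""
    n = [count for _, count in counts]
    return all(b >= a for a, b in zip(n, n[1:]))


freqs = mode_frequencies(6.0, 4.5, 3.0, f_max=150.0)
counts = third_octave_counts(freqs)
for fc, n in counts:
    print(f"{fc:6.1f} Hz {'#' * n}")
print("counts never fall:", bonello_ok(counts))
```

The row of hashes is the "graph by eye" of the text; as the following paragraphs explain, it says nothing about the differing decay times of individual modes.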

For this assumption to hold true in practice, a control room would need to be treated uniformly on all surfaces, as well as being uniformly absorbent with frequency, which for practical reasons is not usually possible.
Even if it were, oblique and tangential modes would still exhibit shorter decay times than axial modes.
For the method to have any real value, it therefore becomes necessary to calculate the decay time of each individual mode, in order to extract its level and bandwidth. It is then possible to plot the power response and examine it for maxima and minima.
Even this extended evaluation is far from perfect, as account ought to be taken of the relative phases in any overlapping regions, and this will vary from point to point in space.
However, it is not unreasonable to expect that in a recording control room the loudspeakers will be at or near corners of the space, and so excite all modes, while the 'sweet spot' listening position tends to be near but not at the centre of the room, and therefore not at a pressure node of most of the room modes.


-----------------------------------------------------------------

Reflections

Veale(5) describes the usefulness of 'reflections which arrive at the ear of the listener at times between 10 and 70 milliseconds after the original sound'. He goes on to suggest a particular pattern of steadily diminishing reflections as providing a reference environment in which to mix music in order to be assured of introducing the correct amount and texture of 'contrived reverberation'.

He describes in detail an optimal pattern, with the first reflection arriving at 10-15ms, 4-6dB below the direct SPL, and a further 3 to 6 reflections, evenly spaced and closely following a decay curve of 0.17s.
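
To make the numbers concrete, this sketch lays out a reflection pattern of the kind Veale describes. It assumes, since the text does not say, that the 0.17s decay line is referenced to the direct sound at t = 0 and falls 60dB over that time; on that basis a first reflection at 10-15ms sits about 3.5-5.3dB below the direct sound, close to Veale's 4-6dB. The spacing and number of later reflections are illustrative parameters, and the function name is hypothetical.

```python
def veale_pattern(first_ms=12.0, spacing_ms=10.0, n_further=5, rt_s=0.17):
    """Return [(time_ms, level_dB_re_direct), ...] on a straight 60 dB decay line.

    Assumption: the decay line starts at 0 dB at t = 0 (the direct sound).
    """
    slope_db_per_ms = 60.0 / (rt_s * 1000.0)
    times = [first_ms + i * spacing_ms for i in range(n_further + 1)]
    return [(t, -slope_db_per_ms * t) for t in times]


for t, level in veale_pattern():
    print(f"{t:5.1f} ms  {level:6.1f} dB")
```

With the defaults, all six reflections fall within Veale's 10-70ms window.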

Sadly, he gives no suggestion as to how one might achieve such a pattern of early reflections. Veale describes in some detail how the reflection patterns in the rooms investigated were recorded, but omits the methodology by which these patterns were correlated to subjective judgements of the output material from those studios.

However, Veale's paper broke new ground in the sense of opening up an awareness of individual reflections, where previously only reverberant decay times were specified in room designs.

In the wake of Veale's and Rettinger's papers in particular, amongst many others, a great many control rooms were built in the seventies and early eighties with the loudspeakers flush-mounted in angled front sidewalls of rough stone, often with considerable success.

However, the results from many of these rooms were disappointing, and a consensus was reached among designers and users that irregularities in the early reflections were imposing the 'character' of the room on the monitored signal.

A logical development to avoid this problem was Davis' Live End Dead End(6) control room, in which the area close to the monitors is made as absorptive as possible, with the rear wall of the room being reflective to provide the necessary working ambience. The dimensions of the room are arranged such that the first reflections in the control room arrive much later than the early ambience of the performance space, to enable the engineer to make the necessary quality judgements as regards mic placement, eq, etc.

This is somewhat at odds with Veale's lower limit of 10ms, since the first reflection in a live room is quite likely to come from the floor. With a stand-mounted mic about 1.5m above the floor, and assuming a source close by at a similar height, the floor reflection travels roughly 3m further than the direct sound and so arrives about 9ms later. Typical live rooms continue to produce primary reflections up to about 30ms, depending on placement of source and mic.
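
The 9ms figure can be checked with an image-source calculation. The sketch below, assuming a flat reflecting floor and c = 344m/s, gives the delay of the floor reflection relative to the direct sound for a given source height, mic height and horizontal separation; the function name is illustrative.

```python
import math


def floor_reflection_delay_ms(h_source, h_mic, distance, c=344.0):
    """Delay (ms) of the floor reflection behind the direct sound.

    The reflected path is the straight line from the source's mirror image
    below the floor to the mic; heights and horizontal distance are in metres.
    """
    direct = math.hypot(distance, h_source - h_mic)
    reflected = math.hypot(distance, h_source + h_mic)
    return 1000.0 * (reflected - direct) / c


for d in (0.0, 0.5, 1.0, 2.0):
    print(f"{d:.1f} m apart: {floor_reflection_delay_ms(1.5, 1.5, d):.1f} ms")
```

With source and mic both at 1.5m and close together the delay approaches 3/344s, about 8.7ms; it shortens as they move apart (about 6.3ms at 1m), so the figure is close to an upper bound for that height.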

The rear wall reflection in a Davis control room needs to be sufficiently diffuse not to produce any noticeable phase cancellations with the direct sound, and must arrive within the Haas integration period of 50-100ms, depending on environment.

Concurrently with Davis' work, Schroeder(7) was developing the mathematics for a number-theoretic diffuser, which D'Antonio et al. turned into a physical product, the Reflection Phase Grating, using a quadratic residue number sequence. This device found almost instant popularity as the ideal method for providing the necessary degree of diffusion at the rear of Davis' LEDE control rooms.
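
A quadratic residue sequence translates into well depths very simply. The sketch below uses the standard relation d_n = (n² mod p)·λ₀/(2p), where p is a prime and λ₀ the wavelength at the chosen design frequency; c = 344m/s is taken from elsewhere in the text, and the function name is illustrative. Well widths, fins and the practical frequency limits of real products are not modelled.

```python
def qrd_well_depths(p, f_design_hz, c=344.0):
    """Well depths in metres for one period of a quadratic-residue diffuser.

    p should be prime; the depth of well n is (n^2 mod p) * wavelength / (2p).
    """
    wavelength = c / f_design_hz
    return [((n * n) % p) * wavelength / (2 * p) for n in range(p)]


depths = qrd_well_depths(7, 500.0)
print([round(d * 1000) for d in depths])  # depths in mm
```

The sequence is symmetric about its centre (n and p − n give the same residue), which is why such diffusers look mirror-symmetric within each period.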

The requirement for an observation window in the front of a LEDE room led to the trick of arranging the room geometry so that the unavoidable reflection off the glass completely misses the mix position, striking a highly absorptive area of the side wall instead.

This technique was extended by numerous designers, and applied to the front wall flares containing the flush mounted monitors, and the Reflection Free Zone room was born.
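
The geometric trick behind the LEDE window and the RFZ flares can be checked in plan view with the image-source method: mirror the loudspeaker in the plane of a surface, and a specular reflection reaches the mix position only if the line from that image to the listener actually crosses the surface. The 2-D sketch below assumes ideal specular reflection from flat surfaces; the coordinates and function names are hypothetical.

```python
def _cross(o, p, q):
    """z-component of (p - o) x (q - o); its sign says which side of line o-p q lies on."""
    return (p[0] - o[0]) * (q[1] - o[1]) - (p[1] - o[1]) * (q[0] - o[0])


def mirror(point, a, b):
    """Mirror a point in the infinite line through a and b."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    t = ((point[0] - a[0]) * dx + (point[1] - a[1]) * dy) / (dx * dx + dy * dy)
    foot = (a[0] + t * dx, a[1] + t * dy)
    return (2 * foot[0] - point[0], 2 * foot[1] - point[1])


def reflection_reaches(source, listener, a, b):
    """True if the specular reflection off wall segment a-b reaches the listener."""
    if _cross(a, b, source) * _cross(a, b, listener) <= 0:
        return False  # source and listener must be on the same side of the wall
    image = mirror(source, a, b)
    # The image-to-listener line must pass between the wall's end points.
    return _cross(image, listener, a) * _cross(image, listener, b) <= 0


# Wall along y = 0; speaker at (-1, 1), mix position at (1, 1).
print(reflection_reaches((-1, 1), (1, 1), (-1, 0), (1, 0)))  # wall under the path
print(reflection_reaches((-1, 1), (1, 1), (2, 0), (3, 0)))   # wall off to one side
```

In a design study the same test would be repeated for each surface and for a grid of listener positions, to see how far the reflection-free area extends.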

To date, the RFZ room is hugely popular, but in spite of its elegance as a solution, it is not without its problems. It is a fairly straightforward matter to arrange the front wall and flares so as to produce no reflections at the mix position. Preventing the desk reflection is much harder, though still possible, but the designer cannot allow for every combination of visiting kit that will be placed, at one time or another, somewhere right behind the engineer. It is also almost impossible to arrange for the reflection-free area to extend far enough to include the outboard racks, an area which requires particular accuracy for the adjustment of equalisers, and which needs to be sited with ergonomics as a priority.

An alternative solution which is beginning to find some popularity in the UK is to use the early reflections, sufficiently diffused, to mask the unavoidable reflections from the desk and other equipment. This Early Sound Scattering configuration employs Schroeder diffusers not on the rear wall, as in Davis' rooms, but close to the monitors, in place of the rough stone flares of Rettinger's and Hidley's rooms of the '70s. The reflections from such a diffuser are smoothly random, and so without character, unlike the lumpy randomness of rough stone.

This has a number of consequences, all of which so far appear to be beneficial.

Because the direct sound is time-diffuse, the reflection off the desk surface is also diffuse, with the result that the extreme comb filtering effects that a hard desk reflection usually causes in the HF region are reduced to a tiny ripple. Inspection of the energy time curve from an MLSSA test reveals that instead of tall narrow spikes, the reflections from the desk, and all the other unavoidable surfaces, become squat humps. The phasing anomalies associated with the steep sides of the comb-filter notches disappear.
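
The size of the effect can be illustrated with the simplest model of a desk bounce: the direct sound plus one delayed copy of linear amplitude g. Notches fall at odd multiples of 1/(2τ) for a delay τ, and the peak-to-notch ripple is 20·log10((1+g)/(1−g)). Representing a diffuse reflection as one whose coherent component at any single delay is much weaker (a small g) is a deliberate simplification, assumed here for illustration; the function names are hypothetical.

```python
import math


def comb_notches_hz(delay_ms, f_max=20000.0):
    """Notch frequencies (Hz) of direct sound plus one copy delayed by delay_ms."""
    tau = delay_ms / 1000.0
    notches = []
    k = 0
    while (2 * k + 1) / (2 * tau) <= f_max:
        notches.append((2 * k + 1) / (2 * tau))
        k += 1
    return notches


def ripple_db(g):
    """Peak-to-notch depth (dB) for a reflection of linear amplitude 0 <= g < 1."""
    return 20 * math.log10((1 + g) / (1 - g))


print(comb_notches_hz(1.0, f_max=5000.0))  # notches from a 1 ms desk bounce
print(f"strong specular bounce, g = 0.9: {ripple_db(0.9):.1f} dB")
print(f"weak coherent component, g = 0.1: {ripple_db(0.1):.1f} dB")
```

A near-specular bounce at g = 0.9 gives about 25.6dB between peaks and notches, against about 1.7dB at g = 0.1.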

Probably the most striking benefit is the disappearance of hot spotting due to constructive interference. In effect, because the loudspeakers are built into diffusing panels, the single point sound sources become large plane sources, which do not produce such sharp patterns of fringes. As a result, when the listener moves off the room's central axis, the spectral content remains substantially unchanged, unlike just about any other style of room.

The stereo imaging also becomes stable and reliable, probably because pan-potted stereo is usually compromised by incorrect L/R phase information. The time-diffuse sources decorrelate this phase information, concealing from the listener the fact that it contradicts the level difference information. The lack of hot spots, particularly in the region of 1kHz, undoubtedly also contributes to the image stability.


-----------------------------------------------------------------

On Reflection

An Evolutionary Hypothesis

Two questions which often arise during the design process are as follows:

"How come a mix done in a good control room sounds the same wherever you play it?", and

"All my product is going to be listened to in living rooms. Why doesn't mixing in one work?"

More often than not, the client doesn't really want to know the whole answer, but we as designers have a duty to understand as fully as possible what happens in our rooms.

The ability to recognize accurately the sound of an approaching predator irrespective of environment confers an evolutionary advantage, and so will tend to become enhanced over successive generations.

Such an ability implies an ability to ignore the effect of the environment, subtracting it from the received sound, to 'hear' only the original sound.

To be able to subtract the effect of the environment, you have to learn it by sensing it whenever you make a sound yourself.

When you listen you subtract your environment.

When you make a sound you sense your environment.

When you mix a track, being in control of the sound, you behave similarly to when making the sound yourself. (Think about how differently your head feels when you're just listening, as opposed to actively mixing).

So, when you listen in the car, living room, club, etc, you are listening passively, so you subtract the room effects, and 'hear' what was playing in the control room.

What you 'heard' in the control room when you were mixing can only be the same as what was actually playing if the control room was truly characterless.

Observations of numerous rooms indicate, however, that if a room has excessive bass absorption, the engineer tends to compensate, resulting in over-bassy mixes. Uniform mixes played back in under-trapped rooms also exhibit an LF lift, indicating that the distinction between active and passive listening does not apply at LF.

Since the lowest frequencies of the human vocal range are in the region of 100Hz, and we use very little directional information below 200Hz, it is reasonable to suppose that the evolution of this selective hearing would not extend to LF. (Nothing to learn from, nothing to learn for).

The distinction between active and passive listening leads to an interesting anomaly in rooms with directional acoustics and a significant initial time gap (e.g. any LEDE or RFZ room).

If a passive listener (producer? client? band member?) is in an environment with its own early reflections, he will tend to subtract this character from the sound he hears, even when, due to the room geometry, the sound originating from the monitors is not having that character imposed upon it.

Meanwhile, an active listener (the engineer) in the same place hears what's happening in the room, supposedly a faithful representation of the source signal.

This will lead on occasion to the producer being unaware of desirable early ambience in the recorded signal during a take or in a mix, even though it is plainly audible to the engineer. The classic industry joke of providing a 'producer's panel' of non-functional 'fine adjustment' knobs now makes sense, since this shifts the producer's mode of listening from passive to active, and he now hears the same balance as the engineer.

In poorly designed rooms, room character will be evident when working on a project, but ignored when listening, for comparison, to other records.

The implication of all this is that we should be striving to design control rooms where whatever ambience (as measured at the mix position) is imposed upon a source at the loudspeaker is identical to the ambience imposed upon a source at the mix position. This ambience, however, has to be completely lacking in character of its own.
Otherwise the engineer cannot hope to hear his final mix in a way which is representative of how the end user will hear it.

The need for operator comfort demands that some ambience be provided at the mix position, and therefore similar ambience must be applied to the monitored signal. This ambience must be rendered characterless if mixes are to travel well. The character of a reverberant field is generally considered to reside in the early part of the decay, specifically the pre-delay and the early reflections; the decay time is much more a matter of quantity than quality, and it is reasonable to expect a diffuse reverberant tail with no earlies to provide such a characterless ambience.


-----------------------------------------------------------------

Reverb Times

Sabine(8) related the decay time T (s) to room volume (V, in m³) and total absorption (A, in m² sabins) by

$$T = \frac{0.161\,V}{A}$$

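but this formula is only applicable where the average absorption coefficient is low and the surfaces are uniformly treated. Eyring and Norris(9) give a formula, in terms of the total surface area S (m²) and the mean absorption coefficient $\bar\alpha$, applicable to rooms where the wall absorption dominates, with average coefficients above about 0.3:

$$T = \frac{0.161\,V}{-S\ln(1-\bar\alpha)}$$
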
and Fitzroy(10) resolves Sabine along three axes:

In the special case of recording control rooms, it seems reasonable, therefore, to substitute Eyring and Norris's formula into each of the Sabine terms of Fitzroy's formula:

Clearly, evaluating this formula manually for octave centre frequencies would be laborious in the extreme, but even modest modern personal computers can do this in an instant, either with a dedicated programme or with a general purpose spreadsheet programme.

NB. In larger rooms, all the above formulae require an additional 4mV in the denominators, where m is the attenuation constant of air (which is frequency, humidity and temperature dependent)(11).
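
A minimal sketch of that calculation, assuming a rectangular room and metric units. The per-axis function takes each pair of opposite surfaces, computes an Eyring-Norris term for it, and weights the terms by that pair's share of the total surface area; written out, this gives T = (0.161V/S²)·Σ[−S_i/ln(1−ᾱ_i)], which is the form in which Fitzroy's formula is usually quoted. The optional m adds the 4mV air absorption term from the note above. Function names and the example room are illustrative, and in practice each octave band is evaluated separately with its own absorption coefficients.

```python
import math


def sabine(volume, absorption, m=0.0):
    """Sabine: T = 0.161 V / (A + 4 m V); V in m^3, A in m^2 sabins."""
    return 0.161 * volume / (absorption + 4 * m * volume)


def eyring(volume, surface, alpha, m=0.0):
    """Eyring-Norris: T = 0.161 V / (-S ln(1 - alpha) + 4 m V)."""
    return 0.161 * volume / (-surface * math.log(1 - alpha) + 4 * m * volume)


def per_axis_eyring(lx, ly, lz, ax, ay, az, m=0.0):
    """Area-weighted Eyring terms, one per pair of opposite surfaces.

    ax, ay, az are mean absorption coefficients of the surface pairs normal
    to the x, y and z axes (end walls, side walls, floor and ceiling).
    """
    volume = lx * ly * lz
    pairs = [(2 * ly * lz, ax), (2 * lx * lz, ay), (2 * lx * ly, az)]
    surface = sum(area for area, _ in pairs)
    return sum(area / surface * eyring(volume, surface, a, m) for area, a in pairs)


# Hypothetical 6 x 4.5 x 3 m room in one octave band: absorptive walls,
# harder floor and ceiling.
print(f"{per_axis_eyring(6.0, 4.5, 3.0, 0.5, 0.5, 0.2):.2f} s")
```

With uniform absorption the per-axis form reduces exactly to Eyring-Norris; it diverges when one pair of surfaces is much harder than the others, which is the usual situation in a control room.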

Rettinger suggested an ideal decay time for a recording control room of

where V is in cubic feet, which translates into cubic metres as

Current trends in control room usage suggest that a figure about 20% greater than this is comfortable to work in for protracted periods. At the time when Rettinger made his recommendation, adopting it would have meant taking pot luck with the room modes, as the reduced damping would have risked more severe holes between modes.
In effect, our new-found ability to predict the modal behaviour of the room allows us to have the decay time we actually desire, rather than compromise working environment against flat response.


-----------------------------------------------------------------

Mode Distribution

Development of an algorithm

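Beranek(11) gives the mode frequencies of a rectangular room of dimensions Lx/Ly/Lz as

$$f = \frac{c}{2}\sqrt{\left(\frac{n_x}{L_x}\right)^2 + \left(\frac{n_y}{L_y}\right)^2 + \left(\frac{n_z}{L_z}\right)^2}$$

where nx, ny and nz are non-negative integers, not all zero, chosen independently to define the order of the individual mode in question, and c is the speed of sound, 344m/s under typical room conditions.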

In rooms that are not quite rectangular it seems reasonable to assume that this formula still holds, provided the arithmetic mean dimensions are substituted.
In this case, the non-rectangularity will contribute to the damping of the mode, so shortening its decay, reducing its level, and broadening its bandwidth.

Kinsler and Frey(12) give the decay time of any single mode as

where

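Bandwidth is related to decay time:

$$B = \frac{2.2}{T}$$

where B is the half-power (3dB) bandwidth of the mode in Hz and T its 60dB decay time in seconds.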
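
If B is taken as the half-power (3dB) bandwidth in Hz and T as the 60dB decay time, the usual result B ≈ 2.2/T follows from treating a mode as an exponentially damped resonance with pressure decay constant δ. This is a standard derivation sketched for reference, not reproduced from the cited texts:

```latex
p(t) \propto e^{-\delta t}
\;\Rightarrow\;
20\log_{10} e^{\delta T} = 60
\;\Rightarrow\;
T = \frac{3\ln 10}{\delta} \approx \frac{6.91}{\delta}

|H(f)|^2 \propto \frac{1}{\delta^2 + 4\pi^2 (f - f_0)^2}
\;\Rightarrow\;
B = \frac{\delta}{\pi} = \frac{6.91}{\pi T} \approx \frac{2.2}{T}
```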

The additional bandwidth due to non-rectangularity can be estimated by simple geometry, and an equivalent value for Sa back-calculated and substituted into the above decay time formula.
The reverberant sound pressure at the resonant frequency for a 1 Watt source is given by

but Morse & Bolt(13) state that tangential modes require twice the power to produce the same sound pressure as a similar axial one, while oblique modes require four times as much. It is therefore necessary to divide the sound pressure so calculated by √2 for each non-zero nx, ny, or nz beyond the first, i.e. by √2 for tangential modes and by 2 for oblique modes.
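
The classification and the Morse & Bolt weighting fold naturally into the mode enumeration. This sketch, assuming a rectangular room (or mean dimensions) and c = 344m/s, lists each mode with its type and the divisor applied to its calculated pressure: √2 for each non-zero index beyond the first. The pressure calculation itself is not reproduced here, and the names are illustrative.

```python
import itertools
import math

KIND = {1: "axial", 2: "tangential", 3: "oblique"}


def classified_modes(lx, ly, lz, f_max, c=344.0):
    """Return sorted (f_Hz, (nx, ny, nz), kind, divisor) for modes up to f_max.

    divisor = sqrt(2) ** (number of non-zero indices - 1): 1 for axial,
    sqrt(2) for tangential and 2 for oblique modes.
    """
    dims = (lx, ly, lz)
    n_max = [int(2 * f_max * d / c) + 1 for d in dims]
    result = []
    for n in itertools.product(*(range(k + 1) for k in n_max)):
        nonzero = sum(1 for i in n if i > 0)
        if nonzero == 0:
            continue
        f = (c / 2) * math.sqrt(sum((i / d) ** 2 for i, d in zip(n, dims)))
        if f <= f_max:
            result.append((f, n, KIND[nonzero], math.sqrt(2) ** (nonzero - 1)))
    return sorted(result)


for f, n, kind, div in classified_modes(5.0, 4.0, 3.0, 80.0):
    print(f"{f:6.1f} Hz {n} {kind:10s} pressure / {div:.2f}")
```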

Having computed the properties of each mode, it is then necessary to integrate them together to produce the power response curve.

First an array is defined, representing the power spectrum at linear frequency intervals, 1 to 1000 Hz at 1Hz resolution being a typical (but arbitrary) arrangement.

With each mode in turn, the square of the pressure is added into this array. To each side of the mode frequency, for half the geometrical bandwidth, this same power is added into the array. Beyond this bandwidth, the power added in is successively reduced at a rate dependent upon the bandwidth as derived from the original decay time, until it drops below some predetermined level, such as the threshold of hearing. The result is a resonant peak, whose steepness outside the 3dB bandwidth depends on the absorption on the surfaces, but whose width at the peak also depends on geometry. In effect, this curve is the envelope of all the resonant peaks of the infinite number of narrow parallel wall strips that make up the non-parallel pair of walls.
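
A sketch of the accumulation step as described, using a plain list indexed by whole hertz. The text does not specify the roll-off law outside the flat top, so a Lorentzian skirt whose half-power width equals the decay-derived bandwidth (2.2/T) is assumed here; the floor of 4×10⁻¹⁰ Pa² corresponds to the 20µPa reference pressure, standing in for the threshold of hearing. Names and example values are hypothetical.

```python
def decay_bandwidth(t60_s):
    """Half-power bandwidth (Hz) from a 60 dB decay time: B = 2.2 / T."""
    return 2.2 / t60_s


def add_mode(power, f0, p2, geo_bw, decay_bw, floor=4e-10):
    """Add one mode (mean-square pressure p2, Pa^2) into power[f], f in whole Hz.

    Flat top of total width geo_bw around f0; outside it, an assumed Lorentzian
    skirt whose half-power width is decay_bw. Contributions below floor are dropped.
    """
    half_geo, half_dec = geo_bw / 2, decay_bw / 2
    for f in range(1, len(power)):
        x = max(0.0, abs(f - f0) - half_geo)  # distance beyond the flat top, Hz
        contribution = p2 / (1 + (x / half_dec) ** 2)
        if contribution >= floor:
            power[f] += contribution
    return power


spectrum = [0.0] * 1001  # index = frequency in Hz, 1 to 1000 Hz used
add_mode(spectrum, 43.0, 1e-2, geo_bw=3.0, decay_bw=decay_bandwidth(0.3))
add_mode(spectrum, 57.3, 5e-3, geo_bw=2.0, decay_bw=decay_bandwidth(0.25))
```

Summing power rather than pressure ignores the relative phases of overlapping modes, the limitation already acknowledged earlier in the article.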

The resultant array can then be plotted against log frequency, either as it is, or perhaps more usefully, converted to sound pressure level and normalized to the reverberant field SPL in the room at 500Hz.

The curve so produced is an instantly recognizable indication of how well the room is expected to perform. Rooms with up to +3/-6dB spread in the reverberant field spectrum have been found to perform adequately for final master mixes.
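
Converting the accumulated array to SPL and testing it against the +3/-6dB window is straightforward. In this sketch the reverberant-field SPL at 500Hz is supplied by the caller (it is calculated separately and not derived here), empty bins count as holes (minus infinity in dB), and the names and analysis range are illustrative. `power` is any list of mean-square pressures per 1Hz bin, such as the array described above.

```python
import math

P_REF = 20e-6  # reference pressure, Pa


def to_spl(power):
    """Mean-square pressure (Pa^2) per bin to dB SPL; empty bins become -inf."""
    return [10 * math.log10(p / P_REF ** 2) if p > 0 else float("-inf") for p in power]


def spread_db(power, ref_spl, f_lo, f_hi):
    """(max, min) of the response between f_lo and f_hi Hz, relative to ref_spl."""
    band = to_spl(power)[f_lo:f_hi + 1]
    return max(band) - ref_spl, min(band) - ref_spl


def within_window(power, ref_spl, f_lo, f_hi, up=3.0, down=6.0):
    """True if the response stays inside +up / -down dB of ref_spl."""
    high, low = spread_db(power, ref_spl, f_lo, f_hi)
    return high <= up and low >= -down
```

The bins with the largest excursions are also the obvious candidates for the remedial work described below.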

It also provides a starting point for effective remedial work in existing rooms, as problem modes can be identified without the need for measuring equipment, and confirmed by simple listening experiment.


-----------------------------------------------------------------

References

(1) Michael Rettinger: On The Acoustics of Control Rooms
AES preprint 1261 1977.

(2) Richard H Bolt: Note on Normal Frequency Statistics in Rectangular Rooms
J. Acoust. Soc. Am. v18 1946

(3) LW Sepmeyer: Computed Frequency and Angular Distribution of the Normal Modes of Vibration in Rectangular Rooms.
J. Acoust. Soc. Am. v37 no3 1965.

(4) Oscar J Bonello: A New Criterion For The Distribution of Normal Room Modes
AES preprint 1530 1979.

(5) Edward J Veale: The Environmental Design of a Recording Control Room
44th AES convention, preprint A-2(R) 1973

(6) Don & Chips Davis: The LEDE Concept for the Control of Acoustic and Psychoacoustic Parameters in Recording Control Rooms
63rd AES convention, preprint 1547 1979

(7) Manfred Schroeder: Diffuse Sound Reflection by Maximum Length Sequences.
J. Acoust. Soc. Am. v57 1975

(8) W. C. Sabine: Collected Papers on Acoustics
Harvard Univ Press 1927

(9) Eyring and Norris
J. Acoust. Soc. Am. v1 1930

(10) Fitzroy

(11) Leo L Beranek: Acoustics.
Acoust. Soc. Am. 1954, 1986

(12) Lawrence Kinsler & Austin Frey: Fundamentals of Acoustics.
John Wiley & Sons 1950, 1962, 1982

(13) Philip M Morse & Richard H Bolt: Sound Waves in Rooms
Reviews of Modern Physics v16 no2 1944