1. The purpose of the tests was to investigate the presumed relation between the vocal tract and recorder sound quality. My hypothesis was that the size and shape of the vocal tract have a considerable impact on the quality of the sound, quality in this case meaning timbre.
A player's deliberate change of palate position is easily recognizable to a listener as a change of timbre, but I had no knowledge as to how this change worked physically.
Dynamics also appeared to be affected, and
a deliberate use of vocal tract techniques to control the
dynamics would greatly improve the expressiveness of an instrument
generally regarded as possessing a somewhat restricted voice.
2. For many years, I have been studying the effects of different vocal tract shapes on recorder sound. It was my impression that a given shape combined with a given fingering would produce the same effect on all recorders, regardless of size. My own conclusion was that a specified, mimicked speech sound has a given effect in a certain register of an instrument of any size, and that this effect is reproducible. But I had no idea how it worked physically, and the matter was further complicated by the fact that the sound production takes place outside the body.
I have aimed at establishing a connection
between vocal tracts and tone colour by developing the vocal tract
configurations associated with speech sounds that the player mimics
while playing. It is possible to control the degree of a desired
quality by refining the mimicking. Access to this technique
would help a player to cultivate his or her personal interpretation further.
3. A series of tests was set up by Wolfe and
Smith. These researchers and their colleagues have developed unique
technology for measuring acoustic transfer functions (including
acoustic response and acoustic impedance) in real time. This
technology was developed for phonetic research and for applications
in speech therapy and language training. For these applications, a
carefully synthesized acoustic current is injected into the mouth of
a subject while s/he is talking. A microphone picks up the resultant
pressure signal. This is analyzed to give both the speech signal and
the response of the vocal tract to the injected signal. In speech
therapy and language training applications, the current configuration
of the user's own vocal tract is displayed in real time together with
target values (measured on native speakers of the target language).
This visual feedback allows language learners to produce foreign
phonemes much more accurately and comprehensibly than does the
traditional auditory feedback, which is subject to the perceptual
complications of categorization and interference. More details are
given by Epps et al. (1997) and Dowd et al. (1998).
4. For part of the tests, we used a
standard Yamaha alto recorder in a=440. Choosing a standard,
easily accessible instrument makes it possible to replicate
our tests and adds a measure of objectivity. The
mouthpiece of the instrument was fitted with a tube connected to an
acoustic current generator. A microphone was then added next to the
tube. The signal from the microphone went via an amplifier
straight into an analog-to-digital card in a computer, and the result
was displayed as a graph generated by software written for the occasion.
4.1 The investigators used two
independent spectrometers: one to display the sound of the instrument
in the external field, and the other to display the frequency
response of the vocal tract while I was playing. The external field
microphone was attached to a lightweight frame fixed to the recorder
itself so as to keep the relative orientation constant and thus to
eliminate the dependence of the radiated spectrum on angle. In all
cases the spectrometer displays were hidden from my view so that I
could not use visual feedback to influence the results.
5. The first task was to find out where to
look and for what. We decided on what notes to investigate, as the
somewhat irregular change of register and fingering of the recorder
requires different blowing pressure. I was then to produce two
opposing timbres on each note, as a strong polarization would
probably show most clearly on the screen. Each note and timbre was
recorded three times, and the result on the screen was saved as a
screen shot.
5.1 For the sake of simplicity, I chose to
call the polarized timbres "thick" for the relaxed palate sound and
"thin" for the sound produced with as high a palate as possible. The
"thick" timbre was not produced with a fixed palate position for all
the notes. On the contrary, I strived for a musically viable sound
production, and let myself be guided by biased musical taste as to
what I considered to be "a nice and musical recorder sound". In
simplified technical terms, I lifted the palate a little bit more the
higher a note I was playing. The "thin" timbre, on the other hand,
was always produced by a palate kept as high as possible regardless
of register.
5.2. It was obvious from the start that
everyone was aware of the change of timbre that I could produce.
However, my intuitive description of the technique in terms of
different vowel sounds initially led the project in an unhelpful
direction. The experiment was first set up to examine the frequency
range that characterizes different vowels, i.e. 300 Hz to 3.5 kHz.
It showed that, while the different vocal tract
configurations produced huge changes in the vocal tract response,
there was little reproducible difference in the spectra of the
external sound in this frequency range. In short, we were looking in
the wrong place.
5.3. It was easy for the investigators to measure the sound spectrum over a wider range, and so they measured spectra for the range up to 10 kHz. These spectra showed two very clear and reproducible differences over a wide range of notes and on both recorders. The results were indeed interesting. The screen shot below shows the "graphic looks" of my "thick" sound. I play a C6 while mimicking the letter "a" as in "come". [Ill. 1]

The "thin" timbre showed a completely different graph. Some of the partials were much more prominent, and the "noise" (broad band signal) was reduced considerably. The screen shot has a much more edgy appearance, the sharp peaks being the harmonic partials of the pitch frequency. I still play a C6, but I raise the palate as much as possible, mimicking a giant yawn while playing. [Ill. 2]

One doesn't have to be a scientist to notice the difference between the graphs of the "thick" and "thin" sounds, but looking for and reading more specific data requires a skilled eye. If one compares the graphs, their shape changes considerably around 5 kHz, and this is apparently where the most audible change of timbre takes place. The "thick" tone has a broad band signal (which in my vocabulary corresponds to "noise" or "white noise"). This signal is present over most of the range up to 10 kHz and is considerably greater than that in the "thin" tone. The partials were present, and the noise was almost never large enough to obscure them. The "thin" tone has stronger partials in the 6-8 kHz range, for a wide range of notes across the compass of the instrument. The difference in broad band signal is easy to hear and to identify as a significant change of timbre. There is also a change in harmonic timbre, but most listeners cannot easily attribute it to components in the 6-8 kHz range.
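The two features described here, the strength of the partials in the 6-8 kHz band and the level of the broad band signal, can be separated numerically. The following is only a minimal sketch on synthetic tones, not the investigators' software: the sample rate, the pitch of 1040 Hz (a hypothetical value near C6, chosen so that every partial falls exactly on an analysis bin), the partial amplitudes and the noise level are all assumptions made for illustration, not measured data.

```python
import math
import random

SAMPLE_RATE = 32000   # Hz, assumed
N_SAMPLES = 1600      # 50 ms of signal; analysis bins are 20 Hz apart
F0 = 1040.0           # Hz, hypothetical pitch near C6 that sits on a bin


def synth_tone(partial_amps, noise_sigma, seed=0):
    """Sum of harmonic sine partials plus optional Gaussian broad band noise."""
    rng = random.Random(seed)
    samples = []
    for n in range(N_SAMPLES):
        t = n / SAMPLE_RATE
        value = sum(a * math.sin(2 * math.pi * (k + 1) * F0 * t)
                    for k, a in enumerate(partial_amps))
        if noise_sigma > 0:
            value += rng.gauss(0.0, noise_sigma)
        samples.append(value)
    return samples


def partial_amplitude(samples, freq):
    """Amplitude of the component at `freq`, taken from a single DFT bin."""
    re = sum(x * math.cos(2 * math.pi * freq * n / SAMPLE_RATE)
             for n, x in enumerate(samples))
    im = sum(x * math.sin(2 * math.pi * freq * n / SAMPLE_RATE)
             for n, x in enumerate(samples))
    return 2.0 * math.hypot(re, im) / len(samples)


def analyse(samples, n_partials=7, band=(6000.0, 8000.0)):
    """Return (power of the partials inside `band`, residual broad band power)."""
    amps = [partial_amplitude(samples, (k + 1) * F0) for k in range(n_partials)]
    total_power = sum(x * x for x in samples) / len(samples)
    harmonic_power = sum(a * a / 2.0 for a in amps)
    band_power = sum(a * a / 2.0 for k, a in enumerate(amps)
                     if band[0] <= (k + 1) * F0 <= band[1])
    return band_power, total_power - harmonic_power


# Synthetic stand-ins for the two timbres (assumed values, not measurements):
# "thin" has stronger 6th and 7th partials and no noise, "thick" the opposite.
thin = synth_tone([1.0, 0.5, 0.3, 0.2, 0.1, 0.08, 0.06], noise_sigma=0.0)
thick = synth_tone([1.0, 0.5, 0.3, 0.2, 0.1, 0.02, 0.015], noise_sigma=0.05)

thin_band, thin_noise = analyse(thin)
thick_band, thick_noise = analyse(thick)
print(f"6-8 kHz partial power: thin {thin_band:.5f}, thick {thick_band:.5f}")
print(f"broad band residual:   thin {thin_noise:.5f}, thick {thick_noise:.5f}")
```

With a real recording the pitch would not fall exactly on a bin, so a windowed spectrum and a peak search around each partial would be needed in place of the single-bin estimate used here.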
The investigators then spent some time adapting their vocal tract spectrometer to allow it to measure up to 10 kHz. This proved rather more difficult, as it required changing the frequency response of the transducers and amplifiers and improving the signal/noise ratio. Nevertheless, a somewhat improvised system was set up which was capable of measuring the vocal tract response over the whole range. They then did another series of experiments in which they simultaneously measured the external sound spectrum and the frequency response of my vocal tract while I was playing. These results showed strong differences in the 6-8 kHz region. [Ill. 3 & 4]


We also recorded a chromatic scale to see
whether the pattern would change depending on pitch, but the results
were satisfactorily similar. Later, my hand-made Frederick Morgan
recorder in a=403 was fitted with the same equipment, and the results
were similar.
6. The last experiment was set up to measure
and compare the air pressure of the "thin" and "thick" timbres. A
small tube connected to a water manometer was attached to the
mouthpiece of the instrument, and I then played the same note
repeatedly, using a tuner to check and to keep the pitch stable. The
result was very interesting. A high palate requires 10% less air
pressure to play the same pitch than a relaxed palate does. This
difference was much greater than the variability among different
repetitions of the experiment.
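The comparison made in this experiment can be written out as a small calculation. The sketch below converts water manometer readings to pascals with the hydrostatic relation p = rho * g * h and compares the difference between the two timbres with the spread among repetitions. The readings are hypothetical numbers chosen only to reproduce a 10% difference; they are not the values measured in the tests.

```python
from statistics import mean, stdev

WATER_DENSITY = 1000.0   # kg/m^3, approximate value for water
GRAVITY = 9.80665        # m/s^2, standard gravity


def mm_water_to_pascal(height_mm):
    """Hydrostatic pressure p = rho * g * h for a water column given in mm."""
    return WATER_DENSITY * GRAVITY * height_mm / 1000.0


def relative_reduction(reference_mm, other_mm):
    """Fractional drop of the mean reading relative to the reference."""
    return (mean(reference_mm) - mean(other_mm)) / mean(reference_mm)


# Hypothetical manometer readings in mm of water, NOT measured values.
thick_mm = [30.0, 30.5, 29.5]   # relaxed palate
thin_mm = [27.0, 27.3, 26.7]    # high palate

reduction = relative_reduction(thick_mm, thin_mm)
difference_mm = mean(thick_mm) - mean(thin_mm)
spread_mm = max(stdev(thick_mm), stdev(thin_mm))

print(f"thick: {mm_water_to_pascal(mean(thick_mm)):.0f} Pa")
print(f"thin:  {mm_water_to_pascal(mean(thin_mm)):.0f} Pa")
print(f"reduction: {reduction:.1%}")
print(f"difference {difference_mm:.2f} mm vs. spread {spread_mm:.2f} mm")
```

The last line is the point of the exercise: the difference between the timbres is only meaningful if it is clearly larger than the scatter between repeated readings of the same timbre.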
7. What can we deduce from the results? The
experiments suggest that the production of the different timbres is
an associated rather than a direct effect of the player's effort to
produce different vowel shapes. The different vowel sounds depend on
the tract's response up to, say, 3.5 kHz, but the recorder's sound is
more dependent on a higher frequency range. However, the different
tract configurations associated with different vowel sounds entail
different responses in the high frequency regime and thus produce the
different effects on recorder sound.
Wolfe and Smith suggest this as a working hypothesis:
The vocal tract position for the "thick"
sound sets up turbulence in the windway. This means that more
pressure is required to get the same air flow. It is probable but not
quite certain that the player needs to keep the air flow fairly
constant to hold the pitch constant. Because musicians have a
strongly ingrained feedback mechanism to play in tune, the player
supplies this extra pressure without even noticing that s/he is doing
so (after all it is not a very large pressure). The turbulence
provides the wind noise that creates the broad band signal. It might
be that this turbulence damps out the 6-8 kHz part of the signal, or
that might be the result of an interaction between the second
harmonic of the windway and the vocal tract to which it is connected.
The key is turbulence. Turbulent flow is changing, swirling flow, and laminar flow is steady, smooth flow. For the same flow rate, turbulent flow requires more pressure than does laminar flow. The turbulent flow is also what (we think) produces the broad band noise. Try this demo: Find a tap in your house that has no nozzle on it. Turn it on slowly. At some point you will get a change from smooth flow to turbulent flow. It is possible to get both kinds of flow at exactly the same flow rate, depending on whether you were previously increasing or decreasing the flow. At equal flow rates, the turbulent flow occupies a greater cross-section. If turbulent and non-turbulent flow are forced to go through equal pipes, the turbulent one requires more pressure to get the same flow.
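The claim that turbulent flow needs more pressure than laminar flow at the same flow rate can be illustrated with textbook pipe-flow formulas. The sketch below uses the Darcy-Weisbach relation for a smooth circular pipe, with the friction factor 64/Re for laminar flow and the Blasius correlation 0.316/Re^0.25 for turbulent flow, here extrapolated down to a Reynolds number of about 3000 where either regime may occur. The pipe dimensions and water properties are assumed values in the spirit of the tap demo. This is not a model of the recorder windway, which is neither circular nor long.

```python
DENSITY = 998.0       # kg/m^3, water at about 20 C (assumed)
VISCOSITY = 1.0e-3    # Pa*s, water at about 20 C (assumed)


def reynolds(velocity, diameter):
    """Reynolds number for flow at mean `velocity` in a pipe of `diameter`."""
    return DENSITY * velocity * diameter / VISCOSITY


def pressure_drop(velocity, diameter, length, turbulent):
    """Darcy-Weisbach pressure drop in Pa for a smooth circular pipe."""
    re = reynolds(velocity, diameter)
    if turbulent:
        friction = 0.316 / re ** 0.25   # Blasius correlation
    else:
        friction = 64.0 / re            # laminar (Hagen-Poiseuille) value
    return friction * (length / diameter) * DENSITY * velocity ** 2 / 2.0


# Hypothetical pipe: 10 mm bore, 1 m long, velocity chosen so that Re = 3000.
diameter, length = 0.010, 1.0
velocity = 3000 * VISCOSITY / (DENSITY * diameter)

laminar_dp = pressure_drop(velocity, diameter, length, turbulent=False)
turbulent_dp = pressure_drop(velocity, diameter, length, turbulent=True)
ratio = turbulent_dp / laminar_dp

print(f"Re = {reynolds(velocity, diameter):.0f}")
print(f"laminar:   {laminar_dp:.1f} Pa")
print(f"turbulent: {turbulent_dp:.1f} Pa")
print(f"ratio:     {ratio:.2f}")
```

Under these assumptions the turbulent case needs roughly twice the pressure of the laminar one for the same flow rate, which is the qualitative point of the working hypothesis; the size of the effect in a real windway is not something this sketch can predict.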
A comment: Personally, I've been using these techniques for over 25 years, and as a teacher I have had success in passing on to my students the tools for changing the sound deliberately. The wish to organize my thoughts on the subject has been instigated by several cases of official denial of the fact that the shape of the vocal tract would in any way affect the sound of the recorder.
Copenhagen January 31, 1998
Literature: Epps, J., Smith, J.R. and Wolfe, J. (1997) "A novel instrument to measure acoustic resonances of the vocal tract during speech." Measurement Science and Technology, 8, 1112-1121.
Dowd, A., Smith, J.R. and Wolfe, J. (1998) "Learning to pronounce vowel sounds in a foreign language using acoustic measurements of the vocal tract as feedback in real time." Language and Speech, 41, 1-20.