PS1061: Sensation and Perception
2014-15
Term 2, Thursday 11 am - 1 pm (Windsor Auditorium)
Lecture 5: Auditory perception: Hearing noise
and sound
Course co-ordinator: Johannes M. Zanker,
j.zanker@rhul.ac.uk, (Room W 246)
Lecture Topics
- auditory perception opens another window to the world : another
'sensory channel'
- the physical nature of sound: tones, mixtures, noise,
complex patterns
- basics of the sensory system: peripheral filters and tonotopic
cortical representation
- detection and discrimination of pure tones: characterised
by frequency and intensity
- the complexity of spoken language, the use of telephones, the
effects of hearing loss
- sound localisation (& the cocktail party effect) in auditory
scenes
- can we generate auditory illusions ???
questions about hearing (auditory perception)
what is the significance of auditory perception for
a person in real life?
- sounds signal events: therefore hearing provides an alarm system
- emotional balance is influenced by sound (distress by noise, screeching,
relaxation by listening to music)
- the auditory system is essential for communication - crucial for
human society
what are some of the most interesting problems in auditory
perception ?
- the perceptual basis of harmony (why do some things sound pleasant?)
- the influence of experience and knowledge on auditory perception
- mechanisms of separating signal sources (localising the origin of
sound)
- how does the brain allow for recognition of spoken words and voices
the nature of sound
a sound source is emitting circular pressure
waves (shells of air compression)
sound waves are similar to the radiating
ripples on the water surface, when a pebble is tossed into a still pond |
 |
a pure tone is represented by a sinewave (air pressure as function of space/time)
which is travelling through space, with amplitude and frequency (1/period)
corresponding to perceived loudness and pitch
amplitude = distance between peak and trough of sinewave (overall size
of pressure changes)
frequency = number of pressure changes per unit of time (inverse of sinewave
period, measured in Hz = cycles per second)
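As a minimal sketch of these two definitions, the Python snippet below samples a pure tone and reads off its period and peak-to-trough amplitude. The function name `pure_tone`, the sample rate and the 440 Hz example are illustrative assumptions, not part of the lecture material.

```python
import math

def pure_tone(frequency_hz, peak_amplitude, duration_s, sample_rate_hz=8000):
    """Return samples of a sinewave: pressure deviation as a function of time."""
    n_samples = int(round(duration_s * sample_rate_hz))
    return [
        peak_amplitude * math.sin(2 * math.pi * frequency_hz * n / sample_rate_hz)
        for n in range(n_samples)
    ]

# Hypothetical example: a 440 Hz tone lasting 10 ms
tone = pure_tone(frequency_hz=440.0, peak_amplitude=1.0, duration_s=0.01)
period_s = 1.0 / 440.0                   # period is the inverse of frequency
peak_to_trough = max(tone) - min(tone)   # amplitude as defined above
```

Raising `frequency_hz` shortens the period (higher pitch); raising `peak_amplitude` increases the peak-to-trough distance (louder tone).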
making music: the scale
 |
notes of a musical score refer to the keys on the piano -> the frequency
generated -> the pitch of a musical tone
|

C D E F G A B C
|
keys are arranged on the keyboard
in the order of rising frequency of the musical tone generated (e.g.
C-major diatonic scale)
|
 |
harmonic intervals are determined by characteristic frequency ratios
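To make the idea of frequency ratios concrete, here is a small sketch that derives the notes of the C-major scale from ratios. It assumes just-intonation ratios and a lower C of about 261.63 Hz; both are illustrative choices, and a real piano is normally tuned in equal temperament, which only approximates these ratios.

```python
from fractions import Fraction

# Just-intonation ratios relative to the lower C (an assumption; pianos are
# normally tuned in equal temperament, which only approximates these ratios).
JUST_RATIOS = {
    "C": Fraction(1, 1), "D": Fraction(9, 8), "E": Fraction(5, 4),
    "F": Fraction(4, 3), "G": Fraction(3, 2), "A": Fraction(5, 3),
    "B": Fraction(15, 8), "C'": Fraction(2, 1),
}

def scale_frequencies(base_hz=261.63):
    """Frequency of each note of the C-major scale for a given lower C."""
    return {note: base_hz * float(ratio) for note, ratio in JUST_RATIOS.items()}

freqs = scale_frequencies()
fifth = JUST_RATIOS["G"] / JUST_RATIOS["C"]    # perfect fifth, 3:2
octave = JUST_RATIOS["C'"] / JUST_RATIOS["C"]  # octave, 2:1
```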
|
combining frequencies : pitch
frequency & amplitude
of pure tones are the basis of perceived pitch
and loudness
|
|
what happens to the waveform when you superimpose pure tones ?
combined waveforms: perceived pitch usually is that of the fundamental
|
 |
musical tones are combinations
of pure tones (cf. Barlow & Mollon 1982): fundamental (determines pitch)
+ harmonic frequencies (determine timbre)
more complex sounds can be generated by adding further frequencies: chords, consonance,
dissonance, vowels, symphonies, etc.
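One way to see why the perceived pitch is usually that of the fundamental: the summed waveform repeats with the period of the fundamental, not with that of any harmonic. The sketch below checks this numerically; the 200 Hz fundamental, the harmonic amplitudes and the helper `superimpose` are hypothetical choices for illustration.

```python
import math

def superimpose(components, t):
    """Sum of pure tones at time t; components are (frequency_hz, amplitude) pairs."""
    return sum(a * math.sin(2 * math.pi * f * t) for f, a in components)

f0 = 200.0                                          # hypothetical fundamental in Hz
tone = [(f0, 1.0), (2 * f0, 0.5), (3 * f0, 0.25)]   # fundamental + two harmonics

period_f0 = 1.0 / f0
times = [n / 16000.0 for n in range(160)]           # 10 ms sampled at 16 kHz

# The combined waveform repeats after one period of the fundamental ...
repeats_at_f0 = all(
    abs(superimpose(tone, t) - superimpose(tone, t + period_f0)) < 1e-9
    for t in times
)
# ... but not after one period of the component at 2 * f0.
repeats_at_2f0 = all(
    abs(superimpose(tone, t) - superimpose(tone, t + period_f0 / 2)) < 1e-9
    for t in times
)
```

Changing the harmonic amplitudes alters the shape of the waveform (timbre) but leaves its repetition period unchanged.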
see also: combination
tones
the nature of noise
what happens when you superimpose random tones (i.e. produce several
waves at the same time) ?
(like tossing a handful of pebbles into the pond, or raindrops falling
on the water surface, see Pierce 1983)
-- this relationship between waves is called interference --
superimposed waveforms generating random patterns make it difficult
to separate the individual contributions
|
 |
white noise is the superposition of many tones
with random amplitude and frequency (interfering ripples on the pond) - think
of the surface structure of the sea
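The 'handful of pebbles' idea can be sketched in a few lines: many tones with random frequency, amplitude and phase are simply added sample by sample. This is only an approximation to white noise, and the frequency range, the random phases and the function name `random_tone_mixture` are assumptions made for the example.

```python
import math
import random

def random_tone_mixture(n_tones, n_samples, sample_rate_hz=8000, seed=0):
    """Superimpose n_tones sinewaves with random frequency, amplitude and phase."""
    rng = random.Random(seed)
    tones = [
        (rng.uniform(20.0, 3900.0),       # frequency in Hz (below half the sample rate)
         rng.uniform(0.0, 1.0),           # amplitude
         rng.uniform(0.0, 2 * math.pi))   # starting phase
        for _ in range(n_tones)
    ]
    return [
        sum(a * math.sin(2 * math.pi * f * n / sample_rate_hz + p)
            for f, a, p in tones)
        for n in range(n_samples)
    ]

noise = random_tone_mixture(n_tones=200, n_samples=400)
```

With only two or three tones the individual ripples are still recognisable in the waveform; with a few hundred they are not.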
acoustic sensory organ : the ear
the ear
works as a transducer, converting sound waves into neural signals
 |
the ossicles in the middle ear (hammer = malleus, anvil =
incus, stirrup = stapes) convert oscillations
from gas to liquid medium ('impedance' change)
the ear is a sensory organ with extreme sensitivity : absolute threshold
corresponds to sound levels that generate eardrum vibrations smaller
than 10^-10 m (0.1 nanometre)
|
encoding of acoustic signals requires several 'engineering tasks', which are
accomplished by astonishing biological solutions:
- outer ear: directional microphone
- middle ear: impedance matching,
overload protection
- inner ear: neural encoding, frequency
analysis
the sensory surface: inner ear
 |
|
mechanical stimulation is transmitted through ossicles onto oval
window - here the oscillations are converted to pressure waves in
the cochlea - this generates
a travelling wave on the basilar membrane (which resonates
like the string of a guitar)
|
the organ of Corti in the cochlea picks up the vibrations from the basilar
membrane by means of hair cells: mechanosensor array
|
from peripheral filtering to tonotopic maps
encoding with travelling waves :
the travelling wave reaches its maximum on the basilar membrane at
a distance from the oval window that is characteristic for a given
frequency
(from top to bottom: 100 Hz, 1,000Hz,
10,000Hz)
|
|
a given frequency stimulates a particular group of hair cells at a given location
of the basilar
membrane
tone frequency is converted to location !!
|
mechanical excitation > travelling wave > electrical activity
further frequency tuning is achieved by lateral
inhibition in the auditory pathway (cochlear nerve)
after G.
v. Békésy, 1967
|
fundamental mechanisms:
- tuning : preferential response
of a sensor to a dedicated stimulus range (e.g. frequencies)
- filter : tuning to frequency
(pitch) in the peripheral auditory system
- sensory map : cortical representation
of pitch as function of location (tonotopy)
frequency channels revealed by masking
masking: you can only
hear the piccolo if the bassoon is played very softly

based on this observation, we can design a psychophysical
masking experiment :
(figure panels: close mask, distant mask)
- vary frequency of masking tone to determine how thresholds
change with frequency difference
|
systematic variation of frequency of masking tone to determine a set of detection
thresholds for a given target frequency (cf. Barlow & Mollon 1982) is a
key method to determine the auditory 'tuning curve'
for this target frequency: threshold amplitude as function of mask frequency
the 'bandwidth' of the frequency
filter that is detecting the target tone is given by the frequency difference
at half-height of the threshold function
tuning curve
- low thresholds (mask amplitudes) for masks close to the target
frequency
- high thresholds (mask amplitudes) for masks more distant from
target frequency
- width of the U-shaped threshold curve corresponds to the bandwidth
of the frequency filter responsible for the target
|
 |
this filter tuning is the basis for the perception
of pitch !
frequency tuning: basis of pitch perception
frequency tuning can be measured in systematic masking experiments for different
target tones …
many filters cover full range of frequencies – like digital audio systems!
|
tuning curves resulting from frequency filter mechanisms can be measured
in systematic masking experiments :
three tuning curves for three different target tones
after Barlow & Mollon 1982
|
|
electrophysiological measurements
from the auditory pathway (e.g. cochlear nerve of cat, sensitivity as
function of frequency) generate very similar patterns of frequency tuning
for individual neurones: preferred tones |
many filters cover the whole audible frequency range |
 |
this filter tuning is the basis for pitch discrimination involved in auditory perception,
such as recognising musical tunes!
the same principle of encoding frequency components
separately is used in digital audio systems !!
how to measure loudness ...
the pressure of airwaves determines the magnitude of auditory sensation: 'loudness'
 |
when sound intensity = sound pressure level (SPL) is decreased or increased
relative to a reference tone, subjective ratings of loudness change
proportionally
(after Gulick et al. 1989)
sound intensity (SPL) is measured in decibels (dB): 20 * log10 (I1/I0),
i.e. relative to a reference pressure I0 of 20 micropascals
(I0 ~ hearing threshold at 1000 Hz)
|
perceived loudness can be measured quantitatively
by comparing two successively presented tones (of different frequency)
and deciding which one sounded louder (forced choice FC to find threshold for
louder/softer)
intensity of comparison tone is adjusted until it has the same ‘subjective’
loudness as the reference tone
and then the physical intensity (sound pressure level SPL) is recorded as ‘perceived
loudness’
sound intensity: perception of loudness
Audiogram
(or audibility function): describes the hearing performance of an individual
(Berrien 1946)
by comparing tones of many frequencies we can derive curves of equal loudness
|
- audibility function (AF) : detection threshold as function
of frequency (given in Hz) 20 ... 20,000 Hz
- equal loudness contours are determined by matching the subjective
intensity of tone pairs at various base intensities (1000 Hz reference:
close to 0 dB at threshold)
|
speech only covers a small region of the auditory response range: 300 - 5000
Hz, 40 - 70 dB
remember: sound intensity (SPL) is measured in decibels:
20 * log10 (I1/I0)
as multiples of hearing threshold I0 at 1000 Hz, in logarithmic units:
an increase in sound pressure by a factor of 10 means an additional 20 dB
(for example: 20 dB = 10 times the threshold pressure, 40 dB = 100 times, 60 dB = 1000
times, etc.)
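The decibel formula can be checked with a short sketch. It assumes the reference I0 is the 20 micropascal pressure mentioned earlier; the function names `spl_db` and `pressure_ratio` are made up for this example.

```python
import math

P0_PA = 20e-6   # reference pressure I0: 20 micropascals (approximate threshold at 1000 Hz)

def spl_db(pressure_pa):
    """Sound pressure level in dB relative to the reference pressure."""
    return 20.0 * math.log10(pressure_pa / P0_PA)

def pressure_ratio(level_db):
    """Inverse relation: how many times the reference pressure a given level is."""
    return 10.0 ** (level_db / 20.0)

level_at_threshold = spl_db(20e-6)       # pressure equal to the reference
level_at_ten_times = spl_db(200e-6)      # ten times the reference pressure
ratio_at_60_db = pressure_ratio(60.0)
```

Each factor of 10 in pressure adds 20 dB, so the scale compresses a pressure range of several million to one into roughly 0 to 140 dB.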
the ecology of sound intensity
level | sound pressure (x threshold) | example | effect
0 dB | | threshold of hearing (at 1000 Hz) |
10 dB | | normal breathing |
20 dB | 10 x | leaves rustling in a breeze |
30 dB | | empty lecture theatre |
40 dB | 100 x | Holloway campus at night (without planes) |
50 dB | | quiet restaurant |
60 dB | 1,000 x | two-person conversation |
70 dB | | Trafalgar Square |
80 dB | 10,000 x | vacuum cleaner |
90 dB | | huge waterfall (Niagara) | danger level
100 dB | 100,000 x | Underground train |
120 dB | 1,000,000 x | Propeller plane at takeoff | hearing loss
140 dB | 10,000,000 x | Heathrow: jet at takeoff (low flying Concorde) | pain level
spectrogram
auditory events can be complicated patterns of frequency and intensity (this
is called a ‘spectrum’)
which are modulated as function of time
to display and analyse such events, scientists use ‘spectrograms’,
or ‘sonograms’: frequency composition is shown graphically
as it changes in time
the three musical tones of the chord can be seen in the schematic spectrogram
as succession of different fundamental & harmonic frequency clusters
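A spectrogram can be sketched as a sequence of short-time spectra: cut the signal into frames, apply a window, and compute the magnitude of the Fourier transform of each frame. The code below is a minimal standard-library illustration, not how any particular analysis package does it. The three note frequencies are hypothetical and chosen to fall exactly on analysis bins, and the tones are pure (no harmonics), so each frame shows a single peak.

```python
import cmath
import math

def spectrogram(signal, sample_rate_hz, frame_len=200, hop=200):
    """Magnitude spectrogram: one list of frequency-bin magnitudes per time frame."""
    # Hann window reduces leakage between neighbouring frequency bins
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_len)
              for n in range(frame_len)]
    n_bins = frame_len // 2 + 1
    freqs_hz = [k * sample_rate_hz / frame_len for k in range(n_bins)]
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        chunk = [signal[start + n] * window[n] for n in range(frame_len)]
        # Discrete Fourier transform of the windowed frame, bin by bin
        frames.append([
            abs(sum(x * cmath.exp(-2j * math.pi * k * n / frame_len)
                    for n, x in enumerate(chunk)))
            for k in range(n_bins)
        ])
    return freqs_hz, frames

# Hypothetical test signal: three pure tones played one after another (0.2 s each)
rate = 4000
notes_hz = [260.0, 320.0, 400.0]
signal = [math.sin(2 * math.pi * f * n / rate)
          for f in notes_hz for n in range(800)]

freqs_hz, frames = spectrogram(signal, rate)
# Strongest frequency in each time frame
dominant_hz = [freqs_hz[max(range(len(m)), key=m.__getitem__)] for m in frames]
```

Plotting `frames` with time on one axis, frequency on the other and magnitude as grey level gives the kind of display described above; `dominant_hz` traces the succession of tones.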
|
 |
the complexity of a spoken word
each spoken word generates
a complex pattern of frequency and intensity
(spectrum), which is modulated as function of
time
 |
example: the phrase 'enjoy your weekend' is recorded as
- sonogram (or spectrogram: composition of different frequencies
indicated by greylevels as function of time)
- waveform envelope (measured as vibrations of microphone
membrane)
|
… imagine how difficult it is to recognize the
same word from different speakers, or the voice of a particular speaker ...
the spectrum of human speech
speech sounds cover a wide range of the audible spectrum
- vowel sounds are mainly in the lower frequency region
- consonants cover almost the entire range
|
|
telephone systems cut off the upper part of the spectrum with minimal effects
on speech recognition !!!
hearing loss
there are many different kinds of auditory impairments, apart from complete deafness
|
- presbycusis: selective high-frequency hearing loss with age
- noise exposure can lead to temporary threshold shifts
(auditory fatigue) and permanent (partial) deafness
- tinnitus: continuous humming or ringing; after some time it
leads to suppression of the affected frequencies
|
... imagine the consequences of such impairments of the audibility function
for communication and social life ...
auditory space
how does (perceptual) auditory space represent events in a four-dimensional
world (3 spatial dimensions + time)?
in vision, we have 2D-images, can easily localise objects in the visual field
and see several objects at the same time
but
can you hear things at the same time in different locations?
the ear is a 1D sensor (a microphone samples at one point in space) - so :
- can we hear in two, or three dimensions?
- can the auditory system localise objects?
- can we hear several objects at the same time?
sound localization
there is no direct representation of auditory space :
location needs to be calculated from a number of cues (see Goldstein
2002, chapter 11)
(in biology there are experts for this job, like barn owls: see Konishi 1986)
- pinnae : crucial for sensation
of space (reduced when using earphones!); used to locate elevation (up-down
dimension)
- inter-aural processing (combining
information from both ears) to find azimuth (left-right dimension) of sound
source
- intensity differences (IID) : acoustic ‘shadow’
of the head
- temporal or phase differences (ITD): humans can detect inter-aural
delays of 10 - 650 microsec (1 microsec = 1/1,000,000 sec)
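The order of magnitude of these inter-aural delays follows from simple geometry. The sketch below uses a simplified path-difference model (distant source, sound travelling in a straight line to each ear, no diffraction around the head); the ear distance of 0.22 m and the function name `itd_microsec` are assumptions for illustration.

```python
import math

SPEED_OF_SOUND_M_S = 343.0   # in air at about 20 degrees C
EAR_DISTANCE_M = 0.22        # hypothetical distance between the two ears

def itd_microsec(azimuth_deg):
    """Inter-aural time difference for a distant source (path-difference model).

    azimuth_deg = 0 is straight ahead, 90 is directly to one side.
    """
    path_difference_m = EAR_DISTANCE_M * math.sin(math.radians(azimuth_deg))
    return 1e6 * path_difference_m / SPEED_OF_SOUND_M_S

itd_ahead = itd_microsec(0.0)     # source straight ahead: no delay
itd_side = itd_microsec(90.0)     # source directly to one side: maximum delay
itd_one_deg = itd_microsec(1.0)   # source just off the midline
```

Under these assumptions the maximum delay is a little over 640 microsec, and a shift of about one degree from straight ahead already produces a delay of roughly 10 microsec.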

note the similarity between auditory stereo (this name
sounds familiar - your stereo system has two speakers!) and stereovision ....
the cocktail party effect
it is easy (for young folk) to single out one particular
voice from the background of a noisy pub,
or to pick up the tune of a single instrument from a large orchestra (this is
called the 'cocktail
party effect')
how can this be achieved?
the mixture of wavefronts hitting the ear has an overwhelming
complexity !
(more like the ‘rippled’ surface of a pond in a storm, from
Pierce 1983) |
|
the core of the problem of the cocktail party effect (Cherry 1953) is masking
:
the detection of a tone is impaired if another tone or noise is presented at
the same time
- masking depends on proximity
in space and similarity in frequency
- binaural unmasking can be used
to separate sound sources in space: if spatial distance or difference in frequency
increases, separation becomes easier - these two cues are combined in the
binaural information (subtract signals from the two ears to unmask separate
sound sources)
- high-level effects (attention,
familiarity of voice, language) & sensory fusion
(we use vision to support hearing) can also be used to separate sound sources
in space
auditory illusions
we can create illusions in the auditory system like in the visual system !
|
Shepard’s (1964) eternally rising tone:
an impossible acoustic object like Escher's (1961) 'Waterfall' ?
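The usual construction of such a tone superimposes partials spaced one octave apart under a fixed bell-shaped amplitude envelope: as all partials glide upward, the highest ones fade out while new ones fade in at the bottom. The sketch below only computes the partials for each step; the number of octaves, the lowest frequency and the raised-cosine envelope are illustrative assumptions, not the exact parameters of Shepard (1964).

```python
import math

def shepard_components(step, steps_per_octave=12, n_octaves=8, f_min_hz=20.0):
    """Frequencies and amplitudes of the octave-spaced partials at a given step."""
    components = []
    for k in range(n_octaves):
        # position of this partial on a circular log-frequency axis, in octaves
        position = (k + step / steps_per_octave) % n_octaves
        frequency_hz = f_min_hz * 2.0 ** position
        # raised-cosine envelope: silent at the extremes, loudest in the middle
        amplitude = 0.5 * (1.0 - math.cos(2.0 * math.pi * position / n_octaves))
        components.append((frequency_hz, amplitude))
    return sorted(components)

start = shepard_components(step=0)
one_step_up = shepard_components(step=1)    # every partial is slightly higher
full_octave = shepard_components(step=12)   # same set of partials as at the start
```

Each step raises every partial, yet after twelve steps the set of partials is the same as at the start, which is why the tone can appear to rise without end.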
summary: auditory perception
- the auditory ‘channel’
is an important source of sensory information
- the ear is a highly sensitive and intelligent device to pick up and convert sound pressure waves
- frequency filtering is the basis of perceiving
pitch
- sound intensity is the second
characteristic property of sound (with high ecological significance!)
- spoken words generate complex
patterns of frequency and intensity (spectrogram)
- speech covers a wide range of the audible spectrum, which can
be affected by hearing loss
- sound can be localized by calculating intensity
and phase differences between the two ears
- auditory localization works
in difficult environments (cocktail party effect)
General Reading:
Specific References:
- Békésy, G. v., 1967, 'Sensory Inhibition', Princeton University
Press, Princeton NJ
- Berrien KF, 1946, ‘The effects of noise’ Psychol. Bull. 43,
141-161
- Cherry EC, 1953, ‘Some experiments on the recognition of speech,
with one and two ears’ Journal of the Acoustical Society of America, 25:975-979
- Evans EF, 1982, 'Functions of the auditory system' in: Barlow, H
& Mollon, J., eds. ‘The Senses’, Cambridge University Press,
1982 (612.8 BAR) chapter 15, p 307-331
- Gulick WL, Gescheider GA, Frisina RD, 1989 ‘Hearing’ Oxford
University Press, New York
- Kolb B, Whishaw IQ, 2001, ‘An Introduction to Brain and Behaviour’
Worth Publ., New York (612.82 KOL)
- Konishi M, 1986 "Centrally synthesized maps of sensory space"
Trends in Neuroscience 4/86 163-168
- Pierce, J.R., 1983, 'The Science of Musical Sound', Freeman, New York (
VWBB Pie)
- Shepard RN, 1964, ‘Circularity in judgments of relative
pitch’ J. Acoust. Soc. Am. 36, 2346-2353
last update 8-03-2015, Johannes M. Zanker