PS3060: Perception and Action
Term II, MONDAY 10 – 12 am (Room 128 Wolfson)

Lecture 3: Driving a vehicle: control heading, avoid collision, brake

Course co-ordinator: Johannes M. Zanker (Room 218)

Topics Lecture 3:

Development and the ecological approach to vision

Gibson’s ecological approach (Gibson 1979): two novel aspects for analysing vision in biological systems

insects: can be regarded as low-level systems operating like automata; there is extensive evidence for direct visual control mechanisms (lecture 2)
---  what about humans? they command a much more elaborate behavioural repertoire, which includes complex control, planning, decision making

>>> therefore it is interesting to look at development of behaviour; responding to stimuli adequately without previous experience could be interpreted as ‘direct perception’

The visual cliff

it is crucial for all terrestrial (walking, climbing) animals not to drop from large heights (rather trivial, isn't it? but is it simple to achieve?)
>> does this behaviour need to be learned? what is the crucial visual information?


  • human babies from the earliest crawling age (6 months) avoid crossing to the deep side
  • increase pattern size on deep side (both sides now have identical texture size in static retinal images) > babies still avoid the deep side > motion parallax is used as a cue in the absence of texture cues
  • decrease pattern size on one of two shallow sides (now both sides have the same motion parallax but different texture) > babies avoid the smaller texture > texture size is used as a cue in the absence of motion parallax

visual cliff paradigm (Gibson and Walk 1960):
move along a platform with two sides:
deep and shallow, covered by (invisible) glass

two sources of visual information : texture gradients and motion parallax (flowfields)
sensory mechanisms mature faster than locomotor activity: used directly, no experience  required
innate mechanisms: comparative experiments with dark-reared animals
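The logic of the two pattern manipulations can be checked with a little geometry. The following is a minimal sketch, assuming an eye looking straight down, a small sideways head movement, and made-up distances and sizes (all names and numbers are hypothetical, not taken from Gibson and Walk):

```python
import math

def angular_size(element_size_m, viewing_distance_m):
    """Visual angle (rad) subtended by one texture element seen from straight above."""
    return 2.0 * math.atan(element_size_m / (2.0 * viewing_distance_m))

def parallax_rate(head_speed_m_s, viewing_distance_m):
    """Angular velocity (rad/s) of a surface point directly below a sideways-moving eye."""
    return head_speed_m_s / viewing_distance_m

# Hypothetical values, chosen only for illustration
EYE_TO_SHALLOW = 0.3   # m, eye to shallow surface
EYE_TO_DEEP = 1.3      # m, eye to deep surface
CHECK_SIZE = 0.08      # m, texture element on the shallow side
HEAD_SPEED = 0.2       # m/s, sideways head movement

# Manipulation 1: enlarge the deep pattern until both sides subtend the same angle
deep_check = CHECK_SIZE * EYE_TO_DEEP / EYE_TO_SHALLOW
same_texture = math.isclose(angular_size(CHECK_SIZE, EYE_TO_SHALLOW),
                            angular_size(deep_check, EYE_TO_DEEP))
# ... motion parallax still distinguishes the two sides
parallax_ratio = (parallax_rate(HEAD_SPEED, EYE_TO_SHALLOW)
                  / parallax_rate(HEAD_SPEED, EYE_TO_DEEP))

# Manipulation 2: two shallow sides, one with a finer pattern
fine_check = CHECK_SIZE / 2.0
texture_ratio = (angular_size(CHECK_SIZE, EYE_TO_SHALLOW)
                 / angular_size(fine_check, EYE_TO_SHALLOW))

if __name__ == "__main__":
    print("equal retinal texture size:", same_texture)
    print("parallax shallow/deep:", round(parallax_ratio, 2))
    print("texture coarse/fine at equal depth:", round(texture_ratio, 2))
```

In the first manipulation only the parallax rate differs between the sides; in the second only the texture size differs, which is what allows the two cues to be tested separately.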

Posture control

standing up on the legs is sometimes regarded as the defining moment of humanity – does staying upright require learning control mechanisms?
mechanical (proprioceptive, vestibular) and visual (exteroceptive) signals are used to control standing up
(please note that there are variations in the use of terminology)  
experimental approach: moving (swinging) room leads to adjustment of body posture:
  • sway 26% 
  • stagger 23%
  • fall 33%
  • this is not just an avoidance response, it can be observed for a moving room looming towards or moving away from the infant
    we can observe visual tuning of the mechanosensory information, which will change during growth

    visual input to posture not exclusively related to standing  :
    similar results for standing and for sitting (younger infants) (Butterworth and Hicks 1977)
    infants use visual feedback before learning to stand or walk >> this control system is not acquired in the context of walking !


    perhaps vision has a more general function in the control of posture?
    >> investigate mechanical and visual information about self-motion (egomotion) in adults: ‘kinaesthesis’ – the sensing of body movement
      (adult) participants walk or stand on a trolley in a swinging room; trolley and room are mechanically coupled to each other (Lishman and Lee 1973)

      adult subjects don’t fall over, but can report perceived egomotion  !!!

    •  moving blind: perceived as moving
    •  moving together with environment: perceived static
    •  static in moving environment: perceived moving in opposite direction

    this pattern of results suggests that both proprioceptive and visual information are used, and that visual information dominates in cases of conflict
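The three passive cases can be summarised by a deliberately crude rule: with vision available, perceived egomotion follows the motion of the observer relative to the visible room; without vision, it follows the mechanical signal. This is only an illustrative sketch with hypothetical names, not the model proposed by Lishman and Lee:

```python
def perceived_self_velocity(self_velocity, room_velocity, vision_available=True):
    """Crude 'vision dominates' rule for passive movement (velocities in m/s, same axis)."""
    if not vision_available:
        # moving blind: only mechanical (vestibular/proprioceptive) signals remain
        return self_velocity
    # vision dominates: what counts is movement relative to the visible surroundings
    return self_velocity - room_velocity

if __name__ == "__main__":
    print("moving blind:", perceived_self_velocity(1.0, 0.0, vision_available=False))
    print("moving together with room:", perceived_self_velocity(1.0, 1.0))
    print("static in moving room:", perceived_self_velocity(0.0, 1.0))
```

A negative value stands for perceived movement opposite to the direction in which the room moves.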
    similar for active movement  >>> systematic pattern of  speed misjudgements
    • walking & driving trolley and amplified environment: perceived as increased speed
    • walking on fixed trolley with environment attached: trolley perceived as moving
    • walking on fixed trolley with amplified environment attached: perceived as moving backwards

    'kinaesthesis' - the sensing of body movement is based on a sophisticated interaction between vestibular, visual and motor command information

    Speed perception

    simple psychophysics demonstrates a wide range of speed misperceptions under laboratory conditions
    misjudgements of speed: ambiguities are generated by size, distance, field of view (e.g., Brown 1931, Zanker & Ryan 2001)

    what are the consequences for driving, i.e. in more complex control situations that are closer to real life? (Denton 1980)

    Estimating distance travelled

    in virtual reality experiments, participants are moving along a corridor; they are asked to stop at a defined target distance which was indicated to them before the onset of their journey >> they estimate travel distance from simulated optic flow (Redlick et al. 2001) 
    (experiments like bee tunnel !!)

    target distance vs. travel distance:   

    • undershoot with constant velocity
    • with constant acceleration (if supra-threshold for vestibular system): good approximation of veridical distance (even without actual movement)

    it is concluded from these experiments that humans can use optic flow for measuring travelled distance (just like honey bees...) !!!

    Mixed sensory input – crossmodal interactions

    how do humans integrate different sources of sensory information?
    >> combined visual and vestibular stimulation in the corridor / virtual reality environment (Harris et al. 2000)

    judgement of travelled distance by being moved in the dark (‘actual physical distance travelled’),
    after different presentation of target distance (‘perceived distance of travel’):

    • good match when targets have been presented physically (perceptual gain approx. 1)
    • substantial overestimation of distance travelled (undershoot) when targets have been presented visually (perceptual gain approx. 2)
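To make the two cases concrete, the sketch below assumes that "perceptual gain" means perceived distance travelled divided by actual distance travelled, and that participants stop when the perceived distance equals the target distance. Both the definition and the function name are assumptions made for illustration:

```python
def actual_stopping_distance(target_distance_m, perceptual_gain):
    """Distance actually travelled when perceived travel (gain * actual) reaches the target."""
    if perceptual_gain <= 0:
        raise ValueError("perceptual gain must be positive")
    return target_distance_m / perceptual_gain

if __name__ == "__main__":
    target = 10.0  # m, hypothetical target distance
    print("physically presented target:", actual_stopping_distance(target, 1.0), "m")
    print("visually presented target:", actual_stopping_distance(target, 2.0), "m")
```

With a gain of about 2, a participant would stop after roughly half the target distance, which is the undershoot described above.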

    visual motion (virtual) shows the inverse effects : visually targets are matched well (gain approx. 1), physical target distances are substantially underestimated (gain approx. 0.25)

    => different sensory signals are processed with different gains; good performance is achieved only with combined or consistent types of information!

    Adaptation to missing sensory input

    locomotion speed perception shows impressive plasticity,
    as demonstrated by surprising adaptation effects to extended treadmill exercise (Pelah and Barlow 1996): visual motion illusion from running !

    perceived walking speed is measured after 10 minutes of jogging on treadmill, i.e. in the absence of visual feedback (no flowfield) :
    subjects are asked to walk up and down the room at constant speed after this adaptation period
    walking speed increases in test period (a) >> initially overestimated (like walking on conveyer belt) >> interpreted as aftereffect of decreased visual feedback during adaptation
    no such effect without adaptation period (c: demonstrates that this effect is not just natural decay) or after running outdoors with proper visual feedback (b: demonstrates that this effect is not just fatigue)

    => previous experience of flowfields does affect the perceived walking speed,
    i.e. there is limited use of proprioceptive information (compare driving results! however, no proprioceptive information is available in a vehicle)

    Judging the time of collision of approaching objects

    the time-to-collision (TTC, time to impact, etc.) of objects that approach on frontoparallel trajectories at constant velocity can be estimated from simple optical parameters, in the absence of other information (depth, distance, velocity, etc.)

    critical variable tau: inverse of relative expansion rate         
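Written out, the definition reads as follows. The symbols are introduced here for illustration only: θ is the angular size of the object on the retina, s its physical size, d its distance and v the constant approach speed; the second step uses the small-angle approximation.

```latex
\tau(t) = \frac{\theta(t)}{\dot{\theta}(t)},
\qquad
\theta \approx \frac{s}{d}
\;\Rightarrow\;
\dot{\theta} \approx \frac{s\,v}{d^{2}}
\;\Rightarrow\;
\tau \approx \frac{d}{v}
```

The physical size s cancels, so under these assumptions tau equals the time to collision d/v without size, distance or speed being known individually.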

    going back to an observation by Fred Hoyle about approaching planets (The Black Cloud, 1957)

    rediscovered in the spirit of Gibson by D. N. Lee: various animals (plummeting gannets, Lee and Reddish 1981; landing pigeons, Lee et al. 1993) seem to be using such variables to trigger responses (see also landing response of flies, lecture 2)

    experiments with simulated approaching objects of variable size, speed, travel distance provide evidence that this variable can actually be extracted by humans (Todd 1981, Regan and Hamstra 1993)
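The idea behind such simulations can be sketched numerically: generate the retinal angle of an approaching object and estimate tau from the angle and its rate of change alone. This is a minimal sketch with hypothetical sizes, distances and speeds, not a reconstruction of the cited experiments:

```python
import math

def angular_size(size_m, distance_m):
    """Visual angle (rad) of an object of given size at given distance."""
    return 2.0 * math.atan(size_m / (2.0 * distance_m))

def tau_estimate(size_m, start_distance_m, speed_m_s, t, dt=1e-3):
    """Estimate tau = theta / theta-dot at time t from nearby 'retinal' samples."""
    def theta(time):
        return angular_size(size_m, start_distance_m - speed_m_s * time)
    # central difference stands in for the rate of retinal expansion
    theta_dot = (theta(t + dt) - theta(t - dt)) / (2.0 * dt)
    return theta(t) / theta_dot

if __name__ == "__main__":
    # hypothetical objects of different size approaching at 10 m/s from 20 m
    for size in (0.5, 1.0, 1.8):
        print(size, "m object: tau at t=0 is", round(tau_estimate(size, 20.0, 10.0, 0.0), 3), "s")
```

For these values the true time to collision at t = 0 is 20 m / 10 m/s = 2 s, and the estimate stays close to it for every object size because size cancels in the ratio.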


    during braking, human drivers exhibit characteristic deceleration profiles which do not seem to depend much on the initial speed (Spurr 1969)

    what is the general strategy?

    based on the time-to-collision geometry, leading to the simple optical variable tau, a mathematical theory of safe braking using visual control is developed (Lee 1976)

    zero velocity at the intended stopping point requires the driver to keep the temporal derivative of tau (tau-dot, change of tau) above a critical value of –0.5 (right panel, all values normalised)
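The critical value can be derived in a few lines. The notation is introduced here for illustration: x is the distance to the stopping point, v = -dx/dt the approach speed, and D = -dv/dt the deceleration.

```latex
\tau = \frac{x}{v},
\qquad
\dot{\tau} = \frac{\dot{x}\,v - x\,\dot{v}}{v^{2}} = -1 + \frac{x\,D}{v^{2}}
```

```latex
\text{constant } D \text{ that stops exactly at } x = 0:\quad
v^{2} = 2\,D\,x
\;\Rightarrow\;
\dot{\tau} = -1 + \tfrac{1}{2} = -\tfrac{1}{2}
```

If tau-dot is below -0.5, then xD/v² < 1/2, meaning the current deceleration is too weak to stop within the remaining distance.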

    note that tau-dot values larger than –0.5 generate a monotonic decrease of deceleration, a larger initial deceleration and a longer stopping time!
    (safety margins)
    the average data measured for drivers (Spurr 1969) are well approximated by a deceleration profile with a critical tau-dot of –0.425 !


    this is interpreted as evidence that humans use a close to optimum tau-dot strategy when stopping for a static obstacle
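The difference between the two profiles can be illustrated with the closed-form solution for a driver who holds tau-dot at a constant value -k. The sketch below assumes idealised braking with hypothetical initial speed and distance; the function name is made up for this example:

```python
def braking_profile(v0, x0, k, n=5):
    """Samples (t, distance, speed, deceleration) for braking with tau-dot held at -k.

    v0: initial speed (m/s), x0: initial distance to the stopping point (m),
    k: magnitude of the constant tau-dot, restricted here to 0 < k <= 0.5.
    """
    if not 0.0 < k <= 0.5:
        raise ValueError("this sketch only covers 0 < k <= 0.5")
    tau0 = x0 / v0            # initial tau (s)
    t_stop = tau0 / k         # tau reaches zero here, together with speed and distance
    samples = []
    for i in range(n + 1):
        s = 1.0 - i / n       # remaining fraction of tau: tau(t) / tau0
        t = t_stop * i / n
        x = x0 * s ** (1.0 / k)
        v = v0 * s ** (1.0 / k - 1.0)
        d = (v0 / tau0) * (1.0 - k) * s ** (1.0 / k - 2.0)
        samples.append((t, x, v, d))
    return samples

if __name__ == "__main__":
    for k in (0.5, 0.425):
        print("tau-dot =", -k)
        for t, x, v, d in braking_profile(20.0, 40.0, k):
            print(f"  t={t:5.2f} s  x={x:6.2f} m  v={v:5.2f} m/s  decel={d:4.2f} m/s^2")
```

With k = 0.5 the deceleration is constant; with k = 0.425 it starts higher, declines towards zero and the stop takes longer, which is the safety margin mentioned above.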

    Size cues

    using simple optical variables to control braking allows drivers to avoid extracting variables like size, speed, distance, that are computationally much more demanding
    --- is braking really that simple?

    it has been suggested that in realistic driving situations more cognitive strategies can be used
    and that knowledge of familiar size of pedestrians in particular may play a role (Stewart et al. 1993)
    • simulated approaches towards static objects of different size (‘child’ or ‘adult’) indicate that timing errors grow with longer TTCs
    • more importantly, larger objects lead to underestimation, and smaller objects to overestimation of TTC

    such size effects led to suggestions of providing absolute size cues at critical points to improve traffic safety

    other criticisms of ‘tau strategies’ have been raised,
    and it is obvious that effects from other flow variables are abundant and that various other information sources are used to estimate TTC (Tresilian 1999)


    collision avoidance is not the only task that is crucial for traffic safety – it is clearly as important ‘simply’ to stay on the road
    how can visual information be used to solve this task?

    Gibson’s original notion of flowfields already conceptualised that the structure of velocity vectors provides rich information about the direction of heading
    • moving and looking straight ahead leads to a characteristic expansion pattern with the centre of flow (focus of expansion, pole) in the centre of the retinal image, i.e. in the fovea (fig A)
    • when a moving observer is not looking in the direction of translation, the pole of the flowfield is located outside of the fovea
    • eye movements, as elicited by fixating an object on the ground, lead to characteristic distortions of the flowfield (shearing) and disparity between pole and heading direction (fig D)
    • eye movements, as elicited by tracking a moving object, lead to similar distortions of the flowfield (superposition of rotational component) and disparity between pole and heading direction (fig G)
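The shift of the pole caused by an added rotation can be reproduced with a toy flowfield. The sketch below assumes pinhole image coordinates with unit focal length, a wall at constant depth, a rotation about the vertical axis only, and one common sign convention for the rotational term; the function names are hypothetical:

```python
def flow(x, y, depth, translation, yaw_rate=0.0):
    """Image velocity (u, v) at image point (x, y) for observer translation plus yaw."""
    tx, ty, tz = translation
    u = (-tx + x * tz) / depth - yaw_rate * (1.0 + x * x)
    v = (-ty + y * tz) / depth - yaw_rate * x * y
    return u, v

def estimate_pole(samples):
    """Least-squares point (x0, y0) from which the flow vectors appear to radiate.

    Each sample (x, y, u, v) contributes the line constraint v*x0 - u*y0 = x*v - y*u.
    """
    svv = suu = suv = svb = sub = 0.0
    for x, y, u, v in samples:
        b = x * v - y * u
        svv += v * v
        suu += u * u
        suv += u * v
        svb += v * b
        sub += u * b
    det = svv * suu - suv * suv
    x0 = (suu * svb - suv * sub) / det
    y0 = (suv * svb - svv * sub) / det
    return x0, y0

def sample_field(depth, translation, yaw_rate=0.0):
    """Flow sampled on an 11 x 11 grid spanning -0.5..0.5 in both image directions."""
    grid = [i / 10.0 for i in range(-5, 6)]
    return [(x, y, *flow(x, y, depth, translation, yaw_rate)) for x in grid for y in grid]

if __name__ == "__main__":
    heading = (0.0, 0.0, 1.0)  # hypothetical: straight ahead at 1 m/s, wall 10 m away
    print("pole, translation only:", estimate_pole(sample_field(10.0, heading)))
    print("pole, with 0.02 rad/s yaw:", estimate_pole(sample_field(10.0, heading, 0.02)))
```

With translation only, the estimated pole coincides with the heading direction; adding the rotational component moves it sideways even though the heading has not changed.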

    extraretinal signals (eye movement signals) could be used to solve these ambiguities,
    but on the other hand the structure of the flowfield can provide sufficient information itself (Lappe et al. 1999)

    presenting mixed flowfields (translation + rotation) to a fixating eye (no extraretinal signals) leads to conflicting results when observers are asked to judge heading direction (estimation error as function of simulated eye movement component)

    ... and to a continuing debate

    Shifting targets in real locomotion

    prisms shift the angular position of the retinal image – an observer approaching such a displaced target would start walking in the wrong direction, thus increasing the error angle, which should lead to a continuous angle and path correction

    note that the flowfield pole is shifted together with the target by the prism

    an observer using the focus of expansion (FOE) for navigation, minimising the difference between FOE and target location, would walk on a straight path

    an observer using the displaced landmark for navigation would rotate such that the landmark is positioned in the fovea and start walking in the wrong direction, keeping a constant error angle (grey), which should lead to continuous path correction and a curved path
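The two predicted paths can be compared in a small simulation. The sketch below assumes a walker who takes fixed-length steps and, at every step, heads in the direction of the target plus a constant error angle (zero for the flowfield strategy, the prism deflection for the landmark strategy). The prism angle, step size and function names are hypothetical:

```python
import math

def walk_to_target(target, error_angle_rad, step=0.05, stop_radius=0.1, max_steps=10000):
    """Path (list of (x, y)) of a walker starting at the origin."""
    x, y = 0.0, 0.0
    path = [(x, y)]
    for _ in range(max_steps):
        dx, dy = target[0] - x, target[1] - y
        if math.hypot(dx, dy) <= stop_radius:
            break
        # walking direction = true target direction + constant error angle
        direction = math.atan2(dy, dx) + error_angle_rad
        x += step * math.cos(direction)
        y += step * math.sin(direction)
        path.append((x, y))
    return path

def max_lateral_deviation(path):
    """Largest sideways distance from the straight line between origin and a target on the y-axis."""
    return max(abs(x) for x, _ in path)

if __name__ == "__main__":
    target = (0.0, 10.0)                          # m, straight ahead
    straight = walk_to_target(target, 0.0)        # flowfield strategy
    curved = walk_to_target(target, 0.28)         # landmark strategy, about 16 deg prism
    print("straight path deviation:", round(max_lateral_deviation(straight), 3), "m")
    print("curved path deviation:", round(max_lateral_deviation(curved), 3), "m")
```

Under these assumptions the landmark strategy still reaches the target, but on a longer, curved path.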

    curved paths observed under such conditions are interpreted as evidence against the use of flowfields to judge heading during walking (Rushton et al. 1998)

    (however, it has been pointed out that the location of the pole outside the fovea as such could be interpreted as heading error, as well, which then would need to be translated into the motor system, leading to similar curved paths: see Harris & Rogers 1999 & reply by Lappe 1999)

    Staying on the road

    what do drivers do when steering a car through the real world?
    the eye and steering movements of drivers have been recorded while negotiating a ‘tortuous’ road, suggesting simple geometrical strategies that can produce adequate driving stability (Land and Lee 1994)
    drivers are found to keep their gaze in the direction of the tangent point of a curve for a large proportion of the time

    this is thought to be an important point because its angle relative to the car’s heading is a good predictor of the curvature of the road – keeping a constant angle is a simple pragmatic rule to keep the car on the road! (Land 2001)
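The relation between tangent point angle and road curvature follows from simple geometry. The sketch below assumes a driver on a circular path of radius r, at a lateral distance d from the inside edge of the bend, heading along the tangent of the path; then cos(theta) = (r - d) / r, which for small angles gives a curvature of about theta² / (2d). The function names and example values are hypothetical:

```python
import math

def tangent_point_angle(path_radius_m, lateral_distance_m):
    """Angle (rad) between the car's heading and the tangent point on the inside edge."""
    return math.acos((path_radius_m - lateral_distance_m) / path_radius_m)

def curvature_from_angle(theta_rad, lateral_distance_m):
    """Exact curvature (1/m) recovered from the tangent point angle."""
    return (1.0 - math.cos(theta_rad)) / lateral_distance_m

def curvature_small_angle(theta_rad, lateral_distance_m):
    """Small-angle approximation of the same relation: theta^2 / (2 d)."""
    return theta_rad ** 2 / (2.0 * lateral_distance_m)

if __name__ == "__main__":
    radius, distance = 60.0, 2.0   # hypothetical bend and lane position (m)
    theta = tangent_point_angle(radius, distance)
    print("tangent point angle:", round(math.degrees(theta), 1), "deg")
    print("curvature (exact):", curvature_from_angle(theta, distance))
    print("curvature (small angle):", curvature_small_angle(theta, distance))
```

A larger tangent point angle at the same lateral distance indicates a tighter bend, which is why the angle is a useful steering signal.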

    last update 23/02/2004
    Johannes M. Zanker