The ability to determine the location of a sound source is fundamental to hearing. However, auditory space is not represented in any systematic manner on the basilar membrane of the cochlea, the sensory surface of the receptor organ for hearing. Understanding the means by which sensitivity to spatial cues is computed in central neurons can therefore contribute to our understanding of the basic nature of complex neural representations. We review recent evidence concerning the nature of the neural representation of auditory space in the mammalian brain and elaborate on recent advances in the understanding of mammalian subcortical processing of auditory spatial cues that challenge the “textbook” version of sound localization, in particular brain mechanisms contributing to binaural hearing.
The ability to locate the source of a sound is critical to the survival of a wide range of species. From their appearance as primarily nocturnal animals more than 200 million years ago, mammals relied heavily on sound localization abilities to achieve this, and to this day, locating the source of a sound remains an important sensory ability for prey and predator alike. Spatial hearing also contributes to human communication, for example, by providing cues as to the relative number and location of sources and objects in the environment, helping determine the dimensions and characteristics of rooms and enclosed spaces, and contributing to the “cocktail party effect,” whereby listeners are able to hear out speakers against other, interfering, voices in crowded listening environments (14).
In terms of its neural processing, sound localization is highly complex but, nevertheless, represents a well-established model by which the principles of neuronal computation might be explored. Here, we review recent evidence concerning the nature of the neural representation of auditory space in the mammalian brain, concentrating on subcortical structures generally considered to be specialized for processing auditory spatial cues, and elaborating on recent advances in the understanding of the mammalian auditory system that challenge the “textbook” version of sound localization, particularly brain mechanisms contributing to binaural hearing. We focus on several advances that have altered our understanding of how sound localization is achieved in mammals.
The neural representation of auditory space is apparently not confined to the form of a topographic map generated by a labeled line system, the so-called “space-map”; rather, source location appears to be represented by a population of relatively broadly tuned spatial channels in both brain hemispheres.
While coincidence detection of binaural inputs in the submillisecond range remains a basic feature of sound localization mechanisms, the historic view that neural tuning for preferred spatial locations based on interaural time differences (ITD) arises by means of purely excitatory axonal “delay lines” does not appear to hold in mammals.
The binaural auditory system in mammals is less “hard-wired” than has been imagined and appears instead to be highly dynamic, able to adjust rapidly its tuning properties to take account of the context in which sounds are heard.
Sound waves impinging on the ear result in a single, one-dimensional movement of the tympanum (eardrum), irrespective of whether the sound is a simple sinusoidal wave emitted from a tuning fork, a mixture of temporally and harmonically related sounds produced by a jazz combo, or a complex muddle of voices at a cocktail party (Fig. 1). By analyzing and comparing the one-dimensional movements of the two eardrums, the auditory brain extracts the relevant physical cues in what we perceive as auditory space, synthesizing auditory objects embedded in that space (Fig. 1). Unlike the visual (retinotopy) and somatosensory (the sensory “homunculus”) systems, this analysis of auditory space is achieved without recourse to any explicit representation of auditory space on the receptor surface; rather, the hair cells of the sensory epithelium of the inner ear, arranged along the length of the basilar membrane, are systematically ordered according to the frequency of a sound (tonotopy) rather than its spatial location or, indeed, any further specific features of an object. For that reason, the representation of auditory space is, to a large degree, computed in the central auditory system by converging inputs from the two ears, inputs that of themselves contain no explicit spatial information, onto single neurons that lie deep within the brain stem. The specialized cellular properties and microcircuits of the lower auditory pathways permit a detailed and high-resolution analysis of the physical sound parameters, including temporal features that lie in the submillisecond range.
Sound localization in mammals is based on two very different means of analyzing the acoustic waveform (15). The first constitutes a spectral analysis in which the comparison of sound energy across different frequency bands arriving at each ear provides for sound-localization abilities in the vertical dimension (including distinctions between sources to the front from those behind; Fig. 2A). Although better performance based on frequency spectra may be possible using both ears, it represents an essentially monaural cue for sound localization, generated largely by the direction-specific attenuation of particular frequencies by the pinna and concha of the outer ear. The second means by which sound localization is achieved is based on detecting and comparing differences in the movement of the two eardrums. This binaural computation, which takes place mainly within narrowband sound-frequency channels, underlies sound localization in the horizontal dimension. Two interaural differences are available to such binaural analysis. First, sounds not arising directly from in front (or behind) arrive earlier at one ear than at the other, creating an ITD (Fig. 2B). Second, for wavelengths roughly equal to, or shorter than, the diameter of the head, a shadowing effect is produced at the ear further from the source, creating an interaural intensity, or level, difference (IID or ILD, respectively) (Fig. 2C; Refs. 192, 245).
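As a rough illustration of how the ITD cue scales with head size and source direction, the following sketch estimates the ITD for a distant source using a rigid-sphere approximation of the head. The formula, the head radius of 8.75 cm, and the speed of sound of 343 m/s are assumptions introduced here for illustration only; they are not taken from the studies reviewed.

```python
import math

SPEED_OF_SOUND_M_S = 343.0   # assumed speed of sound in air
HEAD_RADIUS_M = 0.0875       # assumed (hypothetical) adult human head radius

def itd_spherical_head(azimuth_deg, radius_m=HEAD_RADIUS_M, c=SPEED_OF_SOUND_M_S):
    """Approximate ITD in seconds for a far-field source.

    Uses the rigid-sphere approximation ITD = (r / c) * (theta + sin(theta)),
    with theta the azimuth in radians (0 = straight ahead), valid for
    azimuths between -90 and +90 degrees.
    """
    if not -90.0 <= azimuth_deg <= 90.0:
        raise ValueError("azimuth must lie within [-90, 90] degrees")
    theta = math.radians(azimuth_deg)
    return (radius_m / c) * (theta + math.sin(theta))

if __name__ == "__main__":
    for az in (0, 15, 45, 90):
        itd_us = itd_spherical_head(az) * 1e6
        print(f"azimuth {az:3d} deg -> ITD {itd_us:6.1f} us")
```

Under these assumptions, a source directly to one side produces an ITD of several hundred microseconds, and shrinking the assumed head radius shrinks the available ITD in proportion.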
It is important to note at this juncture that other nonmammalian vertebrates, and some insects, have evolved different solutions to the problem of sound localization. Some fish, for example, exploit particle motion caused by near-field sound stimulation directly exciting hair cells in the inner ear which are oriented parallel to the motion of the water particles. As different hair cells are oriented in different directions, the excitation pattern of the population of hair cells allows the fish to locate sound sources (reviewed in Ref. 52). Certain flies possess hearing organs located at the center of the frontal thorax that, by mechanical means, enhance minute ITDs by several orders of magnitude, rendering them amenable to neuronal processing (140; reviewed in Ref. 200). A further example is the frog ear, which effectively operates as a pressure difference receiver: sound impinges on the frog's tympanic ears not only directly from outside, but also indirectly through the mouth and, via the open Eustachian tubes, from the opposite ear (54–56, 196, 197). As a consequence, the relatively small differences in distance traveled by the sound to the inner and outer sides of the eardrum generate location-dependent interference patterns in the movement of the tympanum, ultimately creating directional sensitivity at the level of the receptor organ (56). Reptiles and birds use similar means of “widening” their effective interaural distance by acoustically (mechanically) coupling the tympani via a rigid tube that connects the middle-ear cavities (reviewed in Ref. 35). Accordingly, reptiles and birds experience larger ITDs than do mammals with equivalent head sizes (97, 205).
In this context, it is important to note that, within the group of tetrapods, tympanic ears evolved independently in frogs and mammals, and likely two or even three times in “reptiles” (Fig. 3). Birds inherited tympanic ears from their archosaur ancestors. Interestingly, the parallel evolution of tympanic ears in these groups occurred roughly around the same time, during the Triassic (∼210 million years ago), more than 100 million years after the lineage that eventually produced mammals split from the lineages from which frogs and “reptiles” evolved (4, 36, 37). Consequently, since no common ancestor of these groups was capable of detecting air-borne sounds, mechanisms for sound localization necessarily evolved independently in these different groups of tetrapods. This offers a cautionary tale: although much may be learned from comparative investigation of sound localization in birds and frogs (as well as crickets, grasshoppers, and other animal models that have significantly advanced our understanding of auditory processing), the very fact that brain mechanisms for sound localization evolved independently across these different groups means that principles of neural processing underpinning localization abilities in one species do not necessarily explain how this task is achieved in another. Accordingly, the elegant “textbook” understanding of auditory spatial processing, based, as it is, largely on findings in birds, cannot per se be assumed to explain the function of the mammalian brain with respect to sound localization (76).
Below, we discuss three recent shifts in our understanding of spatial hearing in mammals, reserving discussion of other animal groups, including birds, for comparative purposes only.
II. ACOUSTIC CUES FOR SOUND LOCALIZATION IN MAMMALS
A. Cues for Sound Localization
Many mammals, including humans, make use of the two binaural cues, ITD and ILD, to perform sound localization with an accuracy of just a few degrees (15, 19). Brain mechanisms underpinning such accuracy have been the subject of investigation since at least the middle of the 19th century. At that time it was understood that ILDs constituted potential cues for locating the source of a sound, but it was equally well recognized that these cues become vanishingly small as the frequency of sound is reduced, i.e., as the wavelength increases relative to the size of the head. Whilst the possibility that ITDs also constitute a potential cue was recognized, it was widely considered that such tiny differences, limited as they are to the submillisecond range, lay beyond the resolving power of the human brain. Rayleigh's (1907, Ref. 192) confirmation of Thompson's (1882, Ref. 245) earlier work, demonstrating that interaural phase differences (IPDs) of low-frequency signals were indeed detectable by human listeners, led to formalization of the duplex theory of sound localization: ILDs were employed in high-frequency localization tasks and timing differences (IPDs or ITDs) for localization of low-frequency sounds.
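The frequency dependence of the ILD cue noted above can be made concrete with a short calculation of the frequency at which the wavelength of a sound equals the diameter of the head. This is a minimal sketch: the speed of sound (343 m/s) and the example head diameters are assumed values chosen for illustration, and the result marks only a rough transition region, not a sharp boundary.

```python
SPEED_OF_SOUND_M_S = 343.0  # assumed speed of sound in air

def shadowing_onset_hz(head_diameter_m, c=SPEED_OF_SOUND_M_S):
    """Frequency (Hz) at which the wavelength equals the head diameter.

    Above roughly this frequency the head casts an appreciable acoustic
    shadow, so ILDs become available as a localization cue.
    """
    if head_diameter_m <= 0:
        raise ValueError("head diameter must be positive")
    return c / head_diameter_m

if __name__ == "__main__":
    # Hypothetical head diameters: a human-sized head and a small mammal.
    for label, diameter in (("large head (17.5 cm)", 0.175),
                            ("small head (2 cm)", 0.02)):
        print(f"{label}: ~{shadowing_onset_hz(diameter):.0f} Hz")
```

With these assumed values, the transition lies near 2 kHz for a human-sized head and at a much higher frequency for a small head, which is in keeping with the idea that small mammals must rely on high-frequency hearing to obtain useful ILDs.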
In addition to binaural cues, the auditory system exploits frequency-specific modifications in the magnitude and phase of the sound reaching the eardrum that arise from the interaction of the sound with the head and the ears, to determine source location in the vertical plane. These spectral cues for localization underpin the ability to disambiguate the so-called cone of confusion, resolving sources in front from those behind as well as determining their elevation, a task not possible using binaural cues alone (15). The function describing these spectral modifications, which are generated largely by the pinna and concha of the outer ear, is referred to as the head-related transfer function, or HRTF (colored traces in Fig. 2A). Spectral cues for localization consist of notches (or changes) in the sound spectrum at specific frequencies; the exact frequency and magnitude of the notch changing as the location of the source shifts in elevation (19). Spectral cues can be manipulated by modifications to the external ear, and the extent to which birds and mammals, including humans, can adapt to the altered cues that arise from these manipulations demonstrates their importance in localization tasks. For example, immediately following insertion of ear molds that permit the passage of sound to the tympanic membrane, but alter the shape of the concha, human listeners' performance is degraded to the extent that they have trouble distinguishing the location of sounds in the vertical plane (although left-right discriminations appear unaltered). Over the course of several weeks, however, performance improves such that by the end of ∼1 mo, listeners have adapted almost completely to the altered cues. Interestingly, subsequent removal of the ear molds leaves no residual deficit in performance, suggesting that sensitivity to spectral cues represents a learned behavior (96). 
This is perhaps not surprising given the multitude of complex sounds that might arrive from many different directions, with various features of the source, the environment, and the head itself all contributing to the final sound impinging on the tympanum.
B. Human Sound Localization: Resolutions and Limits
Humans show remarkable abilities in sound localization, discriminating changes of just 1–2 degrees in the angular location of a sound source (13, 118). Studies under listening conditions using headphones (which enable the isolation of specific cues) confirm the remarkable accuracy of human spatial hearing, with thresholds as low as 10 μs for ITDs (119) or 1–2 decibels for ILDs for presentations of binaural clicks (254). For the ITD cue at least, this represents a level of resolution that must be considered with respect to the millisecond duration of nerve action potentials transmitting information in the brain. Threshold sensitivity to ILD, whilst impressive, is no better than sensitivity to sound level per se (151).
Rayleigh's classical distinction of the duplex theory (1907, Ref. 192), separating cues for auditory spatial processing into low- and high-frequency processes, was seemingly confirmed by Stevens' and Newman's (1934, Ref. 233) investigations of absolute localization performance in humans, in which blindfolded listeners were required to indicate the location of a speaker delivering pure tones of various frequencies. Discounting sources to the front and back, where ITD information is ambiguous, performance was best for frequencies below ∼2 kHz, and above 5 kHz, with intermediate frequencies eliciting the highest number of localization errors, suggesting an incomplete representation of either cue over this range.
Psychophysical studies using headphone stimulation have sought to determine the range of ITD detectors represented in the human brain. Based on measurement of ITD discrimination thresholds and binaural unmasking, the range of ITDs encoded is considered to be 1) roughly constant across the range of sound frequencies at which sensitivity to ITDs in the fine-structure of sounds is observed (<1,500 Hz) and 2) set largely by the physiological range (±700 μs in humans) but with ITDs up to at least ±3,000 μs explicitly encoded in each sound-frequency band (251). ITDs longer than the physiological range can be experienced over stereo headphones, arising naturally when multiple sound sources interact, and when acoustic reflections are present (although at high fluctuation rates; compare Ref. 150).
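One reason ITDs beyond the physiological range matter is that, for a pure tone, an ITD is equivalent to an interaural phase difference (IPD) that is defined only up to a whole number of periods. The sketch below converts an ITD to a wrapped IPD and computes the highest tone frequency at which a given range of ITDs maps onto unique IPDs. The function names are hypothetical and the calculation is a simplified illustration for pure tones, not a model from the studies cited.

```python
def ipd_cycles(itd_s, freq_hz):
    """Interaural phase difference in cycles, wrapped to [-0.5, 0.5)."""
    return ((itd_s * freq_hz + 0.5) % 1.0) - 0.5

def max_unambiguous_freq_hz(max_itd_s):
    """Highest tone frequency at which |ITD| <= max_itd_s gives a unique IPD.

    The IPD is unique as long as the largest ITD stays below half a period.
    """
    if max_itd_s <= 0:
        raise ValueError("max_itd_s must be positive")
    return 1.0 / (2.0 * max_itd_s)

if __name__ == "__main__":
    # A 500-Hz tone: ITDs of 500 us and 2,500 us differ by one full period
    # (2,000 us) and therefore yield the same wrapped IPD.
    print(ipd_cycles(500e-6, 500.0), ipd_cycles(2500e-6, 500.0))
    # Using the +/-700 us physiological range quoted for humans.
    print(f"{max_unambiguous_freq_hz(700e-6):.0f} Hz")
```

For the ±700 μs range quoted above, pure-tone IPDs stop being unique at a few hundred hertz in this simplified picture, so at higher frequencies within the fine-structure range a single tone cannot by itself distinguish a short ITD from one that is a full period longer.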
III. NEURONAL MECHANISMS FOR SPATIAL PROCESSING IN THE MAMMALIAN AUDITORY SYSTEM
Mechanical deflections of the eardrum in response to sound stimulation are transmitted to the cochlea via the three middle ear bones, the ossicles, which act as an impedance-matching device effectively coupling air-borne sound to the fluids of the inner ear. Hair cells in the organ of Corti, the sensory epithelium that runs the length of the basilar membrane of the cochlea, transduce mechanical energy into bioelectric activity, and this activity, in the form of nerve action potentials, is transmitted via the auditory nerve to the central auditory nervous system (Fig. 4). The central auditory nervous system can be considered as consisting of different ascending streams, or functionally segregated pathways. For instance, three distinct pathways project from different subdivisions of the cochlear nucleus, the first synaptic station in the auditory brain stem (206). The subdivisions that make up the cochlear nucleus comprise several neural subtypes with very different temporal discharge patterns (166, 198). Bushy cells in the ventral cochlear nucleus, for instance, represent a critical stage in the binaural pathway, responding faithfully to the temporal structure of sounds, either by phase-locking to the fine structure of sounds (up to ∼3 kHz) or to the envelope of high-frequency sounds. In fact, due to a process of monaural coincidence detection, their responses appear even more temporally precise than those of their auditory nerve inputs (106). Bushy cells target the principal nuclei of the superior olivary complex (SOC) (135, 235, 257; review in Ref. 209), where the binaural comparisons that underlie ILD and ITD processing occur (68). The SOC outputs target the dorsal nucleus of the lateral lemniscus and the inferior colliculus (IC) (1, 22; review in Ref. 79). 
In contrast to the binaural pathways, monaural pathways originate from cell types in the ventral and dorsal cochlear nucleus, largely targeting the lateral lemniscus or, via a direct pathway that bypasses the remaining brain stem nuclei, the IC on the opposite side of the brain (209). The IC itself represents an obligatory synaptic station for almost all ascending pathways. It is characterized by a high degree of convergence and by rather complex response properties, at least compared with those found at lower auditory stations. It would appear that at the level of the IC, all acoustic cues have been processed and filtered into separate streams, forming a basis for object recognition, which can be attributed to the next synaptic levels in the thalamocortical system (Fig. 4). In this review we concentrate largely on those pathways and nuclei up to the level of the IC, whilst higher processing is not considered.
A. The Neuronal Basis for Spectral Integration: the Basis for Sound Localization in the Vertical Plane
A significant amount of literature exists detailing the responses of neurons in the dorsal division of the cochlear nucleus (DCN). DCN neurons appear particularly specialized for processing spectral cues, whilst more recent investigations have begun to examine neural coding of spectral cues for localization in the IC, including how coding of spectral cues is modified between the lower brain stem and the IC. Responses of the so-called “type IV” neurons of the DCN appear to be determined by a dedicated neural circuit within the DCN itself (167) (Fig. 5A1). Type IV neurons show a small “island” of near-threshold activation around their characteristic frequency (CF; the frequency at which threshold for tone-evoked responses is lowest), with a larger central inhibitory area (CIA) at higher sound intensities (49, 99) (Fig. 5B, left column). The CIA can provide for inhibition more broadly tuned than the excitatory response area of type IV neurons through convergent input from multiple (differently tuned) type II DCN neurons (e.g., Ref. 229). The convergence of excitatory inputs from primary auditory nerve fibers (ANFs) and inhibition derived from type II DCN neurons (themselves also thought to be the target of Onset-C inhibition; Ref. 85), renders type IV neurons particularly sensitive to (inhibited by) notches in the acoustic spectrum, presumably those generated by the interaction of sound with the head and pinna (99, 276). How might such neurons encode the potential cues for sound-source elevation such notches provide? The answer lies in understanding the output neurons to which type IV neurons project. Neurons in the IC described as “type O,” which possess a circumscribed frequency-versus-intensity response area, are the main target of DCN type IV neurons (49) (Fig. 5A2). When stimulated with pure tones, type O neurons, like type IV neurons, show a largely inhibitory receptive field, with a small “island” of excitation at low stimulus intensities.
However, when stimulated with broadband sounds containing a spectral notch, mimicking the effect of the directionally dependent HRTF, they respond with essentially the opposite characteristics to type IV neurons in the DCN, showing considerable excitatory responses for a single notch frequency, particularly at higher sound intensities, flanked by inhibitory regions generated by all other notch frequencies (49, 189) (Fig. 5B, right column). Thus IC neurons appear to show an essentially unambiguous response to the frequency of a spectral notch. It has been argued (49) that the pathway from type IV neurons in the DCN to type O neurons in the IC is uniquely specialized for processing directionally dependent spectral features generated by the HRTF. Furthermore, modifications in the processing of spectral cues between DCN and IC, specifically the inversion of excitatory and inhibitory responses, are likely provided for by convergent GABAergic inhibition at the level of IC (48) (Fig. 5A2), which also performs the role of sharpening neural selectivity for the frequency of the spectral notch.
B. The Binaural System: the Basis for Sound Localization in the Horizontal Plane
The two binaural cues employed in sound localization tasks in the horizontal plane, ILDs and ITDs, are usually considered to be processed separately by neurons in the two principal nuclei of the SOC, the lateral and medial superior olives (LSO and MSO, respectively). Under the relatively simple assumptions of the duplex theory (192), and the long-standing, yet erroneous, assumption that mammals inherited tympanic ears from reptiles with hearing limited to the low-frequency range, Masterton and Diamond (1967), in a seminal paper (141), set the theoretical frame for many years of research in binaural processing. These authors suggested that the MSO was initially well developed in all mammals but was gradually lost in those mammals that subsequently evolved high-frequency hearing only, for example, in echo-locating bats. In contrast, the LSO, concerned only with the processing of ILDs at high frequencies, was underdeveloped in mammals with largely low-frequency hearing, and in humans may even be absent altogether (for details, see sect. IIIB1). Despite this attractively simple perspective, however, evidence from the fossil record does not support the assumption of an originally reptilian-like ear in early mammals (4, 36, 37) (see Fig. 3 and sect. I). Furthermore, despite general acceptance, the dichotomy suggested by the duplex theory is not strict. First, significant ILDs can occur for low-frequency sounds located in the near field (21, 217). Second, extensive psychophysical evidence indicates that sensitivity to ITDs is conveyed by the envelopes of high-frequency complex sounds (11, 47, 94, 119, 148, 273).
In this regard, recent studies have shown that when provision is made for temporal information in the envelopes of high-frequency modulated tones to match as closely as possible temporal information normally present in the output of low-frequency channels, ITD discrimination thresholds can be as good as, and in some cases surpass, those for low-frequency tones (12).
Taken together, these findings support the notion that the duplex theory describes more the frequency dependence of the two binaural cues rather than an absolute segregation into two distinct, nonoverlapping channels. In fact, brain mechanisms processing the binaural cues appear to be present across the entire frequency range of human hearing. To this end, the need for a refined theory of binaural processing, one that also encompasses its evolutionary development, appears timely. Such a theory must also account for emerging data sets describing the physiological nature of binaural processing, as well as the means by which spatial cues are represented in the human brain (compare Refs. 73 and 76).
1. Neuronal mechanisms of ILD processing
From several lines of evidence, it is reasonable to assume that ILDs were the only binaural cue used by early mammals. 1) It appears that the mammalian tympanic ear, when it first appeared some 200 million years ago, was mechanically tuned to frequencies between 4 and 18 kHz (152, 204) and subsequently evolved rapidly to incorporate higher frequencies during the early evolution of mammals (70, 203). Hence, it is likely that these animals experienced prominent ILDs suited to sound localization across their entire hearing range. 2) All terrestrial mammals investigated thus far, including echo-locating bats, possess well-developed LSOs that seem to correlate in size with the range of frequencies an animal can hear (156). 3) All mammals appear to share one common neural mechanism for processing ILDs (267), 4) requiring considerably less temporally precise cellular and circuit properties than does the processing of ITDs (see below). 5) The earliest mammals were very small and experienced ITDs produced by sound arriving directly from a source of up to ∼50 μs only (73). As such, the ability to process ITDs evolved either by means of a sudden improvement in temporal resolution of the auditory brain stem (by two orders of magnitude) or, more likely, gradually, and consequently much later, during evolutionary development (73, 76). 6) ITD processing, with few exceptions (64), appears to be realized only in low-frequency hearing mammals and, hence, in a minority of mammalian species: those that employ low-frequency communication calls (all larger terrestrial mammals) and/or those that reside in open environments and benefit from hearing over larger distances, for example, to detect predators [e.g., some desert rodents (132, 133, 258–260)]. These groups must localize sounds for which no appreciable ILD is generated.
The initial site of ILD processing is generally considered to be the LSO, with LSO principal neurons innervated by direct excitatory inputs from spherical bushy cells (SBC) of the ipsilateral cochlear nucleus (CN) (Fig. 6, A and B; Refs. 29, 209, 257). Bushy cells combine inputs across several auditory nerve fibers, relaying their pattern of nerve action potentials to neurons in the LSO. At higher sound frequencies, these spike patterns resemble the “primary-like” patterns of primary auditory nerve fibers, characterized by a large, rapid response at stimulus onset and a subsequent decline in response (“spike-frequency adaptation”) to a steady state over the remainder of the stimulus duration (197a). If the stimulus is modulated over time (for instance, in amplitude), the spiking activity of SBCs is locked to the stimulus envelope up to modulation frequencies of at least 1,000 Hz (106). For low-frequency sounds (up to a few kHz), SBC spiking is locked to the fine structure of the stimulus (65, 106, 113). LSO neurons also receive inhibitory inputs indirectly originating from the globular bushy cells (GBCs) of the contralateral CN. Globular bushy cells are very similar to SBCs, but with the significant difference that they are innervated by a larger number of ANF inputs and show, if anything, superior temporal precision to SBCs (106, 108). They also possess the thickest diameter axons of any nerve fiber in the auditory system (160). GBCs project to the medial nucleus of the trapezoid body (MNTB) on the contralateral side (89), forming synaptic contact via calyciform synapses (calyx of Held; Refs. 93, 161) onto glycinergic (262) MNTB neurons, which in turn project to the LSO at the same side (62, 142, 207, 226) (Fig. 6, A and B). The calyx of Held, in combination with a battery of potassium conductances that prevent temporal summation of inputs in MNTB cells (reviewed in Ref. 255), renders the MNTB a high-fidelity station in the ascending auditory pathway (95, 125, 149, 224), converting well-timed excitatory inputs into well-timed inhibitory outputs. MNTB cells also project to a range of target nuclei other than the LSO, including the ipsilateral MSO (see below).
The convergence of excitatory inputs from the ipsilateral CN and inhibitory inputs from the opposite CN via the MNTB resembles a relatively simple subtraction process (159) (Fig. 6B), creating the well-described ILD sensitivity of LSO neurons (16, 26, 83, 176). These functions are usually sigmoid in form, with neurons completely inhibited when the sound at the contralateral, inhibitory, ear is more intense (“negative ILDs”) and maximally responsive when the sound at the ipsilateral, excitatory, ear is more intense (“positive ILDs”) (review in Ref. 267; Fig. 6C).
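The subtraction process and the resulting sigmoid ILD functions can be sketched with a simple rate model. This is a toy illustration under stated assumptions, not a model taken from the cited studies: the logistic form, the maximum rate, the midpoint, and the slope are all hypothetical choices. Because the model depends only on the level difference, it also ignores any dependence of the ILD function on overall sound intensity.

```python
import math

def lso_rate(ipsi_db, contra_db, max_rate=100.0, midpoint_db=0.0, slope_db=5.0):
    """Firing rate (spikes/s) of a toy LSO neuron.

    ILD is defined as ipsilateral minus contralateral level, so positive
    ILDs favor the excitatory ear. All parameter values are hypothetical.
    """
    ild_db = ipsi_db - contra_db  # the "subtraction" of inhibition from excitation
    return max_rate / (1.0 + math.exp(-(ild_db - midpoint_db) / slope_db))

if __name__ == "__main__":
    mean_level_db = 60.0  # assumed average binaural level
    for ild in (-30, -10, 0, 10, 30):
        rate = lso_rate(mean_level_db + ild / 2, mean_level_db - ild / 2)
        print(f"ILD {ild:+3d} dB -> {rate:5.1f} spikes/s")
```

In this sketch the response is almost fully suppressed at large negative ILDs and saturates at large positive ILDs, reproducing the qualitative sigmoid shape described above.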
Although temporally precise inputs have been demonstrated not to be required for ILD sensitivity of the sustained response of neurons in the LSO, timing does in fact appear to be important for generating ILD sensitivity at stimulus onset and, generally, for LSO neurons tuned to low-frequency sounds (77, 101, 176). In particular, increasing evidence exists to suggest that, based on their known excitatory-inhibitory (EI) interaction, low-frequency LSO neurons are sensitive to ITDs, with a resolution almost comparable to that of neurons in the MSO (247). In contrast to MSO neurons, however, coincidence of ipsilateral excitatory input with the contralateral, inhibitory input generates response minima (“troughs” in the rate versus ITD functions); peak activity occurs when inputs are noncoincident (or maximally out of phase in response functions) (159, 247, 272). This issue is discussed in more detail in section III, B and C3.
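The contrast between peak-type and trough-type ITD functions can be illustrated with a purely phenomenological sketch for a pure-tone stimulus. The cosine-shaped tuning, the maximum rate, and the function names are assumptions made for illustration; the sketch ignores internal delays, adaptation, and level effects and is not a biophysical model of MSO or LSO neurons.

```python
import math

def ee_rate(itd_s, freq_hz, best_itd_s=0.0, max_rate=100.0):
    """Toy excitatory-excitatory unit: maximal rate when inputs coincide."""
    phase = 2.0 * math.pi * freq_hz * (itd_s - best_itd_s)
    return max_rate * 0.5 * (1.0 + math.cos(phase))

def ei_rate(itd_s, freq_hz, max_rate=100.0):
    """Toy excitatory-inhibitory unit: minimal rate when inputs coincide."""
    phase = 2.0 * math.pi * freq_hz * itd_s
    return max_rate * 0.5 * (1.0 - math.cos(phase))

if __name__ == "__main__":
    freq_hz = 500.0  # assumed low-frequency tone; period = 2 ms
    for itd_us in (0, 250, 500, 750, 1000):
        itd_s = itd_us * 1e-6
        print(f"ITD {itd_us:4d} us  EE {ee_rate(itd_s, freq_hz):6.1f}  "
              f"EI {ei_rate(itd_s, freq_hz):6.1f}")
```

For the assumed 500-Hz tone, the EI unit is silent when the inputs coincide and responds maximally when they are half a period (1 ms) apart, whereas the EE unit shows the opposite pattern.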
As suggested above, the two major inputs to the LSO are specialized for high-fidelity temporal transmission. Nevertheless, exquisite timing of inhibitory influences appears not to be a prerequisite for the subtraction mechanism underpinning ILD sensitivity, at least for high frequencies where phase-locking is not an issue. Interestingly, the contralateral input to LSO neurons must traverse a greater axonal distance to reach the LSO including, in the process, an additional synaptic stage via the MNTB. One might therefore expect that contralaterally derived inhibition should arrive later in the LSO (a similar argument has traditionally been used to explain why MSO neurons prefer contralaterally leading sounds, potentially compensating for a longer axonal delay). Nevertheless, Tsuchitani (1988, Ref. 249) described the inhibitory effect in cat LSO neurons to be strongest at onset, and to adapt during ongoing stimulation. Hence, by some means, the GBC-MNTB pathway compensates for the longer distance contralateral axons must span before innervating their target neurons in the LSO. Possible explanations could be the shorter latency and larger axon diameter of GBCs compared with SBCs (review in Ref. 209), and a minimized synaptic delay to the MNTB due to the giant calyx of Held (review in Ref. 255). Delaying the contralateral stimulus in the range of a few hundreds of microseconds results in the ipsilaterally generated excitation preceding the contralaterally generated synaptic inhibition sufficiently to evoke at least a single action potential (101, 174, 176, 178). This may be related to the well-known phenomenon of “time-intensity trading,” whereby ITDs leading at one ear can be used to compensate for ILDs favoring the other, and vice versa (50, 264, 274). 
Time-intensity trading can also be observed neuronally (20, 77, 187, 271) and, in a significant proportion of LSO neurons, can produce different latency shifts for the two binaural inputs, depending on the overall intensity (101, 176; see also Ref. 271 for similar responses in the IC). In this context, it is important to note that this effect most likely depends on the stimulus envelope, i.e., largely on the overall energy accumulated, corresponding to the integral of the envelope of the waveform rather than the absolute sound level (92).
The LSO, which has no homolog in other vertebrates including birds, is the primary site for processing ILDs in the mammalian auditory system. Nevertheless, there is no reason a priori why neurons in any brain center in which inputs from the two ears could conceivably converge onto individual neurons, should not also constitute sites at which neural sensitivity to ILDs is generated (Fig. 6A). Consistent with this notion, ILD sensitivity generated de novo has been demonstrated in experiments in which the ILD sensitivity of IC neurons was abolished by blocking the action of GABAergic inhibition locally (179). A likely source of GABAergic inhibition to the IC is the cross projection from the dorsal nucleus of the lateral lemniscus (DNLL), and removing this connection by sectioning its projecting axons or by pharmacological inactivation of the DNLL itself, leads to a release from binaural inhibition in IC neurons (24, 102, 134). Furthermore, a greater proportion of IC than LSO neurons shows ILD sensitivity that is stable with average sound intensity (177). Whereas ILD sensitivity in LSO neurons is characterized by sigmoidal ILD functions that shift in a systematic manner with increasing intensity to the excitatory ear, those in IC are less affected. This is consistent with at least some of the inhibitory effect in IC being derived from a source other than the inhibitory input to the LSO (177). Nevertheless, given the sufficiency of the LSO in producing neural sensitivity to ILDs, it seems likely that ILD sensitivity in the majority of IC neurons reflects LSO input, with modifications to this input provided by mechanisms local to the IC.
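The subtraction mechanism and the sigmoidal ILD functions described above can be illustrated with a deliberately simplified sketch. It is not a model taken from the studies cited here: the function name `lso_rate`, the logistic form, and all parameter values (`r_max`, `half_ild_db`, `slope_db`) are assumptions chosen only for illustration. Because the output depends on the level difference alone, this idealization is invariant to overall intensity; the systematic shift of LSO functions with intensity to the excitatory ear would require an additional, level-dependent term.

```python
import math


def lso_rate(ipsi_db, contra_db, r_max=100.0, half_ild_db=0.0, slope_db=5.0):
    """Toy 'EI' neuron: ipsilateral excitation minus contralateral inhibition.

    All parameters are hypothetical. The ILD is taken as ipsilateral minus
    contralateral level (dB), and the firing rate is a logistic (sigmoidal)
    function of that difference.
    """
    ild_db = ipsi_db - contra_db
    return r_max / (1.0 + math.exp(-(ild_db - half_ild_db) / slope_db))


if __name__ == "__main__":
    # Sweep the ILD at a fixed ipsilateral level of 60 dB.
    for contra in (80, 70, 60, 50, 40):
        print(60 - contra, round(lso_rate(60, contra), 1))
```

In this sketch the rate rises monotonically as the ILD favors the excitatory (ipsilateral) ear and reaches half of `r_max` at `half_ild_db`.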
Current evidence (anatomical and physiological data from experiments conducted in a wide variety of mammals, and psychophysics of ILD processing in mammals including humans) is consistent with the view that brain mechanisms for processing ILDs are highly conserved, possibly for almost 200 million years. Nevertheless, an ongoing controversy concerning underlying brain mechanisms in humans must be addressed if this consistency is to be accepted. This controversy relates to a number of anatomical studies seemingly demonstrating the lack of an MNTB in the human brain (see Appendix A). If this were true, and we argue below that it is unlikely, humans would have had to develop fundamentally different brain mechanisms, compared with all other mammals, by which ILDs are processed, particularly as this important source of inhibitory input to the LSO would be absent. That the LSO itself represents a prominent structure even in the human brain has been confirmed by a number of anatomical studies (9, 128, 236) and, since chimpanzees are reported to possess an MNTB (237), the presumed loss of the MNTB in the human lineage would have had to occur within the last six million years. Given that every single mammalian species examined to date possesses a prominent MNTB, as well as an LSO, this appears unlikely indeed. Neurons in the human LSO would either have a completely different function, or a different source of inhibitory input compared with other mammals, a difficult proposition to sustain given that the MNTB is the major source of contralaterally evoked inhibition in the entire auditory brain stem. This issue becomes even more problematic if one considers that the MNTB is also the major source of well-timed inhibition not only for binaural circuits (including also those involved in ITD processing, see below) but also for monaural processing. MNTB projections to monaural neurons in the SOC, for example, are responsible for creating transient “on” and “off” responses (72, 129).
Axons from the MNTB branch multiple times, targeting multiple sources within the SOC and beyond (e.g., the dorsal and ventral nuclei of the lateral lemniscus; Refs. 82, 219, 224, 225). Hence, without the MNTB, the human auditory system would have to have evolved completely different strategies not only for processing binaural, but also monaural acoustic cues. In the light of the current direct (see Appendix A) and indirect (above) evidence, there is no reason to assume that brain mechanisms underlying sound localization in humans deviate from the general pattern observed in mammals.
2. Neuronal mechanisms of ITD processing
Neural coding of ITDs demands the highest precision of any temporal process known to exist within the mammalian, reptilian, or avian brain. In essence, neurons must resolve differences in the time of arrival of the sound at each ear that are almost two orders of magnitude shorter than the duration of action potentials bearing that information. Nevertheless, despite the apparently insurmountable challenges posed by this task, different groups of animals have, independently, evolved the ability to use ITDs in sound localization tasks (see sect. i). Sauropsids (“reptiles” and birds), for example, have evolved a mechanical coupling of the two ears that increases the magnitude of ITDs prior to the sensory transduction process in the inner ear (97, 205). Nonetheless, these modifications aside, which appear fitted to enhance the magnitude of ITDs generated by the size of the head, it is the exquisite temporal precision of coincidence detection by the mammalian MSO, and its analogous structure in birds, the nucleus laminaris (NL) (Fig. 7A) that makes possible sensitivity to ITDs in the submillisecond range. For important definitions concerning ITD processing, see Appendix B.
A) THE BIRD NL.
The NL receives bilateral excitation via a systematic arrangement of axonal inputs, the so-called delay lines (171, 180, 275), generating a map of azimuthal space at this primary site of binaural integration (in vitro: Refs. 111, 195; in vivo: Refs. 32, 33) (Fig. 7C). This arrangement of coincidence detectors and delay lines accords with that envisaged by Jeffress in his seminal paper (103) (Fig. 7C). A major addition to this basic arrangement in birds is the apparently tonic (i.e., not phase-locked) GABAergic inhibition (155, 266) that targets NL neurons (131, 244) (Fig. 7, A and B), maintaining binaural coincidence detection at high stimulus amplitude levels (46, 63, 184) and/or compensating for ILDs (23). This is achieved through the depolarizing effect of a Cl− outward current that results in low-threshold K+ channels opening and shunting the membrane current (63, 98, 154, 155). By such means, monaural inputs are kept from generating spike activity in NL at high sound levels that could, theoretically, lead to maximal spike rates even for binaural inputs that are entirely out of coincidence (194). Moreover, a recent study in the chick indicates that axon length, axon diameter, and internodal distance all contribute to tuning individual delay lines to adjust ITD sensitivity of single NL neurons within the physiological range (213).
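The arrangement of delay lines and coincidence detectors envisaged by Jeffress can be sketched computationally. The example below is only a schematic, not a biophysical model of NL: phase-locked inputs are approximated by half-wave-rectified sinusoids, coincidence detection by their time-averaged product, and the function names, the 500-Hz tone, the set of internal delays, and the sampling parameters are all assumptions made for illustration. Positive ITDs here denote a lag at the right ear, and each detector delays the left input by its own internal delay, so the detector whose delay matches the ITD receives coincident inputs.

```python
import math


def half_wave(x):
    """Half-wave rectification as a crude stand-in for phase-locked firing."""
    return x if x > 0.0 else 0.0


def jeffress_array(itd_us, freq_hz=500.0, delays_us=None,
                   fs_hz=100_000, dur_s=0.02):
    """Return (internal delays, responses) for an array of coincidence detectors.

    Each detector multiplies the left input, delayed by its internal delay,
    with the right input, which lags the left by the stimulus ITD.
    All parameter values are hypothetical.
    """
    if delays_us is None:
        delays_us = list(range(-500, 501, 100))
    n_samples = int(fs_hz * dur_s)
    responses = []
    for delay in delays_us:
        total = 0.0
        for i in range(n_samples):
            t = i / fs_hz
            left = half_wave(math.sin(2 * math.pi * freq_hz * (t - delay * 1e-6)))
            right = half_wave(math.sin(2 * math.pi * freq_hz * (t - itd_us * 1e-6)))
            total += left * right
        responses.append(total / n_samples)
    return delays_us, responses


def decode_itd(itd_us):
    """Read out the ITD as the internal delay of the most active detector."""
    delays_us, responses = jeffress_array(itd_us)
    return delays_us[responses.index(max(responses))]


if __name__ == "__main__":
    print(decode_itd(200))
```

The readout in `decode_itd` is the "place" (labeled-line) code implied by the model: the position of the most active detector in the array, rather than its firing rate, signals the ITD.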
B) THE MAMMALIAN MSO.
In mammals, it appears that neurons in both of the major nuclei of the lower auditory brain stem, the MSO and the LSO, are capable of extracting ITD information from their binaural inputs, although the MSO has traditionally been considered the major site of ITD processing. Here, we first describe the MSO, and the ITD sensitivity observed in its bilaterally excited (“EE”) neurons. We then turn our attention to the LSO, where ITD sensitivity is reported for neurons excited by one ear and inhibited by the other (“IE” neurons).
Despite many decades of research investigating the structure and function of the MSO [Ramon y Cajal speculated in 1907 as to the function of MSO in binaural processing (190), and the first in vivo recordings from MSO cells date back to Galambos et al. in 1959 (66)], mechanisms contributing to ITD processing by MSO neurons are still not fully understood. Nevertheless, recent studies employing a range of in vivo and in vitro techniques have extended our understanding considerably. Here, we focus only on responses of neurons in the MSO of low-frequency hearing mammals, although considerable evidence exists suggesting that the processing of “envelope” ITD sensitivity in high-frequency cells follows similar mechanisms, at least in mammals that have good low-frequency ITD sensitivity (71). Envelope ITD sensitivity in mammals with only high-frequency hearing and its evolution is not in the focus of this review and has been dealt with elsewhere (78, 87; review in Ref. 73).
In most mammals with well-developed low-frequency hearing, the MSO is a laminar structure located medially to the more prominent LSO (190, 235) (Fig. 8A). The MSO is organized such that neurons tuned to the highest frequency sounds lie towards the ventral pole of the nucleus, and those tuned to the lowest towards the dorsal pole (68, 84). MSO principal cells typically show bipolar morphology and are arranged in a single parasagittal plane with two major dendrites emerging from the soma 180° to each other and extending orthogonally with respect to the dorsoventral axis of the nucleus (112, 190, 191, 235) (Fig. 8, B and C). Principal cells appear to receive four major, and highly segregated, inputs from different sites within the ascending auditory pathway (112, 263) (Fig. 8, D and E). SBCs from the CN on each side of the brain converge onto single MSO neurons, with ipsilaterally derived inputs synapsing on the lateral dendrites and contralateral inputs on the medial (117, 135, 169, 223, 235, 257). The excitatory nature of these inputs, mediated by glutamatergic transmission, has been confirmed by means of in vitro recordings (34, 80, 81, 136, 211, 222). In addition, MSO neurons also receive bilateral inhibitory inputs (80, 81), largely restricted to the somata of MSO neurons (38, 112, 186, 263) (Fig. 8C). Interestingly, this restriction only arises during a period of developmental refinement following the onset of hearing (112, 263), and species in which low-frequency ITD processing is absent do not show this refinement (112, 252). These inhibitory, glycinergic inputs (112, 227, 262) originate in the MNTB (28, 130, 235, 263) and the lateral nucleus of the trapezoid body (LNTB) (30).
Input from the MNTB, which is driven by stimulation of the contralateral ear, as well as input from the ipsilaterally driven LNTB, have been shown to evoke inhibitory postsynaptic potentials (IPSPs) in MSO cells in vitro (34, 80, 81, 136), and local application of glycine in the MSO by means of iontophoresis blocks spiking activity in vivo (18, 181). Both the MNTB (as discussed in sect. iiA) and the LNTB show morphological specializations for fast and high-fidelity synaptic transmission, their calyceal structure being one such example [MNTB input (255) and end bulbs, LNTB inputs (227)].
An absolute requirement for ITD sensitivity is the ability to generate and retain information concerning the fine-structure waveform of the sound arriving at each ear independently, at least until the primary stage of binaural integration in the brain stem. Primary auditory nerve fibers (ANFs) synapsing at the base of the inner hair cells (IHCs) of the cochlea respond to the cycle-by-cycle changes in the IHC membrane potential (itself reflecting the back-and-forth deflections of the stereocilia) with action potentials that are “phase-locked” to the stimulus waveform (65, 113, 201). Phase-locking in ANFs degrades with increasing frequency with a pronounced roll-off above 2–3 kHz (106), although fibers along the full length of the basilar membrane show phase-locking if they are capable of responding to a low-frequency stimulus (reviewed in Ref. 267). Note that the limitation to phase locking does not arise in the ANFs per se (90). Rather, it reflects the reduction in capability of the receptor potential of the IHCs to follow the cycle-by-cycle deflection of the stereocilia at increasing stimulation frequencies. This low-pass feature of the IHCs provides an upper limit for which temporal information is theoretically accessible in the mammalian brain. Accordingly, the upper frequency limit at which mammals can resolve the ITD in the signal fine structure lies between 1 and a few kHz (119).
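The degree of phase-locking referred to here is commonly quantified by the vector strength, a measure not defined in this section but standard in the literature: each spike is treated as a unit vector at its stimulus phase, and the length of the mean vector ranges from 0 (no phase preference) to 1 (all spikes at the same phase). The following is a minimal sketch; the function name and the synthetic spike trains in the usage example are illustrative assumptions, not data.

```python
import math


def vector_strength(spike_times_s, freq_hz):
    """Length of the mean phase vector of spikes relative to a tone of freq_hz.

    Returns 0.0 for an empty spike train (a convention assumed here).
    """
    if not spike_times_s:
        return 0.0
    cos_sum = sum(math.cos(2 * math.pi * freq_hz * t) for t in spike_times_s)
    sin_sum = sum(math.sin(2 * math.pi * freq_hz * t) for t in spike_times_s)
    return math.hypot(cos_sum, sin_sum) / len(spike_times_s)


if __name__ == "__main__":
    freq = 500.0
    # Synthetic example: one spike per cycle, always at the same phase.
    locked = [k / freq for k in range(50)]
    # Synthetic example: spikes spread evenly over four phases of the cycle.
    spread = [k / freq + (k % 4) / (4 * freq) for k in range(40)]
    print(vector_strength(locked, freq), vector_strength(spread, freq))
```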
Since MNTB cells are also known to phase-lock their output to low-frequency pure tones (51, 95, 125, 149, 173, 224), at least three of the four MSO inputs provide precise temporal information. Based on the anatomical specializations of the LNTB and its input (228), we can speculate that this input is also phase-locked (high-frequency LNTB cells in the bat are able to follow fast fluctuations of the envelope similar to MNTB cells, Ref. 78). However, despite decades of research, it remains unclear how the four inputs to MSO neurons interact to produce the well-described ITD sensitivity of MSO neurons, and how ITD sensitivity of single MSO neurons provides for a meaningful representation of auditory spatial cues. In large part, this is due to the technically demanding nature of in vivo recordings from low-frequency MSO cells, most likely as a result of the high degree of myelination of input fibers (161), the strong neurophonic potential generated by their phase-locked synaptic input (162, 269), and the possibility that action potentials are generated some distance from the soma along the axon (211). Similarly, in vitro recordings from acute brain slices, the only means so far by which the cellular properties of MSO and its inputs can be investigated, grow rapidly more difficult with each day following hearing onset. As a result, almost all intracellular recordings have been performed in immature brain tissue, and while these are highly valuable to the understanding of the cellular basis of coincidence detection in the microsecond range by MSO neurons, many questions remain to be answered.
That the MSO is the main structure for ITD coding is supported by the fact that albino cats exhibit a pronounced atrophy of MSO neurons (40, 41) and, at the same time, show strong behavioral deficits in azimuthal sound localization (91) as well as diminished ITD sensitivity at the level of the auditory midbrain (268). It has been known for many years that low-frequency MSO neurons respond to pure-tone stimulation with phase-locked discharge patterns (7, 18, 44, 66, 68, 181, 231, 269) and are sensitive to ITDs in the stimulus fine-structure (cat: Refs. 26, 66, 269; dog: Ref. 68; kangaroo rat: Refs. 44, 162; gerbil: Refs. 18, 181, 231; rabbit: Ref. 7).
Based on their apparent binaural excitatory (“EE”) characteristics, some MSO neurons are responsive to monaural stimulation of either ear alone, but the sum of the two monaurally evoked responses is generally far below the response to binaural stimulation at favorable ITDs (269). In addition, binaural stimulation at favorable ITDs normally evokes far higher degrees of phase-locking than stimulation at unfavorable ITDs (7, 26, 44, 68, 231, 269). Moreover, the best ITD (BITD, the ITD eliciting maximal spike rates) can be predicted from the phase delay of the two monaural responses (68, 78, 231, 269). The tuning of MSO neurons for sound frequency is also apparent: the highest spike rates (and greatest modulation in the response) are generally evoked by pure tones presented at a neuron's characteristic frequency (CF). Shifting the pure-tone frequency away from the CF in either direction results in a systematic reduction in the maximum spike rates elicited, and a concomitant reduction in the modulation depth of the ITD functions, i.e., a reduction in the dynamic range of the spike rate elicited by different ITDs (18, 181, 231, 269) (Fig. 9A). Consistent with theoretical outcomes of Jeffress' (1948) model (103), many MSO neurons show a characteristic delay (CD): an ITD for which the relative discharge rate is identical for different stimulus frequencies (202, 272). This is illustrated in Figure 9A, which shows the responses of an MSO neuron to pure tone stimulation for five different sound frequencies. Each pure tone frequency, when presented with a range of static ITDs imposed on the stimulus waveform, evokes a cyclic pattern of responses; response maxima occur at intervals separated by the period of the stimulus waveform, reflecting the underlying mechanism of binaural coincidence detection. 
Theoretically, for neurons with a pure axonal conduction delay from one or both ears, the CD corresponds to the ITD at which response peaks are aligned for all stimulus frequencies to which the neuron is sensitive. This would constitute a pure time delay for the coincidence detector as suggested by the Jeffress model (7, 231, 269) (Fig. 9B1). ITD-sensitive neurons of the LSO represent a variation on this theme, showing a frequency-independent trough in their response (when ipsilateral excitation and contralateral inhibition coincides), consistent with their “EI” inputs (Fig. 9B3, see also sect. iiiB2c). However, CDs are often not found at response peaks (or troughs), but rather tend to align along the slopes of the ITD functions (Fig. 9, A and B2) (68, 202, 234). This suggests that ITD sensitivity in these neurons is not generated by a pure time delay mechanism but that additionally a form of phase delay is present, which is not explained with simple, equally tuned “EE” and “EI” inputs as suggested by the Jeffress model. Given the anatomical evidence for both excitatory and inhibitory inputs onto MSO neurons, it may not be surprising that many cells with such “intermediate” characteristics have been observed (7, 8, 181). Thus it may be more accurate to describe ITD sensitivity as comprising a continuum between the extreme pure peak type, or “EE,” and pure trough type “EI” characteristics (61). Nevertheless, the relative stability of the BITD as a function of frequency underlying the notion of a characteristic delay is predicated on the concept of purely excitatory coincidence detection and, combined with its predictability based on the monaural responses, has been taken as evidence for, if not definitive proof of, an underlying mechanism similar to that proposed by Jeffress' 1948 model (103).
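The distinction between a pure time delay and an additional phase delay is often made explicit by plotting the best interaural phase against stimulus frequency and fitting a straight line, best IPD(f) = CP + CD × f. The slope of the line gives the CD and the intercept gives what is commonly termed the characteristic phase (CP), a term not used in this section: a CP near 0 cycles corresponds to the peak-type ("EE") case, a CP near 0.5 cycles to the trough-type ("EI") case, and intermediate values to CDs falling on the slopes of the ITD functions. The sketch below performs such a fit by ordinary least squares; the function name and the example values are hypothetical, and best IPDs are assumed to be already unwrapped (not folded into a single cycle).

```python
def fit_characteristic_delay(freqs_hz, best_ipds_cycles):
    """Least-squares line through (frequency, best IPD) points.

    Returns (cd_us, cp_cycles): the slope in microseconds and the intercept
    in cycles. Input IPDs are assumed to be unwrapped.
    """
    n = len(freqs_hz)
    if n < 2 or n != len(best_ipds_cycles):
        raise ValueError("need at least two matched (frequency, IPD) pairs")
    mean_f = sum(freqs_hz) / n
    mean_p = sum(best_ipds_cycles) / n
    sxx = sum((f - mean_f) ** 2 for f in freqs_hz)
    sxy = sum((f - mean_f) * (p - mean_p)
              for f, p in zip(freqs_hz, best_ipds_cycles))
    slope_s = sxy / sxx                 # cycles per Hz, i.e. seconds
    intercept = mean_p - slope_s * mean_f
    return slope_s * 1e6, intercept


if __name__ == "__main__":
    # Hypothetical neuron: CD of 200 microseconds, CP of 0.25 cycles.
    freqs = [300.0, 400.0, 500.0, 600.0, 700.0]
    ipds = [0.25 + f * 200e-6 for f in freqs]
    print(fit_characteristic_delay(freqs, ipds))
```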
Despite its general acceptance, an increasing number of studies report findings that are inconsistent with the standard Jeffress model. These findings relate to a range of different anatomical and physiological aspects of ITD processing, as well as the nature of any neuronal representation of ITDs. The latter is discussed in detail in section iiiC2, but at this juncture, one important consideration is the apparent departure of the distribution of BITDs from model predictions. The Jeffress model proposes an azimuthal space map based on the BITDs of an array of coincidence detector neurons. Hence, BITDs should lie largely within the physiological range of ITDs (humans: ±690 μs, Ref. 157; gerbils: ±135 μs, Ref. 138), and potentially with greatest density in the region of highest psychophysical resolution (and ethological relevance) around 0 ITD. Such a pattern conforms to descriptions of ITD coding in the avian brain (see sect. iiiC1) but has not been confirmed for mammals. Indeed, BITDs appear not to be restricted to the physiological range (or at least predominantly within it, again as found in birds), but often lie beyond it, at ITDs greater than would be created by the interaural distance. This is based on solid population statistics at different levels of the ITD coding pathway such as the IC of the guinea pig (146), cat (86), and chinchilla (246); the MSO of the kangaroo rat (44) (although not discussed there) and gerbil (18, 181; see Fig. 9A1); as well as the dorsal nucleus of the lateral lemniscus in the latter (219). Furthermore, there appears to be little difference in the distribution of BITDs between species with different interaural distances (cf. cat and guinea pig, Refs. 172, 246). Moreover, neurons in the MSO of the dog, described in some detail by Goldberg and Brown (68), show BITDs far beyond the range of ITDs predicted by the animal's head width, and Kuwada and colleagues (60, 61) reported many, if not the large majority (judging from their Fig. 8 in Ref. 61, for example), of ITD-sensitive “peak-type” cells in the lateral lemniscus having BITDs outside the rabbit's physiological range. In relation to the Jeffress model, this creates two conceptual problems. First, the large number of cells with BITDs beyond that predicted by the size of the head requires explanation. Second, in the absence of any other delay mechanism, the coding of these exceptionally long BITDs would require substantial differences in axonal path-lengths from each ear to individual MSO neurons.
Related to the first conceptual problem, the existence of BITDs outside the range predicted by the head size could be explained by the need to process reverberations that result in reduced correlations of the phase-locked inputs between the two ears (59). In fact, the mixing of multiple direct and indirect sound sources that results in reduced interaural coherence (15) produces rapidly fluctuating ITDs and ILDs that can greatly exceed the “physiological range” (150). In such cases, neural mechanisms for detecting reduced interaural coherence could exist in the form of coincidence detectors wired for longer delays. Nevertheless, such long ITDs are relatively rare in most natural environments and their importance lessened by relatively long integration times at higher auditory stations. There appears no particular reason why the mammalian auditory system might devote the majority of its ITD-sensitive neurons to the processing of interaurally decorrelated sounds or fluctuating ITDs that do not indicate a specific source location, and the question remains as to why this does not appear to be the case in birds. Interestingly, and as mentioned above, there is no obvious difference in the distribution of BITDs in small mammals like gerbils or guinea pigs and the much larger cat (172, 246). Rather, BITDs depend on the frequency tuning of the ITD-sensitive cells, i.e., the CF (the frequency with the lowest tone-evoked threshold). Based on a large population of guinea pig IC neurons, McAlpine et al. (146) first pointed out that the average BITD increases with decreasing CF and, in fact, shows an almost constant best IPD. Such a constant BIPD has also been shown for the gerbil MSO (18, 181) and dorsal nucleus of the lateral lemniscus (219), although only with tonal stimulation, and for the cat IC (86). The conceptual implications for coding strategies of ITD in mammals are discussed in section iv.
But it should be noted here that the observed dependency of BITD on CF is hardly compatible with the idea of a labeled line coding of auditory space assumed in Jeffress' model.
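The arithmetic behind a constant best IPD is worth making explicit: if the BIPD (in cycles) is the same for all neurons, then the BITD equals the BIPD divided by the CF and must therefore grow as the CF falls. The sketch below simply evaluates this relation. The BIPD of 0.125 cycles is an illustrative assumption rather than a value quoted in this section; the gerbil physiological range of ±135 μs is the figure cited above (Ref. 138).

```python
def best_itd_us(bipd_cycles, cf_hz):
    """Best ITD (microseconds) implied by a best IPD (cycles) at a given CF."""
    if cf_hz <= 0:
        raise ValueError("cf_hz must be positive")
    return bipd_cycles * 1e6 / cf_hz


if __name__ == "__main__":
    GERBIL_RANGE_US = 135.0   # physiological range cited in the text
    ASSUMED_BIPD = 0.125      # hypothetical constant best IPD, in cycles
    for cf in (250.0, 500.0, 1000.0):
        bitd = best_itd_us(ASSUMED_BIPD, cf)
        print(cf, bitd, bitd > GERBIL_RANGE_US)
```

Under these assumptions, a neuron with a CF of 250 Hz would have a BITD of 500 μs and one with a CF of 1,000 Hz a BITD of 125 μs, so that only the higher-CF neurons would peak within the gerbil's physiological range. A fixed mapping from detector position to azimuth, as in a labeled-line code, would not produce such a systematic dependence on CF.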
The second conceptual problem in the context of the Jeffress model that arises from the existence of long BITDs concerns the existence of delay lines per se and the requirement for exceptionally long delay lines originating from the contralateral ear (60). What mechanism could account for the actual delays observed for contralateral inputs responsible for generating neural BITDs, almost all of which correspond to stimulus locations in the contralateral hemisphere (sound leading at the contralateral ear)? The traditional textbook notion is that the internal delay mechanism is realized by means of specific combinations of the axonal length of the binaural excitatory inputs. Obvious as such an arrangement is in the bird NL, the anatomical arrangement of MSO inputs is, by contrast, difficult to interpret. Some inputs may resemble delay-line arrangements as proposed by Jeffress (223), but the interpretation of these single projections is problematic (compare Ref. 74). A detailed anatomical analysis comes from Oliver and colleagues (10) who performed a complex reconstruction of small tracer injections into the CN. Their results include an analysis of the axonal diameter of individual fibers and suggest some form of graded input from the contralateral side where, in some reconstructions, shorter collaterals tended to innervate more rostral parts of MSO and longer collaterals more caudal. However, other injections revealed very restricted terminal fields or even gradients running in the “wrong” directions. Accordingly, the authors were very careful in their conclusions and suggested that “other factors may be involved in the computation of ITDs.” More precisely, as stated above, the calculated delays would, in some cases, fit the Jeffress model with ITDs only within the physiological range, but definitely would not account for the observed long delays.
Hence, axonal length, although likely an important factor in generating delays, appears not to be systematically arranged and cannot account for the long delays observed physiologically. In this sense, the situation is fundamentally different from that in birds. Interestingly, a factor that has received little attention until lately is the distance of nodes of Ranvier and the influence of axon diameter, which could, if varied from axon to axon or at different axonal branches, easily contribute to a substantial delay (42); accordingly, data from the chick brain stem indicate a significant contribution of internode distance (213).
Another form of physical delay that has been proposed to contribute to ITD processing is the cochlear delay (208). Due to the limited propagation speed of the traveling wave within the cochlea, moving from basal (high frequencies) to apical (low frequencies), the response latency to an acoustic event, for instance, a very brief click that activates inner hair cell transmitter release, occurs at increasingly longer delays relative to the stimulus onset for progressively lower frequency regions along the basilar membrane. Schroeder's notion that an interaural mismatch in the frequency tuning of the inputs to MSO neurons could be responsible for generating ITD delay tuning (Schroeder 1977, Ref. 208) was popularized by Shamma (216) and has been revisited by Joris and colleagues in recent years (104, 109). However, it must be stated that no convincing evidence for the employment of such cochlear delays in the tuning of neurons to a preferred ITD has been provided.
Yet another means by which internal delays could be generated, and by which the experimentally observed phase delays could be explained, is synaptic inhibition. As discussed above, the MSO receives bilateral inhibitory inputs that employ glycine as their neurotransmitter. These inputs are highly specialized for high-fidelity and high-precision (i.e., phase-locked) temporal transmission. In vivo experiments combined with pharmacology demonstrate that blocking glycinergic inhibition with strychnine results in a broadening of the ITD tuning function towards ipsilaterally leading ITDs resulting in a shift of the best ITD towards 0 (to the “left”; for details, see Appendix C) (18, 181). This suggests that inhibition plays a crucial role in the ITD tuning of MSO neurons. Moreover, a “leftward” shift of the best ITD is also observed when endogenous inhibition, rather than being blocked, is supplemented through iontophoretic application of glycine onto MSO neurons. However, this shift, most likely, arose due to disruption of the normal timing of the glycinergic inputs (for details, see Appendix C). Together, these results suggest that it is not only the inhibition as such that tunes the ITD function, but its timing relative to the timing of the excitatory inputs (181). A detailed explanation of the findings and the suggested underlying mechanism is given in Appendix C.
An important consideration for the role of inhibitory inputs in tuning neurons for their preferred ITD is the possibility of adjusting or refining the contralateral delay during ontogeny. Anatomical refinement of the glycinergic inhibition has been shown to be dependent, at least partially, on auditory experience during early development. Raising gerbils in omni-directional noise, which masks most spatial cues (265), reduces the degree of synaptic selection of glycinergic MSO inputs during what appears to be a critical period shortly following the onset of hearing (112, 263). In animals subject to such deprivation, the distribution of BITDs deviates significantly from that in control animals; BITDs are, on average, much closer to 0 ITD than in normally reared animals (212). These findings are consistent with the interpretation that other mechanisms (axonal length, cochlear delays, etc.) establish an approximate BITD but, following the onset of hearing, an experience-dependent process selectively enhances inhibitory inputs to further tune BITDs to the desired population mean. Moreover, by changing the balance of excitatory and inhibitory inputs (compare Ref. 137), it is possible that BITDs might also be adjusted under dynamic control as needed (e.g., in noisy environments with many concurrent sounds).
C) ITD SENSITIVITY IN LOW-FREQUENCY LSO NEURONS.
Interestingly, although controversial with respect to MSO input (see Appendix C), there is no debate over the significance of phase-locked inhibition from the MNTB contributing to the ITD sensitivity of low-frequency LSO neurons. As described in section iiiB1, the “EI” connectivity of LSO neurons is primarily thought to subserve the processing of ILDs in high-frequency sounds by a process of simple subtraction. Nevertheless, even though ILDs are of marginal importance for low-frequency sound localization (save for situations of near-field stimulation), an entire limb of the LSO appears dedicated to the processing of low-frequency sounds in many species. Previously it was considered that low-frequency LSO neurons were only weakly sensitive to ILD, or even monaural in their responsiveness, because they lacked contralateral inhibitory input (16, 26, 83, 231, 248). However, Finlayson and Caspary (58), in a comprehensive study in the chinchilla, found that low-frequency LSO neurons (CFs <1,200 Hz) were not only sensitive to ILDs, but also to phase-inversions of an acoustic stimulus, suggesting that they might additionally be sensitive to ITDs/IPDs in the fine-structure of low-frequency sounds (Fig. 10A). This notion was subsequently verified by Tollin and Yin (247) (Fig. 10B) in the first study to record systematically the responses of low-frequency LSO neurons of the cat to ITDs in the stimulus fine-structure. This study also provided direct evidence to corroborate the notion that LSO neurons generate the trough-type ITD sensitivity expected from phase-locked EI interactions.
It is worth noting at this point that these findings strongly suggest that the inhibitory inputs from the contralateral ear (via GBCs and the MNTB) arrive simultaneously with the ipsilateral excitatory inputs (via SBCs) at the LSO to suppress any responses, although the inhibitory inputs feature a longer pathway and an additional synapse. Possible explanations for this faster contralateral conduction are discussed in section iiiB1.
D) ITD PROCESSING OF BROADBAND SIGNALS.
ITD sensitivity, whether the result of EE or EI mechanisms, is observed not only for the fine-structure, or carrier, of low-frequency sounds, but also for the envelope of high-frequency or broadband sounds, and significant insight into the relative contribution of carrier and envelope ITD sensitivity has been obtained using signals such as interaurally delayed broadband noise. Response functions generated to interaurally delayed noise (noise delay functions, or NDFs) are of a damped, oscillatory shape (Fig. 11), and the central peak of the response, in terms of its shape and preferred ITD, is approximated reasonably well by the linear summation of delay functions obtained to pure-tone stimulation (269, 270). In a combined study of auditory-nerve responses and ITD-sensitive neurons in the IC of the cat, Joris (105) demonstrated that the shape of NDFs is determined both by the neural sensitivity to the fine-structure of the low-frequency components of broadband stimuli, and by the sensitivity to envelope features generated by the bandpass filtering of the cochlea. Intriguingly, envelope sensitivity was not restricted to neurons with relatively high CFs, but was also present in neurons with CFs in the low-frequency, phase-locking range, extending even below 1 kHz [a finding recently supported by Agapiou and McAlpine (2)]. Joris and colleagues (107, 110) subsequently investigated the extent to which damping of the NDFs reflects the spectral bandwidth over which binaural coincidence detection occurs. Whilst finding strong correlations between the two measures, the data also indicate that the peripherally generated envelope sensitivity cannot completely explain the extent to which such damping is observed, as transitions from fine-structure to envelope sensitivity occurred for IC neurons with CFs ∼1 kHz lower than is observed in ANFs (105, 107). 
Hence, higher level transformations such as convergence are expected to enhance envelope sensitivity in the IC, influencing the precise form of the NDFs (2).
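The linear-summation approximation of the NDF can be illustrated with a minimal sketch. It assumes, purely for illustration, that each pure-tone delay function is a cosine of interaural phase sharing a common best ITD, and that the tones are weighted by a Gaussian filter centered on CF. The function names and all parameter values (CF, bandwidth, best ITD) are hypothetical and are not taken from the studies cited.

```python
import math

def tone_delay_function(itd_s, freq_hz, best_itd_s):
    """Idealized pure-tone delay function: a cosine of interaural phase peaking at best_itd_s."""
    return math.cos(2.0 * math.pi * freq_hz * (itd_s - best_itd_s))

def noise_delay_function(itd_s, cf_hz=500.0, bandwidth_hz=150.0,
                         best_itd_s=200e-6, n_freqs=61):
    """Weighted linear sum of tone delay functions (all parameter values are hypothetical)."""
    total = 0.0
    weight_sum = 0.0
    for i in range(n_freqs):
        # Sample frequencies from CF - 2*bandwidth to CF + 2*bandwidth.
        freq = cf_hz + (i / (n_freqs - 1) - 0.5) * 4.0 * bandwidth_hz
        # Assumed Gaussian frequency weighting around CF.
        weight = math.exp(-0.5 * ((freq - cf_hz) / bandwidth_hz) ** 2)
        total += weight * tone_delay_function(itd_s, freq, best_itd_s)
        weight_sum += weight
    return total / weight_sum

# Sample the function at the central peak, the first trough, and the first side peak.
period_s = 1.0 / 500.0
for label, offset in (("central peak", 0.0), ("first trough", 0.5), ("side peak", 1.0)):
    print(label, round(noise_delay_function(200e-6 + offset * period_s), 3))
```

In this sketch the components agree only at the common best ITD, so the side peaks shrink as the assumed bandwidth widens, giving the damped, oscillatory shape. It does not model envelope sensitivity.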
C. Neuronal Representation of Auditory Space
1. The auditory space map in birds
A systematic map of auditory space is known to exist in the optic tectum of the barn owl (OT, corresponding to the mammalian SC). This “space map” is actually conveyed to the OT from the external nucleus of the IC (ICX); as Knudsen and Konishi first reported in their seminal paper in 1978, the receptive fields of ICX neurons shift systematically both in azimuth and elevation along the posterior-anterior and dorso-ventral axes of the nucleus, respectively (122). It was later shown that the auditory space map is formed in the ICX by convergence of ILD- and ITD-sensitive inputs from the lateral shell of the central nucleus of the IC coding for the same region of auditory space. Moreover, these converging projections combine their input across a wide range of CFs (39, 240), pooling that is crucial in overcoming phase ambiguities that arise for narrow-band signals in the relatively high-frequency range employed (uniquely) by the owl in sound-localization tasks using ITDs (122, 143, 240). The head-centered auditory space map in the OT is then merged with the visual space map7 (120), ultimately providing the anatomical substrate for the pronounced audio-visually driven head-saccade reflex of the barn owl. Knudsen and colleagues went on to show that the visual input provides instructive signals for the calibration of the auditory map during maturation (for review, see Ref. 121). Hence, the formation of the auditory space map is strongly related to that of the visual system, and it may not be surprising, therefore, that the neural representation of the auditory world resembles that of the visual world. It is well established that in owls the topographically ordered arrangement of auditory space is present at the level of the NL, the initial site of ITD processing (33, 185, 239), and recent data (127) suggest a similar arrangement in the chicken (Fig. 12, A–C).8
As in other birds, ILD sensitivity in barn owls is initially encoded in the lateral lemniscus via a reciprocal inhibition between the lemniscal nuclei on opposite sides of the brain. Already at this level of processing there exists a systematic gradient of ILD sensitivity and, hence, a rough topographic map of ILDs (31, 139, 242, 243). However, barn owls are clearly an exception in that they belong to the group of so-called “asymmetrical owls,” having the two outer ears and ear canals located at slightly different positions and pointing into different vertical directions (153, 165, 253). This asymmetry results in iso-ILD contours in the frontal field being tilted by almost 45 degrees compared with the vertical alignment found in animals with symmetrical ears. Thus, in barn owls, ILDs serve to encode the elevation, rather than the azimuth, of a source location. To this end, the representation of ILDs in these birds is almost orthogonal to the representation of ITDs, which are unaffected by the asymmetry. By means of a multiplicative interaction of ITD and ILD sensitivity at the level of the auditory midbrain (182), “classical” two-dimensional auditory-spatial receptive fields are created (183). These are then projected, and systematically arranged, in a two-dimensional map of auditory space in the barn owl tectum (17, 67, 122).
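The effect of a multiplicative interaction between ITD and ILD sensitivity can be sketched as follows. The bell-shaped tuning curves, the preferred values, and the tuning widths are assumptions chosen for illustration only; the additive variant is included solely for contrast and is not a model proposed in the literature cited.

```python
import math

def gaussian_tuning(value, preferred, width):
    """Bell-shaped tuning to a single cue (an assumed, idealized shape)."""
    return math.exp(-0.5 * ((value - preferred) / width) ** 2)

def multiplicative_field(itd_us, ild_db, pref_itd_us=50.0, pref_ild_db=5.0):
    """Product of ITD and ILD tuning: responds only when both cues match."""
    return (gaussian_tuning(itd_us, pref_itd_us, 30.0)
            * gaussian_tuning(ild_db, pref_ild_db, 4.0))

def additive_field(itd_us, ild_db, pref_itd_us=50.0, pref_ild_db=5.0):
    """Average of ITD and ILD tuning, shown only for contrast."""
    return 0.5 * (gaussian_tuning(itd_us, pref_itd_us, 30.0)
                  + gaussian_tuning(ild_db, pref_ild_db, 4.0))

print(multiplicative_field(50.0, 5.0))   # both cues at their preferred values
print(multiplicative_field(50.0, 30.0))  # ITD matches, ILD does not
print(additive_field(50.0, 30.0))        # the additive variant still responds
```

Under these assumptions, the product yields a closed two-dimensional receptive field, because a mismatch in either cue suppresses the response. The sum responds along the whole band where either cue matches.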
An interesting feature of the auditory space maps in the owl OT is the fact that individual cells are relatively coarsely tuned (∼20° on average) compared with the behavioral acuity of ∼2–3° (6, 120). Takahashi and colleagues (5, 241) pointed out that this apparent discrepancy can be overcome by the analysis of spike rate changes (scaled to their variance) in individual neurons with changes in sound location, i.e., changes in spike rate along the slope of the ITD functions. This idea is in line with the theoretical framework of optimal coding posited in recent years (25, 88). Hence, the systematic map of ITDs in the owl brain stem and midbrain is undisputed, although the nature of the associated neural code is still open to debate.9
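The argument that coarse tuning is compatible with fine behavioral acuity can be made concrete with a minimal sketch of rate changes scaled to their variability. It assumes a Gaussian tuning curve with a 20° width parameter and Poisson-like variability (variance equal to mean rate). The peak rate, the baseline, and the function names are hypothetical, and this is not the analysis used in the studies cited.

```python
import math

def firing_rate(azimuth_deg, preferred_deg=0.0, width_deg=20.0,
                peak_rate=50.0, baseline=2.0):
    """Coarsely tuned neuron: Gaussian tuning over azimuth (hypothetical parameters)."""
    return baseline + peak_rate * math.exp(-0.5 * ((azimuth_deg - preferred_deg) / width_deg) ** 2)

def standardized_separation(az1_deg, az2_deg):
    """Rate difference divided by its variability, assuming variance equals mean rate."""
    r1 = firing_rate(az1_deg)
    r2 = firing_rate(az2_deg)
    return abs(r1 - r2) / math.sqrt(0.5 * (r1 + r2))

# A 3-degree change in location, evaluated at the peak and on the slope of the same neuron.
at_peak = standardized_separation(0.0, 3.0)
on_slope = standardized_separation(20.0, 23.0)
print(round(at_peak, 3), round(on_slope, 3))
```

With these assumed values, the same 3° displacement is far more discriminable on the slope of the tuning function than at its peak.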
2. The representation of binaural cues in mammals
The topographic arrangement of auditory space represented in the avian OT has a counterpart in the mammalian SC, with a systematic arrangement of ILD sensitivity extending along the mediolateral extent of the nucleus, creating a space map (review in Ref. 114). However, several significant differences exist between the representations of acoustic-spatial features in barn owls and mammals. Most notably, the convergence of ITD and ILD cues in the barn owl's OT, from neurons with similar frequency-tuning characteristics in the IC, is absent in mammals; by and large, sensitivity to fine-structure ITDs is absent for any frequency at which an appreciable ILD would be generated (except for near-field situations). Indeed, Campbell et al. (27), using the technique of “virtual acoustic space” (VAS), demonstrated a lack of any contribution of ITDs in the SC's representation of auditory space. Notably too, animals deprived of visual input show abnormal arrangements of ILDs in the SC (115), although they do not show deficits in sound localization and, indeed, may actually show improved performance (116). Furthermore, the initial substrate from which a topographic mapping of auditory spatial cues might be developed, a representation of preferred ITDs and ILDs, ordered or otherwise, appears not to exist in the mammalian brain stem (see below), although the existence of a visually guided space map based on ILDs in the SC is not disputed.
A) THE NEURAL REPRESENTATION OF ILD.
A systematic representation of ILD has often been assumed (193) to exist in the mammalian LSO, but has never actually been demonstrated. At the level of the LSO and the IC, some studies suggest a clustering of neurons with the same preferred ILD sensitivity [overlying slopes of ILD functions in neurons recorded close to each other (16, 188)], and anecdotal reports exist of single electrode penetrations in which ILD functions systematically varied along the penetration (261). Because ILD functions are sigmoid and not Gaussian or sinusoidal, LSO neurons do not show closed receptive fields as such. Rather, their sensitivity to ILDs follows a sigmoid function with a slope across a range of ILDs often corresponding to frontal, or near frontal, space, resulting in a dynamic range of ILDs where small changes in ILD produce relatively large changes in spike rate. These slopes may shift significantly in individual LSO neurons (or their target neurons in the IC) when manipulating stimulus parameters such as spectral content (3, 75, 100), duration (170), temporal pattern (75, 123, 124), or intensity (100, 177, 188) and the recent stimulus history (see sect. iiiC3). Hence, LSO neurons do not code for absolute positions in space in the manner of a “labeled-line” code but, rather, the overall relative activation of the left and the right ILD pathways, which appears stable across the population of ILD sensitive neurons (177), likely encodes the source location.
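A minimal sketch may help to illustrate why a comparison of the two ILD pathways can remain informative when individual ILD functions shift. It assumes idealized logistic ILD functions and mirror-symmetric left and right populations that undergo the same shift. The function names, the sign convention for ILD, and all parameter values are hypothetical.

```python
import math

def lso_rate(ild_db, midpoint_db=0.0, slope_db=4.0, max_rate=100.0):
    """Idealized sigmoid ILD function; rate grows as the ipsilateral ear becomes louder."""
    return max_rate / (1.0 + math.exp(-(ild_db - midpoint_db) / slope_db))

def hemispheric_balance(ild_db, midpoint_shift_db=0.0):
    """Normalized difference between the two pathways (ILD defined here as left minus right)."""
    left = lso_rate(ild_db, midpoint_db=midpoint_shift_db)
    right = lso_rate(-ild_db, midpoint_db=midpoint_shift_db)
    return (left - right) / (left + right)

# The same ILD evokes different single-neuron rates when the sigmoid shifts...
print(round(lso_rate(0.0, midpoint_db=0.0), 1), round(lso_rate(0.0, midpoint_db=5.0), 1))
# ...but the sign of the left/right comparison still indicates the side of the source.
for shift in (0.0, 5.0):
    print([round(hemispheric_balance(ild, shift), 3) for ild in (-10.0, 0.0, 10.0)])
```

In this sketch, the absolute rate of a single neuron is ambiguous with respect to position once its function has shifted. The relative activation of the two sides stays zero at the midline and grows monotonically with ILD, as long as both sides shift together, which is itself an assumption of the sketch.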
B) THE NEURONAL REPRESENTATION OF ITD.
In contrast to birds, there is little evidence in mammals for any organization in the brain that provides for topographically ordered maps of ITD. This may not be altogether surprising; evidence for an orthogonal arrangement of preferred ITDs with respect to the main tonotopic (frequency) map in the MSO is weak at best.10 Moreover, the projection pattern of the MSO to the IC would imply that this suggested organization in the MSO is not maintained in the IC (168). Nevertheless, the absence of a demonstrably ordered map of acoustic space based on ITDs would not necessarily preclude the existence of a less-ordered organization, were it not for compelling evidence suggesting that the arrangement of preferred ITDs in the IC runs parallel with, rather than orthogonal to, the tonotopic gradient. As described in section iiiB2, values of BITDs are negatively correlated with neuronal CF. The consequence of this for the neural representation of ITDs in mammals is shown in Figure 12, E and F, bottom panels, which plot the distribution of peak ITDs as a function of neural tuning for sound frequency for MSO and IC. The demonstration by McAlpine et al. (146) that ITD tuning is CF dependent marked the beginning of a significant departure from the notion that ITD is represented in the form of a labeled-line code, whereby neurons are sharply tuned to an ITD by virtue of their peak response (i.e., response maximum) and the tuning is systematically varied to cover the entire physiological range (Fig. 12, B and C). To date, the relationship between CF and preferred ITD has been shown for the IC of the guinea pig (146, 215), the cat (86), and the chinchilla (246) as well as the DNLL of the gerbil (219); it evidently does not represent specific processing at secondary stages, as it has also been reported for the gerbil MSO itself (18, 181) (Fig. 12E).
Based on these findings, the current authors earlier postulated a reconsideration of the mammalian system emphasizing the encoding of spatial positions by changes in spike rate across the span of “physiological” ITDs in the entire population of homogeneously (in terms of BIPD) and broadly-tuned (due to their low CFs) ITD-sensitive neurons (145). Neurons with the lowest CFs have, necessarily, the broadest ITD functions in response to interaurally delayed signals; even for relatively large changes in ITD, encompassing perhaps the entire range experienced by a small mammal such as a gerbil, the response near the peak of the function may change only very slightly. As such, positioning the peak of the ITD function at long ITDs, beyond the ecologically relevant range, places the sensitive slope of the function where greatest ITD discrimination is required. By shifting the peak ITD closer to zero for neurons as CF increases, the position of the slope through the range of relevant ITDs is maintained (Fig. 12, D–F). Accordingly, the steepest portions of the slopes were found to align around midline in the MSO, DNLL, and IC (86, 146, 181, 215, 219). It has also been shown that such alignment allows for high-resolution ITD discrimination at both the single-cell and population levels (214, 220). This coding strategy, referred to as the two-channel model, stands in contrast to the labeled-line arrangement, in which spatial positions are encoded in the specific activity patterns of only a few identified neurons within an array of heterogeneously (in terms of BIPD) and sharply-tuned (due to their high CFs) neurons. It should be stressed that the two paradigms do not necessarily differ in the coding strategy at the single-neuron level, as spatial discrimination is maximal along the slope of the neurons' tuning curve regardless of coding strategy (given a moderate response variability, compare Refs. 25, 69, 88, 241).
Rather, the significant difference is given by the fact that in low-frequency hearing mammals with comparatively small head-widths, the distribution of BITDs suggests an encoding of spatial positions by the (average) response rate of the entire population of neurons and not by the sparse activity of few identified neurons (as, e.g., observed in the high-frequency hearing barn owl).
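The relationship between CF, best ITD, and the position of the slope can be sketched numerically. The sketch assumes an idealized cosine ITD function and a best interaural phase difference that is constant across CF; the value of 0.125 cycles and the ±130 µs “physiological” range are illustrative assumptions, as are the function names.

```python
import math

BEST_IPD_CYCLES = 0.125         # hypothetical constant best interaural phase difference
PHYSIOLOGICAL_RANGE_US = 130.0  # assumed ITD range for a small-headed mammal

def best_itd_us(cf_hz):
    """Best ITD (microseconds) implied by a fixed best IPD: shorter for higher CF."""
    return BEST_IPD_CYCLES / cf_hz * 1e6

def normalized_rate(itd_us, cf_hz):
    """Idealized cosine ITD function, scaled to 0..1, peaking at best_itd_us(cf_hz)."""
    phase = 2.0 * math.pi * (cf_hz * itd_us * 1e-6 - BEST_IPD_CYCLES)
    return 0.5 * (1.0 + math.cos(phase))

for cf in (250.0, 500.0, 1000.0):
    # Rate change across the assumed physiological range, from one extreme to the other.
    change = (normalized_rate(PHYSIOLOGICAL_RANGE_US, cf)
              - normalized_rate(-PHYSIOLOGICAL_RANGE_US, cf))
    print(cf, round(best_itd_us(cf)), round(change, 3))
```

Under these assumptions, the best ITD falls as CF rises, lying well outside the assumed range for the lowest CFs, while the rising flank of each function passes through zero ITD.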
3. Dynamics in neuronal representation
As in other sensory systems, the auditory system is characterized not only by ascending projections, but also by many descending connections. One class of efferent connections seems to mainly originate in the auditory cortex and targets auditory neurons at different levels of the system: the medial geniculate body, the IC, several rhombomeric structures (SOC, CN), and even the hair cells in the cochlea. These projections are known to influence tuning properties dependent on short-term experience and, hence, seem to be related to stimulus context (163, 238). Another class of efferent connections arises in the SOC and directly targets inner and outer hair cells (218). These provide a fast form of sensory feedback that, for instance, adjusts the gain of the auditory system to take account of overall sound intensity. The two systems may well function in concert. Until recently, the efferent system was not considered with respect to binaural processing, and stimulus-dependent adaptation and context dependency were typically not attributed to the first stations of the binaural pathways. Only at the level of the IC has stimulus-dependent adaptation been observed to influence binaural processing, resulting in apparent motion sensitivity (147, 230). Moreover, although many auditory brain stem neurons receive efferent (“context dependent”) projections, the MSO, LSO, and MNTB principal cells seemed to be devoid of such inputs. This appeared to accord with the notion that the MSO and LSO constitute “hard-wired” binaural comparators that function almost like electrical circuits: highly exact and invariable. For a number of reasons, however, this view is changing. First, evidence exists that the olivocochlear system, which provides fast feedback, can influence binaural sensitivity via the lateral olivocochlear feedback system (45). Second, evidence exists for dynamic adjustments within the binaural nuclei themselves.
MSO and LSO, which both crucially depend on the right balance of excitatory and inhibitory inputs, heavily label for GABAB receptors. GABAB receptors are coupled to G proteins and provide a mechanism for slow adjustment of synaptic efficacy (reviewed in Ref. 43). In a recent study, Park et al. (175) showed systematic stimulus-dependent adaptation in LSO neurons, confirming earlier evidence for dynamic processing (57). Moreover, Magnusson et al. (137) showed clear effects of the GABAB system on ILD sensitivity by applying GABAB receptor agonists or antagonists to LSO in vivo: GABAB activation decreased, and GABAB antagonism increased, the “receptive field” of LSO neurons by shifting the synaptic gain and, thereby, systematically shifting the ILD function (Fig. 13). In a second series of experiments in vitro, the same authors also revealed the underlying mechanisms contributing to this: upon activation, LSO neurons release GABA from their dendrites, and this differentially regulates transmitter release from the excitatory and inhibitory input terminals via presynaptic GABAB receptors. Hence, at the very least, LSO neurons are able to adjust rapidly the efficacy of their excitatory and inhibitory inputs, altering their binaural sensitivity in the process. The function and exact location of GABAB receptors in the MSO are not yet known; however, it seems likely here also that stimulus-dependent dynamics might act on binaural properties even at the first level of binaural interaction.
IV. CONCLUSIONS AND OPEN QUESTIONS
When Lloyd Jeffress published his concept on binaural sound localization in 1948, his elegant model fundamentally influenced for generations the way researchers considered neural mechanisms contributing to the brain's representation of sound-source location (103). The model was thereafter consolidated by data obtained from ITD-sensitive neurons in the brain stem and midbrain, revealing that several features of the neural code matched Jeffress' predictions (e.g., submillisecond coincidence detection and the existence of characteristic delays). Moreover, the finding of an almost exact manifestation of the delay-line concept in the avian brain appeared to confirm the general validity of this model.
However, it is fundamentally important to bear in mind that the evolution of the mammalian auditory system occurred independently of that of birds. Consequently, brain mechanisms responsible for processing binaural cues developed under different evolutionary pressures, leading to distinct constraints for sound localization mechanisms, e.g., different hearing ranges, and head and ear morphology. The Jeffress-like labeled-line mechanism of ITD processing and the related topographic representation of space (as reported for the owl) are related to a number of specializations not present in the mammalian system. An important aspect of the differences between birds and mammals is that, in the mammalian auditory system, the two binaural localization cues are utilized in distinct frequency ranges (ITD for low frequency, ILD for high frequency) with the analysis of monaural, spectral cues employed to resolve the cone of confusion along the vertical axis. Both spectral cues and ILD processing show no evidence of a representation based on labeled lines, bringing into question the necessity of such a representation for ITDs. Finally, recent evidence concerning mechanisms of neural tuning for ITD, including the role of inhibition, as well as the form of the representation of ITDs in the MSO and IC, is not well described by the Jeffress model, but rather favors an alternative theory of binaural sound localization first mooted by von Bekesy in the 1930s (254) and later developed by van Bergeijk (1962, Ref. 250), in which the neural representation of auditory space is arranged in the form of broadly hemispheric channels, with horizontal space encoded in the form of an average population response. This is consistent with recent evidence from ITD and ILD recordings.
Indeed, the appearance of neurons with broadly tuned response functions is characteristic of the majority of studies examining auditory spatial tuning in mammals and is evident across a wide range of species and stages of brain processing from brain stem to cortex [brain stem: gerbil (18, 181); midbrain: guinea pig (146, 215), cat (86), gerbil (219), chinchilla (246); A1: cat (232)].
To this end, the representation of auditory space in the mammalian brain might best be described as a relative rate code, comparing activity across brain hemispheres rather than within individual brain nuclei, a more distributed and therefore more plastic code compared with the “hard-wired” local code suggested in the Jeffress model, and evident in the avian brain.
We trust that it has become apparent to the reader that brain mechanisms underpinning sound localization in vertebrates are not only interesting in themselves, the current state of the field being characterized by evolving views and concepts, but also that they represent a general model system by which a broader understanding of brain processing may be obtained. First, the almost unique structure-function relationship of the binaural system, combined with the few physical cues involved in sound localization, allows for very specific and testable hypotheses of how information is processed in neuronal circuits. Second, the different implementations of binaural systems, in particular in birds and mammals, are informative as to the range of possible strategies of computing and coding information in the brain and how these strategies might depend on phylogeny and evolutionary precedents of a nervous system. Third, new insights concerning the exquisite balance of excitatory and inhibitory inputs, down to the submillisecond range, and how the recent history of activation influences this balance, render the auditory pathway a potential model system for studying age-related changes that affect neurotransmission in a variety of ways.
Present address of D. McAlpine: UCL Ear Institute, University College, London, UK.
Address for reprint requests and other correspondence: B. Grothe, Div. of Neurobiology, Ludwig-Maximilians-Universitaet Munich, Grosshaderner Strasse 2, D-82152 Martinsried, Germany (e-mail:).
APPENDIX A: DO HUMANS LACK THE MNTB?
Besides the logical impossibility of having to provide evidence of absence, one must consider that the MNTB is sited within one of the most prominent fiber bundles in the entire human brain: the trapezoid body, with its highly myelinated fibers crossing the midline from both cochlear nuclei. Many of these fibers traverse the MNTB itself, and hence, MNTB neurons are literally spread apart from each other, making it difficult to delineate the nucleus by traditional histological means. In many species, particularly those with good low-frequency hearing, high-quality histological preparations are required to define the MNTB as a distinct structure, and in humans, this represents a particular problem. Nevertheless, Bazwinsky et al. (9) managed to demonstrate the existence of neurons residing within the trapezoid body fibers that receive parvalbumin-positive synaptic terminals. Because MNTB inputs in other mammals contain parvalbumin (53), it is therefore likely that the MNTB is simply difficult to delineate in humans and that this has led to the misunderstanding, perpetuated to this day, that it is absent altogether. Interestingly, the first study claiming that humans do not possess an MNTB (158) described a structure the authors referred to as “nucleus of the trapezoid body” (NTB, in their paper), which is sited at some distance from the location of the MNTB. This study ignored the Nissl-stained cell bodies precisely at the position where the MNTB would be in other mammals, and where Richter et al. (199) described loosely arranged single cells with morphological characteristics resembling MNTB neurons in other species. What Moore and Moore (158) referred to as NTB resembles, and was most likely mistaken for, a group of peri-olivary neurons, or possibly the ventral nucleus of the trapezoid body (VNTB).
This particular region of the brain stem, in fact, appears somewhat different in other primates, perhaps not surprising given these neuronal clusters are known to be arranged slightly differently in almost every species (for discussions of this problem, compare Refs. 79, 82, 209, 210). Hence, the view that the MNTB is absent from the human brain was initiated with a publication that was, almost certainly, referring to a different neural population. Furthermore, Strominger and Hurwitz (236) pointed out that the human MNTB was not easily delineated as a clear nucleus, leading some authors to deny its existence in humans.
Taken together, the “evidence” against the existence of an MNTB in the human brain stem is problematic, and the positive evidence for its existence is, for the same reasons, admittedly weak. Nevertheless, the absence of an MNTB would raise conceptual problems for the existence of a prominent LSO in humans and would bring into question the evolutionary understanding and the important general role of the MNTB; a strong case therefore exists for both the existence and the importance of this nucleus in the human brain.
APPENDIX B: ITD PROCESSING: IMPORTANT DEFINITIONS
The following are expressions frequently used in discussions on ITD processing.
Jeffress Model
Model of ITD processing put forward by L. Jeffress (1948), based on three crucial assumptions: 1) ITD detection via neuronal coincidence detection of binaural excitatory inputs, 2) “delay lines” (systematically varying axonal length from both ears) that adjust the coincidence detectors to different best ITDs by compensating for all possible ITDs, and 3) creation of a topographic representation of ITDs in each frequency channel via evenly distributed and systematically varying tuning functions.
Labeled Line Code
Coding strategy in which each ITD is encoded by a specific subset of neurons within the population of neurons. It follows that the identity of each neuron is preserved in order to allow upstream evaluation. This is a direct consequence of Jeffress model-like ITD processing.
Peak Coding Strategy
Coding of a specific ITD via the peak activity of single or few neurons within a heterogeneously tuned population. Typically linked to a labeled line code, although not obligatory.
Average Population Coding Strategy
Coding of ITDs via response rate modulation along the slope of the tuning function of a neuron. Individual neurons in a population are tuned similarly and do not need to be identified for upstream evaluation.
Space Map
Topographic representation of external space (azimuth and elevation) via an ordered arrangement of systematically varying neuronal receptive fields along multiple axes of a nucleus.
Two-Channel Model
Model of ITD processing on the basis of the average population coding strategy in which two broadly and inversely tuned channels are created by the populations of neurons in the two brain hemispheres. The location of a sound source is encoded in the inversely proportional relative firing rates in the two channels.
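The contrast between a peak (labeled-line) read-out and a two-channel read-out, as defined above, can be sketched as follows. The Gaussian tuning curves, the array of best ITDs, the tuning widths, and the function names are all hypothetical choices made for illustration; neither function is intended as a model of any specific nucleus.

```python
import math

def tuning(itd_us, best_itd_us, width_us):
    """Idealized Gaussian ITD tuning curve (an assumption of this sketch)."""
    return math.exp(-0.5 * ((itd_us - best_itd_us) / width_us) ** 2)

def labeled_line_decode(itd_us, best_itds_us, width_us=40.0):
    """Peak code: report the best ITD of the most active neuron in a heterogeneous array."""
    rates = [tuning(itd_us, best, width_us) for best in best_itds_us]
    return best_itds_us[rates.index(max(rates))]

def two_channel_code(itd_us, best_itd_us=250.0, width_us=250.0):
    """Two broad, mirror-image channels; returns their normalized rate difference."""
    right = tuning(itd_us, +best_itd_us, width_us)
    left = tuning(itd_us, -best_itd_us, width_us)
    return (right - left) / (right + left)

array_of_best_itds = list(range(-200, 201, 20))  # sharply tuned, evenly spaced (hypothetical)
print(labeled_line_decode(60.0, array_of_best_itds))
print([round(two_channel_code(itd), 3) for itd in (-100.0, 0.0, 100.0)])
```

In the first read-out, the identity of the most active neuron carries the information, so each neuron must be identifiable. In the second, only the relative rates of two similarly tuned populations matter, and the read-out varies monotonically with ITD across the range covered by the slopes.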
APPENDIX C: ROLE OF GLYCINERGIC INHIBITION IN ITD PROCESSING IN THE MSO
Indirect evidence for an important role of the inhibitory inputs to the MSO for tuning ITD sensitivity comes, on one hand, from in vitro experiments in which contralaterally driven inhibitory inputs were found to dominate binaurally coincident excitation (80). This occurs only over a very short time window (a few hundred microseconds) when examined in acute brain slices excised some 10 days following hearing onset (81). In fact, these inhibitory effects are extremely strong, generated by some of the fastest- and shortest-acting inhibitory currents measured to date. When measured during blockade of other currents, which normally increases input resistance, potentially slowing membrane kinetics, time constants of isolated glycine currents (measured in even younger tissue; 4–5 days following hearing onset) remain in the range of 1–2 ms (136, 221). More direct evidence for the role of inhibition in generating internal delays comes from pharmacological studies of the gerbil MSO in vivo, which indicate that glycinergic inhibition plays a major role in determining the ITD sensitivity of single MSO neurons. During blockade of glycinergic inhibition by means of local iontophoretic application of its antagonist, strychnine, neural discharge rates are elevated, as expected. More interestingly, however, the inhibitory block exerts an asymmetric effect on ITD functions of the neurons tested: notably, the slope of the ITD function closest to zero (the slope “facing” ipsilateral ITDs) shifts to beyond the physiological range due to increased responsiveness to ipsilateral leading ITDs. As a result, the BITD shifts toward 0 ITD (indicating less effective contralateral-leading stimulation; Fig. 14). In contrast, the contralateral-facing slope is almost unaffected by blockade of inhibition. These effects are reversible, with the BITD returning to its original value a few minutes following termination of the iontophoretic application of strychnine (18, 181).
Blocking glycinergic inhibition with strychnine shifts both the onset and the ongoing components of each neuron's ITD function.11 A likely explanation for this effect is that glycinergic inhibition provides for a net delay in the effective contralateral excitation. Two potential scenarios have been posited for this effect. One assumes the action of tonic synaptic inhibition, and the other the action of phase-locked synaptic inhibition. To differentiate between these two possibilities, Pecka et al. (181) applied glycine iontophoretically to MSO neurons. The rationale of this experiment is that, if endogenous glycinergic inhibition is tonic, further tonic application would enhance the contralateral delay (277), whereas if endogenous glycinergic inhibition is phase-locked, tonic application would mask the effect of inhibition. BITDs should therefore shift to even longer ITDs if the endogenous inhibition is tonic, but glycine application should cause a shift similar to that observed during inhibitory block (although at a highly reduced discharge rate) if the timing of the inhibitory input is important. The results favored the latter scenario: tonic glycine application resulted in a net shift in the ITD functions in the same direction as that observed during strychnine application. Importantly, unlike with strychnine application, the contralateral-facing slope of ITD functions was most affected by tonic glycine application, an outcome also predicted by the timed-inhibition scenario (Fig. 14, see figure legend for details). In particular, this scenario relies on the existence of phase-locked, and therefore well-timed, inhibitory inputs derived mainly from the contralateral side. Such inputs could, with each cycle of the stimulus fine-structure, postpone the net excitation arriving from the contralateral side if contralaterally evoked inhibition precedes contralaterally evoked excitation by a few hundred microseconds (18, 74).
This may seem counterintuitive, since the inhibitory pathway has to pass an additional synapse (in the MNTB) before reaching the MSO. However, the inhibitory pathway, as described in detail above, shows numerous specializations to enable fast transmission (255).
Another unexpected feature of this scenario, based as it is on a fast-acting, phase-locking inhibition, is that the speed of inhibition is generally considered too slow to allow for any such phase-locked action. Thus the question might arise, Is inhibition fast enough? As outlined in the main body of the text, repolarization time constants for the inhibition in the gerbil MSO assessed in vitro are in the range of 1–2 ms (136) and effective in suppressing subsequent firing for <1 ms (81), which would be sufficiently fast for the proposed mechanism of ITD processing for frequencies up to at least 1 kHz. The question remains then, What happens for higher frequencies? Interestingly, recent calculations based on optimal coding theory suggested a dichotomy for ITD coding strategies dependent on frequency with the cutoff frequency being ∼1 kHz for the gerbil (88) (for details, see sect. iiiC2b).
1 In concert with other cues, mainly the spectrum of a voice.
2 Archosaurs: crocodiles, pterosaurs, dinosaurs, and their descendants, the birds.
4 At least concerning the initial stage of ILD processing (see below).
7 Note that the barn owl is not capable of moving its eyes, hence does not perform eye saccades.
8 Recent evidence indicates that in the barn owl forebrain ITD-sensitive neurons are not arranged in an orderly map (39), and the distribution of BITDs is similar to that in small mammals, with the steepest slopes clustering around the midline (256).
9 Actually, broader tuning of neurons relative to behavior is not a special feature of the sound localization system but rather a ubiquitous feature of sensory systems.
11 There was a recent debate (104) about the example cell displayed in the original publication showing the strychnine effect (18), which was, unfortunately for an example, heavily dominated by the onset effect. The authors therefore raised the possibility that the observed effects reflect mainly an “onset artifact.” A careful statistical analysis and the examples shown in Pecka et al. (181), however, reveal that glycinergic inhibition affects both the onset and the ongoing components roughly equally.
Copyright © 2010 the American Physiological Society