Laboratory of Auditory Neurophysiology, Division of Neurophysiology, K.U. Leuven, Leuven, Belgium; Coleman Laboratory, Department of Otolaryngology, Keck Center for Integrative Neuroscience, University of California at San Francisco, San Francisco, California; and School of Neurology, Neurobiology, and Psychiatry, The Medical School, University of Newcastle upon Tyne, Newcastle upon Tyne, United Kingdom
ABSTRACT I. TEMPORAL DIMENSIONS OF SOUND II. HUMAN SENSITIVITY TO AMPLITUDE MODULATION III. NEURAL RESPONSE MEASURES IV. AUDITORY NERVE: BOTTLENECK TO THE CENTRAL NERVOUS SYSTEM A. Basic Auditory Nerve Properties B. Average Response Rate and Magnitude of Synchronization C. Phase of Synchronization V. COCHLEAR NUCLEUS: PARALLEL CHANNELS A. Basic Organization of the CN B. AM Responses of Neuronal Types in the CN VI. SUPERIOR OLIVARY COMPLEX: AN EXAMPLE OF TIME-TO-RATE CONVERSION VII. THE NUCLEI OF THE LATERAL LEMNISCUS VIII. AMPLITUDE MODULATION ENCODING IN THE INFERIOR COLLICULUS: A CENTER FOR CONVERGENCE A. Basic Organization of the IC B. Modulation Transfer Functions for IC Units: Synchronization C. Modulation Transfer Functions for IC Units: Average Rate D. What Determines the MTF Upper Limit in the IC? E. Is AM Encoded in the IC by Rate or Synchronization? F. Relationship Between AM Responses and Other Neuronal Properties G. Is Modulation Frequency Represented Topographically in the IC? H. Responses to Interaural Time Disparities in Modulation Envelopes I. Contribution of Nonlinearities IX. AMPLITUDE MODULATION ENCODING IN AUDITORY THALAMUS AND CEREBRAL CORTEX A. Basic Layout of the Thalamocortical System B. Temporal Responses in the MGB C. Responses to AM in Primary Auditory Cortex: Synchronization D. Responses to AM in Primary Auditory Cortex: Average Rate E. Responses to AM in Primary Auditory Cortex: Influence of Modulation Parameters F. Differences of Temporal Coding Between Cortical Fields G. Cortical Mechanisms H. Temporal Coding of Complex Sounds I. Plasticity of Temporal Coding Properties in Auditory Cortex X. NEUROPHYSIOLOGICAL AND PSYCHOLOGICAL STUDIES IN HUMANS XI. CONCLUSION
| ABSTRACT |
|---|
|
|
|---|
| I. TEMPORAL DIMENSIONS OF SOUND |
|---|
Importantly, there are multiple temporal dimensions in acoustic stimuli (238). It is useful to distinguish "fine-structure" and "envelope" as two components of a time waveform. The fast pressure variations that determine the spectral content constitute the fine-structure. This fine-structure waxes and wanes in amplitude, and the contour of this amplitude modulation (AM) is the envelope. For example, the waveform of a speech utterance shows bursts of energy that correspond to phonemes. The temporal characteristics of these bursts carry much information (44, 108, 214, 265, 272, 281), but their dominant modulation frequency is rather slow (typically 3-4 Hz, extending up to ~20 Hz) vis-à-vis the temporal capabilities of the peripheral auditory system. Faster modulations of several hundred Hertz are also very common, e.g., in segments of voiced speech where they are perceptually associated with voice pitch. These envelope components arise from interactions between fine-structure components and are not present as such, i.e., as acoustic energy, in the waveform. This is illustrated by the superposition of two sine waves, equal in amplitude but separated by a small difference frequency (fd): constructive and destructive interference of the two components generates AM in the form of "beating" at frequency fd. The same principle extends to environmental sound sources, which commonly produce quasi-periodic signals consisting of a range of frequency components (harmonics) that are multiples of a fundamental frequency: the combination of even a limited number of components, e.g., within a cochlear filter, reconstitutes the fundamental frequency in the form of a temporal envelope modulation. (For examples of spectrograms, waveforms, and treatment of AM, see Refs. 99, 100, 177, 180, 302.)
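The beating example can be written out with a sum-to-product identity. As a sketch, assume two unit-amplitude sinusoids at frequencies f1 and f2 = f1 + fd (the symbols f1, f2, and the mean frequency are introduced here only for illustration):

```latex
\begin{aligned}
\sin(2\pi f_1 t) + \sin(2\pi f_2 t)
  &= 2\cos\!\left(2\pi\,\tfrac{f_2 - f_1}{2}\,t\right)\sin\!\left(2\pi\,\tfrac{f_1 + f_2}{2}\,t\right) \\
  &= 2\cos(\pi f_d t)\,\sin(2\pi \bar{f} t), \qquad \bar{f} = \tfrac{f_1 + f_2}{2}
\end{aligned}
```

The slowly varying factor 2cos(πfd t) acts as the envelope. Its magnitude repeats every 1/fd seconds, so the waveform beats at fd even though the spectrum contains energy only at f1 and f2.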
The laboratory stimulus most often used in physiological studies of modulation is a pure tone (sinusoid) modulated by another tone. Figure 1A and Equation 1 represent the waveform [s(t)] of a tone with frequency fc (the carrier), whose amplitude is modulated by a lower frequency fm (the modulator) at a modulation depth m (0 ≤ m ≤ 1)

$$s(t) = [1 + m\sin(2\pi f_m t)]\,\sin(2\pi f_c t) \tag{1}$$

The term [1 + m sin(2πfmt)] is the time-varying amplitude or envelope.1 Using trigonometric identities, s(t) can be rewritten as the sum of three components at fc and at fc ± fm (the upper and lower sidebands)

$$s(t) = \sin(2\pi f_c t) + \frac{m}{2}\cos[2\pi(f_c - f_m)t] - \frac{m}{2}\cos[2\pi(f_c + f_m)t] \tag{2}$$
The sinusoidal AM stimulus is special because its envelope consists of a single sinusoidal component. In real-world stimuli, a range of modulations is usually present, which can be summarized by the modulation spectrum: the distribution of modulation energy for the whole waveform or for a selected band of carrier frequencies in the waveform. The subjectively experienced quality of a modulated signal depends on modulation frequency so that the modulation spectrum also defines different perceptual ranges (see sect. II).
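The three-component structure of a sinusoidally amplitude-modulated tone can be checked numerically. The sketch below assumes the waveform [1 + m sin(2πfm t)] sin(2πfc t) and uses illustrative values (1-kHz carrier, 100-Hz modulator, m = 0.5, 20-kHz sampling rate); the function and variable names are hypothetical.

```python
import numpy as np

def sam_tone(fc, fm, m, fs, dur):
    """Sinusoidally amplitude-modulated tone: [1 + m sin(2*pi*fm*t)] sin(2*pi*fc*t)."""
    t = np.arange(int(round(fs * dur))) / fs
    s = (1.0 + m * np.sin(2 * np.pi * fm * t)) * np.sin(2 * np.pi * fc * t)
    return t, s

# Illustrative (assumed) parameters
fc, fm, m, fs, dur = 1000.0, 100.0, 0.5, 20000.0, 1.0
t, s = sam_tone(fc, fm, m, fs, dur)

# Single-sided amplitude spectrum; a 1-s signal gives 1-Hz frequency bins
amp = 2.0 * np.abs(np.fft.rfft(s)) / len(s)
freqs = np.fft.rfftfreq(len(s), d=1.0 / fs)

def amp_at(f):
    """Spectral amplitude in the bin closest to frequency f (Hz)."""
    return amp[np.argmin(np.abs(freqs - f))]

carrier = amp_at(fc)        # expected close to 1
lower = amp_at(fc - fm)     # expected close to m/2
upper = amp_at(fc + fm)     # expected close to m/2
at_fm = amp_at(fm)          # expected close to 0: no acoustic energy at fm itself
```

The absence of energy at fm in the stimulus spectrum is the point of the exercise: the modulation frequency appears in a response only after a nonlinear (demodulating) stage.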
The impetus in early physiological studies to use modulated stimuli (57, 62, 78, 183, 196) was a desire to go beyond the arsenal of simple stimuli (pure tones, clicks, noise) that dominated much of the research at that time. Somewhat similar to gratings in the visual domain, AM and frequency modulation (FM) were regarded as elementary features of natural stimuli, which could reveal dynamic properties of the auditory system not addressed with simpler stimuli. Interest in responses to AM was rekindled in the 1980s and 1990s through a convergence of different lines of research concerned with the "dynamic range problem," speech coding, pitch, and spatial localization of high-frequency sounds, among others. However, AM signals are more than just a convenient laboratory tool to study a diversity of psychophysical and physiological phenomena. The question that we are concerned with here is whether envelope processing is embedded in the auditory system, as may be expected from the ecological prominence of envelopes.
Given the theory of natural selection, one can assume that animals are well adapted to their specific acoustic environment and that the statistical structure of the natural auditory environment or the "acoustic ecology" (5) is reflected in the structure and function of the auditory system. Acoustic ecology can be defined as the total ensemble of sounds present in an animal's environment, from both inanimate and biological sources. Indeed, the auditory systems of acoustically specialized animals have revealed the existence of highly developed adaptations. Prominent examples include the echolocation system of bats (e.g., Ref. 61), the mating call detection system in frogs (245), and the alarm call differentiation in vervet monkeys (275). Common to these examples is that particular behaviors are elicited by a small set of signals with specific, fairly invariant acoustic properties. Characterization of these lower order physical sound attributes led to the discovery of special neuronal mechanisms.
Relatively little work has been done on the quantitative analysis of amplitude modulation statistics in acoustic ecologies and their consequences for neuronal processing. Not only overtly specialized but all animals are likely to exploit consistencies in statistical properties of the acoustical environment. Nelken et al. (194) found that low-frequency amplitude modulations are prominent in natural environments and are often coherent over different frequency regions, and may be exploited by the auditory system in signal detection. Voss and Clarke (288) computed temporal correlations of music passages and discovered a 1/f scaling relation over a few decades. More recently, Attias and Schreiner (6) decomposed music, speech, and animal vocalizations into narrow-band frequency channels and studied the statistics of the amplitude and phase distributions for each channel. They also found a distribution of modulation frequencies following a power-law, indicating that the amplitude modulation statistics of natural sound are non-Gaussian, cover a wide range of modulation frequencies, and scale universally, i.e., the frequency dependence is similar over different frequency ranges. Using a mutual information metric between stimulus and spike trains, it was also found (7) that neurons in the cat inferior colliculus are more efficient at coding naturalistic stimuli than nonnaturalistic stimuli: the information rate per spike for naturalistic stimuli was more than 60% higher than for nonnaturalistic signals. Similar results have been seen in the frog (232). This implies that neural processing is adapted and perhaps optimized for the encoding of naturally occurring modulation information.
Our purpose is to review physiological mechanisms that may be important for the processing of temporal envelope information. We first briefly highlight findings from human psychophysics to illustrate some of the perceptual consequences of AM, but we refrain from a more substantial discussion of the relationship between physiological mechanisms and perception. Rather, our focus is on a simpler and more basic question: within what limits is AM encoded by single auditory neurons, and does the form of encoding suggest that the temporal envelope dimension is a fundamental organizing principle in the auditory system, in the manner that tuning to orientation, direction, or spatial frequency is considered fundamental in vision?
For reasons of space, only occasional reference will be made to the extensive research in bats or nonmammalian vertebrates, even though AM is often an important feature in echolocation signals (156, 198, 258) and their study often preceded the research reviewed here.
| II. HUMAN SENSITIVITY TO AMPLITUDE MODULATION |
|---|
With improvements in technology, subsequent studies (see Ref. 131 for historical review) extended and quantified these findings. Zwicker (324) showed that the threshold for detecting AM is very small at low modulation frequencies (threshold m ~2% for fm of 1-4 Hz and fc of 1 kHz) and increases to a maximum with increasing fm (m ~5% for fm of 32 Hz and fc of 250 Hz; and for fm of 125 Hz and fc of 4 kHz). Above this maximum, threshold decreases and falls below the values obtained at low modulation frequencies, but in this range subjects perceive the carrier and the modulation frequency as distinct tones. Zwicker (324) also determined that, for a given carrier, thresholds for the detection of AM and FM measured in terms of their modulation depths coincide on the upper side of the maximum at a modulation frequency he termed the Phasengrenzfrequenz. This led Zwicker to postulate that above the Phasengrenzfrequenz [now termed the critical modulation frequency (CMF) (250, 263)] the carrier and sideband components are analyzed in different critical bands (auditory filters), and thus subjects are not sensitive to differences in the relative phase of the modulation components that enable them to distinguish AM from FM below the CMF. More recent evidence suggests that the situation is more complex than this (180, 263), but nevertheless, it appears that when listening to AM imposed on pure tone carriers, detection may rely on spectral rather than temporal cues over some ranges of modulation frequency.
One means of eliminating spectral cues, and therefore estimating the temporal resolving power of the auditory system, is to measure the detection of sinusoidal modulation imposed on noise rather than a tonal carrier. The broadband spectrum of the noise precludes the listener detecting the individual spectral components of the stimulus spectrum. The use of such stimuli (9, 285) demonstrated that the relationship between threshold and modulation frequency (the psychophysical temporal modulation transfer function) is essentially a low-pass function with a 3-dB cut-off around 50 Hz and a slope of 4 dB/octave. The minimum threshold modulation depth is ~5% at low modulation frequencies (<10 Hz) where subjects detect the individual amplitude changes in the stimulus. The upper limit of modulation detection extends to ~2.2 kHz (68, 285, 286). As will become apparent later, this coincides with the very highest limits of neural phase-locking to envelopes obtained for some neurons in the auditory periphery in cats (Fig. 2, Refs. 127, 229) and exceeds the limit for phase-locking to envelopes in more central neurons. This raises questions as to the nature of modulation encoding in the central auditory system, even when one takes into account the encoding of modulations by changes in average rate that become apparent at more central sites.
Two competing models have been proposed to explain the detection of AM. The first consists of a bandpass filter and half-wave rectifier representing processing by the cochlea, followed by a low-pass filter (285). Some measure of the output of this filter provides the basis for the subject's response (see Ref. 181 for discussion). In essence, therefore, this model is an envelope detector. The second scheme models the detection of modulation by a bank of bandpass filters that are sensitive to different ranges of modulation frequency. A channel or filterbank model of modulation analysis was first proposed by Kay and colleagues (84, 132) on the basis of adaptation studies with FM and AM. Subsequently, the adaptation paradigm was questioned (178, 289), but the concept of a modulation filterbank persists because studies using different psychophysical paradigms have since reported findings which support the concept of modulation frequency tuning. Evidence for such selectivity comes from modulation masking experiments (8, 107), and modulation detection interference (MDI), a phenomenon in which the detection of AM is influenced by modulation at the same frequency but on a very different carrier (318). Dau et al. (36) invoked a model consisting of a modulation filterbank associated with each auditory filter to account for the detection and masking of sinusoidally amplitude-modulated narrowband noise. The latter model was extended (283) to account for comodulation masking release, another phenomenon, like MDI, that indicates some element of modulation waveform analysis across different carrier frequencies (96) (see Ref. 180 for review). Such across-frequency interactions between similar modulation envelopes are likely to contribute to grouping and the construction of auditory images (90). 
Despite different lines of evidence favoring some form of modulation filterbank, the concept remains controversial, and the experimental findings discussed above do not concur in their estimates of the bandwidths for these putative channels.
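The first of the two schemes, the envelope detector, can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the published model: the initial bandpass stage is omitted (the tonal carrier is assumed to lie well inside the passband), the low-pass stage is a first-order filter with a hypothetical 50-Hz cut-off, and the decision variable is simply the modulation index at fm of the filtered output. All names and parameter values are illustrative.

```python
import numpy as np

def envelope_detector_index(fm, m=0.5, fc=1000.0, f_cut=50.0, fs=44100.0, dur=1.0):
    """Modulation index at fm after half-wave rectification and low-pass filtering."""
    t = np.arange(int(round(fs * dur))) / fs
    s = (1.0 + m * np.sin(2 * np.pi * fm * t)) * np.sin(2 * np.pi * fc * t)
    rectified = np.maximum(s, 0.0)                       # half-wave rectifier
    spec = np.fft.rfft(rectified)
    freqs = np.fft.rfftfreq(len(t), d=1.0 / fs)
    spec = spec / (1.0 + 1j * freqs / f_cut)             # first-order low-pass filter
    k = int(round(fm * dur))                             # FFT bin of the modulation frequency
    return 2.0 * np.abs(spec[k]) / np.abs(spec[0])       # component at fm relative to DC

# Output modulation relative to input modulation (a model transfer function)
mod_freqs = [4.0, 50.0, 400.0]
relative_gain = [envelope_detector_index(f, m=0.5) / 0.5 for f in mod_freqs]
```

With these assumptions the relative gain is close to 1 at 4 Hz, drops by about 3 dB at the 50-Hz cut-off, and keeps falling at higher fm, which reproduces the low-pass character of the transfer function. A first-order filter rolls off at 6 dB/octave, so it is only a qualitative stand-in for the shallower slope reported psychophysically.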
| III. NEURAL RESPONSE MEASURES |
|---|
The earliest single-unit studies of peripheral auditory neurons already reported synchronization to the fine-structure of tones, in the sense that discharges occur at a particular phase of the cyclical waveform. For example, auditory nerve fibers have the striking capability to "phase-lock" to low-frequency tones up to several kilohertz [4-5 kHz in the cat (121), but the upper limit is species dependent (298)]. Phase-locking also occurs to the stimulus envelope; both forms of phase-locking are immediately apparent in the poststimulus time (PST) histogram (Fig. 1C) to the AM stimulus of Figure 1A. The fine spacing of peaks at intervals of 1 ms indicates phase-locking to the 1-kHz fine-structure; the grouping into broader peaks spaced by 10 ms indicates phase-locking to the 100-Hz envelope. In contrast to the stimulus spectrum (Fig. 1B), the response spectrum (Fig. 1D) shows energy at fm, i.e., the AM signal is demodulated. Several cochlear nonlinearities with asymmetry between the positive and negative part of the transfer function can contribute to this demodulation, the most important being half-wave rectification in the relationship between displacement of hair cell stereocilia and receptor potential, and in the absence of negative firing rates (135). The response spectrum also shows a value at 0 Hz (Fig. 1D: small circle on ordinate) which equals the average firing rate. In this review, we will use the terms envelope synchronization and envelope phase-locking synonymously to refer to synchronization of the response to the stimulus envelope waveform, and use the term rate coding for changes in average firing rate during manipulation of the stimulus modulation parameters.
Different synchronization measures have been used, sometimes leading to seemingly contradictory statements. The most popular metric is "vector strength" R, also called synchronization index (81). Each spike is treated as a vector of unit length and with phase θi between 0 and 2π, measured as the spike time modulo the stimulus period of interest. The x- and y-components of the vector are xi = cos θi and yi = sin θi. The n spikes in a response are combined by vector addition, and the resultant vector is normalized to n

$$R = \frac{\sqrt{\left(\sum_{i=1}^{n} x_i\right)^2 + \left(\sum_{i=1}^{n} y_i\right)^2}}{n} \tag{3}$$

... is also retrieved with either technique. Statistical significance of synchronization is usually quantified with the Rayleigh test (23, 168). As will become clear in this review, envelope coding at peripheral stages is predominantly temporal rather than rate-based, but these two aspects of the response progressively reverse in prominence at successive stages along the neuraxis. Because both average firing rate and synchronization may contribute to the impact that a neuron has on its postsynaptic targets, many experimenters have combined the two metrics by multiplication (nR, with n = total number of spikes, variously called "modulated rate," "phase-locked rate," "synchronized rate"), or, equivalently, by reporting the unnormalized Fourier component, expressed in spikes per second (33, 141, 224, 314). Recently, some authors have used 2nR², which is also the statistic used in the Rayleigh test of significance (157, 266). Finally, envelope synchronization is often reported as a gain value (in dB), defined as 20 log10(2R/m), which relates output directly to input and facilitates comparison across studies that use different modulation depths m.
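The metrics described above can be sketched as follows. The function names and the synthetic spike trains are hypothetical; the mean phase returned alongside R is a convenience of this sketch, computed from the same resultant vector.

```python
import numpy as np

def vector_strength(spike_times, freq):
    """Return (R, mean_phase_rad) for spike times (s) relative to one cycle of freq (Hz)."""
    spike_times = np.asarray(spike_times, dtype=float)
    n = spike_times.size
    if n == 0:
        return np.nan, np.nan
    theta = 2 * np.pi * np.mod(spike_times * freq, 1.0)   # spike phase in [0, 2*pi)
    x, y = np.cos(theta).sum(), np.sin(theta).sum()        # vector addition of unit vectors
    return np.hypot(x, y) / n, np.arctan2(y, x) % (2 * np.pi)

def rayleigh_statistic(R, n):
    """2nR^2, the statistic used in the Rayleigh test."""
    return 2.0 * n * R ** 2

def modulation_gain_db(R, m):
    """Gain in dB, 20 log10(2R/m), for vector strength R and stimulus modulation depth m."""
    return 20.0 * np.log10(2.0 * R / m)

# Synthetic example (assumed values): 100-Hz envelope, one spike per cycle
rng = np.random.default_rng(0)
fm = 100.0
locked = np.arange(200) / fm + 0.002                  # fixed phase, 2 ms into each cycle
jittered = locked + rng.normal(0.0, 0.001, 200)       # add 1-ms Gaussian jitter
R_locked, phase_locked = vector_strength(locked, fm)
R_jittered, _ = vector_strength(jittered, fm)
```

Perfectly aligned spikes give R near 1, and temporal jitter lowers R. Multiplying R by the spike count (or rate) gives the unnormalized "synchronized rate" variants mentioned in the text.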
The vector strength metric, often under different names (e.g., selectivity index), has found general use in the quantification of periodic neural signals in sensory and even motor physiology (43). Despite its pervasive use, it is important to be aware of its limitations. First, the metric gives only the degree to which the response is modulated to the frequency at which R is calculated (we use the subscripts m and c to indicate modulation frequency and carrier frequency, respectively). It does not capture the full harmonic content of the cycle histogram at fm so that histograms with a rather different shape can result in the same Rm value (see Ref. 127 for an example). An Rm value of one only results from perfect alignment of all spikes at one phase, but a value of zero does not necessarily indicate a random distribution of spike times. For example, if spike times are equally divided between phase θ and θ + π, the average vector has zero magnitude. Thus a low vector strength should not necessarily be equated to absence of temporal structure in the spike train, but rather is an indication of lack of energy at the frequency for which R was calculated. Second, high R values indicate that spikes are distributed over a narrow time window relative to the period of interest, but such values do not imply a faithful replica of the stimulus modulation waveform in the probability of discharge. As a reference, a PST histogram that closely resembles a half-wave rectified sinusoidal AM signal with m = 1 gives R = 0.5. Higher R values are obtained when the period histograms are more "peaked" than the original sinusoidal modulation signal. Third, R is a compressive metric and is therefore sometimes graphed on an expansive scale (120). Finally, a problem at a more general level is that calculation of Rm requires knowledge of fm, a strategy that the brain cannot use. It may be argued that a "clock" signal is available in the form of the highly synchronized discharge of some types of cochlear nucleus neurons, which could be used to perform a vector strength type calculation in which degree of synchronization is translated into average firing rate, e.g., as suggested in the periodicity extraction scheme by Langner (150). Some authors have used interspike interval or autocorrelation analysis to bring out the time structure of responses that may be more relevant to the operations performed by the central processor (27, 85, 123, 141, 226, 301). In this context it is important to remember that the envelope of most natural sounds is not strictly periodic in the first place and that the raw acoustic waveform is not available as such to the auditory nervous system. Rather, this waveform is decomposed into a multitude of waveforms by virtue of cochlear narrowband filtering (reviewed in Refs. 206, 234).
This process profoundly affects the modulation spectrum present in each frequency channel, which is thus determined jointly by the spectrotemporal properties of the acoustic stimulus and of those of the peripheral filtering process (for illustrations, see Ref. 286). In summary, while most studies discussed here have used deterministic stimuli with periodic envelopes and have applied the R metric, it is important to keep in mind that, for natural stimuli, the relationship between neural response modulation and stimulus modulation is more complex and that the neural operations by which the central processor extracts envelope information likely differ fundamentally from the analytical ways of the experimenter.
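Two of the limitations of vector strength discussed above are easy to reproduce numerically. The sketch below works directly on spike phases or on period-histogram bins; the function name, the arbitrary phase of 0.7 rad, and the "peaked" histogram shape are assumptions made for illustration.

```python
import numpy as np

def vector_strength_from_phases(phases, weights=None):
    """Vector strength of phases (rad); optional weights act as period-histogram counts."""
    phases = np.asarray(phases, dtype=float)
    w = np.ones_like(phases) if weights is None else np.asarray(weights, dtype=float)
    return np.abs(np.sum(w * np.exp(1j * phases))) / np.sum(w)

# Case 1: spikes split equally between two opposite phases of the fm cycle
theta0 = 0.7                                               # arbitrary phase (assumed)
two_peaks = np.concatenate([np.full(50, theta0), np.full(50, theta0 + np.pi)])
R_at_fm = vector_strength_from_phases(two_peaks)           # near 0 despite clear structure
R_at_2fm = vector_strength_from_phases(2 * two_peaks)      # same spikes re 2*fm: near 1

# Case 2: period histogram that follows the envelope 1 + m sin(theta) with m = 1
bins = np.linspace(0.0, 2 * np.pi, 360, endpoint=False)
R_envelope = vector_strength_from_phases(bins, weights=1 + np.sin(bins))

# Case 3: a more "peaked" period histogram (hypothetical shape)
R_peaked = vector_strength_from_phases(bins, weights=(1 + np.sin(bins)) ** 4)
```

Case 1 shows that R near zero at fm can coexist with perfect temporal structure at another frequency. Cases 2 and 3 show that a histogram replicating the sinusoidal envelope yields R = 0.5, whereas a sharper histogram yields a higher value.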
The bulk of studies on AM coding have used the same stimulus strategy, which is to tailor the stimulus to the cell under study. Early work (78, 183) established that peripheral neurons display envelope phase-locking only if the stimulus energy falls within a cell's tuning curve. For example, Javel (114) shows the lack of response of an auditory-nerve fiber tuned to 800 Hz to a high-frequency AM complex (fc = 5 kHz) modulated at 800 Hz. Most studies using AM stimuli with tonal carriers match fc to the neuron's characteristic frequency (CF, frequency of lowest rate threshold), and usually also optimize other stimulus parameters for the cell under study. The complementary approach, in which the population response of cells at many different CFs is studied to a limited set of stimuli, has been little used (27, 293).
A description employed in acoustics, psychophysics, and physiology is the modulation transfer function or MTF, which is response modulation relative to input modulation as a function of modulation frequency. Schroeder (257) predicted more than 20 years ago that the concept of MTF would increase in importance because the modulation rather than the carrier usually contains the important information and because highly nonlinear transmission systems often exhibit a quasi-linear response to modulation. Physiologically, MTFs are usually measured as the phase-locking to AM tones of fixed m and fc presented at consecutive modulation frequencies, but other methods have been employed (see sect. IXB). Marked effects on average rate can also occur, so that a distinction between temporal MTF (tMTF) and rate MTF (rMTF) is usually drawn.
| IV. AUDITORY NERVE: BOTTLENECK TO THE CENTRAL NERVOUS SYSTEM |
|---|
A. Basic Auditory Nerve Properties
Activity in the auditory nerve represents both the output of the cochlea and the input to the central nervous system, and studies of envelope phase-locking have been conducted both to gain more insight into cochlear processing and to define the limits within which the central processor has to operate. Compared with optic and peripheral somatic nerves, the auditory nerve is highly uniform both morphologically (in caliber and branching pattern) and physiologically. We only discuss type I auditory nerve fibers, which form the bulk of the nerve, since next to nothing is known about the physiology of the unmyelinated type II fibers. Because each type I nerve fiber contacts only a single inner hair cell, its activity can, to a first approximation, be understood from basilar membrane motion at a single point in the cochlea followed by further signal modifications by the inner hair cell and hair cell/nerve synapse (76, 136, 137, 243). The most salient properties are 1) sharp V-shaped tuning to a narrow range of frequencies; 2) a limited dynamic range of ~20-30 dB, reflected in a sigmoidal rate-level function; 3) adaptation of firing rate to sustained stimuli, rather modest compared with adaptation of peripheral nerve fibers in other systems; and 4) phase-locking to low-frequency pure tones (<4-5 kHz in the cat).
Auditory nerve fibers show a bimodal distribution of spontaneous rate (SR), on the basis of which several classes of fibers are defined that differ in a number of properties (158, 246, 305). Fibers with high SR (>18 spikes/s), which in the cat form ~60% of the total population, have low thresholds and limited dynamic range. Fibers with medium and low SR have higher thresholds and tend to have "sloping" saturation, i.e., their rate-level functions show a decrease in slope at ~30 dB above threshold but do not fully saturate. Also, low-SR fibers show less adaptation than high-SR fibers (230). Differences between the SR classes have been documented mostly with pure tone and spectrally complex stimuli, but AM stimuli have revealed response differences in the time domain as well. We first discuss how the basic AM parameters m, sound pressure level (SPL), fm, and fc (Fig. 3) influence synchronization and average rate, then describe the response phase.
B. Average Response Rate and Magnitude of Synchronization
When a tone is presented at a fiber's CF at a fixed suprathreshold level and is modulated with increasing depth, the nerve fiber shows a monotonic, saturating increase in synchronization Rm (Fig. 3A). Although Rm increases with m in absolute terms, synchronization magnitude decreases in relative terms, i.e., the gain (response modulation relative to stimulus modulation) decreases (127). The gain can be as large as 10 dB for m of 10% and decreases to values near 0 dB for m of 100%.
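Assuming the gain definition given in section III, 20 log10(2R/m), the two gain values quoted above can be converted into approximate vector strengths. This is a back-of-envelope conversion for orientation, not a set of values reported in the cited study:

```latex
\begin{aligned}
m = 0.1,\ \text{gain} = 10\ \text{dB}: \quad & R_m = \tfrac{m}{2}\,10^{10/20} \approx 0.05 \times 3.16 \approx 0.16 \\
m = 1,\ \text{gain} = 0\ \text{dB}: \quad & R_m = \tfrac{m}{2}\,10^{0/20} = 0.5
\end{aligned}
```

So Rm rises roughly threefold while m rises tenfold, which is the sense in which synchronization increases in absolute terms but decreases in relative terms.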
Responses to AM as a function of stimulus intensity have been studied extensively in a variety of animals (guinea pig, Ref. 33; chinchilla, Ref. 114; cat, Refs. 127, 135, 294; gerbil, Ref. 270). The rate-level function with AM shows only small differences relative to the function obtained with an unmodulated carrier wave (127, 270). The synchronization-level (Rm vs. SPL) function shows a stereotypic nonmonotonic shape; a maximum is reached at low suprathreshold levels, with a decrease in Rm for further increases in SPL (Fig. 3B). It is easy to see how this relationship is expected from the compressive relationship between firing rate and SPL, especially when the modulation depth m is small; maximal modulation of firing rate should occur for amplitude changes centered on the steepest part of the rate-level function, between firing threshold and saturation. At high SPLs, amplitude fluctuations should not translate into fluctuations in firing rate because firing rate is saturated. Qualitatively the synchronization-level function does indeed show the expected nonmonotonic shape. However, compared with quantitative predictions based on the rate-level function, the observed synchronization shows 1) larger maximal R values, 2) a maximum that is displaced towards a higher SPL, and 3) higher synchronization values at high SPLs and a shallow downward slope. These deviations are predicted when adaptation over a short time scale is taken into account (33, 270, 311). Basically, adaptation boosts the coding of stimulus changes so that the operating range over which changes in SPL result in changes in firing rate is larger for responses to AM than for steady-state responses to pure tones.
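The static prediction mentioned above, i.e., the synchronization expected from the rate-level function alone, can be sketched as follows. The sigmoid and all of its parameters (spontaneous rate, maximal driven rate, midpoint, slope) are hypothetical, and adaptation is deliberately left out, so the sketch reproduces only the qualitative nonmonotonic shape and not the measured data.

```python
import numpy as np

def rate_level(level_db, spont=5.0, r_max=200.0, l_half=20.0, slope_db=5.0):
    """Hypothetical sigmoidal rate-level function (spikes/s), level in dB re threshold."""
    return spont + r_max / (1.0 + np.exp(-(level_db - l_half) / slope_db))

def predicted_sync(carrier_level_db, m, n_phase=720):
    """Static prediction of envelope vector strength and mean rate at one carrier level."""
    theta = np.linspace(0.0, 2 * np.pi, n_phase, endpoint=False)
    # Instantaneous level follows the envelope 1 + m sin(theta); requires m < 1
    inst_level = carrier_level_db + 20.0 * np.log10(1.0 + m * np.sin(theta))
    rate = rate_level(inst_level)                      # firing rate across the envelope cycle
    R = np.abs(np.sum(rate * np.exp(1j * theta))) / np.sum(rate)
    return R, rate.mean()

levels = np.arange(0.0, 81.0, 5.0)
m = 0.5
sync = np.array([predicted_sync(level, m)[0] for level in levels])
best_level = levels[np.argmax(sync)]                   # level of maximal predicted synchrony
```

In this sketch the predicted synchronization peaks where the envelope excursions straddle the steep part of the sigmoid and falls toward zero once the rate saturates. The text notes that measured functions have larger maxima, peak at higher levels, and decline more gradually, differences attributed to short-term adaptation.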
There are systematic differences in AM responses of the different SR classes of auditory nerve fibers. One descriptor commonly used to compare envelope phase-locking across cell populations is the maximal R value of the synchronization-level function (Rmax). Cells with low and medium SR tend to have higher Rmax values than cells with high SR, and this difference is particularly marked at low CFs (<5 kHz) (127, 294). However, the difference in synchronization between these different auditory nerve classes strongly depends on the synchronization metric used (33, 127, 183, 295). In contrast to earlier reports, Cooper et al. (33) concluded that fibers with high SR showed larger envelope synchronization values than low SR fibers. Their result is less of a conflict than it appears if it is taken into account that the metric used by these authors was (unnormalized) modulated rate rather than Rm, that the average discharge rate of fibers with low SR is generally lower than that of fibers with high SR (158), and that the sample of Cooper et al. is biased to high CFs (>8 kHz).
Synchronization is robust in high SR cells at low SPLs and in low and medium SR cells at mid and high SPLs (294). However, the different fiber populations reach maximal synchronization at the same level relative to rate threshold (33, 294). Low SR fibers have a larger dynamic range over which significant modulation is present (33), lending further support to the general hypothesis that these fibers are particularly important for hearing at high SPLs.
The narrow bandpass filtering by the cochlea limits the range of modulation frequencies transmitted by nerve fibers. As schematized in Figure 3C, increase of fm causes the sidebands in the stimulus spectrum to move away from fc. If fc is centered at the CF of the fiber studied, the energy in the sidebands is increasingly attenuated, resulting in a loss of modulation at the output of the peripheral filter. The response as a function of fm is usually referred to as the modulation transfer function (MTF) and again one should clearly distinguish effects on average rate (rMTF) from effects on synchronization to fm (tMTF). The rMTF is usually flat but may show some decrease in rate with increasing fm, particularly in low-SR fibers (127). In contrast, tMTFs all have a low-pass shape (guinea pig, Ref. 203; cat, Ref. 127; rat, Ref. 186; Fig. 3C). These functions are smooth and do not show any structure related to harmonic ratios, i.e., whether or not the AM components (fc and the two sidebands) are integer multiples of fm is inconsequential. The absolute bandwidth of frequency tuning curves, e.g., at 10 dB above threshold, increases with CF (59, 86, 230), and the cut-off frequency of tMTFs shows a concomitant increase with CF (Fig. 2). At very low CFs (a few hundred Hz), a tMTF cut-off frequency often cannot be determined because of the broad frequency tuning. Interestingly, for CFs above ~10 kHz, the increase in cut-off frequency is not commensurate with the increase in bandwidth of frequency tuning at these high CFs. This presumably reflects temporal filtering at the hair cell/synaptic level rather than spatial filtering at the mechanical level (86, 127). The highest modulation frequency at which significant envelope phase-locking is observed, in high-CF nerve fibers, is ~2 kHz (127, 229). A less marked feature of many tMTFs is a shallow positive slope in the low-frequency skirt (94, 127). According to Cooper et al. (33), this slope tends to become steeper at high SPLs, consistent with models that include effects of response adaptation (311).
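The link between filter bandwidth and the low-pass shape of the tMTF can be illustrated with a deliberately simplified filter. The sketch assumes a symmetric, zero-phase Gaussian magnitude response centered on CF, so that both sidebands are attenuated equally and the output modulation depth is just the input depth scaled by the sideband attenuation. Real cochlear filters are asymmetric and level dependent, and the bandwidth values below are hypothetical.

```python
import numpy as np

def gaussian_filter_gain(f, cf, bw):
    """Hypothetical symmetric filter magnitude response centered on cf (Hz)."""
    return np.exp(-0.5 * ((f - cf) / bw) ** 2)

def output_modulation_depth(fm, m, cf, bw):
    """Modulation depth after filtering an AM tone whose carrier sits at cf.

    With a symmetric zero-phase filter the sidebands at cf - fm and cf + fm are
    attenuated equally, so the depth scales with sideband gain over carrier gain.
    """
    sideband_gain = gaussian_filter_gain(cf + fm, cf, bw)
    return m * sideband_gain / gaussian_filter_gain(cf, cf, bw)

mod_freqs = np.array([10.0, 50.0, 100.0, 200.0, 400.0, 800.0])
narrow = output_modulation_depth(mod_freqs, 1.0, cf=1000.0, bw=100.0)   # low-CF-like filter
wide = output_modulation_depth(mod_freqs, 1.0, cf=8000.0, bw=600.0)     # high-CF-like filter
```

Both curves are low-pass, and the wider filter passes higher modulation frequencies, in line with the increase of tMTF cut-off with CF. The sketch does not include the hair cell and synaptic low-pass stage that the text invokes for CFs above ~10 kHz.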
Clearly, the extent of envelope phase-locking in the auditory nerve is sufficiently wide to encompass psychophysical existence regions (Fig. 2). Javel and Mott (115) attributed the disappearance of residue pitch at fc >5 kHz to increased sharpness of tuning of high-CF fibers (59, 230). However, while bandwidth limitations may contribute to the upper fm limit of ~800 Hz, they do not explain the disappearance of residue pitch altogether.
The dependence of envelope phase-locking on carrier frequency, relative to CF, has not been explored in great detail (114, 127, 295). It merits further study because the available data suggest an important effect. If fc is moved away from CF, the synchronization-level function shifts to higher SPLs. Consequently, for moderate to loud stimuli, strongest phase-locking is present in fibers with CFs that differ from fc, provided that the stimulus is able to excite these fibers (Fig. 3D). Thus, for all but the weakest signals, the representation of stimulus envelope may be carried mainly by fibers tuned to frequencies that differ from fc.
Few studies reported phase or latency data for AM stimuli. For a given fiber, the phase of response to the envelope shows a slight lead with increasing SPL (127) and, at fixed suprathreshold levels, varies little with changes in carrier frequency (122). In contrast, response envelope phase increases nearly linearly with fm. The slope of this relationship has been used as an estimate of the total delay accrued between the acoustic stimulus and the site of recording, similar to earlier such measurements on responses to pure tones in low-CF fibers (4). The linearity of the phase-fm relationship indicates that it is mostly determined by fixed mechanical and neural transmission delays. Consistent with other delay or onset latency measures, the values obtained vary systematically and inversely with CF (127, 294), as expected from the travelling wave on the basilar membrane which starts at the base of the cochlea and reaches its more apically located maximum after some delay. However, many processes contribute to the total delay (242, 244). Gummer and Johnstone (93) scanned envelope delay of nerve fibers near their tuning curve threshold, using AM complexes of fixed fm and low modulation depth over a large range of carrier frequencies. They found a delay component that was large for carrier frequencies near CF and smaller in the tuning curve tail, and the authors provide several arguments to suggest that this component reflects a delay associated with cochlear bandpass filtering.
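The delay estimate described above amounts to fitting a straight line to response phase as a function of fm: if phase lag is expressed in cycles and fm in Hz, the slope has units of seconds. A minimal sketch follows, using synthetic phase values and an ordinary least-squares fit; the function name and the data are hypothetical and are not taken from any of the cited studies.

```python
def envelope_delay_s(fm_hz, phase_lag_cycles):
    """Least-squares slope of unwrapped phase lag (cycles) versus fm (Hz).

    A pure delay d produces a phase lag of d * fm cycles, so the slope of
    the fitted line is an estimate of d in seconds.
    """
    n = len(fm_hz)
    mean_f = sum(fm_hz) / n
    mean_p = sum(phase_lag_cycles) / n
    num = sum((f - mean_f) * (p - mean_p)
              for f, p in zip(fm_hz, phase_lag_cycles))
    den = sum((f - mean_f) ** 2 for f in fm_hz)
    return num / den

# Synthetic example: a 4-ms delay plus a constant phase offset of 0.1 cycle.
fm_values = [50.0, 100.0, 200.0, 400.0]
true_delay_s = 0.004
phases = [0.1 + true_delay_s * f for f in fm_values]
estimated_delay_s = envelope_delay_s(fm_values, phases)
```

The constant offset does not affect the slope, which is why the method isolates fixed delays. Real phase data are measured modulo one cycle and must be unwrapped before fitting; that step is omitted here.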
The preceding descriptions are based on synchronization of the response to the envelope frequency. Again, it is important to bear in mind that such descriptions are incomplete. The shape of cycle histograms can depart severely from the shape (usually sinusoidal) of the stimulus envelope, particularly at high SPLs and at large modulation depths. Therefore, the spectrum of the cycle histogram typically consists of a number of spectral peaks, of which the peak at fm is only one, and not necessarily the largest, component (135, 294). Also, the most salient temporal information present in the discharge patterns is not necessarily revealed by calculation of synchronization to stimulus components. For example, robust phase-locking to fm does not imply that the most common interspike intervals are at the period of fm: for envelope periods of several tens of milliseconds multiple spikes occur per envelope cycle, while periods shorter than a few milliseconds succeed each other too fast to allow a spike in every envelope cycle. An interesting discrepancy between envelope phase-locking and dominant interspike intervals is in "pitch-shift" effects of changes in fc (27, 114): phase-locking to fm stays roughly constant, while the most dominant interspike interval shifts in a direction which parallels the subjective pitch of the AM stimulus.
In summary, envelope information is abundantly available in auditory nerve discharges in temporal form. Each nerve fiber transmits envelope information over a stereotypical range of modulation frequencies, carrier frequencies, and intensities. These ranges are consistent, at least at a qualitative level, with known auditory nerve properties of frequency tuning, compression, adaptation, and spontaneous activity, and computer models incorporating these properties reproduce the main features of AM responses (105, 117, 271). The main way in which the auditory nerve is a bottleneck to the central nervous system for AM signals is in the extent of modulation frequencies over which synchronization occurs. This range cannot be enlarged centrally, except possibly for frequencies at which fine-structure information is available (<4–5 kHz), because AM arises from a time-domain interaction of stimulus components.
| V. COCHLEAR NUCLEUS: PARALLEL CHANNELS |
|---|
A. Basic Organization of the CN
An important insight that emerged from study of the CN with simple stimuli was that a limited number of response patterns or "classes" could be discerned and that these patterns are related to morphological cell classes (18, 202). Especially through the technique of intracellular labeling, many of the structure-function relationships that were surmised earlier on the basis of indirect evidence were solidified. The physiological diversity of these different cell types, combined with the diversity of their central projections (297), led to the concept of functionally specialized, parallel pathways (for review, see Refs. 26, 69, 112, 227, 319).
Briefly, three subnuclei are defined on the basis of the bifurcation pattern of the auditory nerve. The anteroventral cochlear nucleus (AVCN) has three principal cell types. Stellate cells project to the inferior colliculus (IC) and respond to tones with a burst of regularly spaced action potentials called a "chopper" pattern. Bushy cells, which derive their name from their small and confined dendritic tree and which are remarkable for their strong inputs from the auditory nerve, occur in two types. Spherical bushy cells receive large calyceal auditory nerve terminals (end bulbs of Held) and show responses similar to auditory nerve fibers and are therefore called "primary-like" (PL). Their main projection is to binaural nuclei in the superior olivary complex. Globular bushy cells also receive large nerve terminals in the form of modified end bulbs of Held, and show a characteristic "primary-like-with-notch" (PLN) pattern in response to tones. Their main projection is contralaterally in the superior olivary complex where they give rise to giant calyceal endings on cells in the medial nucleus of the trapezoid body, which are inhibitory on binaural cells in the lateral superior olive (LSO). The posteroventral cochlear nucleus (PVCN) contains octopus cells that project to the ventral nucleus of the lateral lemniscus (VNLL) and show pure onset (Oi) responses to tones. It also contains inhibitory multipolar cells that project to the dorsal cochlear nucleus (DCN) and the contralateral CN and which show onset-chopper (Oc) responses. The principal neurons of the DCN are the fusiform cells, which project to the IC and display remarkably nonlinear spectral properties. These properties arise through local inhibitory interactions with interneurons in DCN (type II cells) and presumably with the Oc cells (195).
The classification of CN cells is mostly based on subjective criteria, which contributes to discrepancies in conclusions of different studies. Although there is by no means an agreed upon "task" for each of these circuits, it is clear that each cell type performs a different analysis of the auditory nerve input and conveys its output to a different part of the auditory brain stem. The bushy cells are clearly involved in binaural analysis important for spatial localization of sounds. Stellate cells are able to represent vowel spectrum over a wide range of intensities. Fusiform cells integrate somatosensory and spectral information and may signal important auditory events. Responses to AM offer another illustration of how CN cell types differ in their processing of auditory nerve input.
B. AM Responses of Neuronal Types in the CN
The relationship between AM coding and physiological cell class, as defined by the response to pure tones, was first examined by Frisina and co-workers in the gerbil (70, 71). These authors found that envelope phase-locking in ventral cochlear nucleus (VCN) was generally enhanced relative to the auditory nerve, and they described a hierarchy of enhancement that correlated with the precision of timing of response onset to pure tones. Of the four physiological VCN cell types studied, cells with well-timed onset responses showed the highest gains, followed by choppers, PLN, and PL. The decrease in synchronization with increasing intensity is less than in the auditory nerve and in some cell types depends on fm, resulting in a peaked or tuned tMTF at high SPLs. Particularly these latter two response features, extended dynamic range and selectivity to fm, received much attention in later studies (Fig. 4). The general behavior of synchronization as a function of SPL and fm described by Frisina et al. (71) was confirmed and extended to other cell types in many subsequent studies, even though not all studies agree on the exact hierarchical ordering and the discreteness of the ordering.
Some of the most interesting responses were observed in cells with chopper responses. Choppers are temporally tuned for fm, as reflected in bandpass tMTFs particularly at higher SPLs (gerbil, Ref. 71; cat, Ref. 229). A small percentage of choppers also shows bandpass tuning in their rMTFs (228). The fm causing the strongest synchronization is called the temporal best modulation frequency (tBMF). The occurrence of bandpass tuning is of obvious importance to the concept of a "modulation frequency filter bank" or "modulation channels" (131). This concept has some popularity, particularly in the psychophysical literature (see sect. II), and will be taken up again in our discussion of IC and auditory cortex.
As mentioned, "chopping" reflects the intrinsic tendency to fire a regular burst of spikes at the beginning or sometimes entire duration of the stimulus, and these cells have therefore been viewed as resonators or intrinsic oscillators (150). SPL-dependent bandpass tuning and oscillatory responses were also described earlier by Møller (187) in the rat. In a subclass of cells in the guinea pig, the intrinsic behavior is invariant with SPL and affects the temporal characteristics of the response to nondeterministic stimuli (301). There is a possibility that the intrinsic properties make these cells function as envelope filters that decompose the envelope spectrum, much in the way that inner hair cells in the turtle cochlea decompose stimulus frequency by virtue of an intrinsic electrical resonance mechanism (63). Several authors have therefore looked for correlations between AM and intrinsic oscillation behavior. Frisina et al. (71) compared the frequency of chopping with the tBMF for a sample of sustained choppers in VCN. The tBMFs spanned a range (170–700 Hz) roughly similar to the range of chopping frequencies (80–520 Hz), but the correlation between the two response properties was poor. There was a suggestion of interaction between chopping frequency and fm in that the tBMF only rarely exceeded the chopping frequency, which therefore seemed to set an upper bound. In a subpopulation of choppers (sustained choppers with a well-defined tBMF between 150 and 450 Hz), Rhode and Greenberg (229) noted a tendency for maximal envelope synchronization when fm matched the discharge rate to a tone at the same intensity.
A strong and more general relationship, not restricted to choppers, was found by Kim et al. (141) in DCN/PVCN neurons of the unanesthetized decerebrate cat. In this study, the "intrinsic oscillation" frequency of a neuron was measured from the autocorrelation of its responses to pure or AM tones. Frequency of intrinsic oscillation and BMF were well correlated (r = 0.86) with regression close to the diagonal of equality, and the frequency ranges were roughly similar (50–500 Hz) to those reported for VCN choppers (71, 229). Importantly, the remarkably good correlation arose from the pooling of different cell groups, rather than from a within-population trend, complicating any AM-coding scheme based on intrinsic oscillators. At least five cell types contributed to the data, surprisingly also including auditory nerve fibers.
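To make the notion of an autocorrelation-based "intrinsic oscillation" frequency concrete, the sketch below builds an all-order interspike-interval histogram (a spike-train autocorrelogram) and takes the reciprocal of the lag with the most counts. This is only one plausible reading of such a measurement; the bin width, the maximum lag, the peak-picking rule, and the function name are assumptions and do not reproduce the procedure of the cited study.

```python
from collections import Counter

def intrinsic_oscillation_hz(spike_times_s, bin_s=0.0005, max_lag_s=0.05):
    """Estimate an oscillation frequency from a spike-train autocorrelogram.

    spike_times_s must be sorted in ascending order. All forward intervals
    up to max_lag_s are binned; the modal nonzero lag gives the period.
    """
    counts = Counter()
    for i, t_i in enumerate(spike_times_s):
        for t_j in spike_times_s[i + 1:]:
            lag = t_j - t_i
            if lag > max_lag_s:
                break  # later spikes are even further away
            counts[round(lag / bin_s)] += 1
    counts.pop(0, None)  # ignore lags that round to zero
    if not counts:
        return None
    # Most frequent lag; ties are resolved in favor of the shorter lag.
    best_bin = max(counts, key=lambda k: (counts[k], -k))
    return 1.0 / (best_bin * bin_s)

# Synthetic, perfectly regular "chopper-like" train with 5-ms intervals.
regular_train = [k * 0.005 for k in range(40)]
estimate_hz = intrinsic_oscillation_hz(regular_train)
```

For the regular train the modal lag is 5 ms, i.e., an estimate of about 200 Hz. Real discharges are irregular, so in practice the histogram would need smoothing and a criterion for deciding whether a peak is significant.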
Besides choppers, the other main constituent cell types of the AVCN are the two types of bushy cells with PL and PLN responses. As expected from their powerful auditory nerve inputs, PL and PLN cells resemble auditory nerve fibers in many regards, and indeed, their Rmax and tMTF cut-off frequency distributions at different CFs largely overlap that of the auditory nerve (129, 229). For PL cells this overlap is virtually complete, but for CFs below ~7 kHz, PLN cells synchronize much better to envelopes than auditory nerve fibers. At very low CFs some bushy cells have enhanced synchronization to both fine-structure and envelopes (124).
Comparisons of cell types across studies illustrate that one has to be careful with simple characterizations to multi-dimensional stimuli like AM. As remarked by Rhode and Greenberg (229), a single response parameter is not sufficient to characterize envelope synchronization. The highest gains found in choppers exceed those of PL cells but are mostly at fm values below 500 Hz (129, 229) so that at higher modulation frequencies PL cells are superior to choppers in transmitting envelope information. Consequently, the hierarchy of modulation enhancement strongly depends on the range of modulation frequencies of interest and also, as pointed out earlier (see sect. IVB), on the chosen metric (266). Rather than providing an exhaustive listing of response parameters for all cell types, we emphasize here the properties by which different CN cells stand out most from the auditory nerve and from each other. For chopper cells this is the bandpass tuning of tMTFs; for bushy cells it is the extent of the tMTF (high cut-off frequencies).
The two main response types found in PVCN are both onset types (Oi and Oc), associated with octopus and multipolar morphology, respectively. Both cell types show remarkable envelope phase-locking, in line with the precision of their onset response to pure tones. Oc cells have been particularly well-studied (cat, Refs. 125, 140, 228, 229). These cells show some of the highest gains, over the widest fm and SPL range, which is why Kim et al. (140) proposed that these cells have a special role in the extraction of the fundamental frequency of voiced speech sounds. Moreover, large changes in fc and even use of a wideband carrier have little effect on magnitude of synchronization (228). Oi cells have been studied very little, but the few existing data reveal interesting properties, in line with their biophysical specializations (199). These cells show the highest gains of all CN cells, reaching Rm values near 1 (228). Moreover, their tMTFs are high in gain and invariant for SPL, but all-pass. The rMTFs of these two classes of onset cells also appear unique among CN cell classes because they can be sharply bandpass. It is unclear whether these bandpass rMTFs can sustain a rate code for modulation frequency: among the handful of Oi cells reported, the range of rBMFs was only 350–450 Hz.
Onset units have wider frequency tuning than auditory nerve fibers (80, 118, 231). They therefore provide a test case of the suggestion that is sometimes made that tMTF bandwidths may broaden centrally by virtue of convergence of cells tuned to different CFs (180, 286). However, this would require phase information on the individual spectral components of the AM stimulus, and for frequencies above the pure-tone phase-locking range (>4–5 kHz in cat), such information is not available to the central processor. Indeed, despite their wider frequency tuning, tMTF cut-off frequencies of onset cells do not exceed the limits imposed by the auditory nerve (125, 228, 229).
The DCN has traditionally been regarded as a part of the CN which has poor timing properties (79, 82, 154), and initial studies with AM seemed consistent with that view (horseshoe bat, Ref. 282; kangaroo rat, Ref. 29). However, more recent studies emphasized good AM coding in DCN (cat, Refs. 125, 229, 254; guinea pig, Refs. 322, 323) and specific roles for DCN in temporal processing have been proposed [pitch (150); extraction of envelopes in background noise (73) or at high SPLs (229)]. The tMTFs are typically low-pass or bandpass and differ from other CN cell types in their upper fm limit of phase-locking, which never exceeds 800 Hz. To some extent, differences between studies reflect the complexity of this nucleus, both in diversity of response types and in nonlinearity of behavior (319). First, Oc cells can be found in deep DCN and may explain some of the high-gain responses to AM reported for DCN. Second, simple measures like maximum synchronization or cut-off frequency do not reveal the full complexity of DCN responses and give DCN a misleading "AVCN-like" appearance. Even though DCN interneurons and principal neurons can display high-gain responses to AM stimuli, their response often shows strong nonmonotonicities, not only in average rate but also in magnitude and phase of envelope synchronization (125, 254, 322). These nonmonotonicities are likely a manifestation in the temporal domain of the intricate inhibitory and excitatory interactions that have been invoked to explain similar complexities in the frequency domain.
A preliminary study by Frisina et al. (73) in the chinchilla suggests that envelope synchronization of DCN neurons can be enhanced by background noise, but more systematic data and comparisons with auditory nerve and VCN are needed to evaluate whether DCN neurons are special in this regard. Rhode and Greenberg (229) studied envelope synchronization in the presence of wide-band noise in different CN cell types of the cat and found that in general there is remarkable preservation of envelope synchronization even at high noise levels.
As in the auditory nerve, few authors have systematically reported envelope phase data. Cells in the CN also show a linear increase in envelope phase with increasing fm, but the slopes are systematically steeper than in the auditory nerve, consistent with additional time delays required for conduction and synaptic transmission (125, 129). Delays calculated from response envelope phase are more tightly distributed and shorter than traditional measures of latency based on response onset (94, 185), as is the case for delay estimates based on fine-structure (65). Most CN studies of AM coding considered only tMTF magnitude and not phase when trying to infer functional consequences of AM tuning for the perception of natural stimuli. Delgutte et al. (40) used both tMTF magnitude and phase of responses in auditory nerve, CN, and IC to predict responses of the same neurons to speech utterances (see below) and stressed the importance of incorporating phase, particularly at very low modulation frequencies, to make successful predictions.
To summarize, the CN shows marked differences in AM coding relative to its auditory nerve input: wider dynamic ranges, higher gains, appearance of bandpass tMTFs, and less sensitivity to the presence of background noise. Furthermore, different cell types show marked diversity in their synchronization and average rate behavior to AM signals. A simple hierarchical ranking does not do justice to the differences among cell types and depends on whether one emphasizes Rmax values (71, 295), breadth of the tMTF (129), or statistical reliability of phase-locking (266). As in the nerve, AM coding is almost entirely temporal: bandpass rMTFs occur rarely, in a few cell classes.
Our knowledge of CN responses to AM is still lacking in many ways and basically does not go far beyond phenomenology. Perhaps the most pressing question is the robustness and relevance of bandpass tMTFs, which many investigators regard as genuine envelope filters. More studies are needed to determine how invariant tMTF tuning is with stimulus parameters, what range of tBMFs is spanned at different CFs, and whether tMTF tuning indeed supports filtering of envelope energy in natural stimuli. Such information would be particularly valuable for carrier frequencies in the range of phase-locking to fine-structure (<4–5 kHz), which is poorly sampled in most studies in small animal species with higher-frequency hearing than humans. There are other lacunae. Data are sparse for certain cell types, most notably pure onset units in PVCN. In most studies, the stimulus is optimized for the cell under study; there is a need for population studies in which the response to a limited set of stimuli is examined for an entire population. Finally, there is currently no evidence for any kind of within-class topographic organization (e.g., within an isofrequency strip) of AM response properties in the CN.
| VI. SUPERIOR OLIVARY COMPLEX: AN EXAMPLE OF TIME-TO-RATE CONVERSION |
|---|
The duplex theory of sound localization holds that the azimuthal spatial position of low-frequency signals is determined primarily on the basis of the minute differences in time at which the acoustic waveform reaches the two ears, interaural time differences (ITDs), while high-frequency signals are localized on the basis of interaural SPL or level differences (ILDs). This classical psychophysical theory seems to be embodied anatomically and physiologically in two binaural circuits in the SOC of most mammals. The circuit centered on the medial superior olive (MSO) detects ITDs and contains primarily low-frequency cells. Another circuit, centered on the lateral superior olive (LSO), detects ILDs and has a bias towards high CFs. The detailed physiology of these circuits and their afferents is beyond the scope of this review (see Refs. 279, 312, 316).
Starting in the mid-1970s, a number of investigators reported that humans can reliably discriminate ITDs of high-frequency signals at thresholds approaching those for low-frequency signals, i.e., <20 µs, provided that the signals are not pure tones but have a time-varying envelope, as in AM sounds with the parameters illustrated in Figure 2. Clearly, subjects can detect with high precision the ongoing envelope differences that occur when complex stimuli are delayed between the two ears. Physiological studies in the IC of cat (317) and rabbit (12) provided evidence for ITD sensitivity to AM signals but indicated that this sensitivity was probably generated at a lower level. Subsequent recordings in the SOC indeed revealed cells that were sensitive to interaural delays of AM signals, and this ITD sensitivity could be understood from the binaural interactions known to occur in these nuclei and the AM coding properties of their afferents.
In the MSO, ITD sensitivity to AM signals is generated by a multiplicative, cross-correlation type operation. These cells behave as coincidence detectors, which has been particularly well-documented for low-frequency signals (81, 126, 313) but holds for modulated signals as well. The average firing rate of high-CF MSO cells to AM signals varies with ITD (Fig. 5A). Moreover, the optimal ITD is predicted from the phases measured from the monaural response to an ipsi- or contralaterally presented AM signal: the firing rate is high when the envelope signals from the two ears arrive in-phase at the site of convergence (10, 122, 313).
In the LSO, ITD sensitivity to AM signals is generated by a subtractive rather than a multiplicative process (Fig. 5B). These cells have ILD sensitivity by virtue of excitatory signals from the ipsilateral ear and inhibitory ones from the contralateral ear. Again bushy cells constitute both contra- and ipsilateral pathways. For ITDs at which the inhibitory and excitatory phase-locked signals reach the LSO cell coincidently, the signals cancel each other and the cell remains silent. At other ITDs cancellation is not perfect and the excitatory ear is now able to drive the cell. Thus the ILD sensitivity of the LSO cell combined with the envelope phase-locking in its afferents generates overall changes in discharge rate with ITD (10, 11, 122, 128, 129). Interestingly, in anesthetized cats LSO neurons show a "chopper" pattern to ipsilateral tone bursts, but unlike choppers in the CN, they lack tuning in the tMTFs (or rMTFs) to ipsilateral stimulation (129).
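The contrast between the multiplicative (MSO-like) and subtractive (LSO-like) operations can be illustrated with a toy calculation. In this sketch, which is an assumption-laden caricature rather than a model of either nucleus, each ear contributes a sinusoidal envelope-locked drive in arbitrary units, the contralateral drive is delayed by the ITD, the "MSO" output is the cycle-averaged product of the two drives, and the "LSO" output is the cycle-averaged, half-wave rectified difference (ipsilateral excitation minus contralateral inhibition). All names and parameter values are hypothetical.

```python
import math

def envelope_drive(phase_cycles, m):
    """Envelope-locked input drive (arbitrary units) at a given envelope phase."""
    return 1.0 + m * math.cos(2.0 * math.pi * phase_cycles)

def binaural_rates(itd_s, fm_hz, m=1.0, n=1000):
    """Return (mso_like, lso_like) outputs averaged over one envelope cycle."""
    mso = 0.0
    lso = 0.0
    for k in range(n):
        phase = k / n
        ipsi = envelope_drive(phase, m)
        contra = envelope_drive(phase - fm_hz * itd_s, m)  # delayed copy
        mso += ipsi * contra             # multiplicative coincidence detection
        lso += max(0.0, ipsi - contra)   # excitation minus inhibition, rectified
    return mso / n, lso / n

fm = 100.0  # Hz, so the envelope period is 10 ms
mso_in_phase, lso_in_phase = binaural_rates(0.0, fm)
mso_antiphase, lso_antiphase = binaural_rates(0.005, fm)
```

With the envelopes in phase, the product is maximal and the difference cancels completely; half a period later the product is minimal and the difference is largest. Both operations therefore convert envelope timing into a rate change with ITD, but with opposite polarity, as described in the text.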
The simple time-to-rate conversion that occurs in binaural SOC nuclei may have analogs in monaural processing, e.g., rMTFs in the SOC of the mustache bat appear to be shaped by monaural excitatory and inhibitory interactions and delays similar to the binaural interactions described in cat and rabbit (91). The envelope ITD sensitivity in MSO and LSO also illustrates the general point that it is probably beneficial for a time-to-rate conversion (or more generally a recoding of a stimulus-locked temporal code into another form) to occur at a peripheral neural level. Indeed, the upper frequency limit (though not necessarily the gain) of phase-locking tends to decrease with subsequent integrative stages so that a de novo comparison of monaural phases by neurons at a higher level in the neuraxis would yield a more restricted ITD sensitivity. The frequency and modulation frequency range over which ITD sensitivity occurs in the IC and higher levels is comparable to that in the SOC but is rate-based (66, 219). For example, the ITD sensitivity of high-frequency cells in the IC extends to modulation frequencies to which the cells no longer phase-lock when tested monaurally [on average 600 Hz binaurally vs. 250 Hz monaurally (12), see also sect. VIII, B and H]. Also, envelope phase-locking in the monaural inputs to the LSO extends to modulation frequencies more than an octave higher than the highest fm at which LSO neurons show ITD sensitivity (~800 Hz) (129). The use of temporal information may thus be one evolutionary reason for the extensive subcortical processing in the auditory system relative to the other sensory systems.
Little is known about envelope sensitivity in other nuclei of the SOC. Olivocochlear efferent neurons in the guinea pig are surprisingly well phase-locked to AM signals below ~400 Hz (94), with bandpass tMTFs peaking at ~100 Hz. Maximal gains were ~8 dB higher than for auditory nerve fibers recorded in the same experiments. It is not known whether modulation differentially affects the targets of the medial olivocochlear neurons (the cochlear outer hair cells), although AM signals have been reported to be effective signals to suppress evoked oto-acoustic emissions in humans (162).
Remarkable AM responses were described in monaural cells in the SOC of awake rabbits (145). Cells with sustained responses showed responses to AM similar in several respects to CN choppers, but an unusual class of "off" cells was inhibited during the presentation of pure tones and responded vigorously after stimulus termination. These cells were strongly driven by AM stimuli and showed high gains over a wide range of modulation frequencies, resulting in low-pass tMTFs and rMTFs. Several properties suggested that the responses were in effect a rebound from inhibition phase-locked to the stimulus envelope, a mechanism also observed in the SOC of the bat (91).
| VII. THE NUCLEI OF THE LATERAL LEMNISCUS |
|---|
| VIII. AMPLITUDE MODULATION ENCODING IN THE INFERIOR COLLICULUS: A CENTER FOR CONVERGENCE |
|---|
A. Basic Organization of the IC
The several parallel pathways that diverge in the cochlear nucleus from the common input of the cochlear nerve converge again in the IC, the principal midbrain nucleus in the auditory pathway. The IC is an obligatory processing center for most information ascending via the medial geniculate body to the auditory cortex. Anatomical investigations of the IC in several species have identified a broadly consistent arrangement of subdivisions: a central nucleus (CNIC) receiving most of the main ascending afferent input from many brain stem nuclei is surrounded dorsally, laterally, and rostrally by dorsal (DCIC) and external cortices (ECIC) (166, 200, 201). The CNIC is distinguished from the other subdivisions by its laminar organization. It is composed of two main cell types termed disc-shaped or flat cells interspersed with stellate or less-flat cells (164, 182). This cytoarchitecture gives rise in three dimensions to twisted laminae of cells and fibers (167) that constitute the substrate for the highly tonotopic frequency organization in the IC (173, 237, 252, 264). The frequency-band laminae are oriented so that neuronal best frequency increases along the dorsolateral to ventromedial axis of the nucleus. A defining feature of CNIC is the convergence of temporal, spectral, and spatial information extracted in parallel earlier in the pathway onto this laminar structure. However, the full details of how these converging inputs map onto individual neurons have yet to be elucidated, and it is not known to what extent the different strands of information are processed independently in the IC.
Besides differing from the CNIC in their cytoarchitecture, the DCIC and ECIC have different inputs and outputs. Descending projections from the cortex terminate predominantly (304), although not exclusively, in the cortical divisions (248). The IC is an important source of both ascending fibers to the thalamus and descending connections to lower brain stem structures (110).
The monaural and binaural response properties of single neurons in the IC have been extensively documented (see Refs. 24, 112, 113). Despite the limitations in our knowledge about its cellular organization, it is clear that the output of the IC is considerably modified relative to its input. This is exemplified by the response patterns of IC neurons to complex sounds including AM. For the most part, such knowledge is derived from studies in anesthetized animals that have focused on neurons recorded in response to monaural stimulation of the ear contralateral to the side of recording, and in what follows monaural stimulation should be assumed unless specified. Most of the studies discussed here describe recordings attributed to the central nucleus, but depending on the age of the study and the parcellation adopted, in many cases this will have included at least part of the DCIC and ECIC as well as the CNIC. Therefore, in this review the term IC is used to indicate all subdivisions.
B. Modulation Transfer Functions for IC Units: Synchronization
IC neurons show strongly modulated responses that for many modulation frequencies greatly exceed the modulation in the stimulus (144, 222–224). Modulation gains calculated from synchronized responses in the IC are often 15–20 dB (144, 222, 224) and so are larger than equivalent measurements obtained in the auditory nerve and for most neuron types in the CN. The shape of the tMTF depends on the parameters of the stimulus (see below) but is invariably either bandpass or low pass (144, 152, 191, 222–224).
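To give a sense of what such gain values imply, the sketch below computes synchronization as a vector strength R and converts it to a gain in decibels. It assumes the definition that is conventional in this literature, gain = 20 log10(2R/m), where m is the stimulus modulation depth expressed as a fraction; both helper functions and the example numbers are illustrative and are not taken from the cited studies.

```python
import math

def vector_strength(spike_times_s, fm_hz):
    """Synchronization of spikes to fm: length of the mean phase vector (0 to 1)."""
    n = len(spike_times_s)
    x = sum(math.cos(2.0 * math.pi * fm_hz * t) for t in spike_times_s)
    y = sum(math.sin(2.0 * math.pi * fm_hz * t) for t in spike_times_s)
    return math.hypot(x, y) / n

def modulation_gain_db(r, m):
    """Gain in dB, assuming the conventional definition 20*log10(2R/m)."""
    return 20.0 * math.log10(2.0 * r / m)

# One spike per cycle at a fixed envelope phase gives perfect synchronization.
locked_spikes = [k / 100.0 for k in range(50)]
r_locked = vector_strength(locked_spikes, 100.0)

# A response whose modulation equals that of the stimulus (2R = m) has 0 dB gain.
gain_unity = modulation_gain_db(0.5, 1.0)
# Strong synchronization to a shallowly modulated stimulus gives a large gain.
gain_shallow = modulation_gain_db(0.9, 0.2)
```

Because R cannot exceed 1, gains of 15 dB or more under this definition can only arise for stimuli with modest modulation depths (roughly m below 0.36), which is worth keeping in mind when comparing gains across studies that used different depths.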
Modulation gain may be enhanced in the IC, but modulation frequencies that elicit a synchronized response are restricted to a lower range than in the periphery. This is manifest in both the tBMFs of neurons in the IC and the range of frequencies over which there is significant modulation of the response (Fig. 9). In the rat, Rees and Møller (223) obtained a modal tBMF in the range of 100–120 Hz. The tBMF never exceeded 200 Hz, and the high-frequency cut-off of the tMTF (measured 10 dB down from the BMF) did not exceed 320 Hz. In guinea pig, tBMFs fall below 150 Hz with most peaking between 50 and 100 Hz (224). Broadly similar values have been obtained in gerbil (144) and squirrel monkey (191). In the latter, 73% of neurons showed a bandpass tMTF for AM with tBMFs between 32 and 64 Hz. In rabbit, single units and multiunit clusters had a mean tBMF of 87 Hz (12). However, it is worth noting that one unit synchronized to a modulation frequency of 925 Hz. For samples of phasic neurons in both young and old mice, tBMFs were all below 200 Hz (291). Similarly in mustache bat, the majority of units (~70%) only synchronized their firing to modulation frequencies below 300 Hz, but a small proportion (4.5%) synchronized up to 500 Hz (20). While these values are broadly similar, the differences that exist more likely reflect species differences rather than the presence or absence of anesthetic, since there is no segregation of the values consistent with anesthetic status.
Rees and Møller (223) demonstrated that the shape of the tMTF is highly dependent on stimulus level as in some cochlear nucleus neurons. When stimulus intensity is close to threshold, tMTFs are usually low-pass functions but become more bandpass as the mean intensity of the stimulus is increased. This change may be accompanied by an upward shift in the tBMF. For neurons with nonmonotonic rate-level functions, however, the tMTF becomes low pass at sound levels falling on the negatively sloping limb of the rate-level function (224). So the relationship between tMTF shape and sound level is indirect, with firing rate, perhaps reflecting the net excitatory drive to the neuron, being the better predictor of the low-frequency slope of the tMTF. Why the effect of stimulus level is only apparent at low modulation frequencies is not clear and may depend on a number of factors including adaptation. Another possibility is that the neuron's probability of firing at low stimulus intensities is only high near the peak of the modulation cycle resulting in highly synchronized firing. As intensity is increased, threshold is exceeded for a larger fraction of the modulation cycle leading to a reduction in synchronization. This effect might not be apparent at high modulation frequencies because the frequency of modulation approaches the neuron's maximum firing rate, so ultimately only a single spike occurs in each cycle giving a high degree of synchronization whose upper limit is determined by temporal resolution. Such effects become more apparent in the cochlear nucleus and IC than the auditory nerve because of the enhancing effects of time-dependent inhibition, membrane properties, and other nonlinearities in more central neurons, evidenced by their lower spike rates.
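The threshold argument above can be sketched numerically. The following Python example is a deliberately simplified model and not one taken from the cited studies: it assumes that instantaneous firing probability is the amount by which a sinusoidal envelope exceeds a fixed threshold, and it computes the resulting synchronization index for several mean levels. The function name, the threshold of 1.0, and the modulation depth of 0.5 are all illustrative assumptions.

```python
import math

def sync_for_level(level, m=0.5, threshold=1.0, n=3600):
    """Vector strength of a threshold-rectified response to a sinusoidal envelope.

    Firing probability at phase theta is max(0, level*(1 + m*cos(theta)) - threshold).
    """
    c = s = total = 0.0
    for k in range(n):
        theta = 2 * math.pi * k / n
        p = max(0.0, level * (1 + m * math.cos(theta)) - threshold)
        c += p * math.cos(theta)
        s += p * math.sin(theta)
        total += p
    return math.hypot(c, s) / total if total > 0 else 0.0

# Mean levels in arbitrary linear units relative to the assumed threshold of 1.0.
sync_by_level = {level: sync_for_level(level) for level in (0.7, 1.0, 2.0, 4.0)}
```

In this sketch, synchronization is close to 1 when only the peak of the modulation cycle exceeds threshold and declines as the mean level rises and a larger fraction of the cycle becomes suprathreshold. Adaptation, inhibition, and refractoriness are not modeled.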
Further evidence for a relationship between tMTF shape and firing rate is provided by the effect of background noise. Bandpass tMTFs become low pass with the addition of progressively higher levels of background noise (223). Rees and Palmer (224) showed that this change correlated with the noise-induced shift in the neuron's input/output function along the level axis and its consequent effect on the firing rate elicited by a stimulus.
C. Modulation Transfer Functions for IC Units: Average Rate
The most striking change in AM responses between the IC and its peripheral inputs is in the tuning of rMTFs; the dependence of average firing rate in the IC on modulation frequency is stronger, more common, and has a much wider diversity of patterns than is the case in the CN or the SOC (Fig. 6). (But it is important to note that we have only limited information about rate responses to modulation in the nuclei of the SOC and lateral lemniscus.)
rMTFs show a wider range of patterns than is usually observed for tMTFs. In the cat, Langner and Schreiner (152) identified specific patterns of rMTF in a population of single units and multiunit clusters. These included band-pass, low-pass, high-pass, band-reject, or complex types. The majority were bandpass (70% of single units, 58% of multiunits). Similar response patterns are also found in bat (32) and mouse (291). In guinea pig, 45% of rMTFs were bandpass; the remainder included a variety of different shapes, with some units showing little effect of modulation frequency on firing rate (224). Units whose average firing rate did not change with modulation frequency were the most common type encountered in squirrel monkey, making up almost half of the total (191). The most detailed study of rMTFs in the IC is that of Krishna and Semple (144) in gerbil. In addition to confirming the rMTF shapes described previously, Krishna and Semple (144) noted that many rMTFs were characterized by distinct ranges of modulation frequency over which firing rate was enhanced or suppressed. In some, regions of enhancement were separated by a marked region of suppression that defined a worst modulation frequency separating the two maxima.
Like synchronized responses, rate responses to modulation depend on the mean level of the stimulus (144, 224). Where units have bandpass rMTFs and monotonic rate level functions, the heights of the peaks in the rMTFs increase and then decrease with the average level. They are highest when measured at sound levels on the sloping portion of the rate level function and decline as the stimulus level rises into the saturating region of the rate level function (224). Across a population of neurons with peaked rMTFs, increases and decreases in BMF with level were observed (144). In units with rMTFs containing regions of suppression, the suppression often becomes more prominent as stimulus level or modulation depth is increased. In some instances, regions of firing rate enhancement changed to suppression at high stimulus levels. Krishna and Semple (144) postulate that inhibition is an important contributor to these effects.
There is general agreement across species in the modal value of the rBMF distribution in the IC. In the cat, the modal value for rBMF lies between 30 and 100 Hz (152). These values are in keeping with those reported in rat (222, 223), guinea pig (224), gerbil (144), and bat (32). In the primate, the peak of the distribution of rBMFs of multi-units was 128 Hz (191).
There is less agreement over the upper frequency limit for rBMFs. In the cat, almost 20% of multiunit clusters had rBMFs greater than 200 Hz as did ~5% of single units (152). A few units had rBMFs as high as 1,000 Hz. rBMFs of up to 800 Hz were also reported for some units in bat (32) and mouse (291). In contrast, the maximum rBMFs recorded for single units in gerbil did not exceed 300 Hz (103, 144), and in squirrel monkey, the maximum rBMF value reported was 256 Hz (191). It is quite likely that the differences between these studies reflect true species differences, with there being no such creature as the average mammal. However, other factors might be contributory. The cat data show that rBMFs >300 Hz were more prevalent in multi-unit recordings. As Langner and Schreiner (152) comment, multi-unit recordings may contain responses from the fiber inputs to the IC as well as its neurons. Given that some of these inputs originate from nuclei in which neurons synchronize to higher modulation frequencies than in the IC, their contribution could be misleading. On the other hand, units with high rBMFs may be more difficult to record as single units, and a small number of single units with high BMFs were reported. Krishna and Semple (144) suggest that misclassifying the secondary peak of enhancement as the BMF in those units with more than one rMTF peak might explain the high rBMFs reported in cat. Apart from species differences, the presence or absence of anesthesia is another factor that could account for the observed differences in the ranges of rBMFs. However, it seems unlikely that anesthesia is the only factor, since some of the largest differences are seen when comparing data from different species where no anesthetic was used [compare values above for squirrel monkey (191), bat (32), and mouse (291)]. On the other hand, similar values were obtained in some anesthetized and unanesthetized preparations, e.g., cat (152) and mouse (291). Unfortunately, definitive experiments comparing the presence and absence of anesthetic have yet to be performed.
D. What Determines the MTF Upper Limit in the IC?
Lower cut-off frequencies for both tBMF and tMTF in the IC than at more peripheral stages of the pathway are generally observed across species. The reasons for this are not clear. In the auditory nerve, filter bandwidth is one limiting factor as evidenced by the correlation between the upper limit of the response to modulation and a fiber's CF (see sect. IVB and Fig. 2). However, evidence for a similar relationship between the response to AM and CF in the IC is weak. In the cat, the upper boundary of the rBMF distribution (and presumably the tMTF distribution since rBMFs and tBMFs are reported to be similar) for multiunits increases with CF (152). But evidence of such a correlation was not apparent in single-unit data recorded in other species [rat tBMF (223), squirrel monkey (rate or synchronization not specified) (191), bat rBMFs and tBMFs (32), or gerbil (144)]. Krishna and Semple (144) examined a large data set and failed to find any correlation between CF and rBMF or between CF and the cut-off frequency of either rMTFs or tMTFs. Furthermore, the frequency bandwidths of most IC neurons are sufficiently wide to accommodate the stimulus spectrum. Thus it seems something other than frequency bandwidth is primarily responsible for setting the upper frequency limit of the response to AM in the IC.
An alternative possibility is that the shift in the response to lower modulation frequencies in the IC reflects a reduction in temporal resolution. Such a reduction is suggested by an upper frequency limit of 600 Hz for phase-locking to pure tones in the IC, a substantially lower value than pertains in auditory nerve fibers (147). The mechanisms responsible have not been identified, but intrinsic membrane properties and synaptic mechanisms are possible candidates, as is the accumulated loss of temporal resolution en route from the periphery. The contribution of synaptic processing is now being investigated, but thus far blockade of inhibitory or excitatory mechanisms has failed to show any significant influence on the upper limit of synchronization. Neurons in the IC of the mustache bat seldom responded to a wider range of modulation frequencies following the blockade of GABAA, GABAB, or glycinergic inhibition (20). This finding is in contrast to the marked increase in the upper limit of synchronization in DNLL neurons in the same species with GABAergic blockade (310). Similarly, neither blockade of N-methyl-D-aspartate (NMDA) (20, 321) nor DL-α-amino-3-hydroxy-5-methylisoxazole-propionic acid (AMPA) excitatory receptors (321) resulted in changes in the upper limit of synchronization. Similarly, in chinchilla, Caspary et al. (28) found no change in the temporal response to AM with blockade of GABAA receptors, but they did report changes selectively affecting the low-frequency limb of rMTFs in some units.
E. Is AM Encoded in the IC by Rate or Synchronization?
Whether AM is encoded in the IC by synchronization or by average firing rate remains an open question. Of course, both measures may be important either independently or combined as synchronized rate. tMTFs and rMTFs and BMFs match in many units, but in a significant percentage of neurons they are different, with, in some cases, no obvious dependence of rate on modulation frequency despite a clearly tuned tMTF (144, 152, 191, 222, 224). Population data on the MTF types obtained using synchronized or average rate measurements were reported in the cat (152). Seventy percent of units had bandpass rMTFs, and only 7% were low pass. In contrast, a much larger proportion of tMTFs showed low-pass functions (48%) compared with bandpass functions (33%), such that 60% of units with low-pass tMTFs had bandpass rMTFs.
Nevertheless, the relationship between firing rate and modulation frequency that emerges in IC might signal a transformation in the encoding of AM from a temporal to a rate-based representation, and models have been proposed explaining how this might be achieved (105, 149, 160). A common approach invokes coincidence detection in IC neurons operating on synchronized responses to modulation from stellate cells in the cochlear nucleus. Although elegantly simulating many modulation responses of neurons in the IC, current implementations match the BMFs of the IC neuron and its inputs from the cochlear nucleus despite experimental data (cf. sects. V and VII and Fig. 9) which suggest that the BMF ranges are not the same.
As this discussion has shown, synchronized responses to the modulation envelope are well maintained in the colliculus, and rMTFs are not simple reflections of tMTFs. It is premature, therefore, to conclude that temporal based encoding of the modulation envelope has no significance in the IC. Both rate and synchronized coding might be retained with different functional consequences. A rate code could allow the encoding of modulation frequencies that exceed the synchronization limit in the IC, and the data of Schreiner and Langner (251) support this conjecture as does the finding in squirrel monkey that the distribution of rMTFs peaks at a higher frequency than the distribution of tMTFs (191). On the other hand, some studies show that synchronization and rate measures extend over broadly similar ranges of modulation frequency (see sect. VIIIC).
F. Relationship Between AM Responses and Other Neuronal Properties
Possible functional relationships between response to AM and other physiological properties have not been well explored in the IC (at least partly because there is no generally accepted physiological classification scheme, as is the case for the CN). A variety of firing patterns to tones are recorded in the IC, and most authors have distinguished onset and sustained responses (see Refs. 112, 113 for review), which can be further subdivided into distinct classes (e.g., Refs. 221, 290). Such patterns depend on the state of intrinsic membrane conductances that in turn are modulated by inhibition (155, 209, 268). Both sustained and onset units can respond to continuous AM stimuli that last several seconds (144, 222, 224). Although some onset units fail to respond to AM, those that do respond do so at modulation depths well below 100%, negating the argument that the response is effectively to a series of tone bursts. It does seem that onset units are the least likely to respond to AM. In both bat and the rat, most of the units failing to respond to modulation were onset types (32, 204). Other differences in the response to AM between different unit types are also beginning to emerge. In bat, average rBMFs increased progressively when comparing the responses of tonic, chopper, and onset neurons (32). Sinex et al. (267) report differences between unit types and their responses to sinusoidal and trapezoidal AM. Krishna and Semple (144) describe rMTFs with two peaks separated by a region of suppression. These were predominantly seen in units with sustained or pauser PST histograms. Onset or onset-sustained neurons showed only a single peak of enhancement.
Another property of IC neurons correlating with the response to modulation is regularity of firing. Regular firing, as measured by calculating the coefficient of variation (320), is apparent in a number of different neuronal types (221). A preliminary report (225) shows that units with highly regular intrinsic oscillations show a strong correlation between tBMF and the oscillation frequency. On the other hand, cells with peaked rMTFs are mainly limited to neurons that fire irregularly to tones.
G. Is Modulation Frequency Represented Topographically in the IC?
Some of the responses discussed so far, in CN and IC, provide suggestive evidence for a physiological implementation of a modulation filter bank. This view would be strengthened if neurons were found to be spatially organized according to their AM tuning properties, since the creation of spatial maps is a common strategy in nervous systems. Evidence for a topographic representation of modulation frequency in the IC of cat was reported by Schreiner and Langner (251). rBMFs and tBMFs were determined for units encountered in multiple penetrations through the IC at recording sites reconstructed from the coordinates of the electrode penetration and the recording depth. The measured values, together with interpolated points, were assembled to create a map of BMF. Two patterns of rBMF organization emerged. First, a gradient of rBMF extended along the dorsoventral axis of the colliculus with CF. Measurements of rBMF along such electrode penetrations revealed a progressive increase in rBMF with depth, although the overall trend was accompanied by discontinuities and reversals of rBMF. In addition, a map of BMF extended across the plane of the frequency-band laminae. The highest BMFs were found caudally in the lateral half of the lamina. Regions representing the highest BMFs were surrounded by "quasiconcentric" iso-BMF contours representing progressively lower BMFs. The diameter of the contour representing each BMF and the upper limit of BMF increased with CF. Thus, considered in three dimensions, each modulation frequency is represented on the surface of a cone having its base located in the high-frequency region of the IC and its long axis aligned with the dorsoventrally orientated tonotopic axis of the IC (Fig. 7). Schreiner and Langner (251) propose that this map demonstrates the importance of the IC in the perception of periodicity pitch and that such a representation could facilitate the integration of periodicity information across carrier frequency. 
In support of the map, they cite the corroborative evidence that response latency is spatially mapped across the frequency band laminae in the IC (153) and that BMF is negatively correlated with response latency. This implies that there should be a mapping of BMF along the same axis as the latency map. Evidence for a mapping of modulation frequency has also been reported in a developmental study in the gerbil with responses to the highest modulation frequencies found most laterally as in the cat (103).
The publication of such a mapping of BMF has been influential in the development of theories and models of temporal processing in the auditory pathway (35–37, 105, 149). However, a correlation of BMF with location or with CF has not been confirmed in other studies; indeed, as discussed above, there is still debate about the range of modulation frequencies represented in the IC. Given the concentric organization of the modulation map described in the cat, it is unlikely that a pattern of such complexity would be found unless it were the primary objective of the study. But, as discussed in section VIIIC, the determination of BMFs from multiunit data, on which most of the mapping is based, must proceed with caution. On the other hand, it is difficult in single-unit studies to achieve the necessary sampling density that such mapping ideally requires. An additional complicating factor in this discussion is the lack of invariance of both tBMFs and rBMFs with stimulus level (144, 223). Resolution of this issue may depend on the development of techniques that enable the modulation response properties of large populations of neurons to be determined with high spatial and temporal resolution. Finally, it should be emphasized that the absence of a map would not invalidate the existence of a modulation filter bank. As an analogy, there is some evidence for a map of ITD tuning in the MSO (14, 269, 313), but a spatial organization in the IC has not been convincingly demonstrated (315). Nevertheless, the relevance of ITD tuning for binaural hearing is not in question.
H. Responses to Interaural Time Disparities in Modulation Envelopes
Human subjects can localize sounds using on-going ITDs, generated by the amplitude envelope even when the carrier frequency of the sound is above 1.5 kHz and subjects can no longer localize using interaural time differences in the carrier (see sect. VI). Physiological responses to such binaurally disparate amplitude modulations were first investigated systematically by Yin et al. (317). Firing varied cyclically as a function of ITD, at a period equal to that of fm, indicating that the neurons were responding to the interaural delay of the modulation waveform, not of the carrier. In many respects, ITD sensitivity in the IC strongly resembles that in the SOC, e.g., it reflects the same two basic forms of interaction (see sect. VI and Fig. 5). There are also differences, indicating an elaboration of response properties between SOC and IC, but these are outside the scope of this review (13, 66, 172).
The width of ITD tuning to sinusoidal signals is basically determined by the period of the stimulus. Low frequencies are weighted more heavily in responses based on envelope than in those based on fine structure, because envelope MTFs of IC cells typically extend further to low frequencies than their tuning to fine structure. ITD tuning therefore is typically broader to AM signals than to tones. However, even at high CFs, where phase-locking to fine structure is completely lacking, the ITD tuning to broadband noise can be surprisingly sharp (123). The presence of such tuning, in the absence of any ITD sensitivity to pure tones, indicates that envelope fluctuations generated by the interaction of the cochlear bandpass filters with the broadband stimulus can effectively be used in the computation of ITDs.
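The cyclic dependence of firing on envelope ITD, and the link between tuning width and modulation period, can be illustrated with a highly idealized coincidence model. The Python sketch below is an assumption for illustration only and is not the analysis used in the cited studies: it takes the binaural output to be the time-averaged product of two sinusoidal envelopes, one delayed by the ITD. The function name and parameter values are hypothetical.

```python
import math

def envelope_coincidence(itd_s, fm_hz, m=1.0, n=1000):
    """Time-averaged product of two sinusoidal envelopes separated by itd_s.

    Each envelope is 1 + m*cos(phase); the average is taken over one modulation cycle.
    """
    phase_shift = 2 * math.pi * fm_hz * itd_s
    total = 0.0
    for k in range(n):
        theta = 2 * math.pi * k / n
        total += (1 + m * math.cos(theta)) * (1 + m * math.cos(theta - phase_shift))
    return total / n

fm_hz = 100.0                      # illustrative modulation frequency
period_s = 1.0 / fm_hz
at_zero = envelope_coincidence(0.0, fm_hz)
at_half_period = envelope_coincidence(period_s / 2, fm_hz)
at_full_period = envelope_coincidence(period_s, fm_hz)
```

In this model the output equals 1 + (m²/2)cos(2π·fm·ITD), so it repeats with a period of 1/fm regardless of the carrier, and the ITD function broadens as fm is lowered.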
I. Contribution of Nonlinearities
For all but the lowest modulation depths, the response to a sinusoidal AM in the IC is not sinusoidal but more peaked with firing restricted to only part of the modulation cycle (144, 196, 222). As modulation depth is increased, changes also occur in the phase of the response histograms relative to the stimulus (144, 196, 222). Such changes are consistent with the response following the amplitude envelope at low modulation depths but changing to one which is sensitive to the rate of amplitude change at high depths. Sometimes this is associated with the appearance of a smaller second peak in the histogram indicative of a response to the downward amplitude change in the modulation cycle (222). Direct evidence for such responses comes from experiments using modulations with exponential envelopes (215).
Similarly, asymmetries have been reported in both the rate and temporal responses of IC neurons in guinea pig to exponentially ramped and damped sinusoids (197). When such ramped and damped stimuli have the same half-life, their long-term spectra are identical, but their different temporal structures generate quite distinct percepts (205). The percentage of units showing asymmetry in the magnitude of their temporal or rate responses to these stimuli is greater than obtained using similar analyses in the VCN (216), and the proportion of neurons showing response asymmetry at each stimulus half-life closely matched human psychophysical performance (205).
A few studies have investigated nonlinearities in the responses of IC neurons to AM using more complex modulation waveforms. Møller and Rees (189) recorded spike histograms synchronized to the period of a pseudorandom noise used to modulate a tone carrier. Cross-correlation of the pseudorandom noise with the histogram to obtain the impulse response followed by Fourier transformation generates the tMTF. This estimate of the linear component of the response correlates well with responses obtained using sinusoidal modulation. An estimate of the nonlinear component can be obtained by using the impulse response to model the neuron, with the difference between the neuronal and model outputs providing a measure of the nonlinearities present in the neuronal response. The nonlinearities were predominantly even order, perhaps representing asymmetry in the response to increasing and decreasing sound intensity. Application of this technique to the owl IC similarly demonstrated the presence of significant nonlinearity (133). Such nonlinearities are more prominent in the response of IC neurons than those in the cochlear nucleus (184, 188).
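The cross-correlation procedure described above can be sketched as follows. This Python example is a simplified illustration and not the implementation used in the cited work: the "neuron" is a hypothetical, purely linear filter, the binary pseudorandom modulator, the 1,000-Hz sampling rate, and all names are assumptions, and a noiseless filter output stands in for the period histogram.

```python
import cmath
import math
import random

def estimate_impulse_response(x, y, n_lags):
    """Circular cross-correlation of input x with output y for lags 0..n_lags-1.

    For a white, unit-variance input this approximates the impulse response
    of the linear part of the system.
    """
    n = len(x)
    return [sum(x[i] * y[(i + k) % n] for i in range(n)) / n
            for k in range(n_lags)]

def mtf_magnitude(h, freq_hz, fs_hz):
    """Magnitude of the Fourier transform of impulse response h at freq_hz."""
    return abs(sum(hk * cmath.exp(-2j * math.pi * freq_hz * k / fs_hz)
                   for k, hk in enumerate(h)))

# Hypothetical linear "neuron" with a bandpass impulse response (sums to zero).
true_h = [0.0, 0.6, 1.0, 0.4, -0.5, -0.8, -0.5, -0.2]
fs_hz = 1000.0                        # assumed sampling rate
rng = random.Random(0)
n = 4096
modulator = [rng.choice((-1.0, 1.0)) for _ in range(n)]    # pseudorandom noise
response = [sum(true_h[k] * modulator[(i - k) % n] for k in range(len(true_h)))
            for i in range(n)]        # stands in for the period histogram

h_est = estimate_impulse_response(modulator, response, len(true_h))
mtf = {f: mtf_magnitude(h_est, f, fs_hz) for f in (0.0, 62.5, 125.0, 250.0)}
```

Because the simulated system is linear, the estimated impulse response reproduces the true one apart from sampling error. For a real neuron, the residual between the measured histogram and the output predicted from `h_est` would provide the measure of nonlinearity described in the text.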
The AM stimulus that ultimately holds the greatest interest for auditory neuroscience is human speech. Delgutte et al. (40) compared the encoding of modulated noise and a speech utterance at the levels of the auditory nerve, CN, and IC. Step responses derived from the responses to modulation indicate that responses to amplitude changes in the IC are more phasic than those in the auditory nerve and, to a lesser extent, the CN. This was borne out by the responses to speech sounds that were characterized by bursts of activity at the onsets of syllables. When the responses to the speech waveform were estimated with the linear component of the modulation, the model accurately predicted the neural response for neurons in the auditory nerve and cochlear nucleus, but the match for the IC was poor.
Although much less abundant than reports using sinusoidal modulation, these studies indicate that the emergence of nonlinear responses to modulated stimuli is a defining characteristic of processing in the IC, and the greater application of such nonsinusoidal AM stimuli is likely to add substantially to our knowledge of nonlinear mechanisms in the IC.
IX. AMPLITUDE MODULATION ENCODING IN AUDITORY THALAMUS AND CEREBRAL CORTEX
A. Basic Layout of the Thalamocortical System
The medial geniculate body (MGB) of the thalamus is an obligatory station for auditory information from the midbrain to the cerebral cortex. Based on cytoarchitecture, connectivities, and physiological response properties, three main thalamic regions can be defined (304). Similarly, auditory cortex consists of several distinct fields that can be grouped into core, belt, and parabelt regions according to connectivity and physiology (130, 218). We discuss the projection systems set up in the thalamus and their relationship with the parcellation of auditory cortex.
The ventral division of the MGB (MGBv) is considered the principal part and is functionally distinguished by a clear tonotopy that is related to its laminar dendritic organization. The MGBv is functionally homogeneous with sharp frequency selectivity, short latencies, and low response thresholds. Several properties, such as the density of inhibitory interneurons, sharpness of tuning, onset latency, and strength of pure-tone phase-locking, vary systematically along the anterior-posterior axis, i.e., orthogonal to the frequency gradient (236). The axons from the ventral division terminate predominantly in tonotopically organized "core" areas of auditory cortex, specifically the primary auditory cortex (AI) as well as the anterior and posterior auditory fields (AAF and PAF, respectively) in the cat and field R in the macaque monkey. The projections from MGBv also reflect the anterior-posterior gradients so that, for example, AAF in the cat receives stronger input from the anterior pole, whereas PAF and the ventroposterior auditory field (VPAF) are chiefly connected with the posterior pole. The same holds for the numerous corticothalamic feedback projections from the cortical core regions to the MGBv.
Two further projection systems parallel to the tonotopic system have been identified. One "diffuse" or nontonotopic system is routed through the dorsal division of the MGB (MGBd). MGBd and its subdivisions are characterized by broad tuning, weak responses to tones, and some preference for more complex sounds. The dominant neurons are stellate cells, and the cortical projection is predominantly to nontonotopical fields in the belt and parabelt regions of auditory cortex such as the second auditory field (AII) in cat and CM in the macaque monkey. The third projection system is associated with the medial division of the MGB (MGBm). This "magnocellular" area is characterized by fairly large multipolar cells and receives polysensory inputs. No clear tonotopic organization is evident, and the neurons are usually broadly tuned or have multiple response areas. MGBm projects to a wide range of cortical fields including areas in the core, belt, and parabelt regions, and it also receives widespread corticothalamic feedback. In addition, the dorsal and medial projection systems are distinguished by their termination predominantly in layers I and VI, while inputs from the main tonotopic system end in layers IV and III.
Functional differences between the three projection systems and their associated regions have been mainly explored using spectral properties, such as frequency and intensity. Again, the importance of temporal dimensions in the perception of complex sounds suggests that much can be gained from the study of temporal response features in the different parts of auditory thalamus and cortex (31, 101, 210).
B. Temporal Responses in the MGB
Relatively few studies have addressed the capability of thalamic neurons to encode temporal information. A study of thalamic neurons in the awake guinea pig (34) revealed that some neurons phase-lock to AM tones with modulation frequencies up to 200 Hz. A more systematic study in the awake squirrel monkey (217) showed that most tMTFs were bandpass with tBMFs between 2 and 128 Hz. The most commonly encountered tBMF was at 32 Hz. MGBm had a higher median tBMF (16 Hz) than MGBv (8 Hz). Over the range of modulation frequencies tested, no significant difference was observed between rBMFs and tBMFs. This suggests that AM coding in the thalamus, at least below ~100 Hz, is mostly conveyed by a temporal code accompanied by rate changes due to the phasic nature of the responses. To date, there is little information available that directly contributes to the question of the increasing prominence of rate-coding in the more central auditory stations.
Changes in modulation depth affect rate and synchronization differently; synchronization increased with increase in m, while the firing rate showed a nonmonotonic dependence. Changes in overall intensity of the AM signal resulted in either monotonic or nonmonotonic changes in firing rate and synchronization, with a higher percentage of nonmonotonic changes in synchronization.
Recently, a number of studies in a variety of structures have utilized complex auditory spectra to estimate the spectrotemporal receptive field (STRF) of neurons using reverse correlation methods (e.g., Refs. 2, 38, 56, 58, 142, 143, 148). The STRF can be interpreted as the average signal preceding an action potential, corresponding to the spectrotemporal impulse response of the neuron. STRF estimates of temporal resolution can be directly related to estimates using isolated AM sounds and would yield the same result in a linear system. Additionally, the use of complex spectra can reveal nonlinearities such as the dependence of the estimated filter shape on spectral and temporal depth of modulation and overall intensity. A recent analysis of temporal filter properties derived from STRFs in MGBv of ketamine-anesthetized cats (175) (Fig. 8) revealed a similar range of tBMFs (35 ± 30 Hz) to that observed in the awake guinea pig and squirrel monkey (34, 217). As seen with isolated AM signals, individual neurons could follow modulation frequencies above 100 Hz. Compared with AM responses in the IC, it appears that the overall range of temporal following capacity in the auditory thalamus is considerably reduced (Fig. 9).
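The reverse correlation idea behind the STRF can be sketched as a spike-triggered average. The Python example below is a toy illustration under stated assumptions, not the method of any cited study: the stimulus is Gaussian noise in four hypothetical frequency channels, and the model neuron fires whenever one channel exceeded a fixed value three time bins earlier. All names and parameter values are illustrative.

```python
import random

def spike_triggered_average(stimulus, spike_bins, n_lags):
    """Average stimulus preceding each spike.

    stimulus: list of frequency channels, each a list of values per time bin.
    Returns sta[channel][lag], where lag is the number of bins before the spike.
    """
    usable = [t for t in spike_bins if t >= n_lags - 1]
    sta = [[0.0] * n_lags for _ in stimulus]
    for ch, row in enumerate(stimulus):
        for lag in range(n_lags):
            sta[ch][lag] = sum(row[t - lag] for t in usable) / len(usable)
    return sta

rng = random.Random(1)
n_channels, n_bins, n_lags = 4, 5000, 6
stimulus = [[rng.gauss(0.0, 1.0) for _ in range(n_bins)] for _ in range(n_channels)]
# Hypothetical neuron: fires when channel 2 exceeded 1.0 three bins earlier.
spikes = [t for t in range(3, n_bins) if stimulus[2][t - 3] > 1.0]
sta = spike_triggered_average(stimulus, spikes, n_lags)
peak = max(((ch, lag) for ch in range(n_channels) for lag in range(n_lags)),
           key=lambda cl: sta[cl[0]][cl[1]])
```

The average recovers the channel and latency that drive the model neuron. The temporal profile of such an estimate is what is Fourier transformed to obtain a temporal modulation filter, with the caveat noted in the text that the equivalence with AM-derived measures holds strictly only for a linear system.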
A number of studies that have explored the coding of click trains in the auditory thalamus contribute significantly to our knowledge of temporal coding in the MGB. Changes in fm of an AM stimulus result in the systematic change of two potentially confounding aspects of the stimulus, namely, a change in the period between events and a change in the rise time of each event. To avoid the effects of rise-time changes with repetition rate, click trains have been widely used to explore temporal coding properties. While these two methods are not totally equivalent, they do capture closely related aspects of repetition rate coding. One of the first studies of temporal coding in the thalamus was carried out using click trains (284) in the awake, paralyzed cat. As in AM studies, maximum limiting rates (i.e., the highest click rate that showed any evidence of phase-locking) varied widely between 6 and 200 Hz. These findings were confirmed and expanded in a series of studies by Rouiller and colleagues (240, 241) in nitrous oxide-anesthetized cats. These investigators distinguished neurons by differences in the temporal precision of the responses. The largest group of neurons ("lockers," 71%) showed tight temporal locking to the clicks. "Groupers" (8%) responded with weak temporal synchrony, and "special responders" (21%) showed no clear phase-locked responses although changes in firing rate did occur, occasionally resulting in strongest responses for click rates between 200 and 400 Hz. Overall, limiting rates between 10 and 800 Hz were observed, and ~50% of lockers had a limiting rate greater than 100 Hz. Keeping in mind that these limiting rates were not extracted at the 50% value of the transfer functions (the traditional measure of limiting rate), and the inherent differences between click-train analysis and AM analysis, the actual range of temporal resolution estimated by this method appears to be compatible with that observed in AM studies.
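A limiting rate defined as "the highest click rate that showed any evidence of phase-locking" can be sketched as follows, assuming vector strength as the synchronization measure. The fixed `criterion` threshold is a hypothetical stand-in. Studies typically use a statistical test of phase-locking rather than a fixed cut, and the function names are illustrative only.

```python
import math

def vector_strength(spike_times_s, rate_hz):
    """Length of the mean resultant vector of spike phases (0 = none, 1 = perfect)."""
    if not spike_times_s:
        return 0.0
    phases = [2.0 * math.pi * rate_hz * t for t in spike_times_s]
    c = sum(math.cos(p) for p in phases)
    s = sum(math.sin(p) for p in phases)
    return math.hypot(c, s) / len(phases)

def limiting_rate(responses, criterion=0.3):
    """Highest repetition rate whose response still meets the locking criterion.

    responses: dict mapping repetition rate (Hz) to a list of spike times (s).
    criterion: hypothetical vector-strength threshold, an assumption of this sketch.
    Returns None when no rate meets the criterion.
    """
    locked = [rate for rate, spikes in responses.items()
              if vector_strength(spikes, rate) >= criterion]
    return max(locked) if locked else None
```

A limiting rate obtained this way is not the same quantity as the 50% point of a transfer function, which is the caveat raised in the preceding paragraph.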
Rouiller and De Ribaupierre (240) reported some differences between thalamic subdivisions regarding the percentage of lockers. More lockers were located in the anterior region of MGBv than in the posterior portion, and the highest limiting rates were also encountered in the anterior part. They observed no clear CF dependency for the distribution of lockers but noticed that the lockers had shorter latencies than groupers and special responders. Furthermore, lockers with limiting rates above 100 Hz had response latencies ~2–3 ms shorter than lockers with limiting rates below 100 Hz, similar to the latency-BMF correlation found in the IC (153). No obvious differences in the distribution and range of limiting rates were found between recordings made in the nitrous oxide-anesthetized and awake preparations.
In summary, AM phase-locking in thalamic neurons varies over a wide range from a few Hertz to several hundred Hertz. Some neurons can follow high rates, but the majority of neurons appear to peak at rates below 100 Hz. A subgroup of neurons may respond to temporal information with changes in firing rate rather than in phase-locking; however, the proportion of such a group and its properties are still unexplored. It appears that the majority of neurons show limiting rates below that of the IC, but a detailed comparative study of the transformation of temporal coding from the IC to the MGB is still lacking.
C. Responses to AM in Primary Auditory Cortex: Synchronization
A number of studies provided initial evidence that temporal coding in auditory cortical neurons may be substantially reduced compared with subcortical levels (Fig. 9). Studies with FM and AM in the awake cat (300) and guinea pig (34) showed neurons had maximum following rates of <30 Hz. In later studies, the range of synchronization of AI neurons to AM was systematically explored in a variety of species. A high percentage of neurons showed band-pass tMTFs (53, 75, 157, 256). The tBMF values in AI were found to be independent of the CF of the neurons (53, 157, 256). Accordingly, temporal information in different frequency channels can be processed independently from each other; within each spectral band, AM information can be decomposed by different neurons into different AM ranges. Much attention has therefore been given to the distribution of optimal modulation frequencies. Preferred modulation frequencies commonly vary between 1 and 40 Hz with the vast majority of tBMFs below 20 Hz. Across all studies, tBMFs above 50 Hz were encountered in only a very small percentage of neurons but could occasionally be as high as 100 Hz (17, 157, 255, 256). The composite tMTF in cat AI (ketamine anesthesia), constructed as the weighted sum of all tMTFs measured, shows a tBMF of 12.8 Hz and a 50% cut-off frequency of 37.4 Hz (Fig. 8) (176).
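The composite tMTF, its tBMF, and its 50% cut-off frequency can be computed along the following lines. This is a sketch under stated assumptions. The weights are hypothetical because the weighting scheme is not specified here, the weighted sum is normalized (which changes neither the peak location nor the 50% point), and the cut-off is found by linear interpolation on a linear frequency axis.

```python
def composite_mtf(mtfs, weights):
    """Weighted, normalized sum of MTFs sampled at the same modulation frequencies."""
    n_points = len(mtfs[0])
    total = sum(weights)
    return [sum(w * m[i] for m, w in zip(mtfs, weights)) / total
            for i in range(n_points)]

def bmf_and_cutoff(freqs_hz, mtf, fraction=0.5):
    """Return (BMF, upper cut-off) for one transfer function.

    BMF is the frequency of the maximum. The upper cut-off is the first
    frequency above the BMF at which the function falls to `fraction` of its
    maximum, or None if it never falls that far within the sampled range.
    """
    peak_i = max(range(len(mtf)), key=lambda i: mtf[i])
    target = fraction * mtf[peak_i]
    for i in range(peak_i, len(mtf) - 1):
        hi, lo = mtf[i], mtf[i + 1]
        if hi >= target >= lo and hi > lo:
            step = (hi - target) / (hi - lo)
            return freqs_hz[peak_i], freqs_hz[i] + step * (freqs_hz[i + 1] - freqs_hz[i])
    return freqs_hz[peak_i], None
```

The same `bmf_and_cutoff` function applies to a single neuron's tMTF or rMTF, so population and single-unit values can be extracted with one criterion.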
It is tempting to regard the presence of modulation tuning and the range of BMFs as a physiological implementation of a modulation filterbank (e.g., Ref. 35), but the functional consequences of these cortical (and subcortical) observations are at present unclear and should not be overstated. When "spatial-frequency channels" were first described in visual psychophysics and spatial-frequency tuning was later found physiologically, it was suggested that these channels formed the basis for a visual Fourier analysis of the retinal image, but this notion has been discredited (303). There is currently no unequivocal evidence that modulation tuning underlies an analysis of the modulation spectrum in the sense that the cochlea performs an analysis of stimulus spectrum. For example, will an envelope with a low fundamental (e.g., to speech syllables) but fast components (i.e., broad envelope spectrum) recruit neurons tuned to high modulation frequencies? Is the relative phase of different envelope components somehow reflected in neural synchronization or average rate? Even if modulation-tuned neurons do not perform a full envelope decomposition in the Fourier sense, it is easy to see that such envelope tuning could be useful in other ways. For example, modulation tuned channels could parse spectral stimulus components according to their dominant modulation frequency so that the spectral components with a common modulation frequency can be grouped in a further step.
Differences in temporal processing between cortical neurons and their thalamic inputs are not only evident from population comparisons but were directly observed in functionally connected thalamocortical neuron pairs (34, 175) and were also evident in current source density analysis of the thalamic input and cortical output layers of AI (274). While these correlation studies reveal a reduction of temporal following capacities from MGBv to AI, the temporal modulation preferences in thalamus and cortex are not correlated by rank (175), i.e., thalamic cells with high (low) BMFs do not preferentially project to cortical cells with high (low) BMFs. These findings strongly suggest that a transformation of temporal response properties takes place at the thalamocortical interface.
The width of the transfer function provides a measure of response selectivity. For individual neurons, the bandwidth of tMTFs, estimated at 50% of the maximum, is in the range of the BMF values but can vary by a factor of >5 (53, 176, 256) in the anesthetized cat. Bandwidth variations of tMTFs in the awake marmoset monkey (157) are of similar magnitude. This means that AM selectivity varies considerably among cortical neurons but that overall the selectivity is relatively poor.
Variations in species, anesthetic state, and estimation method between the different studies do not permit an easy comparison to sort out these different influences on envelope processing. However, it appears that neither anesthesia nor species-specific effects provide strong influences on the tBMF distribution of cortical neurons. This is not to say that there are no anesthetic effects; however, given the fairly large range of variability in the conditions of these studies, a simple group evaluation is unlikely to provide such evidence.
The range for time-locked AM coding appears to be limited to the envelope frequencies underlying the perception of rhythm, roughness, and the following rate of syllables in communication sounds. The cortical coding of higher modulation frequencies, important for voicing or periodicity pitch information, does not seem to fully utilize the same temporal code.
D. Responses to AM in Primary Auditory Cortex: Average Rate
In view of the successive reduction in envelope synchronization already discussed for the different synaptic stages leading up to cortex, it is not too surprising to find the reduction in tBMF. Adverse effects on synchronization should however not necessarily affect rate tuning. For example, exquisite frequency and ITD selectivity in average rate is found at the cortical level and can be sharper than in the brain stem. Therefore, we expect to find envelope tuning in rMTFs, as it is already prominently present in the IC.
Bandpass rMTFs are indeed found but appear less common than bandpass tMTFs. In the rat, >90% of the tMTFs showed bandpass characteristics while only 30% of the rMTFs were bandpass (75). In AI of the awake squirrel monkey (17), this difference was less pronounced, with bandpass behavior for 49% of the tMTFs compared with 39% of rMTFs. The remaining neurons were either low pass, high pass, all pass, or had complex filter shapes. Similar results were reported for the cat (48). In awake marmosets, 73% of AI units had bandpass rMTFs, and many neurons were only driven when temporal modulations were present (157).
An important difference with tMTF tuning is the consistent observation that the tuning for rMTFs extends to higher modulation frequencies, although it is still quite limited compared with the brain stem. There is also a fairly large variance, possibly related to the use of anesthesia, in the reported range of rBMFs and upper cut-off frequencies (e.g., as defined by a 50% reduction in rate) obtained across the various studies in AI (Fig. 9). The majority of rBMFs in anesthetized studies are below 50 Hz (46, 49, 53, 75, 256). Studies in awake animals (17, 34, 157, 190, 247, 260) yielded rBMFs that were either not substantially different from those in anesthetized animals or differed by less than a factor of two. Anesthesia seems to affect the strength of the response (sustained in unanesthetized animals, onset under anesthesia) more than the range of BMFs. Anesthesia may reduce the upper cut-off frequencies of tMTFs more substantially than those of rMTFs (52, 83, 163) and may thereby affect the temporal coding capacity for the highest temporally coded AM frequencies, including the range of AM frequencies associated with the perceptual attributes of roughness and periodicity pitch (64).
The general finding that BMFs and upper cut-off frequencies are higher in the rMTF than in the tMTF led Bieser and Müller-Preuss (17) to suggest that "low modulation rates were mostly encoded by phase-locked neural responses and the higher AM sounds by non-phase-locked spike rate variations." While the experimental evidence for this claim was suggestive but not conclusive, Lu et al. (161) demonstrated more forcefully that this notion might indeed be true and proposed a two-stage model in which temporal modulations are combined over an integration window of ~30 ms; temporal patterns separated by intervals longer than 30 ms are coded explicitly in temporal form, while more rapid patterns are coded implicitly by average rate.
It is not entirely clear whether this scheme can fully account for the coding of modulations since, even in awake animals and for only a small fraction of the cells, rBMFs reach maximal values of only a few hundred Hertz. This is only an octave above the highest tBMFs (even when measured on the same cells, e.g., Ref. 157) and lower than the upper limit for periodicity pitch (~800 Hz) and modulation detection (~2.2 kHz). The markedly reduced cortical upper limit, particularly compared with the brain stem, is in stark contrast to the upper limit for ITD sensitivity to AM signals, which appears not to differ between cortex and brain stem and extends to modulation frequencies up to 1,000 Hz (awake rabbit, Ref. 67). Thus envelope-based ITD tuning created in the brain stem is relayed without degradation or recoding to AI, whereas this does not appear to be the case for AM bandpass tuning.
Schulze and Langner (259, 261) suggested an alternative coding strategy; in AI of the awake as well as the anesthetized gerbil, these investigators showed rate tuning of cortical neurons to AM between 50 and 3,000 Hz, clearly outside the range of cortical phase-locking, but only when the carrier frequency was placed far above the cell's CF. A preliminary study (171) reported similar sensitivity in the IC but attributed the mechanism to difference tones generated in the cochlea, i.e., interpreted it as a spectral rather than a temporal effect. Since psychophysical studies indicate that the perception of periodicity pitch does not depend on difference tones, it is unclear whether the mechanism proposed by Schulze and Langner provides its neural basis, although the authors raise several indirect counterarguments against the role of difference tones as the explanation for their observations.
Overall, then, the timing of cortical discharge encodes low modulation frequencies corresponding to the perceptual ranges characterized by rhythm and fluctuation strength (48, 53, 60) and, potentially, roughness (64, 255). A code based on the mean firing rate may represent fast AMs such as those associated with periodicity pitch, but it remains unclear whether these two coding strategies adequately explain AM coding over the entire perceptual range.
E. Responses to AM in Primary Auditory Cortex: Influence of Modulation Parameters
The results discussed above were mostly derived with a modulation depth (m) of 100%. Decrease in m results in monotonically reduced synchronization (60), especially for m < 0.5 (49). In the awake squirrel monkey, 86% of the neurons had maximum synchronization for 80–100% modulation and showed a monotonic decrease with reduction of m. Average firing rate was essentially constant as a function of modulation depth (17). Values of rBMF and tBMF were little affected by m in the awake marmoset (157).
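For orientation, the role of modulation depth can be made concrete with the standard textbook form of a sinusoidally amplitude-modulated tone, s(t) = [1 + m sin(2π fm t)] sin(2π fc t). The sketch below assumes that form and zero starting phases. The function names and parameter values are illustrative and are not taken from any cited study.

```python
import math

def am_envelope(fm_hz, m, duration_s, fs_hz):
    """Envelope 1 + m*sin(2*pi*fm*t), sampled at fs_hz; m = 1.0 is 100% modulation."""
    n = int(round(duration_s * fs_hz))
    return [1.0 + m * math.sin(2.0 * math.pi * fm_hz * i / fs_hz) for i in range(n)]

def sam_tone(fc_hz, fm_hz, m, duration_s, fs_hz):
    """Sinusoidal carrier at fc_hz multiplied by the AM envelope."""
    env = am_envelope(fm_hz, m, duration_s, fs_hz)
    return [e * math.sin(2.0 * math.pi * fc_hz * i / fs_hz) for i, e in enumerate(env)]
```

With m = 0.5 the envelope swings between 0.5 and 1.5 times the carrier amplitude, whereas with m = 1.0 it reaches zero once per modulation cycle.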
Changes in the overall intensity resulted in minor influences on BMF, cut-off frequency, and shape of the MTF (46, 157, 255). However, the firing rate showed a strong effect with intensity revealing a limited range of best levels (49). Phillips and colleagues (211, 212) noticed intensity-specific differences between the responses to low and high modulation frequencies. Better responses were observed for higher modulation frequencies at low intensities and for low modulation frequencies at higher intensities; that is, the shape of MTFs can be level dependent. The rMTF appears to be more resistant to changes in SPL than the tMTF (157).
In a few studies, the effect of the modulation waveform was investigated. Rectangular AM resulted in stronger response synchrony than sinusoidal AM, but the tBMFs were similar (255, 256). Modulation with an exponential sine-wave envelope increased the sharpness of modulation tuning with decreasing duty cycle but showed no dramatic effects on BMF or cut-off frequency (49). Temporal synchronization to binaural beats (generated by binaural interaction in the brain stem, see sect. VI) also revealed cut-off frequencies of <40 Hz (219). Moreover, results from the awake primate (157) indicate that BMFs for AM and FM are often closely matched for single neurons. These observations suggest a common temporal window within which afferent signals are integrated.
Using dynamic ripple spectra, i.e., spectral envelopes that are periodic along the frequency axis, to determine the temporal impulse response properties in AI by reverse correlation in anesthetized ferrets (142) and cats (175) revealed tBMFs that essentially overlapped with the value range seen in several other species estimated with AM tones. Direct comparison between two carrier types showed either no significant difference in the tBMFs for tonal and noise carriers (53, 217) or an average tBMF that is slightly lower for tonal carriers (49). This suggests that the carrier bandwidth may have little influence on temporal coding properties.
F. Differences of Temporal Coding Between Cortical Fields
In view of the differences between thalamic subdivisions in terms of thalamocortical connectivity (see sect. IXA) and temporal responses (see sect. IXB), it is of interest whether neurons in different cortical fields also differ in their ability to code temporal information (Fig. 9). Field AAF in the cat, a component of the core area like AI, shows evidence of higher BMFs and limiting rates than AI (111, 255, 256). There is some evidence of spatial clustering in AAF with faster following neurons more abundant for CFs above 10 kHz (53, 111, 255). Further evidence of faster following rates in AAF over AI has been obtained from STRF measurements in mice (159). The duration of STRFs from AAF was found to be shorter than in AI. Because STRF duration is inversely related to the BMF of tMTFs, it follows that AAF neurons have higher BMFs compared with AI. Another predictor for repetition following capacity is the onset latency of isolated CF tones or clicks. Schreiner and Raggio (253) reported a weak but significant negative correlation in cat AI for click latency and BMF, similar to results in the IC (153) and MGB (240). Onset latencies in AAF of cats (111) and mice (159) are shorter than in AI, further supporting the notion that AAF has a higher following capacity than AI.
Cortical fields outside the core areas seem to perform at even lower temporal fidelity than that found in AI. In the cat, tBMFs and rBMFs of cortical fields AII, PAF, and VPAF were 20–80% of those seen for AI (53, 256). Similar results were found in the awake squirrel monkey (17). In the latter study, three groups of cortical fields could be distinguished based on their temporal properties. A group containing AI had average BMFs of ~8 Hz; a group that included the rostral field and the insula had BMFs of 4 Hz and below, and a group containing the anterior-lateral field had a predominance of BMFs around 2 Hz. Combined, these findings suggest that hierarchically "higher" auditory cortical fields primarily receiving input from thalamic projections other than the ventral nucleus appear to show slightly but consistently slower following capacity when tested with AM stimuli than primary cortical fields.
G. Cortical Mechanisms
The cause for the reduced temporal following capacity of cortical neurons compared with subcortical stations is still not entirely clear. A diversity of cellular and network properties are likely to affect cortical temporal behavior. These include mechanisms of adaptation and post-excitation suppression (19, 25, 116), postsuppression rebound (42, 47, 75), intrinsic oscillation (42, 75, 106, 134, 249), and synaptic depression (1, 169, 170). It has been suggested that tBMFs are largely determined by processes intrinsic to the cortical-thalamic network while cut-off frequency seems to be influenced by intrinsic pyramidal cell mechanisms (51). Models that include dynamic synaptic processes have been proposed that can account for many aspects of cortical responses to various repetitive signal envelopes, including sinusoidal AM stimuli (41, 54, 55). Eggermont (55) demonstrated that the envelope synchronization of cortical activity can be modeled based on two main components: the degree of input or presynaptic synchrony and the shape of a temporal filter that is determined by properties of synaptic dynamics. The input synchrony is highly dependent on the shape of the envelope waveform and reflects peripheral integrative mechanisms that determine response latency and spiking jitter (102). The properties of the synaptic dynamics are less stimulus dependent and reflect cortical synaptic activity changes after repeated stimulation that cause short-term synaptic depression or facilitation (1, 169, 170). The synaptic dynamic acts as a temporal low-pass filter on the synchronized input and is dominated by synaptic depression. This two-stage model of cortical modulation transformation holds great promise in unifying many aspects of temporal envelope processing (55) and other temporal behaviors of cortical neurons (41). It is likely, however, that other, conceivably nonlinear, influences also contribute to the shaping of MTFs. This is indicated by the observed relationships of onset latency and the period of intrinsic oscillations with BMF as well as the effects of spectral and temporal stimulus composition on cortical adaptation behavior (19, 280).
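The two-component idea described above, input synchrony followed by a low-pass temporal filter, can be caricatured as a simple product. This is emphatically not Eggermont's model. A first-order low-pass with a hypothetical corner frequency stands in for the depression-dominated synaptic filter, and the input synchrony is supplied as an arbitrary function of modulation frequency. All names and values are assumptions made for illustration.

```python
import math

def lowpass_gain(f_hz, corner_hz):
    """Gain of a first-order low-pass filter (stand-in for the synaptic filter)."""
    return 1.0 / math.sqrt(1.0 + (f_hz / corner_hz) ** 2)

def predicted_synchrony(f_hz, input_synchrony, corner_hz):
    """Output synchrony = presynaptic synchrony at f_hz times the filter gain.

    input_synchrony: callable mapping modulation frequency (Hz) to a value in
                     [0, 1]; its shape would depend on the envelope waveform.
    corner_hz:       hypothetical corner frequency of the low-pass stage.
    """
    return input_synchrony(f_hz) * lowpass_gain(f_hz, corner_hz)
```

In such a scheme, changing the envelope waveform alters only `input_synchrony`, while the filter stage stays fixed. That separation is the property the text attributes to the two-stage account.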
H. Temporal Coding of Complex Sounds
Most studies of complex multisyllable or multi-"phrase" communication sounds in auditory cortex noted that neuronal responses were predominantly located at the beginning of each phrase provided that the phrases did not follow each other at rates of more than 20–30 Hz. This effect was not dependent on the species-specific nature of the calls and was seen for speech sounds as well (50). For example, responses to bird songs in cat auditory cortex (273) showed preferred response intervals corresponding to ~10 Hz. Responses to species-specific calls in awake squirrel monkey (74), anesthetized squirrel monkey (192), and anesthetized marmoset (292) all showed "phrase"-locking in the response to repetitive call phrases around 8–12 Hz. Similar values were obtained in the awake guinea pig to various bird and guinea pig vocalizations (34). Wang et al. (292) tested whether the temporal response to complex sounds was tuned like the response to more elemental sounds by using stretched and compressed natural vocalizations of marmosets, without changes in the spectral content of the calls. The responsiveness to the calls was maximal at the natural repetition rate of the phrases near 8 Hz. In other words, the tMTF of most neurons was tuned to the repetition rate of the natural call. Similarly, Nagarajan et al. (192) reported that the response modulation rates of cortical neurons activated by vocalizations in the marmoset monkey were highly correlated with the BMFs found for AM tones.
The pulse repetitions in echolocation calls of bats are another example of temporal structures that require detailed processing by the auditory system. Phase-locked responses of cortical neurons in the bat occur over similar ranges as found for AM and click trains in other mammalian species. Sixty percent of BMFs in AI of Eptesicus fuscus were at or below 10 Hz but could be as high as 83 Hz (116). Pulse repetition coding in the awake FM bat Myotis lucifugus and the mustached bat Pteronotus parnellii had limiting rates of ~100 Hz (308) and up to 300 Hz (276), respectively, commensurate with the behaviorally relevant range of timing used in echolocation.
A likely strategy for encoding of complex sounds in auditory cortex is by the temporal-spatial discharge pattern of distributed neuronal populations across the cortical fields (34, 207; see also Refs. 30, 39). Initial studies of the response of cortical neurons to vocalizations (34, 306, 307) combined with more recent studies of the detailed representation of species-specific vocalizations (192, 292) and speech sounds (309) in the primary auditory cortex of New World monkeys and cats provide evidence that behaviorally relevant vocalizations are well represented by spatially distributed but temporally highly coherent neuronal discharges. At major transitions during the course of the signal, a temporally coherent activation of specific neuronal subpopulations across the cortical fields is created. The synchronous timing of responses across many sites in primary auditory cortex (and in parallel in other cortical fields) may provide the necessary means for appropriate grouping or segregation of sequential elements in ongoing foreground and background sounds. The range of modulation frequencies spanned by cortical tMTFs of generally moderate selectivity may be sufficient to provide representational and, perhaps, perceptual invariances of complex sound sequences despite potentially large variations in phoneme rate or in the sequence rate of musical tones. The distributed representation of temporal envelope information in each carrier frequency band allows a segregated processing of different temporal phenomena within a given frequency "channel" as well as processing of similar temporal aspects across frequency channels (194).
I. Plasticity of Temporal Coding Properties in Auditory Cortex
Studies of representational plasticity in auditory cortex of adult animals have largely focused on spectral properties, but several studies have recently examined temporal properties and reported use-dependent changes in the tMTF. Beitel et al. (15) trained owl monkeys to discriminate between two different, sequentially presented, AM rates and rewarded the animals when they correctly indicated that the second stimulus had a higher AM rate. The modulation frequencies were chosen to be in a range (4–40 Hz) where they could induce phase-locked cortical responses. Over the course of the training, AM discrimination thresholds gradually improved. Analysis of the tMTFs of the trained animals revealed that the shape of the transfer function changed dramatically. As a consequence, average limiting rate more than doubled from 12 Hz to >30 Hz, and BMF increased from 8 to 15 Hz. This result indicates that temporal coding properties of cortical neurons can be modified by learning.
Studies in rat AI investigated the influence of the statistics of the input signal on the reorganization of auditory cortex (138, 139). Stimulation of the nucleus basalis in the basal forebrain has been shown to increase the potential for cortical plasticity without explicit behavioral training of the animals (45, 92, 98, 174). Pairing of nucleus basalis stimulation with acoustic stimulation (139) caused pronounced changes in the tMTFs which depended on the temporal properties of the stimuli paired with the electrical stimulation (Fig. 10). A 20–40% increase of the BMF and cut-off frequency was observed when the modulation frequency of the acoustic stimulus was slightly higher than the normally observed values of the tMTFs. Pairing of electrical stimuli with modulation frequencies below the normal tBMF values caused a decrease in the neuronal cut-off frequencies.
These results indicate that important aspects of temporal properties of the cortex undergo plastic reorganization, reflect aspects of the temporal statistics in the input stimuli, and can be modified by mechanisms involved in learning to match specific auditory tasks even in fully mature animals.
X. NEUROPHYSIOLOGICAL AND PSYCHOLOGICAL STUDIES IN HUMANS
Ablations and lesions of auditory cortex have been shown to interfere with the processing of temporal tasks, such as the order of events (193), discrimination between 10- and 300-Hz trains of noise bursts (277), the detection of AM frequencies below but not above ~30 Hz (89), and the perception of periodicity pitch (299), to name a few examples. Studies in patients with primary cortical lesions resulting in "word deafness" also show evidence for deteriorated temporal processing capacities (88). In addition, it has been argued (87) that the pathway up to and including primary auditory cortex is not sufficient for the detection of continuous AM in humans. The range of these perceptual deficits encompasses the cortical range of temporal as well as the rate-encoded AM frequencies, corroborating the importance of the coding of envelope phenomena in auditory cortex and in some of the cortical regions to which it connects.
XI. CONCLUSION
However, if we ignore differences along the auditory neuraxis for a moment and take stock of the variety of responses reviewed, a rather optimistic view emerges of neural mechanisms dedicated to AM processing. Indeed, these responses show some of the key properties that are generally considered indicative for the coding of stimulus parameters. Tuning to modulation frequency is prominently present temporally and in average rate, and the range of optimal modulation frequencies so represented spans perceptually relevant ranges. The tuning can show invariance with SPL, modulation depth, and type of carrier and be predictive of the response to complex modulation waveforms in natural stimuli. There is even suggestive evidence for topographic mapping of modulation frequency. Selectivity to modulation waveforms or modulation paradigms more complex than the basic sinusoidally modulated tone are beginning to be reported.
There are several neurobiological avenues to further explore and strengthen the case for dedicated modulation mechanisms and their link to perception. Review of the available data suggests that the most immediate gain, with existing tools, can be expected from inventive stimulus paradigms. Although sinusoidal AM may be considered a complex stimulus in the frequency domain, it is an elementary, simple stimulus in the modulation domain. The vast majority of studies of modulation processing have used single sinusoidal AM tones and have focused on modulation tuning. This is a necessary starting point, but to make a convincing case for the relevance of the tuning observed, the stimulus arsenal should be expanded. Current technology enables synthesis of more complex stimuli that are amenable to parametric exploration yet a step closer to natural stimuli. There are still basic unanswered questions to be addressed with sinusoidal AM, but it is equally clear that important properties and selectivities are only manifest with the use of nonsinusoidal envelopes or stimulus paradigms that involve modulation in ways that are closer to real-world tasks faced by the auditory system. Clever use of such paradigms is likely to make either the skeptical or optimistic view prevail.
ACKNOWLEDGMENTS
During the preparation of this review, P. X. Joris was supported by the Fund for Scientific Research-Flanders Grants G.0297.98 and G.0083.02 and Research Fund K.U. Leuven Grant OT/01/42; C. E. Schreiner was supported by National Institutes of Health Grants DC-02260 and NS-34835; and A. Rees was supported by the Wellcome Trust.
FOOTNOTES
Address for reprint requests and other correspondence: P. X. Joris, Laboratory of Auditory Neurophysiology, K.U. Leuven, Campus Gasthuisberg, B-3000 Leuven, Belgium (E-mail: Philip.Joris{at}med.kuleuven.ac.be).
This article has been cited by other articles:
![]() |
H. Asari and A. M. Zador Long-Lasting Context Dependence Constrains Neural Encoding Models in Rodent Auditory Cortex J Neurophysiol, November 1, 2009; 102(5): 2638 - 2656. [Abstract] [Full Text] [PDF] |
||||
![]() |
N. Ding and J. Z. Simon Neural Representations of Complex Temporal Modulations in the Human Auditory Cortex J Neurophysiol, November 1, 2009; 102(5): 2731 - 2743. [Abstract] [Full Text] [PDF] |
||||
![]() |
J. F. Brugge, K. V. Nourski, H. Oya, R. A. Reale, H. Kawasaki, M. Steinschneider, and M. A. Howard III Coding of Repetitive Transients by Auditory Cortex on Heschl's Gyrus J Neurophysiol, October 1, 2009; 102(4): 2358 - 2374. [Abstract] [Full Text] [PDF] |
||||
![]() |
H. H. Lim, M. Lenarz, and T. Lenarz Auditory Midbrain Implant: A Review Trends in Amplification, September 1, 2009; 13(3): 149 - 180. [Abstract] [PDF] |
||||
![]() |
M. Schonwiesner and R. J. Zatorre Spectro-temporal modulation transfer function of single voxels in the human auditory cortex measured with high-resolution fMRI PNAS, August 25, 2009; 106(34): 14611 - 14616. [Abstract] [Full Text] [PDF] |
||||
![]() |
N. Itatani and G. M. Klump Auditory Streaming of Amplitude-Modulated Sounds in the Songbird Forebrain J Neurophysiol, June 1, 2009; 101(6): 3212 - 3225. [Abstract] [Full Text] [PDF] |
||||
![]() |
J. Bertoncini, W. Serniclaes, and C. Lorenzi Discrimination of Speech Sounds Based Upon Temporal Envelope Versus Fine Structure Cues in 5- to 7-Year-Old Children J Speech Lang Hear Res, June 1, 2009; 52(3): 682 - 695. [Abstract] [Full Text] [PDF] |
||||
![]() |
K. M. Lee, E. Skoe, N. Kraus, and R. Ashley Selective Subcortical Enhancement of Musical Intervals in Musicians J. Neurosci., May 6, 2009; 29(18): 5832 - 5840. [Abstract] [Full Text] [PDF] |
||||
![]() |
Y. Zahar, A. Reches, and Y. Gutfreund Multisensory Enhancement in the Optic Tectum of the Barn Owl: Spike Count and Spike Timing J Neurophysiol, May 1, 2009; 101(5): 2380 - 2394. [Abstract] [Full Text] [PDF] |
||||
![]() |
H.-R. Geis and J. G. G. Borst Intracellular Responses of Neurons in the Mouse Inferior Colliculus to Sinusoidal Amplitude-Modulated Tones J Neurophysiol, April 1, 2009; 101(4): 2002 - 2016. [Abstract] [Full Text] [PDF] |
||||
![]() |
S. M. N. Woolley, P. R. Gill, T. Fremouw, and F. E. Theunissen Functional Groups in the Avian Auditory System J. Neurosci., March 4, 2009; 29(9): 2780 - 2793. [Abstract] [Full Text] [PDF] |
||||
![]() |
Y. Zheng and M. A. Escabi. Distinct Roles for Onset and Sustained Activity in the Neuronal Code for Temporal Periodicity and Acoustic Envelope Shape. J. Neurosci., December 24, 2008; 28(52): 14230-14244.
T. Overath, S. Kumar, K. von Kriegstein, and T. D. Griffiths. Encoding of Spectral Correlation over Time in Auditory Cortex. J. Neurosci., December 3, 2008; 28(49): 13268-13273.
J. P. Agapiou and D. McAlpine. Low-Frequency Envelope Sensitivity Produces Asymmetric Binaural Tuning Curves. J Neurophysiol, October 1, 2008; 100(4): 2381-2396.
B. Krebs, N. A. Lesica, and B. Grothe. The Representation of Amplitude Modulations in the Mammalian Auditory Midbrain. J Neurophysiol, September 1, 2008; 100(3): 1602-1609.
D. Bendor and X. Wang. Neural Response Properties of Primary, Rostral, and Rostrotemporal Core Fields in the Auditory Cortex of Marmoset Monkeys. J Neurophysiol, August 1, 2008; 100(2): 888-906.
J. C. Middlebrooks. Auditory Cortex Phase Locking to Amplitude-Modulated Cochlear Implant Pulse Trains. J Neurophysiol, July 1, 2008; 100(1): 76-91.
J. C. Middlebrooks. Cochlear-Implant High Pulse Rate and Narrow Electrode Configuration Impair Transmission of Temporal Information to the Auditory Cortex. J Neurophysiol, July 1, 2008; 100(1): 92-107.
C. A. Atencio and C. E. Schreiner. Spectrotemporal Processing Differences between Auditory Cortical Fast-Spiking and Regular-Spiking Neurons. J. Neurosci., April 9, 2008; 28(15): 3897-3910.
S. M. Chase and E. D. Young. Cues for Sound Localization Are Encoded in Multiple Aspects of Spike Trains in the Inferior Colliculus. J Neurophysiol, April 1, 2008; 99(4): 1672-1682.
E. D. Young. Neural representation of spectral and temporal information in speech. Phil Trans R Soc B, March 12, 2008; 363(1493): 923-945.
P. K. Pandya, D. L. Rathbun, R. Moucha, N. D. Engineer, and M. P. Kilgard. Spectral and Temporal Processing in Rat Posterior Auditory Cortex. Cereb Cortex, February 1, 2008; 18(2): 301-314.
B. J. Malone, B. H. Scott, and M. N. Semple. Dynamic Amplitude Coding in the Auditory Cortex of Awake Rhesus Macaques. J Neurophysiol, September 1, 2007; 98(3): 1451-1474.
M. L. Tan and J. G. G. Borst. Comparison of Responses of Neurons in the Mouse Inferior Colliculus to Current Injections, Tones of Different Durations, and Sinusoidal Amplitude-Modulated Tones. J Neurophysiol, July 1, 2007; 98(1): 454-466.
S. Wohlgemuth and B. Ronacher. Auditory Discrimination of Amplitude Modulations Based on Metric Distances of Spike Trains. J Neurophysiol, April 1, 2007; 97(4): 3082-3092.
E. L. Bartlett and X. Wang. Neural Representations of Temporally Modulated Signals in the Auditory Thalamus of Awake Primates. J Neurophysiol, February 1, 2007; 97(2): 1005-1017.
P. C. Nelson and L. H. Carney. Neural Rate and Timing Cues for Detection and Discrimination of Amplitude-Modulated Tones in the Awake Rabbit Inferior Colliculus. J Neurophysiol, January 1, 2007; 97(1): 522-539.
H. Zhang and J. B. Kelly. Responses of Neurons in the Rat's Ventral Nucleus of the Lateral Lemniscus to Amplitude-Modulated Tones. J Neurophysiol, December 1, 2006; 96(6): 2905-2914.
A. Dreyer and B. Delgutte. Phase Locking of Auditory-Nerve Fibers to the Envelopes of High-Frequency Sounds: Implications for Sound Localization. J Neurophysiol, November 1, 2006; 96(5): 2327-2341.
R. Batra. Responses of Neurons in the Ventral Nucleus of the Lateral Lemniscus to Sinusoidally Amplitude Modulated Tones. J Neurophysiol, November 1, 2006; 96(5): 2388-2398.
T. Fujioka, B. Ross, R. Kakigi, C. Pantev, and L. J. Trainor. One year of musical training affects development of auditory cortical-evoked fields in young children. Brain, October 1, 2006; 129(10): 2593-2608.
B. Ahmed, J. A. Garcia-Lazaro, and J. W. H. Schnupp. Response linearity in primary auditory cortex of the ferret. J. Physiol., May 1, 2006; 572(3): 763-773.
U. Firzlaff, S. Schornich, S. Hoffmann, G. Schuller, and L. Wiegrebe. A Neural Correlate of Stochastic Echo Imaging. J. Neurosci., January 18, 2006; 26(3): 785-791.
P. X. Joris, B. van de Sande, A. Recio-Spinoso, and M. van der Heijden. Auditory Midbrain and Nerve Responses to Sinusoidal Variations in Interaural Correlation. J. Neurosci., January 4, 2006; 26(1): 279-289.
R. Xie, J. Meitzen, and G. D. Pollak. Differing Roles of Inhibition in Hierarchical Processing of Species-Specific Calls in Auditory Brainstem Nuclei. J Neurophysiol, December 1, 2005; 94(6): 4019-4037.
P. O. Kanold and P. B. Manis. Encoding the Timing of Inhibitory Inputs. J Neurophysiol, May 1, 2005; 93(5): 2887-2897.
L. Las, E. A. Stern, and I. Nelken. Representation of Tone in Fluctuating Maskers in the Ascending Auditory System. J. Neurosci., February 9, 2005; 25(6): 1503-1513.
D. H. G. Louage, M. van der Heijden, and P. X. Joris. Enhanced Temporal Response Properties of Anteroventral Cochlear Nucleus Neurons to Broadband Noise. J. Neurosci., February 9, 2005; 25(6): 1560-1570.
N. Ulanovsky, L. Las, D. Farkas, and I. Nelken. Multiple Time Scales of Adaptation in Auditory Cortex Neurons. J. Neurosci., November 17, 2004; 24(46): 10440-10453.