
2.3 Acoustics

2.3.2 Acoustic parameters and their perceptual correlates

2.3.2.5 Formant frequencies – sound timbre

Formants are distinctive frequency components of the acoustic signal produced by singing (or speech). They are the broad peaks or the local maxima in the spectrum that result from acoustic resonances of the human vocal tract. When the nasal and oral cavities remain in the same position, they will amplify the same frequency ranges regardless of the fundamental frequency of the source. The frequency response of the vocal tract filter represents the effect that the vocal tract shape would have on any sound that travels through it (Story, 2016). The term “broad peak” is usually associated with the radiated spectrum (the produced sound itself), and “local maximum” is used when the term formant is used to describe the production mechanisms of sound including both the vocal tract transfer function and the sound source (Titze, 1994; Titze et al., 2015; Vurma, 2020). The current ANSI (2020) definition for formant is “a range of frequencies in which there is an absolute or relative maximum in the sound spectrum. The frequency at the maximum is the formant frequency.” In this study, formant is defined as the peak of enhanced spectral energy in the output spectrum and resonance as the natural frequency of the vocal tract (Story, 2016; Titze et al., 2015).

The interruption of airflow by the vibrating vocal folds creates a sound pressure wave that consists of the fundamental frequency and multiple overtones, which are called harmonic partials or harmonics (fo is the first harmonic, and it is denoted as 1fo). Partials radiating from the glottis are evenly spaced (they are integer multiples of fo), and their amplitude decreases as their frequency increases.
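The structure of the source spectrum described above can be sketched in a few lines of Python. This is only an illustration: the function name `harmonic_series` and the spectral slope of -12 dB per octave are assumptions made for the example (the slope is a commonly quoted approximation, not a value taken from this study).

```python
import math

def harmonic_series(fo_hz, n_partials, rolloff_db_per_octave=-12.0):
    """Return (frequency in Hz, level in dB relative to 1fo) for each partial.

    The roll-off value is an illustrative assumption, not a measured slope.
    """
    partials = []
    for n in range(1, n_partials + 1):
        freq = n * fo_hz                    # partials are integer multiples of fo
        octaves_above_fo = math.log2(n)     # distance from 1fo in octaves
        level_db = rolloff_db_per_octave * octaves_above_fo
        partials.append((freq, level_db))
    return partials

# Example: the first five partials of a 220 Hz tone.
for freq, level in harmonic_series(220.0, 5):
    print(f"{freq:7.1f} Hz  {level:6.1f} dB")
```

With these assumed values, each doubling of frequency (1fo to 2fo, 2fo to 4fo) lowers the level of the partial by 12 dB.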

As this sound pulse (all the partials of the emitted sound) moves through the air molecules in the vocal tract, it changes as it hits obstacles (narrow and wide spaces).

The obstacles are put in place by moving articulators, and they constantly shape the voice source spectra during singing. The four most influential vocal tract articulatory chambers are the epilarynx, the pharynx, the oral cavity, and a small area located just behind the lower front teeth (in the oral cavity). When sound travels through these chambers, it can be reinforced or dampened. The larger the chamber, the lower the partials that are amplified, and conversely, the smaller the chamber, the higher the partials that are amplified. By shortening, lengthening, narrowing, or expanding any of these chambers, or by shortening or elongating the whole vocal tract, the resonance frequencies of the vocal tract, and thus the frequency regions that are amplified, are moved lower or higher within the spectrum. When one or more of these frequency regions are changed, the perceived voice quality also changes (Welch, Thurman, Theimer, Grefsheim, & Feit, 2000).
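The relationship between tract length and resonance frequency can be illustrated with the textbook idealization of the vocal tract as a uniform tube that is closed at the glottis and open at the lips. This is a strong simplification of the chambers described above, and the function name `tube_resonances`, the tube lengths, and the speed of sound (350 m/s) are assumptions chosen only for the sketch.

```python
def tube_resonances(length_m, n_resonances=3, speed_of_sound=350.0):
    """Resonances of an idealized uniform tube closed at one end.

    Quarter-wavelength model: F_n = (2n - 1) * c / (4 * L).
    A real vocal tract is not uniform, so these values are only indicative.
    """
    return [
        (2 * n - 1) * speed_of_sound / (4.0 * length_m)
        for n in range(1, n_resonances + 1)
    ]

# Lengthening the tube lowers every resonance; shortening raises them.
for length in (0.150, 0.175, 0.200):
    rounded = [round(f) for f in tube_resonances(length)]
    print(f"L = {length:.3f} m -> {rounded} Hz")
```

For a 17.5 cm tube the model gives resonances near 500, 1500, and 2500 Hz, and a longer tube shifts all of them downward, which is consistent with the statement that a larger chamber amplifies lower partials.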

The product leaving the lips is called the radiated spectrum, and it is from this transformed spectrum that the formant frequencies can be detected. The output pressure spectrum is the combination of the glottal flow spectrum (source spectrum) and the frequency response of the vocal tract. Formant frequencies can be estimated from the frequency spectrum of the sound using a spectrum analyzer. The acoustic resonances of the vocal tract can be estimated with linear predictive coding, and an intermediate approach can be taken by eliminating the fundamental frequency from the spectral envelope and then looking for the local maxima in the spectral envelope (Laukkanen & Leino, 2001; Pulkki & Karjalainen, 2015). Expressed in decibels, the level at each frequency in the output spectrum is the sum of the level of the source spectrum and the gain of the filter resonances at that frequency. The fundamental frequency and all of the overtones are present in the output spectrum, but their amplitudes have been modified by the vocal tract resonances; harmonics near a formant frequency are enhanced in amplitude, while those distant from the formants are suppressed (Story, 2016).
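The combination of source and filter can be sketched as follows. The example is a minimal illustration under stated assumptions: each vocal tract resonance is modelled as a simple second-order resonator, the source slope is set to -12 dB per octave, lip radiation is ignored, and the function names and resonance values are hypothetical rather than taken from this study.

```python
import math

def resonance_gain_db(f_hz, centre_hz, bandwidth_hz):
    """Gain in dB of a second-order resonator (0 dB at 0 Hz)."""
    numerator = centre_hz ** 2
    denominator = math.sqrt(
        (centre_hz ** 2 - f_hz ** 2) ** 2 + (bandwidth_hz * f_hz) ** 2
    )
    return 20.0 * math.log10(numerator / denominator)

def output_levels_db(fo_hz, n_partials, resonances, rolloff_db_per_octave=-12.0):
    """Level of each partial after filtering: source level + filter gain (dB)."""
    levels = []
    for n in range(1, n_partials + 1):
        f = n * fo_hz
        source_db = rolloff_db_per_octave * math.log2(n)   # assumed source slope
        filter_db = sum(resonance_gain_db(f, c, b) for c, b in resonances)
        levels.append((f, source_db + filter_db))
    return levels

# Hypothetical resonances as (centre frequency, bandwidth) pairs in Hz.
example_resonances = [(500.0, 50.0), (1500.0, 80.0), (2500.0, 120.0)]
for f, level in output_levels_db(100.0, 10, example_resonances):
    print(f"{f:6.0f} Hz  {level:6.1f} dB")
```

In this sketch the partial that falls on the 500 Hz resonance ends up stronger than its neighbours at 400 Hz and 600 Hz, even though the unfiltered source level falls steadily with frequency.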

The ability of the vocal tract to transfer sound is increased near and between two formants if the frequency distance between these formants is decreased either by voluntary or involuntary articulatory movements. This is what happens in the so-called singer’s formant, where formants F3–F5 move closer to each other to form a formant cluster. When two or more sounds contribute to the same sound field, they can affect the total field in different ways. As the bandwidths of the filter resonances are quite broad, the formants can coexist in the same resonating frequency area, making their pressure values add up. This phenomenon is called the linear superposition of waves. If the sound sources are coherent, meaning that they or their partials have the same frequencies, then depending on their phase they can add either constructively (same phase) or destructively (opposite phase). For two sources of equal amplitude, in-phase addition corresponds to a 6 dB increase in sound level, whereas opposite-phase addition reduces the level. If the sound sources are incoherent, meaning that their frequencies do not coincide, the powers of the signals are summed, which for two sources of equal power gives a 3 dB increase in sound level (Pulkki & Karjalainen, 2015; Sundberg, 1987; Titze, 1994; Titze et al., 2015; Welch et al., 2000). The sound at the formant frequencies is generated by one sound source, which means that even though the sound itself has many partials, they all exist at different frequencies and thus do not coincide with each other. This means that the extra power gained by clustering the formants would give the singer a 3 dB advantage in sound projection when using the “singer’s formant.” The effect the singer’s formant has on loudness is greater than its effect on SPL due to the sensitivity of hearing in the frequency area of the cluster (Pulkki & Karjalainen, 2015; Sundberg, 1987; Titze, 1994; Titze et al., 2015; Welch et al., 2000).
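The 6 dB and 3 dB figures follow directly from the decibel definitions for pressure and power. The short sketch below works through the arithmetic for the idealized case of equal-strength contributions; the function names are hypothetical and chosen only for this illustration.

```python
import math

def coherent_gain_db(phase_difference_rad=0.0):
    """Level change when two equal-amplitude coherent pressures are added.

    The summed amplitude is |1 + e^(j*phi)| = 2 * |cos(phi / 2)| relative
    to a single source, and pressure ratios use 20 * log10.
    """
    amplitude_ratio = abs(2.0 * math.cos(phase_difference_rad / 2.0))
    if amplitude_ratio < 1e-12:          # complete cancellation
        return float("-inf")
    return 20.0 * math.log10(amplitude_ratio)

def incoherent_gain_db(n_sources=2):
    """Level change when n equal-power incoherent signals are added.

    Powers add, and power ratios use 10 * log10.
    """
    return 10.0 * math.log10(n_sources)

print(f"coherent, in phase:   {coherent_gain_db(0.0):+.2f} dB")
print(f"incoherent, 2 equal:  {incoherent_gain_db(2):+.2f} dB")
print(f"coherent, antiphase:  {coherent_gain_db(math.pi)} dB")
```

Doubling the pressure amplitude gives 20·log10(2), roughly 6 dB, while doubling the power gives 10·log10(2), roughly 3 dB. Two equal coherent signals in opposite phase cancel completely in this idealized case.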