Production of speech

(1)

Modelling of speech for oral and maxillofacial surgery (COMSPEECH)

Docent Jarmo Malinen, DSc (tech.)

Aalto University,

Dept. of Mathematics and Systems Analysis

May 20th, 2016

(2)

Goals and current stage

The goals of the project are:

To develop a method for predicting the changes in speech production caused by treatments that affect the anatomy of the vocal tract:

surgery of oral cancer, reconstructive surgery, orthognathic surgery, prosthodontic rehabilitation of jaw defects, etc.

To clarify anatomical and functional causes of speech disorders.

To contribute to speech therapy.

To advance the basic knowledge of phonetic phenomena and speech production; in particular, the production of vowels.

The current stage of the project is the improvement and validation of the accuracy of our mathematical model “DICO” for speech production in patients undergoing orthognathic surgery.

(3)

Production of speech

Original speech; “speech” without vocal tract

Vocal folds imaged at 2 kHz

(4)

Effects of maxillofacial surgery on speech

(5)

Correction of malocclusions (1)

Severe malocclusions can be treated by a combination of orthodontic and surgical therapy, i.e., by orthognathic surgery.

(6)

Correction of malocclusions (2)

The positions of the maxilla and the mandible are changed surgically relative to the skull base.

Orthodontic therapy is always an essential part of the treatment of these patients.

(7)

Correction of malocclusions (3)

Patients requiring only mandibular surgery are recruited in our validation study.

Cephalometric analyses for the advancement of the mandible.

(8)

Mathematical modelling of speech acoustics

(9)

DICO – a model for vowel production based on MRI

The DICO vowel production model in a nutshell:

Mathematical modelling of speech based on the 3D anatomy of the vocal tract, obtained by Magnetic Resonance Imaging (MRI).

Numerical simulation of speech sounds based on MRI.

Fine-tuning and validation are based on comparison between simulated speech and speech recorded during the MRI examination.

Model experiments using 3D prints of vocal tracts.

The total error in modelling and data acquisition must be significantly smaller than the effect of surgery on speech.

(10)

DICO (2)

In addition to the vocal tract, we need to model vocal cords, lower airways, and even the exterior space so as to simulate vowels.

[Figure: block diagram of the DICO model. Recoverable labels, by group:

Vocal folds: low-order mass-spring system with tissue losses and a Hertz impact model for vocal fold collisions; aerodynamic force.

Glottal flow: Bernoulli flow with Hagen-Poiseuille viscous pressure loss and a term for turbulence losses; constricted flow; subglottal pressure; glottal pulse generator.

Resonators: vocal tract Webster resonator; subglottal Webster resonator; vocal tract resonator by the wave equation (3D acoustic simulator); tuning to isospectrality.

Terminations: lung impedance; mouth impedance; exterior acoustics.

Couplings: perturbation velocity at glottis; counter pressure; extension to control surface.

Input data: MRI and post-processing (surface models, area functions, centrelines).]

DICO is an interconnected network of mechanical, flow-mechanical and acoustic partial models.
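As a rough illustration of what a Webster resonator computes, the following sketch estimates resonance frequencies from an area function by discretising Webster's horn equation with 1D linear finite elements. It is a minimal sketch under stated assumptions, not the DICO implementation: the function name `webster_resonances`, the idealised boundary conditions (rigid closure at the glottis, zero pressure at the mouth), the speed of sound of 350 m/s, and the lumped mass matrix are all choices made here for illustration.

```python
import numpy as np


def webster_resonances(areas, length, c=350.0, n_modes=4):
    """Lowest resonance frequencies (Hz) of a tube with a given area function.

    Solves (A p')' + k^2 A p = 0 with p' = 0 at the glottis end and
    p = 0 at the mouth end (idealised, lossless terminations).

    areas   -- cross-sectional area of each element, glottis to mouth (m^2)
    length  -- total tube length (m)
    c       -- assumed speed of sound (m/s)
    """
    areas = np.asarray(areas, dtype=float)
    n = areas.size
    h = length / n  # element length

    stiffness = np.zeros((n + 1, n + 1))
    mass = np.zeros(n + 1)  # lumped (diagonal) mass matrix
    local = np.array([[1.0, -1.0], [-1.0, 1.0]])
    for e, a in enumerate(areas):
        stiffness[e:e + 2, e:e + 2] += (a / h) * local
        mass[e:e + 2] += 0.5 * a * h

    # Zero pressure at the mouth: drop the last node.
    # The closed glottis end is the natural boundary condition.
    stiffness = stiffness[:-1, :-1]
    mass = mass[:-1]

    # Symmetric scaling turns K p = k^2 M p into a standard eigenproblem.
    scale = 1.0 / np.sqrt(mass)
    eigenvalues = np.linalg.eigvalsh(stiffness * scale[:, None] * scale[None, :])
    wavenumbers = np.sqrt(np.clip(eigenvalues[:n_modes], 0.0, None))
    return c * wavenumbers / (2.0 * np.pi)
```

For a uniform tube the result can be checked against the quarter-wavelength formula f_n = (2n - 1)c/(4L); for example, `webster_resonances(np.full(200, 3e-4), 0.17)` should give a first resonance near 515 Hz. A real area function extracted from MRI would replace the constant array.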

(11)

DICO (3)

Simulated glottal opening areas and sound pressures at larynx during production of the Finnish vowels [A] and [i].

[Figure: two columns of simulated time series over 0 to 0.02 s, one for [A] and one for [i]. Panels from top to bottom: U (m³/s), glottal opening area Ag, psp (Pa), pc (Pa), psb (Pa). Horizontal axis: Time (s).]

(12)

Acquisition of patient data

(13)

Recording speech and MRI (1)

Modelling speech and validation of the model requires simultaneous recording of speech and 3D MRI from test subjects.

Metal or electronics cannot be taken inside the MRI scanner.

Some kind of stealth technology is needed!

(14)

Recording speech and MRI (2)

Speech and a noise sample from the MRI scanner are transmitted acoustically through “garden hose” for the first 3 metres.

Two-channel sound collector and the waveguides.

The sound collector fits on the head and neck coils of the Siemens Avanto 1.5 T MRI scanner. It has no moving parts and is immune to vibrations.

(15)

Recording speech and MRI (3)

The waveguides lead to a microphone assembly inside a sound-proof Faraday cage, beside the MRI scanner.

(16)

Recording speech and MRI (4)

The signals are transmitted using shielded cables from microphones to a custom-made pre-amplifier and analogue sound processing unit.

Signals are digitised by an M-Audio Delta AD converter.

All electronics and the computers are in a movable rack so as to speed up their installation in the MRI laboratory.

The whole experimental arrangement is (almost fully) automatised for improved efficiency.

(17)

Post-processing of MRI and sound signals

(18)

From pixels to surface models...

Custom 3D image processing software produces surface models (almost) automatically from the (somewhat) blurry pixel data of the MRI scanner.

The air-tissue interface is separated from other anatomic structures in surface models.

To solve the acoustic equations, the surface models need to be further processed into finite element meshes.

(19)

... the atlas of Finnish vowels...

a e i o

u y ä ö

(20)

...and even to animations.

[Let us show them on full screen]

(21)

Acoustic measurements from vocal tract models produced by rapid prototyping

(22)

Physical models by 3D printing (1)

(23)

Physical models by 3D printing (2)

By carrying out laboratory measurements using 3D prints, it is possible to obtain independent comparison data for, e.g., model validation.
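A frequency response of a printed tract can be measured by driving it with a sine sweep. The slides do not say which kind of sweep is used, so the sketch below simply assumes a logarithmic (exponential) sweep; the function names, the 100 to 4000 Hz range, and the sampling rate are illustrative assumptions, not the authors' measurement settings.

```python
import numpy as np


def sweep_phase(t, f_start, f_end, duration):
    """Phase (rad) of a logarithmic sweep from f_start to f_end (Hz)."""
    rate = np.log(f_end / f_start)
    return 2.0 * np.pi * f_start * duration / rate * (np.exp(t * rate / duration) - 1.0)


def sweep_frequency(t, f_start, f_end, duration):
    """Instantaneous frequency (Hz): grows exponentially with time."""
    return f_start * np.exp(t * np.log(f_end / f_start) / duration)


def log_sweep(f_start, f_end, duration, fs):
    """Return the time axis (s) and the unit-amplitude sweep signal."""
    t = np.arange(int(round(duration * fs))) / fs
    return t, np.sin(sweep_phase(t, f_start, f_end, duration))


# Example: a 2 s sweep covering the frequency range shown in the spectra.
t, excitation = log_sweep(100.0, 4000.0, 2.0, 44100)
```

A logarithmic sweep spends equal time in each octave, which is one reason it is a common choice when the response is plotted on a logarithmic frequency axis.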

(24)

Accuracy of modelling and measurements of speech production at the current stage

(25)

Accuracy (1)

Vowel spectrograms ([A], [i], and [u]) of one test subject measured from actual speech during MRI and from 3D prints by “sweeping”.

[Figure: three panels, one per vowel (a, i, u). Horizontal axis: Frequency (Hz), 100 to 4000. Vertical axis: Magnitude (dB), −120 to 0.]

Vertical dashed lines are resonances computed from MRI numerically.
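A comparison of this kind can be put in numbers by locating the peaks of a measured magnitude response and expressing their distance from the computed resonances in semitones. The following is only a sketch of that idea with hypothetical helper names (`pick_peaks`, `semitone_offsets`) and a simple local-maximum rule; it is not the procedure used to produce the figures, and real measurements would typically need smoothing first.

```python
import numpy as np


def pick_peaks(freqs_hz, mag_db, floor_db=-40.0):
    """Frequencies of local maxima lying within |floor_db| dB of the global maximum."""
    freqs = np.asarray(freqs_hz, dtype=float)
    mag = np.asarray(mag_db, dtype=float)
    inner = mag[1:-1]
    is_peak = (
        (inner > mag[:-2])                 # higher than the left neighbour
        & (inner >= mag[2:])               # not lower than the right neighbour
        & (inner >= mag.max() + floor_db)  # ignore maxima far below the top
    )
    return freqs[1:-1][is_peak]


def semitone_offsets(computed_hz, measured_hz):
    """Offset (semitones) from each computed resonance to its nearest measured peak.

    "Nearest" is judged on a logarithmic frequency scale; a positive value
    means the measured peak lies above the computed resonance.
    """
    computed = np.asarray(computed_hz, dtype=float)
    measured = np.asarray(measured_hz, dtype=float)
    log_dist = np.abs(np.log(measured[None, :] / computed[:, None]))
    nearest = measured[log_dist.argmin(axis=1)]
    return 12.0 * np.log2(nearest / computed)
```

Working in semitones rather than hertz makes discrepancies at low and high resonances comparable, since a fixed offset in hertz is a much larger relative error at 300 Hz than at 3000 Hz.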

(26)

Accuracy (2)

This is so nice that we had better magnify it!

[Figure: magnified panel for the vowel i. Horizontal axis: Frequency (Hz), 100 to 4000. Vertical axis: Magnitude (dB), −120 to 0.]

(27)

Environment acoustics

(an epilogue)

(28)

The effect of the surrounding space (1)

The single most significant remaining acoustic source of error is the effect of the surrounding space. This is work in progress.

The environment may be, e.g., open space or the MRI scanner coil.

(29)

The effect of the surrounding space (2)

Resonant sound pressure fields of [a], [i], [u] in a constrained environment.

(30)

Members of the COMSPEECH Research Consortium

Institute of Behavioural Sciences, University of Helsinki

University of Alberta, Edmonton

Department of Mathematics and Systems Analysis, Aalto University, Helsinki

Institute of Dentistry, University of Turku and Turku University Hospital