
(1)

Sound Event Detection in Multisource Environments Using Source Separation

Toni Heittola¹, Annamaria Mesaros¹, Tuomas Virtanen¹, Antti Eronen²

¹Tampere University of Technology, Department of Signal Processing

²Nokia Research Center, Tampere

1st September 2011

(2)

Sound event detection

• Aims at detecting acoustic events in an audio signal

• Predefined event classes = supervised classification

• Estimate the start and end time of each event

(3)

Environmental audio data

• Audio from everyday environments: street, office, grocery store, in a car, etc.

• Application areas of environmental sound event detection: context-aware devices, automatic annotation of videos

(4)

Outline of the presentation

• Sound event detection and environmental audio data

• Monophonic detection system

• Sound source separation based polyphonic detection system

• Evaluation & demonstration

(5)

Monophonic event detection system

A. Mesaros, T. Heittola, A. Eronen, T. Virtanen. Acoustic event detection in real life recordings. In Proc. EUSIPCO 2010.

• HMM classifier

• 61 event classes (e.g. speech, music, beep, car, car door, bird, dog barking, footsteps, keyboard, coughing, …)

• Each class modeled with a 3-state HMM (16 Gaussians per state, MFCC features)

• Train a model for each event class separately, using audio segments that are annotated to include the event

(6)

Monophonic event detection system

A. Mesaros, T. Heittola, A. Eronen, T. Virtanen. Acoustic event detection in real life recordings. In Proc. EUSIPCO 2010.

• To model the whole signal, any event is allowed to follow any event
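As a rough illustration of the training stage described on these slides, the sketch below trains one 3-state, 16-Gaussian HMM per event class on MFCC features. It assumes the hmmlearn and librosa packages; the feature settings, file paths and the segments_per_class mapping are hypothetical placeholders, not the authors' implementation.

import numpy as np
import librosa
from hmmlearn.hmm import GMMHMM

def extract_mfcc(audio_path, sr=44100, n_mfcc=20):
    # MFCC feature matrix, shape (frames, coefficients), for one audio file
    y, sr = librosa.load(audio_path, sr=sr)
    return librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc).T

def train_event_models(segments_per_class):
    # segments_per_class: dict mapping event class name -> list of paths to
    # audio segments annotated to include that event
    models = {}
    for event_class, paths in segments_per_class.items():
        features = [extract_mfcc(p) for p in paths]
        X = np.vstack(features)                  # stacked frames
        lengths = [len(f) for f in features]     # per-segment frame counts
        hmm = GMMHMM(n_components=3, n_mix=16,   # 3 states, 16 Gaussians per state
                     covariance_type="diag", n_iter=20)
        hmm.fit(X, lengths)
        models[event_class] = hmm
    return models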

(7)

Output of the monophonic system

• The output is a sequence of non-overlapping events
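A heavily simplified decoding sketch of how such a non-overlapping event sequence can be produced from the trained class HMMs: score short windows of MFCC frames under each model and merge consecutive identical labels into events. The original system decodes with Viterbi over a network in which any event can follow any event; this window-based approximation, its window length and the frame hop are assumptions for illustration only.

def decode_monophonic(mfcc, models, win=50, hop=50, frame_hop_s=0.023):
    # mfcc: array of shape (frames, coefficients); models: dict of trained GMMHMMs
    # Returns a list of (onset_s, offset_s, event_class) tuples.
    labels = []
    for start in range(0, len(mfcc) - win + 1, hop):
        window = mfcc[start:start + win]
        scores = {c: m.score(window) for c, m in models.items()}  # log-likelihoods
        labels.append(max(scores, key=scores.get))                # best class per window

    events, onset = [], 0
    for i in range(1, len(labels) + 1):
        # close a segment when the label changes or the signal ends
        if i == len(labels) or labels[i] != labels[onset]:
            events.append((onset * hop * frame_hop_s, i * hop * frame_hop_s, labels[onset]))
            onset = i
    return events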

(8)

Non-negative spectrogram factorization based signal separation

• A one-channel input signal is separated into multiple tracks

• NMF-based separation: the magnitude spectrogram matrix is represented as a product of two non-negative matrices

• Represents the signal as a sum of components, each having a fixed spectrum and a time-varying gain

• Unsupervised separation: no prior knowledge about the sounds
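A minimal, unsupervised NMF separation sketch along these lines, assuming librosa, scikit-learn and soundfile. The input file name and the number of components are illustrative choices, and the soft-mask reconstruction is one common way to turn the components back into audio tracks rather than the exact method used by the authors.

import numpy as np
import librosa
import soundfile as sf
from sklearn.decomposition import NMF

y, sr = librosa.load("recording.wav", sr=None)   # hypothetical input recording
S = librosa.stft(y)
magnitude, phase = np.abs(S), np.angle(S)

# Factorize |S| ~= W @ H: columns of W are fixed component spectra,
# rows of H the corresponding time-varying gains.
nmf = NMF(n_components=4, init="random", max_iter=400, random_state=0)
W = nmf.fit_transform(magnitude)                 # (frequency bins, components)
H = nmf.components_                              # (components, frames)

# Reconstruct each component as its own track with a soft mask and the mixture phase.
model = W @ H + 1e-10
for k in range(W.shape[1]):
    mask = np.outer(W[:, k], H[k]) / model
    track = librosa.istft(mask * magnitude * np.exp(1j * phase))
    sf.write("track_{}.wav".format(k), track, sr)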

(9)

Example of separated signals: kitchen

(10)

Example of separated signals: basketball game

(11)

Polyphonic event detection system

• Separation is used as a preprocessing step

• The monophonic recognizer is applied to each separated track separately → events obtained from the tracks are combined

• Training: all separated tracks are pooled into the training data of the annotated event
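A minimal sketch of the detection stage of this pipeline: run the monophonic recognizer on every separated track and pool the resulting event lists. The decode_track callable and the (onset, offset, label) tuple format are hypothetical stand-ins for the recognizer output.

from typing import Callable, List, Tuple

Event = Tuple[float, float, str]   # (onset_s, offset_s, event_class)

def detect_polyphonic(tracks: List[str],
                      decode_track: Callable[[str], List[Event]]) -> List[Event]:
    # Apply the monophonic recognizer to each separated track and pool the
    # events; events from different tracks are allowed to overlap in time.
    events: List[Event] = []
    for track_path in tracks:
        events.extend(decode_track(track_path))
    return sorted(events, key=lambda e: e[0])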

(12)

Acoustic database

• Material for the database was gathered from ten contexts

• basketball game, beach, inside a bus, inside a car, hallway, office, restaurant, grocery store, street and stadium with track and field sports

• Each context is represented by 8 to 14 recordings, for a total of 103 recordings in the database

• In total ~19 hours of audio

• In total ~10,000 annotated events

(13)

Annotations

• Recordings were manually annotated, indicating the start and end times of all clearly audible sound events

• Annotated sound events present in the recordings were grouped into 61 event classes

• Event classes include e.g. speech, laughter, applause, car door, road, dishes, door, chair, music, and footsteps

(14)

Demonstration

(15)

Evaluation metrics

• Detected events are evaluated at the block level, in 30-second blocks

• Precision and recall are calculated inside the blocks and combined into an F-score

• Data divided into 70% training / 30% testing sets, 5 folds
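A small sketch of block-level scoring along these lines, in which events are compared as sets of class labels inside consecutive 30-second blocks; the exact matching rule used here is an assumption rather than a statement of the authors' metric.

def block_fscore(reference, detected, total_duration, block_len=30.0):
    # reference, detected: lists of (onset_s, offset_s, event_class) tuples
    n_blocks = int(total_duration // block_len) + 1

    def labels_per_block(events):
        blocks = [set() for _ in range(n_blocks)]
        for onset, offset, label in events:
            first, last = int(onset // block_len), int(offset // block_len)
            for b in range(first, min(last, n_blocks - 1) + 1):
                blocks[b].add(label)
        return blocks

    ref_blocks = labels_per_block(reference)
    det_blocks = labels_per_block(detected)
    tp = sum(len(r & d) for r, d in zip(ref_blocks, det_blocks))   # correctly detected labels
    det_total = sum(len(d) for d in det_blocks)
    ref_total = sum(len(r) for r in ref_blocks)
    precision = tp / det_total if det_total else 0.0
    recall = tp / ref_total if ref_total else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0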

(16)

Event detection performance (average F-score %)

                 Monophonic   Polyphonic
Overall              28.2         52.6

Context
  Basketball         30.3         68.2
  Beach              23.0         38.7
  Bus                24.4         57.6
  Car                18.8         46.7
  Hallway            37.0         51.1
  Office             30.1         49.7
  Restaurant         25.4         54.2
  Shop               27.7         56.2
  Street             26.4         50.1
  Track & Field      41.7         57.4

(17)

Conclusions

• NMF-based sound source separation can be used for polyphonic event detection

• It significantly improves the performance of a monophonic event detection system

• Prominent sound events can be detected, to some degree, even in diverse real-world environments
