

problem, thus iterative algorithms based on different heuristics are utilized. Related, but slightly easier, is the problem of finding thresholds for continuous-valued attributes when the form of the Boolean function is predefined. Even in this case, the search space is infeasibly large, and gradient-descent-type algorithms are likely to fail due to the discontinuous error surface of a non-differentiable Boolean function.

Among the classification tree learning algorithms, the ID3 algorithm has been developed for nominal-valued data, whereas the C4.5 and CART algorithms can also handle continuous-valued attributes. Algorithms that find an input space partition which perfectly classifies the training data include the Aq (Michalski (1983)) and LAD (Hammer (1986)) -based algorithms and the OCAT-RA1 algorithm (Deshpande and Triantaphyllou (1998)). The Aq and LAD-based algorithms iteratively find conjunctions of the form ⋀_i d_i(x_i) to be combined disjunctively, building a Boolean function in disjunctive normal form (DNF), B = ⋁_j ⋀_i d_ij(x_ij). The OCAT-RA1 algorithm iteratively finds disjunctions ⋁_i d_i(x_i) of a conjunctive normal form (CNF) Boolean function B = ⋀_j ⋁_i d_ij(x_ij).
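The two normal forms can be illustrated with a minimal evaluation sketch (function names are mine, not from the cited algorithms): a DNF function fires if any conjunction is fully satisfied, while a CNF function fires only if every disjunction has at least one satisfied literal.

```python
from typing import List

def eval_dnf(clauses: List[List[bool]]) -> bool:
    # DNF: B = OR_j (AND_i d_ij) -- true if any conjunction is fully true
    return any(all(clause) for clause in clauses)

def eval_cnf(clauses: List[List[bool]]) -> bool:
    # CNF: B = AND_j (OR_i d_ij) -- true if every disjunction has a true literal
    return all(any(clause) for clause in clauses)

# Two clauses of two detector outputs each
outputs = [[True, True], [False, True]]
print(eval_dnf(outputs))  # True: the first conjunction is satisfied
print(eval_cnf(outputs))  # True: both disjunctions contain a true literal
```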

Algorithms specifically designed for combining threshold-tunable detectors, and utilized as reference algorithms in Publication VI, are the Boolean algebra of ROC (receiver operating characteristic) curves, proposed in Oxley et al. (2007), and the iterative exhaustive search algorithm, proposed in Tao and Veldhuis (2009). These algorithms aim at finding thresholds for available tunable detectors d_i(x_i; θ_i), i = 1...I, when combined within a predefined Boolean function, which in the original setting is either disjunctive, B_I = ⋁_{i=1}^{I} d_i(x_i; θ_i), or conjunctive, B_I = ⋀_{i=1}^{I} d_i(x_i; θ_i). At each iteration i of the Boolean algebra of ROC curves algorithm, thresholds for the new combination B_i are selected based on the true positive and false positive rates (tpr and fpr) of the ROC curves of B_{i−1} and d_i. The method is computationally very efficient, but its drawback is the implicit assumption that all the component detectors are conditionally independent of each other. The iterative exhaustive search algorithm similarly, using a predefined Boolean operator, iteratively adds new component detectors into B. For combining d_{i+1} into B_i, an exhaustive search over combinations of all the possible operating points on the ROC curves of B_i and d_{i+1} is performed.
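Under that conditional-independence assumption, the operating point of a combined detector follows in closed form from the component operating points. The sketch below (my own illustration, not the reference implementation) shows the standard probability identities for OR and AND combinations of two independent detectors:

```python
def or_combine(p1, p2):
    # P(d1 or d2) for independent events: p1 + p2 - p1*p2
    return p1 + p2 - p1 * p2

def and_combine(p1, p2):
    # P(d1 and d2) for independent events: p1*p2
    return p1 * p2

# Two detectors at given operating points (tpr, fpr)
tpr1, fpr1 = 0.80, 0.10
tpr2, fpr2 = 0.70, 0.05

# OR raises both rates; AND lowers both rates
print(or_combine(tpr1, tpr2), or_combine(fpr1, fpr2))    # 0.94 0.145
print(and_combine(tpr1, tpr2), and_combine(fpr1, fpr2))  # 0.56 0.005
```

Sweeping such combined operating points over the component ROC curves is what makes the method efficient; the identities fail when the detectors are correlated, which is exactly the drawback noted above.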

In Publication VI, the BOATS algorithm has been proposed for finding thresholds for a DNF Boolean function. The proposed algorithm also tunes the predefined form of the function for best performance. The algorithm is briefly described in the next subsection.

For the evaluation of the BOATS algorithm in Publication VI, the two above-mentioned algorithms are implemented for comparison such that they facilitate learning of a DNF Boolean function.

4.4 Boolean OR of ANDs detector combination

40 Chapter 4. Classifying independent samples

In Publication VI I present a DNF Boolean OR of ANDs (BOA) detector combination and the BOATS algorithm to set its parameters. The BOA is built of an unrestricted number of detectors based on M different modeling functions f_m(x_m), m = 1...M, for detecting a target phenomenon. Each function utilizes its own input feature set x_m and outputs a target likelihood score l_m, that is, f_m(x_m) = l_m, which represents target probability as its monotone transformation. The target likelihood score l_m is compared to a threshold value θ_m, which defines a detector

d_m(x; θ_m) = true, if f_m(x) ≥ θ_m; false, otherwise. (4.15)
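A detector of the form (4.15) can be sketched as a thresholded score function (a minimal illustration; the helper names and the toy modeling function are mine):

```python
def make_detector(f, theta):
    """Sensitivity-tunable detector d(x; theta) in the spirit of Eq. (4.15):
    fires iff the likelihood score f(x) reaches the threshold theta."""
    return lambda x: f(x) >= theta

# toy modeling function: the likelihood score is the input itself
f = lambda x: x
d_strict = make_detector(f, theta=0.9)  # high threshold: few detections
d_loose = make_detector(f, theta=0.2)   # low threshold: many detections
print(d_strict(0.5), d_loose(0.5))  # False True
```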

This detector is sensitivity-tunable, that is, by changing the value of the threshold θ, the detector operates at different operating points. Since the BOA combination is a Boolean function in disjunctive normal form, individual detectors are evaluated within conjunctions of the BOA function, which are then joined together as a disjunction. That is, within a BOA combination, multiple conjunctions of the form ⋀_{m∈z} d_m(x; θ_m), where detectors d_m(x; θ_m), m ∈ z ⊆ {1, ..., M}, are combined with AND operators, are primarily utilized. The conjunctions are joined together disjunctively with OR operators to form the BOA combination as

B(x; θ) = ⋁_{q=1}^{Q} ⋁_{n=1}^{N_q} ⋀_{i=1}^{M_q} d_{z_q(i)}(x; θ_{z_q(i)}^{q,n}), (4.16)

where the detector function identifiers m of d_m are given within lists z_q, q = 1...Q, of M_q identifiers each; e.g., if z_1 = (1), z_2 = (1, 2) and z_3 = (1, 3), then M_1 = 1, M_2 = 2 and M_3 = 2. The number of conjunctions over the same list z_q is denoted by N_q. Every conjunction, enumerated by q = 1...Q and n = 1...N_q, operates with a distinct set of M_q thresholds {θ_m^{q,n} | i = 1...M_q, m = z_q(i)}. Thus the operating point θ of the BOA function (4.16) stands for ∑_{q=1}^{Q} N_q·M_q thresholds, one for each detector instance within the BOA.
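The structure can be made concrete with a small evaluation routine (a sketch under my own naming; `eval_boa`, `z_lists` and `thetas` are illustrative, not from Publication VI):

```python
def eval_boa(scores, z_lists, thetas):
    """Evaluate a DNF BOA combination.
    scores: dict m -> likelihood score l_m = f_m(x_m)
    z_lists: list of lists z_q of detector identifiers m
    thetas: thetas[q][n][i] is the threshold for detector z_lists[q][i]
            in the n-th conjunction over list z_q
    """
    for q, zq in enumerate(z_lists):
        for conj in thetas[q]:  # n = 1..N_q conjunctions over list z_q
            if all(scores[m] >= th for m, th in zip(zq, conj)):
                return True     # one satisfied conjunction decides the OR
    return False

# toy setup matching the text's example: z1 = (1), z2 = (1, 2), z3 = (1, 3)
z_lists = [[1], [1, 2], [1, 3]]
thetas = [[[0.9]],        # N_1 = 1 conjunction over z_1
          [[0.5, 0.7]],   # N_2 = 1 conjunction over z_2
          [[0.4, 0.8]]]   # N_3 = 1 conjunction over z_3
print(eval_boa({1: 0.6, 2: 0.75, 3: 0.1}, z_lists, thetas))  # True, via z_2
```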

Ideally, a possibly infinite BOA combination is capable of producing any monotonic decision boundary between the two classes, the target and the clutter class. The decision boundary of a finite BOA combination is monotonic and piecewise linear in the space (l_1, l_2, ..., l_M) of target class likelihoods. An example of a BOA decision boundary is shown in Figure 4.5.

The BOA combination is sensitivity-tunable, as the values of the thresholds θ_m^{q,n}, q = 1...Q, n = 1...N_q, m = z_q(i), i = 1...M_q, in a set θ may be changed to vary the detection behaviour of the BOA combination. Below, an algorithm is presented to find sets θ_α of threshold values for a BOA combination to account for different sensitivity levels α. That is, the training provides the user with one sensitivity parameter α such that the BOA combination B(x; θ_α) may be denoted as B(x; α).

For the cascade interpretation of the BOA function, which will be explained in Section 5.3, the negation ¬B(x; θ) of the BOA function in disjunctive normal form is needed. The ¬B(x; θ) in disjunctive normal form has K = ∏_{q=1}^{Q} M_q^{N_q} conjunctions, each of which contains ∑_{q=1}^{Q} N_q detectors. That is, for each conjunction of ¬B(x; θ) in the disjunctive normal form, one detector is picked from each conjunction (q, n) of the corresponding BOA B(x; θ). Using a detector indexing function I(k, q, n), which gives the identifier of the detector picked from conjunction (q, n) of B(x; θ) for the k-th conjunction of the negation, the ¬B(x; θ) in disjunctive normal form is given by

¬B(x; θ) = ⋁_{k=1}^{K} ⋀_{q=1}^{Q} ⋀_{n=1}^{N_q} ¬d_{I(k,q,n)}(x; θ_{I(k,q,n)}^{q,n}).
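The conjunction count K follows from De Morgan's laws: negating the DNF gives an AND of ORs, and distributing it back to DNF picks one literal from every conjunction. This can be verified by brute force on a toy BOA (my own illustration; detector outputs are plain booleans):

```python
from itertools import product

# B in DNF as a list of conjunctions, each a list of detector indices
# (toy example: three conjunctions over three detectors)
conjs = [[0], [0, 1], [0, 2]]

# De Morgan: ¬B = AND over conjunctions of (OR of negated detectors);
# distributing to DNF picks one detector from every conjunction of B.
neg_dnf = [list(pick) for pick in product(*conjs)]
print(len(neg_dnf))  # product of conjunction sizes: 1 * 2 * 2 = 4

def eval_B(bits):
    return any(all(bits[i] for i in c) for c in conjs)

def eval_negB(bits):
    # a conjunction of ¬B requires all of its picked detectors to be false
    return any(all(not bits[i] for i in c) for c in neg_dnf)

# exhaustive check over all possible detector outputs
assert all(eval_negB(b) == (not eval_B(b))
           for b in product([False, True], repeat=3))
print("negation verified")
```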


Figure 4.5: Example of data partition with a BOA detector. Data is represented in terms of two scores l_1 and l_2, which represent the likelihood of the target class. Red crosses denote data samples from the target class, and samples representing the non-target class are shown with blue dots. The angular decision boundary of the BOA combination B(x; θ) = d_1(x_1; θ_1^{1,1}) ∨ ⋁_{n=1}^{3} (d_1(x_1; θ_1^{2,n}) ∧ d_2(x_2; θ_2^{2,n})) is shown with the bold line. Each individual detector threshold θ_m^{q,n}, m = 1, 2, q = 1, 2, n = 1, 2, 3, is illustrated with a thin line. The detector score space where B(x; θ) = true is colored with a red background, and the space where ¬B(x; θ) = true is colored with a blue background. The pale red and pale blue backgrounds illustrate the subspaces where the decision is made based on l_1 only, using the model function f_1.

A possibly more comprehensible notation for the ¬B(x; θ) in disjunctive normal form is obtained by using multiple OR operators, each of them referring to one conjunction (q, n) of B, as

¬B(x; θ) = ⋁_{i_{1,1}=1}^{M_1} ⋯ ⋁_{i_{1,N_1}=1}^{M_1} (N_1 OR operators, i.e. M_1^{N_1} conjunctions) ⋁_{i_{2,1}=1}^{M_2} ⋯ ⋁_{i_{2,N_2}=1}^{M_2} (N_2 OR operators, i.e. M_2^{N_2} conjunctions) ⋯ ⋁_{i_{Q,1}=1}^{M_Q} ⋯ ⋁_{i_{Q,N_Q}=1}^{M_Q} (N_Q OR operators, i.e. M_Q^{N_Q} conjunctions) ⋀_{q=1}^{Q} ⋀_{n=1}^{N_q} ¬d_{z_q(i_{q,n})}(x; θ_{z_q(i_{q,n})}^{q,n}).

Now the indices i_{q,n}, which enumerate the disjunctions of ¬B(x; θ), directly give the detector identifier index from a list z_q.

An algorithm to train a BOA combination

In Publication VI I have presented an algorithm which finds sets of thresholds to be used within any Boolean combination of BOA form. The algorithm finds unique sets θ_α = {θ_m^{q,n} | q = 1...Q, n = 1...N_q, m = z_q(i), i = 1...M_q} of thresholds for a set of sensitivity values α ∈ [0, 1] of the BOA. The trained BOA combination may now be referred to as B(x, α) ≡ B(x, θ_α). The details of the algorithm can be found in Publication VI, but here I give a brief description of it.

The algorithm starts by setting all the BOA thresholds in θ of (4.16) to infinity, to account for the setting B(x, 0) where α = 0 and the BOA function does not accept any samples, that is, B(x, 0) = false ∀x. Then the thresholds are lowered such that only one (if possible) training sample representing the target class will be accepted, with as few false positives as possible accepted along with it. The newly obtained set of thresholds θ_α is saved to account for α = t/T, where t is the number of accepted samples out of T training samples representing the target category. The algorithm continues diminishing the thresholds of the BOA step-wise, such that at every iteration exactly one (if possible) new training sample from the target category will be accepted by the BOA, until the thresholds are at a level accepting all the target category samples of the training set.

At each iteration, the found set θ_α of BOA thresholds is saved to account for the BOA operating point α = t/T, where t is the number of accepted training samples from the target category at that operating point. The training algorithm thus produces parameters for a BOA to form a continuum α ∈ [0, 1] of operating points, which account for true positive rates from tpr = 0 to tpr = 1 on the training data.
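The step-wise idea can be sketched for the degenerate case of a single threshold detector, where "accept one more target sample" simply means lowering the threshold to the next-highest target score. This is only a sketch of the principle; the full BOATS algorithm additionally chooses which of the many BOA thresholds to lower so as to minimize false positives:

```python
import math

def train_thresholds(target_scores):
    """Start from theta = inf (accept nothing), then lower theta so that
    exactly one new target sample is accepted per step, saving the
    threshold for each sensitivity level alpha = t/T."""
    T = len(target_scores)
    ordered = sorted(target_scores, reverse=True)  # accept highest scores first
    thetas = {0.0: math.inf}                       # alpha = 0: B accepts nothing
    for t, score in enumerate(ordered, start=1):
        thetas[t / T] = score  # threshold accepting exactly t target samples
    return thetas

thetas = train_thresholds([0.9, 0.4, 0.7, 0.2])
print(thetas[0.25], thetas[1.0])  # 0.9 0.2
```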

5 Sequential classification

Sequential decision making means that the decision making process is step-wise. The idea is that after each step of the process there are multiple possible ways to continue, and the operation performed at the next processing step is determined by the result of the operation at the current step. A sequential decision making process may be illustrated as a flowchart similar to a decision tree. The blocks of the flowchart, corresponding to the nodes of a decision tree, consist of functions which either make the decision about the next step or provide the output, i.e. the final conclusion, which is denoted as a leaf in terms of a decision tree.

However, while classification trees commonly use univariate functions at each node, any kind of decision making functions may be used within a sequential decision making process.

The computational advantage of the sequential classification strategy over arithmetic classifier ensembles is that all the features (attributes) might not always be needed for the classification. That is, for many input samples the sequential classification process may proceed such that the classification is done by evaluating only a few decision making functions based on only a few attributes. The evaluation of the rest of the decision making functions in the system is thus avoided, and the computation of some features (attributes) may be omitted.

Boolean classifier combinations utilize the Boolean operators OR (∨), AND (∧), and NOT (¬) for a combination function of multiple binary classifiers, i.e. detectors d(x). Every Boolean function may be expressed as a binary decision tree, whose step-wise evaluation may yield the classification before all the component detectors of the Boolean function have been evaluated. For example, with a Boolean combination B = d1(x) ∨ d2(x) of two binary classifiers d1(x) and d2(x), the result B = true can be released as soon as the detector d1(x) outputs true. Similarly, the Boolean combination B = d1(x) ∧ d2(x) is decided without evaluating d2(x) if d1(x) = false.
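This short-circuit behaviour is exactly what the `or` and `and` operators of many programming languages provide; a minimal sketch (toy detectors of my own making):

```python
def d1(x):
    print("d1 evaluated")
    return x > 0

def d2(x):
    print("d2 evaluated")  # never printed when d1 already decides
    return x > 10

# Python's `or` and `and` short-circuit exactly as described:
B_or = d1(5) or d2(5)      # d1 is true  -> d2 is skipped
B_and = d1(-1) and d2(-1)  # d1 is false -> d2 is skipped
print(B_or, B_and)  # True False
```

Only the two "d1 evaluated" lines are printed, showing that the second, possibly expensive, detector is never run when the first one decides the combination.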

In the following, the experiments and results of Publication IV on a sequential classification process using Boolean combinations are discussed in Section 5.1. Then, in Section 5.2, the principle of a classification cascade as a sequential classification strategy is explained and literature on detection cascades is addressed. The cascade classification process of a BOA combination is explained in Section 5.3, and in Section 5.4, the experiments and results of Publication VI using a BOA cascade on a laughter vs. speech classification task are discussed.



5.1 Experiments on Boolean combinations for sequential decision making

In Publication IV I have demonstrated how a sequential classification process with Boolean combinations reduces the computational load of classification and improves the classification accuracy over the individual detectors. The sequential classification process is experimented on lifelog video material of our CASA2 dataset for a video context change detection task. The dataset contains over 7 hours of video data from 23 different types of environments. The videos have been shot with a small spy camera, and the stereo sound tracks are recorded by a pair of in-ear microphones. Video context change detection is a well studied problem with mature solutions. However, our experiments focused on examining the computational efficiency of sequential processing using two signal modalities rather than improving the state-of-the-art solutions for video context change detection.

In the work I have utilized three methods, f_A, f_V1 and f_V2, to provide context change likelihood information from videos. The context change likelihoods are thresholded to obtain three sensitivity-tunable detectors d_A(x; θ), d_V1(x; θ) and d_V2(x; θ), as in (4.15).

One or multiple instances of the three sensitivity-tunable detectors at different operating points θ are combined as a BOA combination. It should be noted that evaluating multiple detectors d_m based on the same context change likelihood method f_m(x) within a BOA combination is virtually free. This is because the heavy part of a detector evaluation is the computation of the likelihood f_m(x) = l_m, which is done when the first one of the detectors d_m is run, and may then be saved for later use.
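This save-and-reuse of the likelihood score amounts to memoizing the modeling function; a sketch with Python's standard-library cache (the stand-in function and threshold values are mine):

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def f_expensive(x):
    # stand-in for a heavy likelihood computation; with caching it runs
    # once per input, so further detectors on the same score are
    # virtually free
    print("computing likelihood for", x)
    return x / 10.0

def make_detector(theta):
    return lambda x: f_expensive(x) >= theta

d_a = make_detector(0.3)
d_b = make_detector(0.7)  # second detector on the same modeling function
print(d_a(5), d_b(5))     # likelihood computed only once, prints True False
```

Running this prints "computing likelihood for 5" a single time, even though two detector instances consult the same score.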

One of the context change likelihood extraction methods operates on the audio stream of the video, and the two others operate on the image frames of the video. The method f_A, based on the audio stream of the video, is the fastest of the three, and it provides a context change likelihood f_A(x_A) = l_A for all the moments of the video in 0.01 times real time on a desktop PC. The second method f_V1, providing a context change likelihood f_V1(x_V) = l_V1 for each video frame, utilizes RGB histograms of the video image frames. It takes 29 ms to extract the RGB histogram from one video frame, that is, 0.43 times real time for the videos with 15 frames per second, on the used desktop PC. The heaviest of the used methods, f_V2, which is based on SIFT BoW features, provides a context change likelihood f_V2(x_V) = l_V2 for each video frame. It takes 12.3 s, that is, 184 times real time, to compute the SIFT BoW features for one video frame on the same desktop PC. Detailed descriptions of the methods can be found in Publication IV. The operating points, defined by the thresholds, of all the detectors within each of the tested Boolean combinations are found utilizing a preliminary version of the BOATS algorithm, which was introduced in Section 4.4.

The most important results of Publication IV for video context change detection are presented in Table 5.1. The results are presented in terms of the F1 score and the computation time with respect to real time, both averaged over the used videos. The best achieved performance is reported for each of the three detectors individually, as well as for some Boolean combinations of them.

Comparing the results with Boolean OR (∨) and AND (∧) combinations, we see that conjunctive combinations with ∧ are far more computationally efficient than disjunctive combinations with ∨. With the pairs (d_A, d_V1) and (d_A, d_V2) of detectors, the difference in classification speed between the disjunctive and conjunctive combination is 0.44/0.02 = 22-fold and 184/0.3 = 613-fold, respectively. The difference in computational speed is due to the heavily unbalanced class distribution, the 'no context change' class being
