4.3 Analysis methods
4.3.3 Sleep scoring
The following algorithm introduced by Gurevicius and coworkers  was used in the analysis of vigilance states of APP/PS1 mice. Movement state was assigned to parts of signal with electromyography (EMG) amplitude above the mean. Increase in EEG alpha/gamma ratio (10 – 16 Hz / 30 – 80 Hz) above the mean was assigned to non-REM sleep. If between two non-non-REM epochs this ratio stayed above 25% of the smallest value, the epochs were merged. REM sleep was assigned to parts of the signal with the theta/delta ratio (7 – 9 Hz / 1 – 4 Hz) two times of standard deviation (SD) above the mean. These epochs were merged if this ratio between them remained above the mean. Also, 4 s before this REM epoch had to be already assigned as non-REM sleep, otherwise this and all the other parts of the signal were considered as quite wakefulness.
Another algorithm performed on C57BL/6 mice by Brankačk and his coworkers 
considered cutting of signal with unspecified high EMG amplitude as movement, and when combined with EEG of low amplitude it was judged as quiet waking and with regular theta waves as active waking. Non-REM sleep was assigned to parts with low EMG activity with high delta (1 – 4 Hz) wave intensity, while REM sleep was assigned to parts with simlarly low EMG activity, but with regular theta (4.5 – 8.5 Hz).
23 4.3.4 Entropy
Entropy algorithms are based on nonlinear system theory and commonly used for monitoring the depth of anesthesia as they evaluate the randomness of the signal. For our analysis I calculated a trio of wavelet entropies and approximate entropy.
Wavelet energy is obtained by firstly calculating the relative wavelet energy at each scale 𝒑𝒋. Then, Shannon entropy (𝑺𝑾𝑬), Renyi Entropy (𝑹𝑾𝑬) and Tsallis entropy (𝑻𝑾𝑬) is computed using the following forms (17), where 𝒒 is a non-extensity parameter and 𝒂 is a scale of possible entropy measure .
Approximate entropy was used to describe the unpredictability of the signal. Its definition for a time scaled signal 𝒙(𝒊) of length 𝑵 is in the following equations (18).
Where 𝑪𝒊𝒎(𝒓) represents the probability that any vector 𝒙(𝒊) is within distance (tolerance) 𝒓 of vector 𝒙(𝒋). And finally, 𝜽 is the Heaviside function, 𝒎 is embedding dimension and 𝑨𝒑𝑬𝒏 is the approximate entropy .
𝑺𝑾𝑬 = − ∑ 𝒑𝒋𝐥𝐨𝐠 𝒑𝒋
(17) 𝑻𝑾𝑬 = 𝟏
𝒒 − 𝟏∑ [𝒑𝒋− (𝒑𝒋)𝒒]
𝑹𝑾𝑬 = 𝟏
𝟏 − 𝒂𝐥𝐨𝐠 [∑ (𝒑𝒋)𝒂
24 4.3.5 Statistics
The goal of this study is to detect the phenomena within the whole signal. The final qualitative evaluation of the algorithm is, therefore, explained in the sensitivity and the precision of detection of previously manually annotated events.
The sensitivity also called true positive rate (𝑻𝑷𝑹) is defined as a number of correctly marked events (true positive (𝑻𝑷)) divided by the number of total events to be positively annotated (positive (𝑷)). The precision also described as positive predictive value (𝑷𝑷𝑽) is defined as a number of correctly marked events (𝑻𝑷) divided by the all events marked as positive by the algorithm (sum of 𝑻𝑷 and false positive (𝑭𝑷)) (Fawcett 2006).
𝑷 × 𝟏𝟎𝟎%
(19) 𝑷𝑷𝑽 = 𝑻𝑷
𝑻𝑷 + 𝑭𝑷 × 100%
Standard deviation (SD,𝝈) used in this study was calculated classically using the following equation. It represents how the variables (𝒙𝒊) are spread within the observed dataset with 𝑁 observations.
𝑪𝒊𝒎(𝒓) = 𝟏
𝑵 − 𝒎 + 𝟏 ∑ 𝜽(𝒅𝒊𝒋𝒎− 𝒓)
(18) 𝒅𝒊𝒋𝒎 = 𝒅[𝑿𝒊𝒎, 𝑿𝒋𝒎] = 𝒎𝒂𝒙(|𝒙(𝒊 + 𝒌) − 𝒙(𝒋 − 𝒌)|)
𝜽𝒎(𝒓) = (𝑵 − 𝒎 + 𝟏)−𝟏∑𝑵−𝒎+𝟏𝒊=𝟏 𝐥𝐧 𝑪𝒊𝒎(r)
𝜽𝒎+𝟏(𝒓) = (𝑵 − 𝒎)−𝟏 ∑ 𝐥𝐧 𝑪𝒊𝒎+𝟏
𝑨𝒑𝑬𝒏(𝒎, 𝒓, 𝑵) = 𝜽𝒎(𝒓) − 𝜽𝒎+𝟏(𝒓)
25 𝝈 = √ 𝟏
𝑵 − 𝟏∑(𝒙𝒊− 𝝁)𝟐
Where 𝝁 is mean of 𝒙 .
Box plots are graphical representation of observation distribution within a dataset.
They consist of five components, which are shown in Figure 6. Upper and lower fourth represent 75 % and 25 % quartile, respectively. Upper and lower extremes lie at certain multiple of the inter-fourth range from the median. In this study it was 2.7. If in the dataset consists only of observations that do not exceed these extreme values, then the extreme values of box plot equal to observation extremes. However, if there are observations exceeding these values, they are considered to be outliers .
Figure 6. Boxplot representation with graphic element labels on the left and corresponding statistical terms on the right. Picture taken from .
5.1 Defining the gold standard
First of all, we had to define a unified file format and identification that was decided to be 1 file for one 3 h session of recording of one mouse. All files had 8 channels. The first two were recordings from the left and right screw electrode. The 8th was an EMG signal and the others were channels used for different research projects and were not utilized in this study.
From a database of 230 3-h recordings I randomly chose 9 files to be annotated. Three from each age group. First group of youngest animals were around 3 months of age.
Second group of around 4 – 5 month of age and the last group of old mice with the average age during the recording of around 14 – 15 months.
I created a graphical user interference (GUI) EventScreene.m in Matlab R2017b to grant manual rejection or approval of candidate events. The main screen of this program is shown in Figure 7 and contained the following numbered objects: 1. Trace displaying raw EEG signal with highlighted candidate SWD event using red color. 2.
Trace with EMG signal to support decision making. 3. Trace with filtered EEG by bandpass 6 – 45 Hz, as another way to support decision making by subtracting drift and possible high-frequency noise, including 50 Hz power-line interference. 4.
Graphical presentation of the channels the application is currently working with. 5.
Three checkboxes representing the decision made by the expert. 6. Button to shift all the traces 1 s backwards to allow observation of distant event surrounding. 7. Button to return to the previous candidate event. 8. Button to move to the next candidate event.
9. Button with same function as button 6, but it moves all the traces 1 s forward. 10.
Editable box displaying the current index of the evaluated event together with the total number of candidate events. 11. Button for choosing the file to analyze showing also the file name. 12. Button to save the results of the manual annotation.
Figure 7. Screenshot of GUI EventScreener.m designed for the screening of the candidate events.
Before the annotation performed independently by two persons, an agreement about minimal requirements for the candidate event to be judged as an SWD took a place.
Until this point, the minimum duration for an SWD in the rat EEG was defined as 1 s.
However, implementing such a threshold for mouse EEG would have excluded a major part of the SWDs since their average duration in mice according to Jin and coworkers  is 0.5 – 3 seconds. This led to the criterion of at least 4 whole spikes with 3 waves between them for each SWD event. Sitnikova and colleagues  stated that mean frequency of an SWD is 9.14 ± 0.5 Hz, which is supported also by Richard and others , who studied mouse models of human absence epilepsy. Three waves are therefore approximately 330 ms long. Another requirement was more related to the emphasis of the difference between an SWD and a sleep spindle. Whereas a sleep spindle is symmetric around the 0 C line, the SWD was required to have larger negative spikes than positive waves in absolute values.
Candidate events for the human evaluation were picked from the band-pass filtered signal 6 – 45 Hz using a 5th order Butterworth filter. The candidates were detected by grouping the neighboring negative peaks exceeding 2.6 SD below the mean. These events were then used as an input to previously described GUI EventScreener.m.
Table 2. Matches between two following annotation procedures of one expert and the comparison between the evaluation of both experts.
Mouse No. Intra-rater 1 [%]
Match between the experts [%]
1 89.32 93.20
2 95.44 90.87
3 93.36 88.27
4 82.36 81.51
5 81.62 82.59
6 91.54 83.46
7 92.63 82.11
8 100.00 98.55
9 94.70 91.39
Average 91.22 88.00
One expert annotated each signal two times and only the intersection was then compared with the annotation of another expert creating interpersonal match. This was considered the final result of manual annotation. Percentual matches are shown in Table 2. It is visible that the overall match is not excellent, which proofs the lack of actual definition of SWD. Moreover, an intra-rater comparison reveals a significant mismatch suggesting that even one expert can change her/his opinion on the judgement if the time between the individual annotations is one day or longer. Problems seemed to be a decrease in spike amplitude interfered by low frequency drift artefact or
irregularity of the wave. In general, as expected, the uncertainty of detection increases as the features of an event move further away from the representative example.
5.2 Extracted parameters
Table 3. All parameters extracted from the 4 wave long cuts of EEG candidates. The exact definition of the variables follows in the text.
Time scale Frequency scale
Linearity of the
of wave to spike Spectral profile Mean envelope of relative EEG
The novel idea was to describe aa SWD event by defining its graphical patterns. From previous observations of these phenomena is known that its main spectral components are within the range of 7 – 12 Hz [2, 33]. Therefore, only the neighboring spikes at a distance of 0.08 – 0.14 s were considered. This temporal condition represents a
frequency range of 7.1 – 12.5 Hz. The locations of the spikes were extracted from band-pass (BP) filtered signal 6 – 45 Hz by a 5th order Butterworth filter avoiding low frequency drift and 50 Hz noise. For setting of the amplitude threshold the filtered signal was split into 10 parts of equivalent length for which 2.6 times SD below the mean of the particular part of the signal was calculated. A median of 10 values was calculated and used as the final threshold. That corresponds to approx. 130 – 210 µV below the zero-line depending on the signal nature. Detected spikes were grouped into individual events by setting the distance between them according to the above mentioned temporal condition. This prevented an unwanted detection of spiky phenomena that have not developed longer positive wave part. Groups of fewer than 4 spikes (3 waves) were omitted. The resulting groups represented first SWD candidates, and the following parts of the algorithm were performed on them. All parameters used for analyzing are listed in Table 3.
5.2.1 Time scale parameters
Next, I developed a method to describe the linearity and gradient of ascending and descending parts of the spike. For this purpose, I found helpful to compare them to a linear line. As shown in Figure 8a, linear lines rise from local extreme representing the bottom of the spike. From each sample of this line I calculated the distance to the spike and at the end expressed how big part of line was closer than two samples from the spike in percentage. Moreover, the shift of the line to negative direction had to be implemented (Figure 8b), because of filtering that unsharpened all the extremes including spikes and therefore rounding can be observed. Also, for this reason, the first 4 samples of a spike were not considered. The final variables that were chosen were steepness (first derivative) and match.
Figure 8 a) The blue curve is one detected wave, of which parameters are investigated. Colored linear lines of different steepness are compared to ascending part of the spike. b) The detected
wave and linear lines of same steepness but shifted to compensate filtering activity.
For each spike I calculated its relative amplitude. Amplitude ratio between preceding/following wave and a spike was computed because it was expected that the wave amplitude is smaller than the absolute value of spike amplitude. Also, the width of the wave and the spike was compared at zero and at half spike amplitude. Another important parameter to express randomness of the signal was approximate entropy (4.3.4).
5.2.2 Frequency scale parameters
The first set of the frequency scaled parameters was calculated on a BP filtered signal using the 5th order Butterworth filter between 5-85 Hz with a notch filter on 50 Hz. As the judgement during the manual annotation was that an event must have at least 3
SWD waves, each candidate event was split into 3 wave long (4 spike long) cuts.
Therefore, the following parameters were extracted for a group of adjacent 3 waves.
To express what is the overall intensity of EEG, EEG (in theta range) and EMG, the average peak envelope was calculated.
Additional frequency scale parameters were extracted from filtered signal by CWT using a morlet wavelet and a range between 5-85 Hz. Furthermore, this spectrum was binned into 80 bins with 40% overlap. This was performed at each 3 wave candidate EEG cut. To involve background (BG) into investigation, 1000 (~0.5 s) sample long EEG cut 0.25 s before each event was considered. If this background sample overlapped with another SWD candidate, 1000 samples before that candidate were considered and so on. From this BG sample, I estimated the CWT spectrum. This CWT spectrum of an event was used for calculation of SWE, TWE, RWE. Moreover, it was used for calculation of spectral profile of an event and its preceding background using formula (21), where 𝑵 is the number of spectral bins and 𝑿𝒊 is one relative spectral bin component.
For calculation of the harmonicity of candidates, I searched for each 3 waves cut a maximum spectral intensity within the range of 6 – 12 Hz. This was considered to be the fundamental frequency. For searching of the harmonics, autocorrelation of spectrum was performed with the following search of peaks. Relative spectral intensities in frequencies representing the first 5 harmonic components were extracted.
Moreover, a minimum intensity from the range 12 – 15 Hz was extracted as this range should present only sleep spindles.
5.2.3 Characterizing three waves candidate event cuts
The agreement during manual annotation was that an event must have at least three SWD waves. However, most candidate events were longer than that and this
annotation provides only information that within the event there are in unspecified locations at least three SWD waves. For this reason, each candidate event was represented by three waves that had the highest harmonic energy (Sum of relative harmonic energy) and had all waves positive. The distributions for all variables were manually checked and according to them it was decided which range of values is characteristic for SWDs. Each iteration of this step excluded variable values outside of the range. The whole process of manual distribution evaluation is graphically presented in following set of figures Figure 9 - Figure 16. The variables that were not utilized in this step did not show significant difference when including the outliers in distribution between SWDs and other candidates.
Figure 9. Distribution of average ratio of wave to spike width at zero line. Class 1 states annotated SWDs and class 0 other
Average ratio of wave to spike width
at zero line
The width of the wave is larger than the width of the spike. This is one of the graphical time space definitions.
Figure 10. Distribution of Event Spectrum profile. Class 1 states annotated SWDs and class 0 other candidates.
Event Spectrum profile
The profile spectrum represents overall EEG energy of the events. The spectrum profile is the only non-relative variable; however, excluding candidates with very low values should not present a problem with another datasets.
Figure 11. Distribution of average ratio of wave to spike width at half spike amplitude. Class 1 states annotated SWDs and
class 0 other candidates.
Average ratio of wave to spike width at half
The width of the wave at half spike amplitude is larger than the width of the spike.
Together with zero width variable it describes the graphical pattern of the phenomena.
Figure 12. Distribution of average ratio of wave to spike amplitude. Class 1 states annotated SWDs and class 0 other
Average ratio of wave to spike
The average wave-to-spike amplitude describes the graphical pattern of a spike being more negative than surrounding waves. This can be taken as another general rule of SWD classification.
Figure 13. Distribution of SD of Spike to Spike duration in samples. Class 1 states annotated SWDs and class 0 other
SD of Spike to Spike duration in samples
< 70 (~ 34 ms)
SD of spike-to-spike duration has to be within reasonable range. For instance, 34 ms represents a quite high number as spike-to-spike duration is between 80-120 ms. It is
also important to include some less regular examples and also not too small SD is actually vital, as SWD tends to decrease in frequency over time within one event.
Figure 14. Distribution of relative harmonic intensity as sum of 2.-5. harmonic. Class 1 states annotated SWDs and class 0
Relative harmonic intensity (2.-5. harmonic)
The relative harmonic intensity is as expected raised compared to other candidates, which was also the observation of other studies [39, 44].
Figure 15. Distribution of approximate entropy. Class 1 states annotated SWDs and class 0 other candidates.
The approximate entropy of SWD events is low which is due its regularity.
Development of SWD is temporarily reducing the randomness feature of EEG and can be used for its detection and description.
Figure 16. Distribution of background spectrum profile. Class 1 states annotated SWDs and class 0 other candidates.
Background spectrum profile
The preceding BG profile spectrum represents overall EEG energy preceding an SWD.
The spectrum profile is the only non-relative variable; however, excluding candidates with very high value should not present a problem with another dataset. These high values most likely represent deep sleep, where SWD should not be present.
At the beginning of this variable distribution evaluation there were 5844 candidates out of which 336 were confirmed as SWDs. At the end all SWDs were still present, but out of only 1875 candidates. Even though this step significantly reduced the number of candidates, it is not by any means a final classification tool. This simple distribution comparison failed as classification method using the above mentioned variables.
It was expected that sleep scoring algorithms would help to exclude a big part of candidates within the REM sleep, non-REM sleep and mobility. However, both tested algorithms could not separate a stage without any annotated SWDs in it. Therefore, the conclusion was that they do not provide sufficient time resolution. For the final classification RF method is introduced in the next chapter.
5.3 Random forest (RF) and its proximity matrix
For the training of the RF of 1000 trees, I randomly chose 75% of the data as teaching and the remaining 25% as validation dataset. Firstly, for training of the classification trees, I used all the variables and predictor importance using random oob permutation was calculated. Followingly, were classification trees retrained but using only variables with importance higher than 0.5. A validation dataset was tested and resulted with highly sub-ideal sensitivity of 30.4 % and precision 79.7 %. The variables scoring in variable importance higher than 0.5 are listed below:
o Relative harmonic intensity (2.-5. harmonic) o Event spectral profile
o SD wave to spike amplitude ratio o Approximate Entropy
o Average spike width at half-spike amplitude o Average ratio of wave to spike width at zero line o Average relative spike amplitude
o Average of ratio of:
Wave to spike width at half spike / Wave to spike zero width
40 o SD of ratio:
Wave to spike width at half spike / Wave to spike zero width
However, a look at the proximity matrix of the classification revealed an interesting distribution, where most of the annotated SWDs had very near proximity. Inspired by this, I created a new variable called proximity ratio. Proximity ratio can be expressed for each observation as an average proximity to 100 closest annotated SWDs divided by the average proximity to 500 closest annotated non-SWD candidates.
It can be seen from Figure 17 that annotated SWDs of the teaching dataset are not overlapping with the distribution of other candidates if outliers are excluded. The distribution of the validation dataset in Figure 18 is not that clear and SWDs are significantly overlapping with the other candidates. Both validation and teaching datasets have many outliers in non-SWD group.
Figure 17. Distribution of proximity ratio for teaching dataset. Class 1 states annotated SWDs and class 0 other candidates. Bottom distribution is zoomed from upper one.