

2.3 WLAN Positioning

2.3.3 Probabilistic Modeling

A further sophistication of location fingerprinting is the probabilistic modeling of the signal space. Initially this probabilistic model was described through histograms or Gaussian kernels [RMT+02], but later the Gaussian density function became the more popular description [HFL+04, YA05]. In practice this approach consists of storing not only the average signal strength value, but the estimated standard deviation as well. This description allows determining the probability of a newly measured value $s_m$ through

$$P(s_m \mid L) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{(\mu - s_m)^2}{2\sigma^2}\right), \tag{2.3}$$

where µ and σ correspond to the average and standard deviation of the signal in location L, respectively. Though this Gaussian description is to some extent in conflict with the finding in [KK04] that measurements tend to have a non-Gaussian skew, it was found early on that a Gaussian assumption helps smooth out temporal variations and missing values in the measurements [HFL+04].
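As a minimal sketch of Equation 2.3, the following Python snippet estimates the mean and standard deviation from a handful of calibration samples and evaluates the Gaussian density for a new measurement. The function names and the sample values are hypothetical and chosen only for illustration; they are not taken from the cited systems.

```python
import math
from statistics import mean, stdev


def fit_gaussian(samples):
    """Return (mu, sigma) estimated from the RSS samples (dBm) of one location."""
    return mean(samples), stdev(samples)


def gaussian_likelihood(s_m, mu, sigma):
    """Equation 2.3: density of the measured value s_m under N(mu, sigma^2)."""
    norm = math.sqrt(2.0 * math.pi) * sigma
    return math.exp(-((mu - s_m) ** 2) / (2.0 * sigma ** 2)) / norm


# Hypothetical calibration samples for a single access point at one location.
samples = [-62.0, -60.0, -59.0, -61.0, -58.0]
mu, sigma = fit_gaussian(samples)

# A measurement close to the stored mean scores higher than a distant one.
p_near = gaussian_likelihood(-60.0, mu, sigma)
p_far = gaussian_likelihood(-75.0, mu, sigma)
print(mu, sigma, p_near, p_far)
```

Only two numbers per access point and location need to be stored, which is the reduction in model complexity that the Gaussian description offers over histograms.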

This description also has the benefit of decreasing the complexity of the probability model [HFL+04], which can have a significant impact not only on storage requirements, but also on processing speed on CPU- and energy-constrained mobile devices, the typical end clients in WLAN positioning systems.

Given this model of the signal for location L, the position estimate is then typically modelled using Bayesian inference [RMT+02]:

$$P(L \mid s_m) = \frac{P(s_m \mid L)\,P(L)}{P(s_m)}. \tag{2.4}$$

Here $P(s_m \mid L)$ corresponds to the likelihood of the signal given the location, the fundamental component of a probabilistic positioning system, i.e. Equation 2.3 in the Gaussian approach. Though not a strict requirement in approaches satisfied with the location of maximum likelihood, this quantity is usually normalized by the likelihood of the data, in this context the measured signal strength sample, $P(s_m)$, to provide a probability distribution over locations. A typical measure of this likelihood is $\sum_{i=1}^{m} P(s_m \mid L_i)$, the sum of the likelihood of the same signal over all $m$ location candidates.
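The normalization in Equation 2.4 can be sketched as follows. The location names and model parameters are hypothetical. The evidence is computed here as the prior-weighted sum of the likelihoods (law of total probability), which under a uniform prior gives the same posterior as the unweighted sum mentioned above.

```python
import math


def gaussian_likelihood(s_m, mu, sigma):
    """Equation 2.3: Gaussian density of the measurement s_m."""
    norm = math.sqrt(2.0 * math.pi) * sigma
    return math.exp(-((mu - s_m) ** 2) / (2.0 * sigma ** 2)) / norm


def posterior(s_m, models, priors=None):
    """Equation 2.4 for every location.

    models: dict mapping location -> (mu, sigma)
    priors: dict mapping location -> P(L); uniform if omitted (an assumption).
    """
    if priors is None:
        priors = {loc: 1.0 / len(models) for loc in models}
    joint = {
        loc: gaussian_likelihood(s_m, mu, sigma) * priors[loc]
        for loc, (mu, sigma) in models.items()
    }
    evidence = sum(joint.values())  # P(s_m)
    return {loc: p / evidence for loc, p in joint.items()}


# Hypothetical single-access-point models for three locations (dBm).
models = {
    "office": (-55.0, 3.0),
    "corridor": (-65.0, 3.0),
    "lobby": (-80.0, 3.0),
}
post = posterior(-56.0, models)
best = max(post, key=post.get)
print(best, post)
```

Dividing by the evidence does not change which location ranks highest; it only turns the scores into a distribution that sums to one.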

For the sake of simplicity, the signal strengths of the individual access points are often treated (in a naïve Bayes sense) as conditionally independent given the location, meaning the likelihood can be computed for each location as the product of the per-access-point probabilities. Arguably, this independence assumption is violated when the user moves [RMT+02], but in practice it has served the task well.
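A minimal sketch of this product over access points is given below. It sums log-likelihoods rather than multiplying densities, a common precaution against numerical underflow that is not prescribed by the cited works. The radio map, access point identifiers and values are hypothetical, and access points missing from a model are simply skipped, which is a simplification.

```python
import math


def log_gaussian(s_m, mu, sigma):
    """Logarithm of the Gaussian density in Equation 2.3."""
    return (-0.5 * math.log(2.0 * math.pi) - math.log(sigma)
            - (mu - s_m) ** 2 / (2.0 * sigma ** 2))


def log_likelihood(measurement, model):
    """Sum per-access-point log-likelihoods (naive Bayes independence).

    measurement: dict mapping access point id -> measured RSS (dBm)
    model:       dict mapping access point id -> (mu, sigma)
    Access points absent from the model are ignored (simplifying assumption).
    """
    return sum(
        log_gaussian(rss, *model[ap])
        for ap, rss in measurement.items()
        if ap in model
    )


# Hypothetical radio map with two locations and two access points.
radio_map = {
    "room_a": {"ap1": (-50.0, 4.0), "ap2": (-70.0, 4.0)},
    "room_b": {"ap1": (-70.0, 4.0), "ap2": (-50.0, 4.0)},
}
measurement = {"ap1": -52.0, "ap2": -68.0}

scores = {loc: log_likelihood(measurement, m) for loc, m in radio_map.items()}
best = max(scores, key=scores.get)
print(best, scores)
```

Because the logarithm is monotonic, the location with the highest summed log-likelihood is also the one with the highest product of probabilities.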

A maximum likelihood approach could at this stage provide a position estimate as the location with the highest likelihood (or probability, if normalized). Usually, however, added context can be ascribed by taking into account the prior probability of the location P(L). A uniform prior would essentially provide no information, but the distribution could also be initialized based on personal behavior [CCKM01]. In other words, the system could be primed to locate users based on their specific day-to-day movement patterns. Another alternative is to use the previous location probability as the prior for the next inference step, in a hidden Markov model (HMM) sense. This has the effect of improving the tracking accuracy, i.e. the accuracy with which the location is updated as the user moves.
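The HMM-style use of the prior can be sketched as one predict-and-update step of a discrete Bayes filter. The three locations, the transition probabilities and the likelihood values are hypothetical; the cited systems may define their motion models differently.

```python
def predict(prev_posterior, transition):
    """Propagate the previous posterior through the motion model.

    transition[a][b] = P(L_t = b | L_{t-1} = a)
    """
    locations = list(prev_posterior)
    return {
        b: sum(prev_posterior[a] * transition[a][b] for a in locations)
        for b in locations
    }


def update(prior, likelihood):
    """Equation 2.4 with the predicted distribution as the prior P(L)."""
    joint = {loc: prior[loc] * likelihood[loc] for loc in prior}
    total = sum(joint.values())
    return {loc: p / total for loc, p in joint.items()}


# Hypothetical corridor A - B - C: A and C are not directly connected.
transition = {
    "A": {"A": 0.8, "B": 0.2, "C": 0.0},
    "B": {"A": 0.1, "B": 0.8, "C": 0.1},
    "C": {"A": 0.0, "B": 0.2, "C": 0.8},
}
prev = {"A": 1.0, "B": 0.0, "C": 0.0}           # user was known to be in A
likelihood = {"A": 0.02, "B": 0.05, "C": 0.05}  # signal alone favours B and C equally

prior = predict(prev, transition)
post = update(prior, likelihood)
print(post)
```

In this example the signal likelihood alone cannot separate B from C, but the prior rules out C because it cannot be reached from A in a single step.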

The increase in accuracy through the use of probabilistic modeling was quickly apparent across contemporary publications. [RMT+02] reached a median accuracy of approximately 1.5 meters, whereas [YA05], through additional improvements such as an autoregressive model of sequential signal strength samples and an access point clustering module, achieved a median accuracy of around 0.5 meters. In this work, as in many others, validation was performed in an office environment and not in complex everyday environments where the environment topology provides fewer constraints. In environments with more open spaces, such as supermarkets or malls, the correspondence between the defined environment topology and the spatial variability of the signal might no longer apply.

Enabling a probabilistic model typically involves either using human context labels like rooms or discretizing the environment in a uniform way in order to define the location primitive over which to calculate summary statistics. This is traditionally a manual effort that only scales well for uniform discretization, which in turn has to contend with each location inevitably having different propagation characteristics. Two potential causes of uncertainty are at risk of developing at this stage. First, if a location with multiple modes in signal space is modelled as one cohesive region, the variance of the signal model will increase. In terms of a Gaussian fit, this will directly translate to greater uncertainty about the location even if the measured signal matches any one of the modes in the distribution. This is illustrated in an example scenario in Figure 2.2. Though the received signal strength measurement (-60 dBm) falls exactly on the mode of the signal model in Location B, Location A will appear more likely simply because its model is less uncertain. Using non-parametric probabilistic models like histograms could alleviate such issues, but to the detriment of model complexity, storage and computational efficiency.

Figure 2.2: Impact of uncertainty in the probabilistic model. A greater variance could make even exact matches appear less likely than competing hypotheses.

Second, if a location with only one mode is partitioned into two or more regions, these locations will appear equally likely in terms of the signal model. Though the position estimate might still be constrained by the union of these regions, the robustness of the estimate will suffer because even a slight change in the measurement might make one region appear more likely than the other.
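The first of these two effects can be checked numerically with Equation 2.3. The model parameters below are hypothetical and are not read from Figure 2.2; only the measurement of -60 dBm follows the scenario described above.

```python
import math


def gaussian_likelihood(s_m, mu, sigma):
    """Equation 2.3: Gaussian density of the measurement s_m."""
    norm = math.sqrt(2.0 * math.pi) * sigma
    return math.exp(-((mu - s_m) ** 2) / (2.0 * sigma ** 2)) / norm


s_m = -60.0  # measured signal strength in dBm

# Hypothetical models: A is narrow but slightly off, B is centred on the
# measurement but has a large variance (e.g. several modes merged into one).
mu_a, sigma_a = -62.0, 2.0
mu_b, sigma_b = -60.0, 8.0

p_a = gaussian_likelihood(s_m, mu_a, sigma_a)
p_b = gaussian_likelihood(s_m, mu_b, sigma_b)
print(p_a, p_b, p_a > p_b)
```

Under these assumed parameters the peak density of the wide model B is lower than the off-centre density of the narrow model A, so Location A is preferred even though the measurement matches the mean of B exactly.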

To provide a serviceable level of accuracy, typically tens of measurements per location are required to ensure the probability model describes the signal in a statistically sound way. Maintaining such a system further requires that the model is updated whenever significant changes to the environment occur. These measurements tend to require manual effort to obtain, especially during the labeling process, which can quickly become untenable for large spaces. These and the previously described limitations of the empirical approach to WLAN positioning are the focus of our contributions described in Chapter 3.