Distance measures for classification of numerical features

Georgy Minaev, Robert Piché, Ari Visa
Tampere University of Technology, Finland
June 14, 2018

1 List of functions

All tables list distance functions. The formula $d = 1 - s$ is used to transform a similarity $s$ into a distance. Throughout, read $\sum$ as $\sum_{i=1}^{N}$, where $N$ is the vector length. Additional variables used in the formulas are defined in the Comment column.

Table 1: Power (p,r) distances [1, chapter 17.2]

1. Power (p,r) Euclidean distance: $d(P,Q) = \left(\sum |P_i - Q_i|^p\right)^{1/r}$ with $p = 2$, $r = 2$. [1]
2. Power (p,r) Manhattan distance: $d(P,Q) = \left(\sum |P_i - Q_i|^p\right)^{1/r}$ with $p = 1$, $r = 1$. [1]
3. Power (p,r) Chebyshev distance: $d(P,Q) = \left(\sum |P_i - Q_i|^p\right)^{1/r}$ with $p = \infty$, $r = \infty$; implementation: $d = \max_i |P_i - Q_i|$. [1]
4. Power (p,r) squared Euclidean distance: $d(P,Q) = \left(\sum |P_i - Q_i|^p\right)^{1/r}$ with $p = 2$, $r = 1$; $d_4 = d_1^2$. [1]
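The whole power (p, r) family can be computed by one routine; the following is a minimal Python sketch (function names and example vectors are illustrative, not from the paper):

```python
def power_distance(P, Q, p, r):
    """Power (p, r) distance: (sum |Pi - Qi|^p)^(1/r)."""
    return sum(abs(pi - qi) ** p for pi, qi in zip(P, Q)) ** (1.0 / r)

def chebyshev(P, Q):
    """Limit case p = r = infinity, computed directly as max |Pi - Qi|."""
    return max(abs(pi - qi) for pi, qi in zip(P, Q))

P, Q = [1.0, 2.0, 4.0], [2.0, 2.0, 1.0]
d_euclid = power_distance(P, Q, 2, 2)     # sqrt(1 + 0 + 9) = sqrt(10)
d_manhattan = power_distance(P, Q, 1, 1)  # 1 + 0 + 3 = 4
d_sq_euclid = power_distance(P, Q, 2, 1)  # 10, i.e. d_euclid squared (d4 = d1^2)
```

The Chebyshev case is implemented as a direct maximum, as the table's comment suggests, since $p = r = \infty$ cannot be passed to the power form.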


Table 2: Distances on distribution laws [1, chapter 14.2]

5. Fidelity distance: $1 - \sum\sqrt{P_i Q_i}$. [1]
6. Matusita distance: $\sqrt{\sum(\sqrt{P_i} - \sqrt{Q_i})^2}$; $d_6 = \sqrt{d_7}$. [1]
7. Squared-chord distance: $\sum(\sqrt{P_i} - \sqrt{Q_i})^2$; $d_7 = d_6^2$. [1]
8. Harmonic mean distance: $1 - 2\sum\frac{P_i Q_i}{P_i + Q_i}$. [1]
9. Bhattacharya 1 distance: $\left(\arccos\sum\sqrt{P_i Q_i}\right)^2$; twice this is the Fisher distance. [1]
10. Bhattacharya 2 distance: $-\ln\sum\sqrt{P_i Q_i}$. [1]
11. Pearson χ² distance: $\sum\frac{(P_i - Q_i)^2}{Q_i}$; aka χ²-distance; quasi-distance, non-symmetric. [1]
12. Neyman χ² distance: $\sum\frac{(P_i - Q_i)^2}{P_i}$; quasi-distance, non-symmetric. [1]
13. Probabilistic symmetric χ² distance: $2\sum\frac{(P_i - Q_i)^2}{P_i + Q_i}$. [1]
14. Separation quasi-distance: $\max_i(1 - P_i/Q_i)$; quasi-distance, non-symmetric. [1]
15. Kullback-Leibler distance: $\sum P_i \ln(P_i/Q_i)$; aka relative entropy, information deviation, information gain, KL-distance; quasi-distance, non-symmetric. [1]
16. Skew divergence: $KL(P, \alpha Q + (1-\alpha)P)$ with $\alpha = 0.1$; KL is the Kullback-Leibler distance; quasi-distance, non-symmetric. [1]
17. K-divergence distance: $KL(P, \alpha Q + (1-\alpha)P)$ with $\alpha = 1/2$; KL is the Kullback-Leibler distance; quasi-distance, non-symmetric. [1]
18. Jeffrey divergence: $\sum(P_i - Q_i)\ln(P_i/Q_i)$; aka J-divergence, divergence distance, KL2-divergence; non-symmetric. [1]
19. Jensen-Shannon divergence: $Z_i = \alpha P_i + (1-\alpha)Q_i$; $d = \alpha\,KL(P, Z) + (1-\alpha)\,KL(Q, Z)$ with $\alpha = 0.1$; KL is the Kullback-Leibler distance; non-symmetric. [1]
20. Topsøe distance: $Z_i = \frac{1}{2}(P_i + Q_i)$; $d = \sum\left(P_i\ln\frac{P_i}{Z_i} + Q_i\ln\frac{Q_i}{Z_i}\right)$; aka information statistics. [1]
21. Taneja distance: $Z_i = \frac{1}{2}(P_i + Q_i)$; $d = \sum Z_i\ln\frac{Z_i}{\sqrt{P_i Q_i}}$. [1]
22. Resistor-average distance: $\left(1/KL(P,Q) + 1/KL(Q,P)\right)^{-1}$; KL is the Kullback-Leibler distance. [1]
23. Chernoff distance: $\max_t\left(-\ln\sum P_i^t Q_i^{1-t}\right)$ with $t = 0.1$, $0 \le t < 1$; aka Rényi cross-entropy; non-symmetric. [1]
24. Rényi distance: $\frac{1}{1-t}\ln\sum Q_i\left(\frac{P_i}{Q_i}\right)^t$ with $t = 0.1$, $0 \le t < 1$; quasi-distance; if $t = 1$ then it equals the Kullback-Leibler distance; non-symmetric. [1]


Table 3: Similarities and distances for numerical data [1, chapter 17.1]

25. Ruzicka distance: $\sum|P_i - Q_i| / \sum\max(P_i, Q_i)$; aka Soergel, Tanimoto. [1]
26. Roberts distance: $1 - \sum\frac{(P_i + Q_i)\min(P_i, Q_i)}{\max(P_i, Q_i)} \Big/ \sum(P_i + Q_i)$. [1]
27. Intersection distance: $1 - \sum\min(P_i, Q_i) / \min\left(\sum P_i, \sum Q_i\right)$. [1]
28. Motyka distance: $1 - \sum\min(P_i, Q_i) / \sum(P_i + Q_i)$; aka Sørensen distance, Czekanowski distance; $d_{28} = \frac{1}{2} + \frac{1}{2}d_{29}$. [1]
29. Bray-Curtis distance: $1 - \frac{2}{N(\bar P + \bar Q)}\sum\min(P_i, Q_i)$; aka Renkonen % similarity or percentage similarity. [1]
30. Canberra distance: $\sum\frac{|P_i - Q_i|}{|P_i| + |Q_i|}$. [1]
31. Kulczynski 1 distance: $\sum|P_i - Q_i| / \sum\min(P_i, Q_i)$. [1]
32. Kulczynski 2 distance: $1 - \frac{1}{2N}\left(\frac{1}{\bar P} + \frac{1}{\bar Q}\right)\sum\min(P_i, Q_i)$. [1]
33. Baroni-Urbani-Buser distance: $1 - \dfrac{\sum\min(P_i, Q_i) + \sqrt{\sum\min(P_i, Q_i)\sum(\max(P) - \max(P_i, Q_i))}}{\sum\max(P_i, Q_i) + \sqrt{\sum\min(P_i, Q_i)\sum(\max(P) - \max(P_i, Q_i))}}$, where $\max(P)$ is the largest component of $P$. [1]
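The relation $d_{28} = \frac{1}{2} + \frac{1}{2}d_{29}$ between the Motyka and Bray-Curtis distances can be checked numerically; a small Python sketch (names are mine, not the paper's):

```python
def motyka(P, Q):
    """Motyka distance: 1 - sum(min(Pi, Qi)) / sum(Pi + Qi)."""
    return 1 - sum(min(p, q) for p, q in zip(P, Q)) / sum(p + q for p, q in zip(P, Q))

def bray_curtis(P, Q):
    """Bray-Curtis distance: 1 - 2/(N*(Pbar + Qbar)) * sum(min(Pi, Qi)).

    Note N*(Pbar + Qbar) equals sum(Pi + Qi), which links it to Motyka.
    """
    n = len(P)
    pbar, qbar = sum(P) / n, sum(Q) / n
    return 1 - 2.0 / (n * (pbar + qbar)) * sum(min(p, q) for p, q in zip(P, Q))

P, Q = [1.0, 3.0, 5.0], [2.0, 2.0, 4.0]
# d28 = 1/2 + d29/2, as stated in the Motyka comment:
assert abs(motyka(P, Q) - (0.5 + 0.5 * bray_curtis(P, Q))) < 1e-12
```

Here $\sum\min = 7$ and $\sum(P_i + Q_i) = 17$, so Motyka gives $10/17$ and Bray-Curtis $3/17$.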


Table 4: Relatives of Euclidean distance [1, chapter 17.2]

34. Penrose size distance: $\sqrt{N}\sum|P_i - Q_i|$. [1]
35. Mean character distance: $\frac{1}{N}\sum|P_i - Q_i|$; aka Gower distance. [1]
36. Lorentzian distance: $\sum\ln(1 + |P_i - Q_i|)$. [1]
37. Penrose shape distance: $\sqrt{\sum\left((P_i - \bar P) - (Q_i - \bar Q)\right)^2}$. [1]
38. Clark distance: $\left(\frac{1}{N}\sum\left(\frac{P_i - Q_i}{|P_i| + |Q_i|}\right)^2\right)^{1/2}$. [1]
39. Meehl distance: $\sum_{i=1}^{N-1}(P_i - Q_i - P_{i+1} + Q_{i+1})^2$. [1]
40. Hellinger distance: $\sqrt{2\sum\left(\sqrt{P_i/\bar P} - \sqrt{Q_i/\bar Q}\right)^2}$. [1]
41. Whittaker index of association distance: $\frac{1}{2}\sum\left|P_i/\bar P - Q_i/\bar Q\right|$. [1]
42. Symmetric χ² distance: $\sqrt{\sum\frac{\bar P + \bar Q}{N(\bar P + \bar Q)^2}\cdot\frac{(P_i\bar Q - Q_i\bar P)^2}{P_i + Q_i}}$; aka chi-distance. [1]
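Two entries from Table 4 in stdlib Python, to make the mean-centering in the Penrose shape distance concrete (function names and vectors are illustrative only):

```python
import math

def lorentzian(P, Q):
    """Lorentzian distance: sum ln(1 + |Pi - Qi|)."""
    return sum(math.log1p(abs(p - q)) for p, q in zip(P, Q))

def penrose_shape(P, Q):
    """Penrose shape distance: sqrt(sum ((Pi - Pbar) - (Qi - Qbar))^2).

    Equivalently, the Euclidean distance after mean-centering both vectors.
    """
    pbar = sum(P) / len(P)
    qbar = sum(Q) / len(Q)
    return math.sqrt(sum(((p - pbar) - (q - qbar)) ** 2 for p, q in zip(P, Q)))

# Penrose shape ignores a constant offset between the vectors:
P = [1.0, 2.0, 3.0]
Q = [6.0, 7.0, 8.0]  # P shifted by +5
assert penrose_shape(P, Q) == 0.0
```

`math.log1p` is used for numerical accuracy when $|P_i - Q_i|$ is small; `math.log(1 + x)` would behave identically for the magnitudes shown here.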

Table 5: Spectra distances

43. Spearman Correlation Coefficient, Pearson Correlation Coefficient: $1 - \dfrac{\sum(Q_i - \bar Q)(P_i - \bar P)}{\sqrt{\sum(Q_i - \bar Q)^2\sum(P_i - \bar P)^2}}$. [3, 4, 5]
44. Similarity Index (SI): $\sqrt{\dfrac{\sum\left(\frac{P_i - Q_i}{Q_i}\times 100\right)^2}{N}}$; non-symmetric. [6]
45. Improved Similarity Index: $\sqrt{\dfrac{1}{N}\sum\left(\frac{P_i - Q_i}{P_i + Q_i}\times 100\right)^2}$. [6]
46. Absolute Value Distance: $\left(1 + \dfrac{\sum|Q_i - P_i|}{\sum P_i}\right)^{-1}$; non-symmetric. [7]
47. Dot-Product (cosine): $\dfrac{\left(\sum Q_i P_i\right)^2}{\sum Q_i^2\sum P_i^2}$. [6]
48. Spectral Contrast Angle: $\dfrac{\sum Q_i P_i}{\sqrt{\sum Q_i^2\sum P_i^2}}$; $d_{48} = \sqrt{d_{47}}$. [6]
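The correlation-based distance of entry 43 can be sketched in a few lines of Python; applied to rank vectors it gives the Spearman variant, applied to raw values the Pearson variant (names and test vectors are mine, not the paper's):

```python
import math

def correlation_distance(P, Q):
    """1 - Pearson correlation coefficient between P and Q."""
    n = len(P)
    pbar, qbar = sum(P) / n, sum(Q) / n
    num = sum((q - qbar) * (p - pbar) for p, q in zip(P, Q))
    den = math.sqrt(sum((q - qbar) ** 2 for q in Q) *
                    sum((p - pbar) ** 2 for p in P))
    return 1 - num / den

P = [1.0, 2.0, 3.0]
d_same = correlation_distance(P, [2.0, 4.0, 6.0])  # perfectly correlated -> 0
d_anti = correlation_distance(P, [3.0, 2.0, 1.0])  # anti-correlated -> 2
```

The resulting distance lies in $[0, 2]$: 0 for perfectly correlated vectors, 2 for perfectly anti-correlated ones.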


Table 6: Distances from Cha

49. Wave Hedges distance: $\sum\frac{|P_i - Q_i|}{\max(P_i, Q_i)}$. [2]
50. Cosine distance: $1 - \dfrac{\sum P_i Q_i}{\sqrt{\sum P_i^2\sum Q_i^2}}$. [2]
51. Jaccard distance: $\dfrac{\sum(P_i - Q_i)^2}{\sum P_i^2 + \sum Q_i^2 - \sum P_i Q_i}$. [2]
52. Dice distance: $\dfrac{\sum(P_i - Q_i)^2}{\sum P_i^2 + \sum Q_i^2}$. [2]
53. Inner Product distance: $1 - \sum P_i Q_i$. [2]
54. Divergence distance: $2\sum\frac{(P_i - Q_i)^2}{(P_i + Q_i)^2}$. [2]
55. Additive symmetric χ² distance: $\sum\frac{(P_i - Q_i)^2(P_i + Q_i)}{P_i Q_i}$. [2]
56. Jensen difference: $\sum\left[\frac{1}{2}(P_i\ln P_i + Q_i\ln Q_i) - \frac{P_i + Q_i}{2}\ln\frac{P_i + Q_i}{2}\right]$. [2]
57. Kumar-Johnson distance: $\sum\frac{(P_i^2 - Q_i^2)^2}{2(P_i Q_i)^{3/2}}$. [2]
58. Avg(L1, L∞) distance: $\frac{1}{2}\left(\sum|P_i - Q_i| + \max_i|P_i - Q_i|\right)$; $d_{58} = \frac{1}{2}(d_2 + d_3)$. [2]
59. Vicis-Wave Hedges distance: $\sum\frac{|P_i - Q_i|}{\min(P_i, Q_i)}$. [2]
60. Vicis-Symmetric χ² 1 distance: $\sum\frac{(P_i - Q_i)^2}{\min(P_i, Q_i)^2}$. [2]
61. Vicis-Symmetric χ² 2 distance: $\sum\frac{(P_i - Q_i)^2}{\min(P_i, Q_i)}$. [2]
62. Vicis-Symmetric χ² 3 distance: $\sum\frac{(P_i - Q_i)^2}{\max(P_i, Q_i)}$. [2]
63. max-Symmetric χ² distance: $\max\left(\sum\frac{(P_i - Q_i)^2}{P_i}, \sum\frac{(P_i - Q_i)^2}{Q_i}\right)$. [2]
64. min-Symmetric χ² distance: $\min\left(\sum\frac{(P_i - Q_i)^2}{P_i}, \sum\frac{(P_i - Q_i)^2}{Q_i}\right)$. [2]
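The cosine distance of entry 50 is representative of how the ratio-form entries in Table 6 are evaluated; a minimal Python sketch (naming is mine):

```python
import math

def cosine_distance(P, Q):
    """Cosine distance: 1 - sum(Pi*Qi) / sqrt(sum Pi^2 * sum Qi^2)."""
    num = sum(p * q for p, q in zip(P, Q))
    den = math.sqrt(sum(p * p for p in P) * sum(q * q for q in Q))
    return 1 - num / den

# Parallel vectors give 0, orthogonal vectors give 1:
d_parallel = cosine_distance([1.0, 2.0], [2.0, 4.0])
d_orthogonal = cosine_distance([1.0, 0.0], [0.0, 2.0])
```

Note the distance depends only on the angle between the vectors, not their magnitudes, which is why the parallel pair above scores 0 despite being unequal.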

Table 7: Distances to compare compressed data

65. Hausdorff distance: $\max_{x \in P}\min_{y \in Q} d(x, y)$, where $d(x, y)$ is the Euclidean distance between points $x$ and $y$. Implementation (Matlab): [PP, QQ] = meshgrid(P, Q); $\max_i\min_j|PP_{i,j} - QQ_{i,j}|$. Ultrametric, Hausdorff metric. [8]
66. Edit distance: see reference; implementation (Matlab 2017b): tol = 0.1; edr(P, Q, tol). Editing metric. [9]
67. Levenshtein distance: see reference; implementation: https://en.wikibooks.org/wiki/Algorithm_Implementation/Strings/. Editing metric. [1]
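For scalar vectors, the meshgrid computation of entry 65 amounts to a max-min double loop; a stdlib Python equivalent follows. Note this is the one-directional (directed) form that the table's formula describes, so swapping the arguments can change the result (function name and vectors are mine):

```python
def hausdorff_1d(P, Q):
    """Directed Hausdorff distance on scalars: max over x in P of min over y in Q of |x - y|."""
    return max(min(abs(x - y) for y in Q) for x in P)

P = [0.0, 1.0, 5.0]
Q = [0.0, 1.0, 2.0]
d_pq = hausdorff_1d(P, Q)  # the point 5 is 3 away from its nearest neighbour 2
d_qp = hausdorff_1d(Q, P)  # every point of Q is within 1 of some point of P
```

Taking the maximum of the two directions would give the symmetric Hausdorff metric.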


References

[1] M. M. Deza and E. Deza, Encyclopedia of Distances, Springer, Berlin, Heidelberg, 2009.

[2] S. H. Cha, Comprehensive survey on distance/similarity measures between probability density functions, International Journal of Mathematical Models and Methods in Applied Sciences, vol. 1, no. 4, pp. 300–307, 2007.

[3] R. Saraiva Campos, L. Lovisolo, RF Fingerprinting Location Techniques, in S. A. Zekavat, R. M. Buehrer, eds., Handbook of Position Location: Theory, Practice and Advances, Hoboken, NJ, USA: Wiley, 2011.

[4] G. Minaev, A. Visa, R. Piché, Comprehensive Survey of Similarity Measures for Ranked Based Location Fingerprinting Algorithm, International Conference on Indoor Positioning and Indoor Navigation (IPIN), Sept 2017.

[5] Y. Xie, Y. Wang, A. Nallanathan, L. Wang, An Improved K-Nearest-Neighbor Indoor Localization Method Based on Spearman Distance, IEEE Signal Processing Letters, vol. 23, no. 3, pp. 351–355, 2016.

[6] K. Wan, I. Vidavsky, M. L. Gross, Comparing similar spectra: from similarity index to spectral contrast angle, Journal of the American Society for Mass Spectrometry, vol. 13, pp. 85–88, 2002. URL: http://www.sciencedirect.com/science/article/pii/S1044030501003270.

[7] S. E. Stein, D. R. Scott, Optimization and testing of mass spectral library search algorithms for compound identification, Journal of the American Society for Mass Spectrometry, vol. 5, pp. 859–866, 1994. URL: http://www.sciencedirect.com/science/article/pii/1044030594870098.

[8] M. Atallah, A linear time algorithm for the Hausdorff distance between convex polygons, Inf. Process. Lett., vol. 17, pp. 207–209, 1983.

[9] R. A. Wagner, M. J. Fischer, The string-to-string correction problem, J. ACM, vol. 21, pp. 168–173, 1974. URL: http://doi.acm.org/10.1145/321796.321811.
