
We have studied the problems of K-means clustering and context quantization, and we have proposed and improved several clustering algorithms within the K-means framework. For the case of an unknown number of clusters, we presented a simple genetic clustering algorithm, implemented through the randomized swapping of a single reference vector followed by a simple local repartitioning step. With an appropriate clustering evaluation function, the genetic algorithm detects the number of clusters successfully.
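For illustration, the following is a minimal sketch of the swap-and-repartition step described above, shown with a fixed number of clusters for brevity; the acceptance test, the two repair iterations, and all function names are our assumptions rather than the exact formulation used in this work.

```python
import numpy as np

def random_swap_clustering(X, k, iterations=500, seed=0):
    """Minimal sketch of randomized-swap clustering (assumed form).

    One reference vector is replaced by a randomly chosen data point,
    the partition is locally repaired, and the swap is kept only if
    the mean squared error (MSE) improves.
    """
    rng = np.random.default_rng(seed)
    n = len(X)
    centroids = X[rng.choice(n, k, replace=False)].copy()

    def mse(C):
        d = ((X[:, None, :] - C[None, :, :]) ** 2).sum(axis=2)
        return d.min(axis=1).mean(), d.argmin(axis=1)

    best_err, labels = mse(centroids)
    for _ in range(iterations):
        trial = centroids.copy()
        trial[rng.integers(k)] = X[rng.integers(n)]   # swap one reference vector
        # local repartition: two K-means steps to repair the partition
        for _ in range(2):
            _, lab = mse(trial)
            for j in range(k):
                if np.any(lab == j):
                    trial[j] = X[lab == j].mean(axis=0)
        err, lab = mse(trial)
        if err < best_err:                            # accept improving swaps only
            centroids, best_err, labels = trial, err, lab
    return centroids, labels, best_err
```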

We investigated the K-center clustering problem for binary data objects by minimizing the stochastic complexity. For this purpose, we proposed a new clustering dissimilarity function, the ∆SC-distance. When clustering binary data objects by minimizing stochastic complexity, the ∆SC-distance outperforms the two other distances considered, the L2-distance and the Shannon code-length distance. We remarked that the design scheme of the ∆SC-distance is general in nature, as it can be extended to any other cost function; indeed, we succeeded in extending it to the minimization of the MSE distortion in the K-means clustering problem.
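As an illustration of the design idea, the sketch below measures the change in a cluster's stochastic complexity when a binary vector joins it, in contrast to a static code-length distance. The uniform Bernoulli mixture code used here and both function names are our assumptions; the thesis derives the exact ∆SC-distance from its own stochastic complexity formulation.

```python
from math import lgamma, log

def bernoulli_sc(k, n):
    """Stochastic complexity (bits) of a binary column with k ones out of
    n bits, under the uniform Bernoulli mixture code: log2((n+1) * C(n, k))."""
    if n == 0:
        return 0.0
    log_binom = lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
    return (log(n + 1) + log_binom) / log(2)

def delta_sc(x, ones, n):
    """Change in a cluster's stochastic complexity when binary vector x joins.

    x:    binary vector (sequence of 0/1 values)
    ones: per-dimension counts of 1-bits among the cluster's n members
    Unlike a static code-length distance, the increment accounts for how
    x itself alters the cluster's model.
    """
    before = sum(bernoulli_sc(k, n) for k in ones)
    after = sum(bernoulli_sc(k + xi, n + 1) for k, xi in zip(ones, x))
    return after - before
```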

Accordingly, we presented another heuristic dissimilarity function, the Delta-MSE dissimilarity, which outperforms the traditional L2-distance.
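A plausible form of such a dissimilarity follows from the standard centroid-update identities for the sum of squared errors; whether this matches the exact definition of the Delta-MSE dissimilarity used in this work is an assumption on our part.

```python
import numpy as np

def delta_mse(x, centroid, size, removal=False):
    """Change in total squared error when x joins (or leaves) a cluster.

    Adding x to a cluster of `size` points raises the SSE by
    size/(size+1) * ||x - c||^2; removing it lowers the SSE by
    size/(size-1) * ||x - c||^2 (standard centroid-update identities).
    """
    d2 = float(np.sum((x - centroid) ** 2))
    return size / (size - 1) * d2 if removal else size / (size + 1) * d2

# A point is reassigned when joining another cluster costs less than the
# saving obtained by leaving its current one:
# delta_mse(x, c_new, n_new) < delta_mse(x, c_old, n_old, removal=True)
```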

We have proposed two suboptimal K-means clustering algorithms based on linear Fisher discriminant analysis and kernel principal component analysis. Both are designed in the spirit of estimating an initial clustering solution that approaches the global optimum: they select their initial solutions by dynamic programming, either in the Fisher discriminant subspace or in the kernel principal component subspace. The proposed algorithms have been shown to outperform previous suboptimal K-means clustering algorithms, and we reminded the readers that their usability is not limited to small-scale datasets.
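The following sketch illustrates the dynamic-programming initialization on a one-dimensional projection. For brevity it projects onto the first principal axis of the raw data; substituting the Fisher discriminant or kernel principal component projection is straightforward. The O(kn²) recursion and all names are our assumptions.

```python
import numpy as np

def dp_initialization(X, k):
    """Sketch of DP-based K-means initialization on a 1-D projection.

    Projects the data onto its first principal axis, then uses dynamic
    programming to split the sorted projections into k intervals with
    minimum within-interval squared error; the mean of the original
    points in each interval serves as an initial centroid.
    """
    Xc = X - X.mean(axis=0)
    axis = np.linalg.svd(Xc, full_matrices=False)[2][0]  # first principal axis
    t = Xc @ axis
    order = np.argsort(t)
    y = t[order]
    n = len(y)

    # prefix sums give O(1) within-interval squared-error costs
    s1 = np.concatenate([[0.0], np.cumsum(y)])
    s2 = np.concatenate([[0.0], np.cumsum(y * y)])

    def cost(i, j):  # SSE of the interval y[i:j]
        m = j - i
        return s2[j] - s2[i] - (s1[j] - s1[i]) ** 2 / m

    D = np.full((k + 1, n + 1), np.inf)
    arg = np.zeros((k + 1, n + 1), dtype=int)
    D[0, 0] = 0.0
    for m in range(1, k + 1):
        for j in range(m, n + 1):
            for i in range(m - 1, j):
                c = D[m - 1, i] + cost(i, j)
                if c < D[m, j]:
                    D[m, j], arg[m, j] = c, i

    # backtrack interval boundaries and map back to full-dimensional centroids
    bounds, j = [n], n
    for m in range(k, 0, -1):
        j = arg[m, j]
        bounds.append(j)
    bounds = bounds[::-1]
    return np.array([X[order[bounds[m]:bounds[m + 1]]].mean(axis=0)
                     for m in range(k)])
```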

We studied and formulated the problem of context clustering, or context quantization, in lossless image compression within the framework of the JPEG-LS standard. The problem was formulated as a K-means clustering problem under the Kullback-Leibler distance and tackled with the generalized Lloyd method.
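A minimal sketch of such a generalized Lloyd iteration is given below: each context's conditional distribution is assigned to the nearest representative distribution in KL distance, and each representative is re-estimated as the occurrence-weighted average of its members, which is the minimizer of the weighted KL sum. The interface and the fixed iteration count are our assumptions.

```python
import numpy as np

def kl(p, q, eps=1e-12):
    """Kullback-Leibler divergence KL(p || q) for discrete distributions."""
    p, q = np.clip(p, eps, None), np.clip(q, eps, None)
    return float(np.sum(p * np.log(p / q)))

def lloyd_context_quantizer(P, w, k, iterations=50, seed=0):
    """Sketch: generalized Lloyd clustering of context distributions.

    P: (n, s) array, row i the conditional symbol distribution of context i
    w: (n,)   context occurrence counts (weights)
    Contexts are grouped by nearest centroid in KL distance; the centroid
    minimizing the weighted KL sum is the weighted mean of member rows.
    """
    rng = np.random.default_rng(seed)
    Q = P[rng.choice(len(P), k, replace=False)].copy()
    for _ in range(iterations):
        labels = np.array([np.argmin([kl(p, q) for q in Q]) for p in P])
        for j in range(k):
            m = labels == j
            if m.any():
                Q[j] = np.average(P[m], axis=0, weights=w[m])
    return Q, labels
```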

The prediction residuals in JPEG-LS were then divided into two parts and coded separately with the clustered contexts. Despite the dependency between the two parts, the context clustering method still succeeds in approaching the JPEG-LS coding rate.

We continued our progress in context quantization by proposing new context quantizer design algorithms based on the multi-class Fisher discriminant and the kernel Fisher discriminant. The quantizer mapping function is simplified by applying dynamic programming over the multi-class linear Fisher discriminant or the kernel Fisher discriminant projection. Since the kernel Fisher discriminant can describe linearly inseparable quantizer cells well, the kernel-based design algorithm approaches the optimal context quantizer designed in the probability simplex space, while admitting a practical implementation with a simple quantizer mapping function.
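For illustration, the sketch below computes a two-class kernel Fisher discriminant projection, on which a dynamic-programming thresholding (as in the initialization sketch above) can define the quantizer mapping function. The Gaussian kernel, the regularization, and the two-class restriction are simplifying assumptions relative to the multi-class formulation used in this work.

```python
import numpy as np

def kfd_projection(X, y, gamma=1.0, reg=1e-3):
    """Two-class kernel Fisher discriminant (sketch; Gaussian kernel assumed).

    Returns the 1-D projection of each sample onto the KFD direction,
    computed as alpha = (N + reg*I)^-1 (m1 - m0) in the kernel expansion.
    """
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    K = np.exp(-gamma * d2)                      # Gaussian kernel matrix
    n = len(X)
    idx0, idx1 = np.where(y == 0)[0], np.where(y == 1)[0]
    m0, m1 = K[:, idx0].mean(axis=1), K[:, idx1].mean(axis=1)
    # within-class scatter in feature space, expressed through the kernel
    N = np.zeros((n, n))
    for idx in (idx0, idx1):
        Kc = K[:, idx]
        H = np.eye(len(idx)) - 1.0 / len(idx)    # centering matrix
        N += Kc @ H @ Kc.T
    alpha = np.linalg.solve(N + reg * np.eye(n), m1 - m0)
    return K @ alpha                             # 1-D discriminant scores
```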

The main contribution of this work is the investigation of the K-means clustering and context quantization problems by means of different data reduction techniques and dynamic programming. However, the data reduction techniques used in this work extract an appropriate feature or principal subspace from the input data only globally, which assumes that the input data is homogeneously distributed.

A future extension of this work is to incorporate local data reduction techniques (e.g. local PCA) into conventional clustering and classification problems. For example, one can choose the median vector from the subcluster centers formed by applying PCA and dynamic programming within each cluster, which can efficiently reduce the computational complexity of the K-median algorithm; a sketch is given below. This design scheme can also be extended to online or sequential clustering algorithms (e.g. the MBAS algorithm), offering more robustness against additive noise.
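As this is future work, the following is only a speculative sketch: it reuses the dp_initialization routine from above to split a cluster into m subclusters along its own principal axis and takes the component-wise median of the subcluster centers as an approximate cluster median. The subcluster count m and the use of the component-wise (rather than medoid) median are our assumptions.

```python
import numpy as np

def fast_cluster_median(cluster_points, m=8):
    """Approximate cluster median via local PCA and dynamic programming.

    Splits the cluster into m subclusters with dp_initialization (defined
    earlier) and returns the component-wise median of their centers, thus
    avoiding an exhaustive median search over all cluster members.
    """
    centers = dp_initialization(cluster_points, m)
    return np.median(centers, axis=0)
```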

A more direct extension is to apply local PCA first and then perform dynamic programming on each clustered component, but this raises the further difficulty of determining the number of subclusters for each cluster. We argue that this determination can be approximately cast as an integer programming problem, which is an interesting direction for future work.
