Skip to main content
Log in

Efficient algorithms for agglomerative hierarchical clustering methods

  • Authors Of Articles
  • Published:
Journal of Classification Aims and scope Submit manuscript

Abstract

Whenevern objects are characterized by a matrix of pairwise dissimilarities, they may be clustered by any of a number of sequential, agglomerative, hierarchical, nonoverlapping (SAHN) clustering methods. These SAHN clustering methods are defined by a paradigmatic algorithm that usually requires 0(n 3) time, in the worst case, to cluster the objects. An improved algorithm (Anderberg 1973), while still requiring 0(n 3) worst-case time, can reasonably be expected to exhibit 0(n 2) expected behavior. By contrast, we describe a SAHN clustering algorithm that requires 0(n 2 logn) time in the worst case. When SAHN clustering methods exhibit reasonable space distortion properties, further improvements are possible. We adapt a SAHN clustering algorithm, based on the efficient construction of nearest neighbor chains, to obtain a reasonably general SAHN clustering algorithm that requires in the worst case 0(n 2) time and space.

Whenevern objects are characterized byk-tuples of real numbers, they may be clustered by any of a family of centroid SAHN clustering methods. These methods are based on a geometric model in which clusters are represented by points ink-dimensional real space and points being agglomerated are replaced by a single (centroid) point. For this model, we have solved a class of special packing problems involving point-symmetric convex objects and have exploited it to design an efficient centroid clustering algorithm. Specifically, we describe a centroid SAHN clustering algorithm that requires 0(n 2) time, in the worst case, for fixedk and for a family of dissimilarity measures including the Manhattan, Euclidean, Chebychev and all other Minkowski metrics.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

Similar content being viewed by others

References

  • AHO, A.V., HOPCROFT, J.E., and ULLMAN, J.D. (1974),The Design and Analysis of Computer Algorithms, Reading, Massachusetts: Addison-Wesley.

    Google Scholar 

  • ANDERBERG, M.R. (1973),Cluster Analysis for Applications, New York: Academic.

    Google Scholar 

  • BATAGELJ, V. (1981), “Note on Ultrametric Hierarchical Clustering Algorithms,”Psychometrika, 46, 351–352.

    Google Scholar 

  • BENZECRI, J.P. (1982), “Construction d'une Classification Ascendante Hierarchique par la Recherche en Chaîne des Voisins Réciproques,”Les Cahiers de l'Analyse des Données, VII, 209–218.

    Google Scholar 

  • BRUYNOOGHE, M. (1978), “Classification Ascendante Hiérarchique des Grands Ensembles de Données: un Algorithme Rapide Fondé sur la Construction des Voisinages Réductibles,”Les Cahiers de l'Analyse des Données, III, 7–33.

    Google Scholar 

  • CORMACK, R.M. (1971), “A Review of Classification,”Journal of the Royal Statistical Society, Series A, 134, 321–367.

    Google Scholar 

  • COXETER, H.S.M. (1963), “An Upper Bound for the Number of Equal Nonoverlapping Spheres that can Touch Another of the Same Size,” inProceedings of the Symposium in Pure Mathematics VII, Convexity, ed. V. Klee, Providence, Rhode Island: American Mathematical Society, 53–71.

    Google Scholar 

  • DEFAYS, D. (1977), “An Efficient Algorithm for a Complete Link Method,”Computer Journal, 20, 364–366.

    Google Scholar 

  • DUBIEN, J.L., and WARDE, W.D. (1979), “A Mathematical Comparison of the Members of an Infinite Family of Agglomerative Clustering Algorithms,”Canadian Journal of Statistics, 7, 29–38.

    Google Scholar 

  • EDELSBRUNNER, H., and VAN LEEUWEN, J. (1983), “Multidimensional Data Structures and Algorithms — a Bibliography,” Report F 104, Institute für Informationsverarbeitung, Technische Universität Graz, Graz, Austria.

    Google Scholar 

  • EVERITT, B. (1980).Cluster Analysis (Second Edition), London: Heinemann.

    Google Scholar 

  • FLORIAN, A. (1980), “Newtonsche und Hadwigersche Zahlen,” Report 145, Institut für Mathematik, Universität Salzburg, Salzburg, Austria.

    Google Scholar 

  • GOWER, J.C., and ROSS, G.J.S. (1969), “Minimum Spanning Trees and Single Linkage Cluster Analysis,”Applied Statistics, 18, 54–64.

    Google Scholar 

  • GROEMER, H. (1961), “Abschätzungen für die Anzahl der konvexen Körper, die einen konvexen Körper berühren,”Monatshefte für Mathematik, 65, 74–81.

    Google Scholar 

  • GRUNBAUM, B. (1961), “On a Conjecture of H. Hadwiger,”Pacific Journal of Mathematics, 11, 215–219.

    Google Scholar 

  • HADWIGER, H. (1957), “Uber Treffanzahlen bei translationsgleichen Eikörpern,”Archiv der Mathematik, 8, 212–213.

    Google Scholar 

  • HADWIGER, H., and DEBRUNNER, H. (1955),Kombinatorische Geometrie in der Ebene, Genève: Monographies de l'Enseignement Mathématique.

    Google Scholar 

  • HARTIGAN, J.A. (1975),Clustering Algorithms, New York: John Wiley.

    Google Scholar 

  • HUBERT, L., and SCHULTZ, J. (1975), “Hierarchical Clustering and the Concept of Space Distortion,”British Journal of Mathematical and Statistical Psychology, 28, 121–133.

    Google Scholar 

  • HWANG, F.K. (1979), “An 0(n log n) Algorithm for Rectilinear Minimal Spanning Tree,”Journal of the Association for Computing Machinery, 26, 177–182.

    Google Scholar 

  • JARDINE, N., and SIBSON, R. (1971),Mathematical Taxonomy, London: John Wiley.

    Google Scholar 

  • JOHNSON, S.C. (1967), “Hierarchical Clustering Schemes,”Psychometrika, 32, 241–254.

    PubMed  Google Scholar 

  • JUAN, J. (1982), “Programme de Classification Hiérarchique par l'Algorithme de la Recherche en Chaîne des Voisins Réciproques,”Les Cahiers de l'Analyse des Données, VII, 219–225.

    Google Scholar 

  • LANCE, G.N., and WILLIAMS, W.T. (1966), “A Generalized Sorting Strategy for Computer Classifications,”Nature, 212, 218.

    Google Scholar 

  • LANCE, G.N., and WILLIAMS, W.T. (1967), “A General Theory of Classificatory Sorting Strategies. 1. Hierarchical Systems,”Computer Journal, 9, 373–380.

    Google Scholar 

  • MILLIGAN, G.W. (1979), “Ultrametric Hierarchical Clustering Algorithms,”Psychometrika, 44, 343–346.

    Google Scholar 

  • MURTAGH, F. (1983), “A Survey of Recent Advances in Hierarchical Clustering Algorithms,”Computer Journal, 26, 354–359.

    Google Scholar 

  • ROHLF, F.J. (1973), “Algorithm 76. Hierarchical Clustering Using the Minimum Spanning Tree,”Computer Journal, 16, 93–95.

    Google Scholar 

  • ROHLF, F.J. (1977), “Computational Efficiency of Agglomerative Clustering Algorithms,” Report RC 6831, IBM T.J. Watson Research Center, Yorktown Heights, New York.

    Google Scholar 

  • ROHLF, F.J. (1978), “A Probabilistic Minimum Spanning Tree Algorithm,”Information Processing Letters, 7, 44–48.

    Google Scholar 

  • ROHLF, F.J. (1982), “Single-link Clustering Algorithms,” inHandbook of Statistics, Vol. 2, eds. P.R. Krishnaiah and L.N. Kanal, Amsterdam and New York: North Holland, 267–284.

    Google Scholar 

  • ROSS, G.J.S. (1969), “Algorithm AS 15. Single Linkage Cluster Analysis,”Applied Statistics, 18, 106–110.

    Google Scholar 

  • SHAMOS, M.I., and HOEY, D. (1975), “Closest-point Problems,”Sixteenth Symposium on Foundations of Computer Science, New York: Institute of Electrical and Electronics Engineers, 151–162.

    Google Scholar 

  • SIBSON, R. (1973), “SLINK: an Optimally Efficient Algorithm for the Single-link Cluster Method,”Computer Journal, 16, 30–34.

    Google Scholar 

  • SNEATH, P.H.A., and SOKAL, R.R. (1973),Numerical Taxonomy, San Francisco: W.H. Freeman.

    Google Scholar 

  • WARD, Jr., J.H. (1963), “Hierarchical Grouping to Optimize an Objective Function,”Journal of the American Statistical Association, 58, 236–244.

    Google Scholar 

  • WEIDE, B. (1977), “A Survey of Analysis Techniques for Discrete Algorithms,”Computing Surveys, 9, 291–313.

    Google Scholar 

  • WISHART, D. (1969), “256 Note: An Algorithm for Hierarchical Classifications,”Biometrics, 25, 165–170.

    Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Additional information

This work was partially supported by the Natural Sciences and Engineering Research Council of Canada and by the Austrian Fonds zur Förderung der wissenschaftlichen Forschung.

Rights and permissions

Reprints and permissions

About this article

Cite this article

Day, W.H.E., Edelsbrunner, H. Efficient algorithms for agglomerative hierarchical clustering methods. Journal of Classification 1, 7–24 (1984). https://doi.org/10.1007/BF01890115

Download citation

  • Issue Date:

  • DOI: https://doi.org/10.1007/BF01890115

Keywords

Navigation