|
Clustering
C++20 header-only: DBSCAN, HDBSCAN, k-means.
|
Classes | |
| struct | MstEdge |
| One edge of the minimum spanning tree of mutual-reachability distances. More... | |
| struct | MstOutput |
| Frozen output contract of every MST backend. More... | |
| class | AutoMstBackend |
| MST backend that dispatches between Prim, Boruvka, and NN-Descent on input shape. More... | |
| class | BoruvkaMstBackend |
| KDTree-accelerated Boruvka MST backend over mutual-reachability distances. More... | |
| struct | NnDescentMstConfig |
| Tuning knobs for the NN-Descent MST backend. More... | |
| class | NnDescentMstBackend |
| Approximate minimum-spanning-tree backend over mutual-reachability distance via an NN-Descent-built kNN graph plus Kruskal, with a connectivity fallback that closes any disconnected components. More... | |
| class | PrimMstBackend |
| Exact minimum-spanning-tree backend over mutual-reachability distance, streaming Prim. More... | |
Concepts | |
| concept | MstBackendStrategy |
| Contract for an MST backend satisfying the frozen clustering::hdbscan::MstOutput shape. | |
Enumerations | |
| enum class | ClusterSelectionMethod : std::uint8_t { kEom , kLeaf } |
| Cluster extraction method on the condensed tree. More... | |
| enum class | MinSamplesConvention : std::uint8_t { kSklearn , kCampello } |
Semantics of the minSamples parameter at core-distance extraction. More... | |
Variables | |
| constexpr std::size_t | kPrimMaxN = std::size_t{16384} |
| Compute budget that gates the streaming Prim backend, expressed as the maximum point count it will accept. | |
| constexpr std::size_t | kPrimMrdMatrixByteBudget = kPrimMaxN * kPrimMaxN * sizeof(float) |
Equivalent byte-budget phrasing of kPrimMaxN, kept so callers that gate on n*n*sizeof(T) <= kPrimMrdMatrixByteBudget compile unchanged. | |
| constexpr std::size_t | kPrimDenseCoreMinN = 1024 |
| Thresholds gating the dense symmetric core-distance pass. | |
| constexpr std::size_t | kPrimDenseCoreMinD = 17 |
| constexpr std::size_t | kPrimDenseCoreMaxMinSamples = 64 |
| constexpr std::size_t | kPrimPersistentRelaxMinWorkers = 8 |
| constexpr std::size_t | kPrimPersistentRelaxMinOpsPerWorker = std::size_t{1} << 14 |
|
strong |
Cluster extraction method on the condensed tree.
The two methods differ in traversal behaviour, not in output shape. Excess-of-mass is the default and optimizes per-cluster stability over the full tree; leaf returns every leaf of the condensed tree as its own cluster.
| Enumerator | |
|---|---|
| kEom | Excess-of-mass selection (the published default). |
| kLeaf | Leaf-cluster selection; every condensed-tree leaf becomes a cluster. |
|
strong |
Semantics of the minSamples parameter at core-distance extraction.
Campello 2015 defines core_k(p) as the distance from p to its k-th nearest neighbour excluding p itself. scikit-learn and the original hdbscan package count p itself as one of the k neighbours, so for a given min_samples = k those implementations return what Campello would call core_{k-1}. Both conventions are widely deployed and produce different MSTs on high-dimensional near-uniform inputs; the enum is the API boundary between them.
|
inlineconstexpr |
Definition at line 53 of file prim_mst_backend.h.
|
inlineconstexpr |
Definition at line 52 of file prim_mst_backend.h.
|
inlineconstexpr |
Thresholds gating the dense symmetric core-distance pass.
The dense pass computes all n*(n-1)/2 squared distances once and feeds both endpoints into a per-row top-minSamples tracker. Its total work is O(n^2 * d) distance compute plus O(n^2 * minSamples) top-k bookkeeping. It beats the KDTree kNN fallback once minSamples is small enough for the top-k write amp to stay bounded and (n, d) is large enough to amortise the symmetric scan's fixed cost. Below these bounds the KDTree path wins; above kPrimDenseCoreMaxMinSamples the top-k rescan per update dominates.
Definition at line 51 of file prim_mst_backend.h.
|
inlineconstexpr |
Compute budget that gates the streaming Prim backend, expressed as the maximum point count it will accept.
The streaming Prim variant materialises only O(n) state (no n*n MRD matrix), but the inner relax recomputes d-wide squared-Euclidean distances per popped vertex, costing O(n^2 * d) total scalar work. Beyond this n the dispatcher should prefer NN-Descent (high d) or Boruvka (low d), both of which scale better than O(n^2).
Definition at line 34 of file prim_mst_backend.h.
|
inlineconstexpr |
Equivalent byte-budget phrasing of kPrimMaxN, kept so callers that gate on n*n*sizeof(T) <= kPrimMrdMatrixByteBudget compile unchanged.
Numerically equal to kPrimMaxN squared, scaled by sizeof(float).
Definition at line 39 of file prim_mst_backend.h.
|
inlineconstexpr |
Definition at line 55 of file prim_mst_backend.h.
|
inlineconstexpr |
Definition at line 54 of file prim_mst_backend.h.