|
| | HDBSCAN (std::size_t minClusterSize, std::size_t minSamples=0, hdbscan::ClusterSelectionMethod method=hdbscan::ClusterSelectionMethod::kEom, std::size_t nJobs=0, hdbscan::MinSamplesConvention convention=hdbscan::MinSamplesConvention::kSklearn) |
| | Construct a reusable HDBSCAN fitter.
|
| | HDBSCAN (const HDBSCAN &)=delete |
| HDBSCAN & | operator= (const HDBSCAN &)=delete |
| | HDBSCAN (HDBSCAN &&)=delete |
| HDBSCAN & | operator= (HDBSCAN &&)=delete |
| | ~HDBSCAN ()=default |
| void | run (const NDArray< T, 2 > &X) |
| | Fit to X.
|
| const NDArray< std::int32_t, 1 > & | labels () const noexcept |
| | Length-n assignment; -1 marks noise.
|
| const NDArray< T, 1 > & | outlierScores () const noexcept |
| | Length-n per-point GLOSH outlier scores in [0, 1].
|
| std::size_t | nClusters () const noexcept |
| | Total number of clusters discovered by the most recent run, or 0 if no fit has produced a result yet.
|
| CondensedTreeView | condensedTree () const noexcept |
| | Borrowed view over the condensed tree from the most recent run, or an empty view if no fit has produced a result yet.
|
| void | reset () |
| | Release every scratch buffer. The next run call reallocates against its shape.
|
template<class T, class MstBackend = hdbscan::AutoMstBackend<T>>
requires hdbscan::MstBackendStrategy<MstBackend, T>
class clustering::HDBSCAN< T, MstBackend >
Hierarchical density-based clustering over mutual-reachability distances.
HDBSCAN* extends DBSCAN with a hierarchical condensation step that auto-selects density thresholds, produces per-cluster stability, and yields GLOSH outlier scores as a byproduct. The MST boundary is the only template axis; everything downstream (condensed tree, cluster extraction, outlier scoring) is monomorphic. The default MstBackend is hdbscan::AutoMstBackend, which dispatches among Prim, Boruvka, and NN-Descent based on the input shape; callers who want to pin a specific backend may supply it as the second template argument.
- Note
HDBSCAN does NOT own X. The caller must keep the NDArray alive for the duration of every run call. Data-dependent indices (KDTree, kNN graph) are rebuilt on every fit so in-place buffer mutations through a borrowed view can never produce a silent cache miss. Shape-indexed scratch (heaps, reusable buffers) may be amortized at fixed (n, d, minSamples); a shape change rebuilds. reset returns the instance to fresh-constructed state.
-
On a freshly-constructed or just-reset instance, all result accessors return empty values (an empty label array, empty outlier-score array, zero cluster count, and an empty condensed-tree view).
-
Labels and outlier scores follow the formulation of Campello et al. (2015) over Euclidean mutual-reachability distances, matching the reference implementation.
-
The minSamples argument is interpreted per clustering::hdbscan::MinSamplesConvention. The default (kSklearn) treats the query point itself as one of the minSamples neighbours; pass kCampello to count non-self neighbours only. The two conventions differ by one neighbour and produce different MSTs on high-dimensional inputs.
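The effect of the two neighbour-counting conventions can be illustrated with a small standalone sketch. It does not use this library: NeighbourCount, coreDistance, and mutualReachability are hypothetical names, the data are 1-D for brevity, and the brute-force search only stands in for whatever index the real backends use. It assumes the usual definition of mutual-reachability distance as max(core(a), core(b), d(a, b)), with the core distance taken as the distance to the minSamples-th neighbour under the chosen convention.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the two conventions described above.
enum class NeighbourCount {
  kIncludesSelf,  // query point counts as its own first neighbour
  kExcludesSelf   // only non-self neighbours are counted
};

// Distance from xs[i] to its minSamples-th neighbour (brute force, 1-D).
// Precondition: 1 <= minSamples <= number of candidate neighbours.
double coreDistance(const std::vector<double>& xs, std::size_t i,
                    std::size_t minSamples, NeighbourCount mode) {
  std::vector<double> dists;
  dists.reserve(xs.size());
  for (std::size_t j = 0; j < xs.size(); ++j) {
    if (mode == NeighbourCount::kExcludesSelf && j == i) continue;
    dists.push_back(std::abs(xs[j] - xs[i]));
  }
  assert(minSamples >= 1 && minSamples <= dists.size());
  // Partially sort so that the minSamples-th smallest distance is in place.
  auto kth = dists.begin() + static_cast<std::ptrdiff_t>(minSamples - 1);
  std::nth_element(dists.begin(), kth, dists.end());
  return *kth;
}

// Mutual-reachability distance between xs[a] and xs[b].
double mutualReachability(const std::vector<double>& xs, std::size_t a,
                          std::size_t b, std::size_t minSamples,
                          NeighbourCount mode) {
  const double direct = std::abs(xs[a] - xs[b]);
  return std::max({coreDistance(xs, a, minSamples, mode),
                   coreDistance(xs, b, minSamples, mode), direct});
}
```

For the points {0, 1, 3, 7} with minSamples = 2, the first point has core distance 1 when it counts itself and 3 when it does not, and its mutual-reachability distance to the second point changes from 1 to 3. Edge weights that shift like this are why the two conventions can lead to different spanning trees.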
- Thread safety
- A single HDBSCAN instance is not safe to drive concurrently; run mutates internal state. Separate instances on distinct inputs are safe when each instance spawns its own internal pool (the default). The internal pool obeys a no-nested-dispatch invariant: worker tasks never re-submit to the pool.
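The one-instance-per-thread pattern implied above might look like the following sketch. ToyFitter is a hypothetical stand-in with a stateful run and a labels accessor; it is not the HDBSCAN class, and its thresholding rule is a placeholder for the real fit. The point is only the ownership layout: each thread constructs its own fitter, reads its own input, and writes to its own output slot.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <thread>
#include <vector>

// Hypothetical stand-in for a stateful fitter: run() overwrites its results,
// so a single instance must not be driven from two threads at once.
class ToyFitter {
 public:
  void run(const std::vector<double>& x) {
    labels_.assign(x.size(), -1);  // -1 marks "noise" in this toy
    for (std::size_t i = 0; i < x.size(); ++i) {
      if (x[i] >= 0.0) labels_[i] = 0;  // placeholder rule, not HDBSCAN
    }
  }
  const std::vector<std::int32_t>& labels() const noexcept { return labels_; }

 private:
  std::vector<std::int32_t> labels_;
};

// Fit every input on its own thread, each with a private fitter instance.
std::vector<std::vector<std::int32_t>> fitAll(
    const std::vector<std::vector<double>>& inputs) {
  std::vector<std::vector<std::int32_t>> out(inputs.size());
  std::vector<std::thread> workers;
  workers.reserve(inputs.size());
  for (std::size_t i = 0; i < inputs.size(); ++i) {
    workers.emplace_back([&inputs, &out, i] {
      ToyFitter fitter;          // never shared across threads
      fitter.run(inputs[i]);     // input outlives the call (joined below)
      out[i] = fitter.labels();  // distinct slot per thread, no locking needed
    });
  }
  for (auto& w : workers) w.join();
  return out;
}
```

Sharing one fitter across the worker lambdas instead would race on its internal result buffer, which is the situation the thread-safety note rules out.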
- Template Parameters
-
Definition at line 109 of file hdbscan.h.