Clustering
C++20 header-only: DBSCAN, HDBSCAN, k-means.
clustering::HDBSCAN< T, MstBackend > Class Template Reference

Hierarchical density-based clustering over mutual-reachability distances.

#include <clustering/hdbscan.h>

Classes

struct  CondensedTreeView
 Read-only view over the condensed-tree result.

Public Member Functions

 HDBSCAN (std::size_t minClusterSize, std::size_t minSamples=0, hdbscan::ClusterSelectionMethod method=hdbscan::ClusterSelectionMethod::kEom, std::size_t nJobs=0, hdbscan::MinSamplesConvention convention=hdbscan::MinSamplesConvention::kSklearn)
 Construct a reusable HDBSCAN fitter.
 HDBSCAN (const HDBSCAN &)=delete
HDBSCAN & operator= (const HDBSCAN &)=delete
 HDBSCAN (HDBSCAN &&)=delete
HDBSCAN & operator= (HDBSCAN &&)=delete
 ~HDBSCAN ()=default
void run (const NDArray< T, 2 > &X)
 Fit to X.
const NDArray< std::int32_t, 1 > & labels () const noexcept
 Length-n assignment; -1 marks noise.
const NDArray< T, 1 > & outlierScores () const noexcept
 Length-n per-point GLOSH outlier scores in [0, 1].
std::size_t nClusters () const noexcept
 Total number of clusters discovered by the most recent run, or 0 if no fit has produced a result yet.
CondensedTreeView condensedTree () const noexcept
 Borrowed view over the condensed tree from the most recent run, or an empty view if no fit has produced a result yet.
void reset ()
 Release every scratch buffer. The next run call reallocates against its shape.

Detailed Description

template<class T, class MstBackend = hdbscan::AutoMstBackend<T>>
requires hdbscan::MstBackendStrategy<MstBackend, T>
class clustering::HDBSCAN< T, MstBackend >

Hierarchical density-based clustering over mutual-reachability distances.

HDBSCAN* extends DBSCAN with a hierarchical condensation step that selects density thresholds automatically, produces per-cluster stability, and yields GLOSH outlier scores as a byproduct. The MST boundary is the only template axis; everything downstream (condensed tree, cluster extraction, outlier scoring) is monomorphic. The default MstBackend is hdbscan::AutoMstBackend, which dispatches between Prim, Boruvka, and NN-Descent based on the input shape; callers who want to pin a specific backend may supply it as the second template argument.

Note
HDBSCAN does NOT own X. The caller must keep the NDArray alive for the duration of every run call. Data-dependent indices (KDTree, kNN graph) are rebuilt on every fit, so in-place mutations of the buffer through a borrowed view can never leave a stale index in use. Shape-indexed scratch (heaps, reusable buffers) may be reused across fits while (n, d, minSamples) stays fixed; a change in shape rebuilds it. reset returns the instance to its freshly constructed state.
On a freshly constructed or just-reset instance, all result accessors return empty values (an empty label array, empty outlier-score array, zero cluster count, and an empty condensed-tree view).
Labels and outlier scores follow the Campello 2015 formula over Euclidean mutual-reachability distances, matching the reference implementation.
The minSamples argument is interpreted per clustering::hdbscan::MinSamplesConvention. The default (kSklearn) treats the query point itself as one of the minSamples neighbours; pass kCampello to count non-self neighbours only. The two conventions differ by one neighbour and produce different MSTs on high-dimensional inputs.
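The difference between the two conventions can be made concrete with a brute-force sketch. Everything below is illustrative and independent of the library: the Convention enum, coreDistance, and mutualReachability are hypothetical stand-ins, and the mapping "kSklearn needs minSamples - 1 non-self neighbours, kCampello needs minSamples" is an assumption read off the description above, not taken from hdbscan.h.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for clustering::hdbscan::MinSamplesConvention.
enum class Convention { kSklearn, kCampello };

// Euclidean distance between two points of equal dimension.
inline float euclidean(const std::vector<float>& a, const std::vector<float>& b) {
  assert(a.size() == b.size());
  float sum = 0.0f;
  for (std::size_t k = 0; k < a.size(); ++k) {
    const float diff = a[k] - b[k];
    sum += diff * diff;
  }
  return std::sqrt(sum);
}

// Brute-force core distance of point i.
// kSklearn: the point itself counts as one of the minSamples neighbours.
// kCampello: only non-self neighbours are counted.
inline float coreDistance(const std::vector<std::vector<float>>& X, std::size_t i,
                          std::size_t minSamples, Convention convention) {
  assert(i < X.size());
  assert(minSamples >= 1);
  std::vector<float> dists;  // distances to every non-self point
  for (std::size_t j = 0; j < X.size(); ++j) {
    if (j != i) dists.push_back(euclidean(X[i], X[j]));
  }
  std::sort(dists.begin(), dists.end());
  // Number of non-self neighbours required under the chosen convention.
  const std::size_t k =
      convention == Convention::kSklearn ? minSamples - 1 : minSamples;
  assert(k <= dists.size());
  return k == 0 ? 0.0f : dists[k - 1];
}

// Mutual-reachability distance: max(core(i), core(j), d(i, j)).
inline float mutualReachability(const std::vector<std::vector<float>>& X,
                                std::size_t i, std::size_t j,
                                std::size_t minSamples, Convention convention) {
  return std::max({coreDistance(X, i, minSamples, convention),
                   coreDistance(X, j, minSamples, convention),
                   euclidean(X[i], X[j])});
}
```

For four 1-D points at 0, 1, 3, and 7 with minSamples = 2, the core distance of the first point is 1 when the point counts as its own neighbour and 3 when it does not, so the mutual-reachability edge weights (and possibly the MST) differ between conventions.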
Thread safety
A single HDBSCAN instance is not safe to drive concurrently; run mutates internal state. Separate instances on distinct inputs are safe when each instance spawns its own internal pool (the default). The internal pool obeys a no-nested-dispatch invariant: worker tasks never re-submit to the pool.
Template Parameters
T  Element type. Only float is supported in this class; a double specialization is out of scope.
MstBackend  Backend satisfying clustering::hdbscan::MstBackendStrategy. Defaults to clustering::hdbscan::AutoMstBackend, which picks Prim, Boruvka, or NN-Descent based on the input shape.

Definition at line 109 of file hdbscan.h.

Constructor & Destructor Documentation

◆ HDBSCAN() [1/3]

template<class T, class MstBackend = hdbscan::AutoMstBackend<T>>
clustering::HDBSCAN< T, MstBackend >::HDBSCAN ( std::size_t minClusterSize,
std::size_t minSamples = 0,
hdbscan::ClusterSelectionMethod method = hdbscan::ClusterSelectionMethod::kEom,
std::size_t nJobs = 0,
hdbscan::MinSamplesConvention convention = hdbscan::MinSamplesConvention::kSklearn )
inline explicit

Construct a reusable HDBSCAN fitter.

Parameters
minClusterSize  The smallest allowable cluster; must be at least 2.
minSamples  Neighbour count used to compute core distances, interpreted per convention. A value of 0 is a sentinel meaning "resolve to minClusterSize at fit time"; the fit entry asserts that the resolved value is valid for the active convention and strictly less than n.
method  Cluster selection method; defaults to excess-of-mass.
nJobs  Worker count for the internal thread pool. A value of 0 resolves to std::thread::hardware_concurrency().
convention  Interpretation of minSamples at core-distance extraction. See clustering::hdbscan::MinSamplesConvention.
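
The two sentinel rules above (minSamples = 0 and nJobs = 0) can be sketched as small free functions. The helper names are hypothetical and do not appear in hdbscan.h. The fallback to a single worker when std::thread::hardware_concurrency() reports 0 is an assumption added here, since the standard allows that return value. The convention-specific validity check mentioned above is not specified in this page, so only the "strictly less than n" part is modelled.

```cpp
#include <cassert>
#include <cstddef>
#include <thread>

// Hypothetical helper: resolve the minSamples sentinel.
// 0 means "use minClusterSize"; any other value is taken as given.
inline std::size_t resolveMinSamples(std::size_t minSamples,
                                     std::size_t minClusterSize) {
  assert(minClusterSize >= 2);  // documented constructor precondition
  return minSamples == 0 ? minClusterSize : minSamples;
}

// Hypothetical helper: the resolved value must be strictly less than n.
inline bool minSamplesFits(std::size_t resolved, std::size_t n) {
  return resolved >= 1 && resolved < n;
}

// Hypothetical helper: resolve the nJobs sentinel.
// 0 means "use the hardware concurrency"; hardware_concurrency() may itself
// return 0 when the value is not computable, so fall back to 1 (assumption).
inline std::size_t resolveWorkerCount(std::size_t nJobs) {
  if (nJobs != 0) return nJobs;
  const unsigned hw = std::thread::hardware_concurrency();
  return hw == 0 ? std::size_t{1} : static_cast<std::size_t>(hw);
}
```

Under these rules, a fitter constructed with minClusterSize = 5 and the default minSamples = 0 would need at least 6 points at fit time.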

Definition at line 152 of file hdbscan.h.

◆ HDBSCAN() [2/3]

template<class T, class MstBackend = hdbscan::AutoMstBackend<T>>
clustering::HDBSCAN< T, MstBackend >::HDBSCAN ( const HDBSCAN< T, MstBackend > & )
delete

◆ HDBSCAN() [3/3]

template<class T, class MstBackend = hdbscan::AutoMstBackend<T>>
clustering::HDBSCAN< T, MstBackend >::HDBSCAN ( HDBSCAN< T, MstBackend > && )
delete

◆ ~HDBSCAN()

template<class T, class MstBackend = hdbscan::AutoMstBackend<T>>
clustering::HDBSCAN< T, MstBackend >::~HDBSCAN ( )
default

Member Function Documentation

◆ condensedTree()

template<class T, class MstBackend = hdbscan::AutoMstBackend<T>>
CondensedTreeView clustering::HDBSCAN< T, MstBackend >::condensedTree ( ) const
inline nodiscard noexcept

Borrowed view over the condensed tree from the most recent run, or an empty view if no fit has produced a result yet.

Definition at line 291 of file hdbscan.h.

◆ labels()

template<class T, class MstBackend = hdbscan::AutoMstBackend<T>>
const NDArray< std::int32_t, 1 > & clustering::HDBSCAN< T, MstBackend >::labels ( ) const
inline nodiscard noexcept

Length-n assignment; -1 marks noise.

Empty on a freshly constructed or just-reset instance.
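
A typical consumer of the label array tallies cluster sizes and counts noise points. The sketch below works on a plain std::vector<std::int32_t> rather than NDArray, whose element-access API is not shown on this page. LabelSummary and summarizeLabels are hypothetical names, and the assumption that non-noise labels lie in [0, nClusters) is not confirmed by this page.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical summary of a label assignment.
struct LabelSummary {
  std::vector<std::size_t> clusterSizes;  // clusterSizes[c] = points labelled c
  std::size_t noiseCount = 0;             // points labelled -1
};

// Tally cluster sizes and noise from a label array.
// Assumes non-noise labels lie in [0, nClusters).
inline LabelSummary summarizeLabels(const std::vector<std::int32_t>& labels,
                                    std::size_t nClusters) {
  LabelSummary summary;
  summary.clusterSizes.assign(nClusters, 0);
  for (const std::int32_t label : labels) {
    if (label < 0) {  // -1 marks noise
      ++summary.noiseCount;
      continue;
    }
    const auto cluster = static_cast<std::size_t>(label);
    assert(cluster < nClusters);
    ++summary.clusterSizes[cluster];
  }
  return summary;
}
```

On an empty label array, as returned before any fit, the summary reports zero noise points and all-zero cluster sizes.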

Definition at line 279 of file hdbscan.h.

◆ nClusters()

template<class T, class MstBackend = hdbscan::AutoMstBackend<T>>
std::size_t clustering::HDBSCAN< T, MstBackend >::nClusters ( ) const
inline nodiscard noexcept

Total number of clusters discovered by the most recent run, or 0 if no fit has produced a result yet.

Definition at line 287 of file hdbscan.h.

◆ operator=() [1/2]

template<class T, class MstBackend = hdbscan::AutoMstBackend<T>>
HDBSCAN & clustering::HDBSCAN< T, MstBackend >::operator= ( const HDBSCAN< T, MstBackend > & )
delete

◆ operator=() [2/2]

template<class T, class MstBackend = hdbscan::AutoMstBackend<T>>
HDBSCAN & clustering::HDBSCAN< T, MstBackend >::operator= ( HDBSCAN< T, MstBackend > && )
delete

◆ outlierScores()

template<class T, class MstBackend = hdbscan::AutoMstBackend<T>>
const NDArray< T, 1 > & clustering::HDBSCAN< T, MstBackend >::outlierScores ( ) const
inline nodiscard noexcept

Length-n per-point GLOSH outlier scores in [0, 1].

Empty on a freshly constructed or just-reset instance.
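
For orientation, the GLOSH score of Campello et al. (2015) is 1 - eps_max / eps(x), where eps(x) is the radius at which x leaves its cluster and eps_max is the smallest radius at which that cluster or any of its descendants still exists. Written with density levels lambda = 1 / eps, this becomes (lambda_max - lambda_x) / lambda_max, which is why the scores fall in [0, 1]. The function below is a minimal sketch of that formula only; gloshScore is a hypothetical name, and how hdbscan.h derives the two lambda values from the condensed tree is not shown on this page.

```cpp
#include <cassert>

// GLOSH score for one point, given:
//   lambdaPoint      - density level at which the point leaves its cluster
//   lambdaMaxCluster - largest density level reached inside that cluster
//                      (including its descendants)
// Returns a value in [0, 1]; 0 for the densest points, close to 1 for points
// that drop out long before the cluster's core does.
inline float gloshScore(float lambdaPoint, float lambdaMaxCluster) {
  assert(lambdaPoint >= 0.0f);
  assert(lambdaPoint <= lambdaMaxCluster);
  if (lambdaMaxCluster <= 0.0f) return 0.0f;  // degenerate cluster: no score
  return (lambdaMaxCluster - lambdaPoint) / lambdaMaxCluster;
}
```

A point that leaves at a quarter of the cluster's peak density level (lambdaPoint = 0.5, lambdaMaxCluster = 2) scores 0.75 under this formula.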

Definition at line 283 of file hdbscan.h.

◆ reset()

template<class T, class MstBackend = hdbscan::AutoMstBackend<T>>
void clustering::HDBSCAN< T, MstBackend >::reset ( )
inline

Release every scratch buffer. The next run call reallocates against its shape.

Definition at line 301 of file hdbscan.h.

◆ run()

template<class T, class MstBackend = hdbscan::AutoMstBackend<T>>
void clustering::HDBSCAN< T, MstBackend >::run ( const NDArray< T, 2 > & X)
inline

Fit to X.

Every precondition fires a CLUSTERING_ALWAYS_ASSERT before any work begins so failures surface at the call site regardless of build configuration.

Parameters
X  Contiguous n x d dataset. The caller retains ownership; X must outlive this run call.
Warning
X must remain alive and unchanged for the full duration of this call.

Definition at line 183 of file hdbscan.h.


The documentation for this class was generated from the following file: