Hierarchical density-based clustering over mutual-reachability distances.

#include <clustering/hdbscan.h>

Classes
struct	CondensedTreeView
	Read-only view over the condensed-tree result.

Public Member Functions
	HDBSCAN (std::size_t minClusterSize, std::size_t minSamples=0, hdbscan::ClusterSelectionMethod method=hdbscan::ClusterSelectionMethod::kEom, std::size_t nJobs=0, hdbscan::MinSamplesConvention convention=hdbscan::MinSamplesConvention::kSklearn)
	Construct a reusable HDBSCAN fitter.
	HDBSCAN (const HDBSCAN &)=delete
HDBSCAN &	operator= (const HDBSCAN &)=delete
	HDBSCAN (HDBSCAN &&)=delete
HDBSCAN &	operator= (HDBSCAN &&)=delete
	~HDBSCAN ()=default
void	run (const NDArray< T, 2 > &X)
	Fit to `X`.
const NDArray< std::int32_t, 1 > &	labels () const noexcept
	Length-n assignment; -1 marks noise.
const NDArray< T, 1 > &	outlierScores () const noexcept
	Length-n per-point GLOSH outlier scores in [0, 1].
std::size_t	nClusters () const noexcept
	Total number of clusters discovered by the most recent run, or `0` if no fit has produced a result yet.
CondensedTreeView	condensedTree () const noexcept
	Borrowed view over the condensed tree from the most recent run, or an empty view if no fit has produced a result yet.
void	reset ()
	Release every scratch buffer. The next run call reallocates them for the shape of its input.

Detailed Description

template<class T, class MstBackend = hdbscan::AutoMstBackend<T>>
requires hdbscan::MstBackendStrategy<MstBackend, T>
class clustering::HDBSCAN< T, MstBackend >

Hierarchical density-based clustering over mutual-reachability distances.

HDBSCAN* extends DBSCAN with a hierarchical condensation step that auto-selects density thresholds, produces per-cluster stability, and yields GLOSH outlier scores as a byproduct. The MST boundary is the only template axis; everything downstream (condensed tree, cluster extraction, outlier scoring) is monomorphic. The default MstBackend is hdbscan::AutoMstBackend, which dispatches among Prim, Boruvka, and NN-Descent based on the input shape; callers who want to pin a specific backend may supply it as the second template argument.

Note: HDBSCAN does NOT own X. The caller must keep the NDArray alive for the duration of every run call. Data-dependent indices (KDTree, kNN graph) are rebuilt on every fit so in-place buffer mutations through a borrowed view can never produce a silent cache miss. Shape-indexed scratch (heaps, reusable buffers) may be amortized at fixed (n, d, minSamples); a shape change rebuilds. reset returns the instance to fresh-constructed state.

Note: On a freshly-constructed or just-reset instance, all result accessors return empty values (an empty label array, empty outlier-score array, zero cluster count, and an empty condensed-tree view).

Note: Labels and outlier scores follow the Campello 2015 formula over Euclidean mutual-reachability distances, matching the reference implementation.

Note: The minSamples argument is interpreted per clustering::hdbscan::MinSamplesConvention. The default (kSklearn) treats the query point itself as one of the minSamples neighbours; pass kCampello to count non-self neighbours only. The two conventions differ by one neighbour and produce different MSTs on high-dimensional inputs.
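The effect of the two minSamples conventions on core distances, and therefore on mutual-reachability distances, can be sketched with a brute-force computation. This is a minimal illustration, not the library's code path: `Convention`, `Point`, `euclidean`, `coreDistance`, and `mutualReachability` are hypothetical stand-ins, and the neighbour search here is a plain sort rather than a KDTree or kNN graph.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for clustering::hdbscan::MinSamplesConvention.
enum class Convention { kSklearn, kCampello };

using Point = std::vector<float>;

// Euclidean distance between two points of equal dimension.
float euclidean(const Point& a, const Point& b) {
  assert(a.size() == b.size());
  float sum = 0.0f;
  for (std::size_t i = 0; i < a.size(); ++i) {
    const float diff = a[i] - b[i];
    sum += diff * diff;
  }
  return std::sqrt(sum);
}

// Core distance of point i: the distance to its minSamples-th neighbour.
// Under kSklearn the point itself counts as the first neighbour;
// under kCampello only non-self neighbours are counted.
float coreDistance(const std::vector<Point>& X, std::size_t i,
                   std::size_t minSamples, Convention convention) {
  assert(i < X.size());
  assert(minSamples >= 1 && minSamples < X.size());
  std::vector<float> dists;
  dists.reserve(X.size());
  for (const Point& p : X) dists.push_back(euclidean(X[i], p));
  std::sort(dists.begin(), dists.end());
  // dists[0] is the zero self-distance.
  const std::size_t rank =
      (convention == Convention::kSklearn) ? minSamples - 1 : minSamples;
  return dists[rank];
}

// Mutual-reachability distance: the largest of the two core distances
// and the direct distance between the points.
float mutualReachability(const std::vector<Point>& X, std::size_t a,
                         std::size_t b, std::size_t minSamples,
                         Convention convention) {
  const float coreA = coreDistance(X, a, minSamples, convention);
  const float coreB = coreDistance(X, b, minSamples, convention);
  return std::max({coreA, coreB, euclidean(X[a], X[b])});
}
```

On the one-dimensional points 0, 1, 3, 7 with minSamples = 2, the core distance of the first point is 1 under the self-counting convention and 3 under the non-self convention, so the same pair of points can receive different mutual-reachability distances and the resulting MST edge weights differ.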

Thread safety: A single HDBSCAN instance is not safe to drive concurrently; run mutates internal state. Separate instances on distinct inputs are safe when each instance spawns its own internal pool (the default). The internal pool obeys a no-nested-dispatch invariant: worker tasks never re-submit to the pool.

Template Parameters

T	Element type. Only `float` is supported in this class; a `double` specialization is out of scope.
MstBackend	Backend satisfying clustering::hdbscan::MstBackendStrategy. Defaults to clustering::hdbscan::AutoMstBackend, which picks Prim, Boruvka, or NN-Descent based on the input shape.

Definition at line 109 of file hdbscan.h.

Constructor & Destructor Documentation

◆ HDBSCAN() [1/3]

template<class T, class MstBackend = hdbscan::AutoMstBackend<T>>

clustering::HDBSCAN< T, MstBackend >::HDBSCAN	(	std::size_t	minClusterSize,
		std::size_t	minSamples = 0,
		hdbscan::ClusterSelectionMethod	method = hdbscan::ClusterSelectionMethod::kEom,
		std::size_t	nJobs = 0,
		hdbscan::MinSamplesConvention	convention = hdbscan::MinSamplesConvention::kSklearn )

inline explicit

Construct a reusable HDBSCAN fitter.

Parameters

minClusterSize	The smallest allowable cluster; must be at least 2.
minSamples	Neighbour count used to compute core distances, interpreted per `convention`. A value of `0` is a sentinel meaning "resolve to `minClusterSize` at fit time"; the fit entry asserts that the resolved value is valid for the active convention and strictly less than `n`, the number of points in `X`.
method	Cluster selection method; defaults to excess-of-mass.
nJobs	Worker count for the internal thread pool. A value of `0` resolves to `std::thread::hardware_concurrency()`.
convention	Interpretation of `minSamples` at core-distance extraction. See `clustering::hdbscan::MinSamplesConvention`.
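
The two `0` sentinels described above can be sketched as small resolution helpers. This is an illustration under stated assumptions, not the library's code: `resolveMinSamples` and `resolveWorkerCount` are hypothetical names, the convention-specific validity check is omitted, and the fallback to one worker when `std::thread::hardware_concurrency()` itself reports `0` is an assumption that the source does not document.

```cpp
#include <cassert>
#include <cstddef>
#include <thread>

// Hypothetical helper: a minSamples of 0 resolves to minClusterSize.
// n is the number of points in the dataset being fitted.
std::size_t resolveMinSamples(std::size_t minSamples,
                              std::size_t minClusterSize, std::size_t n) {
  assert(minClusterSize >= 2);  // documented lower bound
  const std::size_t resolved = (minSamples == 0) ? minClusterSize : minSamples;
  assert(resolved < n);  // documented: strictly less than the point count
  return resolved;
}

// Hypothetical helper: an nJobs of 0 resolves to the hardware thread count.
std::size_t resolveWorkerCount(std::size_t nJobs) {
  if (nJobs != 0) return nJobs;
  const unsigned hw = std::thread::hardware_concurrency();
  // Assumption: hardware_concurrency() may return 0 when the count is
  // unknown, so fall back to a single worker in that case.
  return hw == 0 ? std::size_t{1} : static_cast<std::size_t>(hw);
}
```

Because the `minSamples` sentinel resolves at fit time, the same fitter can be constructed before the dataset size is known; the size check only becomes meaningful once `run` sees `X`.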

Definition at line 152 of file hdbscan.h.

◆ HDBSCAN() [2/3]

template<class T, class MstBackend = hdbscan::AutoMstBackend<T>>

clustering::HDBSCAN< T, MstBackend >::HDBSCAN ( const HDBSCAN< T, MstBackend > & )

delete

◆ HDBSCAN() [3/3]

template<class T, class MstBackend = hdbscan::AutoMstBackend<T>>

clustering::HDBSCAN< T, MstBackend >::HDBSCAN ( HDBSCAN< T, MstBackend > && )

delete

◆ ~HDBSCAN()

template<class T, class MstBackend = hdbscan::AutoMstBackend<T>>

clustering::HDBSCAN< T, MstBackend >::~HDBSCAN ( )

default

Member Function Documentation

◆ condensedTree()

template<class T, class MstBackend = hdbscan::AutoMstBackend<T>>

CondensedTreeView clustering::HDBSCAN< T, MstBackend >::condensedTree ( ) const

inline nodiscard noexcept

Borrowed view over the condensed tree from the most recent run, or an empty view if no fit has produced a result yet.

Definition at line 291 of file hdbscan.h.

◆ labels()

template<class T, class MstBackend = hdbscan::AutoMstBackend<T>>

const NDArray< std::int32_t, 1 > & clustering::HDBSCAN< T, MstBackend >::labels ( ) const

inline nodiscard noexcept

Length-n assignment; -1 marks noise.

Empty on a freshly-constructed or just-reset instance.

Definition at line 279 of file hdbscan.h.
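
A label array of this form can be summarized by counting noise points and per-cluster membership. The sketch below uses `std::vector<std::int32_t>` as a stand-in for `NDArray< std::int32_t, 1 >`, and `LabelSummary` and `summarize` are hypothetical names. It also assumes that non-noise labels are zero-based indices in `[0, nClusters)`; the documentation above only states that `-1` marks noise, so that numbering is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical summary of a label array.
struct LabelSummary {
  std::size_t noiseCount = 0;
  std::vector<std::size_t> clusterSizes;  // indexed by cluster label
};

// Counts noise points (label -1) and the size of each cluster.
// Assumption: non-noise labels are zero-based and below nClusters.
LabelSummary summarize(const std::vector<std::int32_t>& labels,
                       std::size_t nClusters) {
  LabelSummary summary;
  summary.clusterSizes.assign(nClusters, 0);
  for (const std::int32_t label : labels) {
    if (label < 0) {
      ++summary.noiseCount;
      continue;
    }
    const std::size_t index = static_cast<std::size_t>(label);
    assert(index < nClusters);
    ++summary.clusterSizes[index];
  }
  return summary;
}
```

With an empty label array, as returned before any fit, the summary reports zero noise points and no cluster sizes.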

◆ nClusters()

template<class T, class MstBackend = hdbscan::AutoMstBackend<T>>

std::size_t clustering::HDBSCAN< T, MstBackend >::nClusters ( ) const

inline nodiscard noexcept

Total number of clusters discovered by the most recent run, or 0 if no fit has produced a result yet.

Definition at line 287 of file hdbscan.h.

◆ operator=() [1/2]

template<class T, class MstBackend = hdbscan::AutoMstBackend<T>>

HDBSCAN & clustering::HDBSCAN< T, MstBackend >::operator= ( const HDBSCAN< T, MstBackend > & )

delete

◆ operator=() [2/2]

template<class T, class MstBackend = hdbscan::AutoMstBackend<T>>

HDBSCAN & clustering::HDBSCAN< T, MstBackend >::operator= ( HDBSCAN< T, MstBackend > && )

delete

◆ outlierScores()

template<class T, class MstBackend = hdbscan::AutoMstBackend<T>>

const NDArray< T, 1 > & clustering::HDBSCAN< T, MstBackend >::outlierScores ( ) const

inline nodiscard noexcept

Length-n per-point GLOSH outlier scores in [0, 1].

Empty on a freshly-constructed or just-reset instance.

Definition at line 283 of file hdbscan.h.
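
For orientation, the GLOSH score of a single point is commonly written in terms of density levels (lambda = 1 / distance) as `(lambdaMax - lambdaPoint) / lambdaMax`. The sketch below shows only that final arithmetic. `gloshScore` and its parameters are hypothetical, the source does not show how this library derives the two lambda values from the condensed tree, and the handling of a zero `lambdaMax` is an assumption.

```cpp
#include <cassert>

// lambdaPoint: density level at which the point leaves its cluster.
// lambdaMax:   highest density level at which that cluster, or any of its
//              descendants, still contains a point.
// Returns a score in [0, 1]; higher means more outlying.
float gloshScore(float lambdaPoint, float lambdaMax) {
  assert(lambdaPoint >= 0.0f && lambdaPoint <= lambdaMax);
  // Assumption: a cluster with no density range yields a score of 0.
  if (lambdaMax == 0.0f) return 0.0f;
  return (lambdaMax - lambdaPoint) / lambdaMax;
}
```

A point that persists to the densest level of its cluster scores 0, while a point that drops out at a much lower density than the cluster's core scores close to 1, which is consistent with the documented [0, 1] range.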

◆ reset()

template<class T, class MstBackend = hdbscan::AutoMstBackend<T>>

void clustering::HDBSCAN< T, MstBackend >::reset ( )

inline

Release every scratch buffer. The next run call reallocates them for the shape of its input.

Definition at line 301 of file hdbscan.h.

◆ run()

template<class T, class MstBackend = hdbscan::AutoMstBackend<T>>

void clustering::HDBSCAN< T, MstBackend >::run ( const NDArray< T, 2 > & X )

inline

Fit to X.

Every precondition fires a CLUSTERING_ALWAYS_ASSERT before any work begins so failures surface at the call site regardless of build configuration.

Parameters

X	Contiguous n x d dataset. The caller retains ownership; `X` must outlive this `run` call.

Warning: X must remain alive and unchanged for the full duration of this call.

Definition at line 183 of file hdbscan.h.

The documentation for this class was generated from the following file:

include/clustering/hdbscan.h
