Clustering
C++20 header-only: DBSCAN, HDBSCAN, k-means.
Loading...
Searching...
No Matches
clustering::index::NnDescentIndex< T > Class Template Reference

Approximate k -nearest-neighbor graph via the NN-Descent algorithm (Dong, Charikar, Li 2011) with random-projection-tree initialization. More...

#include <clustering/index/nn_descent.h>

Classes

struct  KnnEntry
 Per-node kNN entry returned by neighbors. Squared Euclidean distance carried as T. More...

Public Member Functions

 NnDescentIndex (std::size_t k, std::size_t maxIter=10, T delta=T{0.001}, std::uint64_t seed=0)
 Construct an index targeting k neighbors per node.
 NnDescentIndex (const NnDescentIndex &)=delete
NnDescentIndexoperator= (const NnDescentIndex &)=delete
 NnDescentIndex (NnDescentIndex &&)=default
 Defaulted move constructor; transfers the neighbor bank and scratch.
NnDescentIndexoperator= (NnDescentIndex &&)=default
 Defaulted move assignment; transfers the neighbor bank and scratch.
 ~NnDescentIndex ()=default
void build (const NDArray< T, 2 > &X, math::Pool pool)
 Build (or rebuild) the approximate kNN graph for X.
const std::vector< std::vector< KnnEntry > > & neighbors () const noexcept
 Per-node k -NN neighbor list, sorted ascending by squared distance.
bool isConnected () const
 Whether the undirected k-NN graph covers every node in a single connected component.
std::size_t k () const noexcept
 k specified at construction.
std::size_t lastIterations () const noexcept
 Number of join iterations actually executed during the most recent build.

Detailed Description

template<class T>
class clustering::index::NnDescentIndex< T >

Approximate k -nearest-neighbor graph via the NN-Descent algorithm (Dong, Charikar, Li 2011) with random-projection-tree initialization.

Maintains a per-node bounded max-heap of k current-best neighbors. Each iteration performs a "local join": for every node u the step takes u 's current neighbors and the neighbors-of-neighbors as candidates, computing squared Euclidean distances and admitting any candidate that beats the heap's current worst. Convergence is reached when the fraction of updated slots falls below delta.

Initialization via one random-projection tree seeds each node with coarse neighbors from the same tree leaf. Dong 2011 shows RP-tree initialization typically cuts iteration count by half compared to uniform-random seeding on real workloads; defaults here follow that pattern.

Warm start
Rebuilding on the same (data pointer, @c n, @c d, @c k) tuple reuses the existing neighbor graph as warm state. The join loop runs without RP-tree re-initialization, so the second build converges measurably faster (typically in a single iteration). A different k, a different input pointer, or a different shape forces a cold start (fresh bank + RP-tree init).
Thread safety
Not thread-safe: a single NnDescentIndex instance must not be driven concurrently. The internal join step uses a Pool injection to parallelise across nodes per build; no nested pool dispatch.
Template Parameters
TElement type; only float is currently supported.

Definition at line 47 of file nn_descent.h.

Constructor & Destructor Documentation

◆ NnDescentIndex() [1/3]

template<class T>
clustering::index::NnDescentIndex< T >::NnDescentIndex ( std::size_t k,
std::size_t maxIter = 10,
T delta = T{0.001},
std::uint64_t seed = 0 )
inlineexplicit

Construct an index targeting k neighbors per node.

Parameters
kNeighbors per node; must be at least 1. Zero-argument construction is rejected: k has no safe default and the consuming code (HDBSCAN's NN-Descent MST backend) always knows k at call time.
maxIterIteration cap on the join loop. A default of 10 gives ample headroom at the workload shapes this class targets; most inputs converge in three to five.
deltaConvergence fraction: terminate when the ratio of updated neighbor slots drops below this. Smaller values spend more iterations.
seedPRNG seed for RP-tree partition choices. Two builds with identical seed on identical input produce bit-for-bit identical neighbor graphs at single-threaded execution.

Definition at line 74 of file nn_descent.h.

◆ NnDescentIndex() [2/3]

template<class T>
clustering::index::NnDescentIndex< T >::NnDescentIndex ( const NnDescentIndex< T > & )
delete

◆ NnDescentIndex() [3/3]

template<class T>
clustering::index::NnDescentIndex< T >::NnDescentIndex ( NnDescentIndex< T > && )
default

Defaulted move constructor; transfers the neighbor bank and scratch.

◆ ~NnDescentIndex()

template<class T>
clustering::index::NnDescentIndex< T >::~NnDescentIndex ( )
default

Member Function Documentation

◆ build()

template<class T>
void clustering::index::NnDescentIndex< T >::build ( const NDArray< T, 2 > & X,
math::Pool pool )
inline

Build (or rebuild) the approximate kNN graph for X.

Runs RP-tree initialization followed by the NN-Descent join loop. Rebuilding on the same (data pointer, @c n, @c d, @c k) tuple reuses the existing neighbors as warm state; any change triggers a cold start.

Parameters
XRow-major n x d point matrix; must outlive the call but not the index.
poolParallelism injection for the inner join loop.

Definition at line 98 of file nn_descent.h.

◆ isConnected()

template<class T>
bool clustering::index::NnDescentIndex< T >::isConnected ( ) const
inlinenodiscard

Whether the undirected k-NN graph covers every node in a single connected component.

DFS over the union of forward and reverse edges from the published sorted view. Safe to call any time after build; returns true trivially for n <= 1.

Definition at line 171 of file nn_descent.h.

◆ k()

template<class T>
std::size_t clustering::index::NnDescentIndex< T >::k ( ) const
inlinenodiscardnoexcept

k specified at construction.

Definition at line 207 of file nn_descent.h.

◆ lastIterations()

template<class T>
std::size_t clustering::index::NnDescentIndex< T >::lastIterations ( ) const
inlinenodiscardnoexcept

Number of join iterations actually executed during the most recent build.

0 until the first build returns.

Definition at line 211 of file nn_descent.h.

◆ neighbors()

template<class T>
const std::vector< std::vector< KnnEntry > > & clustering::index::NnDescentIndex< T >::neighbors ( ) const
inlinenodiscardnoexcept

Per-node k -NN neighbor list, sorted ascending by squared distance.

Empty until the first build returns.

Definition at line 161 of file nn_descent.h.

◆ operator=() [1/2]

template<class T>
NnDescentIndex & clustering::index::NnDescentIndex< T >::operator= ( const NnDescentIndex< T > & )
delete

◆ operator=() [2/2]

template<class T>
NnDescentIndex & clustering::index::NnDescentIndex< T >::operator= ( NnDescentIndex< T > && )
default

Defaulted move assignment; transfers the neighbor bank and scratch.


The documentation for this class was generated from the following file: