Clustering
C++20 header-only: DBSCAN, HDBSCAN, k-means.
clustering::KMeans< T, Algo, Seeder > Class Template Reference

Lloyd-family k-means.

#include <clustering/kmeans.h>

Public Member Functions

 KMeans (std::size_t k, std::size_t nJobs=0)
 Construct a reusable k-means fitter.
 KMeans (const KMeans &)=delete
KMeans & operator= (const KMeans &)=delete
 KMeans (KMeans &&)=delete
KMeans & operator= (KMeans &&)=delete
 ~KMeans ()=default
void run (const NDArray< T, 2 > &X, std::size_t maxIter=300, T tol=T{1e-4}, std::uint64_t seed=0)
 Fit to X.
const NDArray< std::int32_t, 1 > & labels () const noexcept
 Length-n assignment; each entry is in [0, k).
const NDArray< T, 2, Layout::Contig > & centroids () const noexcept
 k x d fitted centroids.
double inertia () const noexcept
 Final inertia: Kahan-summed f64 total of each point's squared distance to its assigned centroid.
std::size_t nIter () const noexcept
 Iterations executed before tol or maxIter fired.
bool converged () const noexcept
 True iff the last run stopped because centroid shift fell at or below tol.
void reset ()
 Release every scratch buffer. The next run call reallocates against its shape.

Detailed Description

template<class T, class Algo = kmeans::LloydFusedGemm<T>, class Seeder = kmeans::AutoSeeder<T>>
requires kmeans::LloydStrategy<Algo, T> && kmeans::SeederStrategy<Seeder, T>
class clustering::KMeans< T, Algo, Seeder >

Lloyd-family k-means.

The algorithm and seeder are template parameters with concept constraints. The default instantiation carries LloydFusedGemm<T> and AutoSeeder<T>, the latter picking between greedy k-means++ and AFK-MC2 against workload shape at run time. Callers who want to pin a specific combination spell it out, e.g. KMeans<float, LloydFusedGemm<float>, AfkMc2Seeder<float>>.

Note
KMeans does NOT own X. The caller must keep the NDArray alive for the lifetime of every run call on this instance. An n_init > 1 harness constructs KMeans once and calls run repeatedly against the same X so policy scratch amortizes across runs at a fixed (n, d, k, nJobs) tuple.
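The n_init pattern described in the note can be sketched roughly as follows. This is an illustration under stated assumptions, not library code: bestOfNInit and BestRun are hypothetical names, and the helper is templated on the fitter and dataset types so that it relies only on the run(X, maxIter, tol, seed) and inertia() members documented on this page.

```cpp
#include <cstddef>
#include <cstdint>
#include <limits>

// Hypothetical result type (not part of the library): the seed that gave the
// lowest inertia, and that inertia.
struct BestRun {
  std::uint64_t seed = 0;
  double inertia = std::numeric_limits<double>::infinity();
};

// Runs ONE fitter instance nInit times against the same borrowed X, using
// consecutive seeds, so any scratch the fitter holds can be reused between
// runs. Fitter is any type exposing run(X, maxIter, tol, seed) and inertia()
// as documented for KMeans; Data stands for the dataset type.
template <class Fitter, class Data, class Tol>
BestRun bestOfNInit(Fitter& fitter, const Data& X, std::size_t nInit,
                    std::size_t maxIter, Tol tol, std::uint64_t baseSeed) {
  BestRun best;
  for (std::size_t i = 0; i < nInit; ++i) {
    const std::uint64_t seed = baseSeed + i;
    fitter.run(X, maxIter, tol, seed);  // X must stay alive and unchanged
    if (fitter.inertia() < best.inertia) {
      best.seed = seed;
      best.inertia = fitter.inertia();
    }
  }
  return best;
}
```

The accessors return references, so the sketch assumes each run replaces the previous results. A caller who needs the winning labels and centroids would either copy them when a new best is found or re-run with the returned seed; the bit-identical guarantee for such a re-run is documented only at nJobs=1.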
Template Parameters
T       Element type. Only float is supported; add a double specialization to extend.
Algo    Lloyd driver satisfying clustering::kmeans::LloydStrategy.
Seeder  Seeder satisfying clustering::kmeans::SeederStrategy.

Definition at line 40 of file kmeans.h.

Constructor & Destructor Documentation

◆ KMeans() [1/3]

template<class T, class Algo = kmeans::LloydFusedGemm<T>, class Seeder = kmeans::AutoSeeder<T>>
clustering::KMeans< T, Algo, Seeder >::KMeans ( std::size_t k,
std::size_t nJobs = 0 )
inline explicit

Construct a reusable k-means fitter.

Parameters
k      Number of clusters (>= 1).
nJobs  Worker count for the internal thread pool. A value of 0 is clamped upward to std::thread::hardware_concurrency() so the pool is always usable by the math::Pool helpers.
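The nJobs rule above can be illustrated with a small stand-alone helper. The name resolveWorkerCount is hypothetical, and the fallback to 1 when std::thread::hardware_concurrency() itself reports 0 is an assumption of this sketch; the page does not say how the library handles that case.

```cpp
#include <cstddef>
#include <thread>

// Hypothetical helper mirroring the documented rule: nJobs == 0 means
// "use the hardware thread count"; any other value is taken as given.
inline std::size_t resolveWorkerCount(std::size_t nJobs) {
  if (nJobs != 0) return nJobs;
  const unsigned hw = std::thread::hardware_concurrency();
  // hardware_concurrency() may return 0 when the count is not computable.
  // Falling back to a single worker is an assumption of this sketch.
  return hw == 0 ? std::size_t{1} : static_cast<std::size_t>(hw);
}
```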

Definition at line 53 of file kmeans.h.

◆ KMeans() [2/3]

template<class T, class Algo = kmeans::LloydFusedGemm<T>, class Seeder = kmeans::AutoSeeder<T>>
clustering::KMeans< T, Algo, Seeder >::KMeans ( const KMeans< T, Algo, Seeder > & )
delete

◆ KMeans() [3/3]

template<class T, class Algo = kmeans::LloydFusedGemm<T>, class Seeder = kmeans::AutoSeeder<T>>
clustering::KMeans< T, Algo, Seeder >::KMeans ( KMeans< T, Algo, Seeder > && )
delete

◆ ~KMeans()

template<class T, class Algo = kmeans::LloydFusedGemm<T>, class Seeder = kmeans::AutoSeeder<T>>
clustering::KMeans< T, Algo, Seeder >::~KMeans ( )
default

Member Function Documentation

◆ centroids()

template<class T, class Algo = kmeans::LloydFusedGemm<T>, class Seeder = kmeans::AutoSeeder<T>>
const NDArray< T, 2, Layout::Contig > & clustering::KMeans< T, Algo, Seeder >::centroids ( ) const
inline nodiscard noexcept

k x d fitted centroids.

Definition at line 110 of file kmeans.h.

◆ converged()

template<class T, class Algo = kmeans::LloydFusedGemm<T>, class Seeder = kmeans::AutoSeeder<T>>
bool clustering::KMeans< T, Algo, Seeder >::converged ( ) const
inline nodiscard noexcept

True iff the last run stopped because centroid shift fell at or below tol.

Definition at line 118 of file kmeans.h.

◆ inertia()

template<class T, class Algo = kmeans::LloydFusedGemm<T>, class Seeder = kmeans::AutoSeeder<T>>
double clustering::KMeans< T, Algo, Seeder >::inertia ( ) const
inline nodiscard noexcept

Final inertia: Kahan-summed f64 total of each point's squared distance to its assigned centroid.
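As a rough illustration of what a Kahan-summed inertia looks like, the sketch below accumulates per-point squared distances in double with a compensation term. It is not the library's implementation: kahanInertia is a hypothetical name, and row-major std::vector buffers stand in for NDArray.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Sketch only. X is row-major n x d, centroids is row-major k x d, and
// labels has length n with entries in [0, k).
inline double kahanInertia(const std::vector<float>& X,
                           const std::vector<float>& centroids,
                           const std::vector<std::int32_t>& labels,
                           std::size_t d) {
  double sum = 0.0;
  double comp = 0.0;  // compensation for low-order bits lost in earlier adds
  for (std::size_t i = 0; i < labels.size(); ++i) {
    const float* x = X.data() + i * d;
    const float* c =
        centroids.data() + static_cast<std::size_t>(labels[i]) * d;
    double dist2 = 0.0;
    for (std::size_t j = 0; j < d; ++j) {
      const double diff = static_cast<double>(x[j]) - static_cast<double>(c[j]);
      dist2 += diff * diff;
    }
    // Kahan step: apply the stored error, add, then measure the new error.
    const double y = dist2 - comp;
    const double t = sum + y;
    comp = (t - sum) - y;
    sum = t;
  }
  return sum;
}
```

Compensated summation keeps the rounding error of the total nearly independent of n, which matters when many small float distances are accumulated. Compiler flags that permit reassociation, such as -ffast-math, can optimize the compensation away.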

Definition at line 114 of file kmeans.h.

◆ labels()

template<class T, class Algo = kmeans::LloydFusedGemm<T>, class Seeder = kmeans::AutoSeeder<T>>
const NDArray< std::int32_t, 1 > & clustering::KMeans< T, Algo, Seeder >::labels ( ) const
inline nodiscard noexcept

Length-n assignment; each entry is in [0, k).

Definition at line 108 of file kmeans.h.

◆ nIter()

template<class T, class Algo = kmeans::LloydFusedGemm<T>, class Seeder = kmeans::AutoSeeder<T>>
std::size_t clustering::KMeans< T, Algo, Seeder >::nIter ( ) const
inline nodiscard noexcept

Iterations executed before tol or maxIter fired.

Definition at line 116 of file kmeans.h.

◆ operator=() [1/2]

template<class T, class Algo = kmeans::LloydFusedGemm<T>, class Seeder = kmeans::AutoSeeder<T>>
KMeans & clustering::KMeans< T, Algo, Seeder >::operator= ( const KMeans< T, Algo, Seeder > & )
delete

◆ operator=() [2/2]

template<class T, class Algo = kmeans::LloydFusedGemm<T>, class Seeder = kmeans::AutoSeeder<T>>
KMeans & clustering::KMeans< T, Algo, Seeder >::operator= ( KMeans< T, Algo, Seeder > && )
delete

◆ reset()

template<class T, class Algo = kmeans::LloydFusedGemm<T>, class Seeder = kmeans::AutoSeeder<T>>
void clustering::KMeans< T, Algo, Seeder >::reset ( )
inline

Release every scratch buffer. The next run call reallocates against its shape.

Definition at line 121 of file kmeans.h.

◆ run()

template<class T, class Algo = kmeans::LloydFusedGemm<T>, class Seeder = kmeans::AutoSeeder<T>>
void clustering::KMeans< T, Algo, Seeder >::run ( const NDArray< T, 2 > & X,
std::size_t maxIter = 300,
T tol = T{1e-4},
std::uint64_t seed = 0 )
inline

Fit to X.

Parameters
X        Contiguous n x d dataset. The caller retains ownership; X must outlive this run call and every subsequent call that intends to reuse scratch.
maxIter  Iteration cap on the inner Lloyd loop.
tol      Convergence tolerance relative to the mean column variance of X (sklearn convention). The effective sum-of-shift-squared threshold is tol * mean(var(X, axis=0)); iteration stops when the Kahan-summed per-centroid shift-squared falls at or below that threshold.
seed     PRNG seed. At nJobs=1, identical (seed, X, maxIter, tol) inputs produce bit-identical labels, centroids, and inertia.
Warning
X must remain alive and unchanged for the full duration of this call.
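The effective threshold described for tol can be worked through with a small stand-alone sketch. The names effectiveTolerance and shiftConverged are hypothetical, a row-major std::vector stands in for NDArray, and the use of population variance (dividing by n, as NumPy's var does by default) is an assumption; the page does not state which estimator the library uses.

```cpp
#include <cstddef>
#include <vector>

// Sketch: threshold = tol * mean over columns j of var(X[:, j]).
// X is row-major n x d. Population variance (divide by n) is assumed.
inline double effectiveTolerance(const std::vector<float>& X, std::size_t n,
                                 std::size_t d, double tol) {
  if (n == 0 || d == 0) return 0.0;
  double varSum = 0.0;
  for (std::size_t j = 0; j < d; ++j) {
    double mean = 0.0;
    for (std::size_t i = 0; i < n; ++i) mean += X[i * d + j];
    mean /= static_cast<double>(n);
    double var = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
      const double diff = X[i * d + j] - mean;
      var += diff * diff;
    }
    varSum += var / static_cast<double>(n);
  }
  return tol * (varSum / static_cast<double>(d));
}

// Stop test: the summed squared centroid shift is compared against the
// threshold, and "at or below" counts as converged.
inline bool shiftConverged(double shiftSquaredSum, double threshold) {
  return shiftSquaredSum <= threshold;
}
```

Scaling tol by the data's variance makes the same tol value behave similarly whether the features are measured in millimetres or kilometres. For the 2 x 2 dataset {(0, 0), (2, 4)}, the column variances are 1 and 4, so tol = 0.5 gives a threshold of 0.5 * 2.5 = 1.25.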

Definition at line 82 of file kmeans.h.


The documentation for this class was generated from the following file: