Clustering
C++20 header-only: DBSCAN, HDBSCAN, k-means.
Loading...
Searching...
No Matches
clustering::kmeans::AfkMc2Seeder< T > Class Template Reference

AFK-MC2 seeder (Bachem, Lucic, Hassani, Krause, NeurIPS 2016). More...

#include <clustering/kmeans/policy/afkmc2_seeder.h>

Public Member Functions

 AfkMc2Seeder ()
void run (const NDArray< T, 2, Layout::Contig > &X, std::size_t k, std::uint64_t seed, math::Pool pool, NDArray< T, 2, Layout::Contig > &outCentroids)
 Seed k centroids from X into outCentroids.

Static Public Attributes

static constexpr std::size_t kFloor = 100
 Minimum k below which the AFK-MC2 chain's log-k bound is too loose to win.
static constexpr std::size_t chainLengthDefault = 200
 Default Markov-chain length per centroid pick.

Detailed Description

template<class T>
class clustering::kmeans::AfkMc2Seeder< T >

AFK-MC2 seeder (Bachem, Lucic, Hassani, Krause, NeurIPS 2016).

Sublinear-in-n MCMC approximation to k-means++: draws the first centroid uniformly, builds a length-n proposal distribution q(i) = 0.5 * D(x_i, c_1)^2 / sum_D2 + 0.5 * 1/n, and then for each remaining centroid runs a Markov chain of length m that accepts a proposal with probability min(1, proposed_weight / current_weight) where the weight is the squared distance to the current centroid set divided by the proposal density.

Chain execution is strictly serial and thread-unaware so the PRNG draw order is fixed by (seed, n, k, m) regardless of pool worker count. The preprocessing sweep fans out across pool; the per-thread work is pure read with no shared mutation.

Degenerate guard: when all points coincide with the first centroid (sum_D2 == 0) the proposal collapses to uniform q(i) = 1/n so the chain remains ergodic.

The chain's log-k approximation bound degrades at small k: below k = AfkMc2Seeder::kFloor the bound is too loose to beat greedy k-means++, and callers at that regime should pin GreedyKmppSeeder (directly or via AutoSeeder, which picks it by shape).

Template Parameters
TElement type; float or double.

Definition at line 41 of file afkmc2_seeder.h.

Constructor & Destructor Documentation

◆ AfkMc2Seeder()

template<class T>
clustering::kmeans::AfkMc2Seeder< T >::AfkMc2Seeder ( )
inline

Definition at line 75 of file afkmc2_seeder.h.

Member Function Documentation

◆ run()

template<class T>
void clustering::kmeans::AfkMc2Seeder< T >::run ( const NDArray< T, 2, Layout::Contig > & X,
std::size_t k,
std::uint64_t seed,
math::Pool pool,
NDArray< T, 2, Layout::Contig > & outCentroids )
inline

Seed k centroids from X into outCentroids.

Parameters
XData matrix (n x d), contiguous.
kNumber of centroids to seed.
seedRNG seed; identical seed + (X, k) produces identical centroids.
poolParallelism injection (preprocessing sweep only).
outCentroidsOutput centroid matrix (k x d), contiguous; populated in row order.

Definition at line 86 of file afkmc2_seeder.h.

Member Data Documentation

◆ chainLengthDefault

template<class T>
std::size_t clustering::kmeans::AfkMc2Seeder< T >::chainLengthDefault = 200
staticconstexpr

Default Markov-chain length per centroid pick.

Definition at line 72 of file afkmc2_seeder.h.

◆ kFloor

template<class T>
std::size_t clustering::kmeans::AfkMc2Seeder< T >::kFloor = 100
staticconstexpr

Minimum k below which the AFK-MC2 chain's log-k bound is too loose to win.

Definition at line 57 of file afkmc2_seeder.h.


The documentation for this class was generated from the following file: