|
Clustering
C++20 header-only: DBSCAN, HDBSCAN, k-means.
|
AFK-MC2 seeder (Bachem, Lucic, Hassani, Krause, NeurIPS 2016). More...
#include <clustering/kmeans/policy/afkmc2_seeder.h>
Public Member Functions | |
| AfkMc2Seeder () | |
| void | run (const NDArray< T, 2, Layout::Contig > &X, std::size_t k, std::uint64_t seed, math::Pool pool, NDArray< T, 2, Layout::Contig > &outCentroids) |
Seed k centroids from X into outCentroids. | |
Static Public Attributes | |
| static constexpr std::size_t | kFloor = 100 |
Minimum k below which the AFK-MC2 chain's log-k bound is too loose to win. | |
| static constexpr std::size_t | chainLengthDefault = 200 |
| Default Markov-chain length per centroid pick. | |
AFK-MC2 seeder (Bachem, Lucic, Hassani, Krause, NeurIPS 2016).
Sublinear-in-n MCMC approximation to k-means++: draws the first centroid uniformly, builds a length-n proposal distribution q(i) = 0.5 * D(x_i, c_1)^2 / sum_D2 + 0.5 * 1/n, and then for each remaining centroid runs a Markov chain of length m that accepts a proposal with probability min(1, proposed_weight / current_weight) where the weight is the squared distance to the current centroid set divided by the proposal density.
Chain execution is strictly serial and thread-unaware so the PRNG draw order is fixed by (seed, n, k, m) regardless of pool worker count. The preprocessing sweep fans out across pool; the per-thread work is pure read with no shared mutation.
Degenerate guard: when all points coincide with the first centroid (sum_D2 == 0) the proposal collapses to uniform q(i) = 1/n so the chain remains ergodic.
The chain's log-k approximation bound degrades at small k: below k = AfkMc2Seeder::kFloor the bound is too loose to beat greedy k-means++, and callers at that regime should pin GreedyKmppSeeder (directly or via AutoSeeder, which picks it by shape).
| T | Element type; float or double. |
Definition at line 41 of file afkmc2_seeder.h.
|
inline |
Definition at line 75 of file afkmc2_seeder.h.
|
inline |
Seed k centroids from X into outCentroids.
| X | Data matrix (n x d), contiguous. |
| k | Number of centroids to seed. |
| seed | RNG seed; identical seed + (X, k) produces identical centroids. |
| pool | Parallelism injection (preprocessing sweep only). |
| outCentroids | Output centroid matrix (k x d), contiguous; populated in row order. |
Definition at line 86 of file afkmc2_seeder.h.
|
staticconstexpr |
Default Markov-chain length per centroid pick.
Definition at line 72 of file afkmc2_seeder.h.
|
staticconstexpr |
Minimum k below which the AFK-MC2 chain's log-k bound is too loose to win.
Definition at line 57 of file afkmc2_seeder.h.