Clustering
C++20 header-only: DBSCAN, HDBSCAN, k-means.
Greedy k-means++ seeder.
#include <clustering/kmeans/policy/greedy_kmpp_seeder.h>
Public Member Functions
| | GreedyKmppSeeder () |
| void | run (const NDArray< T, 2, Layout::Contig > &X, std::size_t k, std::uint64_t seed, math::Pool pool, NDArray< T, 2, Layout::Contig > &outCentroids) |
| | Seed k centroids from X into outCentroids. |
Picks k initial centroid rows from the dataset. The first centroid is drawn uniformly at random; each subsequent centroid is the best of L = 2 + floor(ln(k)) candidates, each sampled with probability proportional to D(x)^2, the squared distance from point x to its nearest already-chosen centroid. The winning candidate is the one whose addition yields the smallest sum of squared minimum distances over all points.
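The selection rule described above can be sketched in a few dozen lines. This is a minimal illustration, not the code in greedy_kmpp_seeder.h: it assumes a row-major std::vector<double> in place of NDArray, uses std::mt19937_64 as the generator, returns row indices rather than writing centroid rows, and recomputes distances instead of caching them. The names sqDist and greedyKmppSketch are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <random>
#include <vector>

// Hypothetical helper (not the library's API): squared Euclidean distance
// between rows a and b of a row-major (n x d) matrix.
static double sqDist(const std::vector<double>& X, std::size_t d,
                     std::size_t a, std::size_t b) {
  double s = 0.0;
  for (std::size_t j = 0; j < d; ++j) {
    const double diff = X[a * d + j] - X[b * d + j];
    s += diff * diff;
  }
  return s;
}

// Hypothetical sketch: returns the row indices of k greedily seeded centroids.
std::vector<std::size_t> greedyKmppSketch(const std::vector<double>& X,
                                          std::size_t n, std::size_t d,
                                          std::size_t k, std::uint64_t seed) {
  assert(n >= 1 && k >= 1 && X.size() == n * d);
  std::mt19937_64 rng(seed);  // assumed generator; the library's may differ
  // Candidates per round: L = 2 + floor(ln(k)).
  const std::size_t L = 2 + static_cast<std::size_t>(
                                std::floor(std::log(static_cast<double>(k))));

  std::vector<std::size_t> chosen;
  chosen.reserve(k);
  // First centroid: uniform draw over the n rows.
  std::uniform_int_distribution<std::size_t> uniform(0, n - 1);
  chosen.push_back(uniform(rng));

  // minSq[i] holds D(x_i)^2, the squared distance to the nearest chosen row.
  std::vector<double> minSq(n);
  for (std::size_t i = 0; i < n; ++i) minSq[i] = sqDist(X, d, i, chosen[0]);

  std::vector<double> cum(n);
  while (chosen.size() < k) {
    // Cumulative D^2 array, used for inverse-CDF sampling.
    double total = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
      total += minSq[i];
      cum[i] = total;
    }
    if (total <= 0.0) {  // degenerate: every point coincides with a centroid
      chosen.push_back(uniform(rng));
      continue;
    }
    std::uniform_real_distribution<double> u(0.0, total);
    std::size_t best = 0;
    double bestCost = std::numeric_limits<double>::infinity();
    for (std::size_t c = 0; c < L; ++c) {
      // Sample one candidate with probability proportional to D(x)^2.
      const auto it = std::upper_bound(cum.begin(), cum.end(), u(rng));
      const std::size_t cand =
          std::min(static_cast<std::size_t>(it - cum.begin()), n - 1);
      // Cost if this candidate were added: sum of updated min squared distances.
      double cost = 0.0;
      for (std::size_t i = 0; i < n; ++i)
        cost += std::min(minSq[i], sqDist(X, d, i, cand));
      if (cost < bestCost) {
        bestCost = cost;
        best = cand;
      }
    }
    chosen.push_back(best);
    for (std::size_t i = 0; i < n; ++i)
      minSq[i] = std::min(minSq[i], sqDist(X, d, i, best));
  }
  return chosen;
}
```

With this formula the candidate count grows slowly: k = 1 gives L = 2, k = 8 gives L = 4, and k = 100 gives L = 6. The sketch scores each candidate with a full pass over the data, so a round costs roughly L times as much as one round of plain k-means++.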
Scratch buffers are private to the policy: the candidate pack, the transposed candidate layout, the per-point per-candidate distance cache, the cumulative-distance array, and the per-point running min-squared-distance all live inside it. Repeated run calls at a stable (n, d, k) shape therefore incur no reallocation.
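One way such reuse can work is sketched below, assuming std::vector storage. The type SeederScratchSketch and its members are hypothetical and cover only three of the buffers listed above; the actual layout inside the policy is not shown in this page.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical scratch holder (not the library's type). The buffers outlive
// individual calls, so a repeated call with the same shape reuses them.
struct SeederScratchSketch {
  std::vector<double> minSq;     // per-point running min squared distance (n)
  std::vector<double> cum;       // cumulative-distance array (n)
  std::vector<double> candDist;  // per-point per-candidate distances (n * L)

  // Size the buffers for n points and L candidates. std::vector::resize does
  // not reallocate when the requested size fits within the current capacity,
  // so calling this again with an unchanged shape leaves the storage in place.
  void ensure(std::size_t n, std::size_t L) {
    minSq.resize(n);
    cum.resize(n);
    candDist.resize(n * L);
  }
};
```

Holding such a struct as a member and calling ensure at the top of each run keeps allocation out of the steady-state path; only a change in shape that exceeds the current capacity triggers a new allocation.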
Template Parameters
| T | Element type; float or double. |
Definition at line 317 of file greedy_kmpp_seeder.h.
| GreedyKmppSeeder () | inline |
Definition at line 322 of file greedy_kmpp_seeder.h.
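| void run (const NDArray< T, 2, Layout::Contig > &X, std::size_t k, std::uint64_t seed, math::Pool pool, NDArray< T, 2, Layout::Contig > &outCentroids) | inline |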
Seed k centroids from X into outCentroids.
Parameters
| X | Data matrix (n x d), contiguous. |
| k | Number of centroids to seed (>= 1). |
| seed | RNG seed; the same seed with the same (X, k) produces identical centroids. |
| pool | Injected parallelism handle. Reserved for a future per-chunk fan-out of the scoring loop. |
| outCentroids | Output centroid matrix (k x d), contiguous; populated in row order. |
Definition at line 337 of file greedy_kmpp_seeder.h.