arXiv submission date: 2026-02-12
📄 Abstract - A Generic Framework for Fair Consensus Clustering in Streams

Consensus clustering seeks to combine multiple clusterings of the same dataset, potentially derived by considering various non-sensitive attributes by different agents in a multi-agent environment, into a single partitioning that best reflects the overall structure of the underlying dataset. Recent work by Chakraborty et al. introduced a fair variant under proportionate fairness and obtained a constant-factor approximation by naively selecting the best closest fair input clustering; however, their offline approach requires storing all input clusterings, which is prohibitively expensive for most large-scale applications. In this paper, we initiate the study of fair consensus clustering in the streaming model, where input clusterings arrive sequentially and memory is limited. We design the first constant-factor algorithm that processes the stream while storing only a logarithmic number of inputs. En route, we introduce a new generic algorithmic framework that integrates closest fair clustering with cluster fitting, yielding improved approximation guarantees not only in the streaming setting but also when revisited offline. Furthermore, the framework is fairness-agnostic: it applies to any fairness definition for which an approximately close fair clustering can be computed efficiently. Finally, we extend our methods to the more general k-median consensus clustering problem.
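The offline baseline that the abstract attributes to prior work (turn each input into a fair clustering, then keep the best one) can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's algorithm: clusterings are label lists over the same n points, the distance is the pairwise-disagreement (Mirkin-style) distance, which may differ from the metric the paper uses, and `fair_project` is a hypothetical stand-in for any routine that returns an approximately closest fair clustering (it defaults to the identity here). The streaming framework itself, which keeps only a logarithmic number of inputs, is not reproduced.

```python
from itertools import combinations
from typing import Callable, List, Sequence

# A clustering is a list of cluster labels, one per point (points 0..n-1).
Clustering = Sequence[int]


def disagreement_distance(a: Clustering, b: Clustering) -> int:
    """Count point pairs grouped together in one clustering but split in the other."""
    if len(a) != len(b):
        raise ValueError("clusterings must cover the same points")
    return sum(
        (a[i] == a[j]) != (b[i] == b[j])
        for i, j in combinations(range(len(a)), 2)
    )


def consensus_cost(candidate: Clustering, inputs: List[Clustering]) -> int:
    """Median objective: total distance from the candidate to every input."""
    return sum(disagreement_distance(candidate, x) for x in inputs)


def best_fair_input(
    inputs: List[Clustering],
    # Hypothetical hook: should return an (approximately) closest fair clustering.
    # The default `list` just copies the input, i.e. no fairness constraint.
    fair_project: Callable[[Clustering], List[int]] = list,
) -> List[int]:
    """Project every input to a fair clustering and keep the cheapest candidate."""
    candidates = [fair_project(x) for x in inputs]
    return min(candidates, key=lambda c: consensus_cost(c, inputs))


inputs = [[0, 0, 1, 1], [0, 0, 1, 1], [0, 1, 1, 1]]
print(best_fair_input(inputs))  # [0, 0, 1, 1]
```

Note that this baseline must hold every input in memory, which is exactly the cost the streaming setting avoids. Because the disagreement distance is a metric, picking the best input is the classic 2-approximation for the unconstrained median problem; the constants for the fair variant depend on the projection routine and come from the paper's analysis.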

Top-level tags: machine learning systems theory
Detailed tags: fair clustering, streaming algorithms, consensus clustering, approximation algorithms, k-median

A Generic Framework for Fair Consensus Clustering in Streams


1️⃣ One-Sentence Summary

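This paper presents the first fair consensus clustering framework for the streaming setting: under limited memory, it combines multiple input clusterings into a single fair clustering with a constant-factor approximation guarantee while storing only a logarithmic number of inputs, and the same generic framework also improves results offline and extends to the more general k-median consensus clustering problem.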

Source: arXiv: 2602.11500