Abstract:
Numerical vector aggregation has numerous applications in privacy-sensitive scenarios, such as distributed gradient estimation in federated learning, and statistical analysis on key-value data. Within the framework of local differential privacy, this work gives tight minimax error bounds of O(d s/(n epsilon^2)), where d is the dimension of the numerical vector and s is the number of non-zero entries. An attainable mechanism is then designed to improve from existing approaches suffering error rate of O(d^2/(n epsilon^2)) or O(d s^2/(n epsilon^2)). To break the error barrier in the local privacy, this work further consider privacy amplification in the shuffle model with anonymous channels, and shows the mechanism satisfies centralized (14 ln(2/delta) (s e^epsilon+2s-1)/(n-1))^0.5, delta)-differential privacy, which is domain independent and thus scales to federated learning of large models. We experimentally validate and compare it with existing approaches, and demonstrate its significant error reduction.