Abstract:
Sorting is an elementary building block of modern software. In machine learning and statistics, it is commonly used in robust statistics, order statistics and ranking metrics. However, sorting is a piecewise linear function and as a result includes many kinks at which it is non-differentiable. More problematic, the ranking operator is a piecewise constant function, meaning that its derivatives are null or undefined. While numerous works have proposed differentiable proxies to sorting and ranking, they do not achieve the $O(n \log n)$ time complexity one could expect from a sorting or ranking operation. In this paper, we propose the first differentiable sorting and ranking operators with $O(n \log n)$ time and $O(n)$ space complexity. Our proposal in addition enjoys exact computation and differentiation. We achieve this feat by casting differentiable sorting and ranking as projections onto a permutahedron, the convex hull of permutations, and using a reduction to isotonic optimization. Empirically, we confirm that our approach is an order of magnitude faster than existing approaches. We also showcase two novel applications: differentiable Spearman's rank coefficient and differentiable least trimmed squares.