The Impact of Record Linkage on Learning from Feature Partitioned Data

Abstract: There has been recently a significant boost to machine learning with distributed data, in particular with the success of federated learning. A common and very challenging setting is that of vertical or feature partitioned data, when multiple data providers hold different features about common entities. In general, training needs to be preceded by record linkage (RL), a step that finds the correspondence between the observations of the datasets. RL is prone to mistakes in the real world. Despite the importance of the problem, there has been so far no formal assessment of the way in which RL errors impact learning models. Work in the area either use heuristics or assume that the optimal RL is known in advance. In this paper, we provide the first assessment of the problem for supervised learning. For wide sets of losses, we provide technical conditions under which the classifier learned after noisy RL converges (with the data size) to the best classifier that would be learned from mistake-free RL. This yields new insights on the way the pipeline RL + ML operates, from the role of large margin classification on dampening the impact of RL mistakes to clues on how to further optimize RL as a preprocessing step to ML. Experiments on a large UCI benchmark validate those formal observations.

03/05/2021

Christian Henning, Maria Cervera, Francesco D'Angelo and
Johannes von Oswald, Regina Traber, Benjamin Ehret, Seijin Kobayashi, Benjamin F. Grewe, João Sacramento

The Impact of Record Linkage on Learning from Feature Partitioned Data

Richard Nock, Stephen J Hardy, Wilko Henecka, Hamish Ivey-Law, Jakub Nabaglo, Giorgio Patrini, Guillaume Smith, Brian Thorne

Comments

Similar Papers

Theoretical bounds on estimation error for meta-learning

James Lucas, Mengye Ren, Irene Raissa KAMENI KAMENI and Toniann Pitassi, Richard Zemel

Keywords Abstract Paper

meta learning, minimax risk, few-shot, lower bounds, learning theory

Keep learning: Self-supervised meta-learning for learning from inference

Akhil Kedia, Sai Chetan Chinthakindi

Keywords Abstract Paper

Self-Damaging Contrastive Learning

Ziyu Jiang, Tianlong Chen, Bobak Mortazavi, Zhangyang Wang

Keywords Abstract Paper

Algorithms, Unsupervised Learning

Good classifiers are abundant in the interpolating regime

Ryan Theisen, Jason Klusowski, Michael Mahoney

Keywords Abstract Paper

Why Generalization in RL is Difficult: Epistemic POMDPs and Implicit Partial Observability

Dibya Ghosh, Jad Rahme, Aviral Kumar and Amy Zhang, Ryan Adams, Sergey Levine

Keywords Abstract Paper

reinforcement learning and planning

Towards Sample-efficient Overparameterized Meta-learning

Yue Sun, Adhyyan Narang, Ibrahim Gulluk and Samet Oymak, Maryam Fazel

Keywords Abstract Paper

theory, machine learning, meta learning, representation learning, few shot learning

DisCor: Corrective Feedback in Reinforcement Learning via Distribution Correction

Aviral Kumar, Abhishek Gupta, Sergey Levine

Keywords Abstract Paper

Regularization Matters in Policy Optimization - An Empirical Study on Continuous Control

Zhuang Liu, Xuanlin Li, Bingyi Kang, trevor darrell

Keywords Abstract Paper

Deep Reinforcement Learning, Regularization, Continuous Control, Policy Optimization

Meta Label Correction for Noisy Label Learning

Guoqing Zheng, Ahmed Hassan Awadallah, Susan Dumais

Keywords Abstract Paper

Deep Reinforcement Learning at the Edge of the Statistical Precipice

Rishabh Agarwal, Max Schwarzer, Pablo Samuel Castro and Aaron Courville, Marc Bellemare

Keywords Abstract Paper

reinforcement learning and planning

Semi-supervised Sequence Classification through Change Point Detection

Nauman Ahad, Mark A. Davenport

Keywords Abstract Paper

Data Valuation using Reinforcement Learning

Jinsung Yoon, Sercan Arik, Tomas Pfister

Keywords Abstract Paper

Deep Learning - Algorithms

Robust Learning from Discriminative Feature Feedback

Sanjoy Dasgupta, Sivan Sabato

Keywords Abstract Paper

Large deviations for the perceptron model and consequences for active learning

Hugo Cui, Luca Saglietti, Lenka Zdeborova

Keywords Abstract Paper

Posterior Meta-Replay for Continual Learning

Christian Henning, Maria Cervera, Francesco D'Angelo and Johannes von Oswald, Regina Traber, Benjamin Ehret, Seijin Kobayashi, Benjamin F. Grewe, João Sacramento

Keywords Abstract Paper

deep learning, continual learning

The Non-IID Data Quagmire of Decentralized Machine Learning

Kevin Hsieh, Amar Phanishayee, Onur Mutlu, Phillip Gibbons

Keywords Abstract Paper

Optimization - Large Scale, Parallel and Distributed

A Distribution-dependent Analysis of Meta Learning

Mikhail Konobeev, Ilja Kuzborskij, Csaba Szepesvari

Keywords Abstract Paper

Theory, Statistical Learning Theory

GLISTER: Generalization based Data Subset Selection for Efficient and Robust Learning

Krishnateja Killamsetty, Durga Sivasubramanian, Ganesh Ramakrishnan, Rishabh Iyer

Keywords Abstract Paper

Adversarial Regression with Doubly Non-negative Weighting Matrices

Tam Le, Truyen Nguyen, Makoto Yamada and Jose Blanchet, Viet Anh Nguyen

Keywords Abstract Paper

machine learning

Disambiguation of Weak Supervision leading to Exponential Convergence rates

Vivien Cabannnes, Francis Bach, Alessandro Rudi

Keywords Abstract Paper

Theory, Models of Learning and Generalization

Submodular Meta-Learning

Arman Adibi, Aryan Mokhtari, Hamed Hassani

Keywords Abstract Paper

Model Performance Scaling with Multiple Data Sources

James Lucas, Mengye Ren, Irene Raissa KAMENI KAMENI and
Toniann Pitassi, Richard Zemel

Keywords Paper

Keywords Paper

Keywords Paper

Keywords Paper

Dibya Ghosh, Jad Rahme, Aviral Kumar and
Amy Zhang, Ryan Adams, Sergey Levine

Keywords Paper

Yue Sun, Adhyyan Narang, Ibrahim Gulluk and
Samet Oymak, Maryam Fazel

Keywords Paper

Keywords Paper

Keywords Paper

Keywords Paper

Rishabh Agarwal, Max Schwarzer, Pablo Samuel Castro and
Aaron Courville, Marc Bellemare

Keywords Paper

Keywords Paper

Keywords Paper

Keywords Paper

Keywords Paper

Christian Henning, Maria Cervera, Francesco D'Angelo and
Johannes von Oswald, Regina Traber, Benjamin Ehret, Seijin Kobayashi, Benjamin F. Grewe, João Sacramento

Keywords Paper

Keywords Paper

Keywords Paper

Keywords Paper

Tam Le, Truyen Nguyen, Makoto Yamada and
Jose Blanchet, Viet Anh Nguyen

Keywords Paper

Keywords Paper

Keywords Paper

Keywords Paper

Keywords Paper

Keywords Paper

Keywords Paper

Keywords Paper

Keywords Paper

Keywords Paper

Ragav Sachdeva, Filipe R. Cordeiro, Vasileios Belagiannis and
Ian Reid, Gustavo Carneiro

Keywords Paper

Mi Luo, Fei Chen, Dapeng Hu and
Yifan Zhang, Jian Liang, Jiashi Feng

Keywords Paper

Kirill Struminsky, Artyom Gadetsky, Denis Rakitin and
Danil Karpushkin, Dmitry Vetrov

Keywords Paper

Keywords Paper

Keywords Paper

Giancarlo Kerg, bhargav104 Kanuparthi, Anirudh Goyal ALIAS PARTH GOYAL and
Kyle Goyette, Yoshua Bengio, Guillaume Lajoie

Keywords Paper

Keywords Paper

Keywords Paper