16/11/2020

An information theoretic view on selecting linguistic probes

Zining Zhu, Frank Rudzicz

Keywords: supervised classification, diagnostic classification, neural representations, diagnostic classifier

Abstract: There is increasing interest in assessing the linguistic knowledge encoded in neural representations. A popular approach is to attach a diagnostic classifier -- or "probe" -- that performs supervised classification from internal representations. However, how to select a good probe is under debate. Hewitt and Liang (2019) showed that high performance on diagnostic classification is by itself insufficient, because it can be attributed either to "the representation being rich in knowledge" or to "the probe learning the task", a dichotomy that Pimentel et al. (2020) challenged. We show this dichotomy is valid information-theoretically. In addition, we find that the "good probe" criteria proposed by the two papers, *selectivity* (Hewitt and Liang, 2019) and *information gain* (Pimentel et al., 2020), are equivalent -- the errors of their approaches are identical (modulo irrelevant terms). Empirically, these two selection criteria lead to results that agree closely with each other.
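The probing setup and the two selection criteria above can be made concrete with a short sketch. The following is a minimal illustration, not the authors' code: it fits a linear diagnostic classifier on frozen representations, computes selectivity as task accuracy minus control-task accuracy, and estimates the information-gain quantity via the identity I(R; Y) = H(Y) - H(Y|R), where the probe's held-out cross-entropy upper-bounds H(Y|R). The arrays `reps`, `labels`, and `control_labels` are hypothetical placeholders for real representations and annotations.

```python
import numpy as np
from scipy.stats import entropy
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import log_loss
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Hypothetical data: 1,000 frozen 128-d representations with 5-way labels.
reps = rng.normal(size=(1000, 128))
labels = rng.integers(0, 5, size=1000)
# Control task (Hewitt and Liang, 2019): labels reassigned at random, so any
# accuracy here reflects "the probe learning the task", not the representation.
control_labels = rng.integers(0, 5, size=1000)

def fit_probe(X, y):
    """Fit a linear diagnostic classifier; return held-out accuracy,
    held-out cross-entropy (an estimate of H(Y|R) in nats), and test labels."""
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
    probe = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
    acc = probe.score(X_te, y_te)
    xent = log_loss(y_te, probe.predict_proba(X_te), labels=probe.classes_)
    return acc, xent, y_te

task_acc, task_xent, y_te = fit_probe(reps, labels)
control_acc, _, _ = fit_probe(reps, control_labels)

# Selectivity (Hewitt and Liang, 2019): task accuracy minus control accuracy.
selectivity = task_acc - control_acc

# Information-gain view (Pimentel et al., 2020): since the probe's
# cross-entropy upper-bounds H(Y|R), H(Y) minus that cross-entropy
# lower-bounds I(R; Y). H(Y) is estimated from empirical label frequencies.
h_y = entropy(np.bincount(y_te) / len(y_te))  # label entropy in nats
mi_lower_bound = h_y - task_xent

print(f"task={task_acc:.3f} control={control_acc:.3f} selectivity={selectivity:.3f}")
print(f"I(R;Y) lower bound ≈ {mi_lower_bound:.3f} nats")
```

On random data both scores hover near zero; on real representations, per the abstract's equivalence result, the two criteria should rank candidate probes similarly.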

[Embedded video: this talk and its paper were presented at the EMNLP 2020 virtual conference.]
