Abstract:
The goal of text generation models is to fit the
underlying real probability distribution of text.
For performance evaluation, quality and diversity
metrics are usually applied. However, it is still
not clear to what extent quality-diversity
evaluation reflects the distribution-fitting goal. In
this paper, we examine this relation from a
theoretical perspective. We prove that, under certain
conditions, a linear combination of quality and
diversity constitutes a divergence metric between
the generated distribution and the real distribution.
We also show that the commonly used BLEU/Self-BLEU
metric pair fails to match any divergence metric, and
we therefore propose CR/NRR as a substitute
quality/diversity metric pair.