Abstract:
Understanding the relationships between biomedical terms like viruses, drugs, and
symptoms is essential in the fight against diseases. Many attempts have been made
to introduce the use of machine learning to the scientific process of hypothesis
generation (HG), which refers to the discovery of meaningful implicit connections
between biomedical terms. However, most existing methods fail to capture
the temporal dynamics of scientific term relations, and they treat unobserved
connections as irrelevant (i.e., as negatives in a positive-negative (PN) learning
setting). To overcome these limitations, we formulate the HG problem as a future
connectivity prediction task on a dynamic attributed graph via positive-unlabeled
(PU) learning. Then,
the key is to capture the temporal evolution of node pair (term pair) relations
from just the positive and unlabeled data. We propose a variational inference
model to estimate the positive prior and incorporate it into the learning of node
pair embeddings, which are then used for link prediction. Experimental results on
real-world biomedical term relationship datasets and case study analyses on a
COVID-19 dataset validate the effectiveness of the proposed model.