Active Learning on Attributed Graphs via Graph Cognizant Logistic Regression and Preemptive Query Generation

Abstract: Node classification in attributed graphs is an important task in many practical settings, but labels are often difficult or expensive to obtain. Active learning can improve the classification performance achieved for a given budget of queried labels. The best existing methods are based on graph neural networks, but they often perform poorly unless a sizeable validation set of labelled nodes is available to choose good hyperparameters. We propose a novel graph-based active learning algorithm for node classification in attributed graphs. Our algorithm uses graph cognizant logistic regression, equivalent to a linearized graph-convolutional neural network (GCNN), in the prediction phase and maximizes the expected error reduction in the query phase. To reduce the delay experienced by a labeller interacting with the system, we derive a preemptive querying system that calculates a new query during the labelling process. To address the setting where learning starts with almost no labelled data, we also develop a hybrid algorithm that performs adaptive model averaging of label propagation and linearized GCNN inference. Experiments on four public benchmark datasets demonstrate a significant improvement over state-of-the-art approaches. We illustrate the practical value of the method by applying it to a private commercial dataset used to identify faulty links in a microwave link network.
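The two phases named above (linearized-GCNN prediction and expected-error-reduction querying) can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. It assumes an SGC-style linearization in which features are propagated as S^k X with the self-loop-normalized adjacency S = D^{-1/2}(A + I)D^{-1/2} and then fed to an L2-regularized multinomial logistic regression. It also assumes a 0/1-loss proxy for expected error, namely the sum of (1 - max class probability) over the remaining unlabelled nodes. All function names, the hop count `k`, and the optimizer settings are hypothetical. The preemptive querying scheme and the hybrid label-propagation averaging are not shown.

```python
import numpy as np


def normalized_adjacency(A):
    """Symmetric normalisation with self-loops: D^{-1/2} (A + I) D^{-1/2}."""
    A_hat = A + np.eye(A.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(A_hat.sum(axis=1))
    return A_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def propagate(A, X, k=2):
    """Assumed linearised GCNN features: S^k X, with no nonlinearity between hops."""
    S = normalized_adjacency(A)
    Z = X.copy()
    for _ in range(k):
        Z = S @ Z
    return Z


def softmax(logits):
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    e = np.exp(logits)
    return e / e.sum(axis=1, keepdims=True)


def fit_logreg(Z, y, n_classes, lam=1e-2, lr=0.5, steps=300):
    """L2-regularised multinomial logistic regression, full-batch gradient descent."""
    n, d = Z.shape
    Zb = np.hstack([Z, np.ones((n, 1))])  # append a bias column
    W = np.zeros((d + 1, n_classes))
    Y = np.eye(n_classes)[y]              # one-hot targets
    for _ in range(steps):
        P = softmax(Zb @ W)
        grad = Zb.T @ (P - Y) / n + lam * W
        W -= lr * grad
    return W


def predict_proba(W, Z):
    Zb = np.hstack([Z, np.ones((Z.shape[0], 1))])
    return softmax(Zb @ W)


def expected_error_query(Z, labelled, y_lab, n_classes):
    """Return the unlabelled node whose hypothetical labelling minimises the
    expected 0/1-loss proxy on the remaining unlabelled nodes."""
    unlabelled = [i for i in range(Z.shape[0]) if i not in labelled]
    W = fit_logreg(Z[labelled], np.array(y_lab), n_classes)
    P = predict_proba(W, Z)  # current posterior, used to weight each outcome
    best_node, best_risk = None, np.inf
    for u in unlabelled:
        rest = [v for v in unlabelled if v != u]
        risk = 0.0
        for c in range(n_classes):
            # Refit as if node u had been given label c.
            W_uc = fit_logreg(Z[labelled + [u]], np.array(y_lab + [c]), n_classes)
            P_uc = predict_proba(W_uc, Z[rest])
            risk += P[u, c] * np.sum(1.0 - P_uc.max(axis=1))
        if risk < best_risk:
            best_node, best_risk = u, risk
    return best_node
```

The query step refits the model once per (candidate node, class) pair, so its cost grows with the number of unlabelled nodes times the number of classes. This gives one plausible reason why the latency seen by the labeller matters in this setting.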