Abstract:
Several sampling algorithms with variance reduction have been
proposed for accelerating the training of Graph Convolutional Networks (GCNs). However, due to the intractable computation of the optimal sampling distribution,
these sampling algorithms are suboptimal for GCNs and are not
applicable to more general graph neural networks (GNNs) in which
the message aggregator contains learned weights rather than
fixed weights, such as Graph Attention Networks (GAT).
The fundamental reason is that the embeddings of the neighbors and the learned weights involved in the optimal sampling distribution
\emph{change} during training and are \emph{not known a priori},
but only \emph{partially observed} when sampled, which
makes the derivation of an optimal variance-reduced sampler non-trivial.
In this paper, we formulate the optimization of the sampling
variance as an adversarial bandit problem, where the rewards are related to the node embeddings and learned weights,
and can vary constantly. A good sampler therefore needs to acquire
variance information about more neighbors (exploration) while simultaneously minimizing the immediate sampling variance (exploitation). We theoretically
show that our algorithm asymptotically approaches the optimal variance within a factor of 3. We empirically demonstrate the efficiency and effectiveness of our approach on multiple
datasets.
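To make the adversarial-bandit view concrete, the following is a minimal, illustrative sketch of an EXP3-style sampler over a single node's neighborhood: each neighbor is an arm, only the sampled neighbor's reward is observed, and the sampler mixes a weight-proportional distribution with uniform exploration. The class name, the reward definition (a normalized embedding norm used as a rough proxy for a neighbor's contribution to the sampling variance), and all parameters are assumptions for illustration, not the exact reward or update used in the paper.

```python
import numpy as np


class Exp3NeighborSampler:
    """Illustrative EXP3-style bandit sampler over one node's neighbors.

    Only the pulled arm's reward is observed each round (partial feedback),
    so the update uses an importance-weighted reward estimate.
    """

    def __init__(self, num_neighbors, gamma=0.1):
        self.k = num_neighbors                # number of arms (neighbors)
        self.gamma = gamma                    # exploration rate in (0, 1]
        self.weights = np.ones(num_neighbors)

    def probabilities(self):
        # Mix the weight-proportional distribution with uniform exploration.
        w = self.weights / self.weights.sum()
        return (1.0 - self.gamma) * w + self.gamma / self.k

    def sample(self):
        p = self.probabilities()
        arm = np.random.choice(self.k, p=p)
        return arm, p[arm]

    def update(self, arm, reward, prob):
        # Importance-weighted estimate for the pulled arm only; unobserved
        # arms get an implicit estimate of zero this round.
        estimated = reward / prob
        self.weights[arm] *= np.exp(self.gamma * estimated / self.k)


# Usage sketch: reward a sampled neighbor by its (toy) message norm,
# normalized to [0, 1]; as embeddings drift during training the rewards
# vary, which is what motivates the adversarial bandit formulation.
rng = np.random.default_rng(0)
neighbor_embeddings = rng.normal(size=(8, 16))   # toy, fixed for brevity
max_norm = np.linalg.norm(neighbor_embeddings, axis=1).max()
sampler = Exp3NeighborSampler(num_neighbors=8)
for _ in range(100):
    arm, prob = sampler.sample()
    reward = np.linalg.norm(neighbor_embeddings[arm]) / max_norm  # proxy reward
    sampler.update(arm, reward, prob)
```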