Abstract:
A major challenge in genetic functionality prediction is that genetic datasets comprise few samples but a very large number of features with unclear structure, i.e., the 'large p, small N' problem. To tackle this problem, we propose the Non-local Self-attentive Autoencoder (NSAE), which applies attention-driven genetic variant modelling. The backbone attention layer captures long-range dependency relationships among cells (i.e., features) and allocates weights to construct attention maps based on cell significance. Using these attention maps, NSAE can effectively identify and leverage significant features from numerous cells in a non-local manner. Our proposed NSAE outperforms state-of-the-art algorithms on two genomics datasets from the Roadmap projects. Visualization of the attention layer further validates NSAE's ability to highlight important features.
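To make the attention mechanism described above concrete, the following is a minimal sketch of a non-local self-attention layer over a flat feature vector, written in a PyTorch style. It is illustrative only: the class name, embedding dimension, and per-feature scalar embedding are assumptions for exposition, not the authors' actual NSAE architecture. The key idea it demonstrates is that each cell (feature) attends to every other cell, producing an attention map that reweights features non-locally.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class NonLocalFeatureAttention(nn.Module):
    """Illustrative non-local self-attention over features (cells).

    Input x has shape (batch, n_features); every feature attends to all
    other features, yielding an (n_features x n_features) attention map
    per sample that captures long-range dependencies.
    """

    def __init__(self, d_model: int = 64):
        super().__init__()
        self.embed = nn.Linear(1, d_model)       # scalar feature -> embedding
        self.query = nn.Linear(d_model, d_model)
        self.key = nn.Linear(d_model, d_model)
        self.value = nn.Linear(d_model, d_model)
        self.out = nn.Linear(d_model, 1)         # back to one scalar per feature
        self.scale = d_model ** -0.5

    def forward(self, x):
        # (batch, n_features) -> (batch, n_features, d_model)
        h = self.embed(x.unsqueeze(-1))
        q, k, v = self.query(h), self.key(h), self.value(h)
        # Attention map: row i weights how much feature i draws on each
        # other feature, regardless of their positions (non-local).
        attn = F.softmax(q @ k.transpose(-2, -1) * self.scale, dim=-1)
        h = attn @ v                              # non-local aggregation
        return self.out(h).squeeze(-1), attn      # reweighted features + map


if __name__ == "__main__":
    x = torch.randn(8, 100)                       # 8 samples, 100 features
    layer = NonLocalFeatureAttention()
    y, attn_map = layer(x)
    print(y.shape, attn_map.shape)                # (8, 100), (8, 100, 100)
```

In practice such a layer would sit inside an autoencoder, with the attention map visualized to inspect which features the model deems significant; the dimensions and training setup here are purely hypothetical.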