Abstract:
We present an end-to-end information retrieval system that uses custom, domain-specific language models for accurate search-term expansion. The text mining pipeline addresses several challenges faced in an industry setting, including multilingual, jargon-rich unstructured text and privacy compliance. We also introduce a novel statistical approach to evaluating word embeddings, which allows the models to be monitored in production. Our approach is in use for risk management in the financial sector and is widely applicable to other domains.