Abstract:
Differential diagnostic systems provide a ranked list of highly probable diseases given a patient’s profile and symptoms. Evaluation of diagnostic algorithms in the literature has been limited to small sets of hand-crafted patient vignettes. Testing with high coverage and gaining insights for improvement are challenging because of the size and complexity of the knowledge base. Furthermore, scalable, practical methodologies for evaluating and deploying such systems are missing from the literature. Here, we address this challenge using a novel patient vignette simulation algorithm within an iterative clinician-in-the-loop methodology for semi-automatically evaluating and deploying medical diagnostic systems in production. We evaluate our algorithms and methodology through a case study of a real product and a knowledge base curated by medical experts. We conduct multiple iterations of the methodology, report novel accuracy measures, and discuss insights from our experience in applying this method to production.