Abstract:
Looking for health information is one of the most popular activities online. However, the specificity of language on this domain is frequently an obstacle to comprehension, especially for the ones with lower levels of health literacy. For this reason, search engines should consider the readability of health content and, if possible, adapt it to the user behind the search. In this work, we explore methods to assess the readability of health content automatically. We propose features capable of measuring the specificity of a medical text and estimate the knowledge necessary to comprehend it. The features are based on information retrieval metrics and the log-likelihood of a text with lay and medico-scientific language models. To evaluate our methods, we built and used a dataset composed of health articles of Simple English Wikipedia and the respective documents in ordinary Wikipedia. We achieved a maximum accuracy of 88