De-identification of Emergency Medical Records in French: Survey and Comparison of State-of-the-Art Automated Systems

Loick Bourdois; Marta Avalos; Gabrielle Chenais; Frantz Thiessard; Philippe Revel; Cedric Gil-Jardine; Emmanuel Lagarde

doi:10.32473/flairs.v34i1.128480

Authors

Loick Bourdois University of Bordeaux - BPH INSERM U1219
Marta Avalos University of Bordeaux - BPH INSERM U1219 - INRIA SISTM https://orcid.org/0000-0002-5471-2615
Gabrielle Chenais University of Bordeaux - BPH INSERM U1219
Frantz Thiessard University of Bordeaux - BPH INSERM U1219
Philippe Revel University of Bordeaux - BPH INSERM U1219 - University Hospital of Bordeaux, Pole of Emergency Medicine
Cedric Gil-Jardine University of Bordeaux - BPH INSERM U1219 - University Hospital of Bordeaux, Pole of Emergency Medicine
Emmanuel Lagarde University of Bordeaux - BPH INSERM U1219 https://orcid.org/0000-0001-8031-7400

DOI:

https://doi.org/10.32473/flairs.v34i1.128480

Keywords:

Healthcare Informatics, Applied Natural Language Processing, Protected health information, Low-resource languages, Pre-training, NER, Transformers, French, Emergency room, Injury epidemiology

Abstract

In France, structured data from emergency room (ER) visits are aggregated at the national level to build a syndromic surveillance system for several health events. For visits motivated by a traumatic event, information on the causes is stored in free-text clinical notes. To exploit these data, an automated de-identification system that guarantees the protection of privacy is required.
In this study, we review the tools available for de-identifying free-text clinical documents in French. A key point is how to overcome the resource barrier that hampers NLP applications in languages other than English. We compare rule-based systems, named entity recognition (NER) systems, recent Transformer-based deep learning models, and hybrid systems, using, where required, a fine-tuning set of 30,000 unlabeled clinical notes. The evaluation is performed on a test set of 3,000 manually annotated notes.
Hybrid systems, which combine components suited to complementary tasks, show the best performance. This work is a first step toward a national surveillance system based on the exhaustive collection of ER visit reports for automated trauma monitoring.
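
To illustrate what a hybrid de-identification system can look like, here is a minimal Python sketch that pairs regular-expression rules for structured identifiers with a pluggable name recognizer and then merges their spans. It is not the pipeline evaluated in the study. The patterns, the labels, the `toy_ner` stand-in (a title-plus-capitalized-word heuristic used in place of a trained NER or Transformer model) and the sample note are all assumptions made up for illustration.

```python
import re
from typing import Callable, List, Tuple

Span = Tuple[int, int, str]  # (start offset, end offset, label)

# Hypothetical rules for structured identifiers (assumed formats, not from the study).
RULES = [
    (re.compile(r"\b\d{2}/\d{2}/\d{4}\b"), "DATE"),        # e.g. 12/03/2020
    (re.compile(r"\b0\d(?:[ .]?\d{2}){4}\b"), "PHONE"),    # e.g. 06 12 34 56 78
]


def rule_spans(text: str) -> List[Span]:
    """Rule-based component: collect every regex match as a labeled span."""
    return [(m.start(), m.end(), label)
            for pattern, label in RULES
            for m in pattern.finditer(text)]


def toy_ner(text: str) -> List[Span]:
    """Stand-in for a trained NER model: a capitalized word after a title."""
    pattern = re.compile(r"\b(?:M\.|Mme|Dr)\s+([A-ZÉ][\w-]+)")
    return [(m.start(1), m.end(1), "NAME") for m in pattern.finditer(text)]


def merge(spans: List[Span]) -> List[Span]:
    """Sort spans and drop any that overlap an earlier (or longer) one."""
    kept: List[Span] = []
    for span in sorted(spans, key=lambda s: (s[0], -(s[1] - s[0]))):
        if not kept or span[0] >= kept[-1][1]:
            kept.append(span)
    return kept


def deidentify(text: str, ner: Callable[[str], List[Span]] = toy_ner) -> str:
    """Hybrid component: replace rule and NER spans with label placeholders."""
    parts, pos = [], 0
    for start, end, label in merge(rule_spans(text) + ner(text)):
        parts.append(text[pos:start])
        parts.append(f"<{label}>")
        pos = end
    parts.append(text[pos:])
    return "".join(parts)


# Invented sample note, not real patient data.
note = "Mme Dupont, chute le 12/03/2020, tel 06 12 34 56 78."
print(deidentify(note))
```

In this sketch the rules handle identifiers with a fixed surface form, while the `ner` argument is the slot where a statistical or Transformer-based recognizer would be plugged in for names and other free-form entities.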
