09/10/2024
Jamile Araújo, with supervision by Júlia Lins (Fiocruz Bahia)
A Fiocruz Bahia study used text mining to extract information from unstructured data collected in Long-COVID research at a university hospital in São Paulo. The work aims to contribute to a deeper understanding of this chronic condition and its implications for global health systems. The model developed has potential for application in other healthcare settings, supporting broader research efforts and clinical decision-making for patients with Long-COVID.
The work was authored by researcher Pilar Tavares Veras Florentino, from the Center for Data and Knowledge Integration for Health (Cidacs) at Fiocruz Bahia, and was coordinated by researchers Manoel Barral-Netto, from Fiocruz Bahia, and Soraya S. Smaili, from the Federal University of São Paulo (Unifesp). The article was published in the journal Nature Cell Death and Disease.
Long-COVID, characterized by the persistence of coronavirus symptoms for over a month, still lacks a definitive clinical characterization. Its varied presentation across populations and health systems poses significant challenges for understanding its manifestations and clinical implications.
For the study, the experts analyzed Electronic Health Records (EHR) and created a model that can be applied in other hospitals. The phonetic text clustering (PTC) method makes it possible to exploit unstructured EHR data by unifying different written forms of similar terms into a single phonemic representation.
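To illustrate the idea of collapsing spelling variants into one phonemic representation, here is a minimal Python sketch. It is not the algorithm used in the study: the rule set, the function names `phonetic_key` and `cluster_terms`, and the sample terms are all assumptions made for illustration, and a production system would need far more complete rules for Brazilian Portuguese.

```python
import re
import unicodedata
from collections import defaultdict

# Hypothetical, deliberately small set of rewrite rules (an assumption,
# not the rules used in the study). Order matters: "ch" must be handled
# before the generic "c" rules.
RULES = [
    (r"ph", "f"),        # old spelling "ph" sounds like "f"
    (r"ch", "x"),        # "ch" sounds like "x"
    (r"^h", ""),         # leading "h" is silent
    (r"c(?=[ei])", "s"), # soft "c" before e/i
    (r"qu", "k"),
    (r"[cq]", "k"),      # remaining hard "c" and "q"
    (r"z", "s"),
    (r"y", "i"),
    (r"w", "v"),
    (r"(.)\1+", r"\1"),  # collapse doubled letters ("ss" -> "s")
]


def phonetic_key(term: str) -> str:
    """Map a written term to a simplified phonetic key."""
    # Compose first so "ç" is a single character, then map it to "s".
    text = unicodedata.normalize("NFC", term.lower()).replace("ç", "s")
    # Strip remaining diacritics ("é" -> "e").
    decomposed = unicodedata.normalize("NFD", text)
    text = "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")
    for pattern, replacement in RULES:
        text = re.sub(pattern, replacement, text)
    return text


def cluster_terms(terms):
    """Group terms that share the same phonetic key."""
    clusters = defaultdict(set)
    for term in terms:
        clusters[phonetic_key(term)].add(term)
    return dict(clusters)


# Illustrative spelling variants, as they might appear in free-text notes.
sample_terms = [
    "dispneia", "dispnéia", "dyspneia",
    "cansaço", "cansasso", "canssaço",
    "cefaleia", "cefaléia", "sefaleia",
]
clusters = cluster_terms(sample_terms)
```

In this sketch, each group of variants shares a single key, so downstream counting or modeling can treat them as one term. Variants that differ by an inserted or missing vowel would not be merged by these rules.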
A text mining workflow was built to extract structured medical information from clinical notes written in Brazilian Portuguese. This method, together with the validated text tokens, could serve as a platform for future analyses of Long-COVID in hospitals that use different systems. The method was also applied back to the training dataset (Sivep-Gripe), enriching the national database and yielding more detailed clinical characterizations of severe acute respiratory syndrome (SARS) in Brazil over the last decade.
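As a rough picture of what turning a clinical note into structured information can look like, the sketch below matches the tokens of a note against a small lexicon and returns one flag per symptom. The lexicon `VALIDATED_TOKENS`, the symptom labels, and the function names are hypothetical, not the validated tokens from the study, and the sketch does not handle negation (for example "nega tosse"), which a real workflow would need to address.

```python
import re
import unicodedata

# Hypothetical lexicon: normalized token -> structured symptom field.
VALIDATED_TOKENS = {
    "fadiga": "fatigue",
    "cansaco": "fatigue",
    "dispneia": "dyspnea",
    "cefaleia": "headache",
    "tosse": "cough",
}


def normalize(token: str) -> str:
    """Lowercase a token and strip its diacritics."""
    decomposed = unicodedata.normalize("NFD", token.lower())
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")


def extract_symptoms(note: str) -> dict:
    """Return one boolean flag per symptom field found in the note."""
    # Compose characters first so accented letters stay inside one token.
    note = unicodedata.normalize("NFC", note)
    tokens = re.findall(r"[^\W\d_]+", note)  # runs of letters only
    found = {
        VALIDATED_TOKENS[t] for t in map(normalize, tokens) if t in VALIDATED_TOKENS
    }
    fields = sorted(set(VALIDATED_TOKENS.values()))
    return {field: field in found for field in fields}


# Illustrative note, not real patient data.
note = "Paciente refere cansaço, tosse seca e dispnéia há 2 meses."
record = extract_symptoms(note)
```

Each note becomes a fixed set of fields, which is the kind of tabular output that can be merged with other structured records.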
The researchers concluded that the model developed in the study has the potential to scale and to be applied in other healthcare settings, including resource-limited ones, thus supporting broader research efforts and informing clinical decision-making for patients with Long-COVID. They also point out that the method and modeling presented in the work, together with the use of data cohorts to predict and treat patients with the disease, will be crucial. More studies must be carried out, they add, not only to increase knowledge but also to develop the necessary care and rehabilitation methods and to plan the primary health system.