Summary of Weakly-supervised Diagnosis Identification From Italian Discharge Letters, by Vittorio Torri et al.
Weakly-supervised diagnosis identification from Italian discharge letters
by Vittorio Torri, Elisa Barbieri, Anna Cantarutti, Carlo Giaquinto, Francesca Ieva
First submitted to arXiv on: 19 Oct 2024
Categories
- Main: Computation and Language (cs.CL)
- Secondary: Machine Learning (cs.LG)
GrooveSquid.com Paper Summaries
GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!
Summary difficulty | Written by | Summary |
---|---|---|
High | Paper authors | The paper's original abstract, available on its arXiv page |
Medium | GrooveSquid.com (original content) | The paper proposes a weakly-supervised pipeline for recognizing diseases in Italian discharge letters, a document classification problem that typically requires supervised learning. The pipeline uses a fine-tuned version of the Italian UmBERTo model to extract diagnosis-related sentences and group them with two-level clustering. Clusters that map to the targeted diseases serve as weak labels, and these labels are then used to train a BERT-based model for disease detection. In a case study, the pipeline achieves an AUC of 77.7% and an F1-score of 75.1%, outperforming unsupervised methods and remaining robust to the choice of clusters. |
Low | GrooveSquid.com (original content) | This research develops a way to identify diseases from Italian hospital discharge letters without needing manually labeled data. The method uses a language processing pipeline that extracts the important sentences, groups similar ones together, and then trains a model to recognize specific diseases. This approach can help doctors and researchers quickly analyze large amounts of clinical text without having to label each document by hand. |
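
The weak-labeling step described in the summaries can be illustrated with a minimal Python sketch. It is not the authors' implementation: the letter IDs, cluster IDs, the set of target clusters, and the rule "a letter is positive if any of its sentences falls in a target cluster" are all assumptions made for illustration. The F1 helper only shows how such weak labels could be compared against a small hand-checked set.

```python
from collections import defaultdict

# Hypothetical output of the clustering step: each diagnosis-related sentence
# is recorded as (letter it came from, cluster it was assigned to).
sentence_clusters = [
    ("letter_1", 0),
    ("letter_1", 3),
    ("letter_2", 2),
    ("letter_3", 3),
    ("letter_3", 1),
    ("letter_4", 2),
]

# Hypothetical mapping: clusters judged to describe the targeted disease.
target_clusters = {1, 3}


def weak_labels(sentence_clusters, target_clusters):
    """Assumed rule: a letter gets label 1 if any sentence is in a target cluster."""
    labels = defaultdict(int)
    for letter_id, cluster_id in sentence_clusters:
        hit = int(cluster_id in target_clusters)
        labels[letter_id] = max(labels[letter_id], hit)
    return dict(labels)


def f1_score(y_true, y_pred):
    """F1 for binary labels: harmonic mean of precision and recall."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


labels = weak_labels(sentence_clusters, target_clusters)

# Hypothetical hand-checked labels, used only to score the weak labels.
gold = {"letter_1": 1, "letter_2": 1, "letter_3": 1, "letter_4": 0}
letters = sorted(gold)
score = f1_score([gold[k] for k in letters], [labels[k] for k in letters])
print(labels, round(score, 3))
```

In the pipeline the summaries describe, labels of this kind would then be used to train the BERT-based classifier; that training step is omitted here.
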
Keywords
» Artificial intelligence » AUC » BERT » Classification » Clustering » F1 score » Supervised