
Summary of Weakly-supervised Diagnosis Identification From Italian Discharge Letters, by Vittorio Torri et al.


Weakly-supervised diagnosis identification from Italian discharge letters

by Vittorio Torri, Elisa Barbieri, Anna Cantarutti, Carlo Giaquinto, Francesca Ieva

First submitted to arXiv on: 19 Oct 2024

Categories

  • Main: Computation and Language (cs.CL)
  • Secondary: Machine Learning (cs.LG)



GrooveSquid.com Paper Summaries

GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!

High Difficulty Summary (written by the paper authors)

The high difficulty summary is the paper's original abstract.

Medium Difficulty Summary (written by GrooveSquid.com, original content)

The authors propose a novel weakly-supervised pipeline for recognizing diseases from Italian discharge letters, a classic document classification problem that typically requires supervised learning. The pipeline uses a fine-tuned version of the Italian Umberto model to extract diagnosis-related sentences, which are then grouped with a two-level clustering. Clusters are mapped to the targeted diseases to produce weak labels, and these weak labels are used to train a BERT-based model for disease detection. In a case study, the method reaches an AUC of 77.7% and an F1-Score of 75.1%, outperforming non-supervised methods and remaining robust to the choice of clusters.

Low Difficulty Summary (written by GrooveSquid.com, original content)

This research develops a way to identify diseases from Italian hospital discharge letters without needing manually labeled data. The method uses a natural language processing pipeline that extracts the sentences describing a diagnosis, groups similar ones together, and then trains a model to recognize specific diseases. This approach can help doctors and researchers quickly analyze large amounts of clinical text without having to label each letter by hand.
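
To make the weak-labeling idea more concrete, here is a minimal Python sketch of one way cluster assignments could be turned into letter-level weak labels and then compared against a few manual annotations. It is an illustration under stated assumptions, not the authors' implementation: the rule "a letter is positive if any of its diagnosis sentences falls in a selected cluster", the function names `weak_labels` and `f1_score`, and all of the toy data are hypothetical. The sentence extraction, embedding, clustering, and model training steps are omitted.

```python
from collections import defaultdict


def weak_labels(sentence_clusters, disease_clusters):
    """Derive one weak label per letter from sentence-level cluster ids.

    sentence_clusters: iterable of (letter_id, cluster_id) pairs, one per
        diagnosis-related sentence.
    disease_clusters: set of cluster ids mapped to the target disease.

    Assumption (not from the paper): a letter is labeled 1 if at least one
    of its sentences belongs to a selected cluster, otherwise 0.
    """
    labels = defaultdict(int)
    for letter_id, cluster_id in sentence_clusters:
        # Indexing creates the key with 0, so letters without a match get 0.
        labels[letter_id] |= int(cluster_id in disease_clusters)
    return dict(labels)


def f1_score(gold, predicted):
    """F1 of the positive class; letters missing from `predicted` count as 0."""
    tp = sum(1 for k in gold if gold[k] == 1 and predicted.get(k, 0) == 1)
    fp = sum(1 for k in gold if gold[k] == 0 and predicted.get(k, 0) == 1)
    fn = sum(1 for k in gold if gold[k] == 1 and predicted.get(k, 0) == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy data: cluster 7 is the one mapped to the target disease.
assignments = [
    ("letter_a", 3), ("letter_a", 7),
    ("letter_b", 1),
    ("letter_c", 7), ("letter_c", 2),
    ("letter_d", 5),
]
selected = {7}

weak = weak_labels(assignments, selected)

# Hypothetical manual annotations used only to score the weak labels.
gold = {"letter_a": 1, "letter_b": 0, "letter_c": 0, "letter_d": 1}
score = f1_score(gold, weak)

print(weak)
print(f"F1 of weak labels vs. manual labels: {score:.2f}")
```

In the pipeline described above, labels of this kind would serve as training targets for the BERT-based classifier rather than as final predictions, so the score here measures only the quality of the toy weak labels, not the reported model performance.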

Keywords

» Artificial intelligence  » AUC  » BERT  » Classification  » Clustering  » F1 score  » Supervised