
Summary of Qalam: A Multimodal LLM for Arabic Optical Character and Handwriting Recognition, by Gagan Bhatia et al.


Qalam: A Multimodal LLM for Arabic Optical Character and Handwriting Recognition

by Gagan Bhatia, El Moatez Billah Nagoudi, Fakhraddin Alwajih, Muhammad Abdul-Mageed

First submitted to arXiv on: 18 Jul 2024

Categories

  • Main: Computer Vision and Pattern Recognition (cs.CV)
  • Secondary: Artificial Intelligence (cs.AI); Computation and Language (cs.CL)



GrooveSquid.com Paper Summaries

GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!

High Difficulty Summary (written by the paper authors)
Read the original abstract here

Medium Difficulty Summary (written by GrooveSquid.com, original content)
This study presents Qalam, a novel foundation model designed specifically for Arabic Optical Character Recognition (OCR) and Handwriting Recognition (HWR). The model pairs a SwinV2 encoder with a RoBERTa decoder and outperforms existing methods, reaching a Word Error Rate (WER) of 0.80% on HWR tasks and 1.18% on OCR tasks. Qalam is trained on a diverse dataset that includes over 4.5 million images from Arabic manuscripts, plus a synthetic dataset of 60k image-text pairs. The model handles Arabic diacritics especially well and can process high-resolution inputs.

Low Difficulty Summary (written by GrooveSquid.com, original content)
Arabic writing poses special challenges because the script is cursive and the shape of a letter depends on its context. This study creates Qalam, a new way to recognize written Arabic using computers. It beats previous methods, getting only about 0.80% of words wrong when reading handwriting and 1.18% of words wrong when reading printed text. The model was trained on many images and texts, including old manuscripts. Qalam is good at understanding special marks called diacritics that are important for writing Arabic correctly.
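
To make the error-rate figures above concrete, here is a minimal sketch of how a Word Error Rate is commonly computed: the word-level edit distance (substitutions, deletions, insertions) between the reference text and the model output, divided by the number of reference words. The function name `word_error_rate` is illustrative, and the whitespace tokenization with exact string matching is an assumption; this summary does not describe the paper's own evaluation protocol or any text normalization it applies.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words.

    Assumption: words are split on whitespace and compared exactly.
    This is a generic WER sketch, not the paper's evaluation code.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # turning i reference words into nothing costs i deletions
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion (reference word missing)
                curr[j - 1] + 1,     # insertion (extra hypothesis word)
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


# One substitution ("sat" -> "sit") and one deletion ("the") over 6 words.
wer = word_error_rate("the cat sat on the mat", "the cat sit on mat")
print(f"{wer:.2%}")  # prints 33.33%
```

On this scale, a WER of 0.80% corresponds to roughly 8 word errors per 1,000 reference words. Under the exact matching assumed in this sketch, a word with a wrong or missing diacritic counts as a full substitution.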

Keywords

» Artificial intelligence  » Decoder  » Encoder