Summary of Not (yet) the Whole Story: Evaluating Visual Storytelling Requires More Than Measuring Coherence, Grounding, and Repetition, by Aditya K Surikuchi et al.

by Aditya K Surikuchi, Raquel Fernández, Sandro Pezzelle

First submitted to arXiv on 5 July 2024

GrooveSquid.com Paper Summaries

GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!

Summary difficulty	Written by	Summary
High	Paper authors	The paper’s original abstract, available on arXiv
Medium	GrooveSquid.com (original content)	This paper introduces a new method for evaluating the quality of stories that models generate from temporally ordered image sequences. The method measures how human-like a generated story is along three key aspects: visual grounding, coherence, and repetitiveness. The authors apply it to several models, including the foundation model LLaVA and TAPM, a much smaller model built specifically for visual storytelling. Surprisingly, TAPM performs nearly as well as LLaVA despite having far fewer parameters. The authors then upgrade TAPM’s visual and language components, producing a model that achieves competitive results with a relatively small number of parameters. Finally, a human evaluation study suggests that a ‘good’ story may require more than human-like levels of visual grounding, coherence, and repetition.
Low	GrooveSquid.com (original content)	This paper is about how to judge the stories that computers write from a series of pictures. Right now, it’s hard to tell whether such a story is good, because there isn’t one agreed-upon way to measure it. The authors come up with a new method that checks three things and compares each one to stories written by people: whether the words match the pictures, whether the story makes sense from one sentence to the next, and how much the story repeats itself. They use this method to test several computer models that create stories from pictures. They find that one model, called TAPM, does surprisingly well even though it’s much smaller than a popular model called LLaVA. By upgrading parts of TAPM, they build a model that does nearly as well while staying fairly small. Finally, the authors ask people to rate the stories and discover that a good story may need more than just matching the pictures, making sense, and avoiding repetition.
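To make the idea of a “human-like level” of repetition more concrete, here is a minimal Python sketch. It is not the authors’ metric: the trigram-based repetition score and the function names `trigram_repetition` and `human_likeness_gap` are hypothetical, chosen only to illustrate scoring a generated story by how close it lands to a human-written story on one aspect, rather than by its raw score alone.

```python
from collections import Counter


def trigram_repetition(story: list[str]) -> float:
    """Hypothetical repetition score: the fraction of word trigrams
    (within each sentence) that occur more than once across the story."""
    trigrams = []
    for sentence in story:
        words = sentence.lower().split()
        trigrams.extend(tuple(words[i:i + 3]) for i in range(len(words) - 2))
    if not trigrams:
        return 0.0
    counts = Counter(trigrams)
    # Count every occurrence of a trigram that appears at least twice.
    repeated = sum(c for c in counts.values() if c > 1)
    return repeated / len(trigrams)


def human_likeness_gap(model_story: list[str], human_story: list[str]) -> float:
    """Hypothetical human-likeness check: absolute difference between the
    model's and the human's repetition scores (0.0 means human-like)."""
    return abs(trigram_repetition(model_story) - trigram_repetition(human_story))


# Toy stories for illustration only; they do not come from the paper.
human_story = [
    "The family went to the beach.",
    "The kids built a sandcastle.",
    "At sunset they walked home.",
]
model_story = [
    "The family went to the beach.",
    "The family went to the water.",
    "The family went to the car.",
]

print(trigram_repetition(human_story))               # 0.0
print(trigram_repetition(model_story))               # 0.75
print(human_likeness_gap(model_story, human_story))  # 0.75
```

Scoring the gap instead of the raw repetition rate means a story is judged by how closely it matches human writing rather than simply rewarded for repeating less; the measures the paper actually uses for grounding, coherence, and repetition may differ.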

Keywords

* Artificial intelligence
* Grounding
