Summary of Fewer Tokens and Fewer Videos: Extending Video Understanding Abilities in Large Vision-Language Models, by Shimin Chen et al.
Fewer Tokens and Fewer Videos: Extending Video Understanding Abilities in Large Vision-Language Models by Shimin Chen,…