
Summary of X-Portrait: Expressive Portrait Animation with Hierarchical Motion Attention, by You Xie et al.


X-Portrait: Expressive Portrait Animation with Hierarchical Motion Attention

by You Xie, Hongyi Xu, Guoxian Song, Chao Wang, Yichun Shi, Linjie Luo

First submitted to arxiv on: 23 Mar 2024

Categories

  • Main: Computer Vision and Pattern Recognition (cs.CV)
  • Secondary: Artificial Intelligence (cs.AI)



GrooveSquid.com Paper Summaries

GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!

High Difficulty Summary (written by the paper authors)
Read the original abstract here

Medium Difficulty Summary (written by GrooveSquid.com, original content)
The paper proposes X-Portrait, a conditional diffusion model for generating expressive and temporally coherent portrait animation. Given a single portrait as the appearance reference, the model animates it with motion derived from a driving video, capturing both dynamic and subtle facial expressions along with wide-range head movements. The model builds on the generative prior of a pre-trained diffusion model, and fine-grained control comes from novel control signals supplied through ControlNet. Unlike conventional explicit controls such as facial landmarks, the motion control module learns to interpret dynamics directly from RGB inputs. Local control modules improve motion accuracy by focusing on small-scale nuances such as eyeball positions. To mitigate identity leakage from the driving signals, the model is trained with scaling-augmented cross-identity images. Experimental results show X-Portrait's effectiveness across diverse portraits and driving sequences, producing expressive animations while preserving identity characteristics.

Low Difficulty Summary (written by GrooveSquid.com, original content)
This paper introduces a new way to create realistic animated portraits from single photos and videos of faces. The system can animate a face in various ways, such as smiling, raising an eyebrow, or tilting its head. It achieves this by using a combination of existing AI models and new techniques to control the animation. One key innovation is that it learns to interpret the motion directly from the video inputs, rather than relying on explicit facial landmarks. The system is designed to maintain the identity and character of the original face throughout the animation process. The results are impressive, showing the potential for creating realistic and engaging animated portraits.
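
The medium difficulty summary mentions training on "scaling-augmented" images to reduce identity leakage, without describing the procedure. As a rough illustration only, and not the authors' implementation, the sketch below shows one way a driving frame could be randomly rescaled about its center while keeping the canvas size, so that the apparent face size in the driving frame no longer matches the driver's. The function names, the nearest-neighbor sampling, and the scale range of 0.8 to 1.2 are all assumptions made for this example.

```python
import random


def scale_about_center(image, scale, fill=0):
    """Rescale a 2D image (list of rows) about its center, keeping the canvas size.

    Hypothetical helper: uses nearest-neighbor sampling, and output pixels
    that map outside the source image are set to `fill`.
    """
    h, w = len(image), len(image[0])
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            # Inverse mapping: locate the source pixel for each output pixel.
            sy = round(cy + (y - cy) / scale)
            sx = round(cx + (x - cx) / scale)
            inside = 0 <= sy < h and 0 <= sx < w
            row.append(image[sy][sx] if inside else fill)
        out.append(row)
    return out


def augment_driving_frame(image, rng, lo=0.8, hi=1.2):
    """Apply a random scale drawn from [lo, hi] (an illustrative range)."""
    scale = rng.uniform(lo, hi)
    return scale_about_center(image, scale), scale


if __name__ == "__main__":
    # Toy 5x5 "frame" whose pixel values encode their own position.
    frame = [[r * 5 + c for c in range(5)] for r in range(5)]
    augmented, used_scale = augment_driving_frame(frame, random.Random(0))
    print("scale used:", used_scale)
    for row in augmented:
        print(row)
```

A real pipeline would operate on image tensors with proper interpolation; the pure-Python version here only makes the geometric idea concrete.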

Keywords

  • Artificial intelligence
  • Diffusion model