
Summary of Generalizable Facial Expression Recognition, by Yuhang Zhang et al.


Generalizable Facial Expression Recognition

by Yuhang Zhang, Xiuqi Zheng, Chenyi Liang, Jiani Hu, Weihong Deng

First submitted to arXiv on: 20 Aug 2024

Categories

  • Main: Computer Vision and Pattern Recognition (cs.CV)
  • Secondary: Artificial Intelligence (cs.AI)



GrooveSquid.com Paper Summaries

GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!

Summary difficulty Written by Summary
High Paper authors High Difficulty Summary
Read the original abstract here
Medium GrooveSquid.com (original content) Medium Difficulty Summary
A novel facial expression recognition (FER) pipeline is proposed to improve the zero-shot generalization of FER methods on unseen test sets. The approach extracts expression-related features from any given face image, inspired by how humans first detect a face and then select the features that convey expression. It builds on large models such as CLIP, which extract generalizable face features. To preserve both the generalization ability of CLIP and the high precision of the FER model, sigmoid masks are learned on top of fixed CLIP face features to extract expression features. The masked features are then split channel-wise by expression class to generate logits directly, which avoids a fully connected (FC) layer and reduces overfitting. Additionally, a channel-diverse loss is introduced to keep the learned masks distinct from one another. Experimental results on five FER datasets show that the method outperforms state-of-the-art (SOTA) FER methods by large margins.
Low GrooveSquid.com (original content) Low Difficulty Summary
A new way of recognizing facial expressions has been developed that can work on a test set without using any training data from that set. This matters because many existing methods need data from the target test set to perform well on it, which isn't always possible. The method first extracts general features from face images using a large model called CLIP, and then learns how to use those features to recognize facial expressions. To make sure it doesn't favor one type of expression over others, the approach assigns the different expression types to separate channels. This helps the method avoid overfitting and recognize all types of expressions equally well.
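
The masking and channel-splitting steps described in the medium difficulty summary can be illustrated with a small sketch. This is not the authors' implementation: the function names (masked_class_logits, mask_spread_penalty), the use of a plain Python list in place of real CLIP features, and the choice to sum each channel group into a logit are all assumptions made for illustration. In particular, mask_spread_penalty is only a stand-in for the idea of keeping masks distinct; the summary does not give the formula of the paper's channel-diverse loss.

```python
import math


def sigmoid(x):
    """Squash a real number into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def masked_class_logits(features, mask_params, num_classes):
    """Mask fixed features, then sum each class's channel group into a logit.

    features    : floats standing in for frozen backbone (e.g. CLIP) features
    mask_params : learnable pre-sigmoid mask values, one per feature channel
    num_classes : number of expression classes; must divide the channel count
    """
    if len(features) != len(mask_params):
        raise ValueError("features and mask_params must have equal length")
    if num_classes <= 0 or len(features) % num_classes != 0:
        raise ValueError("channel count must be a positive multiple of num_classes")
    # Sigmoid keeps every mask value in (0, 1), so the mask can only scale
    # a feature down; the features themselves are never modified.
    masked = [f * sigmoid(m) for f, m in zip(features, mask_params)]
    # Split channels into equal contiguous groups, one per class, and reduce
    # each group to a single logit. No fully connected layer is involved.
    group = len(masked) // num_classes
    return [sum(masked[c * group:(c + 1) * group]) for c in range(num_classes)]


def mask_spread_penalty(mask_params):
    """Hypothetical diversity term: negative variance of the mask values.

    Minimizing this pushes mask values apart. It is an assumed stand-in,
    not the channel-diverse loss from the paper.
    """
    if not mask_params:
        raise ValueError("mask_params must not be empty")
    values = [sigmoid(m) for m in mask_params]
    mean = sum(values) / len(values)
    variance = sum((v - mean) ** 2 for v in values) / len(values)
    return -variance


# Toy example: four feature channels, two expression classes.
features = [1.0, 2.0, 3.0, 4.0]
uniform_mask = [0.0, 0.0, 0.0, 0.0]  # sigmoid(0) = 0.5 for every channel
logits = masked_class_logits(features, uniform_mask, num_classes=2)
predicted = max(range(len(logits)), key=logits.__getitem__)
print(logits, predicted)  # prints [1.5, 3.5] 1
```

In this toy setup only mask_params would be trained, which reflects the summary's point that the backbone features stay fixed while the masks learn which channels carry expression information.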

Keywords

» Artificial intelligence  » Generalization  » Logits  » Overfitting  » Precision  » Sigmoid  » Zero shot