Test-time Assessment of a Model’s Performance on Unseen Domains via Optimal Transport

by Akshay Mehra, Yunbei Zhang, Jihun Hamm

First submitted to arXiv on: 2 May 2024

Categories

  • Main: Machine Learning (cs.LG)
  • Secondary: None

GrooveSquid.com Paper Summaries

GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. The summaries below all cover the same paper and are written at different levels of difficulty. The medium- and low-difficulty versions are original summaries written by GrooveSquid.com, while the high-difficulty version is the paper’s original abstract. Feel free to read the version that suits you best!

High Difficulty Summary (written by the paper authors)
Read the original abstract here.

Medium Difficulty Summary (original content by GrooveSquid.com)
This paper proposes a novel metric for evaluating machine learning (ML) models on unseen domains at test time. Existing metrics rely on labeled in-distribution data and are inadequate in this setting, where test labels are unavailable. The proposed Optimal Transport-based metric is highly correlated with the model’s performance on unseen domains and can be computed using only information available at test time: the model parameters, the training data or its statistics, and unlabeled test data. It characterizes the model’s performance on an unseen domain from a small amount of unlabeled data drawn from that domain together with data or statistics from the training domain(s). Through extensive empirical evaluation on standard benchmark datasets and their corrupted variants, the authors demonstrate the metric’s utility in practical applications such as selecting the source data and architecture that lead to the best performance on unseen domains, and predicting a deployed model’s performance at test time. The results show that the metric achieves significantly better correlation with the model’s performance than the popular prediction-entropy-based metric.
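The summary does not spell out the metric itself, but the underlying recipe, measuring an optimal transport distance between training features (or their statistics) and unlabeled test features and using it as a proxy for test-domain performance, can be sketched in a few lines. The sketch below is a minimal illustration, not the authors’ implementation: the Sinkhorn solver, the squared-Euclidean cost, the cost normalization, and the name ot_performance_score are all assumptions made for the example.

```python
import numpy as np

def ot_performance_score(feats_train, feats_test, reg=0.1, n_iters=200):
    """Entropic-regularized OT cost between training and test feature clouds.

    A lower cost means the unlabeled test features lie closer to the
    training features, which the paper correlates with better performance
    on the unseen test domain.
    """
    n, m = len(feats_train), len(feats_test)
    a = np.full(n, 1.0 / n)          # uniform mass on training samples
    b = np.full(m, 1.0 / m)          # uniform mass on test samples

    # Pairwise squared-Euclidean costs, normalized to [0, 1] for stability.
    C = ((feats_train[:, None, :] - feats_test[None, :, :]) ** 2).sum(-1)
    C = C / C.max()
    K = np.exp(-C / reg)             # Gibbs kernel

    u = np.ones(n)
    for _ in range(n_iters):         # Sinkhorn fixed-point updates
        v = b / (K.T @ u)
        u = a / (K @ v)

    P = u[:, None] * K * v[None, :]  # approximate transport plan
    return float((P * C).sum())      # transport cost under the plan

# Toy usage: a shifted "unseen" domain yields a larger OT cost.
rng = np.random.default_rng(0)
feats_train = rng.normal(size=(200, 16))          # stand-in for training features
feats_near = rng.normal(size=(100, 16))           # test domain close to training
feats_far = rng.normal(loc=1.0, size=(100, 16))   # shifted unseen domain
print(ot_performance_score(feats_train, feats_near))  # smaller
print(ot_performance_score(feats_train, feats_far))   # larger
```

In the spirit of the applications mentioned above, the features would typically come from the deployed model’s penultimate layer, and the score would be compared across candidate source datasets or architectures to pick the one likely to perform best on the unseen domain.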

Low Difficulty Summary (original content by GrooveSquid.com)
This paper is about finding a way to check how well machine learning models work when they’re used on new data they’ve never seen before. Right now we don’t have good ways to do this, because it’s hard to tell how well a model will perform without knowing the right answers to the problems it’s trying to solve. The researchers propose a new method that estimates how well a model will work on new data by looking at a small amount of that new data and some information about the data the model was trained on. They test this method on different types of data and show that it works better than the methods we’re currently using. This is important because it will help us make better decisions about which models to use for certain tasks.

Keywords

  • Artificial intelligence
  • Machine learning