Enabling Frugal Evaluations of Vision and Language Models

Riccardo Fogliato, Applied Scientist at AWS

Abstract: 

In this talk, I will cover evaluations of machine learning models under data constraints, whether due to limited annotation budgets or the difficulty of collecting additional labels. First, I will describe a sampling- and estimation-based workflow that tailors data collection to model predictions, leveraging stratified sampling and difference estimators to reduce labeling needs without compromising the reliability of the estimates. I will then cover an empirical Bayes method for subgroup evaluation, which integrates small-sample direct estimates with regression-based predictions to produce precise performance metrics. Throughout the talk, I will highlight how these simple approaches are applied at industry scale for efficient evaluations of vision and language models.

 

Riccardo Fogliato is an applied scientist on the Responsible AI team at Amazon Web Services. He works on evaluating vision and language models, developing statistical methods for scalable assessments. He holds a PhD in Statistics from Carnegie Mellon University.


Room: 409