Abstract:
Robust and comprehensive evaluations of generative AI models play a critical role in ensuring that safe and beneficial technologies are deployed in society. These evaluations rely heavily on copious amounts of semi-structured data annotated by humans. Both the data and the human perspectives involved in the process thus play a key role in what models take as ground truth, and in whom the models are eventually able to serve. Pervasive data skews lead to a lack of representation of global contexts and identities in both models and evaluation strategies, with observable degradation of model utility in different contexts worldwide. Accounting for cross-cultural differences in how people interact with technology is an important step toward building and evaluating AI holistically. We will talk through different data-driven strategies for broadening the scope of GenAI evaluations so that they handle global perspectives and challenges more competently.
Sunipa Dev is a researcher at Google, working at the intersection of language, society, and technology. Her research strives to ground evaluations of generative AI, especially language technologies, in the real-world experiences of people, and to foster the inclusion of pluralistic and cross-cultural perspectives in AI pipelines. Her contributions have been recognized with several prestigious awards, including an Outstanding Paper Award at ACL, the NSF CI Fellows Award, and the DAAD AINet Award, and she was named one of the 100 Brilliant Women in AI Ethics in 2022.