Adriana Pagano, Professor of Applied Linguistics at the Federal University of Minas Gerais, Brazil, is visiting the Institute's Language Technology group in December 2025. On 2 December she will give a talk titled Multimodal and multilingual perspectivism – Exploring the Framed Multi30k (FM30k) multimodal-multilingual dataset.
The talk will be held online and is open to anyone interested. To receive the connection link, please send an email to Verginica Barbu Mititelu (her email address can be found here).
Abstract: This talk will present Framed Multi30K (FM30K), a novel frame-based Brazilian Portuguese multimodal-multilingual dataset (Viridiano et al., 2024) which (i) extends the Multi30K dataset (Elliott et al., 2016) with 158,915 original Brazilian Portuguese descriptions and 31,104 Brazilian Portuguese translations of original English descriptions; (ii) adds 4,577,122 frame and frame element labels to the 158,915 English descriptions and to the ones created for Brazilian Portuguese; and (iii) extends the Flickr30k Entities dataset (Plummer et al., 2015) with 169,560 frame and frame element correlations linked to the existing phrase-to-region correlations. The dataset adds image annotation within FrameNet, thus departing from a three-decade tradition of text-only annotation, and augments Flickr30k by adding FrameNet labels to Flickr30k entities. The dataset also increases the representation of Brazilian Portuguese in NLP and constitutes a rich resource for exploring multimodal and multilingual perspectivism. The talk will also briefly present upcoming work with Framed Multi30K (FM30K) on the annotation of multiword expressions (MWEs) in both the original and the translated captions.
Bio: Adriana S. Pagano is Full Professor in Applied Linguistics at Universidade Federal de Minas Gerais, Brazil. She is a research fellow of CNPq (National Council for Scientific and Technological Development, Ministry of Science and Technology, Brazil) and FAPEMIG (Research Foundation of the State of Minas Gerais, Brazil). Her research interests include (i) language modelling from the perspective of systemic-functional linguistics; (ii) quantitative approaches to translation and multilingual textual production; and (iii) development of corpora and other resources for Natural Language Processing. She currently coordinates the project Algorithms for Fair Representation: Debiasing Large Language Models with Culturally-Diverse Datasets, funded by the Worldwide Universities Network (WUN RDF 2024), a joint project between UFMG, the University of Alberta, the University of Exeter, Mahidol University and Makerere University. She also coordinates the project Multimodal and Multilingual Perspectivisation for AI-Responsible Resources and Applications, funded by MCTI/CNPq International Cooperation Projects (2025-2026), and the project Generative AI applied to Maternal and Child Health, funded by UFMG’s Centre for Innovation in Artificial Intelligence for Health (CI-IA Saúde). She participates in academic cooperation and research exchange agreements with Università di Torino (Italy), Univerzita Karlova (Czech Republic), University of Sydney (Australia), University of Exeter (United Kingdom), University of Ghana (Ghana) and Macau Polytechnic University (Macau).
She is a researcher at Reinventa – Research and Innovation Network for Visual and Textual Analysis of Multimodal Objects (RED-00106-21) and is a member of the teams working on the projects Responsible and ethical applications of Artificial Intelligence in Public Health: AI-empowered Child and Maternal Health (Worldwide Universities Network RDF 2023) and Development and evaluation of an intelligent system for generating guidelines for safe prescription and accessible medication adapted to different cultural contexts (CNPq/Bill & Melinda Gates Foundation). ORCID 0000-0002-3150-3503