WHEN TOM EATS KIMCHI: Evaluating Cultural Awareness of Multimodal Large Language Models in Cultural Mixture Contexts
In a highly globalized world, it is important for multimodal large language models (MLLMs) to correctly recognize visual content in mixed-cultural settings. This paper examines the robustness of MLLMs in such settings by constructing MixCuBe, an image-based cross-cultural awareness benchmark, and evaluating state-of-the-art MLLMs on it.