Publications

Physics-aware Multi-Object 3D Scene Reconstruction (in Progress)

TBD
TBD

Research in 3D reconstruction has recently shifted from achieving consistency in appearance and geometry alone to recovering physically plausible models of a scene or object. For this problem, test-time optimization approaches take hours to estimate reasonable physical parameters for even a single object, while feed-forward approaches generalize poorly. We are currently working to overcome the shortcomings of both approaches and provide a simulation pipeline that generalizes easily.

Website

WHEN TOM EATS KIMCHI: Evaluating Cultural Awareness of Multimodal Large Language Models in Cultural Mixture Contexts

Jun Seong Kim*, Kyaw Ye Thu*, Javad Ismayilzada, Junyeong Park, Eunsu Kim, Huzama Ahmad, Na Min An, James Thorne, Alice Oh
Workshop on Cross-Cultural Considerations in NLP (C3NLP) @ NAACL 2025

In a highly globalized world, it is important for multimodal large language models (MLLMs) to correctly recognize visuals in mixed-cultural settings. This paper examines the robustness of MLLMs to mixed cultures by constructing MixCuBe, a cross-cultural awareness benchmark of images, and evaluating SOTA MLLMs on it.

arXiv
Outstanding Paper Award at C3NLP Workshop @ NAACL

Projects

The following are a few of my demonstrable projects, listed in reverse chronological order (i.e., the more recent, the better the quality). If you are interested in more of my projects, please check out my GitHub.

Research Projects

RenderFormer with Linear Attention [Poster]

RenderFormer is an end-to-end, fully data-driven, transformer-based rendering pipeline developed by Microsoft. Because it uses the vanilla transformer architecture, its attention has time complexity O(N^2) in the number of input tokens N. We tried to achieve linear time complexity by using a linear attention mechanism, namely Performer (FAVOR++).
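To illustrate the idea behind linear attention, here is a minimal pure-Python sketch of attention with positive random features, the basic estimator from the original Performer work. It is not the project's code: it uses plain i.i.d. Gaussian projections rather than the FAVOR++ variant, and the names `positive_random_features`, `linear_attention`, `num_features`, and `seed` are assumptions made for this example. The key point is that the key/value summaries are computed once, so the cost grows linearly with the number of tokens.

```python
import math
import random


def positive_random_features(x, omegas):
    """Map x to positive features exp(w.x - |x|^2 / 2) / sqrt(m), one per row w of omegas."""
    sq_norm = sum(v * v for v in x)
    m = len(omegas)
    return [
        math.exp(sum(w_i * x_i for w_i, x_i in zip(w, x)) - sq_norm / 2.0) / math.sqrt(m)
        for w in omegas
    ]


def linear_attention(queries, keys, values, num_features=64, seed=0):
    """Approximate softmax attention without forming the N x N score matrix."""
    rng = random.Random(seed)
    d = len(queries[0])
    d_v = len(values[0])
    scale = d ** -0.25  # splits the usual 1/sqrt(d) scaling between queries and keys
    # Hypothetical choice: i.i.d. Gaussian projections (no orthogonalization).
    omegas = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(num_features)]
    phi_q = [positive_random_features([scale * v for v in q], omegas) for q in queries]
    phi_k = [positive_random_features([scale * v for v in k], omegas) for k in keys]

    # Summaries over all keys, computed once:
    #   kv[a][b] = sum_j phi(k_j)[a] * v_j[b],   k_sum[a] = sum_j phi(k_j)[a]
    kv = [[0.0] * d_v for _ in range(num_features)]
    k_sum = [0.0] * num_features
    for fk, v in zip(phi_k, values):
        for a in range(num_features):
            k_sum[a] += fk[a]
            for b in range(d_v):
                kv[a][b] += fk[a] * v[b]

    # Each query only touches the fixed-size summaries, not every key.
    outputs = []
    for fq in phi_q:
        norm = sum(fq[a] * k_sum[a] for a in range(num_features))
        outputs.append(
            [sum(fq[a] * kv[a][b] for a in range(num_features)) / norm for b in range(d_v)]
        )
    return outputs
```

Because every feature is positive, each output row is a convex combination of the value vectors, as in exact softmax attention. The approximation quality depends on `num_features` and on the magnitude of the queries and keys.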

Dynamic Brain Connectome Learning [Poster]

A set of graph machine learning architectures for learning temporal and spatial patterns of brain activation from fMRI images. Two downstream tasks are implemented: (1) brain activation prediction during language tasks (link prediction) and (2) performance prediction from neural patterns (graph regression).

Advanced Passage Retrieval [Poster]

An NLP research project on passage retrieval, the task of returning the top-k most pertinent passages from a dataset given a query. We use the BM25 model as a baseline to explore retrieval models that can achieve better accuracy, specifically on the SQuAD1.1 dataset.
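For readers unfamiliar with BM25, the sketch below shows one common form of the scoring function (Okapi BM25 with a smoothed, non-negative IDF) and a top-k ranking step. It is only an illustration, not the project's code: the whitespace tokenizer, the defaults `k1=1.5` and `b=0.75`, and the helper names `bm25_scores` and `top_k` are assumptions made for this example.

```python
import math
from collections import Counter


def bm25_scores(query_tokens, tokenized_passages, k1=1.5, b=0.75):
    """Score every passage against the query with Okapi BM25."""
    n = len(tokenized_passages)
    avg_len = sum(len(p) for p in tokenized_passages) / n
    # Document frequency: number of passages containing each term.
    doc_freq = Counter()
    for p in tokenized_passages:
        doc_freq.update(set(p))

    scores = []
    for p in tokenized_passages:
        tf = Counter(p)
        score = 0.0
        for term in query_tokens:
            if term not in tf:
                continue
            # Smoothed IDF that stays non-negative even for very common terms.
            idf = math.log(1.0 + (n - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5))
            # Term-frequency saturation with passage-length normalization.
            denom = tf[term] + k1 * (1.0 - b + b * len(p) / avg_len)
            score += idf * tf[term] * (k1 + 1.0) / denom
        scores.append(score)
    return scores


def top_k(query, passages, k=2):
    """Return the indices of the k highest-scoring passages (naive whitespace tokenizer)."""
    tokenized = [p.lower().split() for p in passages]
    scores = bm25_scores(query.lower().split(), tokenized)
    ranked = sorted(range(len(passages)), key=lambda i: scores[i], reverse=True)
    return ranked[:k]
```

Calling `top_k(question, passages, k=5)` returns passage indices ordered by score. A learned retriever would replace `bm25_scores` while keeping the same ranking step.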

Kyaw Ye Thu

Engineering Projects

Space Invader [Demo]

Burmese G2P

Third Eye [Demo]

PlayMaths [Demo]