August 1, 2025

Four MMLL Papers Accepted at MICCAI and MIUA Conferences
The B Bhattarai Multimodal Learning Lab (MMLL) specializes in advancing AI techniques that integrate heterogeneous data sources, including vision, text, and speech, enabling computers to understand, interpret, and reason across different modalities. MMLL has added four more papers to NAAMII's growing portfolio of accepted research, with publications at MICCAI 2025 and MIUA 2025.

MICCAI is among the most competitive conferences in the field, with an acceptance rate of around 30%. This year, two of MMLL's papers were accepted at the conference, one of them ranked in the top 9% of submissions based on peer review scores. At MIUA, the UK's premier venue for medical image analysis, one of MMLL's two papers has been nominated for the Best Paper Award, placing it among the top few contributions at the conference.

Presentation Dates
MIUA 2025: 15–17 July, University of Leeds (UK)
MICCAI 2025: 23–27 September, Daejeon Convention Center (South Korea)

Paper 1: NERO: Explainable Out-of-Distribution Detection with Neuron-level Relevance
Anju Chhetri, Jari Korhonen, Prashnna Gyawali, Binod Bhattarai
MICCAI 2025
See full paper: arXiv

Deep learning models in medical imaging can fail silently when faced with unfamiliar, out-of-distribution (OOD) inputs, a critical concern in clinical settings. This research introduces NERO, a novel OOD detection method that focuses on neuron-level relevance patterns rather than high-level features or logits. By clustering relevance maps for known classes and measuring how far a new sample deviates from these clusters, NERO not only improves detection accuracy but also offers explainable outputs. Tested on gastrointestinal datasets (Kvasir, GastroVision), NERO consistently outperformed existing methods across model architectures.
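The core idea behind NERO, distance to clusters of known-class relevance patterns, can be illustrated with a minimal sketch. This is an assumption-laden toy, not the paper's implementation: it stands in neuron-level relevance maps with plain vectors, summarizes each class by a single centroid, and uses Euclidean distance as the OOD score.

```python
import numpy as np

# Illustrative sketch only (not the NERO paper's implementation): each training
# sample is represented by a "relevance vector", each known class is summarized
# by the centroid of its relevance vectors, and a new sample's OOD score is its
# distance to the nearest centroid.

def class_centroids(relevance_vectors, labels):
    """Mean relevance vector per known class."""
    return {c: relevance_vectors[labels == c].mean(axis=0)
            for c in np.unique(labels)}

def ood_score(sample_relevance, centroids):
    """Distance to the nearest class centroid; larger means more OOD-like."""
    return min(np.linalg.norm(sample_relevance - mu) for mu in centroids.values())

# Toy data: two known classes in a 4-dimensional relevance space.
rng = np.random.default_rng(0)
rel = np.vstack([rng.normal(0.0, 0.1, (20, 4)),   # class 0 near the origin
                 rng.normal(1.0, 0.1, (20, 4))])  # class 1 near (1, 1, 1, 1)
labels = np.array([0] * 20 + [1] * 20)
cents = class_centroids(rel, labels)

in_dist = ood_score(rng.normal(0.0, 0.1, 4), cents)  # close to class 0
far_out = ood_score(rng.normal(5.0, 0.1, 4), cents)  # far from both classes
assert far_out > in_dist
```

A distance score like this is what makes the output explainable: the deviation can be traced back to which components of the relevance vector differ most from the nearest class profile.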
Paper 2: NCDD: Nearest Centroid Distance Deficit for Out-of-Distribution Detection in Gastrointestinal Vision
Sandesh Pokhrel, Sanjay Bhandari, Sharib Ali, Tryphon Lambrou, Anh Nguyen, Yash Raj Shrestha, Angus Watson, Danail Stoyanov, Prashnna Gyawali, Binod Bhattarai
MIUA 2025 (Best Paper Award nominee)
See full paper: arXiv

Reliable deep learning in medical imaging requires the ability to flag unfamiliar or anomalous inputs. This challenge is particularly acute in gastrointestinal imaging, where in-distribution and out-of-distribution (OOD) examples often share similar visual features. NCDD frames anomaly detection as an OOD problem and proposes a simple yet effective solution: compute how far a new sample's feature representation deviates from its nearest class centroid. In-distribution samples cluster close to class centroids, while OOD samples tend to lie farther away. Evaluated on the Kvasir2 and GastroVision datasets across different architectures, NCDD consistently outperformed state-of-the-art methods, demonstrating a more reliable way to flag anomalies in medical images.

Paper 3: Multimodal Federated Learning With Missing Modalities through Feature Imputation Network
Pranav Poudel, Aavash Chhetri, Prashnna Gyawali, Georgios Leontidis, Binod Bhattarai
MIUA 2025
See full paper: arXiv

Federated learning enables multi-institutional collaboration without sharing raw data, but in healthcare settings, missing data modalities (such as uncollected scans or tests) are a common challenge. This paper introduces a lightweight feature imputation network that reconstructs missing modality data at the feature level instead of synthesizing raw inputs. Tested across three major chest X-ray datasets (MIMIC-CXR, NIH Open-I, CheXpert), in both uniform and varied data conditions, the method improved performance over standard baselines. The approach is efficient, preserves privacy, and supports real-world clinical AI deployment even when data is incomplete.
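The feature-level imputation idea described above can be sketched in a few lines. This is a hypothetical illustration, not the paper's network: a simple least-squares linear map stands in for the imputation network, and all dimensions, variable names, and the image/report pairing are assumptions made for the example.

```python
import numpy as np

# Hypothetical sketch of feature-level imputation (not the paper's design):
# where both modalities are available, fit a mapping from the present
# modality's features to the missing modality's features; at inference, use
# that mapping to fill in features for samples missing a modality.

rng = np.random.default_rng(0)

# Paired features from samples that have both modalities (e.g. image + report).
image_feats = rng.normal(size=(100, 32))            # present-modality features
true_map = rng.normal(size=(32, 16))
text_feats = image_feats @ true_map + 0.01 * rng.normal(size=(100, 16))

# "Train" the imputer: least-squares fit from image features to text features.
W, *_ = np.linalg.lstsq(image_feats, text_feats, rcond=None)

# At inference, a sample missing the text modality gets imputed text features,
# so the downstream multimodal model still receives both inputs.
new_image_feat = rng.normal(size=(1, 32))
imputed_text_feat = new_image_feat @ W
assert imputed_text_feat.shape == (1, 16)
```

Working at the feature level rather than synthesizing raw inputs keeps the imputer small and avoids reconstructing identifiable patient images, which is what makes the approach attractive in a privacy-sensitive federated setting.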
Paper 4: Hallucination-Aware Multimodal Benchmark for Gastrointestinal Image Analysis with Large Vision-Language Models
Bidur Khanal, Sandesh Pokhrel, Sanjay Bhandari, Ramesh Rana, Nikesh Shrestha, Ram Bahadur Gurung, Cristian Linte, Angus Watson, Yash Raj Shrestha, Binod Bhattarai
MICCAI 2025
See full paper: arXiv

Vision-Language Models (VLMs), designed to interpret medical images and generate clinical text, can sometimes produce descriptions that do not match the visual content, known as hallucinations. To address this in gastrointestinal (GI) image analysis, the researchers created Gut-VLM, a dataset built in two stages: reports were first generated with ChatGPT for Kvasir-v2 images (and may therefore contain hallucinations), then reviewed by experts who corrected and tagged the inaccuracies. Rather than solely fine-tuning VLMs to generate descriptive reports, they propose hallucination-aware fine-tuning, which trains models to detect and correct hallucinations. This approach outperformed traditional report-generation fine-tuning, and the work establishes a new benchmark for evaluating VLM fidelity in GI image analysis.

These acceptances reflect NAAMII's continued focus on practical clinical challenges in medical AI: from out-of-distribution detection to hallucination-aware vision-language models and federated learning under real-world constraints. The work spans foundational methods and applied problems, with a shared aim of improving reliability and safety in healthcare AI systems.

We're proud of all the researchers and teams behind this work, for their rigor, creativity, and sustained effort. Their commitment continues to set the tone for what research at NAAMII stands for. Congratulations to everyone involved!