My long-term research vision is to develop AI systems that physicians can genuinely trust and use in clinical practice. Current deep learning models often achieve high accuracy on benchmark datasets but fail to generalize robustly or explain their decisions in ways clinicians can interpret.
I am motivated by three core research questions. First, how can we build hybrid architectures that combine the local feature extraction of CNNs with the global context modeling of Vision Transformers and still perform reliably when labeled data is limited? Second, how can explainability methods be evaluated meaningfully, so that we can verify whether saliency maps correspond to clinically relevant regions? Third, how can multimodal learning across imaging modalities improve diagnostic accuracy for diseases where single-modality data is insufficient?
During my PhD, I intend to pursue these questions in the context of chest X-ray analysis, ultrasound imaging, and potentially retinal imaging. These are domains where AI has clear potential to augment clinical workflows and improve patient outcomes, particularly in resource-limited settings.