Babak Ehteshami Bejnordi

I am a research scientist at Qualcomm AI Research (Senior Staff and Manager) in Amsterdam, where I lead a team focused on Efficient LLM Architectures. My research centers on efficient deep learning for Large Language Models (LLMs) and Computer Vision. My recent work covers efficient LLM deployment, efficient (latent) reasoning, Mixture of Experts, multi-task learning, and continual learning. Previously, I organized the Qualcomm Innovation Fellowship Program in Europe from 2019 to 2023.

I obtained my PhD at the Diagnostic Image Analysis Group, Radboud University, the Netherlands, where I worked on the development of ML algorithms for breast cancer diagnostics. During my PhD, I also organized the CAMELYON16 challenge.

From June to November 2016, I was a visiting researcher at Harvard University, where I applied deep learning to computational pathology, focusing on tumor-associated stroma as a prognostic biomarker in breast cancer, in collaboration with researchers from Harvard, NIH, and Mayo Clinic.

Qualcomm AI Research, Amsterdam, The Netherlands

My Resume

Research updates:

01 May 2026: Dirichlet-Prior Shaping: Guiding Expert Specialization in Upcycled MoEs, accepted at ICML'26
17 Mar 2026: We published the Qualcomm Technical Report for our Efficient Reasoning on the Edge project.
26 Jan 2026: KaVa: Latent Reasoning via Compressed KV-Cache Distillation, accepted at ICLR'26
02 Dec 2025: We demoed Efficient LLM reasoning at the edge, live this week at NeurIPS'25
23 May 2025: Mixture of Cache-Conditional Experts for Efficient Mobile Device Inference, accepted at TMLR 2025
09 May 2025: I gave an invited talk on "Efficient Deployment of LLMs on Edge Devices" at GHOST Day, Poznan, Poland
10 Mar 2025: This week, I delivered invited talks on LLM Efficiency at Apple and Cisco
09 Dec 2024: We will be demoing our Cache-MoE running efficiently on a smartphone at NeurIPS'24
05 Dec 2024: We bring Mixture of Experts (MoE) to mobile devices with limited available DRAM
26 Sep 2024: Check out our NeurIPS'24 and BMVC'24 papers on Efficient MoE and Multi-task learning
21 Sep 2023: Check out our NeurIPS'23 paper: Scalarization for Multi-Task and Multi-Domain Learning at Scale

25 Jul 2023: We organized the Resource Efficient DL for CV workshop at ICCV'23

06 Apr 2023: I will be teaching at DeepLearn 2023 - 9th International School on Deep Learning in Bari, Italy

03 Jul 2022: Efficient video object detection paper accepted at ECCV'22

13 Jul 2021: Keynote talk at the ELLIS PhD and Postdoc Summit (kick-off program).

10 Jul 2021: We open-sourced the code for FrameExit and Skip-Convolutions.

06 Mar 2021: Two papers accepted at CVPR 2021: FrameExit (Oral paper) and Skip-Convolutions.

22 Jun 2020: Check out my podcast interview with TWIML AI on Conditional Computation.

24 Feb 2020: Check out our CVPR 2020 Oral paper on channel gated networks for continual learning.

20 Dec 2019: My paper "Batch-shaping for learning conditional channel gated nets" was accepted at ICLR 2020.

02 Dec 2018: My work on real-time human pose estimation on mobile devices was demoed at NeurIPS.

22 Jun 2018: My interview with the Cancer Today Magazine is published.

16 Jun 2018: My latest work in collaboration with Harvard, NIH, and Mayo Clinic is published.

20 Dec 2017: I defended my PhD in public: video recording

12 Dec 2017: Follow the Altmetric Attention Score and tweets for my article in JAMA

12 Dec 2017: My paper is published in JAMA

07 Dec 2016: CAMELYON16 won the 2016 MedicalPhit Innovation Award

16 Nov 2016: I gave a talk on deep learning at the Broad Institute of MIT and Harvard

10 Oct 2016: CAMELYON16 was mentioned in the White House AI strategic planning report

15 Oct 2015: I started organizing the CAMELYON16 challenge.

Latest Research

View all →

ICML'26

Dirichlet-Prior Shaping

Guiding expert specialization in upcycled mixture-of-experts.

Paper →

Qualcomm Tech Report'26

Reasoning on the Edge

Reasoning in small LLMs using LoRA adapters, combined with supervised fine-tuning and RL-based budget forcing.

Paper →

ICLR'26

Latent Reasoning

Distilling knowledge from a compressed KV-cache of a teacher into a latent-reasoning student.

Paper →

TMLR'25

Cache-MoE

Efficient Mixture-of-Experts for mobile devices with limited DRAM.

Paper →

NeurIPS'24

Refactor LLM into MoE

Refactorizing LLMs as router-decoupled mixture of experts with system co-design.

Paper →

ICML'24

LLM-to-SLM

Think Big, Generate Quick: LLM-to-SLM for fast autoregressive decoding.

Paper →

NeurIPS'23

Scalarization for MTL

Scalarization for Multi-Task and Multi-Domain Learning at scale.

Paper →

BMVC'24

InterroGate for MTL

Learning to share, specialize, and prune representations for Multi-task Learning.

Paper →