Babak Ehteshami Bejnordi

Research Scientist @ Qualcomm AI Research

Research Projects


Reasoning on the Edge

Qualcomm AI Research | Tech Report'26

Reasoning in small LLMs using LoRA adapters, combined with supervised fine-tuning and RL-based budget forcing.

LoRA, RL for budget forcing, Chain-of-thought, Model switching, Reasoning, On-device, LLM Efficiency
kava

Latent Reasoning

Published @ICLR'26

Distilling knowledge from a compressed KV-cache of a teacher into a latent-reasoning student.

Latent Reasoning, KV-cache, KV-cache distillation, Chain-of-thought, LLM Efficiency

Cache-MoE

Expo Demo @NeurIPS'24

Efficient Mixture-of-Experts for mobile devices with limited DRAM via expert caching.

MoE, On-device, Caching, LLM Efficiency
READ-ME

Refactor LLM into MoE

Published @NeurIPS'24

Refactorizing LLMs as router-decoupled mixture of experts with system co-design.

MoE, Batched-inference, Dynamic sparsity, Decoupled routing, LLM Efficiency

LLM-to-SLM

Published @ICML'24: ES-FoMo II

Think Big, Generate Quick: LLM-to-SLM for fast autoregressive decoding.

Hybrid LLM, Fast decoding, LLM Efficiency, LLM to SLM

InterroGate for MTL

Published @BMVC'24

Learning to share, specialize, and prune representations for Multi-task Learning.

Multi-task Learning, Inference efficiency, Gated Networks, Channel sparsity

Scalarization for MTL

Published @NeurIPS'23

Scalarization for Multi-Task and Multi-Domain Learning at scale.

Population-based Training, Scalarization, Multi-Task Learning, Multi-Domain Learning

MSViT

Published @ICCV'23: NIVT

Dynamic mixed-scale tokenization for vision transformers.

Conditional compute, Mixed-scale, Efficient CV, Tokenization

Salisa

Published @ECCV'22

Saliency-based input sampling for efficient video object detection.

Efficient Inference, VOD, Video Object Detection, Spatial Transformer Network

Single-gated MoE

Published @BMVC'22

Single-gate Mixture of Experts (MoE) with early exiting for convolutional architectures.

MoE, Anytime Inference, On-device, Early-exiting

FrameExit

Published @CVPR'21 (Oral)

Conditional Early Exiting for Efficient Video Recognition.

Early Exiting, Video Recognition, Gating Network, Efficient Recognition

SkipConv

Published @CVPR'21

Skip-Convolutions for efficient video processing.

Residual Convolutions, Efficient Video Processing, Skip-Convolution

Channel Gating for Continual Learning

Published @CVPR'20 (Oral)

Conditional channel gated networks for task-aware continual learning.

Continual Learning, Channel-Gating, Task-aware, Dynamic sparsity

Channel Gating with Batch-shaping

Published @ICLR'20

Batch-shaping for learning conditional channel gated networks.

Batch-shaping, Channel Gating, Dynamic sparsity