Rohan Choudhury

I’m a final-year PhD student at Carnegie Mellon University’s Robotics Institute, advised by Kris Kitani and László Jeni. My research broadly focuses on making vision models more efficient at understanding and generating visual content. My work is supported by the NSF Graduate Research Fellowship Program (GRFP).

I’m also currently a Student Researcher at ByteDance Seed, collaborating with Peter Lin and Lu Jiang on accelerating video generation. I previously interned at Meta FAIR, working with Jing Huang on efficient video understanding.

Before my PhD, I was a software engineer at Nuro, where I developed trajectory forecasting models for self-driving vehicles. Before that, I graduated from Caltech, where I explored multi-agent reinforcement learning with Yisong Yue.

Outside of research, I enjoy running, weightlifting, watching sports, and listening to electronic music.

news

Oct 23, 2025	Excited to announce our new preprint: Accelerating Vision Transformers with Adaptive Patch Sizes!
Jun 01, 2025	Honored to be named an Outstanding Reviewer for CVPR 2025!
Mar 01, 2025	Excited to start as a Student Researcher at ByteDance Seed!

selected papers (full list)

Accelerating Vision Transformers with Adaptive Patch Sizes

Rohan Choudhury, JungEun Kim, Jinhyung Park, and 3 more authors

arXiv preprint, 2025

@article{choudhury2025apt,
  title = {Accelerating Vision Transformers with Adaptive Patch Sizes},
  author = {Choudhury, Rohan and Kim, JungEun and Park, Jinhyung and Yang, Eunho and Jeni, László A. and Kitani, Kris M.},
  journal = {arXiv preprint},
  year = {2025},
  site = {/apt/}
}

Don’t Look Twice: Faster Video Transformers with Run-Length Tokenization

Rohan Choudhury, Guanglei Zhu, Sihan Liu, and 3 more authors

NeurIPS, 2024 (spotlight, top 3%)

@article{choudhury2024rlt,
  title = {Don't Look Twice: Faster Video Transformers with Run-Length Tokenization},
  author = {Choudhury, Rohan and Zhu, Guanglei and Liu, Sihan and Niinuma, Koichiro and Kitani, Kris M. and Jeni, László A.},
  journal = {NeurIPS},
  year = {2024},
  site = {/rlt/},
  spotlight = true
}

Video Question Answering with Procedural Programs

Rohan Choudhury, Koichiro Niinuma, Kris M. Kitani, and 1 more author

ECCV, 2024

@article{choudhury2024proviq,
  title = {Video Question Answering with Procedural Programs},
  author = {Choudhury, Rohan and Niinuma, Koichiro and Kitani, Kris M. and Jeni, László A.},
  journal = {ECCV},
  year = {2024}
}