Rohan Choudhury

I’m a fourth-year PhD student at Carnegie Mellon University’s Robotics Institute, advised by Kris Kitani and László Jeni. My research focuses on making vision models more efficient at understanding and generating visual content, with the goal of enabling algorithms to perceive the world continuously at high resolution and real-time frame rates (30+ FPS). My work is supported by the NSF Graduate Research Fellowship Program (GRFP).
Currently, I’m also a Student Researcher at ByteDance, collaborating with Peter Lin and Lu Jiang on accelerating video generation. Previously, I interned at Meta FAIR, working with Jing Huang on efficient video understanding.
Before my PhD, I was a software engineer at Nuro, developing trajectory forecasting models for self-driving vehicles. Before that, I earned my bachelor’s degree from Caltech, where I explored multi-agent reinforcement learning with Yisong Yue.
Outside of research, I enjoy running, weightlifting, watching sports, and listening to electronic music.
news
| Date | News |
|---|---|
| Oct 18, 2024 | Our work RLT was accepted to NeurIPS 2024 as a spotlight paper! |
| Jul 14, 2024 | Our paper Video Question Answering with Procedural Programs was accepted to ECCV 2024! |
selected papers (full list)
- Don’t Look Twice: Faster Video Transformers with Run-Length Tokenization. NeurIPS, 2024