About
I am a Research Tech Lead at Descript, where I lead teams working on video and audio generation research, and on multi-modal understanding that helps creators produce content at the speed of thought.
I completed my PhD at the University of British Columbia, advised by Prof. Helge Rhodin. My research centers on computer vision, machine learning, and generative models for images, shapes, and physics.
Before UBC, I spent a year at Dartmouth College as a research intern advised by Prof. Bo Zhu, working on physics-based machine learning. I received my M.Sc. from Rutgers University and my B.Sc. from the University of Liverpool / Xi'an Jiaotong-Liverpool University, advised by Prof. Corina Constantinescu.
News
- 2025 Leading video & audio generation research and multi-modal understanding at Descript.
- Jan 2025 Paper on identity preservation for diffusion personalization accepted to WACV 2025.
- Feb 2024 Unsupervised keypoints from pretrained diffusion models accepted to CVPR 2024.
- Sep 2022 AutoLink selected as Spotlight at NeurIPS 2022 (~3% acceptance).
Research
I build generative systems that are controllable, structured, and useful. My work spans:
Video & Audio Generation
Diffusion models, controllable synthesis, identity preservation, and few-shot personalization that bring creative tools to everyone.
Multi-modal Understanding
Joint reasoning across vision, language, and audio for grounded perception, retrieval, and intelligent editing.
Self-Supervised Structure
Unsupervised discovery of keypoints, skeletons, and segmentations from images and videos.
Physics-Based ML
Symplectic networks, neural projections, and Eulerian–Lagrangian methods for physical systems.
Publications
Full list on Google Scholar →

LatentKeypointGAN: Controlling GANs via Latent Keypoints
AutoLink: Self-Supervised Learning of Human Skeletons and Object Outlines by Linking Keypoints

Symplectic Neural Networks in Taylor Series Form for Hamiltonian Systems
