Xingzhe He
AI Researcher · Generative Models

Research Tech Lead at Descript, leading video & audio generation research and multi-modal understanding.

Previously PhD at UBC with Helge Rhodin. Working on controllable generation and self-supervised structure discovery.

About

I am a Research Tech Lead at Descript, where I lead teams working on video and audio generation research, and on multi-modal understanding that helps creators produce content at the speed of thought.

I completed my PhD at the University of British Columbia, advised by Prof. Helge Rhodin. My research centers on computer vision, machine learning, and generative models — for images, shapes, and physics.

Before UBC, I spent a year at Dartmouth College as a research intern advised by Prof. Bo Zhu, working on physics-based machine learning. I received my M.Sc. from Rutgers University and my B.Sc. from the University of Liverpool / Xi'an Jiaotong-Liverpool University, advised by Prof. Corina Constantinescu.

Research

I build generative systems that are controllable, structured, and useful. My work spans:

Video & Audio Generation

Diffusion models, controllable synthesis, identity preservation, and few-shot personalization that bring creative tools to everyone.

Multi-modal Understanding

Joint reasoning across vision, language, and audio for grounded perception, retrieval, and intelligent editing.

Self-Supervised Structure

Unsupervised discovery of keypoints, skeletons, and segmentations from images and videos.

Physics-Based ML

Symplectic networks, neural projections, and Eulerian–Lagrangian methods for physical systems.

Goodbye Drift: Anchored Tree Sampling for Long-Horizon Video-to-Video Generation

Technical Report
Matthew Bendel, Stephen W. Bailey, Mithilesh Vaidya, Sumukh Badam, Xingzhe He

PoDAR: Power-Disentangled Audio Representation for Generative Modeling

arXiv
Alejandro Luebs, Mithilesh Vaidya, Ishaan Kumar, Sumukh Badam, Stephen W. Bailey, Matthew Bendel, Jose Sotelo, Xingzhe He

A Data Perspective on Enhanced Identity Preservation for Diffusion Personalization

WACV 2025
Xingzhe He, Zhiwen Cao, Nicholas Kolkin, Lantao Yu, Kun Wan, Helge Rhodin, Ratheesh Kalarot

Unsupervised Keypoints from Pretrained Diffusion Models

CVPR 2024
Eric Hedlin, Gopal Sharma, Shweta Mahajan, Xingzhe He, Hossam Isack, Abhishek Kar, Helge Rhodin, Andrea Tagliasacchi, Kwang Moo Yi

Few-Shot Geometry-Aware Keypoint Localization

Xingzhe He, Gaurav Bharaj, David Ferman, Helge Rhodin, Pablo Garrido

AutoLink: Self-Supervised Learning of Human Skeletons and Object Outlines by Linking Keypoints

NeurIPS 2022 Spotlight · ~3%
Xingzhe He, Bastian Wandt, Helge Rhodin

GANSeg: Learning to Segment by Unsupervised Hierarchical Image Generation

Xingzhe He, Bastian Wandt, Helge Rhodin

AdvectiveNet: An Eulerian–Lagrangian Fluidic Reservoir for Point Cloud Processing

Xingzhe He, Helen L. Cao, Bo Zhu

Soft Multicopter Control using Neural Dynamics Identification

Yitong Deng, Yaorui Zhang, Xingzhe He, Shuqi Yang, Yunjin Tong, Michael Zhang, Daniel M. DiPietro, Bo Zhu

RoeNets: Predicting Discontinuity of Hyperbolic Systems from Continuous Data

International Journal for Numerical Methods in Engineering
Yunjin Tong, Shiying Xiong, Xingzhe He, Shuqi Yang, Zhecheng Wang, Rui Tao, Runze Liu, Bo Zhu