Xingzhe He

Research Tech Lead at Descript, leading video & audio generation research and multi-modal understanding.

Previously a PhD student at UBC with Helge Rhodin, working on controllable generation and self-supervised structure discovery.

About

I am a Research Tech Lead at Descript, where I lead teams working on video and audio generation research, and on multi-modal understanding that helps creators produce content at the speed of thought.

I completed my PhD at the University of British Columbia, advised by Prof. Helge Rhodin. My research centers on computer vision, machine learning, and generative models for images, shapes, and physics.

Before UBC, I spent a year at Dartmouth College as a research intern advised by Prof. Bo Zhu, working on physics-based machine learning. I received my M.Sc. from Rutgers University and my B.Sc. from the University of Liverpool / Xi'an Jiaotong-Liverpool University, where I was advised by Prof. Corina Constantinescu.

Productionized Industry Research & Systems

Foundational Generative Models for LipSync

A state-of-the-art lipsync generation stack built from scratch, including highly compressed video VAEs and diffusion generators. The system improved audio-visual alignment, identity preservation, and fine facial dynamics across challenging poses.

Audio Inpainting Models

A state-of-the-art audio inpainting model built from scratch, including highly compressed audio VAEs and diffusion generators. The system preserves not only the speaker's identity and talking pace but also the room tone, including background noise and music.

Long-Horizon Video-to-Video Generation

A training-free video-to-video model that scales sequence length to maintain consistency, supporting generation and editing of static-camera videos longer than two hours.

Publications

Goodbye Drift: Anchored Tree Sampling for Long-Horizon Video-to-Video Generation

Technical Report
Matthew Bendel, Stephen W. Bailey, Mithilesh Vaidya, Sumukh Badam, Xingzhe He

PoDAR: Power-Disentangled Audio Representation for Generative Modeling

arXiv
Alejandro Luebs, Mithilesh Vaidya, Ishaan Kumar, Sumukh Badam, Stephen W. Bailey, Matthew Bendel, Jose Sotelo, Xingzhe He

A Data Perspective on Enhanced Identity Preservation for Diffusion Personalization

WACV 2025
Xingzhe He, Zhiwen Cao, Nicholas Kolkin, Lantao Yu, Kun Wan, Helge Rhodin, Ratheesh Kalarot

Unsupervised Keypoints from Pretrained Diffusion Models

CVPR 2024
Eric Hedlin, Gopal Sharma, Shweta Mahajan, Xingzhe He, Hossam Isack, Abhishek Kar, Helge Rhodin, Andrea Tagliasacchi, Kwang Moo Yi

Few-Shot Geometry-Aware Keypoint Localization

Xingzhe He, Gaurav Bharaj, David Ferman, Helge Rhodin, Pablo Garrido

AutoLink: Self-Supervised Learning of Human Skeletons and Object Outlines by Linking Keypoints

NeurIPS 2022 Spotlight · ~3%
Xingzhe He, Bastian Wandt, Helge Rhodin

GANSeg: Learning to Segment by Unsupervised Hierarchical Image Generation

Xingzhe He, Bastian Wandt, Helge Rhodin

AdvectiveNet: An Eulerian–Lagrangian Fluidic Reservoir for Point Cloud Processing

Xingzhe He, Helen L. Cao, Bo Zhu

Soft Multicopter Control using Neural Dynamics Identification

Yitong Deng, Yaorui Zhang, Xingzhe He, Shuqi Yang, Yunjin Tong, Michael Zhang, Daniel M. DiPietro, Bo Zhu

RoeNets: Predicting Discontinuity of Hyperbolic Systems from Continuous Data

International Journal for Numerical Methods in Engineering
Yunjin Tong, Shiying Xiong, Xingzhe He, Shuqi Yang, Zhecheng Wang, Rui Tao, Runze Liu, Bo Zhu