I am currently a research scientist at ByteDance, working on large speech models and AI avatars. Our work is widely deployed in well-known applications and services such as TikTok/抖音, CapCut/剪映, and Volcano Engine (火山引擎).
I received my bachelor’s degree from the Department of Computer Science, Zhejiang University (浙江大学计算机科学与技术学院) in 2020, and my master’s degree from the same department in 2023, advised by Kejun Zhang (张克俊).
My research interests include speech synthesis, music generation, avatars, and translation. I have published more than 20 papers at top international AI conferences such as NeurIPS, ICLR, ICML, ACL, and AAAI. I have served as an area chair for ACL and NAACL, and as a reviewer for NeurIPS, ICLR, TMM, CVPR, and others.
I was previously a research intern at Tencent AI Lab and SEA AI Lab, collaborating with Shuicheng Yan (颜水成) and Yi Ren (任意). Before that, I was a research intern at ByteDance AI Lab, advised by Bilei Zhu (朱碧磊). I also completed a one-year internship at Microsoft Research Asia, working with Xu Tan (谭旭), Tao Qin (秦涛), and Tie-Yan Liu (刘铁岩).
I’m one of the main contributors to Muzic, a popular open-source music project.
🔥 News
- 2024.10: One paper is accepted by TAFFC!
- 2024.09: One paper is accepted by NeurIPS 2024!
- 2024.07: One paper is accepted by TASLP!
- 2024.02: Our voice cloning feature is fully launched in CapCut!
- 2024.01: Two papers are accepted by ICLR 2024!
- 2023.06: One paper is accepted by ICML Workshop!
- 2023.05: One paper is accepted by TMM!
- 2023.05: One paper is accepted by INTERSPEECH 2023!
- 2023.01: One paper is accepted by ICLR 2023!
📝 Publications
🎙 Speech Translation and Synthesis
- [TASLP] RefXVC: Cross-Lingual Voice Conversion with Enhanced Reference Leveraging
- [ICLR 2024] Mega-TTS 2: Zero-Shot Text-to-Speech with Arbitrary Length Speech Prompts | Project
- [ICLR 2023] Bag of Tricks for Unsupervised Text-to-Speech | Project
- [ICASSP 2021] Denoising Text to Speech with Frame-Level Noise Modeling | Project
- [AAAI 2021] UWSpeech: Speech to Speech Translation for Unwritten Languages | Project
- [INTERSPEECH 2023] EE-TTS: Emphatic Expressive TTS with Linguistic Information | Project
- [ACL 2020] SimulSpeech: End-to-End Simultaneous Speech to Text Translation
- [IJCAI 2020] Task-Level Curriculum Learning for Non-Autoregressive Neural Machine Translation
🎼 Music Generation and Retrieval
- [TMM] SDMuse: Stochastic Differential Music Editing and Generation via Hybrid Representation | Project
- [TAFFC] REMAST: Real-time Emotion-based Music Arrangement with Soft Transition
- [ACM-MM 2022] ReLyMe: Improving Lyric-to-Melody Generation by Incorporating Lyric-Melody Relationships | Project
- [ACM-MM 2022] SongDriver: Real-time Music Accompaniment Generation without Logical Latency nor Exposure Bias
- [ISMIR 2022] PDAugment: Data Augmentation by Pitch and Duration Adjustments for Automatic Lyrics Transcription
- [ACL 2022] Automatic Song Translation for Tonal Languages | Project
- [ICASSP 2022] S3T: Self-Supervised Pre-training with Swin Transformer for Music Classification
- [EMNLP 2022] TeleMelody: Lyric-to-Melody Generation with a Template-Based Two-Stage Method | Project
🧑‍🎨 Multi-modal Learning
- [NeurIPS 2024] MimicTalk: Mimicking a personalized and expressive 3D talking face in few minutes | Code
- [ICLR 2024] Real3d-portrait: One-shot realistic 3d talking portrait synthesis | Project | Code
- [ICML 2023 Workshop] Ada-TTA: Towards Adaptive High-Quality Text-to-Talking Avatar Synthesis | Video
- [NeurIPS 2022] Towards Effective Multi-Modal Interchanges in Zero-Resource Sounding Object Localization
- [ACM-MM 2020] FastLR: Non-Autoregressive Lipreading Model with Integrate-and-Fire
- [IJCAI 2019] Discriminative and Correlative Partial Multi-Label Learning
🎖 Honors and Awards
- National Scholarship (Top 1%)
- Zhijun He Scholarship (Top 1%)
- Tianzhou Chen Scholarship (Top 1%)
- Huawei Scholarship (Top 1%)
- Outstanding Graduate of Zhejiang Province
📖 Education
- 2020.06 - 2023.06, Master, Zhejiang University, Hangzhou.
- 2016.09 - 2020.06, Undergraduate, Zhejiang University, Hangzhou.
💬 Invited Talks
- 2022.12, Music Generation with Domain Knowledge, Department of CS @ NUS.
- 2021.08, Simultaneous Speech Translation Panel, IWSLT Workshop @ ACL 2021.
- 2021.01, Speech Translation for Unwritten Languages, Live Share @ MSRA.
💻 Internships
- 2023.02 - 2023.05, Tencent AI Lab, Shenzhen, Guangdong, China.
- 2022.03 - 2022.12, SEA AI Lab, Singapore.
- 2021.06 - 2021.11, ByteDance AI Lab, Speech & Audio Team, Shanghai, China.
- 2019.07 - 2020.06, Microsoft Research Asia, Machine Learning Group, Beijing, China.