I am currently a research engineer at FAIR (Fundamental AI Research Lab), working on multimodal AI agents. Previously, I was a research scientist at ByteDance, working on large speech models and AI avatars. Our work has been widely deployed in well-known applications and services such as TikTok/抖音, CapCut/剪映, and Volcano Engine (火山引擎).

I graduated from the Department of Computer Science, Zhejiang University (浙江大学计算机科学与技术学院) with a bachelor’s degree in 2020. In 2023, I received a master’s degree from the same department, advised by Kejun Zhang (张克俊).

My research interests include speech synthesis, music generation, avatars, and translation. I have published more than 20 papers at top international AI conferences such as NeurIPS, ICLR, ICML, ACL, and AAAI. I have served as an area chair for ACL, EMNLP, and NAACL, and as a reviewer for NeurIPS, ICLR, TASLP, TMM, CVPR, ICCV, etc.

I was previously a research intern at Tencent AI Lab and SEA AI Lab, collaborating with Shuicheng Yan (颜水成) and Yi Ren (任意). Before that, I was a research intern at ByteDance AI Lab, advised by Bilei Zhu (朱碧磊). I also completed a year-long internship at Microsoft Research Asia, collaborating closely with Xu Tan (谭旭), Tao Qin (秦涛), and Tie-Yan Liu (刘铁岩).

I’m one of the main contributors to several popular open-source projects, including Muzic and MegaTTS3.

🔥 News

2025.06: 📢 The technical report of our Seamless Interaction is released.
2025.03: I joined FAIR as a research engineer in Menlo Park, USA.
2024.10: 🎉 One paper is accepted by TAFFC!
2024.09: 🎉 One paper is accepted by NeurIPS 2024!
2024.07: 🎉 One paper is accepted by TASLP!
2024.02: 📢 Our voice cloning feature is fully launched in CapCut!
2024.01: 🎉 Two papers are accepted by ICLR 2024!
2023.07: I joined ByteDance as a research scientist in Shenzhen, China.

📝 Publications

🎙 Speech Translation and Synthesis

TASLP RefXVC: Cross-Lingual Voice Conversion with Enhanced Reference Leveraging
ICLR 2024 Mega-TTS 2: Zero-Shot Text-to-Speech with Arbitrary Length Speech Prompts | Project
ICLR 2023 Bag of Tricks for Unsupervised Text-to-Speech | Project
ICASSP 2021 Denoising Text to Speech with Frame-Level Noise Modeling | Project
AAAI 2021 UWSpeech: Speech to Speech Translation for Unwritten Languages | Project
INTERSPEECH 2023 EE-TTS: Emphatic Expressive TTS with Linguistic Information | Project
ACL 2020 SimulSpeech: End-to-End Simultaneous Speech to Text Translation
IJCAI 2020 Task-Level Curriculum Learning for Non-Autoregressive Neural Machine Translation

🎼 Music Generation and Retrieval

NeurIPS 2024 MimicTalk: Mimicking a personalized and expressive 3D talking face in few minutes | Code
ICLR 2024 Real3d-portrait: One-shot realistic 3d talking portrait synthesis | Project | Code
ICML 2023 Workshop Ada-TTA: Towards Adaptive High-Quality Text-to-Talking Avatar Synthesis | Video
NeurIPS 2022 Towards Effective Multi-Modal Interchanges in Zero-Resource Sounding Object Localization
ACM-MM 2020 FastLR: Non-Autoregressive Lipreading Model with Integrate-and-Fire
IJCAI 2019 Discriminative and Correlative Partial Multi-Label Learning

🎖 Honors and Awards

National Scholarship (Top 1%)
Zhijun He Scholarship (Top 1%)
Tianzhou Chen Scholarship (Top 1%)
Huawei Scholarship (Top 1%)
Outstanding Graduates of Zhejiang Province

📖 Education

2020.06 - 2023.06, Master, Zhejiang University, Hangzhou.
2016.09 - 2020.06, Undergraduate, Zhejiang University, Hangzhou.

💬 Invited Talks

2022.12, Music Generation with Domain Knowledge, Department of CS @ NUS.
2021.08, Simultaneous Speech Translation Panel, IWSLT Workshop @ ACL 2021.
2021.01, Speech Translation for Unwritten Languages, Live Share @ MSRA.

💻 Internships

2023.02 - 2023.05, Tencent AI Lab, Shenzhen, Guangdong, China.
2022.03 - 2022.12, SEA AI Lab, Singapore.
2021.06 - 2021.11, ByteDance AI Lab, Speech & Audio Team, Shanghai, China.
2019.07 - 2020.06, Microsoft Research Asia, Machine Learning Group, Beijing, China.

✏️ Service

Area Chair: ACL, EMNLP, NAACL.
Reviewer (Conference): ICLR, NeurIPS, CVPR, ICCV, MM, AAAI, etc.
Reviewer (Journal): Neural Networks, TASLP (Transactions on Audio, Speech and Language Processing), TMM (IEEE Transactions on Multimedia), etc.

Chen Zhang (章晨)