About me
I am interested in audio-visual multi-modal processing with machine learning methods. I received my B.Sc. degree from the Department of Automation at Tsinghua University in 2018 and am now a master's student in the Department of Computer Science and Technology at Tsinghua University.
Education
2021-Present
Department of Computer Science and Technology
Tsinghua University
I am currently studying for my master's degree, supervised by Prof. Thomas Fang Zheng and
Prof. Dong Wang at the Center for Speech and Language Technologies (CSLT).
2014-2018
Department of Automation
Tsinghua University
Publications
[Published] CN-Celeb-AV: A Multi-Genre Audio-Visual Dataset for Person Recognition,
in Interspeech 2023 (the 24th INTERSPEECH Conference)
[Published] Random Cycle Loss and Its Application to Voice Conversion,
in TPAMI (IEEE Transactions on Pattern Analysis and Machine Intelligence)
[Published] CN-CVS: A Mandarin Audio-Visual Dataset for Large Vocabulary Continuous Visual to Speech Synthesis,
in ICASSP 2023 (2023 IEEE International Conference on Acoustics, Speech, and Signal Processing)
[Published] CycleFlow: Purify Information Factors by Cycle Loss,
in Odyssey 2022 (The Speaker and Language Recognition Workshop)
Projects
Video to Speech Synthesis
Python
Deep Learning
Audio-Visual
Lip Reading
This project aims to restore the corresponding speech signal from the visual
information in lip movements alone. We have collected a large-scale Mandarin audio-visual dataset as
the benchmark for this project.
Skills
Programming Languages: Python, C++, JavaScript
Markup Languages: LaTeX, HTML, Markdown
Deep Learning Frameworks: PyTorch, Lightning
Contact