Authors
Yang Zhou, Zhan Xu, Chris Landreth, Evangelos Kalogerakis, Subhransu Maji, Karan Singh
Publication date
2018/7/30
Journal
ACM Transactions on Graphics (TOG)
Volume
37
Issue
4
Pages
1-10
Publisher
ACM
Description
We present a novel deep-learning-based approach to producing animator-centric speech motion curves that drive a JALI or standard FACS-based production face rig, directly from input audio. Our three-stage Long Short-Term Memory (LSTM) network architecture is motivated by psycholinguistic insights: segmenting speech audio into a stream of phonetic groups is sufficient for viseme construction; speech styles like mumbling or shouting are strongly correlated with the motion of facial landmarks; and animator style is encoded in viseme motion curve profiles. Our contribution is a solution for automatic, real-time lip synchronization from audio that integrates seamlessly into existing animation pipelines. We evaluate our results by: cross-validation against ground-truth data; animator critique and edits; visual comparison to recent deep-learning lip-synchronization solutions; and showing our approach to be resilient to diversity in …
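The abstract describes a three-stage LSTM pipeline: audio features are first mapped to phonetic groups, which then drive facial-landmark motion and viseme motion curves. The paper's actual network is not reproduced here; as an illustrative sketch only, the snippet below implements a minimal LSTM cell in pure Python and runs it over a sequence of audio-feature frames, the way one stage of such a pipeline would. All sizes (13-dimensional input features, 8 hidden units) and names (`LSTMCell`, `run_stage`) are hypothetical choices for the example, not taken from the paper.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

class LSTMCell:
    """Minimal single-layer LSTM cell (pure Python, illustrative only)."""

    def __init__(self, input_size, hidden_size, seed=0):
        rng = random.Random(seed)
        n = input_size + hidden_size
        # One weight matrix and bias per gate: input (i), forget (f),
        # candidate cell (c), output (o).
        self.W = {g: [[rng.uniform(-0.1, 0.1) for _ in range(n)]
                      for _ in range(hidden_size)] for g in "ifco"}
        self.b = {g: [0.0] * hidden_size for g in "ifco"}
        self.hidden_size = hidden_size

    def step(self, x, h, c):
        z = x + h  # concatenate [input frame, previous hidden state]

        def lin(g, j):
            return sum(w * v for w, v in zip(self.W[g][j], z)) + self.b[g][j]

        i = [sigmoid(lin("i", j)) for j in range(self.hidden_size)]
        f = [sigmoid(lin("f", j)) for j in range(self.hidden_size)]
        g = [math.tanh(lin("c", j)) for j in range(self.hidden_size)]
        o = [sigmoid(lin("o", j)) for j in range(self.hidden_size)]
        c_new = [f[j] * c[j] + i[j] * g[j] for j in range(self.hidden_size)]
        h_new = [o[j] * math.tanh(c_new[j]) for j in range(self.hidden_size)]
        return h_new, c_new

def run_stage(cell, frames):
    """Run one pipeline stage over a sequence of per-frame audio features,
    returning the hidden state at every time step (one output per frame)."""
    h = [0.0] * cell.hidden_size
    c = [0.0] * cell.hidden_size
    outputs = []
    for x in frames:
        h, c = cell.step(x, h, c)
        outputs.append(h)
    return outputs

# Hypothetical sizes: 13-dim per-frame audio features, 8 hidden units.
cell = LSTMCell(input_size=13, hidden_size=8)
frames = [[0.1] * 13 for _ in range(5)]
out = run_stage(cell, frames)
```

In a multi-stage arrangement like the one the abstract sketches, each stage's per-frame outputs (here, `out`) would feed the next stage's `frames`, so phonetic-group predictions can condition the landmark and viseme-curve stages.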
Total citations
2018–2024 (per-year counts from the citation histogram; individual values garbled in extraction)
Scholar articles
Y Zhou, Z Xu, C Landreth, E Kalogerakis, S Maji… - ACM Transactions on Graphics (TOG), 2018