Viseme

In short: A viseme is the visual representation of a phoneme, describing the specific mouth shape a speaker forms when producing a particular speech sound.

About Viseme

Visemes are the visual building blocks of lip sync technology. While phonemes represent individual sounds in spoken language, visemes represent the corresponding mouth shapes that produce those sounds.

In AI lip sync pipelines, the system maps detected phonemes from an audio track to their corresponding visemes, then renders those mouth shapes onto the target face frame by frame. Multiple phonemes can map to the same viseme since some distinct sounds produce visually similar mouth positions.

How Viseme Connects to Lip Sync

Viseme relates to several other concepts in the AI lip sync pipeline: Phoneme , and Face Landmark Detection .

Explore More

Related Terms

Try AI Lip Sync

Experience studio-quality lip synchronization for videos in any language.