
LyricVis

When every lyric becomes a picture. 


Type: Open Source Software 

Role: Solo developer


Project overview: LyricVis is a personal exploration of generative AI and creative media. It uses song lyrics as the raw material for AI-generated music videos, with one image per lyric phrase, synchronized to the music. The question that started it: could the words of a song, fed into an image model, produce something that deepens how you experience it?


Example outputs: https://www.youtube.com/@lyricvisual

Mission Statement

I wanted to see what I could create visually using song lyrics as the basis for a music video. Not as a replacement for traditional production, but as an experiment: could AI-generated imagery, driven by the language of a song, actually heighten the experience of listening to it? LyricVis is my attempt to answer that question, a tool I built by hand to find out.

Solution

The process is straightforward by design:

• Input: A timestamped lyrics file and the corresponding audio track
• Processing: For each lyric phrase, LyricVis generates a unique AI image
• Output: Images are sequenced and synchronized with the audio to produce a cohesive video
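
A minimal sketch of the input step, assuming the timestamped lyrics arrive in an LRC-style format ("[mm:ss.xx] lyric text"). The page does not say which format LyricVis reads, so the format, the Segment type, and the parse_lyrics function are illustrative names rather than the project's own. The sketch shows how each phrase can be given a start time and an on-screen duration that lasts until the next phrase begins.

```python
import re
from typing import NamedTuple

# Assumed LRC-style line: "[mm:ss.xx] lyric text"
LRC_LINE = re.compile(r"\[(\d+):(\d+(?:\.\d+)?)\]\s*(.*)")


class Segment(NamedTuple):
    start: float     # seconds from the start of the track
    duration: float  # seconds the image stays on screen
    text: str        # lyric phrase used as the image prompt


def parse_lyrics(lrc_text: str, track_length: float) -> list[Segment]:
    """Turn timestamped lyrics into segments that run to the end of the track."""
    cues = []
    for line in lrc_text.splitlines():
        match = LRC_LINE.match(line.strip())
        if match and match.group(3):  # skip metadata tags and empty cues
            minutes, seconds, text = match.groups()
            cues.append((int(minutes) * 60 + float(seconds), text.strip()))
    cues.sort()

    segments = []
    for i, (start, text) in enumerate(cues):
        # Each phrase lasts until the next one begins; the last runs to the end.
        end = cues[i + 1][0] if i + 1 < len(cues) else track_length
        segments.append(Segment(start, end - start, text))
    return segments
```

In this sketch the time before the first cue is left uncovered; an intro image or title card would need to be handled separately.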

The result is a music video generated entirely from the language of the song itself — no director, no crew, no budget. Just lyrics, a model, and the question of what they produce together.

Technical Details

LyricVis is a Python orchestration layer built on open-source tools:

• Image generation: Stable Diffusion (via AUTOMATIC1111) produces AI visuals for each lyric segment
• Video assembly: A Python application handles synchronization — calling the Stable Diffusion API, stitching images together, and compiling the final video with moviepy and ImageMagick
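
The two stages above can be sketched roughly as follows. This is not the code from the repository. It assumes a local AUTOMATIC1111 instance started with the --api flag and listening on the default port, and it assumes moviepy 1.x, where the clip classes are imported from moviepy.editor. The function names, image size, and step count are illustrative choices.

```python
import base64
import json
import urllib.request

# Assumed local AUTOMATIC1111 instance started with the --api flag.
SD_URL = "http://127.0.0.1:7860/sdapi/v1/txt2img"


def build_payload(lyric: str, steps: int = 25, width: int = 768, height: int = 432) -> dict:
    """Request body for txt2img; the size and step count are illustrative."""
    return {"prompt": lyric, "steps": steps, "width": width, "height": height}


def decode_first_image(response: dict) -> bytes:
    """The API returns images as base64 strings in the "images" list."""
    return base64.b64decode(response["images"][0])


def generate_image(lyric: str, out_path: str, url: str = SD_URL) -> str:
    """Request one image for a lyric phrase and save it to out_path."""
    request = urllib.request.Request(
        url,
        data=json.dumps(build_payload(lyric)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as reply:
        image_bytes = decode_first_image(json.load(reply))
    with open(out_path, "wb") as handle:
        handle.write(image_bytes)
    return out_path


def assemble_video(frames, audio_path: str, out_path: str, fps: int = 24) -> None:
    """frames: list of (image_path, duration_seconds) in playback order."""
    # Imported lazily; these names are from moviepy 1.x (moviepy.editor).
    from moviepy.editor import AudioFileClip, ImageClip, concatenate_videoclips

    clips = [ImageClip(path).set_duration(duration) for path, duration in frames]
    video = concatenate_videoclips(clips, method="compose")
    video = video.set_audio(AudioFileClip(audio_path))
    video.write_videofile(out_path, fps=fps)
```

Keeping the payload builder and the decoder separate from the network call means both can be checked without a running Stable Diffusion server.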

The modular architecture makes it straightforward to swap models, adjust timing, or experiment with different visual styles per song.
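
One way per-song variation could be expressed is a small settings object. This is a hypothetical illustration: SongStyle and its fields are not names from LyricVis, and "sd-v1-5" is a placeholder checkpoint name. The override_settings field with sd_model_checkpoint is how the AUTOMATIC1111 API accepts a checkpoint choice inside a txt2img request.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class SongStyle:
    """Hypothetical per-song settings; none of these names come from LyricVis."""
    checkpoint: str            # checkpoint name known to the SD server
    prompt_suffix: str = ""    # style keywords appended to every lyric
    negative_prompt: str = ""  # things the model should avoid
    offset: float = 0.0        # seconds to shift each image relative to the audio

    def payload(self, lyric: str) -> dict:
        """Build a txt2img request body for one lyric phrase."""
        prompt = f"{lyric}, {self.prompt_suffix}" if self.prompt_suffix else lyric
        return {
            "prompt": prompt,
            "negative_prompt": self.negative_prompt,
            # Asks the server to use this checkpoint for the request.
            "override_settings": {"sd_model_checkpoint": self.checkpoint},
        }

    def shift(self, start: float) -> float:
        """Apply the timing offset, never moving a cue before the track start."""
        return max(0.0, start + self.offset)


# Placeholder checkpoint name; a second style is derived by changing two fields.
watercolor = SongStyle("sd-v1-5", prompt_suffix="soft watercolor, muted palette")
noir = replace(watercolor, prompt_suffix="film noir, high contrast", offset=-0.25)
```

Because the dataclass is frozen, each style is an immutable value, and trying a new look for a song means deriving a variant rather than editing shared state.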

GitHub: https://github.com/crmills100/python/tree/main/lyricvis

Result

The experiment worked — well enough to keep going. Videos that would take a production team days can be generated in hours. More importantly, the AI imagery does something interesting: it interprets the language of each lyric independently, producing visual associations that a human director might never make. Sometimes that's surprising. Sometimes it's exactly right.

Whether it heightens the experience of the song depends on the song. That's part of what makes it worth exploring.
