We developed an experimental pipeline that transforms AI-generated music into unique 2D platformer levels in Unity. Using GPT-2, TensorFlow, CUDA, and PyTorch, we converted text prompts into AI-generated songs, then into MIDI files, which Unity used to generate custom levels. Working as a team of four, we divided tasks based on our strengths and addressed challenges early to keep the project efficient and on schedule.
We used the AudioLDM 2 Python library, a research-oriented text-to-audio model, to generate music from text prompts; it builds on GPT-2 and Meta's AudioMAE, which learns self-supervised representations from audio spectrograms. By default, AudioLDM 2 ran on the CPU and took around 26 minutes to produce a 10-second song; running it under CUDA on a GPU reduced this to 9 minutes. Switching to AudioLDM 2's diffusers-based pipeline, which compresses generation to 200 inference steps, cut production time to just 26 seconds per .wav file.
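The generation step above can be sketched with the `diffusers` implementation of AudioLDM 2. This is a minimal sketch, not our exact script: the `cvssp/audioldm2` checkpoint name, the helper name `generate_song`, and the output path are assumptions for illustration, while `AudioLDM2Pipeline`, `num_inference_steps`, and the 16 kHz output rate come from the diffusers API.

```python
def generate_song(prompt, out_path="song.wav", steps=200, seconds=10.0):
    """Generate a .wav file from a text prompt with AudioLDM 2 on a CUDA GPU."""
    # Heavy dependencies are imported lazily so the module loads without them.
    import torch
    import scipy.io.wavfile
    from diffusers import AudioLDM2Pipeline

    # Moving the pipeline to the GPU is what cut our CPU runtime dramatically.
    pipe = AudioLDM2Pipeline.from_pretrained(
        "cvssp/audioldm2", torch_dtype=torch.float16
    ).to("cuda")

    # 200 inference steps is the setting that brought generation to ~26 s for us.
    audio = pipe(prompt,
                 num_inference_steps=steps,
                 audio_length_in_s=seconds).audios[0]

    # AudioLDM 2 produces 16 kHz audio; write it out as a .wav for transcription.
    scipy.io.wavfile.write(out_path, rate=16000, data=audio)
    return out_path
```

A call like `generate_song("upbeat chiptune melody")` would then yield the .wav file that feeds the transcription stage.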
We used Omnizart, a Python library for automatic music transcription, to transcribe the generated audio into piano notes as a MIDI file. To turn MIDI files into data usable for level generation, we integrated the pretty_midi library, which let us extract each note's pitch, start time, end time, and the song's BPM, and save this information to a text file that Unity reads to generate unique levels. Using pretty_midi's estimate_tempo() method, we synchronized the game tempo with the music's rhythm, enabling dynamic, rhythm-responsive gameplay.
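The MIDI-to-text extraction step can be sketched as follows. This is an illustrative sketch, not our exact exporter: the helper name `export_notes` and the one-note-per-line text format are assumptions, while `PrettyMIDI`, the `pitch`/`start`/`end` note fields, and `estimate_tempo()` are pretty_midi's actual API.

```python
def export_notes(midi_path, out_path="notes.txt"):
    """Dump BPM and per-note pitch/start/end rows to a text file for Unity."""
    # Imported lazily so the module loads even where pretty_midi is absent.
    import pretty_midi

    pm = pretty_midi.PrettyMIDI(midi_path)
    bpm = pm.estimate_tempo()  # onset-based tempo estimate used to pace the game

    with open(out_path, "w") as f:
        f.write(f"BPM {bpm:.2f}\n")
        for instrument in pm.instruments:
            for note in instrument.notes:
                # One note per line: MIDI pitch, start time (s), end time (s).
                f.write(f"{note.pitch} {note.start:.3f} {note.end:.3f}\n")
    return out_path
```

On the Unity side, a reader would parse the BPM header and then each `pitch start end` row to place platforms in time with the music.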