Supporting webpage for ICASSP 2025.
[Paper on ArXiv]
The examples for the music editing task are all taken from the Song Describer dataset [1]. Our model uses a text prompt and a music prompt as the conditions for music editing. The text prompt comes from the dataset, while the music prompt is the top-4 constant-Q transform (CQT) representation extracted from the target audio. The table below shows the music prompt, displaying the top-4 CQT representation of the left channel from 0 to 6 seconds. For the baseline model MusicGen [2], the same text prompt and a chroma-based melody representation are used as the conditional inputs.
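As a rough illustration of how such a music prompt can be built, the sketch below keeps, for each CQT frame, the indices of the four strongest frequency bins. This is a minimal sketch under stated assumptions: the CQT magnitudes are taken as already computed (e.g. with `librosa.cqt`), and the magnitude-based selection rule is illustrative rather than the paper's exact implementation.

```python
import numpy as np

def top4_bins(cqt_mag: np.ndarray) -> np.ndarray:
    """Per-frame top-4 CQT bin indices (an illustrative sketch, not the
    paper's implementation).

    cqt_mag: (n_bins, n_frames) magnitude matrix, e.g. from
             np.abs(librosa.cqt(left_channel, sr=sr)) on the first
             6 seconds of the target audio.
    Returns: (4, n_frames) array of bin indices, strongest bin last.
    """
    # argsort sorts each column ascending by magnitude; the last four
    # rows are therefore the indices of the four strongest bins.
    return np.argsort(cqt_mag, axis=0)[-4:, :]
```

For example, on a toy 6-bin, 2-frame magnitude matrix, each output column lists the four dominant bins of the corresponding frame.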
Scroll horizontally to see all the results if necessary.
text prompt | music prompt | Target | MusicGen-melody | MusicGen-melody-large | Ours |
---|---|---|---|---|---|
A twisty nice melody song by a slide electric guitar on top of acoustic chords later accompanied with a ukelele. | |||||
8-bit melody brings one back to the arcade saloons while keeping the desire to dance. | |||||
Instrumental piano piece with a slightly classical touch and a nostalgic, bittersweet or blue mood. | |||||
Positive instrumental pop song with a strong rhythm and brass section. | |||||
A blues piano track that would be very well suited in a 90s sitcom. The piano occupies the whole track that has a prominent bass line as well, with a general jolly and happy feeling throughout the song. | |||||
An upbeat pop instrumental track starting with synthesized piano sound, later with guitar added in, and then a saxophone-like melody line. | |||||
Pop song with a classical chord progression in which all instruments join progressively, building up a richer and richer music. | |||||
An instrumental world fusion track with prominent reggae elements. | |||||
The examples for the text-to-music task also come from the Song Describer dataset [1]. For both our model and the baseline model MusicGen [2], only the text prompt from the dataset is used as the control condition for music generation; in this case, the music prompt for our model is left empty.
Scroll horizontally to see all the results if necessary.
text prompt | MusicGen-melody | MusicGen-melody-large | Ours |
---|---|---|---|
An energetic rock and roll song, accompanied by a nervous electric guitar. | |||
A deep house track with a very clear build up, very well balanced and smooth kick-snare timbre. The glockenspiel samples seem to be the best option to aid for the smoothness of such a track, which helps 2 minutes to pass like it was nothing. A very clear and effective contrastive counterpoint structure between the bass and treble registers of keyboards and then the bass drum/snare structure is what makes this song a very good representative of house music. | |||
A string ensemble starts of the track with legato melancholic playing. After two bars, a more instruments from the ensemble come in. Alti and violins seem to be playing melody while celli, alti and basses underpin the moving melody lines with harmonies and chords. The track feels ominous and melanchonic. Halfway through, alti switch to pizzicato, and then fade out to let the celli and basses come through with somber melodies, leaving the chords to the violins. | |||
medium tempo ambient sounds to begin with and slow guitar plucking layering followed by an ambient rhythmic beat and then remove the layering in the opposite direction. | |||
An instrumental surf rock track with a twist. Open charleston beat with strummed guitar and a mellow synth lead. The song is a happy cyberpunk soundtrack. | |||
Starts like an experimental hip hop beat, transitions into an epic happy and relaxing vibe with the melody and guitar. It is an instrumental track with mostly acoustic instruments. | |||
[1] I. Manco, B. Weck, S. Doh, M. Won, Y. Zhang, D. Bogdanov, Y. Wu, K. Chen, P. Tovstogan, E. Benetos, E. Quinton, G. Fazekas, and J. Nam, “The Song Describer dataset: A corpus of audio captions for music-and-language evaluation,” in Proc. NeurIPS, New Orleans, 2023.
[2] J. Copet, F. Kreuk, I. Gat, T. Remez, D. Kant, G. Synnaeve, Y. Adi, and A. Défossez, “Simple and controllable music generation,” in Proc. NeurIPS, New Orleans, 2023.