A brand-new open-source deep-learning language model called MusicGen has just been released by the Audiocraft research team at Meta. MusicGen can smoothly generate new music from text prompts and can even be conditioned on an existing melody. As simple as that may sound from the outside, the thinking that went into the model is quite intriguing!
As a matter of fact, to back up their claims, the researchers evaluated MusicGen alongside MusicLM and two more models called Mousai and Riffusion. Best of all, MusicGen can be run locally (you'll need a GPU with at least 16GB of VRAM), and it is available in four model sizes, ranging from small (300 million parameters) up to large (3.3 billion parameters).
MusicGen works much like a ChatGPT for audio: you describe the sort of music you want, optionally drop in an existing track, and then just hit "Generate."
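If you'd rather skip the web demo and run it yourself, the open-source audiocraft library exposes the model directly. Here's a minimal sketch of text-to-music generation; the checkpoint name, the 8-second duration, and the prompt are illustrative choices you'd adjust to taste:

```python
# pip install audiocraft  (a GPU is strongly recommended, especially for the larger models)
from audiocraft.models import MusicGen
from audiocraft.data.audio import audio_write

# Load one of the pretrained checkpoints; the 300M "small" model is the lightest.
model = MusicGen.get_pretrained('facebook/musicgen-small')
model.set_generation_params(duration=8)  # length of the generated clip, in seconds

descriptions = ['an 80s driving pop song with heavy drums and synth pads in the background']
wav = model.generate(descriptions)  # tensor of shape [batch, channels, samples]

# Write each generated clip to disk with loudness normalization.
for idx, one_wav in enumerate(wav):
    audio_write(f'musicgen_sample_{idx}', one_wav.cpu(), model.sample_rate, strategy="loudness")
```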
And now, here’s the fun part!
During training, the Meta team fed the model a total of 20,000 hours of professionally licensed music. Can you imagine something like that? The training data consisted of 10,000 high-quality music tracks from an internal dataset, plus licensed material from Pond5 and Shutterstock, too.
Furthermore, they utilized Meta's 32kHz EnCodec audio tokenizer to break the audio into small streams of discrete tokens that the model can process in parallel. A demo hosted by Meta on Hugging Face lets you type in a description of your music and offers a few sample prompts, such as "an 80s driving pop song with heavy drums and synth pads in the background." Check out more details here!
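To get a feel for what that tokenization step looks like, here's a small sketch using the 32 kHz EnCodec checkpoint published on Hugging Face (facebook/encodec_32khz, the tokenizer paired with MusicGen). The silent test signal is just a placeholder for real audio:

```python
# pip install transformers torch
import numpy as np
from transformers import AutoProcessor, EncodecModel

processor = AutoProcessor.from_pretrained("facebook/encodec_32khz")
model = EncodecModel.from_pretrained("facebook/encodec_32khz")

# One second of silence as a stand-in for a real waveform.
raw_audio = np.zeros(processor.sampling_rate, dtype=np.float32)
inputs = processor(raw_audio=raw_audio, sampling_rate=processor.sampling_rate, return_tensors="pt")

# Encode the waveform into several parallel streams of discrete codes.
encoded = model.encode(inputs["input_values"], inputs["padding_mask"])
print(encoded.audio_codes.shape)  # (chunks, batch, codebooks, frames)
```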
So practically, if you want to try MusicGen with a reference tune, you upload a clip no longer than 30 seconds, choose a few settings, and generate. After a short wait you get back a high-quality sample that follows your prompt. Quite impressive, isn't it?!
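The same audiocraft API handles that melody conditioning, too. A rough sketch, where the path './my_melody.mp3' and the prompt are placeholders you'd swap for your own:

```python
import torchaudio
from audiocraft.models import MusicGen
from audiocraft.data.audio import audio_write

# The "melody" checkpoint accepts an extra audio input to condition on.
model = MusicGen.get_pretrained('facebook/musicgen-melody')
model.set_generation_params(duration=15)

# Load the reference tune you want the new music to follow (placeholder path).
melody, sr = torchaudio.load('./my_melody.mp3')

descriptions = ['a cheerful acoustic folk arrangement']
wav = model.generate_with_chroma(descriptions, melody[None], sr)

audio_write('melody_conditioned_0', wav[0].cpu(), model.sample_rate, strategy="loudness")
```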