Generate Nick Cave’s Lyrics in 2 mins

Original article was published on Artificial Intelligence on Medium


In this article we will use the GPT-2 language model to build a simple song lyrics generator, taking Nick Cave, one of my favourite artists, as the example.

https://www.nickcave.com/wp-content/uploads/2019/09/GHOSTEEN_PACKSHOT_01-2.jpg

GPT-2

GPT-2 is a large transformer-based language model with 1.5 billion parameters. It is trained with a simple objective: predict the next word, given all of the previous words within some text. The diversity of the dataset, which is made up of 8 million web pages, causes this simple goal to contain naturally occurring demonstrations of many tasks across diverse domains. For details, please see the OpenAI publication. Let’s focus on practice!

Parse lyrics

First of all, we need to parse the lyrics from Nick Cave’s official website using the well-known BeautifulSoup library.

Parse Nick Cave’s songs

After running the scraper we have 201 songs from 20 albums saved in the songs.txt file.
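The parsing step can be sketched roughly as follows. This is only an illustration, not the scraper used for the article: the markup of the lyrics pages is not shown here, so the `div.lyrics` selector and the helper names `extract_lyrics` and `join_songs` are assumptions. The sketch expects HTML that has already been downloaded and leaves the fetching of pages out.

```python
from bs4 import BeautifulSoup


def extract_lyrics(html, selector="div.lyrics"):
    """Return the text of every element matching `selector`.

    The default selector is a hypothetical placeholder; inspect the
    real page and adjust it to the element that wraps the lyrics.
    """
    soup = BeautifulSoup(html, "html.parser")
    songs = []
    for block in soup.select(selector):
        # Keep line breaks between verses instead of gluing lines together
        text = block.get_text(separator="\n").strip()
        if text:  # skip empty containers
            songs.append(text)
    return songs


def join_songs(songs):
    """Join songs into one training corpus, separated by blank lines."""
    return "\n\n".join(songs) + "\n"
```

Writing the result of `join_songs(...)` to songs.txt then gives a single text file that the later steps can read.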

Train model

Build tokenizer

Use the aitextgen library to train a custom tokenizer on the downloaded songs. This saves two files, aitextgen-vocab.json and aitextgen-merges.txt, which are needed to rebuild the tokenizer.
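A minimal sketch of this step, assuming an older aitextgen release in which `train_tokenizer` writes the two files named above into the working directory (newer releases save a single tokenizer file instead). The wrapper function `build_tokenizer` is a hypothetical helper, not part of the library.

```python
CORPUS_FILE = "songs.txt"
VOCAB_FILE = "aitextgen-vocab.json"
MERGES_FILE = "aitextgen-merges.txt"


def build_tokenizer(corpus_path=CORPUS_FILE):
    """Train a tokenizer on the lyrics corpus.

    Assumes an aitextgen version that saves VOCAB_FILE and MERGES_FILE
    next to the script; returns their names for the following steps.
    """
    # Imported lazily so the module loads even where aitextgen is missing
    from aitextgen.tokenizers import train_tokenizer

    train_tokenizer(corpus_path)
    return VOCAB_FILE, MERGES_FILE
```

Calling `build_tokenizer()` once is enough; the saved files can be reused for every later training or generation run.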

Train GPT-2 transformer model

Use the trained tokenizer to initialize aitextgen

Create a TokenDataset, which builds the training dataset and splits it into blocks of the appropriate size.

Time for training!
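The three steps above (initialize aitextgen, build the dataset, train) could look roughly like this. It is a sketch under assumptions: it follows the older aitextgen interface that takes `vocab_file` and `merges_file`, it uses the small `GPT2ConfigCPU` configuration shipped with the library, and the values for `block_size`, `batch_size` and `num_steps` are illustrative rather than the ones used for the article. `train_lyrics_model` is a hypothetical helper name.

```python
def train_lyrics_model(
    corpus_path="songs.txt",
    vocab_file="aitextgen-vocab.json",
    merges_file="aitextgen-merges.txt",
    block_size=64,    # illustrative: tokens per training example
    batch_size=16,    # illustrative
    num_steps=5000,   # illustrative
):
    """Train a small GPT-2 model from scratch on the lyrics corpus."""
    # Imported lazily so the module loads even where aitextgen is missing
    from aitextgen import aitextgen
    from aitextgen.TokenDataset import TokenDataset
    from aitextgen.utils import GPT2ConfigCPU

    # Small GPT-2 configuration intended to be trainable on a CPU
    config = GPT2ConfigCPU()

    # Model initialized with the custom tokenizer files
    ai = aitextgen(vocab_file=vocab_file, merges_file=merges_file, config=config)

    # Dataset encoded with the same tokenizer, cut into fixed-size blocks
    data = TokenDataset(
        corpus_path,
        vocab_file=vocab_file,
        merges_file=merges_file,
        block_size=block_size,
    )

    ai.train(data, batch_size=batch_size, num_steps=num_steps)
    return ai
```

The returned object can be passed straight to the generation step; fewer steps train faster but usually give rougher text.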

Generate song

Generate 3 paragraphs starting with ‘We drowned’ and save them to a file. You can modify the max_length and temperature parameters. Temperature is a hyperparameter that controls randomness. When its value is small (e.g. 0.2), the GPT-2 model is more confident but also more conservative. When the temperature is large (e.g. 1), the model produces more diverse text but also more mistakes. In my opinion, the second option is much better when you want to generate a piece of art ;).
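To make the effect of temperature concrete, here is a small sketch. `apply_temperature` is a standalone illustration of the usual approach (dividing the model's scores by the temperature before the softmax), not code taken from aitextgen. `generate_lyrics` is a hypothetical wrapper that assumes `ai` is a trained aitextgen object exposing `generate_to_file`; the default `max_length` and output file name are made up for the example.

```python
import math


def apply_temperature(logits, temperature):
    """Turn raw scores into probabilities after temperature scaling."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def generate_lyrics(ai, prompt="We drowned", n=3, max_length=256,
                    temperature=1.0, destination_path="generated_lyrics.txt"):
    """Write `n` generated samples starting with `prompt` to a file."""
    ai.generate_to_file(
        n=n,
        prompt=prompt,
        max_length=max_length,
        temperature=temperature,
        destination_path=destination_path,
    )
    return destination_path


# Toy scores for three candidate next words
scores = [2.0, 1.0, 0.1]
conservative = apply_temperature(scores, 0.2)  # mass piles onto the top word
diverse = apply_temperature(scores, 1.0)       # mass is spread more evenly
```

With the low temperature nearly all probability goes to the highest-scoring word, which is why the output becomes safer and more repetitive; with temperature 1 the less likely words keep a real chance of being picked.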

Below you can see sample results. They look quite promising, especially considering how little coding was required :).

Thanks for reading!