Source: Deep Learning on Medium
Recently, OpenAI caused quite a stir in the AI community when they withheld the full version of their latest GPT-2 language model, releasing instead a stripped-down version (117 million parameters versus 1.5 billion for the original).
Debates aside about whether this was the right decision, I found it frustrating that I could not use that model and that it would remain within reach of only the few people who have the resources to train such a huge model (the cost is estimated to be around $50K).
The idea for a ‘kickstarter’ model for training AI models came out of this. I am not sure how viable it is, either technically or in terms of whether there would be demand for something like this, but I find it intriguing.
Here’s how I envision it would work. There would be a platform where anyone can start a KickTraining Campaign (similar to a Kickstarter campaign), with a nice page displaying all the relevant information about the project.
Say — I create a campaign to train the GPT-2 model. I would lay out the objective and model details (training the GPT-2 model) and stake 1 GPU (all I can afford…) and start training the model. Obviously that’s going to take a long time.
But now that I have launched my campaign, any user on KickTraining can follow the progress (a progress bar showing the model being trained) in real time. And anyone who is interested in helping out can volunteer their own GPU(s), or their cloud provider’s, to the campaign.
The incentive to stake your GPU (and moolah) could be that the model will be automatically distributed at predefined intervals (say every 10 epochs) to anyone who is part of the effort. Or…I could just have an open agenda where the model will be made public (in which case people can help because they are good samaritans and/or believe in the usefulness of the model to the community).
Say — my campaign takes off and I get a bunch of people who staked their GPUs…let’s call them KickTrainers. Now — the model is being trained and making progress. The page shows a real time count of GPUs being staked and the model progress. This adds gamification to the whole thing and more people start jumping on board to get the model trained as fast as possible.
Yes, there would be details to sort out about how to treat everyone equally. What if you stake your GPU at the last second? You shouldn’t get the same reward as an early KickTrainer. Maybe there is a minimum GPU compute requirement. Details to sort out.
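One simple way to handle the fairness question above is to split rewards in proportion to GPU-hours contributed, with a minimum-contribution cutoff so last-second stakers don’t free-ride. This is only a sketch; the names, cutoff, and numbers are all illustrative assumptions, not a real platform API:

```python
from dataclasses import dataclass

MIN_GPU_HOURS = 5.0  # assumed minimum compute to qualify for a share


@dataclass
class KickTrainer:
    name: str
    gpu_hours: float


def reward_shares(trainers):
    """Return each qualifying trainer's fractional share of the reward,
    proportional to the GPU-hours they contributed."""
    qualified = [t for t in trainers if t.gpu_hours >= MIN_GPU_HOURS]
    total = sum(t.gpu_hours for t in qualified)
    if total == 0:
        return {}
    return {t.name: t.gpu_hours / total for t in qualified}


shares = reward_shares([
    KickTrainer("early_bird", 100.0),
    KickTrainer("mid_joiner", 50.0),
    KickTrainer("last_second", 0.5),  # below the cutoff, gets no share
])
```

Here `early_bird` ends up with two-thirds of the pot and the last-second staker with nothing, which is roughly the intuition above; a real platform would also need to weight by GPU speed, not just wall-clock hours.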
Hypothetical Overly Optimistic Scenario: Woo hoo! Our campaign went viral and within 20 days we have reached our goal (which could be some epoch number or some other metric)! I end the campaign and the model is distributed one last time. Everyone walks away with the fully trained GPT-2, having paid only a fraction of the cost.
Potential Use cases
Obviously, this would be a great way for plebeians like me, who don’t have the advantage of working for a company with Google-like resources, to train the huge models we want access to.
This could also be a great way of sharing models between similar companies and helping to train them at the same time. So, say a medical company starts a campaign to train a classification model on some disease and other companies interested in that join hands.
Maybe after the first Campaign is done, some other company takes the initial Model, and puts up a new campaign to re-train it on their data now. The cycle starts again: everyone can chip in to train and ultimately benefit from the improved Model. I call it KickTransferLearning! ;)
Obviously, this is still very much at the idea stage. How do you set up a platform where it’s easy to share your GPUs? It would have to be easy and seamless for the users.
Can we enable on the fly adding and removing of GPUs (KickTrainers come and go) as the model is being trained without messing up everything?
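Handling KickTrainers coming and going mid-training is essentially elastic membership, which is what frameworks like PyTorch’s elastic training tackle with rendezvous protocols. As a purely illustrative toy (every name here is an assumption, and real systems are far more involved), a coordinator could reassign data shards round-robin over whoever is currently connected:

```python
class Coordinator:
    """Toy coordinator that reassigns data shards as workers join/leave."""

    def __init__(self, num_shards):
        self.num_shards = num_shards
        self.workers = []

    def join(self, worker_id):
        if worker_id not in self.workers:
            self.workers.append(worker_id)

    def leave(self, worker_id):
        if worker_id in self.workers:
            self.workers.remove(worker_id)

    def assignments(self):
        """Round-robin the shards over the currently connected workers."""
        if not self.workers:
            return {}
        plan = {w: [] for w in self.workers}
        for shard in range(self.num_shards):
            plan[self.workers[shard % len(self.workers)]].append(shard)
        return plan


coord = Coordinator(num_shards=8)
coord.join("gpu_a")
coord.join("gpu_b")
plan_two = coord.assignments()  # shards split across both workers
coord.leave("gpu_b")            # a KickTrainer drops out mid-epoch
plan_one = coord.assignments()  # the remaining worker covers everything
```

The hard parts this glosses over are exactly the ones the question raises: synchronizing optimizer state with stragglers, checkpointing before a worker vanishes, and re-running shards a departed worker never finished.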
How do we vouch for the initial model having the correct setup and data? And, on the flip side, can we guarantee the security of the data even as it gets trained by all these different GPUs?
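For the “correct setup” half of that question, one minimal trust primitive (an assumption on my part, not a full answer) would be for the campaign page to publish a cryptographic digest of the initial checkpoint, so every KickTrainer can verify their download before donating compute. Note this only detects tampering; it does nothing for data privacy during training:

```python
import hashlib


def sha256_of(data: bytes) -> str:
    """Hex digest of a blob, e.g. a serialized model checkpoint."""
    return hashlib.sha256(data).hexdigest()


# The campaign page would publish this digest alongside the model details.
published_digest = sha256_of(b"initial-checkpoint-bytes")


def verify_download(blob: bytes, expected: str) -> bool:
    """Reject any checkpoint whose digest doesn't match the published one."""
    return sha256_of(blob) == expected


ok = verify_download(b"initial-checkpoint-bytes", published_digest)
tampered = verify_download(b"tampered-checkpoint-bytes", published_digest)
```

The byte strings stand in for real checkpoint files; in practice you would hash the file contents and the dataset manifest together.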
I don’t know if this could work, but I think it would be a really useful platform for a lot of people, and it would encourage more collaboration and sharing of resources within the AI community.
Or…it could be a dud.
Would love to hear from people with more experience and expertise than me in this area. :)