How AI Will Redefine The Way We Think About Ownership

Original article can be found here (source): Artificial Intelligence on Medium

A common adage among machine learning engineers is “data is gold”. A model is only as good as the corpus it is trained on, and this dependence is one reason creators of training data could be argued to have a superior claim to ownership over the results of an AI model. The conversion from data to output is most transparent in text generation with Markov chains and Long Short-Term Memory (LSTM) networks. Both approaches use large inputs of text to learn word frequencies and dependencies between words, and then create new blocks of text. For example, in March 2017 high schooler Robbie Barrat used 6,000 Kanye West lyrics to train a model to output lyrics reminiscent of those by the famous rapper⁷. The AI was able to grasp which words the artist tended to use as well as the typical length and rhythm of West’s raps. Barrat used an open-source machine learning model to generate the first iteration of his AI-generated Kanye West lyrics; from a labor perspective, Barrat simply plugged a formatted version of West’s lyrics into a pre-existing model in order to get seemingly unique results. Given this, it seems hard to deny that it is in fact Kanye West who did the heavy lifting in enabling a machine learning model to rap like him. Other examples of similar AI-generated content include a Generative Adversarial Network (GAN) trained on Rembrandt art by Microsoft and ING to generate “the next Rembrandt”⁸, as well as a model trained on the Beatles’ music by Sony CSL Research Laboratory to generate an original song in the style of the iconic band⁹.
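To make the dependence on training data concrete, here is a minimal sketch of a word-level Markov chain text generator in Python. It is not Barrat's model (he used a neural network), and the toy corpus, function names, and parameters are all hypothetical, chosen only for illustration. The point it shows is that every word the generator emits, and every transition between words, comes directly from the training text.

```python
import random
from collections import defaultdict


def build_chain(text, order=1):
    """Map each run of `order` words to the list of words seen right after it."""
    words = text.split()
    chain = defaultdict(list)
    for i in range(len(words) - order):
        state = tuple(words[i:i + order])
        # Repeated followers stay in the list, so frequent transitions
        # are proportionally more likely to be sampled later.
        chain[state].append(words[i + order])
    return dict(chain)


def generate(chain, length=10, seed=None):
    """Pick a start state, then repeatedly sample a follower of the current state."""
    rng = random.Random(seed)
    state = rng.choice(sorted(chain))
    output = list(state)
    for _ in range(length - len(state)):
        followers = chain.get(state)
        if not followers:  # dead end: this state was never followed by anything
            break
        output.append(rng.choice(followers))
        state = tuple(output[-len(state):])
    return " ".join(output)


# Hypothetical stand-in for a lyrics corpus.
corpus = "the beat goes on and the beat goes up and the crowd goes wild"
chain = build_chain(corpus)
print(generate(chain, length=8, seed=0))
```

The generator cannot produce a word pair that never occurred in the corpus, which is the sense in which the author of the training text shapes the output.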

AI generated, Beatles inspired

While there are no major legal cases discussing the relationship between copyrighted training data and AI-generated lyrics or art, Authors Guild v. Google¹⁰ provides a reasonable precedent for evaluating this dynamic. In this case, the Authors Guild claimed Google was committing a massive copyright breach by scanning book collections for its Google Books Library Project, which aimed to make the content of millions of books searchable. As a result of this project, users on the Google platform could search for text and be presented with several pages’ worth of unobfuscated text from books that contained the queried keyword or phrase. Authors were understandably outraged at the prospect of their copyrighted works being accessed, even partially, for free, and the dispute ran for a decade (2005–2015) before it was ultimately decided in favor of Google, with the court treating Google’s use of the data as fair use. Specifically, the court saw Google’s use of copyrighted books to train its algorithm as ultimately “[communicating] something new and different from the original or expands its utility, thus serving copyright’s overall objective of contributing to public knowledge”. Effectively, this case sets a precedent wherein the conversion of training data into an algorithm is sufficiently transformative that starting from copyrighted material is permissible. Lawyer Benjamin Sobel refers to this as “AI’s fair use dilemma”¹¹. The dilemma lies in the murkiness of how to structure fair use in the way that is least disruptive to the wider economy. On one hand, leaving machine learning algorithms exposed, such that anyone with protected information in the training data set could sue, would place a massive strain on the progress of artificial intelligence. Given that intellectual property law tries to support science and progress, such an environment would be disappointing.
On the other hand, Sobel notes “a hyper-literate AI would be more likely to displace humans in creative jobs, and that could exacerbate the income inequalities that many people fear in the AI age”¹², so AI’s unfettered access to copyrighted training data could also harm innovation. At the moment, it appears the legal system considers the first option the lesser of the two evils, and given the massive benefits machine learning can have on society, it is indeed necessary to eliminate creators of data as potential owners of AI-generated inventions.

Granting ownership of AI-generated work to the AI model itself is an option getting more serious coverage thanks to the work of engineer Stephen Thaler and a team of scientists out of the University of Surrey. As of early 2019 the team has filed several patents in the name of their computational system, Dabus AI¹³, which they claim has generated two patent-worthy inventions: a fractal-based, easier-to-grasp food container and a lamp that flickers in a pattern that mirrors brain activity. These patents have been rejected by patent offices in the UK, Europe, and the U.S., but Thaler and his team are continuing to push for recognition and protection of the AI-generated creations. While Dabus AI, and by extension other machine learning models, are perhaps the primary creators of “their” inventions, granting intellectual property rights to an AI would undermine the greater goals of the practice. Minor incongruities include the fact that ownership rights such as copyright extend seventy years after the death of the author, a term that has no obvious meaning for a machine. More fundamentally, however, a major purpose of granting intellectual property rights to an individual is to provide an economic incentive for innovation; an AI does not need such an incentive to continue producing useful output. As Pamela Samuelson noted at a symposium on the future of software protection, “All it takes is electricity (or some other motive force) to get the machines into production. The whole purpose of the intellectual property system is to grant rights to creators to induce them to innovate. The system has assumed that if such incentives are not necessary, rights should not be granted”¹⁴.

Currently, the most popular argument regarding ownership of artificial intelligence output is to grant intellectual property rights to the programmer who creates the AI system. Giving authorship to the programmer is already practiced in a few countries such as Hong Kong (SAR), India, Ireland, New Zealand and the UK. This approach is best encapsulated in UK copyright law, which explicitly states, “In the case of a literary, dramatic, musical or artistic work which is computer-generated, the author shall be taken to be the person by whom the arrangements necessary for the creation of the work are undertaken”¹⁵. The United States has no analogous law, which is why attribution within the US is significantly more ambiguous. The prevalence of this model is likely due to the neat way it aligns with intellectual property theory; it induces programmers to continue making progress in AI by giving them a claim to the works generated by their models. Notably, giving ownership to the programmer does not dispute the notion that it was the AI that did the bulk of the creative labor; it is crucial here to distinguish between an “inventor” and an “owner”. Artificial intelligence, through its advanced capability to produce output unimagined by the programmer, has the technical capacity to be seen as a creative force. However, to stay true to the goals of the legal system, it is necessary to assign ownership to the programmer. While the United States currently lacks a sufficient legal model to practice this, a couple of frameworks provide promising templates.

Firstly, AI can be seen as being in a work-for-hire relationship with its programmer¹⁶. When an employee acts within the scope of their employment, the hiring party is granted ownership of the copyright, so work-for-hire would facilitate a transfer of rights from AI to programmer. However, codifying the AI-programmer relationship as purely work-for-hire would lead to many violations of labor rights (an AI is unpaid, works lengthy hours, etc.), so amendments would need to be made. Another option would be to view AI as a dependent legal person, in the way an animal or an unborn child is¹⁷. Dependent legal persons can only exercise their legal rights through the agency of another legal person, so under this status a programmer would be able to exercise intellectual property rights on behalf of the AI. A third option is to view invention as an exercise in discovery rather than pure creation. Many inventions are created accidentally; however, ownership is granted to the first individual to discover the value in the results of the accident. Under this doctrine, despite the AI generating the invention, the AI’s current inability to recognize value (something only a sentient being could do) would make the programmer the discoverer and legal inventor. Ultimately, whatever legal framework is put in place to justify it, this solution makes sense because it is already being carried out behind the backs of United States intellectual property practitioners.

Surprise! The Patent Office has already granted a patent for a computer-generated invention. On January 25, 2005, John Koza was granted a patent for an invention generated by his “Invention Machine”, a genetic programming based AI¹⁸. As detailed by Ryan Abbott after an interview with Koza, “[the] Invention Machine generated the content of the patent without human intervention and in a single pass. It did so without a database of expert knowledge and without any knowledge about existing controllers. It simply required information about basic components (such as resistors and diodes) and specifications for a desired result (performance measures such as voltage and frequency). With this information, the Invention Machine proceeded to generate different outputs that were measured for fitness (whether an output met performance measures)”¹⁹. Koza was advised by his legal counsel not to disclose the role of the Invention Machine in the invention, and so he filed it under his own name. Rather than letting such filing indiscretions continue, it is better to create a framework in which AI inventors are legitimized and programmers are given ownership over the inventions created.
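The generate-and-measure loop Abbott describes can be illustrated with a toy example. The sketch below is not Koza's genetic programming system, which evolves whole circuit structures. It is a much simpler evolutionary search over two resistor values in a voltage divider, and the parts bin, supply voltage, target voltage, and all function names are assumptions made up for illustration. It shows the same division of labor: the human supplies components and a performance measure, and the program proposes and scores candidates.

```python
import random

# Hypothetical inputs: a bin of available parts and a performance spec.
RESISTORS = [100, 220, 470, 1000, 2200, 4700, 10000]  # ohms
V_IN = 9.0      # supply voltage (assumed)
V_TARGET = 3.0  # desired output voltage (assumed)


def output_voltage(r1, r2):
    """Voltage divider: V_out = V_in * r2 / (r1 + r2)."""
    return V_IN * r2 / (r1 + r2)


def fitness(candidate):
    """Higher is better: the negative distance from the target voltage."""
    r1, r2 = candidate
    return -abs(output_voltage(r1, r2) - V_TARGET)


def mutate(candidate, rng):
    """Replace one randomly chosen resistor with a random part from the bin."""
    parts = list(candidate)
    parts[rng.randrange(2)] = rng.choice(RESISTORS)
    return tuple(parts)


def evolve(generations=50, population_size=20, seed=0):
    """Keep the fitter half each generation and refill with mutated copies."""
    rng = random.Random(seed)
    population = [(rng.choice(RESISTORS), rng.choice(RESISTORS))
                  for _ in range(population_size)]
    history = []  # best fitness seen at the start of each generation
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        history.append(fitness(population[0]))
        survivors = population[:population_size // 2]
        children = [mutate(rng.choice(survivors), rng) for _ in survivors]
        population = survivors + children
    return max(population, key=fitness), history


best, history = evolve()
print(best, round(output_voltage(*best), 3))
```

Nothing in the loop encodes knowledge of existing designs; the candidate that comes out is simply the one that scored best against the stated measure, which is why the question of who "invented" it is contested.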

Lastly, there is a solution that everyone seems to be deferring to despite the fact that it pleases no one: making all AI-generated works public domain. If it is agreed that an inventor must be human and that a programmer does not do sufficient inventive work to qualify as an inventor, then the work by default becomes public domain (much as was the case with Naruto’s selfie). Even on projects like OpenAI’s MuseNet, which uses machine learning to generate on-demand music from a wide variety of genres, the work is released to the public with footnotes saying “We do not own the music output”²⁰.

OpenAI avoids claiming output from its ML models

While this seems to be the option with the greatest foothold in existing legal frameworks, a world where all AI-generated works are public domain is one where AI progress becomes slow, if not stagnant altogether. The lack of intellectual property protection will deter companies from investing in AI development; after all, how can they be expected to invest the significant resources involved in AI research without any sort of financial upside? Although easy, leaving AI-generated content to the public domain is not right.