Strategic National Issues in Machine Learning

Original article was published by Evan Crain on Artificial Intelligence on Medium


Executive Summary

Machine learning, often called “artificial intelligence” even though artificial intelligence is technically a subset of machine learning, has the ability to transform society and the economy as the steam engine, electricity, and the internet did. This report explains what machine learning is and is not in order to dispel popular misconceptions, describes the grand competition between China and the United States for machine learning supremacy, and concludes with national policy recommendations to improve the existing White House strategic plan.

The United States stands at a threshold: it can enable a new generation of economic strength by developing, and holding long-term supremacy in, the next evolution of technology. If we are not careful, we will become a fast follower, to the detriment of our relative role in the world. Economic strength is necessary to maintain the strength of the dollar. This technology is critical to our economy and to the future of the financial system, and therefore to our ability to maintain, grow, and service our national debt. We have a good national plan, but we need to do more.

This report was written for the Jackson Institute class GLBL 765: Contemporary Issues in American Diplomacy and National Security at Yale University, taught by Ambassador John Negroponte.

What Machine Learning Is and Is Not

Stuxnet and Hollywood (What Machine Learning Is Not)

People are most familiar with a subset of machine learning, artificial intelligence. When people hear the term “artificial intelligence,” Hollywood stories come to mind from the Terminator franchise, Eagle Eye, and other films. People think of computers becoming self-aware, conquering the Earth, and taking human form through advanced robotics. These machines fight against a small, weary, and courageous human contingent that escaped the machine apocalypse. The rebels are intent on restoring Earth to the machines’ creators, the same frail, imperfect humans who have a capacity machines could never understand: the ability to love. Artificial intelligence and its technical parent, machine learning, could not be more different from this narrative.

First, advanced robotics must be distinguished from computing. Robotics is an asset-intensive field that seeks to advance the physical control of machines in order to automate physical tasks; its concerns are machine speed, precision, accuracy, and cost effectiveness. Robots are, of course, controlled by computers, and machine learning heightens the effectiveness of robotics. Nonetheless, the outcomes and methods of the two fields are distinct; they overlap like the circles of a Venn diagram rather than coincide.

Second, the Hollywood stories highlight a fictional form of artificial intelligence called artificial general intelligence. Real-world artificial intelligence is called artificial specific intelligence. Humanity’s ability to train machines to “learn” is specific because humans constrain the problems a machine can solve by defining those problems as software algorithms and mathematical models, and then provide the machine with specific datasets to analyze. Machines cannot “generally” choose problems or pursue the ability to solve problems.

Third, this topic must be understood in the context of the statistics that make the field possible. Machine learning is the parent, artificial intelligence a method within it, and, importantly in this context, deep learning is a subset of artificial intelligence. What differentiates the three is the statistical method employed by the creator of the mathematical model. This author does not have the technical ability to distinguish between PhD-level statistical methods, and so, from here on in this memo, the term “machine learning” will be used exclusively to maintain the technical integrity of the paper, except when citing external resources that use the term “artificial intelligence” incorrectly as an umbrella for machine learning.

Finally, there is considerable fear of artificial intelligence, propagated by politicians, Hollywood, and general misinformation. 38% of Americans believe artificial intelligence will reduce the number of jobs, 49% believe it will reduce privacy, and 32% believe it is an outright threat to humans [13].

What Machine Learning Is

For the purposes of this paper (knowingly not satisfying technical readers, but providing enough context to explain the concept to laymen), machine learning is defined as the use of advanced statistical methods to predict outcomes, such that the model’s error decreases as additional data becomes available [1].
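The claim that a model's error shrinks as it sees more data can be illustrated with a toy experiment. The following is a minimal sketch, not a method from the sources cited here: it assumes a hypothetical linear relationship (slope 2, intercept 1, Gaussian noise), fits a line by ordinary least squares, and measures how far the fitted line is from the truth for training sets of increasing size. All names and parameter values are illustrative assumptions.

```python
import random

def fit_line(xs, ys):
    """Ordinary least squares for y = a*x + b; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx
    b = mean_y - a * mean_x
    return a, b

def average_test_error(n_train, trials=200, seed=0):
    """Mean squared prediction error on a fixed grid, averaged over many trials."""
    rng = random.Random(seed)
    true_a, true_b, noise = 2.0, 1.0, 1.0    # hypothetical "true" relationship
    test_xs = [i / 10 for i in range(101)]   # evaluation points from 0 to 10
    total = 0.0
    for _ in range(trials):
        # Draw a fresh noisy training set of the requested size.
        xs = [rng.uniform(0, 10) for _ in range(n_train)]
        ys = [true_a * x + true_b + rng.gauss(0, noise) for x in xs]
        a, b = fit_line(xs, ys)
        # Compare predictions with the noise-free truth, so the error
        # reflects only how well the model was learned.
        total += sum((a * x + b - (true_a * x + true_b)) ** 2
                     for x in test_xs) / len(test_xs)
    return total / trials

if __name__ == "__main__":
    for n in (5, 50, 500):
        print(n, round(average_test_error(n), 4))
```

Averaging over many trials matters here: a single small training set can get lucky, but on average the error falls steadily as the training set grows.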

Thus, machine learning is applicable to any situation in which it is helpful to predict an outcome. This purpose is the same as that of the pre-existing field of applied statistics, and many of the statistical models used in machine learning have existed for decades. What makes machine learning special is its ability to use massive quantities of data to “train” and improve the statistical models, which became possible only through the advances in computing power needed to process such vast amounts of data. So far there have been four evolutions of machine learning [2]: internet, business, perception, and autonomous.

Internet machine learning algorithms are used to recommend products and services, such as the Netflix “recommended for you” feature; the more shows a user watches, the better Netflix becomes at recommending shows that user will like.

Business machine learning goes further. Consider Pandora and Spotify. Pandora began as an initiative to catalog 450 characteristics of the entire worldwide library of music. It did this by hand, hiring 75 musicians to listen to songs and ascribe qualities to each one [3]. Pandora’s founders then hired a mathematician to write a model predicting user preferences based on the qualities of the songs to which users listened. Spotify, however, began on a premise of machine learning. Spotify’s algorithms might have access to a database characterizing music with human descriptors (loud, allegro, short, etc.). These are called “strong features”: data points highly correlated with the outcome being predicted. A person who tapped the “like” button for classical music is obviously likely to enjoy more classical music; this is an example of a strong feature. Machine learning goes beyond strong features and also considers “weak features.” Spotify’s algorithm is called BART (Bandits for Recommendations as Treatments) and considers three factors: natural language processing, raw audio analysis, and collaborative filtering [4]. Spotify does not use human descriptors to determine whether a song’s “vibe” is “chill,” for example. The computer compares the raw 0s and 1s against those of other songs it was told are “chill.” The mathematics used in machine learning is not necessarily linear, meaning that Spotify engineers may be unable to explain why a song was recommended to a particular user. The lack of explainability resulting from the use of weak features is a major issue in machine learning: we observe the algorithms increase in accuracy with additional data, even though we cannot explain why individual predictions were made.
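Of the three factors named above, collaborative filtering is the easiest to show in a few lines. The sketch below is a generic, textbook-style user-based collaborative filter, not Spotify's BART algorithm, whose internals are not described in the sources cited here. The users, songs, and "like" data are invented for illustration. Notice that it uses no human descriptors of the music at all; recommendations come only from overlap in listening behavior.

```python
from math import sqrt

# Hypothetical listening data: 1 = user liked the song, 0 = no signal.
likes = {
    "ana":   {"song_a": 1, "song_b": 1, "song_c": 0, "song_d": 0},
    "ben":   {"song_a": 1, "song_b": 1, "song_c": 1, "song_d": 0},
    "chloe": {"song_a": 0, "song_b": 0, "song_c": 1, "song_d": 1},
    "dev":   {"song_a": 0, "song_b": 1, "song_c": 1, "song_d": 1},
}

def cosine(u, v):
    """Cosine similarity between two users' like-vectors (same song keys)."""
    dot = sum(u[s] * v[s] for s in u)
    norm_u = sqrt(sum(x * x for x in u.values()))
    norm_v = sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def recommend(user):
    """Rank songs the user has not liked by similarity-weighted votes of other users."""
    scores = {}
    for other, other_likes in likes.items():
        if other == user:
            continue
        sim = cosine(likes[user], other_likes)
        for song, liked in other_likes.items():
            if liked and not likes[user][song]:
                scores[song] = scores.get(song, 0.0) + sim
    return sorted(scores, key=scores.get, reverse=True)

if __name__ == "__main__":
    print(recommend("ana"))    # ['song_c', 'song_d']
    print(recommend("chloe"))  # ['song_b', 'song_a']
```

Even in this tiny example the explainability problem is visible: the only available answer to "why song_c for ana?" is "because users who behave like ana liked it," which says nothing about the music itself.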

This leads to the third evolution, perception machine learning. Spotify’s BART algorithm cannot listen to music. It is a machine. Likewise, a machine cannot see a painting or a face and process it as our minds do. Perception machine learning uses the latest breakthrough in machine learning, called “deep learning,” to digitize (that is, transcribe into binary digits) features of the world [5]. Apple uses this technology to let facial features serve as passcodes. According to Kai-Fu Lee, this technology will bring about the first wave of truly futuristic products and services, such as a shopping cart that recognizes the customer, downloads the customer’s data (such as the current contents of the customer’s refrigerator and usage history), and predicts the customer’s grocery needs until the next grocery store visit (assuming, of course, that the refrigerator has not already ordered regularly used items for online delivery).

The final evolution is autonomous machine learning [6]. This evolution will enable machines to operate themselves safely without needing a human to code and monitor every potential action. A common example is the autonomous vehicle. Another is a strawberry-picking robot; picking strawberries is a delicate and difficult task, even for a human.

To conclude the discussion of what machine learning is and is not, consider a final example of what machine learning is not. Think of a quantitative trading hedge fund, say Jim Simons’s flagship Medallion Fund. The computer’s function is to make a dizzying number of financial market trade recommendations and to execute those trades through a direct API connection to the financial exchange. For decades, the models of quant hedge funds have been inexplicable, far beyond the capability of any human to understand. Those funds have made many people extraordinarily wealthy, provided that those wanting to start a quant hedge fund had enough money to start one and a team with the right technical skills to write machine learning programs. PhD mathematicians write the software, design the model parameters, and feed the machine market data (and any other data its creators choose, e.g., thermal imaging maps of manufacturing sites used to predict production volume and therefore quarterly sales). But what are the limits? The model cannot decide to find a list of Walmart executives; learn about Facebook and acquire data from Facebook; learn about sociology and discover that a Walmart executive is having an affair; learn about psychology, crime, and email, acquire a Gmail account, and blackmail the executive; all so that the executive will make a decision against which the computer can then execute a profitable trade. (Or worse, for a purpose other than the original goal of optimizing trading profit.) Humanity simply does not have, and hopefully never will have, the capability of teaching a model to discover problems and pursue the data and understanding needed to solve them (or to create problems benefiting the machine!).

China vs. United States

In the theory of competitive strategy, there is a concept of first mover and fast follower. Organizations (businesses, nations, etc.) must decide whether to pioneer or to follow the pioneers. Conscientious decision makers will carefully manage labor, capital, and technology [7] to take advantage of whichever strategy they choose. According to Yale’s Professor Paul Bracken, Yale has often taken a fast follower strategy, allowing Harvard and other Ivy League schools to evaluate new ideas (at significant expense), after which Yale invests in only the best endeavors. Apple’s graphical user interface was based on one created by Xerox. IBM, a typewriter company, took over a computing market started by UNIVAC.

In this way, the United States is a first mover in machine learning, and China is a fast follower. China’s internet companies began by copying American webpages and internet products, down to the fonts, logos, spacing, and colors. This reveals a crucial difference in ethos between the West and East. Silicon Valley is built on ideas: unique, inspirational, focused, and clean. China, however, does not have a moral or social aversion to copying, and instead considers copying the “building blocks” of innovation [8]. Since the beginning of the internet age, China has built and transformed its own market for digital products centered on Chinese consumer preferences, and it is expanding globally.

Today, the United States has an advantage in research and application for all four evolutions of machine learning [9]. The United States is home to the world’s top experts and the most advanced applications [10]. However, China is learning from U.S. innovations and applying them, albeit with less advanced and less elegant methods, to a larger volume of problems. It is reasonable to assume China will catch up in its machine learning capabilities and export products around the world, to the great benefit of its economy and the detriment of the U.S. economy.

China enjoys two additional advantages: significantly greater access to data, and heavy government spending on innovation in machine learning. Machine learning requires data, and China neither favors nor has the data privacy protections required in the U.S. and Europe. Additionally, the Chinese government is not averse to literally knocking down cities to create innovation centers, as has been done in many cities and, most notably, in Zhongguancun [11].

China also actively purchases technologies through venture capital investment in American startups. Since 2000, China has invested $66 billion USD across 5,000 transactions, an average of about $13 million per investment. While these investments went to all kinds of startups, this is an efficient, modern, and legal method of acquiring another nation’s intellectual property. Conversely, the United States has invested $47 billion USD in Chinese startups across 2,700 transactions [15].
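The average deal sizes implied by these totals can be checked with a few lines of arithmetic. The snippet below uses only the figures quoted above; the variable and function names are illustrative. It shows that although China made more investments in total, the average U.S. investment in a Chinese startup was larger.

```python
# Figures quoted in the text (USD, cumulative since 2000).
china_into_us = {"total_usd": 66e9, "transactions": 5_000}
us_into_china = {"total_usd": 47e9, "transactions": 2_700}

def average_deal_musd(flow):
    """Average investment size in millions of USD, rounded to one decimal."""
    return round(flow["total_usd"] / flow["transactions"] / 1e6, 1)

if __name__ == "__main__":
    print(average_deal_musd(china_into_us))  # 13.2
    print(average_deal_musd(us_into_china))  # 17.4
```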

National Plans

The Chinese State Council published the “Development Plan for a New Generation of Artificial Intelligence” in 2017 [16], listing extensive uses for artificial intelligence and outlining the intent to become the global leader in machine learning by 2030, with milestones in 2020 (top-tier reputation) and 2025 (new breakthroughs). The breakthrough milestone is important: the last breakthrough was deep learning in 2010, and further breakthroughs will leapfrog current capabilities. However, it is likely that other nations will copy the technology in the same way other nations are currently copying the U.S.

President Obama released a strategic plan a month before the 2016 election; it received little attention due to the election news cycle and merely recommended further study [12].

President Trump, who has the advantage of the significant technological development and application in the four years since President Obama’s report, has a White House website [14] dedicated to artificial intelligence. He signed Executive Order 13859 in early 2019, which created five pillars of the United States artificial intelligence strategy:

1. Investment in research and development

2. “Unleash” the technology toward real-world application

3. Remove barriers to innovation

4. Train the workforce

5. Promote foreign policy receptive to American artificial intelligence innovation

The website outlines a thorough and detailed plan to execute on these pillars across relevant agencies and industries, describes support for the necessary infrastructure, and hosts the first progress report, published in late 2019.

In addition to national plans, public perception is important in the national pursuit of technological innovation. The United States population has a negative attitude toward artificial intelligence, with only 41% of respondents to a Brookings survey answering somewhat or very positive [13]. A significant factor might be confusion about what machine learning is and is not, with myths perpetuated by Hollywood, a general lack of awareness of highly technical fields, and politicians like Andrew Yang.

Recommendations

The United States has set forth a thorough plan for the advancement of machine learning technology. In addition to the plan outlined at www.whitehouse.gov/ai, I recommend emphasis on the following categories, to create and prepare for a near future in which the technology is ubiquitous, and to improve public acceptance:

1. Regulatory Enablement of Data Use: Safe and private data use will advance the effectiveness of machine learning

2. Regulatory Infrastructure for Machine Learning: Create order and prevent abuse of machine learning

3. Regulatory Protection of Intellectual Property: Prevent inappropriate theft and otherwise legal exportation of the technology toward international actors who do not have the United States’ best interests in mind

4. Support of Government Adoption of Machine Learning: Mandate a priority for machine learning applications in upgrades to the government enterprise

Regulatory Enablement of Data Use

This recommendation addresses China’s advantage in access to data sources, which results from exploitation of data that is legally and morally accepted in China but would be illegal or immoral in the United States. The idea behind this recommendation is to create the regulatory infrastructure that allows the sanitization, sale or other transfer, and use of data without abusing private information.

Regulatory Infrastructure for Machine Learning

The input to machine learning, data, is one aspect needing order, but so are the outcomes of machine learning algorithms. The United States should actively research and anticipate regulatory needs to ensure machine learning tools do not abuse human rights. An example of an activity needing oversight is the extent to which a mathematical model is free to make decisions on behalf of a customer, such as a “robo” financial advisor choosing to purchase certain financial products.

Regulatory Protection of Intellectual Property

While there are already extensive intellectual property laws, the development of machine learning technologies is a matter of grand national strategy. The United States should take care to prevent foreign direct investment, as well as outright theft, that is intended to export technology cheaply from the United States in order to subvert the United States’ status as the global leader in liberty and stability, a status enabled by the strength of our economy and currency.

Support of Government Adoption of Machine Learning

Just as government investment in technology fueled Silicon Valley following World War II, government agencies should be directed to leapfrog generations of technology when updating technological infrastructure and processes. Many government systems and processes are decades old, as Covid revealed in state welfare technology systems, and IT programs to upgrade these systems should prioritize the adoption of machine learning tools. Just as defense investment fueled products geared toward both national security and consumer applications, this investment will expand the number of companies and people employed in machine learning in the United States, so that machine learning is not a niche of Silicon Valley but becomes a ubiquitous skillset in all geographies and industries.

[1] Interview with Paul Jeffries, a former data scientist at Fannie Mae

[2] “AI Superpowers,” Kai-Fu Lee, page 105

[3] https://www.startupgrind.com/blog/origin-story-the-founding-of-pandora-radio/

[4] https://www.dittomusic.com/blog/how-does-spotifys-algorithm-work-streaming-hacks-for-musicians

[5] “AI Superpowers,” Kai-Fu Lee, page 117

[6] “AI Superpowers,” Kai-Fu Lee, page 128

[7] Technology and Grand Strategy class notes, Professor Paul Bracken, Yale University

[8] “AI Superpowers,” Kai-Fu Lee, pages 26, 33

[9] “AI Superpowers,” Kai-Fu Lee, pages 134–136

[10] “AI Superpowers,” Kai-Fu Lee, page 20

[11] “AI Superpowers,” Kai-Fu Lee, page 51

[12] https://obamawhitehouse.archives.gov/blog/2016/10/12/administrations-report-future-artificial-intelligence

[13] https://www.brookings.edu/blog/techtank/2018/05/21/brookings-survey-finds-worries-over-ai-impact-on-jobs-and-personal-privacy-concern-u-s-will-fall-behind-china/

[14] https://www.whitehouse.gov/ai/

[15] https://www.ncuscr.org/sites/default/files/page_attachments/RHG_Disruption_US%20China%20VC_January2020.pdf

[16] “AI Superpowers,” Kai-Fu Lee, page 98