How is Artificial Intelligence in Medicine a Special Case?
There has recently been great controversy on social media over comments made on Feb 20th 2020 by Geoffrey Hinton, who by consensus is considered the most senior distinguished AI researcher in the world, and who is a recent winner of the Turing Award (the “Nobel Prize” of computer science). Here are Hinton’s comments:
“Suppose you have cancer and you have to choose between a black box AI surgeon that cannot explain how it works but has a 90% cure rate and a human surgeon with an 80% cure rate. Do you want the AI surgeon to be illegal?” — Geoff Hinton
As a surgeon who operates on the most delicate part of the body (the eye, and the retina in particular); as a computer scientist specializing in AI and the founder and Technical Lead of a healthcare AI company; as someone who served on the Alliance for Artificial Intelligence in Healthcare working group that provided detailed technical feedback to the FDA as it drafted its policy on AI; and as a faculty member at MD Anderson, the world’s #1 cancer center (opinions are my own), I feel an obligation to share my views on Geoff Hinton’s comments and the resulting controversy.
Hinton’s comments were roundly condemned as naive, false, dangerous, misleading, or unrealistic by physicians and machine learning practitioners alike. A handful of people, including myself, argued that some of what he said was valid.
The reason for the charged response was multifactorial. The majority of responders did not seem to be aware of how drugs and medical devices are developed, tested, and brought into the clinic for use in the treatment of patients. AI in this sense fits into a pre-existing framework, albeit one that needs some adjusting to fit its particular needs. Notably, the requirement for best practices around representative sampling in development and testing is a general issue that applies to all medical drugs and devices, not just AI. Many responders seemed unaware that essentially all tests and interventions in medicine today were validated on populations. The need for best practices in designing such studies is a general need in medicine, not a special one applying to AI alone. Studies to validate clinical tests and interventions should always strive to be representative of the population. And post-study analysis should evaluate accuracy stratified on various demographic features: ethnicity, age, sex, geography, prior exposures, etc. This is a general requirement in no way particular to AI.
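As a rough illustration of what stratified post-study analysis can look like, here is a minimal Python sketch. Everything in it is hypothetical and chosen only for illustration: the record fields (`sex`, `age_band`, `predicted`, `actual`), the toy data, and the helper name `stratified_accuracy`. A real analysis would also report confidence intervals and would check that each subgroup is large enough to support a conclusion.

```python
from collections import defaultdict


def stratified_accuracy(records, feature):
    """Return {subgroup: (accuracy, n)} for one demographic feature.

    `records` is a list of dicts. Each dict must contain the chosen
    `feature` key plus "predicted" and "actual" labels (hypothetical schema).
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for rec in records:
        group = rec[feature]
        total[group] += 1
        if rec["predicted"] == rec["actual"]:
            correct[group] += 1
    return {g: (correct[g] / total[g], total[g]) for g in total}


# Toy, made-up data: four patients with a binary test result.
records = [
    {"sex": "F", "age_band": "65+", "predicted": 1, "actual": 1},
    {"sex": "F", "age_band": "<65", "predicted": 0, "actual": 1},
    {"sex": "M", "age_band": "65+", "predicted": 1, "actual": 1},
    {"sex": "M", "age_band": "<65", "predicted": 0, "actual": 0},
]

by_sex = stratified_accuracy(records, "sex")
for group, (acc, n) in sorted(by_sex.items()):
    print(f"sex={group}: accuracy={acc:.2f} (n={n})")
```

The same function can be called with `"age_band"` or any other recorded feature. A large gap in accuracy between subgroups, or a subgroup with very few records, is the kind of signal that should prompt a closer look at how representative the study was.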
Another reason for the charged response to Hinton’s comment is that AI is indeed a sensitive subject. There is justifiably real concern about what impact it will have on jobs and people’s livelihoods. This is a genuine concern, and any semblance of anthropomorphization of AI not only stokes this fear but is typically also an inaccurate overstatement of what is currently feasible. It is hype.
The summary of my take is as follows:
- In medicine today, only safety, efficacy, and non-inferiority are required to introduce new interventions and diagnostic modalities. Of note, we do not fully understand how most drugs in use today work. Nonetheless, we continue to research and study in hopes of increasing our understanding of their mechanisms of action. In other words, the gatekeepers of our current system (i.e., the FDA) do not require explainability in medicine. Had they required it, most life-extending drugs in use today would still not be available. Hinton was therefore not wrong in pointing out that explainability is not a reasonable requirement to strictly hold for AI.
- Hinton used the term “AI surgeon.” He erred in anthropomorphizing AI. Healthcare AI is not a person; it is a device. The FDA, for instance, classifies it as Software as a Medical Device (SaMD), and this classification is apt.
- The need for representative sampling of demographic distributions and the negative consequences of not doing so have been in medicine since antiquity and will always be. They are in no way particular to AI. And best practices should always be followed in population studies.
“How is Artificial Intelligence in Medicine a Special Case?” In attempting to answer this important question, we will also necessarily be answering the complementary and equally important question: “How is Artificial Intelligence in Medicine Not a Special Case?”
I spend a great deal of my time thinking about the ethical implications of what we are doing as we implement and deploy AI in healthcare. And I was privileged to serve on the Alliance for Artificial Intelligence in Healthcare working group that provided detailed technical feedback to the FDA as it crafted its AI policy. AI in healthcare is still an emerging field and clearly the FDA’s policy must continue to evolve. The key point is that the patient’s best interest must remain central. And this is why for drugs and medical devices, the required criteria are safety, efficacy, and non-inferiority to standard-of-care. In other words: Is it safe? Does it work? And does it work as well as or better than the current way?
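To make the third question concrete, here is a simplified Python sketch of a non-inferiority check on two cure rates. It is an illustration under stated assumptions, not a regulatory procedure: it uses a plain Wald confidence interval for a difference in proportions, the 90% and 80% rates echo Hinton’s hypothetical, and the sample sizes (200 per arm), the 5-percentage-point margin, and the function name `noninferiority_check` are all made up for the example. Real trials pre-specify the margin and often use more careful interval methods.

```python
import math


def noninferiority_check(success_new, n_new, success_std, n_std, margin, z=1.96):
    """Wald-interval non-inferiority check for two success proportions.

    Returns (difference, lower_bound, is_noninferior). The new option is
    called non-inferior when the lower bound of the confidence interval
    for (p_new - p_std) lies above -margin. z=1.96 corresponds to a
    two-sided 95% interval.
    """
    p_new = success_new / n_new
    p_std = success_std / n_std
    diff = p_new - p_std
    # Standard error of the difference between two independent proportions.
    se = math.sqrt(p_new * (1 - p_new) / n_new + p_std * (1 - p_std) / n_std)
    lower = diff - z * se
    return diff, lower, lower > -margin


# Hypothetical numbers: 180/200 cured with the new option, 160/200 with
# standard of care, and an assumed margin of 5 percentage points.
diff, lower, ok = noninferiority_check(180, 200, 160, 200, margin=0.05)
print(f"difference={diff:.3f}, lower bound={lower:.3f}, non-inferior={ok}")
```

With these made-up numbers the lower bound is above zero as well as above the negative margin, so the sketch would flag the new option as non-inferior. Note that nothing in the calculation depends on knowing how either option works, which is the point made above about explainability.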
Full understanding of the mechanism of a drug, for instance, would be nice to have, but is rarely attained. Yet we do not withhold safe treatments from patients when those treatments have been shown to work better than anything currently available. The same applies to AI. While full explainability would be nice, it may never be fully attainable, just as with drugs. The FDA’s duties to the public are to keep it safe and in good health. And this includes getting patients access to safe clinical tests and interventions which have been rigorously demonstrated to improve their health.
In summary, essentially all clinical tests and interventions in use in medicine today were validated based on population studies that make certain assumptions. They are imperfect. Only safety, efficacy, and non-inferiority are currently required. Explainability is desired but rarely attained fully. If full explainability were a rigid requirement, there would be few if any drugs available to treat patients. We must continue to strive to ensure that validation is representative of everyone in the population. And we must strive to understand and quantify for whom each test works and for whom it may not work as well. Are women, African Americans, Hispanics, Asians, older people, younger people, people from New York, and people from Idaho represented in the data? I discuss this in more detail in the most recent issue of The Ophthalmologist, in a piece titled “Ethics in AI,” written with my colleagues Dr. Michael Abramoff, Dr. Pearse Keane, and Dr. Daniel Ting. Medicine is not perfect, but has come a long way. Indeed, sub-stratification analysis is vital. Yet some options are objectively better than others. Hence we do extrapolate for practical reasons and will necessarily continue to do so, while also striving to reach precision medicine that looks at and caters to each individual in particular.
Dr. Stephen G. Odaibo is CEO & Founder of RETINA-AI Health, Inc., and is on the Faculty of the MD Anderson Cancer Center, the #1 Cancer Center in the world. He is a Physician, Retina Specialist, Mathematician, Computer Scientist, and Full Stack AI Engineer. In 2017 he received UAB College of Arts & Sciences’ highest honor, the Distinguished Alumni Achievement Award. And in 2005 he won the Barrie Hurwitz Award for Excellence in Neurology at Duke University School of Medicine, where he topped the class in Neurology and in Pediatrics. He is author of the books “Quantum Mechanics & The MRI Machine” and “The Form of Finite Groups: A Course on Finite Group Theory.” Dr. Odaibo chaired the “Artificial Intelligence & Tech in Medicine Symposium” at the 2019 National Medical Association Meeting. Through RETINA-AI, he and his team are building AI solutions to address the world’s most pressing healthcare problems. He resides in Houston, Texas, with his family.