
Source: Deep Learning on Medium

Harvard & Google Seismic Paper Hit With Rebuttals: Is Deep Learning Suited to Aftershock Prediction?

The aftershocks that follow an earthquake can be even more dangerous and damaging than the main temblor, for example by collapsing already structurally weakened buildings. With deep learning emerging as something of a panacea in the world of science, AI researchers and seismologists alike are leveraging the tech in pursuit of better aftershock forecast solutions.

A major breakthrough seemed to occur in 2018 when a Harvard University and Google research team published the paper Deep learning of aftershock patterns following large earthquakes in Nature. The paper proposed a deep learning model that significantly improved aftershock location forecasts compared to previous methods. It went viral on social media and garnered global mainstream media coverage.

The paper’s popularity, however, has triggered its own aftershocks. This June Rajiv Shah, a data scientist at Boston-based DataRobot, blogged that the paper’s methods “simply didn’t carry many of the hallmarks of careful predictive modeling.” The Harvard & Google authors responded in sometimes acerbic tones, questioning Shah’s earthquake knowledge.

And earlier this month Nature published the paper One neuron versus deep learning in aftershock prediction as a “Matters Arising” commentary on the Harvard & Google study. The paper suggests that a simple logistic regression model with two free parameters can yield the same results as the Harvard & Google paper’s more complex deep neural network approach.

This scientific back-and-forth raises the question: Is deep learning actually the best approach for aftershock prediction?

Deep learning aftershock location forecasts

Scientists have long studied aftershocks through empirical observations. Many of these observations have been formulated into empirical laws to help understand aftershock rates and magnitudes, such as the Omori law for the frequency-time decay of aftershocks, the Utsu law for aftershock productivity, and the Gutenberg-Richter law for the magnitude-frequency distribution.

However, forecasting where aftershocks will occur remains an unsolved challenge. That motivated the Harvard University and Google researchers to bet on artificial intelligence. They trained a deep learning model on a labeled dataset of over 131,000 mainshock–aftershock pairs and tested its efficacy on a test dataset of over 30,000 pairs. The dataset includes some of the most powerful earthquakes in recent history, such as the 2011 magnitude 9.1 undersea Tohoku earthquake off Japan, which triggered a devastating tsunami that killed over 15,000 people and caused nuclear meltdowns.

The system’s data input is stress-change tensors, which function like multidimensional arrays providing multiple stress-change component values at each point in space. The output is an aftershock location map reflecting “the predicted probability that a grid cell generates one or more aftershocks.”

The accuracy of the neural-network aftershock location forecasts was evaluated using receiver operating characteristic (ROC) analysis, which is widely used to assess the efficacy of diagnostic medical tests. The trained model, with over 13,000 parameters, scored 0.849 (area under the ROC curve) on predicting the locations of aftershocks, significantly outperforming the classic Coulomb failure stress-change criterion, which scored 0.583.
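For readers unfamiliar with the metric, ROC AUC has a simple probabilistic reading: it is the chance that a randomly chosen positive example (a cell that did produce aftershocks) is ranked above a randomly chosen negative one. A minimal sketch with toy data (not the paper’s code or dataset):

```python
def roc_auc(labels, scores):
    """ROC AUC via the Mann-Whitney U statistic: the probability that
    a random positive example outranks a random negative one (ties
    count as half)."""
    pos = [s for label, s in zip(labels, scores) if label == 1]
    neg = [s for label, s in zip(labels, scores) if label == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Toy grid cells: label 1 = aftershock occurred, 0 = none,
# scores are hypothetical predicted probabilities.
labels = [1, 1, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.3, 0.4, 0.7, 0.2]
print(roc_auc(labels, scores))  # 1.0: every positive outranks every negative
```

A score of 0.5 corresponds to random ranking, which is why the 0.583 Coulomb baseline is considered barely better than chance while 0.849 is a substantial improvement.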

The Harvard & Google paper’s first author, Harvard University seismologist Phoebe DeVries, told Nature she believes “machine learning is a powerful tool in that kind of (earthquake aftershock prediction) scenario.”

“Incredibly basic predictive modeling errors”

In June, Shah wrote in a Medium blog that he had found a major flaw in the Harvard & Google paper: data leakage, in which information from the test set inadvertently informs model training, inflating accuracy scores. Shah claimed the model had achieved a higher score on the test dataset than on the training dataset, which is unusual, since models are fitted to the training set’s data distribution and typically score lower on unseen data.
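The kind of leakage Shah described can arise when grid cells from the same earthquake, which are highly correlated, land on both sides of the train/test split. A minimal sketch of the standard remedy, splitting by earthquake rather than by individual cell (the data and counts here are hypothetical, not the paper’s pipeline):

```python
import random

random.seed(0)

# Hypothetical samples: (earthquake_id, feature_vector) pairs.
# 10 earthquakes, 50 grid cells each. Cells from one quake are
# highly correlated, so they must not straddle the split boundary.
samples = [(f"quake_{q}", [random.random()])
           for q in range(10) for _ in range(50)]

# Hold out whole earthquakes, not individual cells.
quake_ids = sorted({qid for qid, _ in samples})
random.shuffle(quake_ids)
held_out = set(quake_ids[:2])

train = [s for s in samples if s[0] not in held_out]
test = [s for s in samples if s[0] in held_out]

# No earthquake contributes cells to both splits.
assert not ({q for q, _ in train} & {q for q, _ in test})
```

Splitting at the cell level instead would let a model memorize quake-specific patterns and score deceptively well on “unseen” cells from the same events.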

“These are subtle, but incredibly basic predictive modeling errors that can invalidate the entire results of an experiment… If we allow papers or projects with fundamental issues to advance, it hurts all of us. It undermines the field of predictive modeling,” Shah wrote, adding that his intention was not to “villainize the authors of the aftershocks paper.”

Shah says he shared his results with Nature and DeVries in hopes of a correction, but Nature declined, telling him instead that “Devries et al. are concerned primarily with using machine learning as [a] tool to extract insight into the natural world, and not with details of the algorithm design.” DeVries and fellow paper author Brendon Meade meanwhile wrote a strongly-worded response letter to Nature, claiming Shah’s concerns had no scientific foundation.

“We are earthquake scientists and our goal was to use a machine learning approach to gain some insight into aftershock location patterns. We accomplished this goal. The authors of these comments do not,” wrote DeVries and Meade.

One neuron network achieves the same performance

Arnaud Mignan is a senior researcher at the Swiss Federal Institute of Technology, Zurich, with expertise in catastrophe risk research and modelling. He is first author on the One neuron versus deep learning in aftershock prediction paper that more recently challenged the Harvard & Google study.

“The (Harvard & Google) deep learning model was significantly overfitting,” Mignan told Synced. Mignan says the reason the one neuron approach performs on par with deep learning is that aftershock patterns do not present highly complex data, rather they tend to be relatively simple.

Instead of using stress-change tensors, which are derived from observation data and computed based on deformation and other parameters, Mignan chose mainshock average slip and minimum distance between space cells and the mainshock rupture as data inputs. Using a logistic regression model, Mignan’s approach not only matched but slightly exceeded the Harvard & Google study’s result, scoring an AUC of 0.86 versus 0.849.
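Mignan’s baseline amounts to a logistic regression on just two features per grid cell. The following sketch fits such a model on synthetic data; the feature values, coefficients, and relationship are illustrative assumptions, not Mignan’s actual dataset or fit:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2000

# Two hypothetical features per grid cell: mainshock average slip (m)
# and minimum distance from the cell to the rupture (km).
slip = rng.uniform(0.1, 10.0, n)
dist = rng.uniform(0.0, 100.0, n)

# Synthetic ground truth: aftershocks are more likely in cells near
# large-slip ruptures (made-up coefficients for illustration).
true_logits = 0.3 * slip - 0.08 * dist + 1.0
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-true_logits))).astype(float)

# Standardize features, then fit logistic regression (two free
# weights plus an intercept) by plain gradient descent.
X = np.column_stack([(slip - slip.mean()) / slip.std(),
                     (dist - dist.mean()) / dist.std(),
                     np.ones(n)])
w = np.zeros(3)
for _ in range(5000):
    p = 1.0 / (1.0 + np.exp(-X @ w))  # predicted aftershock probability
    w -= 0.01 * X.T @ (p - y) / n     # gradient of the log loss

print("learned weights (slip, distance, intercept):", w)
```

The model recovers the expected signs: a positive weight on slip and a negative weight on distance. Mignan’s point is that when two such features carry most of the signal, thousands of neural-network parameters add little.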

“The authors (DeVries et al) did not flatten seismicity maps into n_x times n_y features (i.e. pixels) but used each geographic cell (i.e. each pixel) as one data sample. That meant that the problem was much simpler, not a computer vision exercise, but a basic fitting exercise (with only 12 features instead of the thousands or more that would have required deep learning),” says Mignan.

The latest public exchange on the matter comes from Harvard & Google study co-author Meade. In a comment published on the Nature Research website earlier this month, Meade dismisses Mignan’s findings as nothing new: “The fact that a neural network result can be closely approximated by a simpler model is a core result of our paper and one that we described in detail.”

In an email to Synced, Mignan said Meade’s comment “dismisses Occam’s razor, which is a pillar of science and an important mental model.”

While preparing this story Synced reached out to DeVries for comment but has yet to receive a reply.

Despite all the concerns and controversies regarding the Harvard & Google results, Mignan told Synced he still believes deep learning can play an important role in earthquake research, particularly in statistical seismology. “Deep learning is efficient when dealing with unstructured data such as seismic waveform data, and satellite images are also a fantastic playground to apply computer vision (for geo hazard assessment, risk assessment, maybe even precursory pattern recognition).”