Source: Deep Learning on Medium
As noted several times in previous posts, advances in sequencing technology have rapidly reduced the price of getting a genome sequenced. Cost is not much of a bottleneck anymore. The bottleneck now is how to properly clean the data, analyze it, and report the results.
In this paper, the authors take on this larger problem in the context of metagenomic data. Metagenomics is the study of genetic material taken directly from an environment; probably the most notable examples are samples from the human gut or intestine. Yes, that does mean poop.
There is valuable information in the human gut, but we shall not dwell on that subject too long. These samples are littered with thousands, if not hundreds of thousands, of microorganisms. Because metagenomic data contains such a vast array of organisms, there is a need for methods that can classify the DNA fragments. Sounds like a learning problem!
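To make "classify the DNA fragments" a bit more concrete, here is a minimal sketch of what such a learning problem can look like. It is not the method from the paper. It assumes a common, simple featurization (counting short substrings called k-mers) and swaps in a nearest-centroid rule as the classifier. The labels `at_rich` and `gc_rich`, the toy sequences, and all function names are made up for illustration.

```python
from collections import Counter
from itertools import product

K = 2  # k-mer length; an arbitrary choice for this toy example
KMERS = ["".join(p) for p in product("ACGT", repeat=K)]  # all 4**K possible k-mers


def kmer_profile(seq, k=K):
    """Turn a DNA string into a vector of normalized k-mer frequencies."""
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    # Only k-mers made of A, C, G, T are counted; anything else (e.g. N) is ignored.
    total = sum(counts[kmer] for kmer in KMERS)
    if total == 0:
        return [0.0] * len(KMERS)
    return [counts[kmer] / total for kmer in KMERS]


def centroid(profiles):
    """Average a list of equal-length vectors, position by position."""
    return [sum(column) / len(profiles) for column in zip(*profiles)]


def train(labeled_sequences):
    """Compute one average k-mer profile per label."""
    return {
        label: centroid([kmer_profile(seq) for seq in seqs])
        for label, seqs in labeled_sequences.items()
    }


def classify(seq, centroids):
    """Return the label whose centroid is closest (squared Euclidean distance)."""
    profile = kmer_profile(seq)

    def distance(label):
        return sum((a - b) ** 2 for a, b in zip(profile, centroids[label]))

    return min(centroids, key=distance)


# Hypothetical training data: two made-up "organisms" with different base composition.
TOY_TRAINING = {
    "at_rich": ["ATATATAT", "AATTAATT"],
    "gc_rich": ["GCGCGCGC", "GGCCGGCC"],
}
centroids = train(TOY_TRAINING)

for fragment in ["TATAAT", "CGGCGC"]:
    print(fragment, "->", classify(fragment, centroids))
```

Real metagenomic samples involve far more classes, much longer reads, and sequencing noise, so treat this only as an illustration of the fragment-in, label-out framing.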