Using Deep Learning To Guess DNA Markers

For anyone following our weekly AI posts, last week we trained a deep network to predict country of origin. With all of our models we do something we call deep indexing: after the model is trained, we index the training set and cluster it by label. Doing this, Greenland stood out as very different from the other countries.
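To make the deep indexing idea a bit more concrete, here is a minimal sketch of one way it could work. This is an assumption on my part, not our actual pipeline: I treat each training example as an embedding vector (say, taken from a late layer of the trained network), average the vectors per label, and then score each label by how far its centroid sits from all the others. The function names, the toy 2-D vectors, and the labels "A", "B", "C" are all made up for illustration.

```python
import math
from collections import defaultdict


def label_centroids(embeddings, labels):
    """Average the embedding vectors of each label into one centroid per label."""
    groups = defaultdict(list)
    for vec, label in zip(embeddings, labels):
        groups[label].append(vec)
    return {
        label: [sum(col) / len(vecs) for col in zip(*vecs)]
        for label, vecs in groups.items()
    }


def isolation_scores(centroids):
    """Mean Euclidean distance from each centroid to every other centroid."""
    scores = {}
    for label, center in centroids.items():
        others = [
            math.dist(center, other)
            for name, other in centroids.items()
            if name != label
        ]
        scores[label] = sum(others) / len(others)
    return scores


# Toy 2-D embeddings with made-up labels, purely for illustration.
embeddings = [[0.0, 0.0], [0.2, 0.0], [1.0, 0.0], [1.2, 0.0], [9.0, 9.0], [9.2, 9.0]]
labels = ["A", "A", "B", "B", "C", "C"]

scores = isolation_scores(label_centroids(embeddings, labels))
most_isolated = max(scores, key=scores.get)
print(most_isolated, scores)
```

In this toy setup the label whose examples sit far away from everything else gets the highest score, which is the kind of signal that would make one country stand out from the rest.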

Greenland, your faces are unique

Well, we were thrilled to see that Greenland is also unique from a genetic perspective.

They have an isolated gene pool that is unique compared to other countries.

So we wondered, can we predict genetic haplogroups directly from someone’s face? We have known for a while that we can detect some diseases, and the Greenland example is encouraging.

So we just happen to know a guy who knows a guy, who knows someone in Columbia, who knows someone in a dark alley who knows how to get that dataset. I’m kidding, but I’m not kidding that I got that dataset. After training 30 different deep networks on each haplogroup we had, we ranked the networks to see which ones were the most predictive. Doing that, we get:

I dropped everything lower than r=0.05

Some industries are happy to accept any r-value over 0.15 as meaningful; in reality, it depends. Some of these markers are higher than I anticipated.
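For anyone who wants to try the ranking step themselves, here is a rough sketch of how it could look. I am assuming that r means the Pearson correlation between each network's predictions and the true marker values on held-out examples, which the post above does not spell out. The marker names and numbers below are invented placeholders, not results from our dataset.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        return 0.0  # a constant series carries no signal, so treat it as r = 0
    return cov / (spread_x * spread_y)


def rank_models(results, min_r=0.05):
    """Score each marker's network by r, drop those below min_r, sort best first.

    `results` maps a marker name to (predictions, targets).
    """
    scored = [(name, pearson_r(preds, targets)) for name, (preds, targets) in results.items()]
    kept = [(name, r) for name, r in scored if r >= min_r]
    return sorted(kept, key=lambda item: item[1], reverse=True)


# Invented placeholder data: three hypothetical markers, six held-out faces each.
targets = [0, 1, 0, 1, 1, 0]
results = {
    "marker_a": ([0.1, 0.9, 0.2, 0.8, 0.7, 0.3], targets),
    "marker_b": ([0.4, 0.6, 0.6, 0.4, 0.6, 0.4], targets),
    "marker_c": ([0.5, 0.5, 0.5, 0.5, 0.5, 0.5], targets),
}

ranked = rank_models(results)
for name, r in ranked:
    print(f"{name}: r={r:.2f}")
```

The 0.05 cutoff is the one I used for the chart; swap in 0.15 or whatever your field considers meaningful.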

BTW, if you used ANY of these types of features for hiring, that would be illegal in the US, even if you proved it wasn’t racist or sexist.

I always thought there would be a future where AI could render someone’s face using a DNA sample. This work doesn’t suggest that because DNA is so much more complicated than just haplogroups. It does suggest that you might be able to make a GAN face using haplogroup levels in the near future. What are your thoughts? Concerns?

Source: Deep Learning on Medium