Source: Deep Learning on Medium
This article clears the fog around a few domains that have recently become popular. In simple terms, a domain is a string (a sequence of characters) that helps a browser locate the server hosting the requested website; DNS services map the domain name to the server's IP address. Domain names are registered and maintained through domain registrars and DNS hosting companies. A typical example of a top-level domain is .com.
One formal definition of a domain name, from Wikipedia:
“A domain name is an identification string that defines a realm of administrative autonomy, authority or control within the Internet.”
With the rising wave of Artificial Intelligence, Machine Learning, Deep Learning and Blockchain, some domains have gained far more prominence than they had before. These domains include .ai, .ml and .io. A one-line introduction to these domains:
While .ai and .ml have gained prominence recently because they match the abbreviations of Artificial Intelligence (AI) and Machine Learning (ML), the .io domain has its own set of associated technologies. The .ai domain has garnered so much attention that most of the major tech companies (Facebook, Google, Amazon, Microsoft, etc.) have established their brands on it.
This article presents some insights into these domains.
The following outline guided the analysis:
Get the list of websites hosted on each of the 3 domains
For each website, get a few properties that are useful for categorizing it
While any language would do equally well, I used Python to simplify the automation and parallelisation (using threads) of the tasks.
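The thread-based parallelisation mentioned above could look roughly like the sketch below. It is not the code from the repo; `run_in_threads`, its `task` argument and the `example.*` names are hypothetical, and the per-site task is passed in so the same helper can wrap a download, a rank lookup, or anything else.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def run_in_threads(sites, task, max_workers=8):
    """Apply `task` to every site using a pool of threads.

    Returns a dict mapping each site to its result, or to the exception
    raised while processing it, so one failing site does not stop the rest.
    """
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Submit one job per site and remember which future belongs to which site.
        futures = {pool.submit(task, site): site for site in sites}
        for future in as_completed(futures):
            site = futures[future]
            try:
                results[site] = future.result()
            except Exception as exc:  # keep going if a single site fails
                results[site] = exc
    return results


if __name__ == "__main__":
    # Stand-in task (len): a real task would fetch the page or query a ranking service.
    print(run_in_threads(["example.ai", "example.ml", "example.io"], len))
```

Threads suit this kind of work because most of the time is spent waiting on the network rather than computing.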
1. Getting the list of valid websites:
To get the list of valid names, I had to scrape the search results of most of the major search engines (Google/Bing/Baidu/Yahoo/Ask). To keep things quick and simple, I collected 1000+ website names per domain to scrape.
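Once result URLs have been collected from the search engines, they still need to be reduced to unique website names per domain. The sketch below shows one possible way to do that with the standard library only; `group_hosts_by_tld` and the sample URLs are assumptions for illustration, and the scraping step itself is left out.

```python
from urllib.parse import urlsplit

TLDS = ("ai", "ml", "io")


def group_hosts_by_tld(urls, tlds=TLDS):
    """Return {tld: sorted unique hostnames} for URLs whose host ends in one of `tlds`."""
    grouped = {tld: set() for tld in tlds}
    for url in urls:
        # urlsplit only fills in the hostname when a scheme (or "//") is present.
        if "//" not in url:
            url = "//" + url
        try:
            host = urlsplit(url).hostname
        except ValueError:  # malformed URL, e.g. a broken IPv6 literal
            continue
        if not host:
            continue
        host = host.rstrip(".")
        if host.startswith("www."):
            host = host[4:]
        suffix = host.rsplit(".", 1)[-1]
        # Require at least one dot so a bare "ai" is not treated as a website.
        if suffix in grouped and "." in host:
            grouped[suffix].add(host)
    return {tld: sorted(hosts) for tld, hosts in grouped.items()}


if __name__ == "__main__":
    sample = ["https://www.example.ai/about", "demo.io/page", "https://example.com/ai"]
    print(group_hosts_by_tld(sample))
```

Matching on the hostname's last label, rather than searching the whole URL for ".ai", avoids counting pages such as example.com/ai.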
2. Gathering different properties of websites:
For each website, I had to retrieve the content and read its properties to produce the required analyses. I also retrieved each site’s Alexa information for further insights.
Results and Analysis:
“A picture is worth a thousand words” is complemented by “The drawing shows me at a glance what would be spread over ten pages in a book” from Ivan Turgenev’s Fathers and Sons (1862).
Keeping the above quotes in mind, I have presented most of the analysis in graphs and pictures rather than words. In case you are still interested in the words, please scroll down to the Conclusion section.
Analysis 1: Website hosting locations
The locations of the hosting servers were obtained from Alexa. I consolidated the numbers (the number of websites hosted per country) across all three domains. Each bubble shows the number of websites hosted in a country. For example, USA (438) means that 438 websites are hosted on servers located in the USA.
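The consolidation step amounts to counting websites per hosting country. A minimal sketch follows, assuming the country of each site has already been looked up; `count_sites_per_country` and the sample mapping are made up for illustration and are not the article's data.

```python
from collections import Counter


def count_sites_per_country(site_countries):
    """Count websites per hosting country, skipping sites with no known country.

    Returns (country, count) pairs sorted from most to least common.
    """
    counts = Counter(country for country in site_countries.values() if country)
    return counts.most_common()


if __name__ == "__main__":
    # Hypothetical lookups; None marks a site whose location could not be found.
    sample = {"a.ai": "USA", "b.io": "India", "c.ml": "USA", "d.io": None}
    print(count_sites_per_country(sample))
```

Each resulting pair corresponds to one bubble in a chart like the one described above.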
Analysis 2: Rank Analysis
There are 2 good ways of getting the rank (popularity) of a website:
a. Google’s PageRank: Google stopped updating PageRank some years ago (around 2013). Though I couldn’t find an official source to confirm this, the web was flooded with blogs and news about it. Please refer to this unofficial but credible source for more information.
b. Alexa’s Rank: Alexa, from Amazon, provides an alternative to Google’s PageRank. I used Alexa’s Traffic Rank. If you are interested in how Alexa ranks each website, please take a look at this. For each website, I obtained the Alexa rank and drew a scatter plot of the results:
Do remember that the lower the rank, the better.
Analysis 3: Key word Analysis
Each website comes with an optional list of key words that describe its content. These key words help search engines easily identify the information present on the site.
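Such key words usually live in a `<meta name="keywords" content="...">` tag in the page's HTML. The sketch below shows one way to pull them out with Python's built-in `html.parser`; the class and function names are hypothetical, and the original analysis may well have used a different parser.

```python
from html.parser import HTMLParser


class MetaKeywordsParser(HTMLParser):
    """Collect the comma-separated values of <meta name="keywords" content="...">."""

    def __init__(self):
        super().__init__()
        self.keywords = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names before calling this hook.
        if tag != "meta":
            return
        attributes = {name: (value or "") for name, value in attrs}
        if attributes.get("name", "").strip().lower() != "keywords":
            return
        for word in attributes.get("content", "").split(","):
            word = word.strip().lower()
            if word:  # skip empty entries such as "a, ,b"
                self.keywords.append(word)


def extract_keywords(html):
    """Return the list of key words declared in an HTML document (possibly empty)."""
    parser = MetaKeywordsParser()
    parser.feed(html)
    return parser.keywords


if __name__ == "__main__":
    page = '<head><meta name="keywords" content="AI, Machine Learning"></head>'
    print(extract_keywords(page))
```

Because the tag is optional, an empty list is a normal result and simply means the site contributes nothing to the word cloud.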
A WordCloud (or Tag Cloud) is a good way to represent key words visually. The font size of a word grows with its frequency in the list. To draw these, I used Mueller’s WordCloud.
Since the key words differed considerably between domains, I generated a separate image for each domain.
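Before a cloud can be drawn per domain, the key words have to be tallied per domain. A rough sketch of that grouping is shown here; `keyword_frequencies_by_tld` and the sample input are assumptions for illustration, not the article's code or data.

```python
from collections import Counter, defaultdict


def keyword_frequencies_by_tld(site_keywords):
    """Map each TLD to a Counter of key word frequencies across its websites."""
    frequencies = defaultdict(Counter)
    for site, keywords in site_keywords.items():
        # The TLD is the last dot-separated label of the site name.
        tld = site.rsplit(".", 1)[-1].lower()
        frequencies[tld].update(keywords)
    return dict(frequencies)


if __name__ == "__main__":
    # Hypothetical input: website name -> key words taken from its meta tag.
    sample = {"a.ai": ["ai", "data"], "b.ai": ["ai"], "c.io": ["blockchain"]}
    for tld, counts in keyword_frequencies_by_tld(sample).items():
        print(tld, counts.most_common(3))
```

If the wordcloud package is installed, each per-domain mapping can be handed to `WordCloud().generate_from_frequencies(...)`, which sizes every word by its count.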
The following points may add to your own insights after looking at the pictures above:
As expected, most of the hot technologies such as data science and AI stayed in the AI domain. A quick glance reveals that the hosted websites belong to AI, analytics, recruiting and security.
While most of the hot technologies/key words appeared in the AI domain, blockchain has moved to the IO domain.
Interestingly, “blockchain/bitcoin/startup” key words also appeared more frequently in the IO domain than in the AI domain.
Ranking-wise (Figure 2), the IO domain has surpassed the AI domain because of the relatively earlier entry of cryptocurrency and blockchain and their popularity.
ML Domain: Movie Land :)
Surprisingly, at least to novices like me, most of the popular “woods” (Tollywood, Hollywood, Bollywood) appeared in this domain with high frequency. ML seems to be the favorite domain for movie-industry websites.
This (entertainment quotient) may be the reason that the ML domain has the best rank (refer to Figure 2), surpassing the AI and IO domains.
Common to all the 3 domains:
Most of the websites are hosted in the US, China and India, which take the top 3 places.
In case you are interested in taking a look at the code, please visit my GitHub repo: Github link
For more articles, please visit my blog
Hope you find the results interesting. Do leave a comment.
PS: Since all results are dynamic, they are subject to change.