Original article was published on Artificial Intelligence on Medium
What is the difference between Data Analytics, Data Analysis, Data Mining, Data Science, Machine Learning, and Big Data?
It’s quite normal to confuse these terms with each other, but I will try to give a clear explanation.
They are closely related, much as the terms “Web”, “Internet” and “HTML” are. Yet, like those three terms, they refer in reality to quite different fields or practices.
Data Analytics is about collecting data. In general, the collection is done at industrial scale and the cleaning is automated, across all channels (mobile, web, server, etc.).
Data Analysis is the analysis of the data that has been collected. The collected data is presented in some form, digestible or not. Google Analytics, for example, gives us a ready-made data analysis: “Average time spent on the site”, “Average number of pages visited”, etc.
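As a rough illustration of what such ready-made metrics boil down to, here is a minimal sketch in Python. The session records are invented for the example, and `summarize` is a hypothetical helper, not anything Google Analytics exposes.

```python
from statistics import mean

# Hypothetical session records: (seconds spent on site, pages visited).
sessions = [
    (120, 3),
    (45, 1),
    (300, 7),
    (75, 2),
    (60, 2),
]

def summarize(sessions):
    """Return the two dashboard-style averages for a list of sessions."""
    avg_time = mean(seconds for seconds, _ in sessions)
    avg_pages = mean(pages for _, pages in sessions)
    return avg_time, avg_pages

avg_time, avg_pages = summarize(sessions)
print(f"Average time spent on the site: {avg_time:.0f} s")
print(f"Average number of pages visited: {avg_pages:.1f}")
```

The point is only that data analysis, in this sense, summarizes data that has already been collected; it does not yet look for relationships between datasets.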
Data Mining goes one step further. We extract the data that comes from the automated collection, cross it with other data, and look for a pattern or a correlation in the combined data using standard methods such as regression.
For example: what is the correlation coefficient for 18–25 year olds shopping at Pull & Bear and at Zara?
Or: a trend shows that type X products are bought heavily when type Y events occur, especially during period N.
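To make "looking for a correlation" a little more concrete, the sketch below computes a Pearson correlation coefficient and a least-squares regression line between two series. The numbers are invented (imagine weekly purchases by the same age group at two stores), and `store_a`, `store_b`, `pearson` and `fit_line` are names chosen for this illustration only.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var_x
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical weekly purchase counts at two stores (invented data).
store_a = [10, 20, 30, 40, 50]
store_b = [12, 24, 33, 46, 55]

r = pearson(store_a, store_b)
slope, intercept = fit_line(store_a, store_b)
print(f"correlation: {r:.3f}")
print(f"regression: store_b = {slope:.2f} * store_a + {intercept:.2f}")
```

A coefficient close to 1 would suggest that the two series move together; it says nothing by itself about why they do.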
Data Science is the umbrella field that brings together the data disciplines, including Data Analysis, Data Analytics, and Data Mining, among others.
By comparison, addition and subtraction are part of a larger set called “Arithmetic”, which is itself included in a larger set called “Mathematics”.
Machine Learning is a technique that involves giving data to a model, such as a neural network, so that it can “learn” patterns in the data automatically and produce a response accordingly.
For example, I give my neural network 1 million rows in Excel format: 500,000 rows describing black mice and 500,000 describing white mice. Each row contains the weight, the size, the number of whiskers, etc.
The engine will automatically detect the features that distinguish white mice from black ones (weight, size, etc.).
I can now ask the machine to guess whether a mouse is white or black by entering parameters similar to those used to train it.
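The mouse example can be sketched in a few lines of Python. To keep it short, this uses a single-neuron model (logistic regression trained by gradient descent) rather than a full neural network, and 200 invented rows instead of 1 million. Every number, as well as the names `make_mice`, `scale` and `predict`, is an assumption made for the illustration.

```python
import math
import random

random.seed(0)  # fixed seed so the sketch is reproducible

# Hypothetical training data: [weight in g, length in cm] with a label.
# Label 0 = white mouse, label 1 = black mouse. All values are invented.
def make_mice(n, weight_mean, length_mean, label):
    return [([random.gauss(weight_mean, 1.0), random.gauss(length_mean, 0.5)], label)
            for _ in range(n)]

data = make_mice(100, 20.0, 7.0, 0) + make_mice(100, 30.0, 9.0, 1)

# Standardize each feature so gradient descent behaves well.
n_features = 2
means = [sum(x[j] for x, _ in data) / len(data) for j in range(n_features)]
stds = [math.sqrt(sum((x[j] - means[j]) ** 2 for x, _ in data) / len(data))
        for j in range(n_features)]

def scale(x):
    return [(x[j] - means[j]) / stds[j] for j in range(n_features)]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Training: batch gradient descent on the logistic loss.
weights = [0.0] * n_features
bias = 0.0
learning_rate = 0.5
for _ in range(200):
    grad_w = [0.0] * n_features
    grad_b = 0.0
    for x, y in data:
        xs = scale(x)
        error = sigmoid(sum(w * v for w, v in zip(weights, xs)) + bias) - y
        for j in range(n_features):
            grad_w[j] += error * xs[j]
        grad_b += error
    weights = [w - learning_rate * g / len(data) for w, g in zip(weights, grad_w)]
    bias -= learning_rate * grad_b / len(data)

def predict(weight_g, length_cm):
    """Guess the color of a mouse from the same kind of features used in training."""
    xs = scale([weight_g, length_cm])
    p_black = sigmoid(sum(w * v for w, v in zip(weights, xs)) + bias)
    return "black" if p_black >= 0.5 else "white"

print(predict(20.0, 7.0))
print(predict(30.0, 9.0))
```

Nobody tells the model which weight or length separates the two groups; it adjusts `weights` and `bias` from the labeled rows alone, which is the "learning" part.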
Big Data is the name given to the issues surrounding the processing of large volumes of data. It basically revolves around the Hadoop ecosystem, a framework for distributed storage and processing built around a distributed file system (HDFS).
When we talk about Big Data, we mean really huge volumes (at least 30 GB per day, which comes to around 11 TB per year).
Having a database of 120 Million rows is not big data!
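A quick back-of-the-envelope check of these orders of magnitude, as a sketch. The 30 GB per day and 120 million rows come from the text above; the 200 bytes per row is a hypothetical average chosen only to get a ballpark figure, and decimal units (1 TB = 1000 GB) are assumed.

```python
GB_PER_DAY = 30          # daily volume quoted in the text
DAYS_PER_YEAR = 365
GB_PER_TB = 1000         # decimal units, an assumption

tb_per_year = GB_PER_DAY * DAYS_PER_YEAR / GB_PER_TB

ROWS = 120_000_000       # table size quoted in the text
BYTES_PER_ROW = 200      # hypothetical average row size
table_gb = ROWS * BYTES_PER_ROW / 1e9

print(f"Yearly volume at 30 GB/day: {tb_per_year:.2f} TB")
print(f"120 million rows at {BYTES_PER_ROW} bytes each: {table_gb:.1f} GB")
```

Under that row-size assumption, the whole 120-million-row table holds less than a single day's intake at the 30 GB per day threshold, which is why it would not count as Big Data here.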
All of these areas are extremely interrelated.
This is my personal research; if you have any comments, please reach out to me.