Original article was published on Artificial Intelligence on Medium
Ultimate Guide to Becoming a Self-Taught Data Scientist
Everything you will need on your journey to becoming a data scientist.
I hope this guide helps you become a data scientist without paying for online or offline courses, bootcamps, and the like. Learn everything for free, and don't worry about finding a job: if you have the skills of a data scientist, you can get hired without anyone's help.
Mathematics is very important in the field of data science as concepts within mathematics aid in identifying patterns and assist in creating algorithms. The understanding of various notions of Statistics and Probability Theory is key for the implementation of such algorithms in data science. →DataRegressed Team
To-Do List: A
- Multivariable Calculus
- Functions of several variables
- Derivatives and gradients
- Step function, Sigmoid function, Logit function, ReLU (Rectified Linear Unit) function
- Cost function
- Plotting of functions
- Minimum and Maximum values of a function
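As a small illustration of the functions named in To-Do List A, here is a minimal sketch using only Python's standard library. The function names (`step`, `sigmoid`, `logit`, `relu`, `numerical_derivative`) are chosen for this example and are not taken from any particular library; in practice you would more likely use NumPy equivalents.

```python
import math


def step(x: float) -> float:
    """Step function: 0 for negative input, 1 otherwise."""
    return 1.0 if x >= 0 else 0.0


def sigmoid(x: float) -> float:
    """Map any real number into the open interval (0, 1)."""
    # Two branches so that math.exp never receives a large positive argument.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def logit(p: float) -> float:
    """Inverse of the sigmoid: map a probability in (0, 1) to log-odds."""
    return math.log(p / (1.0 - p))


def relu(x: float) -> float:
    """Rectified Linear Unit: keep positive values, replace negatives with 0."""
    return max(0.0, x)


def numerical_derivative(f, x: float, h: float = 1e-6) -> float:
    """Central-difference estimate of the derivative of f at x."""
    return (f(x + h) - f(x - h)) / (2 * h)


# The sigmoid is steepest at x = 0, where its slope is 0.25.
slope_at_zero = numerical_derivative(sigmoid, 0.0)
```

Evaluating these functions at a few points, and plotting them once you are comfortable with a plotting library, is a quick way to build intuition for where each one is flat, steep, or has a minimum or maximum.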
To-Do List: B
- Linear Algebra
- Vectors and matrices
- Transpose of a matrix
- The inverse of a matrix
- The determinant of a matrix
- Dot product
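The operations in To-Do List B can be tried out by hand before reaching for a library. The following is a minimal sketch in plain Python, restricted to 2×2 matrices for the determinant and inverse; the helper names (`transpose`, `dot`, `det2`, `inverse2`, `matmul`) and the example matrix are made up for this illustration.

```python
def transpose(m):
    """Swap rows and columns."""
    return [list(row) for row in zip(*m)]


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def det2(m):
    """Determinant of a 2x2 matrix [[a, b], [c, d]]."""
    (a, b), (c, d) = m
    return a * d - b * c


def inverse2(m):
    """Inverse of a 2x2 matrix; fails if the determinant is zero."""
    det = det2(m)
    if det == 0:
        raise ValueError("matrix is singular, so it has no inverse")
    (a, b), (c, d) = m
    return [[d / det, -b / det], [-c / det, a / det]]


def matmul(x, y):
    """Matrix product: each entry is a row of x dotted with a column of y."""
    columns = transpose(y)
    return [[dot(row, col) for col in columns] for row in x]


A = [[4.0, 7.0], [2.0, 6.0]]
A_inv = inverse2(A)
identity_check = matmul(A, A_inv)  # should be close to [[1, 0], [0, 1]]
```

Multiplying a matrix by its inverse and checking that the result is the identity matrix is a useful sanity test whenever you implement these operations yourself.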
To-Do List: C
- Probability and Statistics Basics
- Mean, median, mode
- Standard deviation/variance
- Correlation coefficient and the covariance matrix
- Probability distributions (Binomial, Poisson, Normal)
- p-value
- Bayes' Theorem (Confusion Matrix, ROC Curve)
- A/B Testing
- Monte Carlo Simulation
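Several items in To-Do List C fit in a few lines of standard-library Python. The sketch below is only an illustration: the sample data, the helper names (`covariance`, `correlation`, `bayes_posterior`, `estimate_pi`), and the test-accuracy numbers are all invented for this example.

```python
import random
import statistics

# Made-up sample, chosen so the results are round numbers.
sample = [2, 4, 4, 4, 5, 5, 7, 9]
sample_mean = statistics.mean(sample)        # 5
sample_median = statistics.median(sample)    # 4.5
sample_mode = statistics.mode(sample)        # 4
population_std = statistics.pstdev(sample)   # 2.0


def covariance(xs, ys):
    """Sample covariance of two equal-length sequences."""
    mean_x, mean_y = statistics.mean(xs), statistics.mean(ys)
    total = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return total / (len(xs) - 1)


def correlation(xs, ys):
    """Pearson correlation coefficient, between -1 and 1."""
    return covariance(xs, ys) / (statistics.stdev(xs) * statistics.stdev(ys))


def bayes_posterior(prior, true_positive_rate, false_positive_rate):
    """P(condition | positive test), by Bayes' theorem."""
    evidence = (true_positive_rate * prior
                + false_positive_rate * (1.0 - prior))
    return true_positive_rate * prior / evidence


def estimate_pi(n_samples, seed=0):
    """Monte Carlo estimate of pi: fraction of random points in the unit
    square that fall inside the quarter circle, times 4."""
    rng = random.Random(seed)
    inside = 0
    for _ in range(n_samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            inside += 1
    return 4.0 * inside / n_samples


# With a 1% prior, a 99% true-positive rate and a 5% false-positive rate,
# a positive result implies only about a 1-in-6 chance of the condition.
posterior = bayes_posterior(0.01, 0.99, 0.05)
```

The Bayes example shows why a confusion matrix matters: even an accurate test produces mostly false positives when the condition is rare.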
To-Do List: D
- Optimization Methods
- Cost function/Objective function
- Likelihood function
- Error function
- Gradient Descent Algorithm and its variants (e.g., Stochastic Gradient Descent Algorithm)
- Basic R syntax
- Foundational R programming concepts such as data types, vector arithmetic, indexing, and data frames
- How to perform operations in R, including sorting, data wrangling with dplyr, and data visualization with ggplot2
- RStudio
- Basic Python syntax
- Object-oriented programming
- Jupyter notebooks
- Python libraries such as NumPy, Pylab, seaborn, Matplotlib, pandas, scikit-learn, TensorFlow, PyTorch, etc.
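To connect the optimization items in To-Do List D with basic Python syntax, here is a minimal sketch of batch gradient descent that fits a straight line by minimizing a mean squared error cost function. The data, the learning rate, the step count, and the function names (`mse`, `gradient_descent`) are assumptions made for this example; it uses plain Python so it runs without installing anything.

```python
def mse(w, b, xs, ys):
    """Cost function: mean squared error of the line y = w * x + b."""
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def gradient_descent(xs, ys, learning_rate=0.05, n_steps=2000):
    """Repeatedly step against the gradient of the MSE cost."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(n_steps):
        errors = [w * x + b - y for x, y in zip(xs, ys)]
        # Partial derivatives of the MSE with respect to w and b.
        grad_w = 2.0 / n * sum(e * x for e, x in zip(errors, xs))
        grad_b = 2.0 / n * sum(errors)
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


# Made-up points that lie exactly on y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]
w, b = gradient_descent(xs, ys)
final_cost = mse(w, b, xs, ys)
```

This version uses every data point at each step. A stochastic variant would compute the gradient from one randomly chosen point (or a small batch) per step, which is cheaper per update but noisier.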
Learn Data Basics
1. Learn how to manipulate data in various formats, for example CSV files, PDF files, and text files.
2. Learn how to clean data, impute data, scale data, import and export data, and scrape data from the internet.
3. Some packages of interest are pandas, NumPy, pdftools, stringr, etc.
4. Additionally, R and Python contain several inbuilt datasets that can be used for practice.
5. Learn data transformation and dimensionality reduction techniques such as covariance matrix plot, principal component analysis (PCA), and linear discriminant analysis (LDA).
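The import, clean, impute, and scale steps from the list above can be sketched as follows. This is only an illustration under stated assumptions: the CSV contents are invented, the helper names (`load_rows`, `to_float_or_none`, `impute_mean`, `min_max_scale`) are hypothetical, and the standard library is used instead of pandas so the example runs anywhere.

```python
import csv
import io
import statistics

# Made-up CSV text with two missing values.
RAW_CSV = """name,age,income
Ana,34,52000
Ben,,48000
Cy,29,
Dee,41,61000
"""


def load_rows(text):
    """Import: parse CSV text into a list of dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def to_float_or_none(value):
    """Clean: convert a field to float, treating blanks as missing."""
    value = value.strip()
    return float(value) if value else None


def impute_mean(values):
    """Impute: replace missing entries with the mean of the known ones."""
    known = [v for v in values if v is not None]
    fill = statistics.mean(known)
    return [fill if v is None else v for v in values]


def min_max_scale(values):
    """Scale: map values linearly onto the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


rows = load_rows(RAW_CSV)
ages = impute_mean([to_float_or_none(r["age"]) for r in rows])
incomes = impute_mean([to_float_or_none(r["income"]) for r in rows])
scaled_ages = min_max_scale(ages)
```

Mean imputation and min-max scaling are just two common choices; median imputation or standardization to zero mean and unit variance are equally reasonable starting points, depending on the data.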
Learn Data Visualization Basics
Data visualization is a graphical representation of information and data. By using visual elements like charts, graphs, and maps, data visualization tools provide an accessible way to see and understand trends, outliers, and patterns in data.
- Data Component
- Geometric Component
- Mapping Component
- Scale Component
- Labels Component
- Ethical Component
Learn Machine Learning Basics
Machine learning is the study of computer algorithms that improve automatically through experience. It is seen as a subset of artificial intelligence.
- Supervised Learning (Continuous Variable Prediction)
- Basic regression
- Multiple regression analysis
- Regularized regression
- Supervised Learning (Discrete Variable Prediction)
- Logistic Regression Classifier
- Support Vector Machine (SVM)
- K-nearest neighbor (KNN) Classifier
- Decision Tree Classifier
- Random Forest Classifier
- Naive Bayes
- Gradient boosting
- Unsupervised Learning
- K-means clustering algorithm
- Hierarchical clustering
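To make the split between supervised and unsupervised learning concrete, here is a minimal sketch of one algorithm from each group: basic regression by ordinary least squares, and K-means clustering. The data points, starting centroids, and function names (`fit_line`, `kmeans`) are made up for this example. In practice you would normally use the scikit-learn library mentioned earlier rather than write these by hand.

```python
import math


def fit_line(xs, ys):
    """Supervised: ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def kmeans(points, centroids, n_iter=10):
    """Unsupervised: alternate between assigning points to the nearest
    centroid and moving each centroid to the mean of its points."""
    clusters = [[] for _ in centroids]
    for _ in range(n_iter):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # An empty cluster keeps its previous centroid.
        centroids = [
            tuple(sum(c) / len(cluster) for c in zip(*cluster)) if cluster else old
            for cluster, old in zip(clusters, centroids)
        ]
    return centroids, clusters


# Labeled data: the targets ys are known, so the model learns from them.
slope, intercept = fit_line([1.0, 2.0, 3.0, 4.0], [3.0, 5.0, 7.0, 9.0])

# Unlabeled data: two visually obvious groups, no targets given.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5),
          (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]
centroids, clusters = kmeans(points, centroids=[(1.0, 1.0), (8.0, 8.0)])
```

The regression needs known target values to fit against, while K-means finds the two groups from the point coordinates alone. The result of K-means depends on the starting centroids, which are fixed here to keep the example deterministic.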
Form a team and practice everything you have learned on these platforms.
- Make friends
- Meet experts and talk with them
- Learn from experts
- Get a mentor
- Make yourself visible to the outside world
- It also helps you land a good job at your dream company