The Most Basic Analysis of Data that We All Need to Know


Data Science covers the tools, techniques, and technologies that help us handle data and put it to good use.

An analysis of data corresponds to a technique, mathematical concept, or statistical measure applied with a definite purpose: to shed some light on a data set, whether by revealing a relationship between variables, changing the dimensionality of the data, or something else. Each analysis offers a specific view of the data, so the data scientist must recognize the nature of each of these techniques and match it to the objective or goal to be met.

1. Active Learning: intelligent selection of the samples to label in order to improve learning models, used within a loop to help define the domain of knowledge (the field to which the data belong or about which you want to make inferences).

2. Agent-Based Simulation: simulates the actions and interactions of autonomous agents to explain complex behaviors, which often emerge from very simple rules.

3. Analysis of Variance (ANOVA): a hypothesis test for differences among the means of two or more groups.
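
As a minimal sketch of how such a test might be run, the snippet below applies scipy's f_oneway to three made-up groups of measurements; both the data and the scipy dependency are illustrative assumptions, not part of the original list.

```python
# One-way ANOVA sketch: test whether three (made-up) groups share a mean.
from scipy import stats

group_a = [5.1, 4.9, 5.3, 5.0, 5.2]
group_b = [5.8, 6.1, 5.9, 6.0, 5.7]
group_c = [5.0, 5.2, 4.8, 5.1, 4.9]

f_stat, p_value = stats.f_oneway(group_a, group_b, group_c)
print(f"F = {f_stat:.2f}, p = {p_value:.4f}")
# A small p-value suggests at least one group mean differs from the others.
```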

4. Association Rule Mining: a data mining technique to identify items that frequently occur together and the rules that relate them.
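
A library-free sketch of the core idea: compute support and confidence for a single hypothetical rule over a handful of invented shopping transactions. Real mining tools (Apriori, FP-Growth) do this systematically over all candidate itemsets.

```python
# Toy association-rule metrics for the hypothetical rule {bread} -> {butter}.
transactions = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "jam"},
    {"milk", "butter"},
]

def support(itemset):
    # Fraction of transactions that contain every item in the itemset.
    return sum(itemset <= t for t in transactions) / len(transactions)

antecedent, consequent = {"bread"}, {"butter"}
rule_support = support(antecedent | consequent)
confidence = rule_support / support(antecedent)
print(f"support = {rule_support:.2f}, confidence = {confidence:.2f}")
```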

5. Bayesian Networks: graphical models of conditional probabilities used to reason about the causes behind a data set.

6. Collaborative Filtering: the technique behind “recommendations”; it suggests or filters items for a user based on the action histories of many users, by finding similar items or similar users, for example across genres of movies, video games, etc.
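
One simple item-based variant, sketched with numpy on an invented user-by-movie rating matrix; the data and the choice of cosine similarity are assumptions for illustration.

```python
# Item-based collaborative filtering sketch: rank movies by similarity
# to movie 0 using cosine similarity between rating columns.
import numpy as np

ratings = np.array([   # rows: users, columns: movies (made-up ratings)
    [5, 4, 0, 1],
    [4, 5, 1, 0],
    [1, 0, 5, 4],
    [0, 1, 4, 5],
], dtype=float)

norms = np.linalg.norm(ratings, axis=0)
similarity = (ratings.T @ ratings) / np.outer(norms, norms)

ranked = [i for i in np.argsort(-similarity[0]) if i != 0]
print("Movies most similar to movie 0:", ranked)
```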

7. Coordinate Transformation: to provide a different perspective of the data.

8. Deep Learning: a learning method based on neural networks with many layers.

9. Design of Experiments: applies controlled experiments to quantify the effects on a system by changing its inputs.

10. Differential Equations: express relationships between functions and their derivatives, used to formalize models and make predictions; they are often solved by numerical methods given their initial conditions.

11. Discrete Event Simulation: simulates a sequence of discrete events in order to analyze processes and optimize them.

12. Discrete Wavelet Transform: transforms a time series into the frequency domain while preserving the location of features in time.

13. Ensemble Learning: trains multiple learning models and combines their outputs for higher performance; the data scientist must take care that the ensemble does not become so complex that it is unmanageable.

14. Expert Systems: use symbolic logic to reason about facts and return a conclusion or explanation understandable by humans, for example in the symbolic analysis of data.

15. Exponential Smoothing: used to smooth a series and suppress noise or spurious values; unlike moving averages, which assign the same weight to all past observations, exponential smoothing assigns weights that decrease exponentially over time.
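
A minimal sketch of the weighting scheme just described, with an assumed smoothing factor of 0.3 and invented data.

```python
# Simple exponential smoothing: each new smoothed value mixes the latest
# observation with the previous smoothed value, so older observations
# receive exponentially decreasing weight.
def exponential_smoothing(series, alpha=0.3):
    smoothed = [series[0]]
    for x in series[1:]:
        smoothed.append(alpha * x + (1 - alpha) * smoothed[-1])
    return smoothed

noisy = [10, 12, 11, 15, 14, 13, 18, 17]
print([round(v, 2) for v in exponential_smoothing(noisy)])
```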

16. Factor Analysis: describes the variability among correlated variables in terms of a smaller number of unobserved variables called factors, especially useful when the underlying influences cannot be measured directly.

17. Fast Fourier Transform: efficiently transforms a time series into the frequency domain, making it easier to filter out time-varying noise in a data set.
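
A small numpy sketch that recovers the dominant frequency of a noisy sine wave; the 5 Hz signal and the sampling rate are made-up values for illustration.

```python
# FFT sketch: find the dominant frequency in a noisy 5 Hz sine wave.
import numpy as np

rng = np.random.default_rng(0)
fs = 100                          # sampling rate in Hz (assumed)
t = np.arange(0, 1, 1 / fs)       # one second of samples
signal = np.sin(2 * np.pi * 5 * t) + 0.3 * rng.normal(size=t.size)

spectrum = np.abs(np.fft.rfft(signal))
freqs = np.fft.rfftfreq(t.size, d=1 / fs)
dominant = freqs[np.argmax(spectrum[1:]) + 1]   # skip the DC component
print("Dominant frequency:", dominant, "Hz")
```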

18. Format Conversion: creates a standard representation of the data regardless of its original, possibly unrecognized, format.

19. Fuzzy Logic: works with degrees of truth rather than strict true/false values, useful for reasoning about concepts whose categories are not well defined, for example separating “hot”, “warm”, and “cold” into overlapping domains.

20. Gaussian Filtering: used to smooth data or remove noise, especially in images.
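
As a sketch, scipy's gaussian_filter1d can smooth a noisy one-dimensional signal; the same idea extends to images with gaussian_filter. The signal here is invented.

```python
# Gaussian filtering sketch: smooth a noisy sine wave with scipy.
import numpy as np
from scipy.ndimage import gaussian_filter1d

rng = np.random.default_rng(0)
noisy = np.sin(np.linspace(0, 2 * np.pi, 100)) + 0.2 * rng.normal(size=100)
smoothed = gaussian_filter1d(noisy, sigma=3)

print("first values before:", noisy[:4].round(2))
print("first values after: ", smoothed[:4].round(2))
```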

21. Generalized Linear Models: an extension of linear regression in which the error distribution does not have to be normal.

22. Genetic Algorithms: evolve models over generations, inspired by the mutation and crossover of parameters.

23. Grid Search: used to explore and visualize the landscape of discrete parameter combinations when tuning a problem.

24. Hidden Markov Models: models of sequential data driven by discrete latent states, while the observations themselves may be continuous or discrete.

25. Hierarchical Clustering: groups the data into nested clusters, from small groups up to large ones (or vice versa).

26. K-means and X-means Clustering: clustering techniques that partition the data into groups around centroids; X-means also chooses the number of clusters automatically.
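
A minimal scikit-learn sketch on two obviously separated, made-up blobs of points; the library and the data are assumptions for illustration.

```python
# K-means sketch: partition six 2-D points into two clusters.
import numpy as np
from sklearn.cluster import KMeans

points = np.array([[1.0, 1.0], [1.2, 0.8], [0.9, 1.1],
                   [8.0, 8.0], [8.2, 7.9], [7.8, 8.1]])

model = KMeans(n_clusters=2, n_init=10, random_state=0).fit(points)
print("labels:   ", model.labels_)
print("centroids:", model.cluster_centers_.round(2))
```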

27. Linear, Non-Linear, and Integer Programming: a set of techniques to minimize or maximize a function subject to constraints.
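
A small linear-programming sketch with scipy's linprog; the objective and constraints are invented numbers chosen only to show the mechanics.

```python
# Maximize 3x + 2y subject to x + y <= 4, x <= 3, x >= 0, y >= 0.
from scipy.optimize import linprog

# linprog minimizes, so negate the objective coefficients to maximize.
result = linprog(c=[-3, -2],
                 A_ub=[[1, 1], [1, 0]],
                 b_ub=[4, 3],
                 bounds=[(0, None), (0, None)])
print("optimal (x, y):", result.x, " objective:", -result.fun)
```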

28. Markov Chain Monte Carlo: a sampling method used with Bayesian models to draw samples from the distribution of parameters.

29. Monte Carlo Methods: a set of computational techniques that use random numbers for sampling, often required for multivariate problems.
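
The classic illustration is estimating pi by random sampling; this is a toy sketch of the idea rather than a statement of how Monte Carlo is used in any particular pipeline.

```python
# Monte Carlo sketch: estimate pi from random points in the unit square.
import random

random.seed(0)
n = 100_000
inside = sum(random.random() ** 2 + random.random() ** 2 <= 1 for _ in range(n))
print("pi is approximately", 4 * inside / n)
```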

30. Naïve Bayes: predicts classes by applying Bayes' theorem with conditional probabilities, assuming the features are independent.
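
A short scikit-learn sketch with Gaussian Naïve Bayes on two invented, well-separated classes; the library and the data are assumptions.

```python
# Gaussian Naive Bayes sketch: fit on six labeled points, then classify two new ones.
import numpy as np
from sklearn.naive_bayes import GaussianNB

X = np.array([[1.0, 2.0], [1.2, 1.8], [0.9, 2.1],
              [5.0, 6.0], [5.2, 5.8], [4.9, 6.1]])
y = np.array([0, 0, 0, 1, 1, 1])

clf = GaussianNB().fit(X, y)
print(clf.predict([[1.1, 2.0], [5.1, 5.9]]))   # expected: [0 1]
```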

31. Neural Networks: learn by adjusting the weights between nodes according to learning rules; widely used in artificial intelligence and therefore in Machine Learning.

32. Outlier Removal: eliminates noise by identifying and removing outlying values.
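
One simple way to do this is a z-score rule, sketched below on made-up readings; the 2.5 threshold is a judgment call, and on small samples the outlier itself inflates the standard deviation, so more robust, median-based variants are often preferred.

```python
# Z-score outlier removal sketch on made-up readings (95 is the outlier).
import numpy as np

data = np.array([10, 11, 9, 10, 12, 10, 95, 11, 10])
z = np.abs((data - data.mean()) / data.std())
print("cleaned:", data[z < 2.5])
```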

33. Principal Component Analysis: reduces the dimensionality of correlated data.
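
A minimal scikit-learn sketch: project strongly correlated 2-D data onto its first principal component; the synthetic data is an assumption for illustration.

```python
# PCA sketch: reduce correlated 2-D data to one dimension.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
x = rng.normal(size=200)
data = np.column_stack([x, 2 * x + rng.normal(scale=0.1, size=200)])

pca = PCA(n_components=1)
reduced = pca.fit_transform(data)
print("explained variance ratio:", pca.explained_variance_ratio_.round(3))
```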

34. Random Search: randomizes the parameters in order to find better solutions than those that would commonly be found by a fixed search.

35. Lasso Regression (with shrinkage): a method that combines variable selection and prediction through a penalty applied to a linear model.

36. Sensitivity Analysis: tests individual parameters in a model or analysis to observe the magnitude of their effect.

37. Simulated Annealing: the name derives from the cooling process of metal; by analogy, a gradually decreasing “temperature” governs the convergence of several optimization algorithms.

38. Stepwise Regression: a method of variable selection and prediction.

39. Stochastic Gradient Descent: a general-purpose optimization method used for neural networks and logistic regression models.
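
A hand-rolled sketch of the idea for logistic regression, updating the weight after every single sample; the data, learning rate, and epoch count are illustrative assumptions.

```python
# Stochastic gradient descent for 1-D logistic regression, one sample at a time.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 1))
y = (X[:, 0] > 0).astype(float)              # label is the sign of x

w, b, lr = 0.0, 0.0, 0.1
for epoch in range(20):
    for xi, yi in zip(X[:, 0], y):
        p = 1 / (1 + np.exp(-(w * xi + b)))  # sigmoid prediction
        w -= lr * (p - yi) * xi              # gradient of the log loss w.r.t. w
        b -= lr * (p - yi)                   # gradient w.r.t. the bias
print("learned weight and bias:", round(w, 2), round(b, 2))
```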

40. Support Vector Machines: project the feature vector into a space where the classes are separable.

41. Term Frequency-Inverse Document Frequency (TF-IDF): a statistical measure of the relative importance of a word in a document with respect to the rest of the collection.
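
A short sketch with scikit-learn's TfidfVectorizer over three toy documents; it assumes a reasonably recent scikit-learn.

```python
# TF-IDF sketch: words shared by many documents get lower weights.
from sklearn.feature_extraction.text import TfidfVectorizer

docs = [
    "data science uses data",
    "science of learning from data",
    "cats and dogs",
]
vectorizer = TfidfVectorizer()
matrix = vectorizer.fit_transform(docs)
print(vectorizer.get_feature_names_out())
print(matrix.toarray().round(2))
```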

42. Topic Modeling or Latent Dirichlet Allocation: identifies latent topics in text by examining the co-occurrence of words.

43. Tree-Based Methods: models structured as trees in which each branch represents a decision; they work as classifiers (or regressors).

44. T-test: a hypothesis test used to assess the difference between the means of two groups.
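
A minimal scipy sketch comparing two invented groups of measurements.

```python
# Two-sample t-test sketch: is the treated group different from the control?
from scipy import stats

control = [12.1, 11.8, 12.4, 12.0, 11.9]
treated = [13.0, 12.8, 13.2, 12.9, 13.1]

t_stat, p_value = stats.ttest_ind(control, treated)
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")
```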

45. Wrapper Methods: help identify useful combinations of features for a model within its scope of action.

The list is not exhaustive, but it presents a range of possibilities that can be added to the toolkit of skills a data scientist brings to the work of finding value in an ocean of data.

