Original article was published by The Unlikely Techie on Artificial Intelligence on Medium
The 5 Most Common Types of Bias
If we approach the topic from a statistical point of view, there are five ways in which bias can creep into the results.
Confirmation bias is the tendency to search for, interpret, favor, and recall information in a way that confirms or supports one’s prior beliefs or values. It is a powerful type of cognitive bias that can significantly affect how well society functions, because it distorts evidence-based decision-making.
An example of this is when you remember information selectively or interpret the information given to you in a biased way. Studies have shown that people can even be manipulated into remembering childhood events that never happened. This indicates that people sometimes don’t even notice when they analyze data in a biased way (another psychological phenomenon that fits this category is wishful thinking).
Selection bias is the bias introduced by selecting individuals, groups, or data for analysis in a way that does not achieve proper randomization, so that the sample obtained is not guaranteed to be representative of the population to be analyzed. The term “selection bias” usually refers to the distortion of a statistical analysis that results from the sampling method. It is essential to take selection bias into account; otherwise, some conclusions of the study may be wrong.
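To make the effect concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the simulated customer ages, the sample size of 500, and the "under 35" selection rule are not taken from any real study. It compares a properly randomized sample with a sample drawn from only one part of the population.

```python
import random
from statistics import mean

random.seed(0)  # fixed seed so the sketch is repeatable

# Hypothetical population: the ages of 10,000 customers.
population = [random.gauss(45, 15) for _ in range(10_000)]

# Proper randomization: every customer has the same chance of being picked.
random_sample = random.sample(population, 500)

# Biased selection: only customers under 35 can end up in the sample,
# for example because the survey ran on a channel older customers rarely use.
biased_sample = [age for age in population if age < 35][:500]

# How far is each sample mean from the true population mean?
random_error = abs(mean(random_sample) - mean(population))
biased_error = abs(mean(biased_sample) - mean(population))

print(f"population mean:     {mean(population):.1f}")
print(f"random sample mean:  {mean(random_sample):.1f}")
print(f"biased sample mean:  {mean(biased_sample):.1f}")
```

Both samples have the same size, so the gap between them comes from how the members were selected, not from how many were selected.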
An outlier is an extreme data value, for example a 110-year-old customer or a consumer with $10 million in their savings account. You can identify outliers by carefully inspecting the data, especially the distribution of the values. Since outliers are extreme data values, it can be dangerous to base decisions on the calculated “average” (the mean). In other words, extreme behavior can have a significant impact on what is considered average. When outliers are present, it is usually safer to base your conclusions on the median (the middle value), which is far less sensitive to extreme values.
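A short Python sketch shows how strongly a single outlier can move the mean while leaving the median almost untouched. The savings balances below are made-up numbers chosen for illustration; only the $10 million value echoes the example above.

```python
from statistics import mean, median

# Hypothetical savings balances in dollars; the last value is the outlier.
balances = [1_200, 2_500, 3_100, 4_000, 5_200, 10_000_000]

average = mean(balances)   # pulled to roughly 1.67 million by one customer
middle = median(balances)  # 3550.0, close to what a typical customer holds

print(f"mean:   {average:,.2f}")
print(f"median: {middle:,.2f}")

# Without the outlier, the mean falls back in line with the median.
print(f"mean without outlier: {mean(balances[:-1]):,.2f}")
```

Five of the six customers hold less than $6,000, yet the mean suggests a typical balance of well over a million dollars.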
Overfitting and underfitting
Underfitting means that a model gives an oversimplified picture of reality. Overfitting is the opposite: the model is overcomplicated and fits the quirks of the sample it was built on. The risk of overfitting is that a pattern found only in that sample is treated as a general truth, although it does not hold in practice.
How can this bias be counteracted? The most straightforward approach is to ask how the model was validated. If you get a blank look in response, there is a good chance that the analysis outcomes are unvalidated and, therefore, might not apply to the whole database. Always ask the data analyst whether they worked with a training sample and a test sample. If the answer is no, the analysis outcomes may well not apply to all customers.
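The following Python sketch illustrates why the question about a training and test sample matters. It is a toy setup under stated assumptions: the data are simulated, the 150/50 split is arbitrary, and the `memorizer` function is a deliberately overfitted stand-in that simply repeats the closest value it saw during training. It is not meant to represent how any particular analyst builds a model.

```python
import random

random.seed(1)  # fixed seed so the sketch is repeatable

# Hypothetical data: y depends linearly on x, plus random noise.
xs = [random.uniform(0, 10) for _ in range(200)]
data = [(x, 2 * x + random.gauss(0, 2)) for x in xs]

# Hold out a test sample that the model never sees while it is built.
random.shuffle(data)
train, test = data[:150], data[150:]

def memorizer(x):
    """Overfitted 'model': return the y of the closest training point."""
    nearest = min(train, key=lambda point: abs(point[0] - x))
    return nearest[1]

def mse(model, sample):
    """Mean squared error of the model's predictions on a sample."""
    return sum((model(x) - y) ** 2 for x, y in sample) / len(sample)

train_error = mse(memorizer, train)  # looks perfect: the data were memorized
test_error = mse(memorizer, test)    # the honest estimate on unseen data

print(f"error on training sample: {train_error:.2f}")
print(f"error on test sample:     {test_error:.2f}")
```

Judged on the training sample alone, the memorizing model appears flawless. Only the held-out test sample reveals that it has learned the noise along with the signal.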
Confounding variables come into play when additional factors that you have not accounted for influence the variables you are studying. In an experiment, you expect the independent variable to affect your dependent variable. For example, if you want to investigate whether exercise leads to weight loss, exercise is your independent variable and weight loss is your dependent variable.
Confounding variables are all the other factors that also influence your dependent variable, often without being measured, so their influence stays hidden. In the exercise example, diet could be one such factor. Confounding variables can cause two main problems: increased variance and the introduction of bias.
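A small simulation can show how a hidden factor introduces bias. The Python sketch below is a toy world built on explicit assumptions: `motivation` is a hypothetical confounder, and exercise is given no direct effect on weight loss at all, which is a simplification for illustration and not a claim about real life. Both variables are driven only by the hidden factor plus independent noise.

```python
import random

random.seed(2)  # fixed seed so the sketch is repeatable

def pearson(a, b):
    """Pearson correlation coefficient of two equally long lists."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / (var_a * var_b) ** 0.5

# Hypothetical confounder: how health-conscious each person is.
motivation = [random.gauss(0, 1) for _ in range(5_000)]

# Toy assumption: exercise has NO direct effect on weight loss here.
# Both depend only on motivation plus independent noise.
exercise = [m + random.gauss(0, 1) for m in motivation]
weight_loss = [m + random.gauss(0, 1) for m in motivation]

# Naive analysis: ignore the confounder entirely.
naive = pearson(exercise, weight_loss)

# Hold the confounder roughly constant: keep near-average motivation only.
subset = [(e, w) for m, e, w in zip(motivation, exercise, weight_loss)
          if abs(m) < 0.1]
adjusted = pearson([e for e, _ in subset], [w for _, w in subset])

print(f"correlation ignoring the confounder:       {naive:.2f}")
print(f"correlation with the confounder held fixed: {adjusted:.2f}")
```

The naive analysis finds a clear link between exercise and weight loss even though none was built into the simulation. Once the hidden factor is held roughly constant, most of that apparent link disappears.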
It is essential to confirm that the conclusion drawn from research and analysis results is not affected by distortions. Uncovering biased results is not the sole responsibility of the analyst concerned. It is the joint responsibility of all those directly involved (including the market participant and the analyst) to reach a valid conclusion based on the correct data.