How to Qualify Trust in Deep Learning Systems

Source: Deep Learning on Medium

IBM researchers proposed a factsheet to assess trust in deep learning systems

Yesterday, I published an article about the new release of the Fairness Indicators for TensorFlow. The new stack focuses on quantifying fairness for machine learning models. Fairness matters because it is one of the elements that determine how much trust we can place in a deep learning model. But what does trust truly mean in the context of deep learning systems? One of the most intriguing ideas in this area came from IBM researchers last year, in a paper proposing a new methodology for establishing trust in AI systems.

Trust is a foundational building block of human socio-economic dynamics. In software development, over the last few decades, we have steadily built mechanisms for asserting trust in specific applications. When we board planes that fly on autopilot or ride in cars driven entirely by robots, we are implicitly expressing trust in the creators of a specific software application. In software, trust mechanisms are fundamentally based on the deterministic nature of most applications: their behavior is uniquely determined by the code workflow, which makes it intrinsically predictable. The non-deterministic nature of artificial intelligence (AI) systems breaks the pattern of traditional software applications and introduces new dimensions to enabling trust in AI agents.

Trust is a dynamic derived from the process of minimizing risk. In software development, trust is built through mechanisms such as testability, auditability, documentation and many other elements that help establish the reputation of a piece of software. While all those mechanisms are relevant to AI systems, they are notoriously difficult to implement. In traditional software applications, behavior is dictated by explicit rules expressed in the code; in AI agents, behavior is based on knowledge that evolves over time. The former approach is deterministic and predictable; the latter is non-deterministic and difficult to understand.

If we accept that AI is going to be a relevant part of our future, it is important to establish the foundations of trust in AI systems. Today, we regularly rely on AI models without having a clear understanding of their capabilities, knowledge or training processes. The concept of trust in AI systems remains highly subjective and hasn’t been incorporated as part of popular machine learning frameworks or platforms. What is AI trust and how can we measure it?

The Pillars of Trusted AI

Trust in human interaction is based not only on our interpretation of specific actions but also on social knowledge built up over centuries. We recognize a behavior as discriminatory not only by judging it in real time but also by factoring in the socially accepted notion that discrimination is demeaning to human beings. How can we extrapolate these ideas to the world of artificial intelligence (AI)? In their paper, the IBM team proposed four fundamental pillars of trusted AI:

· Fairness: AI systems should use training data and models that are free of bias, to avoid unfair treatment of certain groups.

· Robustness: AI systems should be safe and secure, not vulnerable to tampering or compromising the data they are trained on.

· Explainability: AI systems should provide decisions or suggestions that can be understood by their users and developers.

· Lineage: AI systems should include details of their development, deployment, and maintenance so they can be audited throughout their lifecycle.


AI fairness is typically associated with the minimization of bias in AI agents. Bias can be described as the mismatch between the training data distribution and a desired fair distribution. Unwanted bias in training data can lead to unfair outcomes. Establishing tests for identifying, curating and minimizing bias in training datasets should be a key element of establishing fairness in AI systems. Fairness is most relevant in AI applications with a tangible social impact, such as credit or legal applications.
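To make the idea of bias as a distribution mismatch concrete, here is a minimal Python sketch. It is not taken from the IBM paper: the group labels, the 50/50 target distribution and the choice of total variation distance as the mismatch measure are all assumptions made for illustration.

```python
from collections import Counter


def group_distribution(groups):
    """Return the share of each group in a list of group labels."""
    counts = Counter(groups)
    total = sum(counts.values())
    return {group: count / total for group, count in counts.items()}


def total_variation_distance(observed, desired):
    """Half the L1 distance between two discrete distributions.

    0.0 means the distributions are identical, 1.0 means they do not overlap.
    """
    keys = set(observed) | set(desired)
    return 0.5 * sum(abs(observed.get(k, 0.0) - desired.get(k, 0.0)) for k in keys)


# Hypothetical training set: one group label per record.
training_groups = ["A"] * 80 + ["B"] * 20

# Hypothetical "fair" target: both groups equally represented.
desired = {"A": 0.5, "B": 0.5}

observed = group_distribution(training_groups)
bias_score = total_variation_distance(observed, desired)
print(observed, bias_score)
```

A score near zero suggests the training data is close to the chosen target; what counts as an acceptable score, and what the target distribution should be, are policy decisions that code cannot settle.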


Understanding how AI models arrive at specific decisions is another key principle of trusted AI. Arriving at meaningful explanations of what an AI model knows reduces uncertainty and helps to quantify its accuracy. While explainability might be seen as an obvious way to improve trust in AI systems, its implementation is far from trivial. There is a natural tradeoff between the explainability of AI models and their accuracy. Highly explainable AI models tend to be very simple and, therefore, not especially accurate. From that perspective, establishing the right balance between explainability and accuracy is essential to improving trust in an AI model.


The concept of AI robustness is determined by two underlying factors: safety and security.


An AI system might be fair and explainable but still unsafe to use. AI safety is typically associated with the ability of an AI model to build knowledge that incorporates societal norms, policies, or regulations that correspond to well-established safe behaviors. Increasing the safety of AI models is another key element of trusted AI systems.


AI models are highly susceptible to all sorts of attacks, including many based on adversarial AI methods. The accuracy of AI models is directly correlated to their vulnerability to small perturbations in the input dataset. That relationship is often exploited by malicious actors who try to alter specific datasets in order to influence the behavior of an AI model. Testing and benchmarking AI models against adversarial attacks is key to establishing trust in AI systems. IBM has been doing some interesting work in this area.
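The sketch below shows how a small perturbation can flip a model's decision. The article does not name a specific attack, so this example uses the fast gradient sign method as one well-known choice, applied to a tiny logistic regression model. The weights, the input and the epsilon value are hypothetical numbers picked for illustration.

```python
import math


def predict_proba(weights, bias, x):
    """Probability of the positive class under a logistic regression model."""
    z = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1.0 / (1.0 + math.exp(-z))


def sign(value):
    """Return -1, 0 or 1 depending on the sign of value."""
    return (value > 0) - (value < 0)


def fgsm_perturb(weights, bias, x, label, epsilon):
    """Shift each feature by epsilon in the direction that increases the log-loss."""
    p = predict_proba(weights, bias, x)
    # For logistic regression, the gradient of the log-loss w.r.t. x is (p - y) * w.
    gradient = [(p - label) * w for w in weights]
    return [xi + epsilon * sign(g) for xi, g in zip(x, gradient)]


# Hypothetical model and input; the true label of x is 1.
weights = [2.0, -3.0, 1.0]
bias = 0.0
x = [0.2, 0.1, 0.1]
label = 1
epsilon = 0.1

x_adv = fgsm_perturb(weights, bias, x, label, epsilon)
print("original score:   ", predict_proba(weights, bias, x))
print("adversarial score:", predict_proba(weights, bias, x_adv))
```

In this toy setup no feature moves by more than epsilon, yet the score crosses the 0.5 decision threshold. A robustness benchmark would repeat this kind of test across many inputs and attack strengths.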


AI models are constantly evolving, which makes it challenging to trace their history. Establishing and tracking the provenance of training datasets, hyperparameter configurations and other metadata artifacts over time is important to establishing the lineage of an AI model. Understanding the lineage of AI models helps us establish trust from a historical perspective, something that is difficult to achieve by factoring in fairness, explainability and robustness alone.
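One simple way to picture lineage tracking is a record that ties each model version to a fingerprint of its training data and its hyperparameters. The following Python sketch is only an illustration: the `LineageRecord` fields, the `fingerprint_dataset` helper and the sample values are hypothetical and do not come from the IBM paper or from any particular tool.

```python
import hashlib
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass(frozen=True)
class LineageRecord:
    """Hypothetical provenance entry for one version of a model."""
    model_name: str
    model_version: str
    dataset_sha256: str
    hyperparameters: dict
    parent_version: Optional[str] = None


def fingerprint_dataset(rows):
    """Hash the training rows so later audits can detect any change to the data."""
    digest = hashlib.sha256()
    for row in rows:
        digest.update(json.dumps(row, sort_keys=True).encode("utf-8"))
        digest.update(b"\n")  # separator so row boundaries affect the hash
    return digest.hexdigest()


# Hypothetical training data for two successive model versions.
rows_v1 = [{"age": 34, "approved": 1}, {"age": 51, "approved": 0}]
rows_v2 = rows_v1 + [{"age": 29, "approved": 1}]

v1 = LineageRecord("credit-model", "1.0", fingerprint_dataset(rows_v1),
                   {"learning_rate": 0.01})
v2 = LineageRecord("credit-model", "1.1", fingerprint_dataset(rows_v2),
                   {"learning_rate": 0.005}, parent_version="1.0")

history = [asdict(v1), asdict(v2)]
print(json.dumps(history, indent=2))
```

Because the fingerprint changes whenever the training rows change, an auditor can tell which data each version was trained on and follow `parent_version` back through the model's history.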

A Factsheet for AI Systems

The subject of disclosures and transparency in AI systems is a nascent area of research, but one that is key to the mainstream adoption of AI. Just as we use information sheets for hardware appliances or nutrition labels on foods, we should consider establishing a factsheet for AI models. In their paper, IBM proposes a Supplier’s Declaration of Conformity (SDoC, or factsheet, for short) that provides information about the four key pillars of trusted AI. IBM’s SDoC methodology should help answer basic questions about AI models such as the following:

· Does the dataset used to train the service have a datasheet or data statement?

· Were the dataset and model checked for biases? If “yes”, describe the bias policies that were checked, the bias checking methods, and the results.

· Was any bias mitigation performed on the dataset? If “yes” describe the mitigation method.

· Are algorithm outputs explainable/interpretable? If “yes”, explain how explainability is achieved (e.g. directly explainable algorithm, local explainability, explanations via examples).

· Describe the testing methodology.

· Was the service checked for robustness against adversarial attacks? If “yes” describe robustness policies that were checked, checking methods, and results.

· Is usage data from service operations retained/stored/kept?
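The questions above lend themselves to a structured, machine-readable record. The Python sketch below shows one possible shape for it. The `Factsheet` and `FactsheetItem` classes, the service name and the sample answers are hypothetical; the IBM paper describes the questions, not this data format.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Optional


@dataclass
class FactsheetItem:
    """One factsheet question with the supplier's answer, if any."""
    question: str
    answer: Optional[str] = None  # None means the supplier has not answered yet
    details: str = ""


@dataclass
class Factsheet:
    """Hypothetical container for the factsheet of a single AI service."""
    service_name: str
    items: list = field(default_factory=list)

    def unanswered(self):
        """Return the questions the supplier still has to answer."""
        return [item.question for item in self.items if item.answer is None]

    def to_json(self):
        """Serialize the factsheet so it can be published alongside the service."""
        return json.dumps(asdict(self), indent=2)


factsheet = Factsheet(
    service_name="example-credit-scoring-service",  # hypothetical service
    items=[
        FactsheetItem("Does the dataset used to train the service have a datasheet or data statement?", "yes"),
        FactsheetItem("Was any bias mitigation performed on the dataset?", "yes",
                      "Reweighted under-represented groups."),  # made-up answer
        FactsheetItem("Was the service checked for robustness against adversarial attacks?"),
        FactsheetItem("Is usage data from service operations retained/stored/kept?"),
    ],
)

print(factsheet.unanswered())
print(factsheet.to_json())
```

Keeping unanswered questions explicit, rather than dropping them, lets a consumer of the service see at a glance which disclosures are still missing.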

The idea of establishing a factsheet for AI models is as simple as it is relevant to building trusted AI systems. IBM’s SDoC is far from perfect, but it’s a welcome step in the right direction.