Making Sense of Shapley Values

Source: Deep Learning on Medium

Alright, the first thing we’ll do is rewrite the initial equation somewhat.
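For reference, here is a sketch of that rewritten form, assuming the standard definition of the Shapley value. The symbol names (player i, player set N, value function v) are the conventional ones and are an assumption about the notation used here:

```latex
\varphi_i(v) = \frac{1}{|N|} \sum_{S \subseteq N \setminus \{i\}}
  \binom{|N|-1}{|S|}^{-1} \bigl( v(S \cup \{i\}) - v(S) \bigr)
```

This is equivalent to the more common textbook form with the weight |S|!(|N|-|S|-1)!/|N|!, since that weight equals 1 / (|N| · C(|N|-1, |S|)).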

At first glance it might not seem as if we’ve made our situation any better, but bear with me. Soon I’ll break down the different parts of the equation in order to make sense of them, but let’s also define a toy scenario that we can use to make it all a bit less abstract.

Let’s say that we operate a factory that produces bricks. One of our production teams consists of four people: Amanda, Ben, Claire and Don (from now on I will refer to each of them by the first letter of their name). Together they manage to produce X bricks each week. Since it’s going well for our factory, we have a bonus that we want to distribute to the team members. But in order to do that in a fair way, we need to find out how much each person contributes to the production of the X bricks per week.

The hard part here is that we have a couple of effects in play that all impact the number of bricks that the team can produce. One of them is team size since a larger team will result in more bricks being produced. Another might be how well the team members cooperate with each other. The issue is that we can’t quantify these effects in a meaningful way, but fortunately for us Shapley values can be used to side-step the issue.

We’ve now defined our players (A, B, C and D) as well as the game that they’re participating in (producing bricks). Let’s start by deciding how many of the X bricks produced can be attributed to Don, i.e. calculating the Shapley value for D. If we relate this back to the parameters of the Shapley value formula, we have i = D and N = {A, B, C, D}.

So D is our player i and the entire group N consists of all four team members, A, B, C and D. With that laid out, let’s start by looking a bit closer at this part of the Shapley value formula: the set that the summation runs over, S ⊆ N \ {i}.

It says that we need to take our group of people and exclude the person that we’re focusing on now. Then, we need to consider all of the possible subsets that can be formed. So if we exclude D from the group we’re left with {A, B, C}. From this remaining group we can form the following subsets (i.e. these are the sets that S can take on): ∅, A, B, C, AB, AC, BC and ABC.
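The enumeration can be reproduced with a few lines of Python. This is a minimal sketch; the helper name `subsets` and the single-letter labels are illustrative choices, not part of any library:

```python
from itertools import combinations

def subsets(players):
    """Return every subset of `players`, from the empty set up to the full set."""
    ordered = sorted(players)
    return [
        frozenset(combo)
        for size in range(len(ordered) + 1)
        for combo in combinations(ordered, size)
    ]

team = {"A", "B", "C", "D"}
remaining = team - {"D"}          # N \ {i} with i = D
all_subsets = subsets(remaining)  # every set that S can take on

# Label each subset the way the text does: AB for {A, B}, and so on.
labels = ["".join(sorted(s)) if s else "∅" for s in all_subsets]
print(len(all_subsets), labels)
```

With three remaining players there are 2^3 = 8 subsets, which matches the count used in the rest of the walkthrough.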

In total we can construct 8 different subsets of the remaining team members. One of these subsets, ∅, is the empty set, i.e. it has none of the members. Now let’s turn our focus to this part: the difference v(S ∪ {i}) - v(S), where v(S) denotes the value produced by the subset S (here, its weekly brick output).

This is where one of the fundamental concepts of Shapley values starts to come into play: the marginal value of adding player i to the game. So for any given subset S we’re going to compare its value to the value that it has when you also include player i in it. By doing that we get the marginal value of adding player i to that subset.

If we relate it back to our example, we want to see what the difference in the number of bricks produced each week is if we add D to each of our 8 subsets. That gives us 8 marginal values, one for each subset.
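Written out, and assuming we write v(S) for the number of bricks that a subset S produces per week (the symbol v is a notational assumption), the 8 marginal values for D are:

```latex
\begin{aligned}
&v(\{D\}) - v(\varnothing) \\
&v(\{A, D\}) - v(\{A\}) \\
&v(\{B, D\}) - v(\{B\}) \\
&v(\{C, D\}) - v(\{C\}) \\
&v(\{A, B, D\}) - v(\{A, B\}) \\
&v(\{A, C, D\}) - v(\{A, C\}) \\
&v(\{B, C, D\}) - v(\{B, C\}) \\
&v(\{A, B, C, D\}) - v(\{A, B, C\})
\end{aligned}
```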

You can view each of these as a different scenario that we need to observe in order to fairly assess how much D contributes to the overall production. This means that we need to observe how many bricks are produced if no one is working (i.e. the empty set ∅) and compare it to what happens if we only have D working. We also need to observe how many bricks are produced by AB and compare that to the number of bricks produced by AB together with D, and so on for all 8 of the constellations that we can form.

Alright, we’ve now figured out that we need to compute 8 different marginal values. The summation in the Shapley value equation is telling us that we need to add them all together. However, we also need to scale each marginal value before we do that, which is what this part of the equation tells us: the inverse of the binomial coefficient, 1 / C(|N|-1, |S|).

It calculates how many combinations of each subset size we can have when we’re constructing it out of all remaining team members excluding player i. Or in other words: if you have |N|-1 players, how many groups of size |S| can you form with them? We then divide the marginal contribution of player i to each group of size |S| by this number.

For our scenario, we have that |N|-1 = 3, i.e. this is the number of team members we’re left with when calculating the Shapley value for D. In our case we will use that part of the equation to calculate how many groups we can form of size 0, 1, 2 and 3, since those are the only group sizes we can construct with the remaining players. So, for example, if we have that |S|=2 then we get that we can construct 3 different groups of this size: AB, BC and CA. This means that we should apply the following scaling factors to our 8 marginal values: 1 for the empty set, 1/3 for each of the three subsets of size 1, 1/3 for each of the three subsets of size 2, and 1 for the single subset of size 3.
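As a quick check of those group counts, here is a small sketch using Python’s `math.comb`. The function name `scaling_factors` is made up for this illustration:

```python
from fractions import Fraction
from math import comb

def scaling_factors(n_remaining):
    """Map each subset size |S| to 1 / C(n_remaining, |S|)."""
    return {
        size: Fraction(1, comb(n_remaining, size))
        for size in range(n_remaining + 1)
    }

# |N| - 1 = 3: the players A, B and C that remain once D is excluded.
factors = scaling_factors(3)
for size, factor in factors.items():
    print(f"|S| = {size}: {comb(3, size)} group(s), scaling factor {factor}")
```

Exact fractions are used instead of floats so that the factor 1/3 stays readable.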

Let’s reflect for a moment on why we’re doing this. We want to know how much D contributes to the total output of the team. In order to do that we’ve calculated how much he contributes marginally to each constellation of the team that we can form. By adding this scaling factor we’re averaging out the effect that the rest of the team members have for each subset size. This means that we’re able to capture the average marginal contribution of D when added to a team of size 0, 1, 2 and 3, regardless of the composition of these teams.

Okay, we’re almost done now. We only have one final part of the Shapley value equation to break down, which at this point should be fairly straightforward to understand: the leading factor 1/|N|.

We have one final scaling factor that we need to apply to all of our marginal values before being able to sum them. We have to divide them by the number of players participating in the game, i.e. the number of team members that we have in total.

Again, why are we doing this? Well, if we look at our brick factory example we’ve averaged out the effects of the other team members for each subset size, allowing us to express how much D contributes to groups of size 0, 1, 2 and 3. The final piece of the puzzle is to average out the effect of the group size as well, i.e. how much does D contribute regardless of the size of the team. For our scenario we do this by dividing by 4, since that’s the number of different group sizes (0, 1, 2 and 3) that we can consider.

We have now arrived at the point where we can finally compute the Shapley value for D. We have observed how much he marginally contributes to all the different constellations of the team that can be formed. We’ve also averaged out the effects of both team composition and team size, which finally allows us to compute the Shapley value: scale each of the 8 marginal values by its factor, sum them, and divide by 4.
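Spelled out for D, the computation looks roughly like this. As a shorthand that is assumed only for this sketch, v(AD) stands for the weekly bricks produced by the subset {A, D}:

```latex
\varphi_D = \frac{1}{4} \Bigl[
    \bigl( v(D) - v(\varnothing) \bigr)
  + \tfrac{1}{3} \bigl( [v(AD) - v(A)] + [v(BD) - v(B)] + [v(CD) - v(C)] \bigr)
  + \tfrac{1}{3} \bigl( [v(ABD) - v(AB)] + [v(ACD) - v(AC)] + [v(BCD) - v(BC)] \bigr)
  + \bigl( v(ABCD) - v(ABC) \bigr)
\Bigr]
```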

I’m playing it fast and loose when it comes to using mathematical notation here, but this is more of a graphical illustration of what we’re doing than a mathematical one (it’s how I visualize it in my head).

There we have it, the Shapley value for D. After we’ve done this for the rest of the team, we will know each person’s contribution to the X bricks produced each week, allowing us to fairly divide the bonus amongst all team members.
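To make the whole procedure concrete, here is a minimal Python sketch that follows the steps above. The production numbers are entirely hypothetical (a solo output per person plus a fixed bonus for every pair that works together) and serve only to give the value function something to return. The names `bricks` and `shapley_value` are likewise invented for this example:

```python
from itertools import combinations
from math import comb

# Hypothetical numbers, not from the article: weekly bricks each person
# produces alone, plus a cooperation bonus for every pair in the group.
SOLO_OUTPUT = {"A": 10, "B": 20, "C": 30, "D": 40}
PAIR_BONUS = 5

def bricks(group):
    """Toy value function v(S): weekly bricks produced by `group`."""
    solo = sum(SOLO_OUTPUT[p] for p in group)
    return solo + PAIR_BONUS * comb(len(group), 2)

def shapley_value(player, players, value):
    """Average marginal contribution of `player`, per the rewritten formula."""
    others = sorted(set(players) - {player})
    n = len(others) + 1
    total = 0.0
    for size in range(len(others) + 1):
        # Average over all groups of this size: divide by C(|N|-1, |S|).
        weight = 1 / comb(n - 1, size)
        for subset in combinations(others, size):
            without = set(subset)
            marginal = value(without | {player}) - value(without)
            total += weight * marginal
    # Average over the |N| possible group sizes.
    return total / n

team = ["A", "B", "C", "D"]
shares = {p: shapley_value(p, team, bricks) for p in team}
print(shares)
```

With these made-up numbers D’s share works out to 47.5 bricks (up to floating-point rounding), and the four shares add up to the 130 bricks that the full team produces, which is what lets us split the bonus proportionally.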