Normalized Discounted Cumulative Gain

The original article was published by Kurtis Pykes on Artificial Intelligence on Medium.


Cumulative Gain (CG)

If every recommendation has a graded relevance score associated with it, CG is simply the sum of the graded relevance values of all results in a search result list. Figure 1 shows how we can express this mathematically.

Figure 1: Cumulative Gain mathematical expression
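The expression itself is not reproduced in this text. In the usual notation, with the symbols defined in the next paragraph, it reads as follows. The second line is a worked check using the six scores from the Python example (so p = 6).

```latex
\mathrm{CG}_p = \sum_{i=1}^{p} \mathrm{rel}_i
\qquad
\mathrm{CG}_6 = 3 + 1 + 2 + 3 + 2 + 0 = 11
```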

Here, CG_p is the Cumulative Gain at a particular rank position p, and rel_i is the graded relevance of the result at position i. To demonstrate this in Python, we first let the variable setA hold the graded relevance scores returned in response to a search query, so that each graded relevance score is associated with one document.

setA = [3, 1, 2, 3, 2, 0]
print(sum(setA))
11
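The definition is stated for a rank position p, whereas sum(setA) adds every score in the list. The following is a minimal sketch of truncating at p. The helper name cumulative_gain and its optional p argument are hypothetical and introduced only for illustration.

```python
def cumulative_gain(relevances, p=None):
    """Sum the graded relevance scores of the top p results.

    Hypothetical helper for illustration. If p is None, every result is
    included, which matches sum(relevances).
    """
    if p is not None and p < 0:
        # A negative slice bound would drop results from the end, not the top.
        raise ValueError("p must be a non-negative rank position")
    # relevances[:None] is the whole list; a p larger than the list is also safe.
    return sum(relevances[:p])


setA = [3, 1, 2, 3, 2, 0]
print(cumulative_gain(setA))     # 11
print(cumulative_gain(setA, 3))  # 6 (3 + 1 + 2)
```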

The problem with CG is that it does not take the rank of each result into consideration when determining the usefulness of a result set. In other words, if we were to reorder the graded relevance scores returned in setA, we would gain no insight into the usefulness of the result set, since the CG would be unchanged. See the code cell below for an example.

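# Reorder the same scores so the most relevant documents come first
setB = sorted(setA, reverse=True)
cg_a = sum(setA)  # CG of the original ordering
print(f"setA: {setA}\tCG setA: {cg_a}\nsetB: {setB}\tCG setB: {sum(setB)}")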
setA: [3, 1, 2, 3, 2, 0] CG setA: 11
setB: [3, 3, 2, 2, 1, 0] CG setB: 11

setB clearly returns a more useful result set than setA, because its most relevant documents appear at the top, yet the CG measure rates both result sets as equally good.
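This is not specific to setB. The following sketch enumerates every distinct ordering of the scores in setA and computes CG for each. The variable names orderings and cg_values are illustrative only. Because CG is a plain sum, all 180 distinct orderings of these six scores yield the same value.

```python
from itertools import permutations

setA = [3, 1, 2, 3, 2, 0]

# Every distinct ordering of the same graded relevance scores
# (duplicates from the repeated 3s and 2s are removed by the set).
orderings = set(permutations(setA))

# CG ignores position, so each ordering produces the same sum.
cg_values = {sum(order) for order in orderings}

print(len(orderings), cg_values)  # 180 {11}
```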