Python dict.fromkeys() function

Original article was published on Artificial Intelligence on Medium

Python dict.fromkeys() function

I recently started to take Harvard CS50’s latest AI course on edX and really enjoy it. I was working on one of its programming assignments, PageRank, based on Google’s search engine algorithm that ranks webpages according to their importance and relevancy:

Source: Wikipedia: PageRank

and I encountered a very peculiar bug. It took me quite some time to fix it, because I didn’t know the subtle implementation of Python’s dict.fromkeys() function.

The background is: I am trying to create a new dictionary new_pagerank based on an old dictionary old_pagerank . A simplfied old old_pagerank looks like this:

old_pagerank = {'1.html': {'3.html', '2.html'}, '2.html': {'3.html'}, ‘3.html’: {'2.html'}}

The key is a webpage, and value is a list of webpages that you can go from this webpage. I use dict ’s built-in function dict.fromkeys() to eliminate the redundancy of creating key from stretch.

new_pagerank = dict.fromkeys(old_pagerank.keys(), [])

So when I try to use list’s append() to update the value of new_pagerank, for example:

new_pagerank['3.html'].append('1.html')

instead of getting:

new_pagerank = {'1.html': [], '2.html':[], '3.html': ['1.html']}

I get:

new_pagerank = {'1.html': ['1.html'], '2.html':['1.html'], '3.html': ['1.html']}

I cannot believe my eyes. I only want to append the value '1.html' to key '3.html' by calling append() only on new_pagerank['3.html'] . Yet '1.html' is appended to lists of all keys. The same issue persists when I try to update the list. Even though I specify to only change one list, all lists will be affected.

It then occurs to me that maybe when I set empty list [] as the default value to all values of keys,

new_pagerank = dict.fromkeys(old_pagerank.keys(), [])

only one empty list is created, and all keys reference to the same empty list, rather than its own unique empty list.

new_pagerank = {'1.html': ['1.html'], '2.html': ['1.html'], '3.html': ['1.html']}

So I test this hypothesis by looping through new_pagerank ’s keys, and print out the id of their value,

key: '1.html'
value: ['1.html']
id: 4338529032
key: '2.html'
value: ['1.html']
id: 4338529032
key: '3.html'
value: ['1.html']
id: 4338529032

I get the same id 4338529032 for all three lists['1.html']. I am correct!

This bug is quite deceptive, because because I use the same function earlier and set default value to be 0 and there is no issue. The subtle different is that, 0 , unlike [] is not an object. Therefore, there is no pass-by-reference issue.

another_dict = dict.fromkeys(old_pagerank.keys(), 0)

Therefore, my final solution is:

new_pagerank = dict.fromkeys(old_pagerank.keys())

If I don’t pass any argument to dict.fromkeys, then Python will assign a default None value to all keys. When I want to update the list, I can first check whether it’s None :

if new_dict[outgoing_page] is not None:
new_dict[outgoing_page].append(page)
else:
new_dict[outgoing_page] = [page]

Now it’s working!

Yeah! Happy coding!