Hey, welcome back! It’s a brand new week. And if you are unaware, we tapped into the gold mine of arXiv’s new code link feature (mentioned in last week’s newsletter). Recently, arXiv collaborated with PapersWithCode to conveniently link a paper’s associated repos right on its abstract page (which is much better than stalking the PDF). 👇


Well… we wanted to know if we could extract the code links for, say… all of the NLP-related papers published in a trailing 5-day week?!?! Please note that papers on arXiv are published Mon-Fri, and the weekly total for NLP-related material usually lands somewhere between 300 and 500 papers.

This past week, there were 330 papers published in the Computation and Language directory. Of these, 108 had GitHub links 👀. That’s a .327 batting average, or roughly 33% (which is above the 17–20% rate of late). Most of the rest didn’t have code but had the paper linked to the PapersWithCode website. The last 11 or so didn’t include either, so these were excluded, giving us a total of 319. (Keep in mind that code can be added later on, so it is possible that some of the abstracts have been populated with code in the last 72 hours, and thus my stats can be slightly off.)
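If you want to sanity-check the numbers above, here is a quick back-of-the-envelope sketch in Python. The variable names are made up for illustration, and the "PapersWithCode-only" count is simply derived from the figures quoted in this post, not pulled from the data dump itself.

```python
# Counts quoted in this post (the "11 or so" is approximate).
total_papers = 330   # papers in Computation and Language for the week
with_github = 108    # papers whose abstract page had a GitHub link
with_neither = 11    # papers with no GitHub link and no PapersWithCode link

# Share of all papers that had a GitHub link.
github_rate = with_github / total_papers

# Papers kept in the data dump after dropping the ones with neither link.
kept = total_papers - with_neither

# Derived: kept papers that only had a PapersWithCode link.
pwc_only = kept - with_github

print(f"GitHub rate: {github_rate:.3f}")
print(f"Kept in dump: {kept}")
print(f"PapersWithCode only: {pwc_only}")
```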

The data dump has 3 fields:

URL of the abstract

Title of the paper

Link to the code on GitHub (if available), or else to the PwC page
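To make the shape of the dump a bit more concrete, here is a minimal sketch of how one row could be modeled and written out as CSV. Everything named here (`PaperRecord`, `abstract_url`, `title`, `code_url`, `is_github`, `to_csv`) is a hypothetical name picked for illustration, and the example rows use placeholder URLs, not real entries from the dump.

```python
import csv
import io
from dataclasses import asdict, dataclass
from urllib.parse import urlparse

FIELDS = ["abstract_url", "title", "code_url"]  # hypothetical column names


@dataclass
class PaperRecord:
    abstract_url: str  # URL of the abstract page
    title: str         # title of the paper
    code_url: str      # GitHub link if available, otherwise the PwC page


def is_github(record: PaperRecord) -> bool:
    """Return True when the code link points at github.com."""
    return urlparse(record.code_url).netloc.lower() in {"github.com", "www.github.com"}


def to_csv(records: list[PaperRecord]) -> str:
    """Serialize records to CSV text with a header row."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS)
    writer.writeheader()
    for record in records:
        writer.writerow(asdict(record))
    return buf.getvalue()


# Placeholder rows, for illustration only.
rows = [
    PaperRecord("https://example.org/abs/1", "Paper One", "https://github.com/someuser/somerepo"),
    PaperRecord("https://example.org/abs/2", "Paper Two", "https://example.org/pwc/paper-two"),
]

print(to_csv(rows))
print(sum(is_github(r) for r in rows), "of", len(rows), "rows have a GitHub link")
```

Keeping a single `code_url` column mirrors the three-field layout described above; a GitHub-vs-PwC split can always be derived later from the link's domain.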

Cool fact: Older papers have been retroactively appended with code links even though this feature is only a week old. (For example, this paper was submitted in May but has a code URL.)

In conclusion, this has been an awesome time experimenting, and the amount of data obtained is kind of nuts. Tons of new libraries and associated notebooks were discovered pretty fast.

…if you can replicate this adventure, you are a true Jedi Master.👩‍💻