Original article was published on Artificial Intelligence on Medium
Wikipedia defines web scraping as data scraping used for extracting data from websites. Web scraping software may access the World Wide Web directly using the Hypertext Transfer Protocol or through a web browser. While web scraping can be done manually by a software user, the term typically refers to automated processes implemented using a bot or web crawler. It is a form of copying, in which specific data is gathered and copied from the web, typically into a central local database or spreadsheet, for later retrieval or analysis.
In more familiar words, web scraping is about downloading structured data from the web, selecting some of that data, and passing along what you selected to another process.
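To make those three steps concrete, here is a minimal sketch that uses only the Python standard library. The HTML snippet, the class names, and the helper select_titles are made up for illustration; in a real scraper the page text would come from an HTTP request or a browser rather than a string in the script.

```python
from html.parser import HTMLParser

# Step 1 (download): a tiny inline page stands in for a downloaded document.
# This markup is invented for the example.
PAGE = """
<ul>
  <li class="job">Data Scientist</li>
  <li class="job">Data Engineer</li>
  <li class="ad">Sponsored link</li>
</ul>
"""


class JobTitleParser(HTMLParser):
    """Collect the text of every <li class="job"> element."""

    def __init__(self):
        super().__init__()
        self.in_job = False
        self.titles = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs
        if tag == "li" and ("class", "job") in attrs:
            self.in_job = True

    def handle_endtag(self, tag):
        if tag == "li":
            self.in_job = False

    def handle_data(self, data):
        if self.in_job and data.strip():
            self.titles.append(data.strip())


def select_titles(html):
    """Step 2 (select): keep only the job titles from the page."""
    parser = JobTitleParser()
    parser.feed(html)
    return parser.titles


# Step 3 (pass along): hand the selected data to another process,
# here simply by printing it.
titles = select_titles(PAGE)
print(", ".join(titles))
```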
Methods of Web Scraping
Web scraping software automatically loads and extracts data from multiple pages of a website based on your requirements. It is either custom-built for a specific website or can be configured to work with any website. With the click of a button, you can easily save the data available on the website to a file on your computer.
This is what we are about to do right now. It is inspired by the Medium post right here.
Glassdoor is a website where current and former employees anonymously review companies. Glassdoor also allows users to anonymously submit and view salaries as well as search and apply for jobs on its platform. In 2018, the company was acquired by the Japanese firm, Recruit Holdings, for US$1.2 billion.
Glassdoor does not have a public API for jobs. This means that you have to scrape if you want to get data about job postings. Glassdoor does not have an API for reviews either, which might also be of interest to you.
Import the appropriate packages from Selenium.
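The original import list is not shown here, so the selection below is an assumption: these are names a Selenium scraper commonly needs, and the selenium package has to be installed first (pip install selenium). The SELENIUM_AVAILABLE flag is a hypothetical convenience so the snippet fails with a hint instead of a traceback.

```python
try:
    # Browser automation entry point (Chrome, Firefox, ...)
    from selenium import webdriver
    # Locator strategies such as By.CSS_SELECTOR and By.XPATH
    from selenium.webdriver.common.by import By
    # Exceptions that typically occur while clicking through a dynamic page
    from selenium.common.exceptions import (
        ElementClickInterceptedException,
        NoSuchElementException,
    )

    SELENIUM_AVAILABLE = True
except ImportError:
    SELENIUM_AVAILABLE = False
    print("Selenium is not installed; run: pip install selenium")
```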
Define a function called get_jobs to collect our data, following the same approach as the Medium post mentioned above.
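The body of the original function is not reproduced here, so what follows is only a minimal sketch of the idea and not the code from the Medium post. The parameter names, the CSS selectors, and the pause value are all hypothetical; Glassdoor's real markup has to be inspected in the browser and will change over time. The function expects an already created Selenium driver and passes the locator strategy as the plain string "css selector", which is the value behind By.CSS_SELECTOR.

```python
import time


def get_jobs(driver, url, num_jobs, card_selector, field_selectors, pause=1.0):
    """Collect up to num_jobs records from the listing page at url.

    driver          -- a Selenium WebDriver instance
    card_selector   -- CSS selector matching one job card (hypothetical)
    field_selectors -- dict mapping a column name to a CSS selector
                       that is looked up inside each card (hypothetical)
    """
    driver.get(url)
    time.sleep(pause)  # crude wait so the page has time to render

    jobs = []
    for card in driver.find_elements("css selector", card_selector):
        if len(jobs) >= num_jobs:
            break
        record = {}
        for name, selector in field_selectors.items():
            matches = card.find_elements("css selector", selector)
            # Missing fields become None instead of raising an error
            record[name] = matches[0].text.strip() if matches else None
        jobs.append(record)
    return jobs


# Example call (not executed here; the URL and selectors are placeholders):
# from selenium import webdriver
# driver = webdriver.Chrome()
# jobs = get_jobs(driver, "<search results URL>", 50,
#                 ".job-card", {"title": ".job-title", "company": ".employer"})
# driver.quit()
```

This sketch reads a single results page only; pagination, pop-up handling, and clicking into each posting are left out.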
Now it is time to run the function:
At the end you get a file like this:
Download it to your desktop using the following code:
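The original download snippet is not shown here. One way to get the results onto disk, assuming they are held as a list of dictionaries, is the standard csv module. The helper name save_jobs, the column names, and the file name are hypothetical.

```python
import csv
from pathlib import Path


def save_jobs(jobs, path):
    """Write a list of dicts to a CSV file and return the path written."""
    path = Path(path)
    # Use every key that appears in any record as a column, in sorted order
    fieldnames = sorted({key for job in jobs for key in job})
    with path.open("w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(jobs)  # keys missing from a record are left empty
    return path


# Example call (not executed here; assumes a Desktop folder exists):
# save_jobs(jobs, Path.home() / "Desktop" / "glassdoor_jobs.csv")
```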
I know there is a full tutorial online, but I decided to share this since I write first of all to learn new things, and this is one of the things I’ve learned today.