Creating custom image datasets for Deep Learning projects.

Original article was published by Parul Pandey on Deep Learning on Medium


5. The Fastai way

The last method doesn’t use any browser extension. I picked it up from Zachary Mueller’s Practical-Deep-Learning-for-Coders-2.0 resource, which he has shared on GitHub. The code comes from Francisco Ingham and Jeremy Howard’s work, which in turn was inspired by Adrian Rosebrock.

The method requires you to install fastai, a deep learning library, since it uses some of the library’s built-in functions. To understand what is happening under the hood, you need some knowledge of the library, especially the data block API. Explaining that is beyond the scope of this article, but I will quickly go through the steps required to download the images:

  • Go to Google Images and search for the images you are interested in. Scroll down until all the images you want to download have loaded. Let’s say we are interested in finding images of apples and mangoes.
  • Open the JavaScript ‘Console’ in Chrome/Firefox, paste the following lines of code, and execute them. This collects the URLs of all the images and saves them in a CSV file. Repeat the process for every category. You will now have two CSV files, i.e. apple.csv and mango.csv.
urls=Array.from(document.querySelectorAll('.rg_i')).map(el=> el.hasAttribute('data-src')?el.getAttribute('data-src'):el.getAttribute('data-iurl')); 

window.open('data:text/csv;charset=utf-8,' + escape(urls.join('\n')));
  • Next, list the folder for each category of images that you want to download, along with the CSV file that holds its URLs.
folders = ['Apple','Mango']
files = ['apple.csv','mango.csv']
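  • Then, download the images
from fastai.vision.all import *  # provides Path, download_images, L, verify_images, DataBlock

classes = ['Apple','Mango']
path = Path('fruits')
path.mkdir(parents=True, exist_ok=True)
for i, n in enumerate(classes):
    print(n)
    path_f = Path(files[i])  # CSV file with the URLs for this class
    download_images(path/n, path_f, max_pics=50)  # save at most 50 images into fruits/<class>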
  • Verify that the downloaded images can be opened
imgs = L()
for n in classes:
    print(n)
    path_n = path/n
    # verify_images returns the files that could not be opened as images
    imgs += verify_images(path_n.ls())
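
To make the preparation steps above more concrete, here is a minimal sketch in plain Python, without fastai, of what has to happen before anything is downloaded: reading one URL per line from a CSV file and creating one folder per class. The helper names read_urls and make_class_folders are hypothetical and are not part of fastai, and the example.com URLs are placeholders. The sketch assumes the CSV may contain blank lines or duplicates.

```python
import tempfile
from pathlib import Path


def read_urls(csv_path, max_pics=50):
    """Return up to max_pics unique http(s) URLs, one per line of csv_path."""
    seen, urls = set(), []
    for line in Path(csv_path).read_text(encoding="utf-8").splitlines():
        url = line.strip()
        # Skip blank lines, non-http entries, and duplicates.
        if not url.startswith(("http://", "https://")) or url in seen:
            continue
        seen.add(url)
        urls.append(url)
    return urls[:max_pics]


def make_class_folders(root, classes):
    """Create root/<class> for every class and return the folder paths."""
    folders = [Path(root) / name for name in classes]
    for folder in folders:
        folder.mkdir(parents=True, exist_ok=True)
    return folders


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # Placeholder CSV standing in for apple.csv from the browser step.
        csv_file = Path(tmp) / "apple.csv"
        csv_file.write_text(
            "https://example.com/a.jpg\n\nhttps://example.com/a.jpg\n"
            "https://example.com/b.jpg\n",
            encoding="utf-8",
        )
        print(read_urls(csv_file))
        folders = make_class_folders(Path(tmp) / "fruits", ["Apple", "Mango"])
        print([p.name for p in folders])
```

Filtering the list up front keeps empty or repeated entries from using up the max_pics budget. In the fastai workflow, download_images takes care of the actual downloading.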

  • Display the images

fruits = DataBlock(blocks=(ImageBlock, CategoryBlock),
                   get_items=get_image_files,
                   splitter=RandomSplitter(0.2),
                   get_y=parent_label,
                   item_tfms=RandomResizedCrop(460),
                   batch_tfms=[*aug_transforms(size=224, max_warp=0),
                               Normalize.from_stats(*imagenet_stats)])
dls = fruits.dataloaders(path, bs=32)
dls.show_batch(max_n=9)
Downloaded images (Image by Author)
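
For readers new to the data block API, the two arguments that decide labels and the train/validation split can be illustrated in plain Python. This is only a rough sketch of the idea behind get_y=parent_label and splitter=RandomSplitter(0.2), not fastai's own code. The function names parent_folder_label and random_split and the file paths are made up for illustration.

```python
import random
from pathlib import Path


def parent_folder_label(path):
    """Label an image by the name of the folder that contains it."""
    return Path(path).parent.name


def random_split(items, valid_pct=0.2, seed=None):
    """Shuffle the item indices and hold out valid_pct of them for validation."""
    rng = random.Random(seed)
    idxs = list(range(len(items)))
    rng.shuffle(idxs)
    cut = int(valid_pct * len(items))
    return idxs[cut:], idxs[:cut]  # (train indices, validation indices)


if __name__ == "__main__":
    # Hypothetical file layout: fruits/Apple/0.jpg ... fruits/Mango/4.jpg
    files = [Path("fruits") / c / f"{i}.jpg"
             for c in ("Apple", "Mango") for i in range(5)]
    train, valid = random_split(files, valid_pct=0.2, seed=42)
    print(len(train), len(valid))         # 8 2
    print(parent_folder_label(files[0]))  # Apple
```

Because labels come from the folder names, storing each class in its own folder during the download step is what makes the labeling automatic.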

Here is a video showing the entire process:

Demo (Video by Author)