A Large Covid-19 CT Scans Dataset

Original article was published on Deep Learning on Medium


In this story, I introduce a new CT scan dataset that contains Covid-19 and normal cases. It is one of the largest open-source datasets of its kind published so far.
This dataset contains the full original CT scans of 377 people: 15,589 images from 95 Covid-19 patients and 48,260 images from 282 normal people.
All the CT scan images are in the original 16-bit grayscale format produced by the CT imaging device, at a resolution of 512×512 pixels. To protect the patients’ privacy, we converted the original DICOM images to TIFF format, which holds the same image data but not the patients’ information. This format is also easier to use with programming libraries.
The CT scans were captured with a SOMATOM Scope imaging device, and the lung HRCT radiology images were visualized with the syngo CT VC30-easyIQ software version.
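As a quick sanity check on these counts, the short sketch below adds them up and reports the class balance. The variable names are illustrative only; the numbers are the ones quoted above.

```python
# Counts quoted in the text above.
covid_images, normal_images = 15589, 48260
covid_patients, normal_patients = 95, 282

total_images = covid_images + normal_images
total_patients = covid_patients + normal_patients

# Share of Covid-19 images, useful when choosing class weights or sampling.
covid_share = covid_images / total_images

print(f"Total images:   {total_images}")
print(f"Total patients: {total_patients}")
print(f"Covid-19 share of images: {covid_share:.1%}")
print(f"Images per Covid-19 patient (mean): {covid_images / covid_patients:.0f}")
print(f"Images per normal patient (mean):   {normal_images / normal_patients:.0f}")
```

The patient total matches the 377 people stated above, and the image counts show that normal images outnumber Covid-19 images by roughly three to one, which is worth keeping in mind when training a classifier.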

Figure: Sample images in our dataset
Figure: Distribution of the patients in our dataset based on sex and age

Because regular monitors cannot display 16-bit grayscale images, we wrote code that converts the images to a displayable format. (The code is available in the GitHub repo linked at the end of this story.)
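To give a rough idea of what such a conversion can look like, here is a minimal sketch that rescales a 16-bit image to 8 bits with min-max normalization. This is not the code from the repo, which may use a different scaling; the function name `to_uint8` is hypothetical, and the input is assumed to be a NumPy array already loaded from a TIFF file with any library that preserves the 16-bit depth.

```python
import numpy as np

def to_uint8(img16: np.ndarray) -> np.ndarray:
    """Rescale a 16-bit grayscale image to the 0-255 range (min-max scaling).

    Assumption: a per-image min-max stretch is acceptable for viewing.
    It is not necessarily the scaling used by the dataset authors.
    """
    img = img16.astype(np.float64)
    lo, hi = img.min(), img.max()
    if hi == lo:
        # Constant image: nothing to stretch, return all zeros.
        return np.zeros(img.shape, dtype=np.uint8)
    scaled = (img - lo) / (hi - lo) * 255.0
    return np.round(scaled).astype(np.uint8)

# Usage with a synthetic 512x512 16-bit image standing in for a real scan.
rng = np.random.default_rng(0)
fake_scan = rng.integers(0, 4096, size=(512, 512), dtype=np.uint16)
viewable = to_uint8(fake_scan)
print(viewable.dtype, viewable.min(), viewable.max())
```

Note that per-image scaling is meant for visualization only: it changes the relative intensities between images, so it may not be the right preprocessing for training a network.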

Our dataset consists of two parts. The first part, Training&Validation.zip, contains the images for training and validating the networks in five folds. The CSV files with the image labels are in the CSV folder.

The second part (COVID-CTset.zip) contains the whole dataset for all the patients.

Each patient has three folders (SR_2, SR_3, SR_4), and each folder holds one sequence of that patient’s lung HRCT scan images (each sequence covers one opening and closing of the patient’s lung).
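The folder layout described above can be indexed with a few lines of Python. The sketch below is only an illustration: it assumes that each patient is a directory directly under the extracted root, that the sequence folders are named `SR_*`, and that the images use a `.tif` extension. None of these details are confirmed here, so adjust them to the actual archive. The function name `index_sequences` is hypothetical.

```python
from pathlib import Path

def index_sequences(root):
    """Map each patient folder to its sequence folders and image files.

    Assumed layout (not confirmed by the dataset description):
        root/<patient>/SR_2/*.tif, root/<patient>/SR_3/*.tif, ...
    """
    index = {}
    patient_dirs = sorted(p for p in Path(root).iterdir() if p.is_dir())
    for patient_dir in patient_dirs:
        sequences = {}
        for seq_dir in sorted(patient_dir.glob("SR_*")):
            if seq_dir.is_dir():
                # Sort so the images stay in a stable order within a sequence.
                sequences[seq_dir.name] = sorted(seq_dir.glob("*.tif"))
        if sequences:
            index[patient_dir.name] = sequences
    return index

def summarize(index):
    """Print the number of images per patient and sequence."""
    for patient, sequences in index.items():
        for name, files in sequences.items():
            print(f"{patient}/{name}: {len(files)} images")
```

Calling `summarize(index_sequences("path/to/extracted/folder"))` on the extracted archive would then list how many images each sequence contains, which is a quick way to check that the download is complete.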

All the information and data have been provided and verified by Negin Radiology Center located in Sari, Iran.

For more details, please refer to https://github.com/mr7495/COVID-CTset

This data belongs to the following paper (the preprint is published on medRxiv and ResearchGate):

The dataset is shared in this Google Drive folder:

If you wish to use these materials and data, please cite the paper as follows:

@article {Rahimzadeh2020.06.08.20121541,
author = {Rahimzadeh, Mohammad and Attar, Abolfazl and Sakhaei, Seyed Mohammad},
title = {A Fully Automated Deep Learning-based Network For Detecting COVID-19 from a New And Large Lung CT Scan Dataset},
elocation-id = {2020.06.08.20121541},
year = {2020},
doi = {10.1101/2020.06.08.20121541},
publisher = {Cold Spring Harbor Laboratory Press},
URL = {https://www.medrxiv.org/content/early/2020/06/12/2020.06.08.20121541},
eprint = {https://www.medrxiv.org/content/early/2020/06/12/2020.06.08.20121541.full.pdf},
journal = {medRxiv}
}

I hope this dataset will be helpful for future research.

For any questions, contact me at this email: mr7495@yahoo.com.