Machine learning and data science projects need datasets to train and test models. Many dataset repositories are available on the Internet. Some of them are free, while others charge a fee.
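To make the idea of training and testing data concrete, here is a minimal sketch of splitting a dataset into two parts using only the Python standard library. The function name `train_test_split`, the 20% test fraction, and the placeholder records are assumptions chosen for illustration; they do not come from any of the repositories listed in this article.

```python
import random


def train_test_split(rows, test_fraction=0.2, seed=0):
    """Shuffle a copy of rows and split it into (train, test) lists."""
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be between 0 and 1")
    shuffled = list(rows)                      # copy so the input is not modified
    random.Random(seed).shuffle(shuffled)      # fixed seed keeps the split repeatable
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


# Placeholder records standing in for the rows of a real dataset.
records = list(range(10))
train, test = train_test_split(records, test_fraction=0.2)
print(len(train), len(test))  # 8 2
```

The model is fitted on `train` only, and `test` is held back to check how well it handles data it has not seen.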
Here I list some popular dataset resources on the Internet for machine learning projects.
1-Kaggle Datasets Repository (https://www.kaggle.com/datasets)
Almost 17,000 datasets are freely available, covering students, marketing, business, cancer, diabetes, plants, social media, sports, and many more topics. You can download the datasets for free after logging in to the website. Kaggle is now owned by Google, a subsidiary of Alphabet Inc.
2-UCI (University of California, Irvine) Datasets Repository (https://archive.ics.uci.edu/ml/datasets.php)
A very popular dataset repository from the University of California, Irvine. It offers a wide variety of datasets. To download them, use the URL in the heading.
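Many downloaded datasets arrive as comma-separated text files, and some of them have no header row, so the column names must be supplied by hand. The sketch below shows one way to read such a file with the standard `csv` module. The column names and the two sample rows are made up for illustration; a real dataset documents its own columns.

```python
import csv
import io

# Hypothetical column names; take the real ones from the dataset's documentation.
COLUMNS = ["length", "width", "label"]


def read_headerless_csv(text, columns):
    """Parse comma-separated text with no header row into a list of dicts."""
    rows = []
    for fields in csv.reader(io.StringIO(text)):
        if not fields:                 # csv.reader yields [] for blank lines
            continue
        if len(fields) != len(columns):
            raise ValueError(f"expected {len(columns)} fields, got {len(fields)}")
        rows.append(dict(zip(columns, fields)))
    return rows


# Made-up sample content standing in for a downloaded file.
SAMPLE = "5.1,3.5,a\n4.9,3.0,b\n\n"
rows = read_headerless_csv(SAMPLE, COLUMNS)
print(rows[0])
```

For a file on disk, the same function can be called with the result of `open(path, newline="").read()`. All values are returned as strings, so numeric columns still need to be converted before training.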
3-KDNuggets Datasets Links (https://www.kdnuggets.com/datasets/index.html)
KDNuggets does not host a data repository of its own. Instead, it provides links to authentic data sources.
4-Government of India Datasets (https://data.gov.in/resource-category/dataset)
The Government of India provides data on policy implementation, health improvement, employment, and other public and state-related policies. These datasets are freely available, and you can use them in your projects.
5-Berkeley School of Information, University of California, Berkeley Datasets (https://datascience.berkeley.edu/open-data-sets/)
This repository contains the following types of datasets:
1-United States government and demographics
2-International government and demographics
3-Health
4-Science
5-Technology and APIs
6-Sports and Entertainment
7-General Aggregation Sites
Source: https://datascience.berkeley.edu/open-data-sets/
6-Stanford University Datasets (https://snap.stanford.edu/data/)
This repository contains datasets focused on social networks and the Internet.
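Network datasets are often distributed as plain-text edge lists: one pair of node IDs per line, with comment lines that start with `#`. The sketch below assumes that layout, which should be checked against the description of the specific dataset you download, and treats every edge as undirected. The function name `parse_edge_list` and the four sample edges are made up for illustration.

```python
from collections import defaultdict


def parse_edge_list(text):
    """Build an undirected adjacency map from whitespace-separated edge lines."""
    adjacency = defaultdict(set)
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):   # skip blanks and comment lines
            continue
        source, target = line.split()[:2]      # first two fields are the node IDs
        adjacency[source].add(target)
        adjacency[target].add(source)          # assumption: edges are undirected
    return adjacency


# Made-up sample standing in for a downloaded edge-list file.
SAMPLE = "# FromNodeId\tToNodeId\n1\t2\n1\t3\n2\t3\n3\t4\n"
graph = parse_edge_list(SAMPLE)
degrees = {node: len(neighbors) for node, neighbors in graph.items()}
print(degrees)  # {'1': 2, '2': 2, '3': 3, '4': 1}
```

For a directed network, drop the line that adds the reverse edge. Node IDs are kept as strings here; convert them with `int()` if the dataset uses numeric IDs and you need arithmetic on them.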