Cool Datasets

A place to find cool datasets. Follow us on Twitter for updates! @cooldatasets

Now anyone can submit datasets! Submit


Government-Datasets

City of Chicago Employee Salaries
Salaries of City of Chicago employees
Toxic Inventory Chemicals
The Toxics Release Inventory (TRI) makes available information for more than 600 toxic chemicals
2016 Election Results by State
2016 National popular vote tracker compiled by David Wasserman
Crime in the United States.csv
by Volume and Rate per 100,000 Inhabitants, 1994–2013. Includes Violent Crimes, Murders, Rapes, Bu
Hip and Knee Complications Dataset from the CDC
This data set includes provider data for the hip/knee complication measure, and the Agency for Healthcare Research and Quality (AHRQ) measures of serious complications.
Payment and Value of Hospital Care
This data set includes provider data for the payment measures and value of care displays associated with a 30-day episode of care for heart attack, heart failure, and pneumonia patients.
City of Phoenix Employee Salaries
City officials' salaries for the City of Phoenix, Arizona.
Construction Activity in the United States
United States Department of Commerce dataset of total value of construction currently put in place.
Amendments in America
11,000 proposed amendments to the United States Constitution from 1787-2014
Louisville Crime Statistics
Crime in Louisville, Kentucky from 2003 to 2016
Police Cruiser Districts
Dataset of police cruiser district locations in Columbus, Ohio
FCC Complaint Calls
List of informal consumer complaint calls regarding unwanted robocalls and telemarketing calls.
White House Staff Salaries Dataset
Information on the salaries of staff at the White House
EU Climate Change Mitigation Policies
This dataset contains a number of climate change mitigation policies and measures (PAM) implemented or planned by European countries to reduce greenhouse gas emissions.
Officer Involved Shootings Austin Texas
Officer-involved shootings in Austin, Texas, from 2000 to 2014
Hillary Clinton Income Taxes
Adjusted gross income and taxes owed by Hillary Clinton for each year from 2000 to 2015.
Presidential Debate Tweets
2000 tweets immediately following the first Presidential Debate in September 2016
The Open Data Dataset
A dataset containing the Open Data Portals of 100 of America's largest cities
White House Nominations
800 White House nominations and appointments

Science-Datasets


Entertainment-Datasets

Top 100 Rotten Tomatoes Movies
Movies with 40 or more critic reviews vie for their place in history at Rotten Tomatoes. Eligible movies are ranked based on their Adjusted Scores.
Spotify Songs
50 Most Streamed Spotify Songs
Bookie Backer Football Datasets
Weekly updated football datasets.
TED Talks Dataset
Master list of 2,600 TED Talks and descriptions
Top 500 Albums
Dataset of Rolling Stone's 500 greatest albums of all time

Machine-Learning-Datasets

Stanford Drone Dataset
Images and videos of various types of agents (not just pedestrians, but also bicyclists, skateboarders, cars, buses, and golf carts) navigating a real-world outdoor environment
20 Newsgroups Dataset
This data set consists of 20,000 messages taken from 20 Usenet newsgroups.
Hate Speech Identification
A sampling of Twitter posts that have been judged based on whether they are offensive or contain hate speech, as a training set for text analysis.
Forest Fire Dataset
The aim of this data is to predict the burned area of forest fires, in the northeast region of Portugal, by using meteorological and other data.
Image Processing Datasets
Curated datasets from Computer Vision Online
Natural Language Question and Answer Dataset
The largest human-created question-answer dataset for natural language processing
Microsoft MARCO Dataset
A reading comprehension dataset for AI research
2000 Positive Words Sentiment Dataset
2000 positive words used for sentiment analysis
YouTube's 8M Dataset
8 million video URLs, 500K hours of video
Comma.AI Driving Dataset
7 hours of self-driving training data from Comma.ai
Uber Movement
Anonymized data from over 2 billion Uber trips.
Standard Reimbursement Rates for Travel
200,000 standard reimbursement rates for travel among various U.S. destinations
Galton's Pea Dataset
Francis Galton introduced the correlation coefficient with an analysis of the similarities of the parent and child generation of 700 sweet peas.
Diamond Quality
Sample dataset of 350 diamonds, their color, size, clarity, and price
Deep Fashion
Categorized database of 800,000 fashion images
Wells Fargo Deposits
Wells Fargo branch deposits by US states and counties
Instacart Orders and Customers
3 million open-sourced Instacart orders

Miscellaneous-Datasets