Cool Datasets

A place to find cool datasets. Follow us on Twitter for updates! @cooldatasets

Now anyone can submit datasets! Submit


Government-Datasets

City of Chicago Employee Salaries
Salaries for City of Chicago employees
Toxic Inventory Chemicals
The Toxics Release Inventory (TRI) makes available information for more than 600 toxic chemicals
2016 Election Results by State
2016 National popular vote tracker compiled by David Wasserman
Crime in the United States.csv
by Volume and Rate per 100,000 Inhabitants, 1994–2013. Includes Violent Crimes, Murders, Rapes, Bu
Hip and Knee Complications Dataset from the CDC
This data set includes provider data for the hip/knee complication measure, and the Agency for Healthcare Research and Quality (AHRQ) measures of serious complications.
Payment and Value of Hospital Care
This data set includes provider data for the payment measures and value of care displays associated with a 30-day episode of care for heart attack, heart failure, and pneumonia patients.
City of Phoenix Employee Salaries
City officials' salaries for the City of Phoenix, Arizona.
Construction Activity in the United States
United States Department of Commerce dataset of total value of construction currently put in place.
Amendments in America
11,000 proposed amendments to the United States Constitution from 1787 to 2014
Louisville Crime Statistics
Crime in Louisville, Kentucky from 2003 to 2016
Police Cruiser Districts
Dataset of police cruiser district locations in Columbus, Ohio
FCC Complaint Calls
List of informal consumer complaint calls regarding unwanted robocalls and telemarketing calls.
White House Staff Salaries Dataset
Information on the salaries of staff at the White House
EU Climate Change Mitigation Policies
This dataset contains a number of climate change mitigation policies and measures (PAM) implemented or planned by European countries to reduce greenhouse gas emissions.
Officer Involved Shootings Austin Texas
Officer-involved shootings in Austin, Texas from 2000 to 2014
Hillary Clinton Income Taxes
Adjusted gross income and taxes owed by Hillary Clinton for each year from 2000 to 2015.
Presidential Debate Tweets
2000 tweets immediately following the first Presidential Debate in September 2016
The Open Data Dataset
A dataset containing the Open Data Portals of 100 of America's largest cities
White House Nominations
800 White House nominations and appointments

Science-Datasets

Electric Arc Shock Tube (EAST) Test 59 Data
The Ames Electric Arc Shock Tube (EAST) Facility is the only shock tube...
NASA Financial Budget Documents, Strategic Plans and Performance Reports 1997: NASA Budget
NASA Financial Budget Documents, Strategic Plans and Performance Reports for fiscal year 1997.
British Library Labs Collections
A comprehensive list of research resources, including catalogues, metadata, archives and records, which can be accessed both online
Atlantic Offshore Seabird Datasets
60 East Coast data sets from 1906 to 2009, with over 260,000 records of seabird observations.
USDA Plant Species
90,000 entries, the Complete PLANTS Checklist is nearly 7 MB and includes Symbol, Synonym Symbol, Scientific Name with Authors, National Common Name, and Family.
Meteorite Landings Dataset
45,000 recorded NASA meteorite landings.
Near Earth Comets Dataset
Orbital elements of near-Earth comets
Digitally Constructed Neurons
50k digitally-constructed and downloadable neurons
NASA Fireball And Bolide Reports
Chronological data summary of fireball and bolide events provided by U.S. Government sensors.
Extra-vehicular Activity of US and Russian Astronauts
Activities performed by an astronaut or cosmonaut outside a spacecraft beyond the Earth's appreciable atmosphere dating back to 1965.
Hurricane Tracking Dataset
Detailed tracking and info for tropical storms and hurricanes in the North Atlantic since 1851.
Historical Global Emissions Dataset
Global carbon emissions from 1751 to 2013, from the Carbon Dioxide Information Analysis Center, Oak Ridge National Laboratory, U.S. Department of Energy
Sunspots
Monthly numbers of sunspots from 1749 to 2013, from the World Data Center (also known as SIDC)
Weight Gain in Rats
Ten rats are randomized to each of the four treatments. The question of interest is how diet affects weight gain. Variables: source (source of protein given, a factor with levels Beef and Cereal), type (amount of protein given, a factor with levels High and Low), and weightgain (weight gain in grams).
Nitrogen Pollution Loads
Nitrogen pollution from contributing sources in Bay watershed, pounds per year.
Acceleration Due to Gravity
Between May 1934 and July 1935, the National Bureau of Standards in Washington D.C. conducted a series of experiments to estimate the acceleration due to gravity, g, at Washington.
First Observation of Gravitational Waves
Also known as the GW150914 event, this observation from LIGO confirmed a prediction of Einstein's general theory of relativity

Entertainment-Datasets

Top 100 Rotten Tomatoes Movies
Movies with 40 or more critic reviews vie for their place in history at Rotten Tomatoes. Eligible movies are ranked based on their Adjusted Scores.
Spotify Songs
50 Most Streamed Spotify Songs
Bookie Backer Football Datasets
Weekly updated football datasets.
TED Talks Dataset
Master list of 2,600 TED Talks and descriptions
Top 500 Albums
Dataset of Rolling Stone's 500 greatest albums of all time

Machine-Learning-Datasets

Stanford Drone Dataset
Images and videos of various types of agents (not just pedestrians, but also bicyclists, skateboarders, cars, buses, and golf carts) that navigate in a real world outdoor environment
20 Newsgroups Dataset
This data set consists of 20,000 messages taken from 20 Usenet newsgroups.
Hate Speech Identification
A sampling of Twitter posts that have been judged based on whether they are offensive or contain hate speech, as a training set for text analysis.
Forest Fire Dataset
The aim of this data is to predict the burned area of forest fires, in the northeast region of Portugal, by using meteorological and other data.
Image Processing Datasets
Curated datasets from Computer Vision Online
Natural Language Question and Answer Dataset
The largest human created question answer dataset for natural language processing
Microsoft MARCO Dataset
A reading comprehension dataset for AI research
2000 Positive Words Sentiment Dataset
2000 positive words used for sentiment analysis
YouTube's 8M Dataset
8 million video URLs, 500K hours of video
Comma.AI Driving Dataset
7 hours of self-driving training data from Comma.ai
Uber Movement
Anonymized data from over 2 billion Uber trips.
Standard Reimbursement Rates for Travel
200,000 standard reimbursement rates for travel among various U.S. destinations
Galton's Pea Dataset
Francis Galton introduced the correlation coefficient with an analysis of the similarities of the parent and child generation of 700 sweet peas.
Diamond Quality
Sample dataset of 350 diamonds, their color, size, clarity, and price
Deep Fashion
Categorized database of 800,000 fashion images
Wells Fargo Deposits
Wells Fargo branch deposits by US states and counties
Instacart Orders and Customers
3 million Instacart orders, open sourced

Miscellaneous-Datasets

Titanic Passengers Dataset
Passenger information from the Titanic
United States Patents
United States patent information from 1790 to 2015
Global Airports Dataset
Name, City, Country, and Lat/Lon of 5000 Airports Around the World.
Oil Prices
Historical dataset of nominal and inflation-adjusted oil prices since 1918
NYC Restaurant Inspections
This dataset provides restaurant inspections, violations, grades and adjudication information
NYC High School SAT Scores
Average SAT scores in reading, writing, and math for all NYC high schools
Air Traffic Dataset
San Francisco International Airport Report on Monthly Passenger Traffic Statistics by Airline.
EPA Fuel Economy
Fuel economy data are the result of vehicle testing done at the Environmental Protection Agency's National Vehicle and Fuel Emissions Laboratory in Ann Arbor, Michigan, and by vehicle manufacturers with oversight by EPA.
Beer Styles Dataset
A crowdsourced database of how well beer styles (Stout, Pale Ale, etc.) and additions (chocolate, bacon, cherry) go with each other.
Water Use Dataset
Monthly residential water usage by zip code. Numbers represent usage in Hundred Cubic Feet (HCF). Records from 2005 to 2013.
United States Birth Rates
Birth Rates, by Age of Mother in the United States from 1940
Popular Baby Boy Names in Illinois
Top 25 boy names, each year from 1980-2013 including frequency.
Food Recalls by Brand
Most common food recalls by brand since 2009.
Homeless Population Dataset
Homeless population in New York City neighborhoods by year
EU Bank Interest Rates
This dataset covers euro-denominated deposits with an agreed maturity from euro area households (percentages per annum, rates on new business).
Barbershop Locations in Texas
3,000 Barbershop locations in Texas.
Jail Bookings Dataset
Miami-Dade Corrections jail bookings from May 29, 2015 to the present.
Valet Parking Dataset
Valet Parking by District, Facility, and Locations in Philadelphia
Los Angeles Businesses
Listing of 470,000 business names and locations in Los Angeles
Street Trees
List of street trees maintained by the San Francisco Department of Public Works (DPW), including planting date, species, and location
Public Libraries
Dataset of all public libraries in the United States
Death Probability Since 1900
Historical and projected probabilities of death by single year of age, gender, and year for the period 1900 through 2010. Death probabilities for males.
STDs Nationally Ranked
U.S. states ranked by reported cases of chlamydia, gonorrhea, and primary and secondary syphilis.
The World's Telephones by Year
The world's telephones by continent in the years 1951, 1956, 1957, 1958, 1959, 1960, 1961
International Energy Consumption
These data list total primary energy consumption by country and region in quadrillion Btu. Figures are annual totals for the years 1980 through 2008.
NYC Subways
Locations of 1,900 New York City subway entrances
Immigration to Ellis Island (1892-1924)
Dataset by trip, dates, ports, ships, and passengers.
Sovereign Bond Holdings Dataset
Data on sectorial holdings of sovereign bonds for 12 countries
1 million digits of Pi
Not necessarily a dataset but still cool
Kickstarter Datasets
Monthly datasets of all campaigns from Kickstarter.com
World Internet Users
A yearly look at the number of internet users around the world.