Princeton Web Census Data Release

We are releasing the entire Princeton Web Census data containing privacy measurements of 1 million sites conducted regularly from December 2015 to June 2019.

By Steven Englehardt, Gunes Acar, Dillon Reisman, and Arvind Narayanan.

Since 2015, we have conducted a web census to study third-party online tracking. Each month, our bot visits the web’s 1 million most popular sites and records data pertaining to user privacy, including cookies, fingerprinting scripts, the effect of browser privacy tools, and the exchange of tracking data between different sites ("cookie syncing").
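To make the idea of cookie syncing concrete, here is a minimal sketch in Python. It is not the census's actual detection method. It assumes a simplified heuristic in which a cookie value set by one domain shows up inside a request URL sent to a different domain. The function name, the minimum-length threshold, and the sample data are all hypothetical and chosen only for illustration.

```python
from urllib.parse import urlparse


def find_cookie_syncs(cookies, request_urls, min_len=8):
    """Return (cookie_domain, receiving_host, value) triples.

    A triple is reported when a cookie value of at least `min_len`
    characters appears in a request URL whose host is outside the
    cookie's own domain. This is a toy heuristic, not a real detector.
    """
    syncs = []
    for cookie_domain, value in cookies:
        # Short values are too likely to match by coincidence.
        if len(value) < min_len:
            continue
        for url in request_urls:
            host = urlparse(url).hostname or ""
            same_party = host == cookie_domain or host.endswith("." + cookie_domain)
            if not same_party and value in url:
                syncs.append((cookie_domain, host, value))
    return syncs


# Hypothetical sample data: (cookie domain, cookie value) pairs and request URLs.
cookies = [
    ("tracker-a.example", "u7f3k29xq1z8"),
    ("tracker-b.example", "abc"),
]
request_urls = [
    "https://sync.tracker-b.example/match?partner=a&uid=u7f3k29xq1z8",
    "https://cdn.tracker-a.example/pixel.gif?id=u7f3k29xq1z8",
    "https://news.example/index.html",
]

print(find_cookie_syncs(cookies, request_urls))
```

In this sample, only the first request is flagged: the identifier set by tracker-a.example is passed to a host under tracker-b.example. The second request carries the same value but goes to a subdomain of the cookie's own domain, so it is not counted.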

Our open-source measurement software, OpenWPM, has been used in dozens of other studies. In 2016 we published a paper "Online Tracking: A 1-million-site Measurement and Analysis" based on a snapshot of this data, and released that snapshot.

Now we are releasing the entire Princeton Web Census data -- about 15 terabytes -- containing privacy measurements of 1 million sites conducted each month from December 2015 to June 2018.

We plan to run one or two more crawls in the next few months (until mid-2019), and we will update this data release periodically. (Update: the November 2018 and June 2019 crawls have been added to the release.)

Send an email to web-census-data@lists.cs.princeton.edu to request access to the dataset. Please tell us who you are and give a high-level description of what you plan to use the data for. (We'll approve all requests, but we'd like to get an idea of how people are using the data.)

Each month, we run measurements in eight configurations at scales ranging from 10,000 sites to 1 million sites, summarized here. Please visit the dataset details page for usage information, a timeline of changes, and known issues with the data.

Type of measurement | Number of sites | Sample (1,000 sites)
Stateless (cookies and other state cleared between visits to different sites) | 1,000,000 | Sample
Stateful (cookies and other state are loaded from a seed profile of a 10K-site crawl) | 100,000 | Sample
Automatic detection of identifying cookies enabled (stateful; cookies and other state retained between visits to different sites) | 25,000 | Sample A, Sample B
Visit home page + 4 inner pages per site (all other measurements visit only one page per site, the home page; this and all following measurements are stateless) | 10,000 | Sample
Ghostery installed and set to block all possible trackers | 50,000 | Sample
HTTPSEverywhere installed | 50,000 | Sample
Firefox set to block all third-party cookies | 50,000 | Sample
DoNotTrack header turned on | 50,000 | Sample