Rescue crawls conducted by the public for sites that have announced that they are closing.
Crawl done with the DEC/HP-labs 'Mercator' crawler and converted to ARC format. This data is currently not publicly accessible.
IDs of tweets that mention Ferguson, Missouri between August 10th and August 27th, 2014, subsequent to the death of Michael Brown. Tweets collected by Ed Summers. He subsequently extracted the URLs from these tweets, and they were crawled by the Internet Archive. Please read Summers's article at inkdroid.org, with an update here, for more information. Photo: "Memorial to Michael Brown" by Jamelle Bouie
Web crawl snapshots generously donated by Accelovation. This data is currently not publicly accessible. From the site: Accelovation is pioneering the delivery of Insight Discovery™ software solutions that help companies move from innovation idea to product reality faster and with more success. Our solutions are used by leading firms in the Fortune 500 and beyond – companies from a diverse set of industries ranging from consumer packaged goods to high tech, foods to chemicals, and...
Demo crawl for the National Oceanic and Atmospheric Administration (NOAA). This data is currently not publicly accessible. From Wikipedia: The National Oceanic and Atmospheric Administration (NOAA) is a scientific agency within the United States Department of Commerce focused on the conditions of the oceans and the atmosphere. NOAA warns of dangerous weather, charts seas and skies, guides the use and protection of ocean and coastal resources, and conducts research to improve understanding and...
Crawl performed by Internet Archive. This data is currently not publicly accessible.
Crawl performed by Internet Archive. This data is currently not publicly accessible.
Crawl performed by Internet Archive. This data is currently not publicly accessible.
Crawl performed by Internet Archive. This data is currently not publicly accessible.
Crawl performed by Internet Archive. This data is currently not publicly accessible.
Crawl BK from Alexa Internet. This data is currently not publicly accessible.
MP3.com Crawl from Alexa Internet. This data is currently not publicly accessible.
Crawl CRC from Alexa Internet. This data is currently not publicly accessible.
Crawl DJ from Alexa Internet. This data is currently not publicly accessible.
Immersive gaming environments R&D project for the National Digital Information Infrastructure and Preservation Program. This data is currently not publicly accessible. From Wikipedia: The National Digital Information Infrastructure and Preservation Program (NDIIPP) is an archival program led by the Library of Congress to archive and provide access to digital resources. The U.S. Congress established the program in 2000. The Library was chosen because of its role as one of the leading providers of...
Crawl performed by Internet Archive. This data is currently not publicly accessible.
Data collected by Internet Archive on behalf of the University of Michigan. This data is currently not publicly accessible. From Wikipedia: The University of Michigan, frequently referred to as simply Michigan, is a public research university located in Ann Arbor, Michigan, United States. It is the state's oldest university and the flagship campus of the University of Michigan.
2002 Election Crawl from Alexa Internet. This data is currently not publicly accessible.
Crawl TO from Alexa Internet. This data is currently not publicly accessible.
Data collected in 2005 by Internet Archive. This data is currently not publicly accessible.
Data related to Hurricane Katrina collected in 2005 by Internet Archive. This data is currently not publicly accessible. From Wikipedia: Hurricane Katrina was the deadliest and most destructive Atlantic hurricane of the 2005 Atlantic hurricane season. It was the costliest natural disaster, as well as one of the five deadliest hurricanes, in the history of the United States. Among recorded Atlantic hurricanes, it was the sixth strongest overall. At least 1,833 people died in the hurricane and...
Data related to September 11th, 2001 collected by Internet Archive. This data is currently not publicly accessible. From Wikipedia: The September 11 attacks (also referred to as September 11, September 11th, or 9/11) were a series of four coordinated terrorist attacks launched by the Islamic terrorist group al-Qaeda upon the United States in New York City and the Washington, D.C. areas on September 11, 2001.
1996 Election Crawl from Alexa Internet. This data is currently not publicly accessible.
Crawl DL from Alexa Internet. This data is currently not publicly accessible.
Crawl ARC from Alexa Internet. This data is currently not publicly accessible.
Crawl ST from Alexa Internet. This data is currently not publicly accessible.
Crawl Robot from Alexa Internet. This data is currently not publicly accessible.
Mayoral crawls performed by Internet Archive. This data is currently not publicly accessible.
Demo crawl for the National Science Digital Library. This data is currently not publicly accessible. From Wikipedia: The United States' National Science Digital Library (NSDL) is an open-access online digital library and collaborative network of disciplinary and grade-level focused education providers. NSDL's mission is to provide quality digital learning collections to the science, technology, engineering, and mathematics (STEM) education community, both formal and informal, institutional and...
Pages captured from Yahoo! Video prior to removal of user uploads. Crawl started February 2011. This data is currently not publicly accessible. From Wikipedia: Yahoo! Video is a video sharing website on which users could upload and share videos. The service is owned and created by Yahoo! Yahoo! Video began as an internet-wide video search engine and added the ability to upload and share video clips in June 2006. A re-designed site was launched in February 2008 that changed the focus to...
Crawl performed by Internet Archive. This data is currently not publicly accessible.
Crawl performed by Internet Archive. This data is currently not publicly accessible.
Crawl performed by Internet Archive. This data is currently not publicly accessible.
Crawls of the French domain space performed by Internet Archive on behalf of the Bibliothèque nationale de France. This data is currently not publicly accessible.
Crawl DZ from Alexa Internet. This data is currently not publicly accessible.
Crawl EH from Alexa Internet. This data is currently not publicly accessible.
TEST COLLECTION: Crawl of .edu and .gov sites started in June 2010.
Topic: crawldata
Collaborative closure crawl of British government sites performed by Internet Archive. This data is currently not publicly accessible. From Wikipedia: GOV.UK is a United Kingdom public sector information website, created by the Government Digital Service to provide a single point of access to HM Government services.
Target product crawl data collected by Alexa Internet. This data is currently not publicly accessible.
Crawl performed by Internet Archive. This data is currently not publicly accessible.
Crawl performed by Internet Archive. This data is currently not publicly accessible.
Crawl performed by Internet Archive. This data is currently not publicly accessible.
Crawl performed by Internet Archive. This data is currently not publicly accessible.
Crawl performed by Internet Archive. This data is currently not publicly accessible.
Crawl Image from Alexa Internet. This data is currently not publicly accessible.
Crawl EI from Alexa Internet. This data is currently not publicly accessible.
Crawl EE from Alexa Internet. This data is currently not publicly accessible.
Crawl RECY from Alexa Internet. This data is currently not publicly accessible.
Crawl Title from Alexa Internet. This data is currently not publicly accessible.
Data collected in 2001. This data is currently not publicly accessible. From Wikipedia: Inktomi Corporation was a California company that provided software for Internet service providers. It was founded in 1996 by UC Berkeley professor Eric Brewer and graduate student Paul Gauthier. The company was initially founded based on the real-world success of the web search engine they developed at the university. After the bursting of the dot-com bubble, Inktomi was acquired by Yahoo!
Data collected by Internet Archive on behalf of the National Library of Sweden. This data is currently not publicly accessible.
Data collected in 2005. This data is currently not publicly accessible.
Crawl performed by Internet Archive. This data is currently not publicly accessible.
Crawl performed by Internet Archive. This data is currently not publicly accessible.
Crawl performed by Internet Archive. This data is currently not publicly accessible.
Data collected by Internet Archive. This data is currently not publicly accessible.
Crawls performed by Internet Archive on behalf of the National Library of Ireland. This data is currently not publicly accessible.
2004 Election crawl performed by Internet Archive. This data is currently not publicly accessible.
Crawl DX from Alexa Internet. This data is currently not publicly accessible.
Crawl AUG from Alexa Internet. This data is currently not publicly accessible.
Crawl Test from Alexa Internet. This data is currently not publicly accessible.
Crawl Short from Alexa Internet. This data is currently not publicly accessible.
Crawl DH from Alexa Internet. This data is currently not publicly accessible.
Crawl GR from Alexa Internet. This data is currently not publicly accessible.
Crawl TS from Alexa Internet. This data is currently not publicly accessible.
Data related to the Nigerian elections, 2001, collected by Internet Archive. This data is currently not publicly accessible.
Crawl of vox.com, September 2010. This was an attempt to preserve vox.com content as much as possible in the wake of service closure, September 30, 2010.
Topic: webwidecrawl
Data related to the 2004 Indian Ocean earthquake and tsunami collected by Internet Archive. This data is currently not publicly accessible.
Crawl data gathered by Internet Archive on behalf of the Brookings Institution. This data is currently not publicly accessible.
Crawl performed by Internet Archive. This data is currently not publicly accessible.
Demo crawl of scientific data. This data is currently not publicly accessible.
YouTube crawl performed by Internet Archive on behalf of the National Digital Information Infrastructure and Preservation Program. This data is currently not publicly accessible.
Product DB data collected by Alexa Internet. This data is currently not publicly accessible.
Standards crawl data collected by Internet Archive. This data is currently not publicly accessible.
Data collected by Internet Archive on behalf of the Swiss National Library. This data is currently not publicly accessible.
Crawl performed by Internet Archive. This data is currently not publicly accessible.
Crawl performed by Internet Archive. This data is currently not publicly accessible.
2000 Election Crawl from Alexa Internet. This data is currently not publicly accessible.
Crawl F2 from Alexa Internet. This data is currently not publicly accessible.
Sanitized traffic files from Alexa Internet, containing only base URLs (no parameters) and time/date. Covers the period from December 2001 to February 2009. This data is currently not publicly accessible.
Retrieved from wikipedia.org on April 8, 2010