1,308
UPLOADS

Internet Archive Web Crawls
collection
1,669,191
ITEMS
44.5B
VIEWS

The Internet Archive discovers and captures web pages through many different web crawls. At any given time, several distinct crawls are running: some run for months, and others run every day or at longer intervals. View the web archive through the Wayback Machine.
Topic: webwidecrawl
Worldwide Web Crawls
collection
634,931
ITEMS
18.5B
VIEWS

Wide crawls of the Internet conducted by Internet Archive. Please visit the Wayback Machine to explore archived web sites. Since September 10th, 2010, the Internet Archive has been running Worldwide Web Crawls of the global web, capturing web elements, pages, sites and parts of sites. Each Worldwide Web Crawl was initiated from one or more lists of URLs that are known as "Seed Lists". Descriptions of the Seed Lists associated with each crawl may be provided as part of the metadata for...
Alexa Crawls
collection
226,901
ITEMS
16.5B
VIEWS

Since 1996, Alexa Internet has been donating its crawl data to the Internet Archive. Flowing in every day, these data are added to the Wayback Machine after an embargo period.
Topics: web crawl, Alexa
Live Web Proxy Crawls
collection
108,957
ITEMS
11.1B
VIEWS

Content crawled via the Wayback Machine Live Proxy, mostly by the Save Page Now feature on web.archive.org. The liveweb proxy is a component of the Internet Archive's Wayback Machine project. It captures the content of a web page in real time, archives it into an ARC or WARC file, and returns the ARC/WARC record to the Wayback Machine for processing. The recorded ARC/WARC file becomes part of the Wayback Machine in due course.
Archive Team
collection
3,647,235
ITEMS
5B
VIEWS

Formed in 2009, the Archive Team (not to be confused with the archive.org Archive-It Team) is a rogue archivist collective dedicated to saving copies of rapidly dying or deleted websites for the sake of history and digital heritage. The group is 100% composed of volunteers and interested parties, and has expanded into a large number of related projects for saving online and digital history. History is littered with hundreds of conflicts over the future of a community, group, location or...
Survey Crawls
collection
100,903
ITEMS
11.2B
VIEWS

Survey crawls are run about twice a year, on average, and attempt to capture the content of the front page of every web host ever seen by the Internet Archive since 1996.
Topic: survey crawls
Fix Broken Links Web Crawls
collection
169,696
ITEMS
4B
VIEWS

These crawls are part of an effort to archive pages as they are created and archive the pages that they refer to. That way, as the pages that are referenced are changed or removed from the web, a link to the version that was live when the page was written will be preserved. The Internet Archive then hopes that references to these archived pages will be put in place of a link that would otherwise be broken, or a companion link to allow people to see what was originally intended by a page's...
Wikipedia Outlinks
collection
107,183
ITEMS
2.4B
VIEWS

Crawl of outlinks from wikipedia.org. These files are currently not publicly accessible. From Wikipedia: Wikipedia is a multilingual, web-based, free-content encyclopedia project operated by the Wikimedia Foundation and based on an openly editable model. The name "Wikipedia" is a portmanteau of the words wiki (a technology for creating collaborative websites, from the Hawaiian word wiki, meaning "quick") and encyclopedia. Wikipedia's articles provide links to guide the...
Common Crawl
collection
30,106
ITEMS
771.5M
VIEWS

Web crawl data from Common Crawl.
Wikipedia Near Real Time (from IRC)
collection
18,250
ITEMS
1.8B
VIEWS

This is a collection of web page captures from links added to, or changed on, Wikipedia pages. The idea is to bring reliability to Wikipedia outlinks so that if the pages referenced by Wikipedia articles are changed, or go away, a reader can permanently find what was originally referred to. This is part of the Internet Archive's attempt to rid the web of broken links.
Topics: Wikipedia, Wikimedia
collection
2.1B
VIEWS

The seed for this crawl was a list of every host in the Wayback Machine. This crawl was run at level 1 (URLs including their embeds, plus the URLs of all outbound links including their embeds). The WARC files associated with this crawl are not currently available to the general public.
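The "level 1" scope described above can be pictured as a hop-limited traversal of a link graph. The following is only an illustrative sketch, not the crawler's actual implementation: the `crawl_scope` function, the `links` and `embeds` dictionaries, and the example URLs are all hypothetical names made up for this example.

```python
from collections import deque


def crawl_scope(seeds, links, embeds, max_hops=1):
    """Return the set of URLs a hop-limited crawl would capture.

    `links` and `embeds` are hypothetical dicts mapping a URL to the
    URLs it links to or embeds. Embeds are captured together with their
    page and do not count as a hop; each outbound link counts as one hop.
    """
    captured = set()   # everything that ends up in the archive
    visited = set()    # pages whose links and embeds were already processed
    queue = deque((url, 0) for url in seeds)
    while queue:
        url, hops = queue.popleft()
        if url in visited:
            continue
        visited.add(url)
        captured.add(url)
        captured.update(embeds.get(url, ()))
        if hops < max_hops:
            for target in links.get(url, ()):
                queue.append((target, hops + 1))
    return captured


# Toy graph (assumed data): a links to b, b links to c.
links = {"a.example/": ["b.example/"], "b.example/": ["c.example/"]}
embeds = {
    "a.example/": ["a.example/logo.png"],
    "b.example/": ["b.example/style.css"],
}
level_1 = crawl_scope(["a.example/"], links, embeds, max_hops=1)
level_0 = crawl_scope(["a.example/"], links, embeds, max_hops=0)
```

In this toy graph, `level_1` contains the seed, its embed, the linked page `b.example/`, and that page's embed, but not `c.example/`. With `max_hops=0`, only the seed and its embed are captured.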
collection
1.5B
VIEWS

Web wide crawl.
collection
1.3B
VIEWS

Web wide crawl number 16. The seed list for Wide00016 was made from the join of the top 1 million domains from Cisco and the top 1 million domains from Alexa.
GDELT
collection
57,657
ITEMS
1.2B
VIEWS

A daily crawl of more than 200,000 home pages of news sites, including the pages linked from those home pages. Site list provided by The GDELT Project.
Topics: GDELT, News
collection
1.3B
VIEWS

The seeds for this crawl came from:
251 million domains that had at least one link from a different domain in the Wayback Machine, across all time
~300 million domains that we had in the Wayback Machine, across all time
55,945,067 domains from https://archive.org/details/wide00016
This crawl was run with a Heritrix setting of "maxHops=0" (URLs including their embeds). The WARC files associated with this crawl are not currently available to the general public.
Wide Crawl Number 12 - started March, 14th 2015
collection
49,621
ITEMS
1.4B
VIEWS

Web wide crawl with initial seedlist and crawler configuration from January 2015.
Wide Crawl started April 2013
collection
25,035
ITEMS
1.4B
VIEWS

Web wide crawl with initial seedlist and crawler configuration from April 2013.
collection
1.4B
VIEWS

The seed for this crawl was a list of every host in the Wayback Machine. This crawl was run at level 1 (URLs including their embeds, plus the URLs of all outbound links including their embeds). The WARC files associated with this crawl are not currently available to the general public.
Wide Crawl started June 2014
collection
45,341
ITEMS
1.3B
VIEWS

Web wide crawl with initial seedlist and crawler configuration from June 2014.
Wordpress Blogs and the Pages They Link To
collection
92,388
ITEMS
860.4M
VIEWS

This is a collection of pages and embedded objects from WordPress blogs and the external pages they link to. Captures of these pages are made on a continuous basis, seeded from a feed of new or changed pages hosted by WordPress.com or WordPress pages hosted by sites running a properly configured Jetpack WordPress plugin.
Topics: Wordpress.com, blogs, jetpack
collection
1.1B
VIEWS

The seed for this crawl was a list of every host in the Wayback Machine. This crawl was run at level 1 (URLs including their embeds, plus the URLs of all outbound links including their embeds). The WARC files associated with this crawl are not currently available to the general public.
Audio Books & Poetry
collection
103,439
ITEMS
1.9B
VIEWS

Listen to free audio books and poetry recordings! This library of audio books and poetry features digital recordings and MP3s from the Naropa Poetics Audio Archive, LibriVox, Project Gutenberg, Maria Lectrix, and Internet Archive users.
Wide Crawl Number 13
collection
46,050
ITEMS
1B
VIEWS

Web Wide Crawl Number 13
Wayback Indexes
collection
554
ITEMS
1.2B
VIEWS

Wayback indexes. This data is currently not publicly accessible.
Community Images
collection
397,931
ITEMS
805M
VIEWS

Images contributed by Internet Archive users and community members. These images are available for free download. Please select a Creative Commons License during upload so that others will know what they may (or may not) do with your images.
Topic: images
Wide Crawl started August 2013
collection
21,932
ITEMS
930.5M
VIEWS

Web wide crawl with initial seedlist and crawler configuration from August 2013.
collection
808.5M
VIEWS

The seed for this crawl was a list of every host in the Wayback Machine. This crawl was run at level 1 (URLs including their embeds, plus the URLs of all outbound links including their embeds). The WARC files associated with this crawl are not currently available to the general public.
National Library of Australia Crawls
collection
50,498
ITEMS
522.4M
VIEWS

Crawls performed by Internet Archive on behalf of the National Library of Australia. This data is currently not publicly accessible.
Wide Crawl started January 2012
collection
30,373
ITEMS
805.1M
VIEWS

Web wide crawl with initial seedlist and crawler configuration from January 2012 using HQ software.
Wide Crawl started April 2012
collection
39,279
ITEMS
706.4M
VIEWS

Web wide crawl with initial seedlist and crawler configuration from April 2012.
collection
694.9M
VIEWS

The seed for this crawl was a list of every host in the Wayback Machine. This crawl was run at level 1 (URLs including their embeds, plus the URLs of all outbound links including their embeds). The WARC files associated with this crawl are not currently available to the general public.
.com survey started January 2011
collection
2,535
ITEMS
533.8M
VIEWS

Survey crawl of .com domains started January 2011.
Topic: webcrawl
Wide Crawl started February 2014
collection
9,806
ITEMS
600.6M
VIEWS

Web wide crawl with initial seedlist and crawler configuration from February 2014.
Wide Crawl Started January 2013
collection
15,157
ITEMS
526.8M
VIEWS

Wide crawls of the Internet conducted by Internet Archive. Access to content is restricted. Please visit the Wayback Machine to explore archived web sites.
Wide Crawl started September 2012
collection
22,423
ITEMS
523.3M
VIEWS

Web wide crawl with initial seedlist and crawler configuration from September 2012.
Wide Crawl started October 2010
collection
15,839
ITEMS
541M
VIEWS

Web wide crawl with initial seedlist and crawler configuration from October 2010.
Wide Crawl started October 2011
collection
12,648
ITEMS
501.6M
VIEWS

Web wide crawl with initial seedlist and crawler configuration from March 2011 using HQ software.
Host Screen Captures
collection
17,458
ITEMS
211M
VIEWS

Screen captures of hosts discovered during wide crawls. This data is currently not publicly accessible.
collection
398.9M
VIEWS

The seed for this crawl was a list of every host in the Wayback Machine. This crawl was run at level 1 (URLs including their embeds, plus the URLs of all outbound links including their embeds). The WARC files associated with this crawl are not currently available to the general public.
Wide Crawl started March 2011
collection
8,528
ITEMS
459.8M
VIEWS

Web wide crawl with initial seedlist and crawler configuration from March 2011. This uses the new HQ software for distributed crawling by Kenji Nagahashi.
What’s in the data set:
Crawl start date: 09 March, 2011
Crawl end date: 23 December, 2011
Number of captures: 2,713,676,341
Number of unique URLs: 2,273,840,159
Number of hosts: 29,032,069
The seed list for this crawl was a list of Alexa’s top 1 million web sites, retrieved close to the crawl start date. We used Heritrix (3.1.1-SNAPSHOT)...
International News Crawls
collection
13,331
ITEMS
274.2M
VIEWS

Crawls of International News Sites
Spirituality & Religion
collection
428,004
ITEMS
214.8M
VIEWS

Listen to sermons and lectures concerning religion and spirituality here.
Internet Memory Foundation
collection
1,918
ITEMS
271.3M
VIEWS

Data crawled on behalf of Internet Memory Foundation. This data is currently not publicly accessible. From Wikipedia: The Internet Memory Foundation (formerly the European Archive Foundation) is a non-profit foundation whose purpose is archiving web content. It supports projects and research which include the preservation and protection of multimedia content. Its archives form a digital library of cultural content.
Wikipedia Outlinks February 2012
collection
2,951
ITEMS
366.1M
VIEWS

Crawl of outlinks from wikipedia.org started February, 2012. These files are currently not publicly accessible.
Television Archive
collection
9,539,613
ITEMS
233.7M
VIEWS

Programs in the TV News Archive, available for research and educational purposes. The archive allows users to search across a collection of television news programs dating back to 2009 for purposes such as fact checking. Users may view short clips, share links to customized short quotes, embed customized short quotes, or borrow a copy of the full program.
(1 review)
web_wk
collection
9,973
ITEMS
315.2M
VIEWS

Crawl performed by Internet Archive. This data is currently not publicly accessible.
Alexa Crawl EG
collection
1,678
ITEMS
313.2M
VIEWS

Crawl EG from Alexa Internet. This data is currently not publicly accessible.
Movies
collection
87,865
ITEMS
374.8M
VIEWS

Watch full-length feature films, classic shorts, world culture documentaries, World War II propaganda, movie trailers, and films created in just ten hours: These options are all featured in this diverse library! Many of these videos are available for free download.
Podcasts
collection
72,921,943
ITEMS
191.9M
VIEWS

A great resource for podcasters: the Creative Commons Podcasting Legal Guide.
Internet Archive Books
collection
3,881,299
ITEMS
153.7M
VIEWS

Books contributed by the Internet Archive.
Topic: internet archive books
Books for People with Print Disabilities
collection
7,459,724
ITEMS
159.7M
VIEWS

Free books for people with disabilities that impact reading. If you have a disability that interferes with reading printed text, then all of these books can be instantaneously available in your browser or via protected download. Want access? Individuals: If you would like to apply for access (it is free), make sure you have an Archive.org account and then fill in this form to contact the Vermont Mutual Aid Society. If you are affiliated with any of...
Topics: print disabled, print disability
Books to Borrow
collection
5,791,200
ITEMS
145M
VIEWS

Books in this collection may be borrowed by logged in patrons. You may read the books online in your browser or, in some cases, download them into Adobe Digital Editions, a free piece of software used for managing loans. Please note that works in this collection are protected by copyright law (Title 17 U.S. Code) and copying, redistribution or sale, whether or not for profit, by the recipient is not permitted unless authorized by the rightsholder or by law. See FAQs about...
National Library of Spain Crawls
collection
6,742
ITEMS
275.5M
VIEWS

Data collected by Internet Archive on behalf of the National Library of Spain. This data is currently not publicly accessible.
web_iq
collection
2,637
ITEMS
263.7M
VIEWS

Crawl performed by Internet Archive. This data is currently not publicly accessible.
Sermons & Religious Lectures
collection
327,243
ITEMS
129.1M
VIEWS

A number of religious and spiritual organizations regularly upload their sermons and lectures to the Archive through the Open Source Audio collection. You may easily locate them here.
Alexa Crawl EI
collection
1,408
ITEMS
213.8M
VIEWS

Crawl EI from Alexa Internet. This data is currently not publicly accessible.
Wikipedia Outlinks May 2011
collection
1,638
ITEMS
180.9M
VIEWS

Crawl of outlinks from wikipedia.org started May, 2011. These files are currently not publicly accessible.
Alexa Crawl EH
collection
1,218
ITEMS
178M
VIEWS

Crawl EH from Alexa Internet. This data is currently not publicly accessible.
Music, Arts & Culture
collection
815,386
ITEMS
117.2M
VIEWS

This collection features audio collections reflecting music, art and culture. Collections include the unique contemporary compositions and performances found in the Other Minds collection, the hundreds of popular songs from the early 20th Century found in the 78 RPM collection and oral history projects.
Television Archive News Search Service
collection
2,256,743
ITEMS
224.7M
VIEWS

Items included in the Television News search service. Part of TV News Archive.
Shallow Crawls
collection
1,042
ITEMS
177M
VIEWS

Shallow crawls that collect content 1 level deep including embeds. This data is currently not publicly accessible.
Bibliotheque Nationale de France Domain Crawls
collection
1,653
ITEMS
188.9M
VIEWS

Crawls of the French domain space performed by Internet Archive on behalf of Bibliotheque Nationale de France. This data is currently not publicly accessible.
Youtube Videos
collection
710,966
ITEMS
98M
VIEWS

Captures of pages from YouTube. Currently these are discovered by searching for YouTube links on Twitter.
Topics: YouTube, Twitter, Video
News & Public Affairs
collection
1,588,454
ITEMS
203M
VIEWS

An analysis of news and public affairs independent from traditional corporate media is available from this diverse video library. From Democracy Now's daily news program, to three days of TV news coverage following the 9/11 attacks, to Mosaic’s timely clips of Middle East newscasts, to UCSF's Tobacco Industry Videos: These collections offer an alternative way to view and interpret current news and public affairs. Many of these videos are available for free download.
web_mon
collection
3,809
ITEMS
147.9M
VIEWS

Crawl performed by Internet Archive. This data is currently not publicly accessible.
Alexa Crawl DX
collection
1,442
ITEMS
176.7M
VIEWS

Crawl DX from Alexa Internet. This data is currently not publicly accessible.
Television
collection
296,146
ITEMS
78.8M
VIEWS

Collections of items recorded from television, including commercials, old television shows, government proceedings, and more.
Radio Show and Programs Archive
collection
33,521,964
ITEMS
216M
VIEWS

National Archives and Records Administration
collection
12,089
ITEMS
123.7M
VIEWS

National Archives and Records Administration crawl performed by Internet Archive. This data is currently not publicly accessible.
Geocities Closing Crawl
collection
149
ITEMS
102.3M
VIEWS

Geocities crawl performed by Internet Archive. This data is currently not publicly accessible. From Wikipedia: Yahoo! GeoCities is a Web hosting service. GeoCities was originally founded by David Bohnett and John Rezner in late 1994 as Beverly Hills Internet (BHI), and by 1999 GeoCities was the third-most visited Web site on the World Wide Web. In its original form, site users selected a "city" in which to place their Web pages. The "cities" were metonymously named after...
Wikipedia Outlinks July 2011
collection
1,011
ITEMS
125.7M
VIEWS

Crawl of outlinks from wikipedia.org started July, 2011. These files are currently not publicly accessible.
web_tran
collection
4,192
ITEMS
134.2M
VIEWS

Crawl performed by Internet Archive. This data is currently not publicly accessible.
Alexa Crawl EB
collection
653
ITEMS
137.3M
VIEWS

Crawl EB from Alexa Internet. This data is currently not publicly accessible.
Alexa Crawl DZ
collection
1,207
ITEMS
150.2M
VIEWS

Crawl DZ from Alexa Internet. This data is currently not publicly accessible.
Wayback CDX Shards
collection
1,214
ITEMS
151.9M
VIEWS

CDX Index shards for the Wayback Machine. The Wayback Machine works by looking for historic URLs based on a query. This is done by searching an index of all the web objects (pages, images, etc.) that have been archived over the years. This collection holds the index used for this purpose, which is broken up into 300 pieces so they fit into items more naturally and distribute the lookup load. Each of these 300 pieces is stored in at least 2 items, and then those are also stored on the backup...
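One way to picture how a lookup might choose a piece of a split index is a sorted-range partition. The description above does not say how URLs are assigned to the 300 pieces, so everything in this sketch is an assumption for illustration: the `find_shard` helper, the three made-up boundary keys (a real index would have 300), and the reversed-host key format used for the example keys.

```python
import bisect


def find_shard(key, shard_starts):
    """Return the index of the shard whose key range contains `key`.

    `shard_starts` is a sorted list holding the first key of each shard
    (hypothetical boundaries). Shard i covers keys from shard_starts[i]
    up to, but not including, shard_starts[i + 1].
    """
    position = bisect.bisect_right(shard_starts, key) - 1
    return max(position, 0)


# Three illustrative boundaries standing in for the 300 pieces.
shard_starts = ["", "com,example)/", "org,archive)/"]

first = find_shard("com,apple)/", shard_starts)
second = find_shard("com,example)/page", shard_starts)
third = find_shard("org,wikipedia)/", shard_starts)
```

Because the boundaries are sorted, each lookup is a binary search over the boundary list, and only the one shard that can contain the key needs to be searched afterwards.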
Serials in Microfilm
collection
4,361,374
ITEMS
30.8M
VIEWS

Digitized versions from the Serials in Microform collection, originally from NA Publishing. Record of the acquisition of the microfilm: https://archive.org/details/SerialsOnMicrofilmCollection
78rpm Records Digitized by George Blood, L.P.
collection
342,130
ITEMS
46.6M
VIEWS

Newest uploads! Auto-78-twitter. Through the Great 78 Project the Internet Archive has begun to digitize 78rpm discs for preservation, research, and discovery with the help of George Blood, L.P. 78s were mostly made from shellac, i.e., beetle resin, and were the brittle predecessors to the LP (microgroove) era. @great78project for uploads as they happen. Turntable used for 78rpm digitization of four simultaneous recordings with different needles. The...
Topics: 78rpm, digitization
Source: 78
Periodicals
collection
4,362,734
ITEMS
31.4M
VIEWS

Periodical publications including magazines, trade magazines, and journals. Please peruse the growing list of publications.
Topics: periodicals, journals, serials, magazines
Alexa Crawl EF
collection
975
ITEMS
97.2M
VIEWS

Crawl EF from Alexa Internet. This data is currently not publicly accessible.
Newspapers
collection
1,025,630
ITEMS
56.7M
VIEWS

The newspapers in this collection have been scanned as part of a pilot project using microfilm and microfiche. After using a microfilm/fiche scanner to create a digital image of each page, we process the resulting images so that each reel is contained in a single item with easily navigable files. For a few examples, please see: The New York times (Oct 16 31 1915) The New York times (1919 July 1-15) The New York times (May 1-15 1915)
Alexa Crawl DL
collection
413
ITEMS
99.8M
VIEWS

Crawl DL from Alexa Internet. This data is currently not publicly accessible.
Non-English Audio
collection
31,378
ITEMS
364.4M
VIEWS

Non-English language collections contributed to the Open Source Audio collection are featured here.
COM Survey Crawl 2009-2010
collection
729
ITEMS
82.1M
VIEWS

COM survey crawl data collected by Internet Archive in 2009-2010. This data is currently not publicly accessible.
Scanned in China
collection
832,095
ITEMS
76.9M
VIEWS

Books scanned in Shenzhen and Beijing, China.
Topic: books
Arts & Music
collection
17,282
ITEMS
574M
VIEWS

This library of arts and music videos features This or That (a burlesque game show), the Coffee House TV arts program, punk bands from Punkcast and live performances from Groove TV. Many of these movies are available for free download.
Accelovation Crawl
collection
1,324
ITEMS
89.7M
VIEWS

Web crawl snapshots generously donated from Accelovation. This data is currently not publicly accessible. From the site: Accelovation is pioneering the delivery of Insight Discovery™ software solutions that help companies move from innovation idea to product reality faster and with more success. Our solutions are used by leading firms in the Fortune 500 and beyond – companies from a diverse set of industries ranging from consumer packaged goods to high tech, foods to chemicals, and...
University of Toronto - Robarts Library
collection
217,063
ITEMS
316.7M
VIEWS

The John P. Robarts Research Library, commonly referred to as Robarts Library, is the main humanities and social sciences library of the University of Toronto Libraries and the largest individual library in the university. Opened in 1973 and named for John Robarts, the 17th Premier of Ontario, the library contains more than 4.5 million bookform items, 4.1 million microform items and 740,000 other items. The library building is one of the most significant examples of brutalist architecture in...
Biodiversity Heritage Library
collection
281,003
ITEMS
166.6M
VIEWS

Inspiring discovery through free access to biodiversity knowledge. | The Biodiversity Heritage Library improves research methodology by collaboratively making biodiversity literature openly available to the world as part of a global biodiversity community. | Please read BHL's Acknowledgment of Harmful Content . About the Biodiversity Heritage Library The Biodiversity Heritage Library (BHL) is the world's largest open access digital library for biodiversity literature and archives. BHL is...
Alexa Crawl EE
collection
484
ITEMS
74.5M
VIEWS

Crawl EE from Alexa Internet. This data is currently not publicly accessible.
survey_net00000
collection
300
ITEMS
66M
VIEWS

Survey crawl of .net domains started December 2010.
Topic: webcrawl
Alexa Crawl DJ
collection
341
ITEMS
84.6M
VIEWS

Crawl DJ from Alexa Internet. This data is currently not publicly accessible.
The Library of Congress
collection
167,359
ITEMS
108.9M
VIEWS

Question or comment about digitized items from the Library of Congress that are presented on this website? Please use the Library of Congress Ask a Librarian form. The Library of Congress is the world’s largest library, offering access to the creative record of the United States, and extensive materials from around the world, both on-site and online. It is the main research arm of the U.S. Congress and the home of the U.S. Copyright Office. Explore...
Institut national de l’audiovisuel
collection
50
ITEMS
87.6M
VIEWS

Crawl data from Institut national de l’audiovisuel in France. This data is currently not publicly accessible. From Wikipedia: The Institut national de l'audiovisuel (or INA, French for National Audiovisual Institute) is a repository of all French radio and television audiovisual archives. Since 2006, it has allowed free online consultation on a website called ina.fr with a search tool indexing 100,000 archives of historical programs, for a total of 20,000 hours.
Shallow Crawl Started 2013
collection
544
ITEMS
71.2M
VIEWS

Shallow crawl started 2013 that collects content 1 level deep, including embeds. Access to content is restricted. Please visit the Wayback Machine to explore archived web sites.
web_ma
collection
1,085
ITEMS
75.4M
VIEWS

Crawl performed by Internet Archive. This data is currently not publicly accessible.
Shallow Crawl Started 2013
collection
252
ITEMS
72.1M
VIEWS

Shallow crawl started 2013 that collects content 1 level deep, including embeds. Access to content is restricted. Please visit the Wayback Machine to explore archived web sites.
Alexa Crawl DI
collection
250
ITEMS
80.6M
VIEWS

Crawl DI from Alexa Internet. This data is currently not publicly accessible.
web_con
collection
1,507
ITEMS
72.8M
VIEWS

Crawl performed by Internet Archive. This data is currently not publicly accessible.
Old Time Radio
collection
8,372
ITEMS
119.6M
VIEWS

Alexa Crawl Image
collection
92
ITEMS
56.9M
VIEWS

Crawl Image from Alexa Internet. This data is currently not publicly accessible.