Indiana University Web Archives on Archive-It.org

Indiana University Web Sites on Archive-It.org 
https://archive-it.org/collections/219 
Indiana University Social Media Accounts on Archive-it.org 
https://archive-it.org/collections/8920

Overview

The Indiana University Web Sites and Indiana University Social Media Accounts collections seek to preserve and facilitate access to web sites and social media produced by administrative offices, schools, departments, service units, institutes, centers, programs, and faculty, student, and alumni organizations on the Indiana University Bloomington campus. In addition, a few web sites and accounts for Indiana University offices that are responsible for system-wide operations have also been collected. Please note that some blogs and other social media may currently be accessible in the Web Sites collection, because those sites were crawled before the Social Media Accounts collection was created.

Citing Web Sites in the Archives

Please cite the collection as follows:

[Collection Title]. Archived by the Indiana University Libraries Web Archive at http://www.archive-it.org/collections/219 <accessed [date]>

Please cite individual seeds or web pages as follows:

“School of Education.” [Collection Title]. Archived by the Indiana University Libraries Web Archive at http://www.archive-it.org/collections/219 <accessed [date]>
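
For anyone assembling many citations, the two patterns above can be sketched as a small Python helper. This is only an illustration, not an official tool: the function name format_citation, its parameters, and the sample collection title and access date are all assumptions. The wording and the default URL are copied from the citation templates on this page, and the date is passed through as written because no date format is prescribed here.

```python
# Hypothetical helper; the wording follows the citation templates on this page.
COLLECTION_URL = "http://www.archive-it.org/collections/219"


def format_citation(collection_title, accessed, seed_title=None, url=COLLECTION_URL):
    """Return a citation for the collection, or for one seed when seed_title is given."""
    # Seed or page titles go first, in quotation marks, as in the second template.
    prefix = f"“{seed_title}.” " if seed_title else ""
    return (
        f"{prefix}{collection_title}. Archived by the Indiana University "
        f"Libraries Web Archive at {url} <accessed {accessed}>"
    )


# Sample values below are placeholders chosen for illustration.
print(format_citation("Indiana University Web Sites", "1 March 2024"))
print(format_citation("Indiana University Web Sites", "1 March 2024",
                      seed_title="School of Education"))
```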

Selection Criteria

Scope: The goal is to preserve and make accessible every web site created by a unit on the IU Bloomington campus, along with the web sites and accounts of a few important system-wide offices. Select social media accounts are archived based on use, content, and the technical capabilities of the web crawler. An IU Bloomington web site is excluded only if it is password protected, blocked by robots.txt, or otherwise inaccessible to the Internet Archive’s automated systems.

Volume: Currently, there are 742 unique domains or seeds being captured. New seeds are frequently added to the collections. To request that a page be archived in either collection, please contact the University Archives.

Crawl Parameters

Collection Dates:  

Indiana University Web Sites start date:  July 1, 2006

Indiana University Social Media Accounts start date: May 11, 2017

How often captured:  The frequency of capture is determined by an analysis of how often the site changes over time.  It is anticipated that most sites will be crawled on a quarterly basis.  A few active sites will be crawled monthly and some less active sites will be crawled on an annual basis.

Acquisition Parameters

Depth:  The complete web site, if possible. 

Breadth:  Links are followed out to one external level.

Searching

Archive-It provides full text search capability for all public collections. Alternately, if you know the site you are looking for, enter the URL into the search box, and Archive-It will search for instances of that archived URL.

Archive-It release 2.0 (July 24, 2006) enables searching of both the full text of web sites and the metadata that has been assigned to the seeds, or individual URLs. However, the ability to search on metadata elements is not yet available to the public.

The search tool used to provide full-text access to the Library's Web archive collections is powered by the open-source search engine Nutch.

Some hints on searching:

  • Generally, search results are ranked by relevance according to several factors:
    • how often the query terms appear in the page relative to how often they appear throughout the collection
    • how often the query terms appear in the page compared to the length of the page
    • whether the query terms appear in the URL
    • whether the query terms appear in the hostname
  • The Boolean search default is AND.
  • If you know that what you're looking for is in a specific type of file, you can limit your search to just that format by adding type:[file type] to your search terms. For example, a PDF document about Herman Wells might be found using the following string: Herman Wells type:pdf.
  • If you want to find out about a topic discussed specifically on an archived web site, you can limit your search by adding site:[URL of archived site] to your search terms. For example, David Baker site:www.music.indiana.edu/ will find mentions of David Baker on the School of Music web site.
  • You can refine search results in the following ways:
    • The link to other versions will take you to a list of archived versions that were captured on different dates.
    • The more from... link will take you to other hits from that host.
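
The type: and site: modifiers described in the hints above can be combined with ordinary search terms. The short Python sketch below shows one way to assemble such a search string before pasting it into the search box. The function name build_query and its parameters are hypothetical; it only builds text and does not contact Archive-It.

```python
# Hypothetical helper: builds a search string only, makes no network requests.
def build_query(terms, file_type=None, site=None):
    """Combine search terms with the optional type: and site: modifiers."""
    parts = [terms.strip()]
    if file_type:
        # Limit results to one file format, e.g. type:pdf
        parts.append(f"type:{file_type}")
    if site:
        # Limit results to one archived site, e.g. site:www.music.indiana.edu/
        parts.append(f"site:{site}")
    return " ".join(parts)


print(build_query("Herman Wells", file_type="pdf"))
# prints: Herman Wells type:pdf
print(build_query("David Baker", site="www.music.indiana.edu/"))
# prints: David Baker site:www.music.indiana.edu/
```

Because the Boolean default is AND, each additional word in the terms narrows the results rather than widening them.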

Because the Indiana University Libraries began archiving web sites only in spring 2006, you may wish to look for earlier versions of many of the sites in the Library's collections through the Internet Archive's general Wayback Machine. The Wayback Machine, however, is not text searchable; you must know the URL of the site that you would like to view.

Contact Information

IU Libraries University Archives 
archives @ indiana . edu