Jessica Dummer, Doug Emery & Dot Porter, University of Pennsylvania Libraries

OPenn and its place in the digital humanist’s landscape

On May 1, 2015, Penn Libraries and the Schoenberg Institute for Manuscript Studies launched a new website, OPenn: Digital Primary Sources Available to Everyone (http://openn.library.upenn.edu/), that makes digitized cultural heritage material freely available and accessible to the public. This presentation looks broadly at OPenn: what it is, how it works, whom it serves, and where it fits in the online landscape of the digital humanist.

OPenn is a major initiative at the Penn Libraries to embrace open data. All images and metadata on the OPenn website are available as free cultural works to be studied, copied, modified, or used by anyone, for any purpose, at no cost. OPenn launched with the entire corpus of manuscripts donated to the Penn Libraries in 2011 by SIMS founder Lawrence J. Schoenberg and his wife Barbara Brizdle Schoenberg. The Schoenberg Collection features manuscripts from all over the world, with a focus on science, technology, engineering and mathematics. More datasets, including manuscripts from the University of Pennsylvania’s other holdings and items from other institutions, are currently being added to the site. Historic diaries from a variety of institutions belonging to the Philadelphia Area Consortium of Special Collections Libraries (PACSCL) (http://pacscl.org/) are next in line for inclusion on OPenn. Some of these documents are celebrated, such as the Union League’s Tanner manuscript, a firsthand account of the events surrounding the assassination of Abraham Lincoln.

OPenn contains complete sets of high-resolution archival images, standard all-purpose JPEGs and thumbnail JPEGs of manuscripts in its collections, along with machine-readable TEI P5 descriptions and technical metadata. All materials on this site, including metadata, are in the public domain or released under CC0 or CC-BY Creative Commons licenses as free cultural works. These licensing structures give users unmediated access to all data provided by OPenn, from a single image to the entire data set, and allow them to use the data for any purpose, with the only restriction being attribution for works not in the public domain. The site is also designed to allow users to download the data in bulk using HTTP, anonymous FTP, or anonymous rsync, all of which are simple methods for using a computer to access open data. Both the licensing model and the ability to easily grab a single file, all the data for one book, or all the data for a whole collection are keys to the openness of the project. It can be an organizational challenge for institutions to open their data in both of these ways, but the philosophy of OPenn is that anything short of open licensing and ready access to all images and metadata is not truly open data.
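To illustrate what "machine-readable" means in practice, the following is a minimal sketch of reading a TEI P5 manuscript description with Python's standard library. The sample document, its title, shelfmark, and image paths are invented for illustration and are not copied from OPenn; the layout of real OPenn TEI files may differ, and only the TEI namespace and element names (titleStmt, msIdentifier, idno, facsimile, surface, graphic) come from the TEI Guidelines.

```python
import xml.etree.ElementTree as ET

# TEI P5 elements live in this XML namespace.
TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Invented minimal TEI document (an assumption, not an actual OPenn file).
SAMPLE_TEI = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <fileDesc>
      <titleStmt><title>Description of Example MS 1</title></titleStmt>
      <publicationStmt><p>Illustrative sample only.</p></publicationStmt>
      <sourceDesc>
        <msDesc>
          <msIdentifier>
            <repository>Example Library</repository>
            <idno type="call-number">Example MS 1</idno>
          </msIdentifier>
        </msDesc>
      </sourceDesc>
    </fileDesc>
  </teiHeader>
  <facsimile>
    <surface n="1r"><graphic url="images/0001.jpg"/></surface>
    <surface n="1v"><graphic url="images/0002.jpg"/></surface>
  </facsimile>
</TEI>"""


def summarize_tei(tei_xml):
    """Return the title, shelfmark, and (page label, image path) pairs."""
    root = ET.fromstring(tei_xml)
    title = root.findtext(
        ".//tei:titleStmt/tei:title", default="", namespaces=TEI_NS
    )
    shelfmark = root.findtext(
        ".//tei:msIdentifier/tei:idno", default="", namespaces=TEI_NS
    )
    # Each <surface> stands for one page side; each <graphic> is one image of it.
    pages = [
        (surface.get("n"), graphic.get("url"))
        for surface in root.iterfind(".//tei:surface", TEI_NS)
        for graphic in surface.iterfind("tei:graphic", TEI_NS)
    ]
    return {"title": title, "shelfmark": shelfmark, "pages": pages}


summary = summarize_tei(SAMPLE_TEI)
print(summary["shelfmark"], "has", len(summary["pages"]), "images")
```

Because the description is plain XML, a script like this could be run over every downloaded manuscript to build an index or a page-turning view without any site-specific API.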

Dot Porter, SIMS’ Curator of Research Services, has already used the datasets for a variety of projects. She has created e-books from the images and metadata on OPenn. You can download the e-books in the free and open EPUB (http://idpf.org/epub) format at Penn Libraries’ Scholarly Commons (http://repository.upenn.edu/sims_ebooks/). She has also used the Internet Archive BookReader (https://openlibrary.org/dev/docs/bookreader), an open source online page-turning book reader, to generate online versions of each manuscript. An example using LJS 225, Litterarum simulationis liber, can be seen at http://dorpdev.library.upenn.edu/BookReaders/ljs225/#page/4/mode/2up . One can search and browse manuscripts in OPenn (along with digitized manuscripts from The Digital Walters) here: http://viewshare.org/views/leoba/openn-and-digital-walters/. OPenn also enables rigorous study and scholarly discovery by making these manuscripts easier for researchers to access. For instance, Porter is working on a collation visualization project with Alberto Campagnolo, Doug Emery and Dennis Mullen. In this project, images of individual pages can be manipulated to re-create the order in which the pages were written, as opposed to the order in which they were collated for binding, giving researchers leeway in exploration that they might not otherwise have. The project page is available at https://github.com/leoba/VisColl.

OPenn fits into a complicated digital landscape that offers consumers of cultural works a range of options. Institutions often provide access to their images in institutional silos that allow users to view images and manipulate them within an interface, but make their download and reuse difficult. This can be the case even when the images are identified as open data. These applications offer the consumer a certain experience that is appropriate for many users, but what about the scholar who wants to go further? The raw data on OPenn is intended to be downloaded and reused by aggregators, digital humanists, and scholars for any purpose they see fit. OPenn’s main virtues are its total flexibility and its simplicity. It is not trying to do anything more than provide people with the images and metadata they need to create their own digital projects and initiatives. This presentation will compare OPenn to other open data projects that may be founded on similar philosophies but take a different route, focusing in particular on the International Image Interoperability Framework (IIIF). IIIF is building a community of institutions that support the interoperability of online image delivery and the use of common APIs and software to accomplish that goal. OPenn and IIIF may have different purposes, but both start from the premise that data needs to be shared easily between scholars and institutions. OPenn can support initiatives like IIIF because it makes it easier to take data and fit it into IIIF’s structure. In fact, Porter is integrating OPenn data into IIIF’s image delivery framework. OPenn is a place where you can get the supplies, or content, to build the structure you want to create.
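As a rough illustration of what fitting images into IIIF's structure involves, the sketch below builds request URLs following the IIIF Image API pattern {base}/{identifier}/{region}/{size}/{rotation}/{quality}.{format}. The server address and the image identifier are hypothetical placeholders, not real OPenn or Penn endpoints. The default size keyword "full" follows version 2 of the Image API; version 3 uses "max" instead.

```python
from urllib.parse import quote


def iiif_image_url(base, identifier, region="full", size="full",
                   rotation="0", quality="default", fmt="jpg"):
    """Build an IIIF Image API request URL from its path components.

    The identifier is percent-encoded so that any slashes it contains
    are not mistaken for separators between URL path segments.
    """
    encoded_id = quote(identifier, safe="")
    return (f"{base.rstrip('/')}/{encoded_id}/"
            f"{region}/{size}/{rotation}/{quality}.{fmt}")


# Hypothetical image server and identifier, for illustration only.
BASE = "https://iiif.example.org/image"
IMAGE_ID = "ljs225/0001"

full_url = iiif_image_url(BASE, IMAGE_ID)
# "200," asks the server for a width of 200 pixels, height scaled to match.
thumb_url = iiif_image_url(BASE, IMAGE_ID, size="200,")

print(full_url)
print(thumb_url)
```

The point of the shared URL pattern is that any IIIF-aware viewer can request regions and sizes from any compliant server, whereas OPenn supplies the underlying files from which such a service can be built.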

Bibliography

  • TEI Consortium, eds. TEI P5: Guidelines for Electronic Text Encoding and Interchange. Version 2.8.0. 6 Apr 2015. TEI Consortium. http://www.tei-c.org/Guidelines/P5/ (23 June 2015)
  • Creative Commons https://creativecommons.org/ (23 June 2015)
  • International Image Interoperability Framework http://iiif.io/ (23 June 2015)
  • Linux manual http://linux.die.net/man/1/rsync (23 June 2015)
  • Extensible Markup Language (XML) http://www.w3.org/XML/ (23 June 2015)