Born Digital Archive Processing

Although the shift from paper to digital has been a major issue in the world of archives for several years, the delay in some disciplines between record creation and archive deposit has meant that, for many repositories, the challenge of processing born-digital archival material is a relatively new one. And although this development has prompted the creation of new practices, tools and methods of working, one constant has been the commitment that archives make to donors: to care for, preserve and, where possible, make available their valued content. Whether a filmmaker is handing over boxes full of 8mm film or hard drives’ worth of MP4 files, an archivist must be able to state confidently that the material will be safe in their institution’s custody and that every effort will be made to afford access to researchers and the public as required. In the University Library, the recent development of some of our research strengths within Special Collections has been prompted by the acquisition of collections which have enabled us to consider some of the unique questions surrounding digital archives, and to challenge ourselves to maintain these commitments as an archive.

Case Study 1: Alan Halsey archive

Founded in 2014, the Small Press Poetry Collection (housed in the Library’s Special Collections) is a developing resource for the study and teaching of avant-garde, mostly British and American, poetry from the twentieth century onwards. The collection includes many signed first edition pamphlets and poetry broadsides by renowned poets in collaboration with hundreds of small presses. Recently, we have been fortunate to acquire the archives of two prominent Sheffield-based poets, Alan Halsey and Geraldine Monk. Poet, printer, artist, publisher (West House Books), and antiquarian bookseller, Alan Halsey has been writing and performing his poetry since the late 1970s. He has collaborated with many poets, illustrated his own and others’ work, and is a respected figure worldwide in non-mainstream poetry circles. Among his major works are the text-graphic Memory Screen, In White Writing, The Text of Shelley's Death, and Days of '49. In addition to entrusting us with his valuable archive, Alan continues to support the Small Press Poetry Collection through donation.

Alongside physical items such as published works and correspondence, the Halsey collection contains a significant volume of digital files, comprising poems, recorded performances, photographs, publication proofs, and more. Although the processing and preservation of all types of files is no straightforward task, the nature of archives such as Alan Halsey’s presents specific difficulties which the Library has not previously encountered. Making up a large percentage of the digital material are Word documents containing drafts and published poems. Form and structure are integral elements of poetry, often as important as the words themselves. Spaces, line breaks and deliberate physical patterns add meaningful conceptual dimensions and are also crucial to the rhythm of a poem, telling us how the work should be read and performed. With more than one way to read a poem, the risks that formatting errors pose to poetry are enormous. In much the same way that placing elements of a painting in the wrong position on the canvas would completely misrepresent the artist’s message, a single word appearing on the wrong line could easily and irrevocably alter the concept behind a poem, and to lose form completely could be like hanging a painting upside down and back to front.

For files created in proprietary software such as Word, the dangers here are obvious. Formatting errors between older and newer versions of Word are common; the risk of operating systems over-compensating for layout changes is huge and could leave the archivist unsure whether the representation of the file on screen in front of them is what the artist intended. Maintaining access to the original, unaltered representation of poems is vital, and there are solutions available: batch conversions to the more stable PDF format, for example. But such actions are not without risks themselves. Can we ensure that the look and feel of the document has been maintained in its new form? Have the required fonts been embedded - will a poem containing special characters or Greek symbols display as it should? In the case of large migrations, can we ensure the conversions have been successful?
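The last of these questions lends itself to a simple automated check. The following is a minimal sketch in Python, not the Library's actual workflow: it assumes the converted PDFs sit in a separate folder that mirrors the structure of the source folder, and the names `check_conversions` and `looks_like_pdf` are hypothetical. It only confirms that every Word file has a plausible PDF counterpart; it cannot confirm that fonts are embedded or that the layout has survived, which still calls for a dedicated PDF tool or visual inspection.

```python
from pathlib import Path

# Assumption: only these extensions count as Word source files.
WORD_SUFFIXES = {".doc", ".docx"}


def looks_like_pdf(path: Path) -> bool:
    """Cheap structural check: PDF header at the start, EOF marker near the end.

    This does not validate the PDF or say anything about fonts or layout.
    """
    data = path.read_bytes()
    return data.startswith(b"%PDF-") and b"%%EOF" in data[-1024:]


def check_conversions(source_dir: Path, pdf_dir: Path) -> dict[str, list[str]]:
    """Sort every Word file under source_dir into ok / missing / suspect.

    Assumes pdf_dir mirrors source_dir, with each file's suffix changed to .pdf.
    """
    report: dict[str, list[str]] = {"ok": [], "missing": [], "suspect": []}
    for src in sorted(source_dir.rglob("*")):
        if not src.is_file() or src.suffix.lower() not in WORD_SUFFIXES:
            continue
        relative = src.relative_to(source_dir)
        pdf = (pdf_dir / relative).with_suffix(".pdf")
        key = relative.as_posix()
        if not pdf.is_file():
            report["missing"].append(key)   # no PDF was produced at all
        elif looks_like_pdf(pdf):
            report["ok"].append(key)        # PDF exists and looks complete
        else:
            report["suspect"].append(key)   # empty, truncated or not a PDF
    return report
```

Anything listed under "missing" or "suspect" would need reconverting or checking by hand, while the "ok" list narrows down which files still need a visual comparison against the original.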

When reflecting on the process of creating poems in a digital format rather than the analogue equivalent, Alan Halsey observed that “the ease of revision by computer has probably brought about changes in writing techniques and the finished work which we’ve as yet hardly registered. There’s one work of mine, The Text of Shelley’s Death, written 1994, which I think might have taken a different form if I’d written it a few years later, with the benefit (or not) of a computer. It’s a mock-variorum and I used the traditional conventions such as ‘var.’, ‘del.’, square brackets etc. In the first flush of computer enthusiasm I might well have gone for strike-throughs and all the other tricks. I’m glad I didn’t. But quite a lot of my writing 1997-2000 did explore the gamut of new toys, some of which I’ve found can’t now be easily or exactly reproduced.” In fact, Alan could pinpoint a specific example of a poem that had to be rewritten because a special character in a proprietary font was updated and, in its revised form, was unsuitable for the work. In terms of layout and formatting, Alan has found that acting as editor of others’ work has made it harder to ensure that works appear as they should: “skittered layouts because of some unforeseen incompatibility, uncertainty through lack of hard copy, authors creating effects using a different route than I’d take myself, etc”. And aside from poetry-specific digital preservation issues, although he feels that he has been relatively lucky regarding the loss of digital material, he has still suffered from some migration problems, specifically when floppy disks made way for the widespread use of CDs.

Case Study 2: Palliative Care Oral History project

The processing of born-digital archive material is currently undertaken by Special Collections staff using designated hardware and tools, in accordance with continually evolving guidelines that the Library maintains and updates. These are aligned to recommended best practice guidance from the global digital preservation community and ensure that the ingest of this valuable cultural digital material will be methodical and safe at every stage of the content’s progress - from CD-ROMs, floppy disks and other media to our digital preservation system. Using a dedicated processing PC, we are able to monitor files for changes or corruption, check for viruses and generate detailed lists of content and file types. Tools such as DROID, TeraCopy and Quick View Plus allow the investigation and management of material, and all aspects of ingest, manipulation and transfer are documented at each stage.
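Monitoring files for changes or corruption is usually done by comparing checksums taken at different points in the workflow. The sketch below illustrates the general idea in Python and is not a description of how DROID, TeraCopy or the Library's own procedures work: the function names are hypothetical, and SHA-256 is an assumed choice of algorithm.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks to limit memory use."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to its checksum."""
    return {
        path.relative_to(root).as_posix(): sha256_of(path)
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def compare_manifests(before: dict[str, str], after: dict[str, str]) -> dict[str, list[str]]:
    """Report files whose content changed, that disappeared, or that are new."""
    return {
        "changed": sorted(k for k in before.keys() & after.keys() if before[k] != after[k]),
        "missing": sorted(before.keys() - after.keys()),
        "added": sorted(after.keys() - before.keys()),
    }
```

A manifest built when material is first copied from its carrier can be compared with one built after transfer; three empty lists indicate that nothing was altered, lost or introduced along the way.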

The Palliative Care Unit within the Northern General Hospital, Sheffield, began creating oral history audio recordings with people in the unit in 2007, supported by the League of Friends, Sheffield Hospitals Charity and Macmillan Cancer Support. The majority of the interviewees are Sheffield and South Yorkshire residents discussing their lives with volunteer interviewers, and the recordings are a rich resource for students studying both social history and palliative and end-of-life care.

For the acquisition of this collection, recordings of interviews were received by file transfer, and access copies of these recordings were then generated along with versions of reference documentation. Using a specially written script, we were able to create an interactive finding aid which allows researchers to browse a PDF catalogue of the collection and select recordings directly from the document. Access to the collection is provided via a laptop in the Library under reading room conditions. Following processing, the files were deposited in ArchiveUS, the University’s digital preservation system. ArchiveUS (the Sheffield brand name for Rosetta, provided by Ex Libris) ensures the long-term safeguarding of all our valuable digital content by providing a series of preservation actions, including monitoring files for format obsolescence and allowing large-scale content migration. In the future we plan to expand the number of collections available in this way where unrestricted access via our online collections portal is not possible.