Libraries at University of Nebraska-Lincoln

Copyright, Fair Use, Scholarly Communication, etc.

Document Type

Article

Date of this Version

2-17-2020

Citation

code4lib (February 17, 2020) 47: 15016

Also available at: https://journal.code4lib.org/articles/15016.

Comments

This work is licensed under a Creative Commons Attribution 3.0 United States License, CC-BY 3.0

Abstract

This article describes our process for developing a script to automate the downloading of documents and secondary materials from our library’s BePress repository. Our objective was to collect the full archive of dissertations and associated files from our repository onto a local disk, both for potential future applications and to build out a preservation system.

Unlike at some institutions, our students submit directly into BePress, so we did not have a separate repository of the files. The backup of BePress content that we had access to was not in an ideal format (for example, it included “withdrawn” items and did not effectively isolate electronic theses and dissertations). Perhaps more importantly, because BePress was not SWORD-enabled and lacked a robust API or batch export option, we needed to develop a data-scraping approach that would allow us both to extract files and to populate metadata fields. Using a CSV of all of our records provided by BePress, we wrote a script to loop through those records and download their documents, placing them in directories according to a local schema. We dealt with over 3,000 records and about three times that many items, and now have an established process for retrieving our files from BePress. Details of our experience and code are included.
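The loop described in the abstract can be illustrated with a minimal sketch. This is not the authors' script; it only shows the general shape of reading a CSV of records and saving each document into a directory. The column names (`title`, `publication_date`, `fulltext_url`), the ISO-style date format, the `<year>/<title-slug>/` directory layout, and the `fulltext.pdf` filename are all assumptions made for illustration, and the sketch handles only one primary document per record, not the secondary materials.

```python
import csv
import io
import re
import urllib.request
from pathlib import Path


def slugify(text, max_len=60):
    """Turn a title into a filesystem-safe directory name."""
    slug = re.sub(r"[^A-Za-z0-9]+", "-", text).strip("-").lower()
    return slug[:max_len] or "untitled"


def target_dir(root, record):
    """Hypothetical local schema: <root>/<year>/<title-slug>/."""
    # Assumes dates look like "2019-05-01"; falls back when the field is empty.
    year = (record.get("publication_date") or "")[:4] or "unknown-year"
    return Path(root) / year / slugify(record.get("title") or "")


def fetch_url(url):
    """Download one URL and return its content as bytes."""
    with urllib.request.urlopen(url, timeout=60) as response:
        return response.read()


def download_records(csv_file, root, fetcher=fetch_url):
    """Loop over the CSV rows and save each record's document to its directory.

    `fetcher` is injectable so the loop can be exercised without network access.
    Returns the list of paths that were written.
    """
    saved = []
    for record in csv.DictReader(csv_file):
        url = (record.get("fulltext_url") or "").strip()
        if not url:
            continue  # no document listed for this record
        folder = target_dir(root, record)
        folder.mkdir(parents=True, exist_ok=True)
        path = folder / "fulltext.pdf"
        path.write_bytes(fetcher(url))
        saved.append(path)
    return saved
```

With a real export, the call might look like `download_records(open("records.csv", newline="", encoding="utf-8"), "etd_archive")`, where both names are placeholders. Keeping the download step as a separate function makes it straightforward to add retries or a polite delay between requests when scraping several thousand records.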

Included in

Archival Science Commons, Collection Development and Management Commons, Databases and Information Systems Commons, Intellectual Property Law Commons, Scholarly Communication Commons, Scholarly Publishing Commons

Scraping BePress: Downloading Dissertations for Preservation
