Date of this Version
SPARC, November 2023
Also available at https://zenodo.org/records/10078610
As libraries transitioned from buying materials to licensing content, serious threats to privacy followed. This change shifted more control over library user data (and whether it is collected or kept at all) from the local library to third-party vendors, including personal data about what people search for and what they read. This transition has further reinforced the move by some of the largest academic publishers to move beyond content and become data analytics businesses that provide platforms of tools used throughout the research lifecycle that can collect user data at each stage. These companies have an increasing incentive to collect and monetize the rich streams of data that these platforms can generate from users. As a result, user privacy depends on the strength of privacy protections guaranteed by vendors (e.g., negotiated for in contracts), and a growing body of evidence indicates that this should be a source of concern.
User tracking that would be unthinkable in a physical library setting now happens routinely through such platforms. The potential integration of this tracking with other lines of business, including research analytics tools and data brokering services, raises pressing questions for users and institutions.
Elsevier provides an important case study in this dynamic. Elsevier is many academic libraries’ largest vendor for collections, and its platforms span the knowledge production process, from discovery and idea generation to publication to evaluation. Furthermore, Elsevier’s parent company, RELX, is a leading data broker. Its “risk” business, which provides services to corporations, governments, and law enforcement agencies based on expansive databases of personal data, has surpassed its Elsevier division in revenue and profitability.
For these reasons, it is important to carefully consider Elsevier’s privacy practices, the risks they may pose, and proactive steps to protect users. This analysis focuses on ScienceDirect due to its position as a leading discovery platform for research as well as the Elsevier product that researchers are most likely to interact with regularly.
Based on our findings, many of ScienceDirect's data privacy practices directly conflict with library privacy standards and guidelines. The data privacy practices identified in our analysis are like the practices found in many businesses and organizations that track and harvest user data to sustain privacy-intrusive data-driven business models. The widespread data collection, user tracking and surveillance, and disclosure of user data inherent to these business models run counter to the library's commitment to user privacy as specified in the ALA Code of Ethics, Library Bill of Rights, and the IFLA Statement on Privacy in the Library Environment. Examples of current ScienceDirect practices found in our analysis that conflict with these standards include:
• Use of web beacons, cookies, and other invasive web surveillance methods to track user behavior outside and beyond the ScienceDirect website
• Collection of personal data by third parties, including search engines, social media platforms, and other personal-data aggregators and profilers such as Google, Adobe, Cloudflare, and New Relic, through extensive use of third-party trackers on the ScienceDirect site
• Disclosure of personal data to other Elsevier products and the potential for disclosure of personal data to other business units within RELX, including risk products and services sold to corporations, governments, and law enforcement agencies
• Processing and disclosure of personal data (and personal data inferred from personal data) for targeted, personalized advertising and marketing
In particular, ScienceDirect’s U.S. Consumer Privacy Notice, posted and updated in 2023, raises important concerns. The notice describes the disclosure of detailed user data—including geolocation data, sensitive personal information, and inference data used to create profiles on individuals—both for wide-ranging internal use and to external third parties, including “affiliates” and “business and joint venture partners.”
The collection and disclosure of data about who someone is, where they are, and what they search for and read by the same overarching company that provides sophisticated surveillance and data brokering products to corporations, governments, and law enforcement should be alarming. These practices raise the question of whether simultaneous ownership of key academic infrastructure alongside sophisticated surveillance and data brokering businesses should be permitted at all—by users, by institutions, or by policymakers and regulatory authorities.
As many of the largest publishers reinvent themselves as platform businesses, users and institutions should actively evaluate and address the potential privacy risks as this transition occurs rather than after it is complete. In closely analyzing the privacy practices of the leading vendor in this transition, this report highlights the need for institutions to be proactive in responding to these risks and provides initial steps for doing so.
This report underscores the significant expertise and capacity required for any institution to understand even one vendor’s privacy practices—and the power asymmetry this creates between vendors and libraries. Collaborative efforts, such as SPARC’s Privacy & Surveillance Community of Practice, can plan a key role in supporting future action to address the real privacy risks posed by vendors’ platforms. This report closes with options that institutions may consider to mitigate these risks over the short and longer term.