ABSTRACT
Artificial intelligence (AI) has the potential to positively impact library and archives collections and services, enhancing reference, instruction, metadata creation, recommendations, and more. However, AI also has ethical implications. This paper presents an extensive literature review and analysis that examines AI projects implemented in library and archives settings, asking the following research questions: RQ1: How is artificial intelligence being used in libraries and archives practice? RQ2: What ethical concerns are being identified and addressed during AI implementation in libraries and archives? The results of this literature review show that AI implementation is growing in libraries and archives and that practitioners are using AI for increasingly varied purposes. We found that AI implementation was most common in large academic libraries. Materials used in AI projects usually involved digitized and born-digital text and images, though materials also included web archives, electronic theses and dissertations (ETDs), and maps. AI was most often used for metadata extraction and for reference and research services. Just over half of the papers included in the literature review mentioned ethics- or values-related issues in their discussions of AI implementation in libraries and archives, and only one-third of all resources discussed ethical issues beyond the technical issues of accuracy and human-in-the-loop. Case studies relating to AI in libraries and archives are on the rise, and we expect subsequent discussions of relevant ethics and values to follow suit, particularly in the areas of cost considerations, transparency, reliability, policy and guidelines, bias, social justice, user communities, privacy, consent, accessibility, and access. 
As AI comes into more common use, the library and archives professions will benefit not only from considering ethics when implementing local projects but also from discussing these ethical considerations publicly in shared documentation and publications.
INTRODUCTION
Artificial intelligence (AI) has the potential to positively impact library and archives collections and services. Library and archives practitioners have implemented artificial intelligence tools to enhance reference, instruction, metadata creation, recommendations, and more. However, AI also has ethical implications; it may cause harm through systematic bias, privacy violations, inaccuracy, misinformation, and lack of transparency, among other problems. This paper presents an extensive literature review and analysis that examines AI projects implemented in library and archives settings. We discuss how practitioners are currently using AI tools, noting the ethical concerns raised during implementation of those tools. We also suggest implications of this research for responsible AI practice.
METHODS
In this paper, we ask the following research questions:
* RQ1: How is artificial intelligence being used in libraries and archives practice?
* RQ2: What ethical concerns are being identified and addressed during AI implementation in libraries and archives?
To answer these questions, we conducted a literature search and then tagged the literature using qualitative coding methods, following Krippendorff and Charmaz.1 Following Charmaz, we applied the tools of constructivist grounded theory to analyze our research data. Charmaz detailed the process of constructing a theory for the subject of study: the theory is grounded in the data and developed from the point of view of the researcher. For our study, our research team constructed a theory of responsible AI in libraries and archives, grounded in our research data of recently published literature. In following this research approach, we recognize that we inherently imbue the data analysis, notably the thematic coding of the literature, with our own perspectives. Our analysis and results represent a view of the data through our eyes. For transparency and to support data reuse, our research data are published in the Qualitative Data Repository.2
Searching the Literature
At the time of writing, published resources about AI in libraries continue to proliferate. Journals, conferences, and white papers continue to center the concerns and impact of AI on library services, professional practice, and library users. A literature review is a snapshot in time, and the published works reviewed here provide a sense of trends in the library profession concerning the implementation, adoption, modification, and development of AI in libraries. The authors searched available resources in fall 2022 and again in summer 2023. Using a selection of AI-related terms for keyword searching, the team searched several library and information science-related databases for relevant journal articles, books, and published conference proceedings to begin generating a collection of resources. The following databases, journals, and other sources were consulted in the construction of the literature review (listed alphabetically):
* Academic Search Complete
* Code4Lib Journal and conference
* College & Research Libraries News
* DH Quarterly
* DH+Lib
* ERIC
* IEEE Big Data - Computational Archival Science
* IMLS-funded grant proposals
* Library and Information Science Abstracts (LISA)
* Library, Information Science and Technology Abstracts (LISTA)
* Web of Science
The authors filtered database results with the AI-related search terms below, using the Boolean strategy of adding "AND librar*" or "AND archiv*" to each term to produce a corpus relevant to the field. Keywords for searches were consistent across databases. Search terms are listed alphabetically below; multiword terms were searched as quoted phrases, exactly as shown.
* AI
* algorithm
* "artificial intelligence"
* automation
* chatbot
* "computer vision"
* "deep learning"
* "image recognition"
* "machine learning"
* "neural networks"
* "text mining"
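The search strategy pairs every AI-related term with each of the two field qualifiers. A minimal Python sketch of how that query set could be generated is shown below. It assumes each combination was run as a separate query and that the asterisk works as the database's truncation wildcard; exact syntax differs between database interfaces, and the function and variable names are illustrative.

```python
# Illustrative sketch: pair each AI-related search term with each field
# qualifier using Boolean AND. Assumes "*" is the database's truncation
# wildcard; actual syntax differs between database interfaces.
AI_TERMS = [
    "AI",
    "algorithm",
    '"artificial intelligence"',
    "automation",
    "chatbot",
    '"computer vision"',
    '"deep learning"',
    '"image recognition"',
    '"machine learning"',
    '"neural networks"',
    '"text mining"',
]

FIELD_QUALIFIERS = ["librar*", "archiv*"]


def build_queries(ai_terms, qualifiers):
    """Return one Boolean query string per (AI term, qualifier) pair."""
    return [f"{term} AND {qualifier}" for term in ai_terms for qualifier in qualifiers]


if __name__ == "__main__":
    for query in build_queries(AI_TERMS, FIELD_QUALIFIERS):
        print(query)
```

With 11 terms and 2 qualifiers this yields 22 query strings, beginning with `AI AND librar*` and `AI AND archiv*`.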
Search results were collected and stored in a shared Zotero library accessible to the authors. A CSV export of the Zotero library is included in the associated dataset in the Qualitative Data Repository.3
Inclusion Criteria
We scoped our review to case studies that describe a specific AI-related project in libraries or archives. We also required that the literature have at least one library or archives practitioner in the author group to clearly identify projects that were implemented in libraries and archives, rather than theoretical projects. Consequently, we excluded pilot-style projects by iSchool faculty or other information science researchers if the project did not include a practitioner. For the purpose of our paper, we define practitioner as anyone who is professionally employed in a library or archive; this could include librarians, archivists, library IT staff, clerical staff, administrators, or any other library or archives workers. We scoped our search to projects conducted in the United States and Canada.
We know that library and archives practitioners commonly share their work at conferences. However, we found that conference presentation abstracts did not provide enough detail to be useful in answering RQ2 ("What ethical concerns are being identified and addressed during AI implementation in libraries and archives?"). We therefore included conference presentations only if they took the form of an extended abstract, full presentation slides, or a conference paper. To scope the literature review to the most pertinent and timely resources, we limited our search to case studies published from January 1, 2017, until our initial literature search in fall 2022. As we searched farther into the past, we found fewer and less relevant results. To account for the rapidly changing environment and discussion around AI, we conducted another literature search in summer 2023 to include resources published in the latter part of 2022 and in early 2023. In sum, our literature review spans the period from 2017 through July 2023. Table 1 shows the distribution of case studies per year; the number of case studies decreases with earlier publication dates.
Coding the Literature
After screening the search results against our inclusion criteria, we identified 89 publications, white papers, and presentations related to AI implementation in libraries and archives. We then used inductive and deductive coding approaches to classify the literature.
Using the tagging function in Zotero, we coded each research output with the following deductive codes:
1. Demographic information: region (using Census Bureau divisions, with Canada as a single region), type of institution, and size of institution (based on Carnegie classifications)4
2. Types of collaborators, if applicable: computer science collaborators, statistics collaborators, iSchool collaborators, private sector collaborators
3. Types of materials used, if applicable: AV, text, images, either born-digital or digitized
We decided on these categories through discussion among the author group and with the Responsible AI advisory board: Dorothy Berry, Stephanie Russo Carroll, Maria Matienzo, Thomas Padilla, and Bohyun Kim.
We also free-coded each research output with inductive codes for the following categories:
1. Types of AI used
2. Ethical considerations discussed
These inductive codes were later standardized, as described below in the Code Standardization section.
Four authors (Mannheimer, Bond, Young, and Kettler) reviewed and tagged a portion of the articles in our Zotero library, with each article tagged by a single coder. First, the coder reviewed each article to determine relevance and noted any articles that did not meet the inclusion criteria (a case study with at least one practitioner author, based in the United States or Canada). If coders had questions about whether an article should be included, they tagged it "needs review"; Mannheimer then reviewed these articles and made final inclusion decisions. If multiple articles discussed the same project, we reviewed all of them but ultimately consolidated their codes under a single article.
To tag region, we used the nine Census Bureau divisions (listed in the Census Bureau's division order):5
* New England (Connecticut, Maine, Massachusetts, New Hampshire, Rhode Island, and Vermont)
* Mid-Atlantic (New Jersey, New York, and Pennsylvania)
* East North Central (Illinois, Indiana, Michigan, Ohio, and Wisconsin)
* West North Central (Iowa, Kansas, Minnesota, Missouri, Nebraska, North Dakota, and South Dakota)
* South Atlantic (Delaware, Florida, Georgia, Maryland, North Carolina, South Carolina, Virginia, Washington, DC, and West Virginia)
* East South Central (Alabama, Kentucky, Mississippi, and Tennessee)
* West South Central (Arkansas, Louisiana, Oklahoma, and Texas)
* Mountain (Arizona, Colorado, Idaho, Montana, Nevada, New Mexico, Utah, and Wyoming)
* Pacific (Alaska, California, Hawaii, Oregon, and Washington)
We also coded for institution type with code categories listed alphabetically. (We included Tribal Colleges, HBCUs, and community colleges in the code options, but we did not find articles written by practitioners from these institutions.)
* Academic archive
* Academic data repository
* Academic library
* Community college library
* Government library
* HBCU library
* Library consultancy
* Library research organization
* Medical library
* Museum
* Public library
* Special library
* Tribal College library
If the case study included practitioner authors collaborating with other types of researchers, we noted that with a code. Collaborator codes are as follows (listed alphabetically):
* Computer science collaborators
* iSchool collaborators
* Private sector collaborators
* Statistics collaborators
If the case study was written by academic librarians or archivists, we included the academic institution size (based on Carnegie classifications).6 Note that we included "small" as a code option, but none of the articles we found were from small academic institutions.
* Large: 10,000 or more students
* Medium: 3,000 to 9,999 students
* Small: Fewer than 3,000 students
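The three size codes amount to a simple threshold rule on enrollment. The sketch below (the function name is hypothetical) treats 3,000 as the lower bound of Medium and 10,000 as the lower bound of Large, matching the ranges in Carnegie's Size and Setting classification; Carnegie's published scheme is more detailed than these three codes.

```python
def size_code(fte_enrollment: int) -> str:
    """Assign the review's academic-institution size code.

    Illustrative thresholds on FTE enrollment:
    Large >= 10,000; Medium 3,000 to 9,999; Small < 3,000.
    """
    if fte_enrollment < 0:
        raise ValueError("enrollment cannot be negative")
    if fte_enrollment >= 10_000:
        return "Large"
    if fte_enrollment >= 3_000:
        return "Medium"
    return "Small"


if __name__ == "__main__":
    # Check the values on either side of each boundary.
    for enrollment in (2_999, 3_000, 9_999, 10_000):
        print(enrollment, size_code(enrollment))
```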
If the case study used AI with library and archives materials, we tagged it with the type of material (listed alphabetically):
* AV-born digital
* AV-digitized
* ETDs
* Images-born digital
* Images-digitized
* Maps
* Sensor data
* Text-born digital
* Text-digitized
* Web archives
We free-coded with inductive codes for the type of AI being discussed. Some examples, listed alphabetically:
* Chat reference
* Image recognition
* Recommender systems
* Text analysis
* Topic modeling
Lastly, we free-coded with inductive codes for ethical issues discussed in the case studies. Some examples, listed alphabetically:
* Accuracy
* Bias
* Labor practices
* Privacy
* Social justice
Code Standardization
After the inductive coding process was complete, we standardized the codes in the ethics and AI tools categories.7 Authors Mannheimer, Bond, and Young reviewed the ethics and values-related codes and categorized them by suggesting broader categories of codes that could encompass more granular codes. We then met to agree upon and standardize these categories. Ultimately, we arrived at 13 ethics and values-related codes (listed in order of frequency):
1. Accuracy
2. Human-in-the-loop
3. Bias
4. Social justice
5. Transparency
6. Privacy
7. Policy and guidelines
8. User communities
9. Cost considerations
10. Reliability
11. Accessibility
12. Access
13. Consent
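Standardization of this kind can be sketched as a lookup from granular codes to broader categories, followed by a per-article count that orders the categories by frequency. The Python below is illustrative only: the granular code names in `STANDARDIZE` are hypothetical, and it assumes a CSV export with one row per article and semicolon-separated tags in a `Manual Tags` column (Zotero's usual export layout, which should be checked against the published dataset).

```python
import csv
import io
from collections import Counter

# Hypothetical granular codes mapped to standardized categories.
STANDARDIZE = {
    "gender bias": "Bias",
    "racial bias": "Bias",
    "user privacy": "Privacy",
    "data confidentiality": "Privacy",
    "staff time": "Cost considerations",
}


def tally_codes(csv_text, tag_column="Manual Tags", mapping=STANDARDIZE):
    """Count how many articles carry each standardized code.

    Each category is counted at most once per article; tags missing
    from the mapping are counted under their original name.
    """
    counts = Counter()
    for row in csv.DictReader(io.StringIO(csv_text)):
        raw_tags = row.get(tag_column) or ""
        tags = {tag.strip() for tag in raw_tags.split(";") if tag.strip()}
        counts.update({mapping.get(tag, tag) for tag in tags})
    return counts


if __name__ == "__main__":
    sample = (
        "Title,Manual Tags\n"
        "A,gender bias; user privacy\n"
        "B,racial bias; gender bias\n"
        "C,staff time\n"
    )
    for code, n in tally_codes(sample).most_common():
        print(code, n)
```

In the sample, article B's two bias codes collapse into a single "Bias" entry, so "Bias" is counted twice (articles A and B), while "Privacy" and "Cost considerations" are each counted once.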
Authors Mannheimer and Slipher reviewed and standardized the AI-related codes. The final set of AI tool codes covered a wider variety of themes, with 51 codes in total. The 20 most frequently used of these codes are listed below; the full set of codes is listed in table 8 and figure 8.
1. Metadata extraction
2. Natural language processing (NLP)
3. Image recognition
4. Text mining
5. Machine learning
6. Classification
7. Computer automation
8. Search and discovery
9. Optical character recognition (OCR)
10. Topic modeling
11. Recommendation systems
12. Proprietary tools
13. Neural networks
14. Evaluation and assessment
15. Virtual assistants - chat
16. Machine learning - supervised
17. Chat reference
18. Object detection
19. Metadata
20. Crowdsourcing
RESULTS AND DISCUSSION
Our final literature corpus consisted of 89 case studies describing AI implementation in libraries and archives in the United States and Canada.
Again, the research questions guiding this literature review were:
* RQ1: How is artificial intelligence being used in libraries and archives practice?
* RQ2: What ethical concerns are being identified and addressed during AI implementation in libraries and archives?
We answered RQ1 by first examining who is implementing AI in libraries and archives: institution region, type, and size. This demographic information helped us understand what types of libraries and archives are implementing AI. We then examined how AI is being implemented by documenting the types of AI being used. We answered RQ2 by documenting ethical issues that were discussed in the literature and by measuring the proportion of literature that discussed ethics at all. Authors Marcus and Slipher of Montana State University Statistical Resources helped with data wrangling and created the figures in this section.
Demographic Information: Institution Type, Academic Institution Size, and Institution Region
The overwhelming majority of institutions represented in the reviewed case studies were academic libraries, most of which fell under the "large" category (FTE enrollment of 10,000 or more students).8 Other kinds of institutions at which implementation of AI was discussed included government libraries, special libraries, museums, public libraries, academic archives, library consultancies, academic data repositories, medical libraries, and library research organizations, though several of these institution types are generally part of, or associated with, academic libraries. See tables 2 and 3 and figures 2 and 3.
Regionally, the institutions were distributed across the United States and Canada, with the majority located in the South Atlantic, Mid-Atlantic, and East North Central Census divisions (which together cover much of the Eastern Seaboard and the Midwest).9 The West region (the Mountain and Pacific divisions) had the lowest representation in the case studies. This geographic representation generally correlates with both population density and institutional density. See table 4 and figure 4.
Finally, we looked at the groups and individuals with whom library and archives practitioners collaborated. Fourteen of the 89 articles had co-authors from outside libraries and archives: iSchool, computer science, statistics, and private sector collaborators. See table 5 and figure 5.
Use of Library Materials and Type of Material Used
Articles that used library materials (for training, analysis, or both) made up about 60% of the relevant literature (see table 6 and figure 6). Of those articles, the majority dealt primarily with either born-digital or digitized text and images, though web archives and digitized and born-digital AV materials also made up a significant portion of the materials used in the reviewed literature (see table 7 and figure 7). These materials broadly reflect libraries' and archives' electronic holdings, and the identification and application of AI tags revealed several clustered themes in the projects and investigative work highlighted in the literature.
Types of AI Used and General Themes of AI Implementation
The literature described using AI for a variety of purposes in libraries and archives. See table 8 and figure 8 for a full accounting of AI-related codes. We identified three main themes in AI implementation: metadata extraction, recommendations and reference, and assessment.
Metadata Extraction
The AI tag assigned most frequently was "metadata extraction," reflecting ongoing and established efforts in many repositories to extract, automate, and enhance accurate, usable, respectful, and empathetic metadata for digital content held in collections. This metadata extraction work embodies the first theme we identified in AI implementation, in which practitioners seek to provide and enhance access to digital collections using AI tools. Other frequently assigned and relevant tags in this theme included "natural language processing (NLP)," "image recognition," "classification," "computer automation," "optical character recognition (OCR)," "object detection," and "metadata." Examples of projects clustered within this theme included facilitating gender recognition in historical postcards; generating metadata to support discovery and use of audiovisual collections; and using AI and machine learning to extract data from Japanese American confinement records.10
Recommendations and Reference
The second theme centered on the use of AI to automate and streamline digital recommender systems and reference services, particularly automated chat tools. Practitioners commonly deployed AI in chatbots and discovery systems, providing users with recommendations and directions to appropriate resources based on data such as online chat text and past account activity. Relevant tags included "natural language processing (NLP)," "topic modeling," "recommendation systems," "virtual assistants - chat" (e.g., Siri or Alexa), "chat reference," "sentiment analysis," and "reference services." Examples of projects clustered within this theme included efforts to meet growing virtual reference needs with a university library chatbot; using predictive modeling to assess the difficulty level of incoming reference chat queries and direct users to appropriate library staff operators; multiple efforts to implement AI-driven library recommender systems in open discovery environments so that users can more easily locate relevant resources; and an AI Navigator tool enabling Library of Congress users to retrieve photographs according to visual similarity.11
Assessment
Finally, the third theme surfaced by the AI tags centered on deploying AI to identify patterns and glean new insights from large bodies of data; this is perhaps one of the more broadly interpreted topical areas in the literature. All case studies included in the literature review relied, to varying extents, on AI tools to automate processes and generate new information. Looking closer, the case studies in this third theme specifically sought to reveal patterns, enhance corpora of data, and assess the efficacy, accuracy, and biases of processes. Relevant tags included "text mining," "topic modeling," "search and discovery," and "natural language processing (NLP)." Projects included using text mining to enhance cataloging for digitized dime novels; using text analysis to add value to electronic theses and dissertations (ETDs); and examining patterns of text reuse in large-scale digital libraries.12
Ethical Considerations in Literature
The literature we reviewed covered 13 different considerations related to ethics and library professional values: accuracy, human-in-the-loop, bias, social justice, transparency, privacy, policy and guidelines, user communities, cost considerations, reliability, accessibility, access, and consent. Accuracy was the most commonly discussed issue; our approach to this issue is further discussed in the section below, Number of Papers that Included a Discussion of Ethics. Table 9 and figure 9 show the number of articles that addressed each ethics and values-related issue.
Number of Papers that Included a Discussion of Ethics
Just over half of the papers included in the literature review (n=49, approximately 55%) mentioned issues related to ethics or values in their discussions of AI implementation in libraries and archives, based on tags assigned by Responsible AI team members. Table 10 and figure 10 show the initial visualization of the proportion of articles that mentioned one or more ethics or values considerations.
However, we noted that "accuracy" (n=23) and "human-in-the-loop" (n=16) were the most heavily discussed issues. Accuracy is a natural concern for practitioners engaging in technology-based work, particularly when processes are automated. Human-in-the-loop is a term for human intervention in machine learning and other AI modeling, in which humans take part in the AI process to support accuracy. Given the relative newness of accessible AI technology and its use in libraries and archives, practitioners are likely primarily concerned with the actual functionality of AI, that is, making sure it works. While "accuracy" and "human-in-the-loop" can relate to ethics and values, the articles express these issues primarily as technical rather than ethical considerations. With this in mind, we removed both tags from the ethics and values tag list to reveal a more nuanced picture of the discussion of ethics in the reviewed literature. When "accuracy" and "human-in-the-loop" are excluded, substantially fewer papers mention ethics than originally indicated (n=31, approximately 35%); see table 11 and figure 11. The case studies that did address ethics discussed topics including bias, social justice, transparency, privacy, policy and guidelines, user communities, cost considerations, reliability, accessibility, access, and consent.
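The shift from roughly 55% to 35% comes from recounting the same articles with two codes set aside. A minimal sketch, assuming each article is represented as a set of its standardized ethics and values codes (a hypothetical structure, not the published dataset's format), with toy data standing in for the real corpus:

```python
TECHNICAL_CODES = frozenset({"Accuracy", "Human-in-the-loop"})


def ethics_share(article_codes, exclude=frozenset()):
    """Return (count, proportion) of articles that keep at least one
    ethics or values code after the codes in `exclude` are removed."""
    if not article_codes:
        raise ValueError("need at least one article")
    count = sum(1 for codes in article_codes if set(codes) - set(exclude))
    return count, count / len(article_codes)


if __name__ == "__main__":
    # Toy data: five hypothetical articles.
    articles = [
        {"Accuracy"},
        {"Accuracy", "Bias"},
        set(),
        {"Human-in-the-loop"},
        {"Privacy"},
    ]
    print(ethics_share(articles))                   # (4, 0.8)
    print(ethics_share(articles, TECHNICAL_CODES))  # (2, 0.4)
    # The reported counts for the 89 case studies, as percentages:
    print(f"{49 / 89:.1%}", f"{31 / 89:.1%}")       # 55.1% 34.8%
```

Applied to the review's 89 case studies, the same count gives 49/89 (about 55.1%) with all codes and 31/89 (about 34.8%) with the two technical codes excluded, which the text rounds to 55% and 35%.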
Analysis of Ethical Considerations for AI in Libraries and Archives
The ethics- and values-related considerations discussed in the literature fell into a few broad categories. An overview of these categories, with examples of how they were discussed in the literature, follows.
Accuracy and Human-in-the-Loop
In the ordered realm of libraries and archives, accuracy and human intervention ("human-in-the-loop") are both valued and necessary components of the work of mediating access to and use of collections, and the two commingle in much of the AI-related work reviewed in the literature. The emergence of accuracy as a significant focus makes sense in context. Many of the case studies with this tag sought to generate (meta)data, analyze corpora of data (like chat reference transcripts), and generally promote and enhance access to and discoverability of collections. Accuracy is an important and common measure for evaluating the success of AI tools. The prevalence of case studies assigned the "human-in-the-loop" tag indicates the importance of considering human intervention in the use and deployment of AI tools. "Mediating the differing roles and biases of author, librarian, and machine," as Harper et al. wrote, "requires ongoing research and human devotion to consistency."13 Accuracy is also important to users; on this point, it is necessary to bring humans back in to verify and validate content.14 Human expertise is also necessary for training machine learning algorithms.15 Dunn et al. also noted the potential harm of computers' lack of moral judgment, citing the central role of human judgment in bias research.16 While accuracy remains a crucial measure of AI tools, it is not the only factor institutions weigh in determining their utility and success.17
Cost Considerations
Cost considerations emerged as both a practical and a values-related concern for institutions engaging with AI technologies. Adopting new technologies to streamline or enhance library services generally carries additional costs, not only for the procurement and maintenance of hardware and software but also in staff time and wages. Several case study authors took these financial and labor considerations into account, acknowledging the cost/benefit analyses they undertook in investigating, developing, and training AI tools. One case study noted the value of mutually beneficial relationships with vendors: the authors leveraged an established relationship with IBM to negotiate complimentary research and development platforms and benefited from the alignment between their project and the company's product development plans.18 As these authors noted, adopting new technologies or technological workflows is often complex and requires support and buy-in from a variety of institutional stakeholders. With many AI tools, additional costs are incurred through the time and resources needed to train models. For example, one case study compared the computational cost of training a deep neural network with the cost of human labor in creating training images when developing an algorithm designed to apply a local classification system to images based on visual elements.19 Ultimately, institutions interested in using AI tools will likely need to balance such additional expenses against the cost savings associated with streamlined workflows and more efficient use of staff time. These cost considerations relate to questions of institutional operations, prompting institutions to name, and make space for, the valuation of human and machine labor.20
Policy and Guidelines
Policy and guidelines can be used as a strategy for supporting ethical practice in libraries and archives, and a few of the articles in the reviewed literature either mentioned the need for policy and guidelines or specifically followed certain policies and guidelines. Lorang et al., who conducted a document segmentation demonstration project for the Library of Congress, produced a set of recommendations that included developing "a statement of values or principles that will guide how the Library of Congress pursues the use, application, and development of machine learning for cultural heritage."21 In a project that used AI to generate metadata, Dunn et al. described compiling a list of criteria drawn from a number of existing sources, "notably the Principles for Accountable Algorithms and a Social Impact Statement for Algorithms, which guided criteria around social impact and accuracy."22 There were no concrete policies or guidelines that were followed consistently in the literature, which suggests that librarians and archivists would benefit from standardized policies that support responsible use of AI in practice.
Bias, Reliability, and Transparency
Some consideration of bias, reliability, and transparency was evident in the reviewed literature. In this literature, bias in AI systems is understood as deriving primarily from already-biased training data: "[A]ll machine learning systems share a problem, which is that they are only as good as the data they are trained on. If that corpus has significant biases or omissions, those will be reflected in the outputs."23 It is now widely understood that, because the vast majority of training data for AI systems has been generated by humans, the subjectivity of human experience has infiltrated AI tools themselves. Such intrinsic bias is a recognized factor that should be addressed or mitigated when developing responsible AI systems. The literature included discussion of "neutralizing" or "debiasing" AI systems and AI training data, though other authors underscored the inherently human aspects of AI that simply cannot be made neutral or free of bias.24 A responsible approach to addressing such predispositions in AI tools may therefore lie in persistent transparency about built-in bias, rather than in attempting to eliminate it altogether.25 Companies and developers can, for example, provide technical and development reports when they introduce systems and tools, and adopters (like academic libraries) should, in the same vein, be transparent with their users about the inherent and structural bias embedded in the AI tools they deploy. Mitigating the potential harms of AI systems in this way can also extend the applicability and use of AI tools.26 Ultimately, understanding bias in AI tools and building transparency remain works in progress.27
Social Justice and User Communities
The promises of AI as it relates to social justice and user communities are numerous: AI can make a library's materials available in timely and accessible ways.28 AI can produce faster, more extensive, and more integrative approaches to collections processing.29 AI can enhance virtual reference by delivering targeted content to users.30 And libraries can use AI to enhance civic engagement and economic vitality by deploying advanced data analytics to support community-centered innovation.31 Concurrent with these opportunities, however, are unintended negative civic consequences and social harms that can also result from AI systems.32 Practitioners recognize the tension between the potential operational benefits of AI and its potential social harms, and some have attempted to address problems centered on social constructs like gender and ethnicity.33 In a project that looks at AI-driven metadata creation for a photo archive, for example, practitioners noted that "mistakes, failures to understand and surface important or salient features of a collection, and a lack of moral judgement would ultimately cause more harm than good."34 For some, the technical problems of an AI project are less challenging to solve than the accompanying social challenges.35
In response to widespread concern about the social impacts of AI, a consensus emerged in the literature around practically addressing aspects of social responsibility, such as the notion that humans should maintain high-touch supervision over AI systems. Libraries can, for instance, cultivate relationships with relevant communities in order to support the needs and values of those represented in or affected by AI systems.36 In one example, library collaboration with a local community group helped produce co-authored policies that support responsible access to AI-produced records.37 Practitioners can furthermore guide and monitor the implementation of machine learning models to ensure useful results that mitigate harms.38 Yelton effectively summarized this position: "Artificial intelligence does not actually remove the need for human intelligence."39 Critical, human-centered questions should be asked of AI systems, not with the goal of preventing implementation, but with a view toward responsible implementation that is in tune with library values such as user-centeredness and social responsibility.
Privacy and Consent
Discussions of privacy and consent in the literature range from passing mentions to in-depth strategies for protecting user privacy. Hahn and McDonald, whose article described a library resource recommender app, suggested that when mining user data, the library should consider user privacy, data confidentiality, and consent.40 However, the authors did not discuss these considerations in detail; they suggested that such privacy considerations and risks are addressed by a broader library privacy policy. Friedman et al., who used AI to analyze Japanese American Confinement Records, described future plans to convene a community advisory group to consider ethical challenges of these records, including privacy.41 In an example that considers privacy in more depth, Ehrenpreis and DeLooper described how their library prioritized privacy in AI tools and services by working with a chatbot vendor to adjust privacy settings.42 By default, the vendor's chatbot tracked library users and collected user data; the library negotiated with the vendor to enhance privacy, then worked with its IT department to implement a more privacy-focused chatbot setup. Privacy has historically been a key professional value for library and archives practitioners and should be considered when implementing AI in libraries and archives.
Access and Accessibility
Many AI-powered tools enhance access writ large (for example, metadata generation for discovery, or recommender systems that lead users to new resources), and a few of the articles reviewed speak specifically to access concerns. For example, Schneider et al. described using an AI-powered email archiving tool to support access to archived emails.43 Only three of the articles reviewed discussed accessibility for users with disabilities or the idea of access as a library professional value. As an example, Rodriguez and Mune, describing an AI-powered reference chatbot implementation in the library, briefly touched on potential accessibility implications of chatbot text and interactive elements. The authors suggested that the chatbot should be "carefully tested using existing accessibility standards such as W3C"; however, such testing was not described in detail in the article.44 While the idea of accessibility for users with disabilities was present in some case studies, it has not yet been deeply considered in the AI implementation literature.
IMPLICATIONS FOR RESPONSIBLE AI PRACTICE IN LIBRARIES AND ARCHIVES
The case studies reviewed in this paper suggest several implications for responsible AI practice in libraries and archives. First, our data showed that most case studies were published by large academic libraries. This trend in publishing means that a few large libraries may set the tone for other smaller libraries and that the unique needs of public libraries and smaller institutions may not be reflected in the literature.
Many of the ethics considerations in the literature related to commonly held library values such as privacy, human-centeredness, and social responsibility. In the resources included in our literature review, we noted a strong desire among librarians and archivists to include human intervention in AI processes and tools, and to consider library communities by being clear and transparent about resources and services. Bias is an especially thorny issue for librarians that can be partly mitigated with transparent messaging: as long as tools are created by (inherently biased) humans and trained on data also created by (inherently biased) humans, there will be no perfect way to remove bias. In other words, removing bias, or "debiasing," is likely an unattainable goal; we can only strive to be transparent about our known biases. Practicing transparency in adoption and implementation is one way that librarians can make clear the biases and other problems that may be inevitable when implementing AI systems.
The data also showed that computer science, statistics, iSchool, and private sector collaborators were involved in many case studies. Among the case studies reviewed here, deployment of AI tools in libraries and archives often benefited from additional technical expertise. We also observed that most of the case studies analyzed focused on library-driven projects. Generative AI tools have proliferated since we completed our literature review, and many vendors are offering new AI-powered products to libraries. As AI resources and services in libraries and archives are increasingly influenced by external collaborators and vendor products, we will need to strengthen our advocacy for library and archives professional ethics and values in the face of potentially conflicting viewpoints.
Lastly, the literature analysis indicated that ethics is currently an underdocumented aspect of AI in libraries. One potential reason for this lack of documentation is that all of the articles in this literature review were case studies: practical "how-to" pieces aimed at other librarians and archivists, a format that suits our practitioner community. Based on the literature, librarians have had, and will likely continue to have, difficulty finding information or guidance on how to ethically and responsibly implement AI within specific library contexts. Ideally, ethics would be operationalized as an integral part of the "how-to" process in case studies. Clearer ethical statements and guidelines, both in the private sector and in the libraries and archives profession, could support this integration. As it currently stands, ethics statements are diverse and distributed, and there are no professional guidelines for responsible AI that are widely accepted or well established across the field. Responsible AI practice can be supported by considering ethics iteratively throughout the process of implementing AI systems in libraries and archives.
LIMITATIONS
This literature review was subject to three limitations. First, our research was limited by the venues in which librarians and archivists share their work. We sought to include conference presentations in our literature search, but as noted above in the section Inclusion Criteria, short conference abstracts generally did not provide enough information for us to know whether ethics or values were discussed in the presentation; we therefore only included conference presentations with associated slides or an extended abstract. Moreover, because of the practice-based nature of the field, many librarians and archivists may not publish or present their work in formal settings such as academic journals and professional conferences, instead sharing it via informal avenues such as blog posts, social media, or conversations at professional gatherings. Our literature review does not account for these informal sharing strategies.
Second, we focused our literature review on the ways in which librarians and archivists consider ethics and values when implementing AI. Practitioners described a broad range of implementations: at the individual, unit, and institutional levels; formal and informal; using vendor products and in-house tools; pilot projects and fully implemented projects. However, differentiating between these technical and practical aspects of AI implementation was out of scope for our analysis. Future research could delve deeper into the details of AI implementations (see Future Research and Conclusion below). Pedagogical case studies were also beyond the scope of this literature review; this body of literature would merit its own ethics- and values-centered investigation.
Finally, AI technology is advancing quickly. This acceleration is borne out in the proliferation of new and evolving AI strategies and procedures at colleges and universities, which have in turn moved quickly to amend and create policy in tandem with AI advancements. Our literature review therefore reflects the landscape of AI implementation in libraries and archives through mid-2023 and does not include more recent works that may reflect new practices and considerations. This time limitation, combined with the rapid and unpredictable advancement of AI technology, means that this work is a snapshot in time: there is a unique urgency to this particular field of inquiry, one that will benefit from iterative appraisals of the state of the field.
FUTURE RESEARCH AND CONCLUSION
AI implementation is growing in libraries and archives, and practitioners are using AI for increasingly varied purposes. As AI implementation in libraries and archives continues to grow and change, we see several potential directions for future research. Starting with the literature review itself, we noted additional questions that were out of scope for this paper but would still be of interest to our community. Of the types of AI that we found in the literature (image recognition, text analysis, topic modeling, etc.), which are more or less suitable for responsible implementation in libraries? Looking next at library size, we noted that most AI projects are currently being undertaken by larger libraries. What implications does this have for AI implementation in libraries and archives across the board? How will medium-sized and smaller libraries be affected by the policies, standards, and norms set by AI projects led by larger institutions? And as more AI-related case studies are published, future research could investigate how ethical AI practice in libraries and archives has changed, and continues to change, over time.
Looking further afield, we see interesting questions related to collaborative projects with nonlibrary organizations and entities: Where are these collaborations happening, what are libraries learning from collaborating with adjacent fields, and will collaborations dwindle as AI becomes more integrated with vendor products? We also observed urgent questions related to the labor and staffing implications of AI, and answering these questions would help steward the field toward responsible AI implementation. From a methods standpoint, future research could apply additional approaches, such as surveys or web searching, to identify AI implementation activities not represented in the current research and to examine how these implementations address ethics, values, and responsible practice.
Turning toward the practical questions of responsible AI, we also see future work related to ethical guidance, values-based tools, and other decision aids for librarians who want to implement AI responsibly. What ethical tools are currently available for practitioners? What new tools could be developed to help support responsible AI implementation?
AI is the most recent technological innovation demanding library and archival adoption and adaptation, situated in a long history of technology developments. This literature review is a snapshot in time, both taking the temperature of current dialogues within the field around responsible AI and providing an understanding of necessary ethical considerations related to AI use in libraries and archives as AI implementation grows. Conversations about ethics are happening: just over half of the papers included in this review mentioned ethics- or values-related issues in their discussions of AI implementation in libraries and archives, though these conversations were dominated by technical issues related to accuracy and human-in-the-loop interventions. Case studies relating to AI in libraries and archives are on the rise, and we expect subsequent discussions of relevant ethics and values to follow suit, particularly in the areas of cost considerations, transparency, reliability, policy and guidelines, bias, social justice, user communities, privacy, consent, accessibility, and access. As AI comes into more common usage, the library and archives professions will benefit from not only considering ethics when implementing local projects, but also publicly discussing these ethical considerations in shared documentation and publications.
Submitted: 25 October 2023. Accepted for Publication: 19 August 2024. Published: 23 September 2024.
ENDNOTES
1 Klaus Krippendorff, Content Analysis: An Introduction to Its Methodology (SAGE Publications, 2012); Kathy Charmaz, Constructing Grounded Theory, 2nd ed., Introducing Qualitative Methods (London: Sage, 2014).
2 Sara Mannheimer, Addison Marcus, Sally K. Slipher, Natalie Bond, Scott W. H. Young, Hannah Scates Kettler, Jason Clark, Yasmeen Shorish, Doralyn Rossmann, and Bonnie Sheehey, "Data for: Responsible AI Practice in Libraries and Archives: A Review of the Literature," Qualitative Data Repository, QDR Main Collection (2024), https://doi.org/10.5064/F60ME490.
3 Mannheimer et al., "Data for: Responsible AI Practice in Libraries and Archives."
4 US Census Bureau, "Regions and Divisions," updated October 8, 2021, https://www.census.gov/programs-surveys/economic-census/guidance-geographies/levels.html#par_textimage_34; American Council on Education, "Carnegie Classification of Institutions of Higher Education," https://carnegieclassifications.acenet.edu/.
5 US Census Bureau, "Regions and Divisions."
6 American Council on Education, "Size & Setting Carnegie Classification," https://carnegieclassifications.acenet.edu/carnegie-classification/classification-methodology/size-setting-classification/.
7 Charmaz, Constructing Grounded Theory.
8 American Council on Education, "Size & Setting Carnegie Classification."
9 US Census Bureau, "Regions and Divisions."
10 Ryan Schuerkamp, Jared Barrett, Amber Bales, Alia Wegner, and Philippe J. Giabbanelli, "Enabling New Interactions with Library Digital Collections: Automatic Gender Recognition in Historical Postcards via Deep Learning," Journal of Academic Librarianship 49, no. 4 (2023): 102736, https://doi.org/10.1016/j.acalib.2023.102736; Jon W. Dunn, Ying Feng, Juliet L. Hardesty, Brian Wheeler, Maria Whitaker, Thomas Whittaker, Shawn Averkamp, et al., "Audiovisual Metadata Platform Pilot Development (AMPPD) Final Project Report" (IU ScholarWorks, 2021), https://hdl.handle.net/2022/26989; Marissa Friedman, Cameron Ford, Mary Elings Pi, Vijay Singh, and Tracey Tan, "Using AI/Machine Learning to Extract Data from Japanese American Confinement Records," in IEEE International Conference on Big Data (IEEE, 2021): 2210-19, https://doi.org/10.1109/BigData52589.2021.9672076.
11 Sharesly Rodriguez and Christina Mune, "Uncoding Library Chatbots: Deploying a New Virtual Reference Tool at the San Jose State University Library," Reference Services Review 50, no. 3/4 (2022): 392-405, https://doi.org/10.1108/RSR-05-2022-0020; Jeremy Walker and Jason Coleman, "Using Machine Learning to Predict Chat Difficulty," College & Research Libraries 82, no. 5 (2021): 683-707, https://doi.org/10.5860/crl.82.5.683; Richard Shrake and Scott Ziegler, "You'll Probably Like This One, Too: Using Circulation Data to Automate Recommendations in a Special Collections Library" (presentation at Code4Lib, Los Angeles, 2017), https://2017.code4lib.org/talks/Youll-probably-like-this-one-too-Using-circulation-data-to-automate-recommendations-in-a-special-collections-library; Ben Lee, "Newspaper Navigator" (web page, Library of Congress Labs, 2020), https://web.archive.org/web/20240824031755/https://labs.loc.gov/work/experiments/newspaper-navigator/.
12 Matthew Short, "Text Mining and Subject Analysis for Fiction; or, Using Machine Learning and Information Extraction to Assign Subject Headings to Dime Novels," Cataloging & Classification Quarterly 57, no. 5 (2019): 315-36, https://doi.org/10.1080/01639374.2019.1653413; Bipasha Banerjee et al., "Applications of Data Analysis on Scholarly Long Documents," in IEEE International Conference on Big Data (IEEE, 2022): 2473-81, https://doi.org/10.1109/BigData55660.2022.10020935; Peter Organisciak et al., "Examining Patterns of Text Reuse in Digitized Text Collections," in ACM/IEEE Joint Conference on Digital Libraries (Champaign, 2019): 361-62, https://doi.org/10.1109/JCDL.2019.00071.
13 Charlie Harper, Anne Kumer, Shelby Stuart, and Evan Mészáros, "Chapter 8. AI-Informed Approaches to Metadata Tagging for Improved Resource Discovery," in The Rise of AI: Implications and Applications of Artificial Intelligence in Academic Libraries, ed. Sandy Hervieux and Amanda Wheatley, ACRL Publications in Librarianship 78 (Chicago: Association of College and Research Libraries, 2022).
14 Kirsten Strigel Carter, Abby Gondek, William Underwood, Teddy Randby, and Richard Marciano, "Using AI and ML to Optimize Information Discovery in Under-Utilized, Holocaust-Related Records," AI & Society 37, no. 3 (2022): 837-58, https://doi.org/10.1007/s00146-021-01368-w.
15 Matt Enis, "Library of Congress Trains Machine Learning with Crowdsourcing," Library Journal 147, no. 2 (2022): 8-11.
16 Matthew Lincoln, Julia Corrin, Emily Davis, and Scott B. Weingart, "CAMPI: Computer-Aided Metadata Generation for Photo Archives Initiative" (preprint), Carnegie Mellon University, 2020, https://doi.org/10.1184/R1/12791807.v2.
17 Dunn et al., "Audiovisual Metadata Platform Pilot Development (AMPPD) Final Project Report."
18 Aaron Trehub and Ali Krzton, "Chapter 12. Using IBM Watson for Discovery and Research Support: A Library-Industry Partnership at Auburn University," in The Rise of AI: Implications and Applications of Artificial Intelligence in Academic Libraries, ed. Sandy Hervieux and Amanda Wheatley, ACRL Publications in Librarianship 78 (Chicago: Association of College and Research Libraries, 2022).
19 Ellen Prokop, X. Y. Han, Vardan Papyan, David L. Donoho, and C. Richard Johnson, "AI and the Digitized Photoarchive: Promoting Access and Discoverability," Art Documentation: Bulletin of the Art Libraries Society of North America 40, no. 1 (2021): 1-20, https://doi.org/10.1086/714604.
20 Kevin W. Walker and Zhehan Jiang, "Application of Adaptive Boosting (AdaBoost) in Demand-Driven Acquisition (DDA) Prediction: A Machine-Learning Approach," The Journal of Academic Librarianship 45, no. 3 (2019): 203-12, https://doi.org/10.1016/j.acalib.2019.02.013.
21 Elizabeth Lorang, Leen-Kiat Soh, Yi Liu, and Chulwoo Pack, "Digital Libraries, Intelligent Data Analytics, and Augmented Description: A Demonstration Project" (DigitalCommons@University of Nebraska-Lincoln, 2020): 3.
22 Dunn et al., "Audiovisual Metadata Platform Pilot Development (AMPPD) Final Project Report," 13.
23 Andromeda Yelton, "Chapter 2. HAMLET: Neural-Net-Powered Prototypes for Library Discovery," Library Technology Reports 55, no. 1 (2018): 10-15.
24 James Powell, Kari Sentz, Elizabeth Moyer, and Martin Klein, "Teaching AI When to Care about Gender," The Code4Lib Journal, no. 54 (2022), https://journal.code4lib.org/articles/16718; Matthew Reidsma, "Auditing Algorithms in Commercial Discovery Tools" (presentation at Code4Lib, Washington DC, 2018), https://2018.code4lib.org/talks/auditing-algorithms-in-commercial-discovery-tools; Benjamin Lee, "Compounded Mediation: A Data Archaeology of the Newspaper Navigator Dataset" (Knowledge Commons, 2020), https://doi.org/10.17613/k9gt-6685.
25 Eileen Jakeway, Lauren Algee, Laurie Allen, Meghan Ferriter, Jaime Mears, Abigail Potter, and Kate Zwaard, Machine Learning + Libraries Summit Event Summary (Library of Congress Labs - Digital Strategy Directorate, February 13, 2020), https://labs.loc.gov/static/labs/meta/MLEvent-Summary-Final-2020-02-13.pdf.
26 Clara Turp and Lucy Kiester, "Ethics of AI Algorithms: Looking into PubMed's 'Best Match' Algorithm" (presentation at Code4Lib, online, 2021), https://2021.code4lib.org/talks/Ethics-of-AI-algorithms-looking-into-PubMeds-Best-Match-Algorithm; Dunn et al., "Audiovisual Metadata Platform Pilot Development (AMPPD) Final Project Report."
27 Charlene Chou and Tony Chu, "An Analysis of BERT (NLP) for Assisted Subject Indexing for Project Gutenberg," Cataloging & Classification Quarterly 60, no. 8 (2022): 807-35, https://doi.org/10.1080/01639374.2022.2138666.
28 Lorang et al., "Digital Libraries, Intelligent Data Analytics, and Augmented Description."
29 Lincoln et al., "CAMPI."
30 Rodriguez and Mune, "Uncoding Library Chatbots."
31 Emad Khazraee and Susan Winter, "Library Knowledge Extensions (KNEXT): Data Analytics to Support Innovation Communities," IMLS grant proposal, 2017, http://www.imls.gov/grants/awarded/lg-71-17-0124-17.
32 Dunn et al., "Audiovisual Metadata Platform Pilot Development (AMPPD) Final Project Report."
33 Powell et al., "Teaching AI When to Care about Gender"; Lee, "Compounded Mediation."
34 Lincoln et al., "CAMPI," 3.
35 Lorang et al., "Digital Libraries, Intelligent Data Analytics, and Augmented Description."
36 Jim Hahn, "Semi-Automated Methods for BIBFRAME Work Entity Description," Cataloging & Classification Quarterly 59, no. 8 (2021): 853-67, https://doi.org/10.1080/01639374.2021.2014011; Chou and Chu, "An Analysis of BERT (NLP)."
37 Friedman et al., "Using AI/Machine Learning."
38 Walker and Jiang, "Application of Adaptive Boosting (AdaBoost)."
39 Yelton, "Chapter 2. HAMLET," 14.
40 Jim Hahn and Courtney McDonald, "Account-Based Recommenders in Open Discovery Environments," Digital Library Perspectives 34, no. 1 (2018): 70-76, https://doi.org/10.1108/DLP-07-2017-0022.
41 Friedman et al., "Using AI/Machine Learning."
42 Michelle Ehrenpreis and J. DeLooper, "Implementing a Chatbot on a Library Website," Journal of Web Librarianship 16, no. 2 (2022): 120-42, https://doi.org/10.1080/19322909.2022.2060893.
43 J. Schneider, C. Adams, S. DeBauche, R. Echols, C. McKean, J. Moran, and D. Waugh, "Appraising, Processing, and Providing Access to Email in Contemporary Literary Archives," Archives and Manuscripts 47, no. 3 (2019): 305-26, https://doi.org/10.1080/01576895.2019.1622138.
44 Rodriguez and Mune, "Uncoding Library Chatbots," 403.
© 2024. This work is published under https://creativecommons.org/licenses/by-nc/4.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.
Details
1 is Data Librarian, Montana State University
2 is Head of Reference and Instruction and Government Information Librarian, University of Montana
3 is User Experience & Assessment Librarian, Montana State University
4 is Associate University Librarian for Academic Services, Iowa State University
5 is Data Science Graduate Research Assistant, Montana State University