
Entity Information Extraction and Normalization From Scientific and Clinical Texts

Almudaifer, Abdullateef Ibrahim. The University of Alabama at Birmingham, ProQuest Dissertations & Theses, 2024. 31147171.

Abstract (summary)

In the analysis of scientific and clinical texts, the extraction of named entities and their relevant information, such as modifiers, plays a pivotal role. Recent advancements in natural language processing (NLP), particularly through the application of transfer learning from pre-trained Transformer models, have greatly enhanced the performance of entity extraction tasks. However, challenges persist with nested entities. This thesis investigates the impact of transfer learning on extracting nested entities and their modifiers, using Opioid Use Disorder (OUD) as a prototype. By adopting a multi-task training strategy, this work enhances the model's capacity to discern and categorize overlapping entities, a task that traditional transfer learning models often struggle with due to their single-focus training on flat entities. Moreover, entity modifiers, which can alter the semantics of entities extracted from clinical texts, are critical for interpreting clinical narratives accurately. Traditional models for identifying these modifiers often rely on regular expressions or feature weights and are trained in isolation for each modifier. In contrast, this thesis proposes a novel, unified multi-task Transformer architecture that simultaneously learns and predicts various modifiers. The effectiveness of this approach is validated on the ShARe and OUD data sets, demonstrating state-of-the-art results and highlighting the potential of transfer learning between data sets with partially similar modifiers in clinical texts. This work extends into document-level entity relation extraction, enhancing the ability to comprehensively understand and analyze the relationships between entities within scientific literature. Furthermore, the thesis addresses the essential task of entity normalization, that is, linking textual mentions to ontology concepts. Despite the challenges posed by the diverse expression of concepts and the complexity of ontology graphs, this work introduces a model that utilizes graph neural networks (GNN) to encode entity mentions and ontology concepts in a common hyperbolic space, aiming to enhance entity normalization performance in scientific and clinical texts.

Indexing (details)

Subject
Computer science;
Artificial intelligence;
Engineering
Classification
0984: Computer science
0800: Artificial intelligence
0537: Engineering
Identifier / keyword
Information extraction; Multi-task learning; Natural language processing; Opioid Use Disorder
Title
Entity Information Extraction and Normalization From Scientific and Clinical Texts
Author
Almudaifer, Abdullateef Ibrahim
Number of pages
108
Publication year
2024
Degree date
2024
School code
0005
Source
DAI-B 85/11(E), Dissertation Abstracts International
ISBN
9798382343990
Advisor
Yan, Da; Wang, Tianyang
Committee member
Feldman, Sue; Guo, Guimu; Osborne, John; Zhao, Kai
University/institution
The University of Alabama at Birmingham
Department
Computer and Information Sciences
University location
United States -- Alabama
Degree
Ph.D.
Source type
Dissertation or Thesis
Language
English
Document type
Dissertation/Thesis
Dissertation/thesis number
31147171
ProQuest document ID
3050755548
Copyright
Database copyright ProQuest LLC; ProQuest does not claim copyright in the individual underlying works.
Document URL
https://www.proquest.com/docview/3050755548