Master's Thesis Proposal: Kwangmin Cho | Daniel Guggenheim School of Aerospace Engineering

Location

Weber Space and Technology Building (SST II), Collaborative Visualization Environment (CoVE)

Thursday, February 27, 2025, 10:00 AM

Master's Thesis Proposal

Kwangmin Cho

(Advisor: Prof. Dimitri Mavris)

"Improving LLM Performance in Aerospace NER Task: A Study on Data Augmentation and Fine-tuning Strategy"

MS Teams

Abstract
As digital transformation progresses across various sectors, systems engineering is also transitioning from document-based practices to Model-Based Systems Engineering (MBSE). This shift is anticipated to improve traceability, streamline verification and validation processes, and enable better integration across system components. In alignment with this transition, there is a growing need for Named Entity Recognition (NER) methods capable of extracting machine-readable entities from requirements written in natural language (NL). NER plays a critical role in identifying and classifying data belonging to target entity types. Among the various approaches for NER, fine-tuning Large Language Models (LLMs) has shown significant promise due to the rapid advancements in their capabilities.

However, fine-tuning LLMs for domain-specific tasks presents significant challenges, particularly in low-resource domains where open-source data is scarce, and for tasks such as NER whose data preparation is labor-intensive because every token in the training data must be paired with a corresponding entity label. Aerospace requirements exemplify both challenges: their confidential nature restricts data availability, and NER annotation demands not only extensive effort but also expert-level knowledge. Consequently, NER for aerospace requirements engineering remains underexplored compared to other NLP and fine-tuning applications.

To address the challenges of low-resource domains and labor-intensive pre-processing, this study proposes a domain-entity adaptive data augmentation strategy aimed at improving the performance of fine-tuned LLMs without requiring extensive manual labeling efforts. This strategy employs Synonym Replacement (SR) and Label-wise Token Replacement (LwTR) adaptively, based on a detailed analysis of domain-specific entity characteristics. These characteristics are identified by evaluating entity-wise performance across varying replacement rates and augmentation methods. By tailoring the augmentation strategy to account for the desired levels of variability and method preferences for each entity type, this study explores optimal combinations of replacement rates and augmentation methods. The proposed approach seeks to enhance the overall performance of fine-tuned LLMs for NER tasks in aerospace domains, addressing key challenges in data scarcity and annotation costs, while contributing to advancements in requirements engineering.
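As a rough illustration of the two augmentation methods named in the abstract, the sketch below applies Synonym Replacement (SR) or Label-wise Token Replacement (LwTR) token by token, with the method and replacement rate chosen per entity type. It is a minimal sketch under stated assumptions, not the proposal's implementation: the toy sentences, the entity types `SYS` and `VAL`, the synonym table, the `POLICY` values, and the function names are all hypothetical.

```python
import random
from collections import defaultdict

# Hypothetical toy data: (tokens, BIO labels). The entity types SYS and VAL
# are illustrative only, not the label set used in the thesis.
TRAIN = [
    (["The", "engine", "shall", "deliver", "500", "kN"],
     ["O", "B-SYS", "O", "O", "B-VAL", "I-VAL"]),
    (["The", "pump", "shall", "supply", "20", "bar"],
     ["O", "B-SYS", "O", "O", "B-VAL", "I-VAL"]),
]

# Hypothetical single-token synonym table; a real setup might use a thesaurus.
# Single-token synonyms keep tokens and labels aligned one to one.
SYNONYMS = {"deliver": ["provide"], "supply": ["provide"]}

# Assumed per-entity policy: entity type -> (method, replacement rate).
POLICY = {"SYS": ("lwtr", 0.5), "VAL": ("lwtr", 0.2), "O": ("sr", 0.3)}


def entity_type(label):
    """Strip the BIO prefix: 'B-SYS' -> 'SYS', 'O' -> 'O'."""
    return label.split("-", 1)[-1]


def build_label_vocab(dataset):
    """Map each full label (e.g. 'B-SYS') to the tokens that carry it."""
    vocab = defaultdict(list)
    for tokens, labels in dataset:
        for tok, lab in zip(tokens, labels):
            vocab[lab].append(tok)
    return vocab


def augment(tokens, labels, policy, label_vocab, synonyms, rng):
    """Return augmented tokens; labels are reused unchanged, so no
    manual re-annotation is needed for the new sentence."""
    out = []
    for tok, lab in zip(tokens, labels):
        method, rate = policy.get(entity_type(lab), ("none", 0.0))
        if rng.random() >= rate:
            out.append(tok)  # keep the original token
        elif method == "lwtr" and label_vocab.get(lab):
            # LwTR: swap in a token seen with the same label elsewhere
            out.append(rng.choice(label_vocab[lab]))
        elif method == "sr" and synonyms.get(tok.lower()):
            # SR: swap in a synonym of the token
            out.append(rng.choice(synonyms[tok.lower()]))
        else:
            out.append(tok)  # no replacement candidate available
    return out


rng = random.Random(0)
vocab = build_label_vocab(TRAIN)
tokens, labels = TRAIN[0]
print(augment(tokens, labels, POLICY, vocab, SYNONYMS, rng))
```

Because the label sequence is carried over unchanged, each augmented sentence is already annotated. Under this reading, searching over the entries of `POLICY` corresponds to exploring the combinations of replacement rates and augmentation methods per entity type that the abstract describes.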

Committee

Prof. Dimitri Mavris – School of Aerospace Engineering (advisor)
Dr. Olivia Fischer – School of Aerospace Engineering
Dr. Woongje Sung – School of Aerospace Engineering

Daniel Guggenheim School of Aerospace Engineering

College of Engineering

Georgia Institute of Technology