Graduate Studies
First Advisor
Robert Dyer
Second Advisor
Witawas Srisa-an
Degree Name
Doctor of Philosophy (Ph.D.)
Department
Computer Science
Date of this Version
12-11-2024
Document Type
Dissertation
Citation
A dissertation presented to the faculty of the Graduate College at the University of Nebraska in partial fulfillment of requirements for the degree of Doctor of Philosophy
Major: Educational Studies (Educational Leadership and Higher Education)
Under the supervision of Professor Deryl K. Hatch-Tocaimaza
Lincoln, Nebraska, February 2020
Abstract
Software engineering maintenance tasks often require grouping code changes into related units of work, so that developers have as much information as possible about the progress made toward addressing a specific code task. Here, a commit represents the set of code changes made to the codebase at a specific time. A comprehensive understanding of how a code task has evolved across commits helps developers make better decisions about changes to the overall codebase. The concept of work items as logically related code changes has been largely theoretical. Its impact on software maintenance tasks remains unexplored, including tasks such as tracing the origins of bugs, or fixes that span multiple commits, in the commit histories of real-world software repositories. This thesis introduces heuristic-based algorithms to mine work items from the commit histories of open-source repositories under different scenarios. First, when issue tags are available throughout the commit history, we developed a heuristic that mines associations from issue tags validated against issue tracker systems such as Jira and GitHub. We generated a dataset of approximately 130,000 work items across repositories written in Java, Kotlin, and Python, with each work item group assigned a numerical confidence score for relatedness. Second, when a reference commit is known and the goal is to generate the work items associated with it, we developed a heuristic that implements a method-level tracking mechanism. This approach scans the repository’s commit history backward and identifies overlapping code modifications linked to the reference commit to generate related work items. Third, when an automated and fast way of identifying work items is needed, we explore using pre-trained LLMs to classify commit pairs as related or unrelated work items, using prompts that contain different levels of detail from commit diffs and logs.
Alongside this exploration, we generate two work item datasets with labeled ground truth for fine-tuning purposes. Finally, we apply our top-performing work item heuristic to a software maintenance task in the context of the SZZ algorithms, which aim to identify the bug-introducing commit for a given fix commit. Specifically, we built a new SZZ variant that integrates work item awareness. This variant provides the first empirical evidence that bugs and fixes constitute work items. It achieves a 2-9% F1 score improvement in bug-introducing commit identification over traditional SZZ algorithms. The improvement rises to 3-14% when only the subset of cases in which work items were identified is considered.
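The idea behind the first heuristic, grouping commits that cite the same validated issue tag, can be illustrated with a minimal Python sketch. This is not the dissertation's implementation. The following are all illustrative assumptions:

- the regular expressions for Jira-style keys (e.g. `HADOOP-101`) and GitHub-style references (e.g. `#42`);
- the hypothetical `known_issues` set, which stands in for validation against the issue tracker;
- the function names `extract_issue_tags` and `group_work_items`.

The confidence score for relatedness is not modeled.

```python
from __future__ import annotations

import re
from collections import defaultdict

# Assumed patterns: Jira-style keys such as "HADOOP-101" and GitHub-style refs such as "#42".
JIRA_KEY = re.compile(r"\b[A-Z][A-Z0-9]+-\d+\b")
GITHUB_REF = re.compile(r"(?<![\w&])#(\d+)\b")  # skips HTML entities like "&#39;"


def extract_issue_tags(message: str) -> set[str]:
    """Return every candidate issue tag mentioned in a commit message."""
    tags = set(JIRA_KEY.findall(message))
    tags.update(f"#{number}" for number in GITHUB_REF.findall(message))
    return tags


def group_work_items(commits, known_issues: set[str]) -> dict[str, list[str]]:
    """Group commits whose messages cite the same validated issue tag.

    commits: iterable of (sha, message) pairs, in history order.
    known_issues: hypothetical set of tags confirmed to exist in the tracker;
    candidate tags outside this set are treated as false positives.
    Returns {tag: [sha, ...]} for tags cited by at least two commits.
    """
    groups: dict[str, list[str]] = defaultdict(list)
    for sha, message in commits:
        for tag in extract_issue_tags(message):
            if tag in known_issues:
                groups[tag].append(sha)
    # A work item needs at least two related commits.
    return {tag: shas for tag, shas in groups.items() if len(shas) > 1}


if __name__ == "__main__":
    history = [
        ("a1", "HADOOP-101: add retry logic"),
        ("b2", "Bump UTF-8 handling"),
        ("c3", "Follow-up for HADOOP-101, fix NPE"),
        ("d4", "Fixes #42"),
    ]
    print(group_work_items(history, {"HADOOP-101", "#42"}))
    # {'HADOOP-101': ['a1', 'c3']}
```

In this sketch the validation step does real work. The Jira-style pattern also matches strings such as `UTF-8`, and only checking candidates against the tracker keeps them out of the groups. A tag cited by a single commit (`#42` here) is dropped because it does not relate two units of work.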
Recommended Citation
Perez-Rosero, Salomé, "Mining Work Items to Streamline Software Maintenance Tasks" (2024). Dissertations and Doctoral Documents from University of Nebraska-Lincoln, 2023–. 249.
https://digitalcommons.unl.edu/dissunl/249
Comments
Copyright 2024, the author. Used by permission.