Large Language Models for Automated Characterization of Cybersecurity Vulnerabilities using N-Shot Learning
DOI: https://doi.org/10.32473/flairs.38.1.138858
Keywords: Cybersecurity, LLM, AI
Abstract
The US National Vulnerability Database is a public repository of cybersecurity vulnerabilities in software and hardware. It is maintained by the National Institute of Standards and Technology (NIST), which developed the Vulnerability Description Ontology framework for characterizing vulnerabilities. Despite advances in secure software development and vulnerability detection techniques, the number of registered cybersecurity vulnerabilities continues to grow. Characterizing vulnerabilities is essential for selecting effective protection mechanisms that prevent or mitigate vulnerabilities in software and hardware and reduce cyber risks. Manual characterization of vulnerabilities is both time-consuming and costly. While many researchers employ Machine Learning (ML)-based methods to predict characterizations, these methods rely heavily on large amounts of labeled training data. To overcome the challenge of limited labeled data, this paper proposes a solution utilizing three Large Language Models (LLMs) - GPT-4o, Llama-3.1-405B, and Gemini-1.5-flash - to automate the characterization of vulnerabilities across 27 categories grouped into five noun groups. We use both few-shot and zero-shot learning to prompt the LLMs. Our experimental results show that, using a few labeled samples, GPT-4o achieves F1-scores of 80%, 90%, 90%, and 73% in the context, impact-method, attack-theater, and logical-impact noun groups, respectively. Additionally, Llama achieves an F1-score of 83% in the mitigation noun group.
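To illustrate the few-shot (n-shot) prompting approach the abstract describes, the sketch below assembles a classification prompt for a single noun group. It is a minimal illustration, not the authors' actual prompts: the example vulnerability descriptions are hypothetical, and the label set shown for the attack-theater noun group follows NIST's Vulnerability Description Ontology.

```python
# Hedged sketch: building an n-shot prompt to classify a vulnerability's
# Attack Theater (one of the five VDO noun groups mentioned in the abstract).
# The sample descriptions below are hypothetical, not taken from the paper.

FEW_SHOT_EXAMPLES = [
    # (vulnerability description, Attack Theater label)
    ("Buffer overflow exploitable over the network via crafted HTTP requests.",
     "Remote"),
    ("Flaw allows a logged-in local user to escalate privileges.",
     "Local"),
]

# Attack Theater values per NIST's Vulnerability Description Ontology.
ATTACK_THEATER_LABELS = ["Remote", "Limited Remote", "Local", "Physical"]


def build_prompt(description: str) -> str:
    """Assemble a few-shot classification prompt for one noun group."""
    parts = [
        "Classify the vulnerability's Attack Theater as one of: "
        + ", ".join(ATTACK_THEATER_LABELS) + "."
    ]
    # Labeled examples turn this into few-shot; omitting them gives zero-shot.
    for text, label in FEW_SHOT_EXAMPLES:
        parts.append(f"Description: {text}\nAttack Theater: {label}")
    # The unlabeled query the LLM is asked to complete.
    parts.append(f"Description: {description}\nAttack Theater:")
    return "\n\n".join(parts)


print(build_prompt("Attacker with physical access can bypass the lock screen."))
```

The resulting prompt string would then be sent to each model (GPT-4o, Llama-3.1-405B, Gemini-1.5-flash) via its respective API; the zero-shot variant simply drops the labeled examples.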
License
Copyright (c) 2025 Ayesha Dina, Elijah Needham, Denis Ulybyshev

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.