Context is Key: Aligning Large Language Models with Human Moral Judgments through Retrieval-Augmented Generation
DOI: https://doi.org/10.32473/flairs.38.1.138947
Keywords:
Large Language Models, Applied Natural Language Processing, social media analysis, Retrieval Augmented Generation, Case-Based Reasoning, GPT
Abstract
In this paper, we investigate whether pre-trained large language models (LLMs) can align with human moral judgments on a dataset of approximately fifty thousand interpersonal conflicts from the AITA (Am I the A******) subreddit, an online forum where users evaluate the morality of others. We introduce a retrieval-augmented generation (RAG) approach that uses pre-trained LLMs as core components. After collecting conflict posts from AITA and embedding them in a vector database, the RAG agent retrieves the posts most relevant to each new query. These posts are then used sequentially as context to gradually refine the LLM's judgment, providing adaptability without costly fine-tuning. Using OpenAI's GPT-4o, our agent outperforms direct prompting of the LLM, achieving 83% accuracy and a Matthews correlation coefficient of 0.469 while reducing the rate of toxic responses from 22.53% to virtually zero. These findings indicate that integrating LLMs into RAG agents is an effective way to improve their alignment with human moral judgments while mitigating toxic language.
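As a rough illustration of the pipeline the abstract describes, the sketch below embeds past AITA posts, retrieves the nearest neighbors for a new conflict, and feeds them to GPT-4o one at a time so each precedent can refine the running verdict. This is a minimal sketch, not the authors' implementation: the embedding model, the in-memory store standing in for their vector database, the prompts, and all helper names are assumptions.

```python
# Minimal sketch of a sequential-refinement RAG agent for AITA judgments.
# Assumptions: the OpenAI Python SDK (pip install openai), numpy, and a
# NumPy array as a stand-in for the paper's vector database. Prompts and
# helper names are hypothetical, not taken from the paper.
import numpy as np
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def embed(text: str) -> np.ndarray:
    """Embed one post; the embedding model choice here is an assumption."""
    resp = client.embeddings.create(model="text-embedding-3-small", input=text)
    return np.array(resp.data[0].embedding)


def retrieve(query_vec: np.ndarray, corpus_vecs: np.ndarray,
             corpus_posts: list[str], k: int = 3) -> list[str]:
    """Return the k past posts most similar to the query (cosine similarity)."""
    sims = corpus_vecs @ query_vec / (
        np.linalg.norm(corpus_vecs, axis=1) * np.linalg.norm(query_vec)
    )
    return [corpus_posts[i] for i in np.argsort(sims)[::-1][:k]]


def judge(conflict: str, precedents: list[str]) -> str:
    """Refine GPT-4o's verdict sequentially, one retrieved precedent at a time."""
    verdict = "none yet"
    for post in precedents:
        resp = client.chat.completions.create(
            model="gpt-4o",
            messages=[
                {"role": "system",
                 "content": "Decide whether the author of the conflict is in "
                            "the wrong (YTA) or not (NTA). Use the precedent "
                            "post and your previous verdict as context."},
                {"role": "user",
                 "content": f"Precedent:\n{post}\n\nConflict:\n{conflict}\n\n"
                            f"Previous verdict: {verdict}"},
            ],
        )
        verdict = resp.choices[0].message.content
    return verdict
```

Because the agent conditions only on retrieved precedents at inference time, adapting it to new community norms amounts to adding posts to the store rather than re-training or fine-tuning the underlying LLM.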
License
Copyright (c) 2025 Matthew Boraske, Richard Burns

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.