Reward-Guided Fine-Tuning of Language Models with Social Feedback
DOI: https://doi.org/10.32473/flairs.39.1.141777
Abstract
Large language models (LLMs) are increasingly used in assistive conversational systems but often struggle to adapt to human tone and context. While prior work emphasizes factual accuracy and safety, less attention has been given to context-sensitive conversational behavior. In this work, we explore whether real-world interaction signals can improve context-driven reward adaptability. We use Reddit conversations as a proxy for group conversations to train a reward model that predicts the effectiveness of replies in context, then fine-tune a language model with Proximal Policy Optimization (PPO) to encourage responses aligned with conversational tone and user expectations. Across benchmarks, the resulting models show improved humor and engagement while maintaining comparable reasoning ability, alongside shifts in toxicity and bias consistent with the training signal. These results suggest that alignment requires not only correctness, but also sensitivity to tone, intent, and conversational context.
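As a rough illustration of the PPO step mentioned in the abstract, the sketch below computes the standard clipped surrogate objective for a batch of sampled replies. It is not the authors' implementation: the function name `ppo_clipped_objective`, the `clip_eps` default of 0.2, and the idea that each advantage is derived from a reward-model score minus a baseline are assumptions made here for illustration only.

```python
import math


def ppo_clipped_objective(logp_new, logp_old, advantages, clip_eps=0.2):
    """Mean PPO clipped surrogate objective over a batch (hypothetical helper).

    logp_new:   log-probabilities of the sampled replies under the current policy
    logp_old:   log-probabilities of the same replies under the sampling policy
    advantages: per-reply advantages (assumed here: reward score minus a baseline)
    clip_eps:   clipping range for the probability ratio (assumed default)
    """
    total = 0.0
    for lp_new, lp_old, adv in zip(logp_new, logp_old, advantages):
        # Probability ratio between the current and the sampling policy.
        ratio = math.exp(lp_new - lp_old)
        # Restrict the ratio to [1 - clip_eps, 1 + clip_eps].
        clipped = max(1.0 - clip_eps, min(1.0 + clip_eps, ratio))
        # Take the pessimistic (smaller) of the two surrogate terms.
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)
```

In training, this quantity would be maximized with respect to the policy parameters; the clipping removes the incentive to move the policy far from the one that generated the replies in a single update.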
License
Copyright (c) 2026 Jared Scott, Jesse Roberts

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.