Reward-Guided Fine-Tuning of Language Models with Social Feedback

Authors

  • Jared Scott, Tennessee Tech University
  • Jesse Roberts

DOI:

https://doi.org/10.32473/flairs.39.1.141777

Abstract

Large language models (LLMs) are increasingly used in assistive conversational systems but often struggle to adapt to human tone and context. While prior work emphasizes factual accuracy and safety, less attention has been given to context-sensitive conversational behavior. In this work, we explore whether real-world interaction signals can improve context-driven reward adaptability. Using Reddit conversations as a proxy for group conversations, we train a reward model that predicts the effectiveness of replies in context, then fine-tune a language model with Proximal Policy Optimization (PPO) to encourage responses aligned with conversational tone and user expectations. Across benchmarks, the resulting models show improved humor and engagement while maintaining comparable reasoning ability, alongside shifts in toxicity and bias consistent with the training signal. These results suggest that alignment requires not only correctness but also sensitivity to tone, intent, and conversational context.

Published

06-05-2026

How to Cite

Scott, J., & Roberts, J. (2026). Reward-Guided Fine-Tuning of Language Models with Social Feedback. The International FLAIRS Conference Proceedings, 39(1). https://doi.org/10.32473/flairs.39.1.141777