MineObserver: A Deep Learning Framework for Assessing Natural Language Descriptions of Minecraft Imagery

Authors

  • Jay Mahajan, University of Illinois at Urbana-Champaign
  • Samuel Hum, University of Illinois at Urbana-Champaign
  • Jeff Ginger, University of Illinois at Urbana-Champaign
  • H. Chad Lane, University of Illinois at Urbana-Champaign

DOI:

https://doi.org/10.32473/flairs.v35i.130729

Keywords:

Computer Vision, Natural Language Processing, Pedagogical Agent

Abstract

This paper introduces a novel approach for learning natural language descriptions of scenery in Minecraft. We apply techniques from Computer Vision and Natural Language Processing to create an AI framework called MineObserver for assessing the accuracy of learner-generated descriptions of science-related images. The ultimate purpose of the system is to automatically assess the accuracy of learner observations, written in natural language, made during science learning activities that take place in Minecraft. Eventually, MineObserver will be used as part of a pedagogical agent framework that provides in-game support for learning. Preliminary results are mixed but promising, with approximately 62% of images in our test set properly classified by our image-captioning approach. Broadly, our work suggests that computer vision techniques work as expected in Minecraft and can serve as a basis for assessing learner observations.
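To make the assessment idea concrete, the sketch below shows one possible way to wire up such a pipeline: caption a Minecraft screenshot with a pretrained image-captioning model and compare the caption to a learner's written observation by sentence-embedding similarity. The specific models (BLIP, MiniLM), the similarity threshold, and the helper function are illustrative assumptions, not the authors' MineObserver implementation.

# Illustrative sketch only: the paper does not specify these models or this
# scoring rule. Model names and the 0.5 threshold are assumptions.
from PIL import Image
from transformers import BlipProcessor, BlipForConditionalGeneration
from sentence_transformers import SentenceTransformer, util

# Pretrained image captioner (a stand-in for MineObserver's captioning model).
processor = BlipProcessor.from_pretrained("Salesforce/blip-image-captioning-base")
captioner = BlipForConditionalGeneration.from_pretrained("Salesforce/blip-image-captioning-base")

# Sentence encoder used to compare the generated caption with the learner's text.
encoder = SentenceTransformer("all-MiniLM-L6-v2")

def assess_observation(screenshot_path: str, learner_text: str, threshold: float = 0.5) -> dict:
    """Caption a Minecraft screenshot and score a learner observation against it."""
    image = Image.open(screenshot_path).convert("RGB")
    inputs = processor(images=image, return_tensors="pt")
    output_ids = captioner.generate(**inputs, max_new_tokens=30)
    caption = processor.decode(output_ids[0], skip_special_tokens=True)

    # Cosine similarity between the machine-generated caption and the learner text.
    embeddings = encoder.encode([caption, learner_text], convert_to_tensor=True)
    similarity = util.cos_sim(embeddings[0], embeddings[1]).item()

    return {
        "caption": caption,
        "similarity": similarity,
        "accurate": similarity >= threshold,  # hypothetical cutoff, not from the paper
    }

# Hypothetical usage:
# result = assess_observation("village_screenshot.png", "I see a village with torches near a forest")
# print(result)

In a pedagogical-agent setting, the similarity score could drive feedback (for example, prompting the learner to look again when the score is low), but any such policy is beyond what this abstract describes.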


Published

04-05-2022

How to Cite

Mahajan, J., Hum, S., Ginger, J., & Lane, H. C. (2022). MineObserver: A Deep Learning Framework for Assessing Natural Language Descriptions of Minecraft Imagery. The International FLAIRS Conference Proceedings, 35. https://doi.org/10.32473/flairs.v35i.130729

Issue

The International FLAIRS Conference Proceedings, 35 (2022)

Section

Special Track: Artificial Intelligence in Games, Serious Games, and Multimedia