Evaluating Vision-Language Models on the TriangleCOPA Benchmark
DOI:
https://doi.org/10.32473/flairs.37.1.135485
Abstract
The TriangleCOPA benchmark consists of 100 textual questions paired with videos that depict the movements of simple shapes, in the style of the classic social-psychology film created by Fritz Heider and Marianne Simmel in 1944. In our experiments, we investigate the performance of current vision-language models on this challenging benchmark, assessing their capability for visual anthropomorphism and abstract interpretation.
License
Copyright (c) 2024 Ankur Chemburkar, Andrew Gordon, Andrew Feng

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.