Evaluating Vision-Language Models on the TriangleCOPA Benchmark
DOI:
https://doi.org/10.32473/flairs.37.1.135485
Abstract
The TriangleCOPA benchmark consists of 100 textual questions with videos depicting the movements of simple shapes in the style of the classic social-psychology film created by Fritz Heider and Marianne Simmel in 1944. In our experiments, we investigate the performance of current vision-language models on this challenging benchmark, assessing the capability of these models for visual anthropomorphism and abstract interpretation.
License
Copyright 2024 Ankur Chemburkar, Andrew Gordon, Andrew Feng
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International license.