Evaluating Vision-Language Models on the TriangleCOPA Benchmark

Ankur Chemburkar; Andrew Gordon; Andrew Feng

doi:10.32473/flairs.37.1.135485

Authors

Ankur Chemburkar, University of Southern California
Andrew Gordon, University of Southern California
Andrew Feng, University of Southern California

DOI:

https://doi.org/10.32473/flairs.37.1.135485

Abstract

The TriangleCOPA benchmark consists of 100 textual questions with videos depicting the movements of simple shapes in the style of the classic social-psychology film created by Fritz Heider and Marianne Simmel in 1944. In our experiments, we investigate the performance of current vision-language models on this challenging benchmark, assessing the capability of these models for visual anthropomorphism and abstract interpretation.

Published

2024-05-13

How to cite

Chemburkar, A., Gordon, A., & Feng, A. (2024). Evaluating Vision-Language Models on the TriangleCOPA Benchmark. The International FLAIRS Conference Proceedings, 37(1). https://doi.org/10.32473/flairs.37.1.135485

Issue

Vol. 37 (2024): Proceedings of FLAIRS-37

Section

Posters

License

Copyright 2024 Ankur Chemburkar, Andrew Gordon, Andrew Feng

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
