Shop-The-Room

A Zero-Shot Foundation Model Framework for Visual Discovery in E-Commerce

Authors

  • Vaidyanath Areyur Shanthakumar, Bed Bath & Beyond Inc.
  • Clark Barnett
  • Vipul Mehra
  • Komson Chanprapan, Bed Bath & Beyond Inc.
  • Ravi Shankar
  • Tathagata Mukherjee, University of Alabama in Huntsville

DOI:

https://doi.org/10.32473/flairs.39.1.141898

Keywords:

E-commerce, Recommender System, Zero-Shot, Visual Product Discovery, Grounding-DINO, CLIP, Object Detection, Feature Extraction

Abstract

Visual product discovery systems have become integral to major e-commerce platforms, enabling customers to identify visually similar items from complex scene imagery. Traditionally, such systems have relied on a supervised pipeline comprising object detection, feature extraction, and nearest-neighbor retrieval. However, building these systems at scale requires frequent and extensive model training on vast amounts of annotated data, which is both cost-prohibitive and labor-intensive, particularly for small and medium enterprises managing dynamic inventories. The advent of pre-trained foundation models, characterized by their capability for zero-shot transfer, presents a compelling alternative that eliminates the need for domain-specific model training and labeled annotations. In this work, we demonstrate the implementation of a scene-based visual shopping system called Shop-The-Room, built on state-of-the-art foundation models, at a major US online retailer. We detail the proposed framework, its implementation, the pitfalls encountered, and the lessons learned. Finally, we present the results of both quantitative and qualitative evaluations to validate the system’s efficacy in a real-world setting at Bed Bath & Beyond.
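
As an illustration of the nearest-neighbor retrieval stage named in the abstract, the following is a minimal sketch and not the authors' implementation. It assumes each detected object and each catalog product has already been mapped to an embedding vector (for example, by an image encoder such as CLIP). The function names, product IDs, and three-dimensional vectors are hypothetical placeholders chosen for readability.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve_similar(query_embedding, catalog, top_k=2):
    """Return the IDs of the top_k catalog items most similar to the query."""
    scored = [
        (cosine_similarity(query_embedding, embedding), product_id)
        for product_id, embedding in catalog.items()
    ]
    # Highest similarity first.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [product_id for _, product_id in scored[:top_k]]


# Hypothetical toy catalog: product ID -> embedding (real embeddings have hundreds of dimensions).
catalog = {
    "sofa_a": [0.9, 0.1, 0.0],
    "sofa_b": [0.8, 0.2, 0.1],
    "lamp_a": [0.0, 0.1, 0.9],
}

# Hypothetical embedding of an object cropped from a room scene.
query = [1.0, 0.0, 0.0]
results = retrieve_similar(query, catalog, top_k=2)
print(results)
```

At catalog scale, the exhaustive scan shown here would typically be replaced by an approximate nearest-neighbor index; the brute-force version is used only to keep the sketch self-contained.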

Published

06-05-2026

How to Cite

Areyur Shanthakumar, V., Barnett, C., Mehra, V., Chanprapan, K., Shankar, R., & Mukherjee, T. (2026). Shop-The-Room: A Zero-Shot Foundation Model Framework for Visual Discovery in E-Commerce. The International FLAIRS Conference Proceedings, 39(1). https://doi.org/10.32473/flairs.39.1.141898

Section

Special Track: Neural Networks and Data Mining