A Study on Title Encoding Methods for e-Commerce Downstream Tasks
Keywords: natural language processing, deep learning, word embeddings, fasttext, word2vec, meta-prod2vec, bootstrap your own latent, e-commerce, text classification, bert, spanish bert
In an e-Commerce marketplace there are usually many downstream tasks that have (relatively) fewer resources available than the few mainstream priority tasks, such as recommendation or search. Examples include product categorization, counterfeit detection, forbidden product detection, and package size estimation. In these tasks, product titles are an appealing feature: they capture key aspects of the product, are cheaply available, and are easy to process. In this setting it makes sense to invest in a few high-quality models that extract as much information as possible from the title and are shared among many downstream tasks. The present work explores the performance of different models on several downstream tasks present in our marketplace. We also propose an adaptation of a deep network architecture from the computer vision field, "Bootstrap Your Own Latent" (BYOL), to learn product embeddings based on the title, and compare it to several industrial baselines as well as some state-of-the-art supervised models. We found that although neural network based encoders can be very useful in some cases, in many scenarios the baselines given by shallower models are still hard to beat.
Copyright (c) 2022 Cristian Cardellino, Rafael Carrascosa
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.