A Study on Title Encoding Methods for e-Commerce Downstream Tasks

Cristian Cardellino; Rafael Carrascosa

doi:10.32473/flairs.v35i.130550

Authors

Cristian Cardellino, Mercado Libre
Rafael Carrascosa, Mercado Libre


Keywords:

natural language processing, deep learning, word embeddings, fasttext, word2vec, meta-prod2vec, bootstrap your own latent, e-commerce, text classification, bert, spanish bert

Abstract

In an e-Commerce marketplace there are usually many downstream tasks that have relatively fewer resources available than the few mainstream priority tasks, such as recommendation or search. Examples of these tasks are product categorization, counterfeit detection, forbidden product detection, and package size estimation. In these tasks the product title is usually an appealing feature, since it integrates key aspects of the product, is cheaply available, and is easy to process. In this setting it makes sense to invest in a few high-quality models that extract as much information as possible from the title and are shared among many downstream tasks. The present work explores the performance of different models on several downstream tasks present in our marketplace. We also propose an adaptation of a deep network architecture from the computer vision field, "Bootstrap Your Own Latent" (BYOL), to learn product embeddings from the title, and compare it to several industrial baselines as well as some state-of-the-art supervised models. We found that although neural-network-based encoders can be very useful in some cases, in many scenarios the baselines given by shallower models are still hard to beat.
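The abstract does not describe how BYOL is adapted to titles. As a rough illustration only, the sketch below shows the generic BYOL training signal as it is usually stated for images. An online network's prediction for one augmented view regresses the target network's projection of another view, using the loss 2 - 2 * cos(p, z). The target network's weights then track the online weights through an exponential moving average. Everything here is an assumption and not the authors' method: the function names, the momentum value `tau`, and the `word_dropout` title augmentation are hypothetical. The encoder, projector and predictor networks are omitted, and vectors are plain Python lists.

```python
import math
import random


def l2_normalize(v, eps=1e-12):
    # Scale a vector to unit length (eps guards against division by zero).
    norm = max(math.sqrt(sum(x * x for x in v)), eps)
    return [x / norm for x in v]


def byol_loss(online_preds, target_projs):
    # Generic BYOL regression loss, averaged over the batch:
    # 2 - 2 * cosine(prediction, target projection).
    # The target projections are treated as constants (no gradient to the target net).
    total = 0.0
    for p, z in zip(online_preds, target_projs):
        p, z = l2_normalize(p), l2_normalize(z)
        total += 2.0 - 2.0 * sum(a * b for a, b in zip(p, z))
    return total / len(online_preds)


def symmetric_byol_loss(pred_1, pred_2, targ_1, targ_2):
    # Each view's online prediction regresses the *other* view's target projection.
    return byol_loss(pred_1, targ_2) + byol_loss(pred_2, targ_1)


def ema_update(target_params, online_params, tau=0.99):
    # Target weights follow the online weights: xi <- tau * xi + (1 - tau) * theta.
    return {
        name: [tau * t + (1.0 - tau) * o for t, o in zip(target_params[name], online_params[name])]
        for name in target_params
    }


def word_dropout(tokens, p, rng):
    # Hypothetical title augmentation: drop each token with probability p,
    # falling back to one random token so the view is never empty.
    kept = [t for t in tokens if rng.random() >= p]
    return kept if kept else [rng.choice(tokens)]


if __name__ == "__main__":
    rng = random.Random(0)
    title = ["zapatillas", "running", "hombre", "talle", "42"]
    print(word_dropout(title, 0.3, rng), word_dropout(title, 0.3, rng))
    print(byol_loss([[1.0, 0.0]], [[0.0, 1.0]]))  # orthogonal vectors give loss 2.0
```

Identical directions give a loss of 0 and orthogonal ones give 2. Because the target receives no gradient and moves only through `ema_update`, BYOL needs no negative pairs, which is one reason it is attractive when labeled data for a downstream task is scarce.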
