With the constant growth of the worldwide e-commerce market, competitive intelligence has become an increasingly complex task due to:

  • An increasing volume of products on sale
  • A high inventory turnover rate, implying a large quantity of products to monitor in a limited time frame
  • A diverse range of e-commerce products (electronics, fashion, food, etc.)
  • The emergence of new e-commerce websites, as well as the transition of traditional retailers to e-business, combining physical and online points of sale

In such a context, how can e-commerce retailers position themselves with respect to the competition? For over a decade, we have been supporting our clients with our competitive intelligence solutions. Recognized for the quality of our data, we are constantly innovating to improve our algorithms and adapt to the rapidly changing e-commerce ecosystem. Using cutting-edge machine learning models, we are building our next generation of algorithms to cover the e-commerce market extensively.

One of our most important challenges is what we call matching. This consists of identifying the product being sold in each offer. For example, if we consider an offer titled "black Apple Iphone 7 256 Go", it's easy for a human to identify that this offer corresponds to an Apple (brand) smartphone (product type). More precisely, it refers to a black (color) iPhone 7 (model) with 256 GB (storage capacity; "Go" is the French abbreviation for GB). But how can a machine accomplish such a task? We could teach a machine to recognize key product characteristics, such as storage capacity, color and model in the smartphone example. Similarly, different features need to be learned to identify a particular dress, such as its length, collar type and fabric, among others. For bed linens, the key identifying factor would more likely be the thread count. We would also need to teach the machine the different ways in which the same product can be described, e.g., sports shoes can also be called running shoes, sneakers or trainers.
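To make the smartphone example concrete, here is a minimal rule-based sketch of attribute extraction from an offer title. It is only an illustration of the task, not the approach described in this article: the vocabularies, the regular expressions and the function name extract_attributes are all hypothetical, and hand-written rules like these are precisely what becomes too expensive to maintain across every product category.

```python
import re

# Hypothetical, hand-written vocabularies (assumptions for illustration only).
COLORS = {"black", "white", "silver", "gold", "red", "blue"}
BRANDS = {"apple", "samsung", "huawei"}

# Accept both English (GB/TB) and French (Go/To) storage units.
STORAGE_RE = re.compile(r"(\d+)\s*(gb|go|tb|to)\b", re.IGNORECASE)
# Hypothetical model pattern covering only the iPhone example.
MODEL_RE = re.compile(r"\b(iphone\s*\d+\w*)", re.IGNORECASE)


def extract_attributes(title: str) -> dict:
    """Extract brand, color, storage (in GB) and model from an offer title."""
    tokens = title.lower().split()
    attrs = {
        "brand": next((t for t in tokens if t in BRANDS), None),
        "color": next((t for t in tokens if t in COLORS), None),
        "storage_gb": None,
        "model": None,
    }

    storage = STORAGE_RE.search(title)
    if storage:
        size, unit = int(storage.group(1)), storage.group(2).lower()
        # Convert terabytes to gigabytes so that values are comparable.
        attrs["storage_gb"] = size * 1024 if unit in ("tb", "to") else size

    model = MODEL_RE.search(title)
    if model:
        # Lowercase and collapse whitespace: "Iphone  7" becomes "iphone 7".
        attrs["model"] = " ".join(model.group(1).lower().split())

    return attrs


attributes = extract_attributes("black Apple Iphone 7 256 Go")
```

Every new product type (dresses, bed linens, shoes) would need its own vocabularies and patterns, which is why learning these characteristics from data is attractive.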

So, as you can imagine, in a universe where domain-dependent knowledge is required, it would be very expensive to cover the entire e-commerce market manually. It is thus vital to have intelligent algorithms that are able to learn inherent product characteristics across the whole spectrum of e-commerce products.

Based on text (description, title, brand, color, size, etc.) and image data, our algorithms learn to identify the products associated with particular offers. The first crucial step is what we call feature engineering. This consists of finding the mathematical representation that best captures the semantics of the offers. The representation comes in the form of vectors and is obtained by digesting the images and textual data associated with the offers.
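As an illustration of what a vector representation of text can look like, the sketch below turns offer titles into TF-IDF vectors using only the Python standard library. TF-IDF is a common baseline chosen here as an assumption; the article does not state which text representation is actually used, and the function names are hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


def tfidf_vectors(titles):
    """Return (vocabulary, vectors), with one TF-IDF vector per title."""
    docs = [tokenize(t) for t in titles]
    vocab = sorted({tok for doc in docs for tok in doc})
    n_docs = len(docs)

    # Document frequency: in how many titles each token appears.
    df = Counter(tok for doc in docs for tok in set(doc))
    # Smoothed inverse document frequency: rarer tokens get larger weights.
    idf = {tok: math.log((1 + n_docs) / (1 + df[tok])) + 1.0 for tok in vocab}

    vectors = []
    for doc in docs:
        counts = Counter(doc)
        vectors.append([counts[tok] * idf[tok] for tok in vocab])
    return vocab, vectors


titles = [
    "black apple iphone 7",
    "apple iphone 7 black 256 gb",
    "red cotton dress",
]
vocab, vectors = tfidf_vectors(titles)
```

Each title becomes a list of numbers with one position per vocabulary word, so titles that share distinctive words end up with similar vectors.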

The power of our algorithms therefore depends on our ability to find the best vector representations of the offers. On the one hand, we implement Natural Language Processing (NLP) methods to preprocess and transform textual data. On the other hand, we use Deep Learning algorithms to encode images into feature vectors. This results in two independent mathematical representations for the text and the image. Using both representations and mathematical methods to combine them, we are able to match similar offers based on the similarity of their vector representations.
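One simple way to combine the two representations is sketched below: each vector is L2-normalized, weighted, and concatenated, and offers are then compared with cosine similarity. This is an assumption made for illustration; the article does not specify the combination method. The image vectors here are made-up numbers standing in for the output of an image encoder, and the names combine and text_weight are hypothetical.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else list(vec)


def combine(text_vec, image_vec, text_weight=0.5):
    """Concatenate normalized text and image vectors with relative weights."""
    text_part = [text_weight * x for x in l2_normalize(text_vec)]
    image_part = [(1.0 - text_weight) * x for x in l2_normalize(image_vec)]
    return text_part + image_part


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Toy vectors (assumptions): two near-identical offers and one unrelated offer.
offer_a = combine([1.0, 0.0, 1.0], [0.2, 0.9])
offer_b = combine([1.0, 0.0, 0.9], [0.25, 0.85])
offer_c = combine([0.0, 1.0, 0.0], [0.9, -0.2])

sim_ab = cosine_similarity(offer_a, offer_b)
sim_ac = cosine_similarity(offer_a, offer_c)
```

Normalizing before concatenation keeps one modality from dominating simply because its raw values are larger; text_weight then controls how much each modality contributes to the final similarity.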

Before the matching process, we start with a preliminary automatic categorization step. This step serves two purposes: we end up with an organized catalog of products, and we only look for potential offer matches within the same category.
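The benefit of restricting the search to a single category can be sketched as follows: offers are grouped by their predicted category, and candidate pairs are generated only within each group. The offer records and the function name candidate_pairs are hypothetical, and the categories are assumed to have been predicted already.

```python
from collections import defaultdict
from itertools import combinations


def candidate_pairs(offers):
    """Return pairs of offer ids that share the same category."""
    by_category = defaultdict(list)
    for offer in offers:
        by_category[offer["category"]].append(offer["id"])

    pairs = []
    for ids in by_category.values():
        # Only offers inside the same category are ever compared.
        pairs.extend(combinations(ids, 2))
    return pairs


# Hypothetical offers whose categories were assigned by an earlier step.
offers = [
    {"id": "o1", "category": "smartphone"},
    {"id": "o2", "category": "smartphone"},
    {"id": "o3", "category": "smartphone"},
    {"id": "o4", "category": "dress"},
]
pairs = candidate_pairs(offers)
```

With these four offers, only the three smartphone pairs are compared instead of all six possible pairs; on a catalog with millions of offers the reduction is far larger.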

Given that data quality has always been our number one priority, we have opted for an active learning approach. Instead of blindly relying on what machine learning models predict, we expose the predictions to human matchers. Initially, all predictions must be validated by a human. After some feedback cycles, we will learn under which conditions we can trust the predictions, while keeping high value-added tasks for our quality control team. The method is called active learning because the model is constantly retrained using human feedback in order to improve its accuracy.
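The idea of learning when predictions can be trusted can be sketched with a confidence threshold derived from human feedback. This is a simplified assumption, not the procedure used in production: the function names, the target precision and the use of a single confidence score are all hypothetical, and the retraining step itself is not shown.

```python
def trusted_threshold(feedback, target_precision=0.99):
    """Find a confidence cut-off above which humans confirmed the predictions.

    feedback: list of (confidence, human_agreed) pairs from validated matches.
    Returns the lowest cut-off reached before precision drops below the
    target, or None if even the most confident predictions miss the target.
    """
    best = None
    for cutoff in sorted({conf for conf, _ in feedback}, reverse=True):
        kept = [agreed for conf, agreed in feedback if conf >= cutoff]
        precision = sum(kept) / len(kept)
        if precision < target_precision:
            break  # Conservative: stop at the first cut-off that fails.
        best = cutoff
    return best


def route(predictions, threshold):
    """Split predictions into auto-accepted ids and ids sent to human review."""
    auto_accepted, needs_review = [], []
    for offer_id, confidence in predictions:
        if threshold is not None and confidence >= threshold:
            auto_accepted.append(offer_id)
        else:
            needs_review.append(offer_id)
    return auto_accepted, needs_review


# Hypothetical feedback collected while every prediction was still reviewed.
feedback = [(0.95, True), (0.9, True), (0.8, True), (0.7, False), (0.6, True)]
threshold = trusted_threshold(feedback, target_precision=1.0)
auto_accepted, needs_review = route([("a", 0.85), ("b", 0.75)], threshold)
```

While no threshold can be established, route sends everything to human review, which mirrors the initial phase in which all predictions are validated by a human. The reviewed examples would then be added to the training data for the next retraining cycle.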

So, that was a first glimpse of the latest developments at WorkIT Software. This incursion into the machine learning world is a crucial step that allows us to be a leader in the competitive intelligence market and to provide our customers with an ever-growing catalog of e-commerce products.

Article written by Felipe Aguirre Martinez
Lead Data Scientist at WorkIT Software – Doctor of Philosophy (PhD), Computer and Information Sciences