Attribute Extraction from ecommerce data - the generation of structured fields from unstructured text - is one of our popular product offerings. Our customers use it to improve the quality of their search catalogs and, in turn, their search relevance, faceted search and ad targeting.

Thus far, this product offering has covered catalog data, primarily product titles, descriptions and specifications. Now, we’re extending this capability to user-generated content, including customer questions and answers.

[Image: Catalog Data]
[Image: User-Generated Content]

By distilling factual information from customer content, these algorithms can help increase both the number and the relevance of structured attributes on popular ecommerce product pages. For brands, sellers and other content creators, this means that the more customers interact with your product listing, the better the quality of the listing gets.

What’s more, since these attributes come from data that customers have volunteered themselves, the importance of these attributes in influencing future purchase decisions is likely to be high.

Consider the example of a Fogg Analog Watch on which this algorithm was applied. The following Q&A entry was detected on the page and run through the Semantics3 Attribute Engine:

Question: is this water ressisant?
Answer: ?? it is
Inference: IS_WATER_RESISTANT - yes

This particular site enables faceted search for water-resistant watches. Before IS_WATER_RESISTANT was added to its attribute list, the product did not turn up in these faceted searches, and hence had limited visibility for relevant queries. Moreover, since the catalog listing didn’t state explicitly that the watch is water resistant, some potential customers, unsure about the product’s characteristics, may have decided against making the purchase.
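To make the visibility effect concrete, here is a minimal sketch in Python. The record layout, the facet_search helper and the second product are all made up for illustration; this is not how any particular site implements faceted search.

```python
from typing import Dict, List


def facet_search(catalog: List[Dict], attribute: str, value: str) -> List[str]:
    """Return titles of products whose structured attribute equals the facet value."""
    return [
        product["title"]
        for product in catalog
        if product["attributes"].get(attribute) == value
    ]


# Hypothetical catalog: the Fogg watch starts without the attribute.
catalog = [
    {"title": "Fogg Analog Watch", "attributes": {}},
    {"title": "Hypothetical Dive Watch", "attributes": {"IS_WATER_RESISTANT": "yes"}},
]

# Before enrichment, the facet filter cannot see the Fogg watch.
before = facet_search(catalog, "IS_WATER_RESISTANT", "yes")

# Attach the attribute inferred from the customer Q&A entry.
catalog[0]["attributes"]["IS_WATER_RESISTANT"] = "yes"

# After enrichment, the same filter returns both products.
after = facet_search(catalog, "IS_WATER_RESISTANT", "yes")

print(before)
print(after)
```

The point of the sketch is that a facet filter can only match what is present as a structured field, so a fact that lives only in Q&A text is invisible to it until it is extracted.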

So how does this algorithm work? It relies heavily on our core TAE (Text Attribute Extraction) engine, and is layered with decision engines that parse the intent and meaning behind the input text. At a high level, it involves three distinct steps:

  1. Intent inference: Understanding what the user is talking about
  2. TAE: Extracting meaningful attribute values from the input text
  3. Conflation: Combining the inferred intent with the extracted values to understand what the user is trying to communicate
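The three steps above can be sketched as a toy pipeline in Python. This is only an illustration under loud assumptions: the regular-expression rules, the function names (infer_intent, extract_value, conflate) and the "kind" labels are hypothetical stand-ins, and the actual engine is not described here and is certainly more sophisticated than keyword matching.

```python
import re
from typing import Optional, Tuple

# Step 1 rules (hypothetical): question wording -> attribute the user is asking about.
INTENT_RULES = {
    "IS_WATER_RESISTANT": re.compile(r"water\s*(res+is\w*|proof)", re.I),
    "DIAL_SIZE": re.compile(r"dial\s+size", re.I),
    "STRAP_MATERIAL": re.compile(r"\b(chain|strap|band)\b", re.I),
}

# The kind of value each attribute accepts (hypothetical schema).
EXPECTED_KIND = {
    "IS_WATER_RESISTANT": "boolean",
    "DIAL_SIZE": "measurement",
    "STRAP_MATERIAL": "material",
}

# Step 2 rules (hypothetical): patterns that pull a typed value out of the answer.
VALUE_RULES = [
    ("measurement", re.compile(r"\b\d+(?:\.\d+)?\s*(?:mm|cm)\b", re.I)),
    ("material", re.compile(r"\b(?:leather|metal)\b", re.I)),
    ("boolean", re.compile(r"\b(?:yes|no|it is)\b", re.I)),
]


def infer_intent(question: str) -> Optional[str]:
    """Step 1: guess which attribute the question is about."""
    for attribute, pattern in INTENT_RULES.items():
        if pattern.search(question):
            return attribute
    return None


def extract_value(answer: str) -> Optional[Tuple[str, str]]:
    """Step 2: extract a (kind, value) pair from the answer text."""
    for kind, pattern in VALUE_RULES:
        match = pattern.search(answer)
        if match:
            text = match.group(0).lower()
            if kind == "boolean":
                text = "no" if text == "no" else "yes"
            elif kind == "measurement":
                text = re.sub(r"\s+", "", text)
            return kind, text
    return None


def conflate(question: str, answer: str) -> Optional[Tuple[str, str]]:
    """Step 3: keep the value only if it fits the attribute the question asked about."""
    intent = infer_intent(question)
    extracted = extract_value(answer)
    if intent is None or extracted is None:
        return None
    kind, value = extracted
    if EXPECTED_KIND[intent] != kind:
        return None
    return intent, value


print(conflate("is this water ressisant?", "?? it is"))
print(conflate("what is the dial size of this watch?", "25mm"))
print(conflate("How many kidneys enough to buy this ??", "Try selling your friend's"))
```

Returning nothing when the intent is unknown or the value does not fit the attribute mirrors the behaviour described in this post: entries that cannot be interpreted reliably produce no attribute at all rather than a guess.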

Here are some examples of the algorithm in action:

Q: can we adjust it to our wrist size?
A: yes of course , we can adjust it.
I: IS_ADJUSTABLE - yes

Q: is it suitable to 15 years boy hand
A: yes
I: SUITABLE_AGE - 15

Q: what is the dial size of this watch?
A: 25mm
I: DIAL_SIZE - 25mm

Q: is there only one watch or two???
A: brother two watch nice on this price
I: PACKAGE_QUANTITY - two

Q: chain leather or metal
A: leather
I: STRAP_MATERIAL - leather

Q: Is it ladies watch
A: No it's only for men...
I: GENDER - Men

Q: is it suitable to 15 years boy hand
A: no
I: [None, since this doesn’t give us a comprehensive picture of either gender or age suitability]

Q: How many kidneys enough to buy this ??
A: Try selling your friend's, coz your's aren't worth it!
I: [None, and yes, we have to deal with many such entries]

Interested in using this to boost your product listings? Book a call with us, or drop us an email at [email protected].