Retail catalog feeds from top retailers across the ecommerce world, made possible by:
Content Extraction: Supervised and unsupervised systems (patent pending) that can systematically spider through websites and extract meaningful information from HTML.
WhisperCrawl: Scalable crawling infrastructure that crawls and processes terabytes of data on a daily basis.
The suite includes:
Extraction: Generates structured attributes from unstructured text, HTML, and images.
Normalization: Maps attribute values to a standardized unified representation.
Inference: Imputes and estimates structured attributes by association, even if the data isn't explicitly stated in the input.
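As a rough illustration of the Normalization step above, the sketch below maps raw attribute values to one standardized representation. It is a minimal example under stated assumptions, not the product's actual implementation: the synonym table, the unit list, and the function names `normalize_color` and `normalize_weight_grams` are all hypothetical.

```python
import re
from typing import Optional

# Hypothetical synonym table; a production system would curate or learn a far larger one.
COLOR_SYNONYMS = {
    "navy": "blue",
    "navy blue": "blue",
    "crimson": "red",
    "charcoal": "gray",
    "grey": "gray",
}

# Conversion factors to grams for a small, illustrative set of weight units.
GRAMS_PER_UNIT = {"g": 1.0, "kg": 1000.0, "oz": 28.349523125, "lb": 453.59237}

# Spelling variants folded into the unit keys above (assumed, not exhaustive).
UNIT_ALIASES = {"lbs": "lb", "grams": "g", "kgs": "kg", "ounces": "oz"}

# A number, optional whitespace, then an alphabetic unit, e.g. "2 lbs" or "1.5kg".
WEIGHT_PATTERN = re.compile(r"^\s*([0-9]*\.?[0-9]+)\s*([a-zA-Z]+)\.?\s*$")


def normalize_color(raw: str) -> str:
    """Lowercase, collapse whitespace, then map known synonyms to a canonical color."""
    key = " ".join(raw.lower().split())
    return COLOR_SYNONYMS.get(key, key)


def normalize_weight_grams(raw: str) -> Optional[float]:
    """Convert a weight string to grams, or return None if it cannot be parsed."""
    match = WEIGHT_PATTERN.match(raw)
    if match is None:
        return None
    value = float(match.group(1))
    unit = match.group(2).lower()
    unit = UNIT_ALIASES.get(unit, unit)
    factor = GRAMS_PER_UNIT.get(unit)
    if factor is None:
        return None
    return round(value * factor, 2)
```

Values that cannot be parsed are returned as `None` rather than guessed, which would leave room for a separate inference step to fill the gap.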
A curated database of hundreds of millions of products, aggregated from thousands of retailers across the web.
The Catalog is kept up to date by daily crawls that both refresh this data and discover new SKUs. Quality is maintained by a pipeline that combines algorithms, statistical checks, and humans-in-the-loop.
The data is maintained both as a standard database and as a knowledge graph.
Automatic assignment of HS codes to products. The system is built on real-world classification decisions made by professionals on the ground. It is also tuned to understand Schedule descriptions and CROSS Rulings.
Given a URL, UPC, ASIN, model number or even just a keyword, this system is capable of intelligently launching search missions across the web to identify product information relevant to the request. This helps automatically enrich incomplete datasets.
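Before any search can be launched, a request like the one described above has to be recognized as a URL, UPC, ASIN, model number, or keyword. The sketch below shows one way this routing step might look. It is illustrative only: the UPC-A check-digit rule is standard, but the ASIN pattern (ten characters starting with `B0`) and the model-number pattern are heuristics assumed for this example, and `classify_query` is a hypothetical name.

```python
import re
from urllib.parse import urlparse

# Heuristic (assumption): many non-book ASINs are 10 uppercase alphanumerics starting with "B0".
ASIN_PATTERN = re.compile(r"^B0[A-Z0-9]{8}$")

# Heuristic (assumption): a model number is one token mixing letters and digits.
MODEL_PATTERN = re.compile(r"^(?=.*[A-Za-z])(?=.*[0-9])[A-Za-z0-9\-_/]+$")


def is_valid_upc_a(code: str) -> bool:
    """Validate a 12-digit UPC-A code using its check digit."""
    if re.fullmatch(r"[0-9]{12}", code) is None:
        return False
    digits = [int(c) for c in code]
    # Positions 1, 3, ..., 11 are weighted by 3; positions 2, 4, ..., 10 by 1.
    total = 3 * sum(digits[0:11:2]) + sum(digits[1:11:2])
    return (10 - total % 10) % 10 == digits[11]


def classify_query(query: str) -> str:
    """Return one of: 'url', 'upc', 'asin', 'model_number', 'keyword'."""
    text = query.strip()
    parsed = urlparse(text)
    if parsed.scheme in ("http", "https") and parsed.netloc:
        return "url"
    if is_valid_upc_a(text):
        return "upc"
    if ASIN_PATTERN.match(text):
        return "asin"
    if MODEL_PATTERN.match(text):
        return "model_number"
    # Anything else is treated as free-text keywords.
    return "keyword"
```

The checks run from most to least specific, so a string that fails the stricter identifier tests falls through to a plain keyword search.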
Algorithmic pipeline that helps identify and eliminate duplicates from catalogs. It is built on a Product2Vector search engine that enables similarity search at scale across hundreds of millions of products.
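To make the idea of similarity-based deduplication concrete, here is a minimal sketch that flags likely duplicates by cosine similarity between product vectors. It assumes the embeddings already exist (how Product2Vector produces them is not described here), uses a hypothetical threshold of 0.95, and compares every pair by brute force. A system operating over hundreds of millions of products would need an approximate nearest-neighbor index instead.

```python
import math
from itertools import combinations


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def find_duplicate_pairs(embeddings, threshold=0.95):
    """Return (id_a, id_b) pairs whose vectors are at least `threshold` similar.

    `embeddings` maps a product id to its vector. The pairwise loop is O(n^2),
    which is only practical for small catalogs.
    """
    pairs = []
    for (id_a, vec_a), (id_b, vec_b) in combinations(sorted(embeddings.items()), 2):
        if cosine_similarity(vec_a, vec_b) >= threshold:
            pairs.append((id_a, id_b))
    return pairs


# Toy vectors, made up for illustration: sku-1 and sku-2 point in nearly the same direction.
toy_embeddings = {
    "sku-1": [0.90, 0.10, 0.00],
    "sku-2": [0.88, 0.12, 0.01],
    "sku-3": [0.00, 0.20, 0.95],
}
duplicate_pairs = find_duplicate_pairs(toy_embeddings)
```

With these toy vectors, only `sku-1` and `sku-2` clear the threshold. In practice, the threshold trades missed duplicates against false merges and would be tuned on labeled pairs.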
This system mines demand signals such as rank, rating, and review count from ecommerce websites. These signals help reverse engineer the demand for a SKU on a particular platform. Our historical databases, built up over the years, span tens of billions of data points.
This data feeds into demand forecasting and dynamic pricing models.