Matching and Analyzing Products in Marketplaces Using LLMs
towardsai.net
Author(s): Igor Novikov. Originally published on Towards AI. Image by the author.

There is a classic problem in any marketplace: making sense of product listings. It is especially exacerbated by users writing messy descriptions of really simple products. You've got a simple, everyday item like a blue T-shirt described as everything from "Sky Blue" to "Royal Blue" to "Navy Aqua Marine." Some sellers even cram product details into the image itself, scribbling "Best Shirt Ever!" in neon green letters over a poorly lit photo. The result? Two identical products can appear as if they came from completely different universes.

This makes searching, categorizing, and analyzing these products a real headache (or a job-security guarantee for data scientists, depending on how you look at it). For marketplace pricing management, it can be important to parse competitors' listings for competitive pricing analysis, which adds another level of complexity, since competing platforms use different product card formats.

Over the years, several ways to deal with this were invented:

**Category-Specific Matcher.** Train a dedicated matcher model or algorithm for each product category: one for electronics, one for clothing, one for cosmetics, and so on. Each matcher is specialized, but this becomes a juggling act if you have 10,000 categories.

**Candidate Search Using Embeddings.** Embeddings are vector representations of data (e.g., product descriptions or names) used to determine similarity. Using text or image processing methods (e.g., word2vec, sentence-transformers), it is possible to find similar products.

**Attribute Extraction for Each Product.** Product data (such as brand, model, color, and size) is extracted from descriptions to analyze and match products at a deeper level.

**Gradient Boosting.** Gradient boosting algorithms (such as CatBoost or XGBoost) are used for classification, determining whether products are similar or not.
These models are trained on pre-labeled data and take both textual and numerical attributes into account.

All of this helps up to a point, but people are still amazingly creative in describing items. One common problem is "attribute hell": there are so many attributes for similar things that matching fails. Attribute hell is real when you've got 400 ways to say "comfortable," and clothing items can look identical but be labeled in drastically different ways ("Eco-friendly hemp T-shirt" vs. "100% sustainable plant-based top"). For clothing and similar items that can be identical yet described very differently, matching can only be done from photos, based on colors, shapes, and fabric patterns.

The new approach uses the multimodal abilities of LLMs and Vision-Language Models (VLMs) to do this. Many platforms are already doing it:

> "We've used multiple large language models to accurately create or improve over 850 million pieces of data in a catalog. Without the use of generative AI, this work would have required nearly 100 times the current headcount to complete in the same amount of time."
>
> Doug McMillon, CEO, Walmart

Here is the general framework:

**1. Candidate Search Using Embeddings**

The first step is the same as before: embeddings allow the system to compute the similarity between different products efficiently. For example, embeddings generated by advanced models (like OpenAI's embedding models or Sentence-Transformers) can be used to identify potential matches by comparing the vectors of product descriptions or attributes.

**2. Attribute Extraction for Each Product Using LLMs**

LLMs extract specific product attributes (e.g., brand, color, size, material) from textual descriptions. For instance, "Stylish bright red cotton T-shirt for men" is broken down into color=red, material=cotton, targeted demographic=men, etc. It's like having your own personal (text) shopper who's annoyingly accurate. See examples below.

**3. Matching Two Products Based on Extracted Attributes Using LLMs**

Once attributes are extracted, LLMs compare two products based on these attributes and determine whether they match. LLMs can consider both explicit similarities (e.g., same brand and size) and implicit ones (e.g., "eco-friendly" vs. "sustainable"). For example, matching "Men's Nike Air Max sneakers" with "Nike Air Max shoes for men" by recognizing that they refer to the same product. See examples below.

**4. Matching Based on Images Using VLMs (Vision-Language Models)**

Some attributes, like color, design, or unique patterns, are best analyzed visually rather than textually. VLMs combine visual and textual data for more accurate product matching, analyzing product images alongside their descriptions to better understand the product. For example: matching a product image of a black leather handbag with another similar handbag by identifying visual features (shape, texture, etc.) and combining them with the textual data.

**Benefits of LLMs for matching**

- More accurate matching: fewer false mismatches (like that time you got a turquoise shirt instead of teal).
- Precision and recall in the 90-99% range, especially with fine-tuned models.
- Handles diverse data: text, images, random emojis, you name it.
- Enhanced understanding of product descriptions and visual features.
- No specialized training needed: no more building and maintaining 50 different specialized matchers.

**Drawbacks**

- Requires much more computing power.
- Latency is not great: this works for batch processing, but not real time.

Overall, for some categories there is no point in using an LLM if it can be avoided: if the category is simple and standard algorithms handle it well, they will be significantly cheaper. LLMs are better suited to medium and complex categories.
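The candidate-search step above can be sketched in a few lines. This is a toy illustration: it uses character n-gram counts as a stand-in "embedding" so the snippet runs without any model downloads, whereas a real pipeline would use a proper embedding model (e.g., sentence-transformers or a hosted embedding API). The top-k cosine-similarity retrieval logic is the same either way; the catalog entries and function names are made up for the example.

```python
import math
from collections import Counter

def embed(text: str, n: int = 3) -> Counter:
    # Toy "embedding": character n-gram counts. Replace with a real
    # embedding model in production; the retrieval logic stays the same.
    t = " ".join(text.lower().split())
    return Counter(t[i:i + n] for i in range(len(t) - n + 1))

def cosine(a: Counter, b: Counter) -> float:
    # Standard cosine similarity over sparse count vectors.
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def top_candidates(query: str, catalog: list[str], k: int = 3) -> list[str]:
    # Rank the catalog by similarity to the query and keep the top k;
    # only these candidates are passed on to the (expensive) LLM stages.
    q = embed(query)
    return sorted(catalog, key=lambda p: cosine(q, embed(p)), reverse=True)[:k]

catalog = [
    "Men's Cotton T-Shirt, Royal Blue",
    "Stainless Steel Freezer 7.0 cu ft",
    "KVIDIO Over-Ear Bluetooth Headphones, Black",
]
print(top_candidates("blue cotton t-shirt for men", catalog, k=1))
```

In a real system you would precompute catalog embeddings once and use an approximate nearest-neighbor index rather than scanning the whole catalog per query.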
Car parts are a good example: they have VINs that identify the part, and matching by VIN is enough for most purposes.

**Attribute extraction using an LLM**

Prompt:

```
I have a product card from the "Refrigerators" category on a marketplace.
I need to extract and format attributes from it.

key_attributes_list = [
    {
        "name": "model",
        "attribute_comment": "Refrigerator model. Include brand or company name but exclude color and size."
    },
    {
        "name": "capacity",
        "attribute_comment": "Total volume of the refrigerator, typically measured in liters (L). Look for terms like 'Total Capacity'. If unavailable, set value to null."
    },
    {
        "name": "energy_efficiency",
        "attribute_comment": "Extract the energy efficiency class, such as 'A++', 'A+', or 'B'. Look for terms like 'Energy Efficiency Class'. If unavailable, set value to null."
    },
    {
        "name": "number_of_doors",
        "attribute_comment": "Number of doors, such as '1', '2', or 'Side-by-Side'. Look for terms like 'Doors', 'Number of Doors'. If missing, set value to null."
    },
    {
        "name": "freezer_position",
        "attribute_comment": "Position of the freezer, such as 'Top', 'Bottom', or 'Side'. Extract from terms like 'Freezer Position'. If unspecified, set value to null."
    },
    {
        "name": "defrost_system",
        "attribute_comment": "Type of defrost system, such as 'No Frost' or 'Manual Defrost'. Look for terms like 'Defrost System'. If not mentioned, set value to null."
    },
    {
        "name": "dimensions",
        "attribute_comment": "Include height, width, and depth in centimeters (e.g., '180x70x60'). Look for terms like 'Dimensions'. If not available, set value to null."
    }
]
```

Output:

```
extracted_attributes = {
    "model": "Upstreman UF70",
    "capacity": "7.0 cubic feet",
    "energy_efficiency": "ENERGY STAR certified",
    "number_of_doors": "1",
    "freezer_position": "Convertible",
    "defrost_system": "Low Frost Design",
    "dimensions": "56.1x21.3x22.76 inches",
}
```

**Matching with an LLM**

*Matching a single attribute (color)*

Prompt:

```
I have two product cards from the category "Freezers" on a marketplace.
I need to compare the attribute "color".

The response should include:
- is_match: A boolean indicating whether the values match.
- reason: A detailed explanation of the comparison result.

Format the answer in JSON.
```

Output:

```json
{
  "is_match": false,
  "reason": "The color attributes do not match. Product 1 has 'White' while Product 2 has 'Stainless Steel'."
}
```

*Matching multiple attributes*

Prompt:

```
I have two product cards from the category "Freezers" on a marketplace.
I need to compare their attributes to determine if the two products are
similar enough to be considered the same model.

Attributes to compare:
- Model name: Replace "+" with "plus" and remove spaces for comparison.
- Capacity: Match if the difference is less than or equal to 0.2 cubic feet.
- Energy efficiency: Match if both are certified by ENERGY STAR or have identical efficiency ratings.
- Dimensions: Match if the height, width, and depth differences are each within 1 inch.
- Color: Match only if the color is identical.

The response should include:
- is_match: A boolean indicating whether the two products are considered the same model.
- reason: A detailed explanation of the comparison result.

Format the answer in JSON.
```

Output:

```json
{
  "is_match": true,
  "reason": "The two freezers are considered the same model based on the following comparisons: 1. Model name matches after normalization ('UpstremanUF70'). 2. Capacity is similar (7.0 vs 6.9 cubic feet, difference is 0.1 cubic feet, which is within the acceptable range of 0.2 cubic feet). 3. Both are ENERGY STAR certified, indicating identical energy efficiency. 4. Dimensions are nearly identical (differences: height 0.1 inch, width 0.1 inch, depth 0.1 inch, all within the acceptable tolerance of 1 inch). 5. Both products share a similar color: 'White' and 'Pearl White'."
}
```

**Working with photos and images**

*Attribute extraction*

Say we have a product image. We use the same prompt as for text extraction:

```
I have a product card from the "Refrigerators" category on a marketplace.
I need to extract and format attributes from it.

key_attributes_list = [
    {
        "name": "dimensions",
        "attribute_comment": "Include height, width, and depth in centimeters (e.g., '180x70x60'). Look for terms like 'Dimensions'. If not available, set value to null."
    }
]
```

Output:

```json
{
  "dimensions": "56.10x21.30x22.76 inches"
}
```

One more:

Output:

```json
{
  "category": "Headphones",
  "brand": "KVIDIO",
  "color": "Black",
  "features": [
    "Over-ear design",
    "Wireless",
    "Bluetooth connectivity",
    "Padded ear cushions"
  ]
}
```

*Comparing clothing items*

Prompt:

```
As a fashion expert, you are tasked with comparing two photographs of
women's tops from a marketplace.

Your analysis should focus exclusively on the tops themselves, disregarding:
- Any other visible clothing items, even if part of the outfit or matching in style.
- Variations in poses, body movements, or how the tops are worn.

Your goal:
- Determine whether the tops are identical or an exact match.
- Provide a clear "Identical" or "Different" answer, followed by a single concise sentence explaining your reasoning.

Answer in JSON format.
```

And the result?
The model detected a very subtle difference in the neckline, one that I, being a man, was not able to notice.

Output:

```json
{
  "result": "Different",
  "reason": "The tops have distinct necklines; the first features a crew neck while the second has a scoop neck."
}
```

Here's another one:

Output:

```json
{
  "result": "Identical",
  "reasoning": "Both tops have the same color, design, and fabric characteristics, including the long sleeves, slim fit, and light aqua tone."
}
```

Perfecto!

**Models and throughput**

Throughput is very important for a marketplace: there are thousands of products, with more added every day. Throughput depends on two things:

- Your hardware
- Model size and type

If you use a 70B model like Llama or Qwen, it will work well but slowly. Unless you have a supercomputer, expect around 0-5 RPS for large models on commercially available GPUs. Several things can be done to improve performance:

- Use a smaller model fine-tuned on your product dataset. How small depends on the task, and you will likely have to train several small models, since one small model can only fit so many trainable categories. 7B models generally work well for text extraction, and 1B models for a limited subset of fine-tuned attributes. It is possible to distill such a model from a bigger one. This may improve your throughput by 10-20x. Note, however, that smaller models can't handle extraction of many attributes in one go and will struggle with complex prompts, so test on your task.
- Quantization also shows a 20-50% improvement in RPS without a significant drop in output quality.

At huge volumes, it doesn't make sense to use anything but a self-hosted model, as the costs of using OpenAI or Anthropic would be prohibitively high; they are good for prototyping and trying things out, though.
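Since LLM latency rules out real-time use, extraction is typically run in batch. Here is a minimal sketch of batched attribute extraction with concurrent requests; `call_llm` is a stub standing in for a call to your inference server (for example, an OpenAI-compatible endpoint served by vLLM), and the prompt text and function names are made up for the example. The point is the shape of the loop: sending many requests concurrently lets the inference server batch them on the GPU, which is where most of the throughput gain over one-at-a-time calls comes from.

```python
import json
from concurrent.futures import ThreadPoolExecutor

def call_llm(prompt: str) -> str:
    # Stub for a self-hosted model endpoint; swap in a real HTTP client
    # call in production. A fine-tuned 7B model is usually enough for
    # this kind of extraction.
    return json.dumps({"color": "red", "material": "cotton"})

def extract_attributes_batch(descriptions: list[str], workers: int = 8) -> list[dict]:
    # Fire extraction requests concurrently so the inference server can
    # batch them on the GPU instead of processing them serially.
    prompts = [
        f'Extract attributes as JSON from this product card: "{d}"'
        for d in descriptions
    ]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return [json.loads(raw) for raw in pool.map(call_llm, prompts)]

cards = ["Stylish bright red cotton T-shirt for men"] * 4
results = extract_attributes_batch(cards)
```

In practice you would also add retries and JSON validation, since even fine-tuned models occasionally return malformed output.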
For the self-hosted scenario, I would go with the latest instruct versions of Llama or Qwen, as they have shown the best results in our testing: start with 70B for testing, then optimize down to smaller models until performance stops being satisfactory.

You will likely have to fine-tune the model for certain categories. The medical space, for example, has a lot of abbreviations, as does the construction industry. A generic model may not handle those well, so a LoRA pass is a good idea in such cases. Also take language into consideration: Llama works well in English but is likely to hallucinate in Chinese, where Qwen would be a better choice.

Have fun!

Published via Towards AI