Search is a complex problem that is hard to customize to a particular use case — even search experts need a lot of iteration to configure a search engine.
Search is composed of several steps that all require different configurations. It is usually split into three steps:
In most modern search engines, the first two steps are executed concurrently for performance reasons, but each has a distinct purpose: first to expand the search to the largest possible result set (i.e., optimize for recall), and then to order results from most to least relevant (i.e., optimize for precision).
Regardless of the technology used, both the retrieval and the ranking processes are difficult and involve a lot of parameters.
In this article, we explain where weights (also called boosts) are used as a solution to these challenges in search. Weights can feel like an intuitive solution, which is why they’ve been used for a long time. However, they can actually be dangerous and often counterproductive.
The main challenge associated with the retrieval phase is to ensure that all the potentially relevant records are found. This challenge takes a different form depending on the technology that is used.
Today, more and more search engines rely on both keyword search and semantic search, meaning that both challenges need to be addressed.
In the ranking phase, the challenge is to merge all the signals together into one final ordering of the results. In particular, three categories of signals are merged together in the ranking: textual signals, business signals, and user signals.
Mixing those criteria is very complex, and the reflex is often to merge all those signals using weights set by the business.
In the retrieval phase, the query terms can be found in multiple attributes (also called fields) of a record. Not all attributes have the same importance, which is why most engines ask the customer to assign a score (or boost) to every attribute to express its business importance. Setting those scores is not an easy task and often leads to relevance problems: the number of possible configurations is effectively infinite, and many of them produce very similar results.
From the business point of view, attributes do not all have the same value. For example, in an ecommerce store, matching the query “laptop 13” in the “category” attribute is not the same as matching it in the “description” attribute: for a “laptop” query, a backpack whose description mentions a pocket for a laptop is not a relevant match.
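To make the tuning difficulty concrete, here is a minimal sketch of manual attribute boosts. The records, the boost values, and the `attribute_score` helper are all hypothetical and only illustrate the idea; real engines score matches in far richer ways.

```python
# Hypothetical scoring: add the boost of every attribute that contains a query term.
def attribute_score(record, query_terms, boosts):
    score = 0.0
    for attribute, boost in boosts.items():
        tokens = record.get(attribute, "").lower().split()
        if any(term in tokens for term in query_terms):
            score += boost
    return score

# Illustrative records, not real catalog data.
records = [
    {"name": "UltraBook 13", "category": "laptop", "description": "13 inch laptop"},
    {"name": "City backpack", "category": "bags", "description": "pocket for a laptop"},
]

# Two different manual configurations (values chosen arbitrarily).
config_a = {"category": 10, "name": 5, "description": 1}
config_b = {"category": 7, "name": 6, "description": 2}

def ranking(boosts, query_terms=("laptop",)):
    ranked = sorted(
        records,
        key=lambda record: attribute_score(record, query_terms, boosts),
        reverse=True,
    )
    return [record["name"] for record in ranked]

print(ranking(config_a))
print(ranking(config_b))
```

Both configurations produce the same ordering on this query, so looking at a handful of queries gives no way to tell which of the two is better overall.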
The problem with setting the weights manually is that you never know whether you have the best configuration, and the best configuration actually varies for different query types, so there is no single solution. You are limited by the number of different configurations you can test, each of which requires a test period. Even worse, it is common to iterate on the weights based on the observation of a small set of queries, without checking in advance the global impact of the change.
Instead of setting those weights manually, it is better to give a “hint” to an AI algorithm that will optimize the weights automatically and continuously (because the data also changes over time). Such a hint is usually a list of attributes ordered from the most important to the least important, possibly with some ties. For example, (“Category”, “Brand”, “Color”) > “Name” > “Description” when the business knows that “Category” and “Brand” are clean data reviewed by the business, while “Name” and “Description” are more generic.
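As an illustration, such a hint could be encoded as an ordered list of tiers, with tied attributes sharing a tier. The sketch below is not the optimization algorithm itself: it only shows one simple, assumed way to consume the hint, by ranking records on the most important tier in which the query term matches. The names `ATTRIBUTE_HINT` and `best_matching_tier` and the sample records are hypothetical.

```python
# Ordered tiers, most important first; attributes in the same tuple are tied.
ATTRIBUTE_HINT = [("category", "brand", "color"), ("name",), ("description",)]

def best_matching_tier(record, term, hint=ATTRIBUTE_HINT):
    """Return the index of the most important tier containing the term, or None."""
    for tier_index, tier in enumerate(hint):
        for attribute in tier:
            if term in record.get(attribute, "").lower().split():
                return tier_index
    return None

# Illustrative records, not real catalog data.
records = [
    {"name": "City backpack", "category": "bags", "brand": "Acme",
     "description": "pocket for a laptop"},
    {"name": "Acme laptop sleeve", "category": "accessories", "brand": "Acme",
     "description": "neoprene sleeve"},
    {"name": "UltraBook 13", "category": "laptop", "brand": "Acme",
     "description": "light and fast"},
]

matches = [(best_matching_tier(record, "laptop"), record["name"]) for record in records]
# Keep only records that match somewhere, best tier first.
ranked = [name for tier, name in sorted(m for m in matches if m[0] is not None)]
print(ranked)
```

No numeric weight appears anywhere: the business only states the order of importance, and anything finer than that is left to the algorithm.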
Another common use of weights is to merge the textual signals and the business signals: score = σ × TextualScore + (1 − σ) × BusinessScore. For example, a weight of σ = 0.5 gives the textual score and the business score the same importance when computing the final ordering. This seems very intuitive but creates a lot of relevance issues. To illustrate the problem, take the extreme case where you want to sort by a business signal. In this case, you ignore the textual signals entirely and sort by a single business criterion (by increasing price, for example). Because the search engine is designed to consider all potential hits, even ones that are far from the query, you end up with results that rank highly only because they are cheap. This is a problem you find on a lot of websites, and it often forces the customer to add filters to get something more relevant.
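The effect is easy to reproduce with a small sketch of the linear blend. The hits, their textual scores, and the price-based business score below are made-up values for a “laptop” query, and normalizing price by the maximum price is an assumption made for this example.

```python
def blended_score(textual, business, sigma):
    """Linear merge: sigma * TextualScore + (1 - sigma) * BusinessScore."""
    return sigma * textual + (1 - sigma) * business

# Illustrative hits for the query "laptop"; textual scores are invented.
hits = [
    {"name": "UltraBook 13", "textual": 0.95, "price": 900},
    {"name": "Laptop sleeve", "textual": 0.60, "price": 25},
    {"name": "City backpack", "textual": 0.10, "price": 20},
]
max_price = max(hit["price"] for hit in hits)

def rank(sigma):
    def score(hit):
        # Assumed business score: cheaper items score higher, in [0, 1].
        business = 1 - hit["price"] / max_price
        return blended_score(hit["textual"], business, sigma)
    return [hit["name"] for hit in sorted(hits, key=score, reverse=True)]

print(rank(1.0))  # textual relevance only
print(rank(0.5))  # equal importance
print(rank(0.0))  # pure sort by price
```

With σ = 0 the backpack comes first simply because it is the cheapest item, and even with σ = 0.5 it still outranks the actual laptop.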
The fundamental problem is that the business score is merged in the same way for all results, even though some of them are very far from the query and should be eliminated. For a sort, one classic solution is to filter the result set and keep only the hits with good textual scores before applying the sort. In practice, most marketplaces use this approach. (You can detect it on a site where the number of results after a “sort by” is lower than for the initial search.)
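A minimal sketch of this filter-then-sort approach follows. The hits and the 0.8 cutoff are arbitrary illustrative values; choosing the cutoff well is exactly the hard part.

```python
def sort_by_price(hits, min_textual_score):
    """Drop hits below the textual cutoff, then sort the rest by increasing price."""
    relevant = [hit for hit in hits if hit["textual"] >= min_textual_score]
    return sorted(relevant, key=lambda hit: hit["price"])

# Illustrative hits for the query "laptop"; textual scores are invented.
hits = [
    {"name": "UltraBook 13", "textual": 0.95, "price": 900},
    {"name": "BudgetBook 14", "textual": 0.90, "price": 450},
    {"name": "Laptop sleeve", "textual": 0.60, "price": 25},
    {"name": "City backpack", "textual": 0.10, "price": 20},
]

unfiltered = [hit["name"] for hit in sort_by_price(hits, min_textual_score=0.0)]
filtered = [hit["name"] for hit in sort_by_price(hits, min_textual_score=0.8)]
print(unfiltered)
print(filtered)
```

The filtered list is shorter than the initial result set, which is the telltale sign mentioned above, but every remaining result is an actual laptop.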
The general problem of merging the textual signals and the business signals is exactly the same as in the extreme case of a “sort by”. To handle it properly, you need several buckets of textual relevance, and you then apply the business sort inside each bucket.
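The bucket idea can be sketched as follows. The bucket boundaries, the scores, and the decision to drop hits that fall below the last boundary are all assumptions made for this example.

```python
def bucketed_ranking(hits, bucket_edges):
    """Group hits into textual-relevance buckets, then sort by business score
    inside each bucket. bucket_edges holds descending lower bounds, e.g.
    [0.8, 0.5]; hits below the last bound are dropped (an assumption here)."""
    def bucket_of(hit):
        for index, lower_bound in enumerate(bucket_edges):
            if hit["textual"] >= lower_bound:
                return index
        return None

    kept = [(bucket_of(hit), hit) for hit in hits]
    kept = [(bucket, hit) for bucket, hit in kept if bucket is not None]
    # Best bucket first; inside a bucket, highest business score first.
    kept.sort(key=lambda pair: (pair[0], -pair[1]["business"]))
    return [hit["name"] for _, hit in kept]

# Illustrative hits for the query "laptop"; all scores are invented.
hits = [
    {"name": "UltraBook 13", "textual": 0.95, "business": 0.40},
    {"name": "BudgetBook 14", "textual": 0.85, "business": 0.90},
    {"name": "Laptop sleeve", "textual": 0.60, "business": 0.95},
    {"name": "Laptop stand", "textual": 0.55, "business": 0.70},
    {"name": "City backpack", "textual": 0.10, "business": 1.00},
]

ranked = bucketed_ranking(hits, bucket_edges=[0.8, 0.5])
print(ranked)
```

The business score reorders results only among hits of comparable textual relevance, so a strong business score can never lift the backpack above a laptop.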
Similar to the problem of weights inside the Textual Score, the best approach is to define the business score and let an algorithm optimize the merging for you depending on the user behavior.
Ranking is a complex problem of merging a lot of signals. It is tempting to merge all of them with weights, at least at a high level between the textual signals, business signals, and user signals. In practice, setting those weights manually is an order of magnitude more complex than in the previous two examples and is a recipe for failure.
Depending on the context (the query, the section of the website, etc.), the merge may need to be different to provide relevant results to the user. It is fine for the search engine to use weights internally and to set them with a machine learning algorithm. But setting those weights manually never delivers the best results.
Weights are more often part of the problem than part of the solution when they are set manually. The next time you are asked to manually configure a weight, think about the danger of such a setting and whether there is a way to configure it automatically. Note that LTR (Learning To Rank) algorithms often use boosted trees and other non-linear ways of finding dependencies and relationships between the ranking signals. As a result, the optimal way of setting the weights ends up being different for basically every query. For the sort example above, such an algorithm could learn that the relevance cutoff should increase when a hard sort, such as a sort on the price attribute, is applied.
If you want to learn how Algolia is tackling this challenge, watch this space to learn more.