[ARCHIVED] Filtering, Boosting, and Other Advanced Topics

Miso provides powerful personalized results out of the box, but we realize that sometimes you will want to adjust those results with certain rules and parameters. In this guide, we'll explain all the tools at your disposal to customize the results that you get back from your Miso engines.

When developing these tuning options, our team drew on previous work at large search & discovery platforms like Alibaba, Yahoo!, Tencent, and Rakuten. We knew that a key feature of any such platform is the flexibility to promote products, merchandise, and build very specific personalized user experiences. That's why we designed Miso to help you do the same.

How Miso ranks and filters results

First, some background on how Miso works. This is a simplified explanation, but you can think of Miso's algorithms as performing a two-stage process to rank a large number of products in real time. For example, consider the Category Pages recipe, which makes a Search API request with q=*.

In the first stage, Miso's engines compute a set of thousands or tens of thousands of candidate products based on the parameters of the request. Then, in the selection stage, Miso applies boosting, deep personalization, and other tie-breaking criteria in real time to produce the final ranking. This speeds up a very time-consuming ranking process, because the expensive computations only need to run over the smaller set of relevant products. At the same time, we maintain a lot of flexibility.

Both the pre-computed ranking and real-time ranking are based on personalization. But the pre-computed ranking is based on a faster personalization algorithm that only considers the data that is available during training time, so that we can pre-compute at scale.

The real-time ranking additionally considers the "context" in which the requests are being made and a full spectrum of other features, for example, the user's most recent interaction, time of day, current product popularity, etc. (That's why it's so important to stream interaction data to Miso.)
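To make the two stages concrete, here is a minimal Python sketch of the general candidate-generation-then-re-ranking pattern. It is not Miso's implementation: the Product fields, offline_score, and the matches_recent_interaction signal are hypothetical stand-ins for the precomputed personalization and the real-time context described above.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Product:
    product_id: str
    offline_score: float  # hypothetical score precomputed at training time
    context: Dict[str, bool] = field(default_factory=dict)  # hypothetical real-time signals


def generate_candidates(catalog: List[Product], k: int) -> List[Product]:
    """Stage 1: cheaply keep the k products with the highest precomputed score."""
    return sorted(catalog, key=lambda p: p.offline_score, reverse=True)[:k]


def rerank(candidates: List[Product], realtime_score: Callable[[Product], float]) -> List[Product]:
    """Stage 2: apply a more expensive, context-aware score to the candidates only."""
    return sorted(candidates, key=realtime_score, reverse=True)


def two_stage_rank(catalog: List[Product], k: int, realtime_score: Callable[[Product], float]) -> List[Product]:
    return rerank(generate_candidates(catalog, k), realtime_score)


def realtime_score(p: Product) -> float:
    """Hypothetical real-time scorer: favor products related to the latest interaction."""
    return p.offline_score + (0.3 if p.context.get("matches_recent_interaction") else 0.0)


catalog = [
    Product("a", 0.9),
    Product("b", 0.8, {"matches_recent_interaction": True}),
    Product("c", 0.7, {"matches_recent_interaction": True}),
    Product("d", 0.1, {"matches_recent_interaction": True}),
]
ranked = two_stage_rank(catalog, k=3, realtime_score=realtime_score)
print([p.product_id for p in ranked])  # ['b', 'c', 'a']
```

The design point is that the expensive real-time scorer only runs on the k candidates, not on the full catalog: product d never reaches stage 2, even though it matches the recent interaction.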

Filtering

There may be times when you want Miso to only return a certain subset of results. For instance, you may want to show users the Trending For You products only within a certain category, like shoes or clothing. There are several basic parameters you can use to achieve this.

It's important to note that Miso automatically filters out of its recommendations the items that the user has recently interacted with. However, Miso does not filter out OUT_OF_STOCK items by default.

type: You can use this parameter to make the API return only a certain type of products. For example, a travel site might have rental cars, hotels, and activities and want to restrict the results to a certain kind of product.
dedupe_product_group_id: If this is set to true, Miso will prevent products with the same product_group_id from showing multiple times in the search or recommendation results. This is particularly useful when one product has multiple variants (for example, different sizes, colors, or materials), but you want to show this product only once in the results. Miso will return the variant that is most likely to interest the user.
exclude: An array of product_ids of products you want to exclude from search results.
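The sketch below illustrates, on the client side, what these three parameters do to an already-ranked list. Miso applies them server-side; the apply_basic_filters function and the dictionary layout are hypothetical, and keeping the first (highest-ranked) variant of each group is a simple stand-in for Miso's choice of the variant most likely to interest the user.

```python
from typing import Dict, Iterable, List, Optional


def apply_basic_filters(
    ranked_products: List[Dict],
    product_type: Optional[str] = None,
    exclude: Iterable[str] = (),
    dedupe_product_group_id: bool = False,
) -> List[Dict]:
    """Illustrate type, exclude, and dedupe_product_group_id on a best-first list."""
    excluded = set(exclude)
    seen_groups = set()
    result = []
    for product in ranked_products:
        if product_type is not None and product.get("type") != product_type:
            continue  # type: keep only one kind of product
        if product["product_id"] in excluded:
            continue  # exclude: drop explicitly listed product_ids
        if dedupe_product_group_id:
            group = product.get("product_group_id")
            # Products without a product_group_id are treated as their own group.
            key = ("group", group) if group is not None else ("product", product["product_id"])
            if key in seen_groups:
                continue  # a higher-ranked variant of this group was already kept
            seen_groups.add(key)
        result.append(product)
    return result


ranked = [
    {"product_id": "shirt-red-M", "product_group_id": "shirt", "type": "apparel"},
    {"product_id": "shirt-blue-M", "product_group_id": "shirt", "type": "apparel"},
    {"product_id": "hotel-1", "type": "hotel"},
    {"product_id": "hat", "type": "apparel"},
]
print([p["product_id"] for p in apply_basic_filters(ranked, dedupe_product_group_id=True)])
# ['shirt-red-M', 'hotel-1', 'hat']
```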

When you need to go beyond these options, there's the fq parameter, which lets you perform pretty much any advanced type of filtering.

Using the filter query parameter `fq`

The fq parameter defines a query in Solr syntax that restricts the set of products that can be returned, without influencing the overall ranking. You can use fq to let users drill down to products with specific features based on different product attributes.

For example, the query below limits the search results to only show products whose size is either M or S and brand is Nike:

{"fq": "size:(\"M\" OR \"S\") AND brand:\"Nike\""}

You can use fq to apply filters against your custom attributes as well. For example, the query below limits the search results to only products whose designer attribute is Calvin Klein.

{"fq": "attributes.designer:\"Calvin Klein\""}

fq can also limit search results by numerical range. For example, the following query limits the results to products that have a rating of 4 or higher (rating >= 4):

{"fq": "rating:[4 TO *]"}
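Because fq values are Solr query strings embedded in JSON, quoting is easy to get wrong by hand. Here is a hypothetical helper, build_fq, that assembles such strings. It assumes standard Solr query syntax: double-quoted values with backslash escaping, OR inside parentheses, and inclusive [low TO high] ranges with * as an open bound. json.dumps then produces the escaped form used in the request bodies above.

```python
import json
from typing import Dict, Optional, Sequence, Tuple, Union


def quote(value) -> str:
    """Wrap a value in double quotes, escaping backslashes and embedded quotes."""
    escaped = str(value).replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def build_fq(
    terms: Optional[Dict[str, Union[str, Sequence[str]]]] = None,
    ranges: Optional[Dict[str, Tuple[Optional[float], Optional[float]]]] = None,
) -> str:
    """AND together exact-match clauses and inclusive [low TO high] range clauses."""
    clauses = []
    for field_name, values in (terms or {}).items():
        if isinstance(values, str):
            clauses.append(f"{field_name}:{quote(values)}")
        else:
            joined = " OR ".join(quote(v) for v in values)
            clauses.append(f"{field_name}:({joined})")
    for field_name, (low, high) in (ranges or {}).items():
        low_s = "*" if low is None else low    # None means "no lower bound"
        high_s = "*" if high is None else high  # None means "no upper bound"
        clauses.append(f"{field_name}:[{low_s} TO {high_s}]")
    return " AND ".join(clauses)


fq = build_fq(terms={"size": ["M", "S"], "brand": "Nike"})
print(fq)                        # size:("M" OR "S") AND brand:"Nike"
print(json.dumps({"fq": fq}))    # {"fq": "size:(\"M\" OR \"S\") AND brand:\"Nike\""}
print(build_fq(ranges={"rating": (4, None)}))  # rating:[4 TO *]
```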

Boosting

Boosting lets you promote results to the top of Miso's ranking, or to certain specified positions.

Using `boost_fq`

For any Miso API recipe, you can provide a boost_fq parameter. It takes a query in Solr syntax and boosts the matching subset of products to the top of the ranking, or to specific boost positions (see boost_positions below). For example, the query below will promote all the relevant products whose brand is Nike to the top of the recommendation list:

{
    "boost_fq": "brand:\"Nike\""
}

For a slightly more complex example, the query below will promote the Nike products which have also been tagged as ON SALE to the top of the ranking:

{
   "boost_fq": "brand:\"Nike\" AND tags:\"ON SALE\""
}

Note: Miso will only boost products that are relevant and have a high likelihood of converting, and will not boost a low-performing product only because it matches the boosting query.

If you want to boost only a static set of products, one common way to achieve this is by flagging those products with a custom attribute. For example, you can set custom_attributes.boost to 1 and then use boost_fq to query for all the products that have this flag set.

Using `boost_positions`

Depending on your boosting rules, in certain cases, you might want to prevent recommendation results from becoming too monotone. You can achieve this by specifying boost_positions to place promoted products at specific positions in the ranking. For example, the query below will place boosted products only at the first and fourth places in the ranking (positions are 0-based), and place the remaining products in their original ranking, skipping these two positions.

{
   "boost_fq": "brand:\"Nike\" AND tags:\"ON SALE\"",
   "boost_positions": [0, 3]
}
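One way to picture the combined effect of boost_fq and boost_positions is the re-ordering sketched below. It is an illustration under assumptions, not Miso's algorithm: is_boosted stands in for a boost_fq match, the sketch ignores Miso's relevance check on boosted products, and it assumes boosted products that don't fit into a listed position keep their original relative order among the other results.

```python
from typing import Callable, Iterable, List, Optional, TypeVar

T = TypeVar("T")


def apply_boost(
    ranked: List[T],
    is_boosted: Callable[[T], bool],
    boost_positions: Optional[Iterable[int]] = None,
) -> List[T]:
    """Move boosted items into 0-based slots; everything else keeps its relative order.

    Without boost_positions, boosted items go to the top of the ranking.
    """
    n = len(ranked)
    positions = range(n) if boost_positions is None else boost_positions
    boosted_idx = [i for i, item in enumerate(ranked) if is_boosted(item)]
    # Usable slots: unique, in range, lowest first, at most one per boosted item.
    slots = sorted(p for p in set(positions) if 0 <= p < n)[: len(boosted_idx)]
    placed_idx = boosted_idx[: len(slots)]
    placed = iter([ranked[i] for i in placed_idx])
    placed_set = set(placed_idx)
    rest = iter([item for i, item in enumerate(ranked) if i not in placed_set])
    slot_set = set(slots)
    return [next(placed) if pos in slot_set else next(rest) for pos in range(n)]


def is_nike_sale(item: str) -> bool:
    return item.startswith("nike-sale")


ranked = ["adidas-1", "nike-sale-1", "puma-1", "nike-sale-2", "reebok-1", "nike-sale-3"]
print(apply_boost(ranked, is_nike_sale, [0, 3]))
# ['nike-sale-1', 'adidas-1', 'puma-1', 'nike-sale-2', 'reebok-1', 'nike-sale-3']
```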

To make sure your results aren't overwhelmed by your boosted products, it can be a good idea to use boost_positions together with diversification, discussed below.

Diversifying your results

To make sure that you increase product discovery, it can be useful to ensure that there's a good diversity of products shown side-by-side.

The diversification parameter tells Miso's ranking algorithms to try to maintain a desired minimum distance between any two products that share the same value for the attributes you specify. For example, the following query tells Miso that products with the same brand should be at least two slots apart from each other in the ranked results.

{
   "boost_fq": "tags:\"ON SALE\"",
   "diversification": {
       "brand": {"minimum_distance": 2}
    }
}
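A minimal greedy sketch of minimum-distance diversification follows. It assumes "at least two slots apart" means the positions of two same-brand products differ by at least minimum_distance, and it is best effort: when no remaining product satisfies the constraint, it takes the highest-ranked one anyway. Miso's actual diversification logic may differ.

```python
from typing import Dict, List


def diversify(ranked: List[Dict], attribute: str, minimum_distance: int) -> List[Dict]:
    """Greedy re-ranking so equal attribute values are at least minimum_distance slots apart."""
    remaining = list(ranked)
    result: List[Dict] = []
    window = minimum_distance - 1  # number of previous slots that must not share the value
    while remaining:
        recent = {item[attribute] for item in result[-window:]} if window > 0 else set()
        # Highest-ranked product that respects the distance; fall back to the top one.
        choice = next((i for i, item in enumerate(remaining) if item[attribute] not in recent), 0)
        result.append(remaining.pop(choice))
    return result


products = [
    {"id": 1, "brand": "Nike"},
    {"id": 2, "brand": "Nike"},
    {"id": 3, "brand": "Nike"},
    {"id": 4, "brand": "Adidas"},
    {"id": 5, "brand": "Puma"},
]
print([p["id"] for p in diversify(products, "brand", 2)])  # [1, 4, 2, 5, 3]
```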

Revenue optimization

Miso's default behavior is to optimize for the goals that you set in your engine training. For example, you might want to increase conversions by having your engine suggest products that will lead to more add_to_cart and checkout interactions. For many customers, this is enough — but some Miso customers see even better revenue numbers when we trade off conversion rate slightly against other attributes of products, such as price.

To trade off the likelihood of conversion against revenue, Miso provides an uprank parameter. For example, if you uprank by price, Miso will not simply rank products by conversion probability, as in the default behavior. Instead, products will be ranked by conversion probability * price to the power of α (alpha), where α is a value between 0 and 1. In essence, this factors price into the ranking to boost products that have a higher price, potentially increasing your average order value. Miso's team will run experiments to find the value of α that gives you the best revenue returns.

In the request below, power sets the value of α:

{
  "q": "*",
  "category": ["Ice Cream", "All Pints"],
  "uprank": {
    "fields": ["price"],
    "power": 0.15
  }
}
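The arithmetic behind uprank can be sketched in a few lines of Python. The conversion probabilities and prices below are made-up numbers chosen to show that even a small α such as 0.15 can reorder two products; the field names are hypothetical.

```python
from typing import Dict, List


def uprank_score(conversion_probability: float, price: float, alpha: float) -> float:
    """conversion probability * price ** alpha, with 0 <= alpha <= 1."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be between 0 and 1")
    return conversion_probability * price ** alpha


def rank_by_uprank(products: List[Dict], alpha: float) -> List[Dict]:
    return sorted(
        products,
        key=lambda p: uprank_score(p["conversion_probability"], p["price"], alpha),
        reverse=True,
    )


pints = [
    {"id": "A", "conversion_probability": 0.10, "price": 4.0},
    {"id": "B", "conversion_probability": 0.09, "price": 12.0},
]
print([p["id"] for p in rank_by_uprank(pints, alpha=0.0)])   # ['A', 'B']
print([p["id"] for p in rank_by_uprank(pints, alpha=0.15)])  # ['B', 'A']
```

With α = 0 the price term is 1 and the ranking is by conversion probability alone; at α = 0.15, B scores about 0.09 × 12^0.15 ≈ 0.131 against A's 0.10 × 4^0.15 ≈ 0.123, so the pricier pint moves ahead.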

Another option is to optimize for margin in addition to price, in order to slightly favor products that give you a higher return. This is an advanced feature, and we recommend consulting with your Miso solutions engineer to learn more.

Tuning your rankings

For each call to the Search API or the Product to Products API, Miso returns a _score field that contains up to four numbers. Miso's default ranking algorithm is a four-step process, and each number in _score is the score Miso uses to rank products at the corresponding step.

Why are multiple steps required? Miso uses soft tie-breaking: results whose scores at one step are relatively close are treated as tied, and the score from the next step breaks the tie. This cascade lets Miso balance search relevance, boosting, and personalization instead of considering only one factor. The scores after tie-breaking are shown in the _tie_break_score field.

Search Relevance or _search_score: In the first step, Miso generates a score that rates how well the search keywords match a product's catalog data, with a focus on the product's title. This score is mostly based on a variant of BM25, but additionally considers term proximity, typos, and term semantic similarity. Its value is always larger than 0, but its range is unbounded.

After calculating the _search_score, Miso applies a soft tie-breaking mechanism with a large threshold. This means that products with similar relevance are then ranked by the boosting score.
Boosting Score or _boost_score: Next, Miso determines whether the product is boosted by your boost_fq query. This score is binary and is used to break ties.
Search Relevance or _search_score: Because a product might still be tied with other results after the first two steps, Miso uses the search relevance score again, this time with a smaller tie-break threshold, for more fine-grained ranking. Products that are still tied are then ranked by the personalization score.
Personalization Score or _personalization_score: Finally, Miso estimates the probability that a user will interact with a product based on your Miso Search Engine's personalization models. This score ranges from 0 to 1 and is not uniformly distributed: products that are relevant to the user's interests have scores much closer to 1 than products that are not.

The personalization score is used as the final tie-breaker among products that are still tied after the first three steps.

Here is an example of what a potential _score and _tie_break_score could look like for your results:

{
  "_tie_break_score": [1, 0, 5, 0.61959601656602987],
  "_score": [43.775826, 0, 43.775826, 0.61959601656602987]
}

Changing the default search result ranking with `order_by`

You can override the behavior described above by using the order_by parameter to specify your own ranking steps and tie-breakers.

For example, the following request tells Miso to use the _personalization_score to rank the products. Then, in the case of soft ties (explained below), the tie-breaker will be the custom_attributes.promote_score field.

{
   "order_by": [
        {
            "field": "_personalization_score",
            "tie_breaker": {
                "type": "relative_difference",
                "threshold": "0.05"
            },
            "order": "desc"
        },
        {
            "field": "custom_attributes.promote_score",
            "order": "desc"
        }

   ]
}

Using Tie-Breakers

For scores that have granular resolutions, for example _personalization_score, _search_score, or products' sale_price, we usually want to set a threshold below which products are considered close enough to be "tied." After all, a 0.001 difference in _personalization_score or a $0.01 difference in sale price typically will not make a difference in users' preferences. Instead, you probably want to fall back to another field to determine the ranking.

That's why, for each field that Miso uses for ranking, we let you specify whether a tie_breaker is needed and where its threshold is. For example, in the query above, two products are considered a "soft tie" for ranking purposes if the relative difference between their _personalization_score values is no more than the threshold of 0.05 (5%). When there is such a tie, the next field (i.e. custom_attributes.promote_score) determines their ranking.
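To make the soft-tie semantics concrete, here is a sketch of an order_by evaluator that accepts steps in the same shape as the JSON above. Two details are assumptions, not documented Miso behavior: relative difference is taken as |a - b| divided by the larger magnitude, and each tie group is anchored at its highest-ranked product (soft ties are not transitive, so some grouping rule is needed).

```python
from typing import Dict, List


def relative_difference(a: float, b: float) -> float:
    """|a - b| divided by the larger magnitude (an assumed definition)."""
    denominator = max(abs(a), abs(b))
    return 0.0 if denominator == 0 else abs(a - b) / denominator


def order_by(products: List[Dict], steps: List[Dict]) -> List[Dict]:
    """Rank by steps[0]; products in a soft tie are ranked by the remaining steps."""
    if not steps:
        return list(products)
    step, remaining_steps = steps[0], steps[1:]
    name = step["field"]
    tie_breaker = step.get("tie_breaker")
    if tie_breaker is None:
        threshold = 0.0  # no tie_breaker: only exactly equal values tie
    elif tie_breaker.get("type") == "relative_difference":
        threshold = float(tie_breaker["threshold"])
    else:
        raise ValueError(f"unsupported tie_breaker type: {tie_breaker.get('type')}")

    ordered = sorted(products, key=lambda p: p[name], reverse=step.get("order", "desc") == "desc")
    result: List[Dict] = []
    group: List[Dict] = []
    for product in ordered:
        # Close the current tie group when this product is too far from the group's leader.
        if group and relative_difference(group[0][name], product[name]) > threshold:
            result.extend(order_by(group, remaining_steps))
            group = []
        group.append(product)
    result.extend(order_by(group, remaining_steps))
    return result


steps = [
    {"field": "_personalization_score", "order": "desc",
     "tie_breaker": {"type": "relative_difference", "threshold": "0.20"}},
    {"field": "sale_price", "order": "desc"},
]
products = [
    {"id": "a", "_personalization_score": 0.50, "sale_price": 20.0},
    {"id": "b", "_personalization_score": 0.45, "sale_price": 80.0},
    {"id": "c", "_personalization_score": 0.30, "sale_price": 100.0},
]
print([p["id"] for p in order_by(products, steps)])  # ['b', 'a', 'c']

strict = [dict(steps[0], tie_breaker={"type": "relative_difference", "threshold": "0.05"}), steps[1]]
print([p["id"] for p in order_by(products, strict)])  # ['a', 'b', 'c']
```

With a 0.20 threshold, a (0.50) and b (0.45) differ by 10%, so they tie and sale_price puts the pricier b first; c differs from a by 40% and stays last. With 0.05, a and b no longer tie and personalization alone decides.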

It's also common to set a large tie-breaker threshold when you want to combine the effects of two types of scores. For example, in the following query, we set threshold=0.2 (20%) for _personalization_score. Then one product is ranked above another by personalization only if their scores differ by more than 20%; products whose scores are within 20% of each other are ranked by their sale prices. In this way, we combine the effects of personalization score and sale price: products are roughly ranked by personalization, but pricier products are favored when their personalization scores are comparable.

{
   "order_by": [
        {
            "field": "_personalization_score",
            "tie_breaker": {
                "type": "relative_difference",
                "threshold": "0.20"
            },
            "order": "desc"
        },
        {
            "field": "sale_price",
            "order": "desc"
        }

   ]
}

Also note that, when search keywords are present, it is recommended to always include _search_score as the first field (plus a tie-breaker) to maintain the relevance of the search results.

{
   "q": "toy story",
   "order_by": [
       {
            "field": "_search_score",
            "tie_breaker": {
                "type": "relative_difference",
                "threshold": "0.20"
            },
            "order": "desc"
        },
        {
            "field": "_personalization_score",
            "tie_breaker": {
                "type": "relative_difference",
                "threshold": "0.20"
            },
            "order": "desc"
        },
        {
            "field": "sale_price",
            "order": "desc"
        }
   ]
}

Enabling partial matches in search

By default, Miso's Search API only returns products that contain all the keywords in the search query (i.e. an AND operator over keywords). This strategy usually leads to highly relevant results.

However, when there aren't enough search results to return to users, you can tell the API to relax its criteria by setting enable_partial_match_threshold to an integer value. When the number of products that exactly match the search query is less than or equal to that number, Miso will also return results that match only some of the keywords. This strategy is particularly useful to keep users from seeing an empty search results page and abandoning the search.

For example, let's consider the query request below:

{
  "q": "Toy story 5",
  "enable_partial_match_threshold": 3
}

Since there is no movie called "Toy story 5", we have zero products to return by default. However, because we set enable_partial_match_threshold=3, we will return other products that partially match the query in the partially_matched_products field as follows:

{
"data": {
    "products": [],
    "total": 0,
    "partially_matched_products": [
        {
            "title": "Toy Story",
            "_missing_keywords": ["5"]
         },
        {   
            "title": "Toy story 2",
            "_missing_keywords": ["5"]
        },
        ...
    ]
}
}

As you can see from the result above, when we don't have the exact products the user is looking for, showing partially_matched_products is a decent strategy to let them know what alternatives are available, and prevent them from seeing an empty search results page.
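The partial-match behavior can be approximated with the sketch below. Miso's real matching also handles typos and relevance ranking; here, keywords are simply lowercased whitespace-separated tokens, a partial match must contain at least one keyword, and partial matches are ordered by fewest missing keywords. All of these are simplifying assumptions, and search_with_partial_match is a hypothetical name.

```python
from typing import Dict, List, Optional


def tokenize(text: str) -> List[str]:
    return text.lower().split()


def search_with_partial_match(
    titles: List[str], q: str, enable_partial_match_threshold: Optional[int] = None
) -> Dict:
    keywords = tokenize(q)
    exact, partial = [], []
    for title in titles:
        tokens = set(tokenize(title))
        missing = [k for k in keywords if k not in tokens]
        if not missing:
            exact.append({"title": title})  # AND over all keywords
        elif len(missing) < len(keywords):  # matches some, but not all, keywords
            partial.append({"title": title, "_missing_keywords": missing})
    response = {"products": exact, "total": len(exact)}
    if enable_partial_match_threshold is not None and len(exact) <= enable_partial_match_threshold:
        # Assumed ordering: fewest missing keywords first (stable for equal counts).
        response["partially_matched_products"] = sorted(partial, key=lambda p: len(p["_missing_keywords"]))
    return response


catalog = ["Toy Story", "Toy story 2", "Frozen"]
print(search_with_partial_match(catalog, "Toy story 5", enable_partial_match_threshold=3))
```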

Filtering out low-relevance search results

When Miso returns many results for a query, the top results are likely to be highly relevant. However, if only a few results are available, they may be low relevance but will still appear at the top of the results page. In this case, you may want to exclude these results and show partial matches instead. You can use a few criteria to determine whether your results might be low relevance:

products[]._matched_fields: When you turn on enable_matched_fields in the API request, Miso tells you which product fields contained tokens matching the search query. This lets you filter out, for example, products where the search query only matched the description field.
products[]._search_score: If the score is less than -2, the product is not likely to be relevant.
Length of products array: If there are only a few results, you may want to exclude low-relevance results and use partial matches instead.
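These criteria can be combined in a small client-side post-processing step. This is a sketch under assumptions: _matched_fields is taken to be a list of field names, the -2 cut-off is the one quoted in this list, and min_results (the point at which to fall back to partial matches) is a hypothetical value you would tune.

```python
from typing import Dict, Iterable, List

LOW_RELEVANCE_SEARCH_SCORE = -2  # cut-off quoted in this guide


def is_relevant(product: Dict, weak_fields: Iterable[str] = ("description",)) -> bool:
    """Assumes _matched_fields is a list of field names (requires enable_matched_fields)."""
    matched = set(product.get("_matched_fields", []))
    if matched and matched <= set(weak_fields):
        return False  # the query only matched weak fields such as the description
    return product.get("_search_score", 0.0) >= LOW_RELEVANCE_SEARCH_SCORE


def results_to_show(data: Dict, min_results: int = 3) -> List[Dict]:
    """Keep relevant products; if too few remain, append the partial matches."""
    strong = [p for p in data.get("products", []) if is_relevant(p)]
    if len(strong) >= min_results:
        return strong
    return strong + data.get("partially_matched_products", [])


data = {
    "products": [
        {"title": "Toy Story", "_matched_fields": ["title"], "_search_score": 5.0},
        {"title": "Cars", "_matched_fields": ["description"], "_search_score": 3.0},
        {"title": "Up", "_matched_fields": ["title"], "_search_score": -3.0},
    ],
    "partially_matched_products": [{"title": "Toy Story 2", "_missing_keywords": ["5"]}],
}
print([p["title"] for p in results_to_show(data)])  # ['Toy Story', 'Toy Story 2']
```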

Tuning the search spellcheck

You can enable automatic spelling correction in search only for cases where Miso has high confidence in the correction. The confidence is a function of the edit distance between the query and the correction, the number of search results, and the query logs.

{"spellcheck": 
 {"enable_auto_spelling_correction": "high"}
}

While spellcheck is turned on by default for all queries, you can also disable spellcheck entirely if needed.

{"spellcheck": 
 {"enable_spellcheck": false}
}
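The correction confidence described above depends partly on the edit distance between the query and the correction. This guide does not say which edit distance Miso uses or how it is combined with result counts and query logs; as a reference point only, here is the standard Levenshtein distance.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: the minimum number of single-character insertions,
    deletions, and substitutions that turn a into b."""
    previous = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, char_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, char_b in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,                       # delete char_a
                current[j - 1] + 1,                    # insert char_b
                previous[j - 1] + (char_a != char_b),  # substitute (free if equal)
            ))
        previous = current
    return previous[-1]


print(edit_distance("toy stroy", "toy story"))  # 2
```

Note that plain Levenshtein counts a swap of two adjacent letters ("ro" vs "or") as two edits; variants such as Damerau-Levenshtein count it as one.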


Frequently asked questions