Elasticsearch tips – Inconsistent search results

We noticed that executing the same query multiple times returned different responses.

How could this happen?

A bit of theory

For results with identical scores, the order of the elements can differ – depending on how the queried node arranges them. This is expected behaviour, unless you use the preference parameter, which routes requests from the same user to the same shard copies. This provides consistency from the user’s point of view.
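To illustrate, here is a minimal sketch of attaching a per-user `preference` value to a search request. The index name, field name and user id are made-up placeholders; only the `preference` query parameter itself comes from the Elasticsearch search API.

```python
# Sketch: pin each user's searches to the same shard copies via the
# `preference` parameter, so repeated queries see consistent ordering.
# "my-index", "name" and the user id are placeholder values.

def build_search_request(user_id, query_text):
    """Return the path and body for a per-user, consistently routed search."""
    path = f"/my-index/_search?preference={user_id}"  # same string -> same copies
    body = {"query": {"match": {"name": query_text}}}
    return path, body

path, body = build_search_request("user-42", "London")
```

The same `preference` string always hits the same shard copies, so ties between identical scores are broken the same way every time for that user.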

It is much more inconvenient when the scores themselves differ, depending on which node the request ends up on, and whether a replica or a primary shard gets queried. In this case we have a primary-replica shard inconsistency. This is what happened to us.

In a lucky case, the cluster is in a warning (yellow) state, so it lets us know in advance that there is a problem – but this wasn’t the case for us. Let’s go deeper now and look at the reasons behind it.

Things we could compare:

  • number of documents in a shard and its replica
  • deleted docs in a shard and its replica
  • max_docs (total number of documents) in a shard and its replica
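These numbers are available per shard copy from the shard-level stats endpoint. Here is a small sketch of pulling them out of the parsed JSON – the index name is a placeholder, and the `docs.count` / `docs.deleted` field names follow the `_stats` response format:

```python
# Sketch: the per-shard numbers above can be read from the shard-level
# stats endpoint. "my-index" is a placeholder index name.
STATS_PATH = "/my-index/_stats/docs?level=shards"

def shard_doc_numbers(shard_stats):
    """Extract the three comparable numbers from one shard copy's stats."""
    docs = shard_stats["docs"]["count"]       # live documents
    deleted = shard_stats["docs"]["deleted"]  # deleted-but-not-merged documents
    max_docs = docs + deleted                 # total Lucene documents
    return docs, deleted, max_docs

docs, deleted, max_docs = shard_doc_numbers({"docs": {"count": 100, "deleted": 5}})
```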

How did this hit us?

In our case, the cluster was green, because the number of documents in both the primary shard and its replica was the same.

However, max_docs (= docs + deleted docs) differed.

Deleted documents differed between the primary and its replica, and since deleted documents contribute to the inverse document frequency calculation, this led to different scores.

The original cluster health endpoint does not cover this case, which prevented us from noticing the replication problem in advance. However, such a check is easy to implement by comparing this number for all primary shards and their replicas inside the cluster.
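As a sketch, such a check could look like the following, assuming `stats` is the parsed JSON response of `GET /<index>/_stats?level=shards` (the example data below is made up):

```python
# Sketch of the missing health check: compute docs + deleted for each
# shard copy and flag shards whose primary and replica(s) disagree.

def find_inconsistent_shards(stats, index):
    inconsistent = []
    for shard_id, copies in stats["indices"][index]["shards"].items():
        # One total per shard copy; a healthy shard yields a single value.
        totals = {c["docs"]["count"] + c["docs"]["deleted"] for c in copies}
        if len(totals) > 1:
            inconsistent.append(shard_id)
    return inconsistent

# Made-up example: shard "0" is consistent, shard "1" has a replica
# whose deleted-docs count diverged from the primary.
example_stats = {"indices": {"my-index": {"shards": {
    "0": [{"docs": {"count": 10, "deleted": 2}},
          {"docs": {"count": 10, "deleted": 2}}],
    "1": [{"docs": {"count": 10, "deleted": 4}},
          {"docs": {"count": 10, "deleted": 2}}],
}}}}
bad = find_inconsistent_shards(example_stats, "my-index")
```

Running this periodically would have caught our primary-replica divergence even though the cluster reported green.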

How did we fix it?

There isn’t an easy way to force a healthy-looking replica to resynchronise. We found two options we could go for.

1. Rerouting or reallocation

Force the reallocation of a shard (in our case a replica). During reallocation a resync is also performed, which fixes the corrupted replica.
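One way to trigger this is the `cancel` command of the cluster reroute API, which discards a replica’s current allocation and makes the cluster rebuild that copy from the primary. A sketch of the request body for `POST /_cluster/reroute` – the index name, shard number and node name are placeholders:

```python
# Sketch: cancel a replica's allocation so Elasticsearch re-allocates
# and resyncs it from the primary. All argument values are placeholders.
REROUTE_PATH = "/_cluster/reroute"

def cancel_replica_allocation(index, shard, node):
    return {
        "commands": [
            {"cancel": {
                "index": index,
                "shard": shard,
                "node": node,
                "allow_primary": False,  # never cancel the primary copy
            }}
        ]
    }

reroute_body = cancel_replica_allocation("my-index", 1, "node-1")
```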

However, /reroute is not an allowed operation on Amazon Elasticsearch Service, so we had to pick a harsher solution.

2. Setting replicas to zero

It’s not the most elegant solution we could think of, but it solved our problem. The number of replicas can be modified at any time, even after index creation. A possible way to fix the corrupt replicas is to set the number of replicas to zero, wait for the changes to apply, then reset the number of replicas to one (or however many replicas you want to have).
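The two steps amount to sending the following settings bodies to `PUT /<index>/_settings`, one after the other, waiting for the cluster to finish in between:

```python
# Sketch: the two settings updates described above. Dropping replicas
# discards the (possibly corrupt) copies; restoring them rebuilds fresh
# copies from the primaries.

def set_replica_count(n):
    return {"index": {"number_of_replicas": n}}

drop_replicas = set_replica_count(0)     # step 1: discard all replicas
restore_replicas = set_replica_count(1)  # step 2: rebuild them from primaries
```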

This just worked. It’s annoying, though, that AES did not complain while the delete operations were not correctly indexed on the replicas.


Elasticsearch tips – Poor result relevance

In recent months I’ve been developing a Search API – using Amazon Elasticsearch Service in the background.

Elasticsearch is a well known, widely used, and well documented product. It was very simple to get it running. However, we encountered some interesting behaviour, which prompted us to dig deeper.

My first posts will cover two problems we had to face: poor result relevance and inconsistent results. Our findings and solutions can be useful for any of you interested in Elasticsearch, because the topics don’t focus on the Amazon ES implementation.

And now, let’s get into the details.

Poor result relevance

After the first iteration of our implementation (we mostly went on with the basic ES cluster settings and default mappings) we realised that our free text search results are not relevant enough.

When searching for “London”, for example, we received a lot of organisations with “London” in their names, but the location “London” was not in the first 50 results.

How could this happen?

A bit of theory

Elasticsearch calculates a relevance score for free text search based on several field statistics: term frequency, inverse document frequency and field-length norm.

  • Term frequency describes how frequently a term appears inside a text: the more often, the more relevant the document is.
  • Inverse document frequency balances the above metric, following the idea that the more often a term appears across all the documents, the less relevant it is for our specific search. This means that if a term is present in all the documents, it’s most probably a general one which adds little value to our search, so it counts for less in scoring.
  • Field-length norm gives a higher score to matches inside a short field than to matches found in a longer one.
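To get a feeling for how the three statistics interact, here is a toy TF/IDF-style scoring function – a simplified illustration, not Elasticsearch’s exact formula:

```python
import math

# Toy illustration of the three statistics above (simplified TF/IDF-style
# scoring, not Elasticsearch's exact implementation).

def score(term_freq, doc_freq, num_docs, field_length):
    tf = math.sqrt(term_freq)                      # more occurrences -> higher
    idf = 1 + math.log(num_docs / (doc_freq + 1))  # rarer term -> higher
    norm = 1 / math.sqrt(field_length)             # shorter field -> higher
    return tf * idf * norm

# A rare term in a short field outscores a common term in a long field:
rare_short = score(1, 2, 1000, 3)
common_long = score(1, 500, 1000, 50)
```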

In order to better understand the impact of the above scoring mechanism, we must understand how Elasticsearch splits its content between shards and replicas.

  • An element of the content inside an index is called a document.
  • Documents are split over multiple nodes (physical units).
  • A shard is a collection of documents that makes it possible to distribute the data across the nodes.
  • A replica is a copy of a primary shard.
  • On a single node there can be multiple shards (both primaries and replicas).


And now, let’s get back to our metrics. The inverse document frequency is calculated over the documents inside a shard – not over all the data inside the index. With longer text fields and a large total number of documents, the statistics should balance out across the shards.
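Here is a toy illustration of the per-shard effect: the same term gets a very different IDF depending on which shard’s statistics are used (again a simplified TF/IDF-style formula, not Elasticsearch’s exact one; the numbers are made up):

```python
import math

# Toy demonstration: per-shard document frequencies produce different
# IDF values for the same term on different shards.

def idf(num_docs, doc_freq):
    return 1 + math.log(num_docs / (doc_freq + 1))

# Shard A happens to hold many documents containing the term,
# shard B only a few -- so the term looks "rarer" on shard B.
idf_shard_a = idf(1000, 200)  # term is common here -> low IDF
idf_shard_b = idf(1000, 5)    # term is rare here -> high IDF
```

A document sitting on shard B gets a noticeably higher score for that term than an equivalent document on shard A.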

How did this hit us?

The discrepancy in the “London” case was caused exactly by the shard/document distribution described above. The location term “London” happened to land in a shard where numerous other documents contained “London” in their names. In the other shards there were only a few documents with “London”, so those documents ended up with higher scores and came up as more relevant in the final result set.

How did we fix it?

It is very important not to have more shards than required for an index. Some advice about shard optimisation is described in this article.

The main ideas here are:

  • max shard size should be between 30 and 32 GB
  • total number of shards should be between 1.5 and 3 times the number of nodes
  • in our case we have 5 shards/index for 64 GB – a bit high, but it should do

We realised that we cannot fully fix the accuracy through the shard configuration alone, so we decided to double the score for exact matches compared with partial ones, using Query boosting.
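As a sketch, the boosting can be expressed as a `bool` query where an exact match on a keyword sub-field counts double compared to the analysed partial match – the field names here are placeholders, not our actual mapping:

```python
# Sketch: boost exact (keyword) matches 2x over partial (analysed) matches.
# "name" / "name.keyword" are placeholder field names.

def boosted_query(text):
    return {
        "query": {
            "bool": {
                "should": [
                    # Exact match on the untokenised sub-field, doubled:
                    {"term": {"name.keyword": {"value": text, "boost": 2.0}}},
                    # Partial match on the analysed field, default weight:
                    {"match": {"name": text}},
                ]
            }
        }
    }

q = boosted_query("London")
```

With this, a document whose name is exactly “London” matches both clauses and comfortably outscores documents that merely contain the term.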


This was only one example from my exciting Elasticsearch journey. The Amazon experience was also quite pleasant. Maybe I will add a post about that some day.