Perspective Unspoken

My perspective on Git, Docker, Python, Django, PHP and other stuff

Building a text completion service with ElasticSearch

Elasticsearch provides a few suggesters for achieving completion: the term, phrase and completion suggesters. In this article, I will cover the Completion Suggester. I’m test driving it to determine whether it meets the following requirements:

  • Type-ahead completion – I want users to be able to get suggestions while they are searching for entries in our database.
  • Fuzziness – I want suggestions to be given despite slight misspellings or mistakes in the user’s query.
  • Filtering – the search is to be restricted by user. The information a user searches for is strictly tied to a user, so I need to be able to do filtering based on a given parameter.
  • Weighting – I want to be able to boost some suggestions over others.

The Completion Suggester covers all of these, so let’s test it out. For test data, I opted to use the names of some Jamaican restaurants and the city / town they belong to. If you’ve never been to Jamaica, you must try these restaurants when you do: Island Grill makes great authentic jerk chicken, and Juici Patties makes really good Jamaican patties. I’m using three cities / communities for this test: New Kingston, Half Way Tree and Liguanea.

The Completion Suggester is added as a field in your document, which means you can add several of them and supply a different configuration for each. The only caveat is that it’s expensive (the lookup structures are costly to build and are kept in memory), so don’t go overboard with it.

If you want to follow along with me, I recommend using this docker-compose file to get both Elasticsearch and Kibana running. Once you have Docker installed, drop this file in a directory and type docker-compose up. Elasticsearch will be running at http://localhost:9200 and Kibana will be running at http://localhost:5601. Both will probably need a minute or so to load; once they have, you can browse to Kibana and click Dev Tools in the menu on the left hand side. From Dev Tools you can enter queries to be sent to Elasticsearch and see the results immediately.

Type-ahead completion

Let’s start with the first objective: getting the service to actually do completions. To do this, let’s create a simple document with a name and a city. To add a suggester, I just need to add a field of type completion. See the JSON below to create this simple mapping.
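The original JSON listing is not reproduced here, so what follows is only a minimal sketch of what such a mapping could look like, built as a Python dictionary and printed in the form you would paste into Kibana Dev Tools. The index name restaurants and the types chosen for name and city are my assumptions; only the completion type on name_suggest comes from the text above.

```python
import json

INDEX = "restaurants"  # assumed index name; the article does not state one

# Typeless mapping as accepted by Elasticsearch 7+. On 6.x, nest
# "properties" under a type name, e.g. {"mappings": {"_doc": {...}}}.
mapping = {
    "mappings": {
        "properties": {
            "name": {"type": "text"},      # assumed field type
            "city": {"type": "keyword"},   # assumed field type
            "name_suggest": {"type": "completion"},
        }
    }
}

# Print the requests in Kibana Dev Tools form.
print(f"DELETE {INDEX}")
print(f"PUT {INDEX}")
print(json.dumps(mapping, indent=2))
```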

The first line can be used to remove the index if need be. Then you will need to run the commands here against your Elasticsearch instance to get the data in. You can paste these commands into Kibana, select them all, then hit the Play button to run them all at once.

Let’s take a look at how data is submitted to ElasticSearch.
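As a rough illustration, an index request for one restaurant could be assembled like this. The helper name index_request, the index name restaurants and the document id are hypothetical; the three fields are the ones described in this article.

```python
import json

def index_request(doc_id, name, city, index="restaurants"):
    """Return the request line and JSON body for indexing one restaurant.

    The restaurant name is sent twice: once as the regular field and once
    as the value of the completion field, so it can be offered as a suggestion.
    """
    body = {"name": name, "city": city, "name_suggest": name}
    # ?refresh makes the document searchable as soon as the call returns.
    line = f"PUT {index}/_doc/{doc_id}?refresh"
    return line, body

line, body = index_request(1, "Nyammings", "Half Way Tree")
print(line)
print(json.dumps(body, indent=2))
```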

In the above query you can see we are submitting the name of the restaurant “Nyammings” in a city called “Half Way Tree”. More importantly, since we have a completion field called name_suggest, we are submitting the same value there as well. In essence, we are telling Elasticsearch that “Nyammings” is a suggestion.

NB: I’m using the refresh query string parameter to tell Elasticsearch to make the data immediately available for search.

Now we can begin doing queries to get suggestions. If I want to get a suggestion for a query starting with “Ny”, I can do that simply by hitting the _search endpoint with a suggest query. There used to be a dedicated _suggest endpoint, but that’s deprecated. Let’s see how this is done below:
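Here is a small sketch of such a suggest body, built in Python and printed in Dev Tools form. The helper name completion_suggest and the index name restaurants are my own; the suggestion name restaurant_names and the field name_suggest are the ones used in this article.

```python
import json

def completion_suggest(prefix, field="name_suggest", name="restaurant_names"):
    """Build a suggest body: a named suggestion holding a prefix and a completion block."""
    return {
        "suggest": {
            name: {
                "prefix": prefix,                 # what the user has typed so far
                "completion": {"field": field},   # which completion field to consult
            }
        }
    }

body = completion_suggest("Ny")
print("POST restaurants/_search")  # "restaurants" is an assumed index name
print(json.dumps(body, indent=2))
```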

In the query above, I create a suggest query. Inside suggest, you give your query a name; I called mine restaurant_names. Then we have to specify the completion details and tell Elasticsearch which field it is to use for providing the suggestions. Finally, our actual search term is our prefix, which I have specified as “Ny”. As expected, we get back “Nyammings” as a suggestion.

In addition to getting back “Nyammings” as a suggestion, you can also see that it returns the _source document as well. This is great: if you are auto-populating a form, you can potentially fill in the rest of the form’s details from the source document.
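To make the form-filling idea concrete, here is a hedged sketch of pulling the suggestion text and its _source out of a response. The sample_response below is hand-written in the shape a completion suggestion comes back in; its _id and _score values are illustrative rather than captured output, and suggested_documents is a hypothetical helper.

```python
# Hand-written sample shaped like a completion suggest response;
# the _id and _score values are illustrative, not real output.
sample_response = {
    "suggest": {
        "restaurant_names": [
            {
                "text": "Ny",
                "offset": 0,
                "length": 2,
                "options": [
                    {
                        "text": "Nyammings",
                        "_id": "1",
                        "_score": 1.0,
                        "_source": {
                            "name": "Nyammings",
                            "city": "Half Way Tree",
                            "name_suggest": "Nyammings",
                        },
                    }
                ],
            }
        ]
    }
}

def suggested_documents(response, name="restaurant_names"):
    """Return (suggestion text, source document) pairs for a named suggestion."""
    pairs = []
    for entry in response.get("suggest", {}).get(name, []):
        for option in entry.get("options", []):
            pairs.append((option["text"], option.get("_source", {})))
    return pairs

for text, source in suggested_documents(sample_response):
    # The source document carries the remaining fields, e.g. the city.
    print(text, "in", source.get("city"))
```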

Fuzziness

Now on to my next goal. I want to ensure that I can still get suggestions even if the user types a letter or two incorrectly. There is a restaurant called “Juici Patties” for instance. Let’s explore what happens if we do a search for “Juicy”. Here’s the query:

And the results should not be too surprising… it returns no results.

The Completion Suggester supports fuzzy queries. We can add fuzzy parameters to the completion block of the query, telling Elasticsearch how much fuzziness (how many character edits) we are willing to tolerate. Let’s adjust the query below:
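A sketch of the adjusted query might look like the following. The fuzziness value of 1 is my choice for illustration, and the small levenshtein helper is only there to show why one edit is enough to get from “Juicy” to the start of “Juici Patties”; it is a plain edit-distance function, not Elasticsearch’s own matching code.

```python
import json

def levenshtein(a, b):
    """Plain edit distance (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # delete from a
                           cur[j - 1] + 1,              # insert into a
                           prev[j - 1] + (ca != cb)))   # substitute
        prev = cur
    return prev[-1]

def fuzzy_suggest(prefix, fuzziness, field="name_suggest", name="restaurant_names"):
    """Build a completion suggest body that tolerates `fuzziness` edits."""
    return {
        "suggest": {
            name: {
                "prefix": prefix,
                "completion": {"field": field, "fuzzy": {"fuzziness": fuzziness}},
            }
        }
    }

# "Juicy" differs from "Juici" by one substituted character.
print(levenshtein("Juicy", "Juici"))
body = fuzzy_suggest("Juicy", fuzziness=1)
print("POST restaurants/_search")  # "restaurants" is an assumed index name
print(json.dumps(body, indent=2))
```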

Once we start using fuzziness, our query returns “Juici Patties”, as we expect.

Filtering

There’s a restaurant called “Island Grill” that exists in several of the cities. There’s one in New Kingston, one in Half Way Tree and one in Liguanea. Let’s suppose I want to restrict my search to New Kingston only. Let’s see how this would be done.

Let’s quickly talk about the approach that won’t work. Going at this problem, my first thought was to combine the suggestion with a regular query. For example, something like this below.
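To illustrate, a request combining the two might look like this sketch (the index name restaurants is assumed). Note that query and suggest are sibling keys at the top level of the body, and nothing in the suggest section refers to the city.

```python
import json

# A match query and a suggester side by side in the same request body.
body = {
    "query": {"match": {"city": "New Kingston"}},
    "suggest": {
        "restaurant_names": {
            "prefix": "Island",
            "completion": {"field": "name_suggest"},
        }
    },
}

print("POST restaurants/_search")  # "restaurants" is an assumed index name
print(json.dumps(body, indent=2))
```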

See how I used a match query first, then placed the suggest query after it. This approach actually runs the two as completely separate, parallel queries. The result returns hits matching all restaurants whose city is “New Kingston”, while the restaurant_names suggestion returns all restaurants with a prefix of “Island”, regardless of city.

See a snippet of the results below:

As you can see, the first result in hits is “Chez Marie”, which doesn’t match the prefix “Island”, and furthermore, the suggestion result lists the Island Grill in Liguanea. We want the Island Grill that’s in New Kingston.

ElasticSearch provides Context Suggesters to do filtering with the Completion Suggesters. I think of it as a way of providing context for your suggestion. Instead of asking Elasticsearch to broadly suggest documents from the index, we want to provide some context to refine the search.

Contexts are defined on the suggestion field in the mapping, so we need to adjust our mapping to include them. There are two types of context: “category” and “geo”. For “category” contexts, you provide the category that the suggestions are to be filtered by, whereas for “geo” you provide a geographic reference that the suggestions must be filtered around. For our use case, “category” works just fine. Let’s adjust the mapping to facilitate this.
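Below is one way the adjusted mapping could look, as a sketch rather than the exact mapping used for this article. I’m assuming a single category context named city with no path setting, which means the city has to be supplied explicitly alongside the suggestion when indexing; the example document shows that form. The index name and the non-completion field types are also assumptions. Since the settings of an existing field can’t simply be swapped out, expect to delete and recreate the index and load the data again.

```python
import json

# Typeless (7+) mapping; on 6.x nest "properties" under a type name.
mapping = {
    "mappings": {
        "properties": {
            "name": {"type": "text"},      # assumed field type
            "city": {"type": "keyword"},   # assumed field type
            "name_suggest": {
                "type": "completion",
                # One category context, called "city".
                "contexts": [{"name": "city", "type": "category"}],
            },
        }
    }
}

# With a context defined, each suggestion is indexed together with its context.
document = {
    "name": "Island Grill",
    "city": "New Kingston",
    "name_suggest": {
        "input": "Island Grill",
        "contexts": {"city": ["New Kingston"]},
    },
}

print("PUT restaurants")  # assumed index name
print(json.dumps(mapping, indent=2))
print("PUT restaurants/_doc/1?refresh")  # illustrative document id
print(json.dumps(document, indent=2))
```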

So as you can see, I’ve added a contexts parameter to the definition of the name_suggest field. We’re essentially adding some configuration to this special completion field.

Once that is done, we can start augmenting our search queries with a city context. We will adjust our regular query to include a contexts parameter inside the completion block of the suggestion. Recall that each completion query starts with a suggest block, inside which we give the suggestion a name. Within the named suggestion, we specify the prefix and, at the same level, a completion block which tells Elasticsearch which field we’re targeting.

Let’s look at the query below.
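As a sketch, the context-filtered query could be built like so. The helper name context_suggest and the index name restaurants are hypothetical; the context name city matches the mapping change described above.

```python
import json

def context_suggest(prefix, cities, field="name_suggest", name="restaurant_names"):
    """Build a completion suggest body restricted to the given city contexts."""
    return {
        "suggest": {
            name: {
                "prefix": prefix,
                "completion": {
                    "field": field,
                    "contexts": {"city": list(cities)},  # only suggest within these cities
                },
            }
        }
    }

body = context_suggest("Island", ["New Kingston"])
print("POST restaurants/_search")  # "restaurants" is an assumed index name
print(json.dumps(body, indent=2))
```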

As you can see, I’ve added contexts to the completion block of the query. There I’m specifying that I want to filter / contextualize this query, refining suggestions to those documents where the city is equal to “New Kingston”. The results are as we expected them to be: we only see the “Island Grill” restaurant in New Kingston.

If you were watching carefully, you would have noticed that contexts.city took a list, meaning we can filter for several values at the same time. To filter for “Liguanea” and “New Kingston” at the same time, we could do the following:
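For instance, a body along these lines (a sketch, with the index name restaurants assumed) lists both cities under the city context:

```python
import json

# Two category values in one context list: suggestions from either city qualify.
body = {
    "suggest": {
        "restaurant_names": {
            "prefix": "Island",
            "completion": {
                "field": "name_suggest",
                "contexts": {"city": ["Liguanea", "New Kingston"]},
            },
        }
    }
}

print("POST restaurants/_search")  # assumed index name
print(json.dumps(body, indent=2))
```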

 

Weighting

Now on to weighting or boosting. Suppose for some reason I wanted to boost the results in a certain city. Perhaps, the restaurants in a given city are more popular than the others and are more likely to be the results that a user wants. For this query I will use “Island Grill” as an example. Island Grill is in all three cities (Half Way Tree, Liguanea and New Kingston). Let’s say however that New Kingston is the most popular of all the cities and I want to boost results from New Kingston. How would I do that?

At search time, you can specify which contexts are boosted. So, when I write my search query, for each context that I list, I can tell Elasticsearch how that context should be boosted, if at all. We need to make a simple change to our query to facilitate this. Instead of providing only a list of strings to contexts.city, we provide a dictionary where we specify the name of the city in the context key and then use the boost parameter to dictate how its results are boosted.

Let’s look at a query for this in detail:
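A sketch of such a query follows, mixing one boosted entry with plain strings for the other cities as described in this section. The helper name boosted_city and the index name restaurants are my own.

```python
import json

def boosted_city(city, boost):
    """A context entry in dictionary form, carrying a boost factor."""
    return {"context": city, "boost": boost}

body = {
    "suggest": {
        "restaurant_names": {
            "prefix": "Island",
            "completion": {
                "field": "name_suggest",
                "contexts": {
                    "city": [
                        boosted_city("New Kingston", 2),  # boosted by a factor of 2
                        "Liguanea",                       # plain strings: no extra boost
                        "Half Way Tree",
                    ]
                },
            },
        }
    }
}

print("POST restaurants/_search")  # assumed index name
print(json.dumps(body, indent=2))
```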

Looking at the first entry, you can see I’ve specified that “New Kingston” should be boosted by a factor of 2. After that, I included the other cities I want in the query as plain strings. When we check the result, we should see “Island Grill” in New Kingston listed first, ahead of Liguanea and Half Way Tree. Let’s take a look.

Sure enough, our query does indeed show that Island Grill in New Kingston appears first in the list.

Another approach is to specify the weight of the suggestion at index time. Using this approach, you can boost specific documents. Let’s say I wanted to boost Island Grill in New Kingston. I could do that at index time by doing the following:
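Here is a hedged sketch of what that index request could look like. The weight of 10, the document id and the index name are values I picked for illustration; the contexts entry assumes the city context is supplied explicitly with the suggestion rather than read from another field.

```python
import json

def weighted_document(name, city, weight):
    """Build a document whose suggestion carries an index-time weight."""
    if not isinstance(weight, int) or weight < 1:
        # Keep the sketch to positive integer weights.
        raise ValueError("weight must be a positive integer")
    return {
        "name": name,
        "city": city,
        "name_suggest": {
            "input": name,
            "weight": weight,               # higher weights rank earlier
            "contexts": {"city": [city]},   # assumes an explicit "city" context
        },
    }

document = weighted_document("Island Grill", "New Kingston", weight=10)
print("PUT restaurants/_doc/1?refresh")  # illustrative index name and id
print(json.dumps(document, indent=2))
```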

Then you could run a similar search to confirm that this works. Island Grill in New Kingston will be the top result, since we boosted it at index time.

In conclusion, the Completion Suggester is a simple but useful feature. One major drawback is that it only does prefix search, which is a little limiting if you want to let users search by any term in the name, for example searching for “Grill” and expecting to get a suggestion for “Island Grill”. All in all, it’s still a good feature to use. It’s simple and fast.

Hope you enjoyed reading! Share if you did!

 


jaywhy13 • September 21, 2018

