Every Elasticsearch release includes many new features and improvements, and you can review all of them in the official release guide. From our perspective, the most interesting additions are the new Natural Language Processing (NLP) features.
Elasticsearch NLP “The fun part!”
NLP describes the methods and techniques that allow software to understand natural language in text or audio. The Elasticsearch machine learning features for NLP are based on BERT and other transformer models that conform to the standard BERT model interface. For practical purposes, in Elasticsearch we use ML models to facilitate NLP. These models allow us to perform text processing that enriches text-based content, making it more robust and useful. With this enrichment, a search experience can better understand strings of text and, for example, interpret a user’s intent to return better results.
Constructing and training models is a topic for a separate article, so for the sake of this one, we’ll assume that you already have a model in place. We’ll walk through how the new Elasticsearch inference capabilities can be used to store and leverage your model in the search solution.
The first key point is that most of these capabilities are applied at index time. During ingestion, the document is processed and enriched with additional metadata that was ‘inferred’ by a pre-trained model from content in that same document. For our example enrichment, we’ll use some common NLP tasks in Elastic:
- Language detection
- Named entity recognition (NER)
- Mask filling (phrase prediction)
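To make the index-time idea concrete, here is a minimal sketch of an ingest pipeline with a single inference processor, printed as requests you could paste into Kibana Dev Tools. The model ID `my-nlp-model`, the pipeline name `nlp-enrich`, the `articles` index and the `title` field are all placeholders of ours, and `text_field` is assumed to be the input field your model expects.

```python
import json


def inference_pipeline(model_id, source_field, model_input_field="text_field"):
    """Build an ingest pipeline body with a single inference processor."""
    return {
        "description": "Enrich documents with NLP inference at index time",
        "processors": [
            {
                "inference": {
                    "model_id": model_id,
                    # Map the document field onto the input field the model expects.
                    "field_map": {source_field: model_input_field},
                }
            }
        ],
    }


def console_request(method, path, body):
    """Format a request as a method/path line followed by a JSON body."""
    return f"{method} {path}\n{json.dumps(body, indent=2)}"


# Hypothetical names: replace with your own model, pipeline, index and field.
pipeline = inference_pipeline("my-nlp-model", "title")
print(console_request("PUT", "_ingest/pipeline/nlp-enrich", pipeline))

# Indexing through the pipeline enriches the document before it is stored.
doc = {"title": "Elasticsearch adds NLP features"}
print(console_request("POST", "articles/_doc?pipeline=nlp-enrich", doc))
```

Every document indexed with `?pipeline=nlp-enrich` then carries the model output alongside its original fields.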
Language detection
In this simple example, the language detection model enriches a document before it is indexed, based on what the model infers from the document, in this case the content of the document title. Understanding the language of a document is a simple but powerful capability. You could, for example, apply language-specific mappings ‘automagically’ based on the output of this model to provide more precise and meaningful results to your users. You know what the best part is? This model is available by default in Elasticsearch!
[Insert Screenshot Here]
Using Language detection in Elasticsearch
To use the model, navigate to Dev Tools in Kibana and run the following query:
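A minimal sketch of such a query, assuming the built-in `lang_ident_model_1` model and the simulate pipeline API. The sample title and the `title` field name are placeholders of ours, and the number of returned classes is an arbitrary choice. The Python below only builds and prints the request, ready to paste into Dev Tools:

```python
import json


def language_detection_simulation(title, num_top_classes=3):
    """Build a _simulate body that runs a title through the language model."""
    return {
        "pipeline": {
            "processors": [
                {
                    "inference": {
                        # Language identification model shipped with Elasticsearch.
                        "model_id": "lang_ident_model_1",
                        "inference_config": {
                            "classification": {"num_top_classes": num_top_classes}
                        },
                        # The model reads its input from a field named "text".
                        "field_map": {"title": "text"},
                    }
                }
            ]
        },
        # Sample document(s) to run through the pipeline without indexing them.
        "docs": [{"_source": {"title": title}}],
    }


body = language_detection_simulation("Willkommen bei der Suche")
print("POST _ingest/pipeline/_simulate")
print(json.dumps(body, indent=2))
```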
This POST runs the string through the model and, as you can see below, adds many fields to our ‘document’.
[Insert Screen Shot]
Extracting named entities (NER)
According to Wikipedia, named-entity recognition or NER “is a subtask of information extraction that seeks to locate and classify named entities mentioned in unstructured text into pre-defined categories such as person names, organizations, locations, medical codes, time expressions, quantities, monetary values, percentages, etc.” In other words, NER picks out words in a passage of text and classifies them as proper names or numerical entities.
Using NER in Elasticsearch
To use named-entity recognition in Elasticsearch, we need to load one of the many supported third-party models. The good news is that the process of loading models is straightforward.
1. Install the Eland client to load models into Elasticsearch
2. Push the model to Elasticsearch
3. When pushing, replace the URL with your own in the format user:password@url
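The steps above could look roughly like this. The sketch only assembles and prints the commands; the credentials, the host, and the Hub model `elastic/distilbert-base-cased-finetuned-conll03-english` are our assumptions, so substitute your own.

```python
import shlex


def cluster_url(user, password, host, scheme="https"):
    """Compose the cluster URL in the user:password@url format."""
    return f"{scheme}://{user}:{password}@{host}"


def eland_import_command(url, hub_model_id, task_type):
    """Build the argument list for Eland's Hub model import script."""
    return [
        "eland_import_hub_model",
        "--url", url,
        "--hub-model-id", hub_model_id,
        "--task-type", task_type,
        "--start",  # deploy the model right after the upload finishes
    ]


# Step 1: install Eland with the PyTorch extras needed for model import.
install = ["python", "-m", "pip", "install", "eland[pytorch]"]

# Steps 2 and 3: push a NER model, using placeholder credentials and host.
push = eland_import_command(
    cluster_url("elastic", "changeme", "localhost:9200"),
    "elastic/distilbert-base-cased-finetuned-conll03-english",
    "ner",
)

print(shlex.join(install))
print(shlex.join(push))
```

Run the printed commands in a terminal, or pass each list to `subprocess.run(..., check=True)` if you would rather drive the import from Python.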
Sit tight while the model is pushed to your cluster. Once it is there, you can try it out.
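A sketch of a test request, assuming the pushed model was `elastic/distilbert-base-cased-finetuned-conll03-english` (our assumption) and that it reads its input from `text_field`. The model ID helper only approximates how Eland names imported models, the sample sentence is ours, and the inference endpoint path has changed between 8.x releases, so check the documentation for your version.

```python
import json


def eland_model_id(hub_model_id):
    """Approximate the ID Eland gives a Hub model: lowercase, '/' becomes '__'."""
    return hub_model_id.replace("/", "__").lower()


def infer_request(model_id, text):
    """Format an inference request for Kibana Dev Tools."""
    # Path as used in recent 8.x releases; older ones may differ.
    path = f"_ml/trained_models/{model_id}/_infer"
    body = {"docs": [{"text_field": text}]}
    return f"POST {path}\n{json.dumps(body, indent=2)}"


model_id = eland_model_id("elastic/distilbert-base-cased-finetuned-conll03-english")
request = infer_request(model_id, "Elastic was founded in Amsterdam by Shay Banon.")
print(request)
```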
[Insert Screen Shot]
As you can see above, the model has again added additional fields to our document, this time describing the classification. Pretty cool!
Mask Filling
Mask filling, or masked language modeling, is an ML task in which some words in a sentence are masked and the model predicts which words should replace those masks. Mask filling can be very helpful when you need a statistical understanding of text-based data, and it can be applied to domain-specific content, such as a large corpus of research papers.
Using Mask Filling in Elasticsearch
Since we already have Eland in place, we just need to rerun it with a different model and task type.
Mask filling is one of my personal favorites because you never know what the algorithm will predict. Let’s take a closer look by executing this line:
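Here is a sketch of both steps: rerunning Eland with the `fill_mask` task type, then asking the model to fill a mask. The Hub model `bert-base-uncased`, the cluster URL, and the input sentence are assumptions of ours; in particular, the sentence is a stand-in and not the one behind the screenshot.

```python
import json
import shlex

HUB_MODEL = "bert-base-uncased"  # assumed model; any supported fill-mask model works
MASK_TOKEN = "[MASK]"            # mask token used by BERT-style tokenizers

# Same Eland script as for NER, with a different model and task type.
push = [
    "eland_import_hub_model",
    "--url", "https://user:password@localhost:9200",
    "--hub-model-id", HUB_MODEL,
    "--task-type", "fill_mask",
    "--start",
]


def fill_mask_request(model_id, sentence):
    """Format a Dev Tools request asking the model to predict the masked word."""
    if sentence.count(MASK_TOKEN) != 1:
        raise ValueError(f"sentence must contain exactly one {MASK_TOKEN} token")
    body = {"docs": [{"text_field": sentence}]}
    return f"POST _ml/trained_models/{model_id}/_infer\n{json.dumps(body, indent=2)}"


print(shlex.join(push))
print(fill_mask_request(HUB_MODEL, "Search is a big part of our [MASK]."))
```

The helper rejects sentences without exactly one `[MASK]` token, which catches the most common mistake before the request reaches the cluster.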
What do you think the answer will be?
[Insert Screen Shot]
As you can see, the mask was replaced by ‘business.’ Not too bad out of the box!
[Insert Screen Shot]
Conclusion
In this article, we covered some of the more interesting NLP capabilities in Elasticsearch, along with a few demonstrations of how you can use them.