Tutorial: Semantic search with ELSER
Perform semantic search using ELSER, an NLP model trained by Elastic.
Elastic Learned Sparse EncodeR - or ELSER - is an NLP model trained by Elastic that enables you to perform semantic search by using sparse vector representation. Instead of literal matching on search terms, semantic search retrieves results based on the intent and the contextual meaning of a search query.
The instructions in this tutorial show you how to use ELSER to perform semantic search on your data.
Note
Only the first 512 extracted tokens per field are considered during semantic search with ELSER. Refer to the ELSER limitations documentation for more information.
Requirements
To perform semantic search by using ELSER, you must have the NLP model deployed in your cluster. Refer to the ELSER documentation to learn how to download and deploy the model.
Note
The minimum dedicated ML node size for deploying and using the ELSER model is 4 GB in Elasticsearch Service if deployment autoscaling is turned off. Turning on autoscaling is recommended because it allows your deployment to dynamically adjust resources based on demand. Better performance can be achieved by using more allocations or more threads per allocation, which requires bigger ML nodes. Autoscaling provides bigger nodes when required. If autoscaling is turned off, you must provide suitably sized nodes yourself.
Create the index mapping
First, the mapping of the destination index - the index that contains the tokens that the model created based on your text - must be created. The destination index must have a field with the sparse_vector or rank_features field type to index the ELSER output.
Note
ELSER output must be ingested into a field with the sparse_vector or rank_features field type. Otherwise, Elasticsearch interprets the token-weight pairs as a massive number of fields in a document. If you get an error similar to "Limit of total fields [1000] has been exceeded while adding new fields", the ELSER output field is not mapped properly: it has a field type other than sparse_vector or rank_features.
curl -X PUT "${ES_URL}/my-index" \
-H "Authorization: ApiKey ${API_KEY}" \
-H "Content-Type: application/json" \
-d'
{
  "mappings": {
    "properties": {
      "content_embedding": {
        "type": "sparse_vector"
      },
      "content": {
        "type": "text"
      }
    }
  }
}
'
To learn how to optimize space, refer to the Saving disk space by excluding the ELSER tokens from document source section.
Create an ingest pipeline with an inference processor
Create an ingest pipeline with an inference processor to use ELSER to infer against the data that is being ingested in the pipeline.
curl -X PUT "${ES_URL}/_ingest/pipeline/elser-v2-test" \
-H "Authorization: ApiKey ${API_KEY}" \
-H "Content-Type: application/json" \
-d'
{
  "processors": [
    {
      "inference": {
        "model_id": ".elser_model_2",
        "input_output": [
          {
            "input_field": "content",
            "output_field": "content_embedding"
          }
        ]
      }
    }
  ]
}
'
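Before reindexing a full data set, you can optionally verify the pipeline behavior on a sample document with the simulate pipeline API. The document below is a hypothetical example; on success, the response contains a content_embedding field populated with token-weight pairs.

```shell
curl -X POST "${ES_URL}/_ingest/pipeline/elser-v2-test/_simulate" \
-H "Authorization: ApiKey ${API_KEY}" \
-H "Content-Type: application/json" \
-d'
{
  "docs": [
    {
      "_source": {
        "content": "Stretching before and after a run can help reduce muscle soreness."
      }
    }
  ]
}
'
```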
Load data
In this step, you load the data that you later use in the inference ingest pipeline to extract tokens from it.
Use the msmarco-passagetest2019-top1000 data set, which is a subset of the MS MARCO Passage Ranking data set. It consists of 200 queries, each accompanied by a list of relevant text passages. All unique passages, along with their IDs, have been extracted from that data set and compiled into a tsv file.
Download the file and upload it to your cluster using the Data Visualizer in the Machine Learning UI. Assign the name id to the first column and content to the second column. The index name is test-data. Once the upload is complete, you can see an index named test-data with 182469 documents.
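As an optional sanity check (assuming the index name test-data from the step above), you can confirm the document count with the count API:

```shell
curl -X GET "${ES_URL}/test-data/_count" \
-H "Authorization: ApiKey ${API_KEY}"
```

The response should report a count of 182469.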
Ingest the data through the inference ingest pipeline
Create the tokens from the text by reindexing the data through the inference pipeline that uses ELSER as the inference model.
curl -X POST "${ES_URL}/_reindex?wait_for_completion=false" \
-H "Authorization: ApiKey ${API_KEY}" \
-H "Content-Type: application/json" \
-d'
{
  "source": {
    "index": "test-data",
    "size": 50
  },
  "dest": {
    "index": "my-index",
    "pipeline": "elser-v2-test"
  }
}
'
The call returns a task ID to monitor the progress:
curl -X GET "${ES_URL}/_tasks/<task_id>" \
-H "Authorization: ApiKey ${API_KEY}"
You can also open the Trained Models UI and select the Pipelines tab under ELSER to follow the progress.
Semantic search by using the text_expansion query
To perform semantic search, use the text_expansion query, and provide the query text and the ELSER model ID. The example below uses the query text "How to avoid muscle soreness after running?"; the content_embedding field contains the generated ELSER output:
curl -X GET "${ES_URL}/my-index/_search" \
-H "Authorization: ApiKey ${API_KEY}" \
-H "Content-Type: application/json" \
-d'
{
  "query": {
    "text_expansion": {
      "content_embedding": {
        "model_id": ".elser_model_2",
        "model_text": "How to avoid muscle soreness after running?"
      }
    }
  }
}
'
The result is the top 10 documents from the my-index index that are closest in meaning to your query text, sorted by relevance. The result also contains the extracted tokens for each relevant search result, with their weights.
"hits": {
"total": {
"value": 10000,
"relation": "gte"
},
"max_score": 26.199875,
"hits": [
{
"_index": "my-index",
"_id": "FPr9HYsBag9jXmT8lEpI",
"_score": 26.199875,
"_source": {
"content_embedding": {
"muscular": 0.2821541,
"bleeding": 0.37929374,
"foods": 1.1718726,
"delayed": 1.2112266,
"cure": 0.6848574,
"during": 0.5886185,
"fighting": 0.35022718,
"rid": 0.2752442,
"soon": 0.2967024,
"leg": 0.37649947,
"preparation": 0.32974035,
"advance": 0.09652356,
(...)
},
"id": 1713868,
"model_id": ".elser_model_2",
"content": "For example, if you go for a run, you will mostly use the muscles in your lower body. Give yourself 2 days to rest those muscles so they have a chance to heal before you exercise them again. Not giving your muscles enough time to rest can cause muscle damage, rather than muscle development."
}
},
(...)
]
}
To learn about optimizing your text_expansion query, refer to Optimizing the search performance of the text_expansion query.
Combining semantic search with other queries
You can combine text_expansion with other queries in a compound query. For example, use a filter clause in a Boolean query, or a full text query that may or may not use the same query text as the text_expansion query. This enables you to combine the search results from both queries.
The search hits from the text_expansion query tend to score higher than those from other Elasticsearch queries. You can regularize those scores by increasing or decreasing the relevance scores of each query with the boost parameter. Recall on the text_expansion query can be high where there is a long tail of less relevant results. Use the min_score parameter to prune those less relevant documents.
curl -X GET "${ES_URL}/my-index/_search" \
-H "Authorization: ApiKey ${API_KEY}" \
-H "Content-Type: application/json" \
-d'
{
  "query": {
    "bool": {
      "should": [
        {
          "text_expansion": {
            "content_embedding": {
              "model_text": "How to avoid muscle soreness after running?",
              "model_id": ".elser_model_2",
              "boost": 1
            }
          }
        },
        {
          "query_string": {
            "query": "toxins",
            "boost": 4
          }
        }
      ]
    }
  },
  "min_score": 10
}
'
Optimizing performance
Saving disk space by excluding the ELSER tokens from document source
The tokens generated by ELSER must be indexed for use in the text_expansion query. However, it is not necessary to retain those terms in the document source. You can save disk space by using the source exclude mapping to remove the ELSER terms from the document source.
Warning
Reindex uses the document source to populate the destination index. Once the ELSER terms have been excluded from the source, they cannot be recovered through reindexing. Excluding the tokens from the source is a space-saving optimization that should only be applied if you are certain that reindexing will not be required in the future! It's important to carefully consider this trade-off and make sure that excluding the ELSER terms from the source aligns with your specific requirements and use case.
The mapping that excludes content_embedding from the _source field can be created by the following API call:
curl -X PUT "${ES_URL}/my-index" \
-H "Authorization: ApiKey ${API_KEY}" \
-H "Content-Type: application/json" \
-d'
{
  "mappings": {
    "_source": {
      "excludes": [
        "content_embedding"
      ]
    },
    "properties": {
      "content_embedding": {
        "type": "sparse_vector"
      },
      "content": {
        "type": "text"
      }
    }
  }
}
'
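To confirm the exclusion behaves as expected (an optional check that assumes at least one document has been ingested into the index through the pipeline), retrieve a document and inspect its source. The content_embedding field should be absent from _source, while text_expansion queries against the index still match:

```shell
curl -X GET "${ES_URL}/my-index/_search?size=1" \
-H "Authorization: ApiKey ${API_KEY}"
```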
Further reading
- How to download and deploy ELSER
- ELSER limitations
- Improving information retrieval in the Elastic Stack: Introducing Elastic Learned Sparse Encoder, our new retrieval model
Interactive example
- The elasticsearch-labs repo has an interactive example of running ELSER-powered semantic search using the Elasticsearch Python client.