9.7 KiB
title | updated | created |
---|---|---|
Query_DSL | 2021-05-04 14:58:11Z | 2021-05-04 14:58:11Z |
Elasticsearch Query DSL
Queries can be classified into three types
- Filtering by exact values
- Searching on analyzed text
- A combination of the two
Every document field can be classified:
- either as an exact values
- analyzed text (also called full text)
Exact values
are fields like user_id, date, email_addresses Querying documents can be done by specifying filters over exact values. Whether the document gets returned is a binary yes or no
Analyzed search
Analyzed text is text data like product_description or email_body
- Querying documents by searching analyzed text returns results based on relevance (score)
- Highly complex operation and involves different analyzer packages depending on the type of text data
-
- The default analyzer package is the standard analyzer which splits text by word boundaries, lowercases and removes punctuation
- less performant than just filtering by exact values
Expensive queries
-
Lineair scans
- script queries
-
high up-front
- fussie queries
- reqexp queries
- prefix queries without index_prefixes
- wildcard queries
- range queries on text and keyword fields
-
joinig queries
-
Queries on deprecated geo shapes
-
high per-document cost
- script score queries
- percolate queries
The execution of such queries can be prevented by setting the value of the search.allow_expensive_queries
setting to false
(defaults to true
).
Queries behave different: query context or filter context
Queries | filters |
---|---|
Fuzzy, scoring | Boolean |
Slower | Faster |
not Cachable | Cachable |
Scoring queries
By default, Elasticsearch sorts matching search results by relevance score, which measures how well each document matches a query. But depends if the query is executed in query or filter context
=> Query context
“How well does this document match this query clause?” The relevance is stored in the _score meta_field Query context is in effect whenever query clause is passed to the query parameter.
=> Filter context
“Does this document match this query clause?” Answer is a true of false. No score is calculated == scoring of all documents is 0.
Mostly used for filtering structured data, eq
- Does this timestamp fall in range....
- is the status field set to "text value"
Frequently used filters will be cached
Filter context in effect when filter clause is used
- such as filter or must_not parameters in bool query
- filter parameter ins constant_score query
- filter aggregation
Example
GET /_search
{
"query": { <= query context
"bool": { <= query context, together with matches: how well they match documents
"must": [
{ "match": { "title": "Search" }},
{ "match": { "content": "Elasticsearch" }}
],
"filter": [ <= filter context
{ "term": { "status": "published" }},
{ "range": { "publish_date": { "gte": "2015-01-01" }}}
]
}
}
}
Difference term vs match
- match : query aplies the same analyzer to the search at the time the data was stored
- term : does not apply any analyzer, so will look for exactly what is stored in the inverted index
The Query DSL
Elasticsearch queries are comprised of one or many Leaf query clauses. Query clauses can be combined to create other query clauses, called compound query clauses. All query clauses have either one of these two formats:
{
QUERY_CLAUSE: { // match, match_all, multi_match, term, terms, exists, missing, range, bool
ARGUMENT: VALUE,
ARGUMENT: VALUE,...
}
}
{
QUERY_CLAUSE: {
FIELD_NAME: {
ARGUMENT: VALUE,
ARGUMENT: VALUE,...
}
}
}
Query clauses can be repeatedly nested inside other query clauses
{
QUERY_CLAUSE {
QUERY_CLAUSE: {
QUERY_CLAUSE: {
QUERY_CLAUSE: {
ARGUMENT: VALUE,
ARGUMENT: VALUE,...
}
}
}
}
}
Two type of Query DSL (Leaf and Compound)
Leaf query clause
Look for a partiqulair value in a particulair field, such as match, term, range queries/ These queries can be used by themselves. Use such as match, term or range.
Compound query clause
wrap other leaf(s) or compound queries and are used to combine multiple queries in a logical fashion (bool or dis_max)
Or alter their behaviour (such as constant_score)
-
bool => must, must-not, should, filter, minimum_should_match
multiple leaf or compound query clauses
must, should => scores combined (), contributes to score
must_not, filter => in context filter
must ==> like logical AND.
should ==> like logical OR.
You can use the
minimum_should_match
parameter to specify the number or percentage ofshould
clauses returned documents must match.If the
bool
query includes at least oneshould
clause and nomust
orfilter
clauses, the default value is1
. Otherwise, the default value is0
POST _search { "query": { "bool" : { "must" : { "term" : { "user" : "kimchy" } }, "filter": { "term" : { "tag" : "tech" } }, "must_not" : { "range" : { "age" : { "gte" : 10, "lte" : 20 } } }, "should" : [ { "term" : { "tag" : "wow" } }, { "term" : { "tag" : "elasticsearch" } } ], "minimum_should_match" : 1, "boost" : 1.0 } } }
-
boosting query
-
constant_score query
-
dis_max query
-
function_score query
Match Query Clause
Match query clause is the most generic and commonly used query clause:
- run on a analyzed text field, it performs an analyzed search on the text
- run on an exact value field, it performs a filter
- calculates the score
example:
{ "match": { "description": "Fourier analysis signals processing" }}
{ "match": { "date": "2014-09-01" }}
{ "match": { "visible": true }}
The Match All Query Clause
Returns all documemts
{ "match_all": {} }
Term/Terms Query Clause
The term and terms query clauses are used to filter by a exact value fields by single or multiple values, respectively. In the case of multiple values, the logical connection is OR.
{
"query": {
"term": { "tag": "math" }
}
}
{
"query": {
"term": { "tag": ["math", "second"] }
}
}
Multi Match Query Clause
Is run across multiple fields instead of just one
{ "query": {
"multi_match": {
"query": "probability theory", // value
"fields": ["title^3", "*body"], // fields, with wildcard *
// no fields == *
// title 3* more important
"type": "best_fields",
}
}
}
Exists and Missing Filters Query Clause
- The exists filter checks that documents have a value at a specified field
{
"query": {
"exists": {
"field": "*installCount" // also with wildcards
}
}
}
- The missing filter checks that documents do not have have a value at a specified field
{
"missing" : {
"field" : "title"
}
}
Range Filter Query Clause
Number and date fields in ranges, using the operators gt gte lt lte
{ "range" : { "age" : { "gt" : 30 } } }
{
"range": {
"born" : {
"gte": "01/01/2012",
"lte": "2013",
"format": "dd/MM/yyyy||yyyy"
}
}
}
Query in filter context
No scores are calculated: yes or no
The query parameter indicates query context. The bool and two match clauses are used in query context, which means that they are used to score how well each document matches. The filter parameter indicates filter context. Its term and range clauses are used in filter context. They will filter out documents which do not match, but they will not affect the score for matching documents. Must clause is not required (score == 0.0)
GET /.kibana/_search
{
"query": {
"bool": {
"must": [
{"match": {"type" : "ui-metric"}},
{"match": {"ui-metric.count" : "1"}}
],
"filter": [
{"range": {"updated_at": {"gte": "2020-04-01"}}}
]
}
}
}
Bool Query Clause
Are built from other query clauses are called compound query clauses. Note that compound query clauses can also be comprised of other compound query clauses, allowing for multi-layer nesting .
The three supported boolean operators are must (and) must_not (not) and should (or)
{
"bool": {
"must": { "term": { "tag": "math" }},
"must_not": { "term": { "tag": "probability" }},
"should": [
{ "term": { "favorite": true }},
{ "term": { "unread": true }}
]
}
}
Combining Analyzed Search With Filters
Example: query to find all posts by performing an analyzed search for “Probability Theory” but we only want posts with 20 or more upvotes and not those with that tag “frequentist”.
{
"filtered": {
"query": { "match": { "body": "Probability Theory" }},
"filter": {
"bool": {
"must": {
"range": { "upvotes" : { "gt" : 20 } }
},
"must_not": { "term": { "tag": "frequentist" } }
}
}
}
}