title updated created
Query_DSL 2021-05-04 14:58:11Z 2021-05-04 14:58:11Z

Elasticsearch Query DSL

Queries can be classified into three types:

  1. Filtering by exact values
  2. Searching on analyzed text
  3. A combination of the two

Every document field can be classified as either:

  • an exact value
  • analyzed text (also called full text)

Exact values

Exact values are fields like user_id, date or email_addresses. Documents are queried by specifying filters over these values, and whether a document is returned is a binary yes or no.

Analyzed text

Analyzed text is text data like product_description or email_body.

  • Searching analyzed text returns results ranked by relevance (score)
  • Analysis is a complex operation and involves different analyzers depending on the type of text data
    • The default is the standard analyzer, which splits text on word boundaries, lowercases it and removes punctuation
  • Searching analyzed text is less performant than filtering by exact values
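As a rough illustration of what the standard analyzer does to a piece of text, the sketch below lowercases a string and splits it on non-word characters. It is only an approximation written for these notes: the real analyzer follows Unicode text segmentation rules, and the function name approx_standard_analyze is made up.

```python
import re

def approx_standard_analyze(text):
    """Rough stand-in for the standard analyzer: lowercase, drop punctuation, split into words."""
    # \w+ keeps runs of letters, digits and underscores; everything else acts as a separator.
    return re.findall(r"\w+", text.lower())

print(approx_standard_analyze("The QUICK, brown fox!"))
# ['the', 'quick', 'brown', 'fox']
```

The resulting tokens are what ends up in the inverted index, which is why searches on analyzed text are case-insensitive by default.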

Expensive queries

  1. Linear scans

    • script queries
  2. High up-front cost

    • fuzzy queries
    • regexp queries
    • prefix queries without index_prefixes
    • wildcard queries
    • range queries on text and keyword fields
  3. Joining queries

  4. Queries on deprecated geo shapes

  5. High per-document cost

    • script_score queries
    • percolate queries

The execution of such queries can be prevented by setting the value of the search.allow_expensive_queries setting to false (defaults to true).
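A minimal sketch of how the setting could be switched off, assuming it is applied as a dynamic setting through the cluster settings API. The helper name is hypothetical and the function only describes the request; it does not send anything.

```python
import json

def disable_expensive_queries_request():
    """Describe the cluster-settings call that turns expensive queries off (assumed endpoint)."""
    return {
        "method": "PUT",
        "path": "/_cluster/settings",
        # Assumption: stored as a persistent cluster setting.
        "body": {"persistent": {"search.allow_expensive_queries": False}},
    }

request = disable_expensive_queries_request()
print(request["method"], request["path"])
print(json.dumps(request["body"]))
```

With the setting at false, the query types listed above are expected to be rejected instead of executed.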

Query clauses behave differently depending on whether they run in query context or filter context

Queries          Filters
Fuzzy, scoring   Boolean (yes or no)
Slower           Faster
Not cacheable    Cacheable

Scoring queries

By default, Elasticsearch sorts matching search results by relevance score, which measures how well each document matches a query. Whether a score is calculated depends on whether the query clause is executed in query context or filter context.

=> Query context

“How well does this document match this query clause?” The relevance is stored in the _score metadata field. Query context is in effect whenever a query clause is passed to the query parameter.

=> Filter context

“Does this document match this query clause?” The answer is a simple true or false. No score is calculated: a bool query that contains only filter clauses returns every hit with a score of 0.

Filter context is mostly used for filtering structured data, e.g.

  • Does this timestamp fall within a given range?
  • Is the status field set to "published"?

Frequently used filters are cached automatically by Elasticsearch.

Filter context is in effect whenever a query clause is passed to a filter parameter, such as

  • the filter or must_not parameters in a bool query
  • the filter parameter in a constant_score query
  • the filter aggregation

Example

GET /_search
{
  "query": {    // query context
    "bool": {   // query context: bool and the match clauses score how well documents match
      "must": [
        { "match": { "title":   "Search"        }},
        { "match": { "content": "Elasticsearch" }}
      ],
      "filter": [   // filter context
        { "term":  { "status": "published" }},
        { "range": { "publish_date": { "gte": "2015-01-01" }}}
      ]
    }
  }
}

Difference term vs match

  • match: applies the field's analyzer to the search text (by default the same analysis that was applied when the data was indexed) and then searches for the resulting terms
  • term: does not apply any analyzer, so it looks for exactly the given value in the inverted index
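The difference can be sketched with a toy inverted index in plain Python. This is a simplified model, not how Elasticsearch is implemented: the analyzer is a crude lowercase-and-split stand-in, and all function names are made up for illustration.

```python
import re

def analyze(text):
    # Toy analyzer: lowercase and split on non-word characters.
    return re.findall(r"\w+", text.lower())

def build_index(docs):
    """Map each analyzed token to the set of document ids that contain it."""
    index = {}
    for doc_id, text in docs.items():
        for token in analyze(text):
            index.setdefault(token, set()).add(doc_id)
    return index

def term_lookup(index, value):
    # term: the value is looked up exactly as given, without analysis.
    return index.get(value, set())

def match_lookup(index, text):
    # match: the search text is analyzed first; any matching token counts (OR).
    hits = set()
    for token in analyze(text):
        hits |= index.get(token, set())
    return hits

index = build_index({1: "Quick brown fox", 2: "Lazy dog"})
print(term_lookup(index, "Quick"))   # set(): only the lowercased token "quick" is indexed
print(match_lookup(index, "Quick"))  # {1}
```

This is why a term query for a capitalised word on an analyzed text field often returns nothing, while the same value in a match query finds the document.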

The Query DSL

Elasticsearch queries are composed of one or more leaf query clauses. Query clauses can be combined to create other query clauses, called compound query clauses. Every query clause has one of these two formats:

{
  QUERY_CLAUSE: {         // match, match_all, multi_match, term, terms, exists, range, bool
    ARGUMENT: VALUE,
    ARGUMENT: VALUE,...
  }
}

{
  QUERY_CLAUSE: {
    FIELD_NAME: {
      ARGUMENT: VALUE,
      ARGUMENT: VALUE,...
    }
  }
}

Query clauses can be repeatedly nested inside other query clauses

{
  QUERY_CLAUSE: {
    QUERY_CLAUSE: {
      QUERY_CLAUSE: {
        QUERY_CLAUSE: {
          ARGUMENT: VALUE,
          ARGUMENT: VALUE,...
        }
      }
    }
  }
}
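The two formats and the nesting can be sketched as small Python helpers that build query bodies as dicts. The helper names clause and field_clause are hypothetical and not part of any client library.

```python
import json

def clause(name, **arguments):
    """First format: {QUERY_CLAUSE: {ARGUMENT: VALUE, ...}}"""
    return {name: dict(arguments)}

def field_clause(name, field, **arguments):
    """Second format: {QUERY_CLAUSE: {FIELD_NAME: {ARGUMENT: VALUE, ...}}}"""
    return {name: {field: dict(arguments)}}

# Nesting: a bool clause wrapping a constant_score clause wrapping a range clause.
nested = clause(
    "bool",
    must=[clause("constant_score", filter=field_clause("range", "age", gte=10, lte=20))],
)
print(json.dumps(nested))
```

Because every helper returns a plain dict, clauses can be nested to any depth and serialised with json.dumps when the request is sent.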

Two types of query clauses (leaf and compound)

Leaf query clause

Looks for a particular value in a particular field. Leaf queries, such as match, term or range, can be used by themselves.

Compound query clause

Wraps other leaf or compound queries, either to combine multiple queries in a logical fashion (such as bool or dis_max) or to alter their behaviour (such as constant_score).

  • bool => must, must_not, should, filter, minimum_should_match

    Combines multiple leaf or compound query clauses.

    must, should => run in query context; the scores of matching clauses are combined and contribute to the final score

    must_not, filter => run in filter context; no scoring

    must ==> like logical AND.

    should ==> like logical OR.

    You can use the minimum_should_match parameter to specify the number or percentage of should clauses that returned documents must match.

    If the bool query includes at least one should clause and no must or filter clauses, the default value is 1. Otherwise, the default value is 0.

    POST _search
    {
      "query": {
        "bool" : {
          "must" : {
            "term" : { "user" : "kimchy" }
          },
          "filter": {
            "term" : { "tag" : "tech" }
          },
          "must_not" : {
            "range" : {
              "age" : { "gte" : 10, "lte" : 20 }
            }
          },
          "should" : [
            { "term" : { "tag" : "wow" } },
            { "term" : { "tag" : "elasticsearch" } }
          ],
          "minimum_should_match" : 1,
          "boost" : 1.0
        }
      }
    }
    
  • boosting query

  • constant_score query

  • dis_max query

  • function_score query
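The default for minimum_should_match described under the bool query can be written down as a small rule. The sketch below only restates that rule in Python for a bool body given as a dict; the function name is made up.

```python
def default_minimum_should_match(bool_body):
    """Return 1 if there are should clauses but no must or filter clauses, else 0."""
    has_should = bool(bool_body.get("should"))
    has_must_or_filter = bool(bool_body.get("must")) or bool(bool_body.get("filter"))
    return 1 if has_should and not has_must_or_filter else 0

only_should = {"should": [{"term": {"tag": "wow"}}]}
with_must = {
    "must": {"term": {"user": "kimchy"}},
    "should": [{"term": {"tag": "wow"}}],
}
print(default_minimum_should_match(only_should))  # 1
print(default_minimum_should_match(with_must))    # 0
```

In other words, should clauses are mandatory only when nothing else in the bool query restricts the result set; otherwise they just raise the score of documents that match them.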

Match Query Clause

The match query clause is the most generic and commonly used query clause:

  • when run on an analyzed text field, it performs an analyzed search on the text
  • when run on an exact value field, it performs a filter
  • in query context it calculates the score

example:

{ "match": { "description": "Fourier analysis signals processing" }}
{ "match": { "date": "2014-09-01" }}
{ "match": { "visible": true }}

The Match All Query Clause

Returns all documents

{ "match_all": {} }

Term/Terms Query Clause

The term and terms query clauses filter exact value fields by a single value or by multiple values, respectively. In the case of multiple values, the logical connection is OR.

{
  "query": {
    "term": { "tag": "math" }
  }
}

{
  "query": {
    "terms": { "tag": ["math", "second"] }
  }
}

Multi Match Query Clause

Runs a match query across multiple fields instead of just one.

{
  "query": {
    "multi_match": {
      "query": "probability theory",     // the text to search for
      "fields": ["title^3", "*body"],    // fields to search; * is a wildcard
                                         // no fields == * (the default)
                                         // title^3: matches on title count 3 times as much
      "type": "best_fields"
    }
  }
}
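The field^boost notation is plain string syntax, so a query body can be assembled from a mapping of field names to boosts. A minimal sketch, with a hypothetical helper name and a boost of None meaning "no boost":

```python
def multi_match(query, field_boosts, match_type="best_fields"):
    """Build a multi_match clause; a boost of None leaves the field unboosted."""
    fields = [
        name if boost is None else f"{name}^{boost}"
        for name, boost in field_boosts.items()
    ]
    return {"multi_match": {"query": query, "fields": fields, "type": match_type}}

body = {"query": multi_match("probability theory", {"title": 3, "*body": None})}
print(body["query"]["multi_match"]["fields"])
# ['title^3', '*body']
```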

Other types

Exists and Missing Filters Query Clause

  • The exists filter checks that documents have a value at a specified field
{
  "query": {
    "exists": {
      "field": "*installCount"   // also with wildcards
    }
  }
}
  • The missing filter checked that documents do not have a value at a specified field. It was removed in Elasticsearch 5.0; negate exists inside a bool query instead
{
  "query": {
    "bool": {
      "must_not": { "exists": { "field": "title" } }
    }
  }
}

Range Filter Query Clause

Matches number and date fields that fall within a range, using the operators gt, gte, lt and lte.

{ "range" : { "age" : { "gt" : 30 } } }

{ 
  "range": {
    "born" : {
       "gte": "01/01/2012",
       "lte": "2013",
       "format": "dd/MM/yyyy||yyyy"
    }
  }
}
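When range clauses are generated from code, it can help to reject operators other than the four listed above before the request is sent. The sketch below is one way to do that; range_clause and its date_format parameter are hypothetical names, and the validation happens purely on the client side.

```python
RANGE_OPERATORS = {"gt", "gte", "lt", "lte"}

def range_clause(field, date_format=None, **bounds):
    """Build a range clause, allowing only the gt, gte, lt and lte operators."""
    unknown = set(bounds) - RANGE_OPERATORS
    if unknown or not bounds:
        raise ValueError(f"expected one or more of {sorted(RANGE_OPERATORS)}, got {sorted(bounds)}")
    body = dict(bounds)
    if date_format is not None:
        # Several formats can be given, separated by ||, as in the example above.
        body["format"] = date_format
    return {"range": {field: body}}

print(range_clause("age", gt=30))
print(range_clause("born", date_format="dd/MM/yyyy||yyyy", gte="01/01/2012", lte="2013"))
```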

Query in filter context

In filter context no scores are calculated; a document either matches or it does not.

The query parameter indicates query context. The bool clause and the two match clauses in the example below are used in query context, which means that they score how well each document matches. The filter parameter indicates filter context. Its range clause is used in filter context: it filters out documents that do not match, but it does not affect the score of matching documents. The must clause is not required; without it, every hit gets a score of 0.0.

GET /.kibana/_search
{
  "query": {
    "bool": {
      "must": [
        {"match": {"type" : "ui-metric"}},
        {"match": {"ui-metric.count" : "1"}}
      ],
      "filter": [
        {"range": {"updated_at": {"gte": "2020-04-01"}}}
      ]
    }
  }
}

Bool Query Clause

Query clauses that are built from other query clauses are called compound query clauses. Compound query clauses can themselves contain other compound query clauses, allowing for multi-layer nesting.

The three supported boolean operators are must (AND), must_not (NOT) and should (OR).

{
    "bool": {
        "must":     { "term": { "tag":    "math" }},
        "must_not": { "term": { "tag":    "probability"  }},
        "should": [
                    { "term": { "favorite": true   }},
                    { "term": { "unread":  true   }}
        ]
    }
}
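How the three operators decide whether a document matches can be modelled with a few lines of Python over in-memory documents. This is a toy model under simplifying assumptions: it handles only term clauses, ignores filter and scoring, and every function name is made up.

```python
def term_matches(doc, clause):
    """True if the document holds exactly the value named in a term clause."""
    (field, value), = clause["term"].items()
    stored = doc.get(field)
    return value in stored if isinstance(stored, list) else stored == value

def as_list(clauses):
    # The DSL accepts either a single clause or a list of clauses.
    if clauses is None:
        return []
    return clauses if isinstance(clauses, list) else [clauses]

def bool_matches(doc, bool_body):
    must = as_list(bool_body.get("must"))
    must_not = as_list(bool_body.get("must_not"))
    should = as_list(bool_body.get("should"))
    if not all(term_matches(doc, c) for c in must):
        return False  # must: every clause has to match (AND)
    if any(term_matches(doc, c) for c in must_not):
        return False  # must_not: no clause may match (NOT)
    if should and not must:
        return any(term_matches(doc, c) for c in should)  # should alone: at least one (OR)
    return True  # with must present, should only influences the score

query = {
    "must": {"term": {"tag": "math"}},
    "must_not": {"term": {"tag": "probability"}},
    "should": [{"term": {"favorite": True}}, {"term": {"unread": True}}],
}
print(bool_matches({"tag": ["math"], "favorite": False}, query))  # True
print(bool_matches({"tag": ["math", "probability"]}, query))      # False
```

The first document matches even though neither should clause applies, because the must clause is present; in Elasticsearch the should clauses would then only raise the score of documents that satisfy them.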

Combining Analyzed Search With Filters

Example: a query that finds all posts by performing an analyzed search for “Probability Theory”, restricted to posts with 20 or more upvotes and without the tag “frequentist”. Older examples wrap this in a filtered query, which was removed in Elasticsearch 5.0; a bool query with filter and must_not clauses does the same job.

{
  "query": {
    "bool": {
      "must": { "match": { "body": "Probability Theory" }},
      "filter": { "range": { "upvotes": { "gte": 20 }}},
      "must_not": { "term": { "tag": "frequentist" }}
    }
  }
}

Source: Understanding the Elasticsearch Query DSL