Summaries/Databases/ElasticSearch/Query_DSL.md

---
title: Query_DSL
updated: 2021-05-04 14:58:11Z
created: 2021-05-04 14:58:11Z
---

# Elasticsearch Query DSL

### Queries can be classified into three types
1. Filtering by exact values
2. Searching on analyzed text
3. A combination of the two

Every __document field__ can be classified:

- either as an exact values
- analyzed text (also called full text)

## Exact values
are fields like user_id, date, email_addresses
Querying documents can be done by specifying __filters over exact values__. Whether the document gets returned is a __binary__ yes or no
---

## Analyzed search
__Analyzed text__ is text data like  product_description or email_body

- Querying documents by searching analyzed text returns results based on __relevance__  (score)
- Highly complex operation and involves different __analyzer packages__ depending on the type of text data
- -  The default analyzer package is the _standard analyzer_ which splits text by word boundaries, lowercases and removes punctuation
- less performant than just filtering by exact values

## Expensive queries
1. Lineair scans
   - script queries
2. high up-front
   - fussie queries
    - reqexp queries
    - prefix  queries without index_prefixes
    - wildcard  queries
    - range  queries on text and keyword fields
3. joinig queries

4. Queries on deprecated geo shapes

5. high per-document cost

   - script score queries
   - percolate queries

The execution of such queries can be prevented by setting the value of the `search.allow_expensive_queries` setting to `false` (defaults to `true`).


Queries behave different: **query context** or **filter context**

| Queries         | filters  |
| --------------- | -------- |
| Fuzzy, scoring  | Boolean  |
| Slower          | Faster   |
| not Cachable    | Cachable |


## Scoring queries
By default, Elasticsearch sorts matching search results by **relevance score**, which measures how well each document matches a query. But depends if the query is executed in **query** or **filter** context


## => Query context
“*How well does this document match this query clause?*” The relevance is stored in the **_score** meta_field
Query context is in effect whenever query clause is passed to the query parameter.


## => Filter context
“*Does this document match this query clause?*” Answer is a true of false. No score is calculated == scoring of all documents is 0.

Mostly used for filtering structured data, eq

- Does this timestamp fall in range....
- is the status field set to "text value"

Frequently used filters will be cached

Filter context in effect when filter clause is used

- such as filter or must_not parameters in bool query
- filter parameter ins constant_score query
- filter aggregation

Example
```json
GET /_search
{
  "query": {    <= query context
    "bool": { 	<= query context, together with matches: how well they match documents
      "must": [
        { "match": { "title":   "Search"        }},
        { "match": { "content": "Elasticsearch" }}
      ],
      "filter": [ 	<= filter context
        { "term":  { "status": "published" }},
        { "range": { "publish_date": { "gte": "2015-01-01" }}}
      ]
    }
  }
}
```

---

### Difference term vs match
- match : query aplies the same analyzer to the search at the time the data was stored
- term  : does not apply any analyzer, so will look for exactly what is stored in the inverted index

## The Query DSL

Elasticsearch queries are comprised of one or many __Leaf query clauses__. Query clauses can be combined to create other query clauses, called __compound query clauses__. All query clauses have either one of these two formats:

```json
{
  QUERY_CLAUSE: {         // match, match_all, multi_match, term, terms, exists, missing, range, bool
    ARGUMENT: VALUE,
    ARGUMENT: VALUE,...
  }
}

{
  QUERY_CLAUSE: {
    FIELD_NAME: {
      ARGUMENT: VALUE,
      ARGUMENT: VALUE,...
    }
  }
}
```
Query clauses can be __repeatedly nested__ inside other query clauses

```json
{
  QUERY_CLAUSE {
    QUERY_CLAUSE: {
      QUERY_CLAUSE: {
        QUERY_CLAUSE: {
          ARGUMENT: VALUE,
          ARGUMENT: VALUE,...
        }
      }
    }
  }
}
```

## Two type of Query DSL (Leaf and Compound)

### Leaf query clause
Look for a partiqulair value in a particulair field, such as match, term, range queries/
These queries can be used by themselves. Use such as **match**, **term** or **range**.

### Compound query clause
wrap other leaf(s) or compound queries and are used to combine multiple queries in a logical fashion (**bool** or **dis_max**)

Or alter their behaviour (such as **constant_score**)

- bool => must, must-not, should, filter, minimum_should_match

  multiple leaf or compound query clauses

  **must**, **should** => scores combined (), contributes to score

  **must_not**, **filter** => in context filter


  **must** ==> like logical **AND**.

  **should** ==> like logical **OR**.

  You can use the `minimum_should_match` parameter to specify the number or percentage of `should` clauses returned documents *must* match.

  If the `bool` query includes at least one `should` clause and no `must` or `filter` clauses, the default value is `1`. Otherwise, the default value is `0`

  ```json
  POST _search
  {
    "query": {
      "bool" : {
        "must" : {
          "term" : { "user" : "kimchy" }
        },
        "filter": {
          "term" : { "tag" : "tech" }
        },
        "must_not" : {
          "range" : {
            "age" : { "gte" : 10, "lte" : 20 }
          }
        },
        "should" : [
          { "term" : { "tag" : "wow" } },
          { "term" : { "tag" : "elasticsearch" } }
        ],
        "minimum_should_match" : 1,
        "boost" : 1.0
      }
    }
  }
  ```


- boosting query

- constant_score query

- dis_max query

- function_score query


## Match Query Clause

Match query clause is the most generic and commonly used query clause:
- run on a analyzed text field, it performs an analyzed search on the text
- run on an exact value field, it performs a filter
- calculates the score

example:
```json
{ "match": { "description": "Fourier analysis signals processing" }}
{ "match": { "date": "2014-09-01" }}
{ "match": { "visible": true }}
```

## The Match All Query Clause

Returns all documemts
```json
{ "match_all": {} }
```

## Term/Terms Query Clause
The term and terms query clauses are used to **filter** by a exact value fields by single or multiple values, respectively. In the case of multiple values, the logical connection is OR.

```json
{
  "query": {
    "term": { "tag": "math" }
    }
}

{
  "query": {
    "term": { "tag": ["math", "second"] }
    }
}


```

##  Multi Match Query Clause
Is run across multiple fields instead of just one

```json
{ "query": {
  "multi_match": {
    "query": "probability theory",    // value
    "fields": ["title^3", "*body"],    // fields, with wildcard *
                                      // no fields == *
                                      // title 3* more important
    "type":       "best_fields",
    }
  }
}
```
[Other types](https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-multi-match-query.html#multi-match-types)
## Exists and Missing Filters Query Clause
- The exists filter checks that documents have a value at a specified field
```json
{
  "query": {
   "exists": {
     "field": "*installCount"   // also with wildcards
   }
}
}
```
- The missing filter checks that documents do not have have a value at a specified field

```json
{
  "missing" : {
    "field" : "title"
  }
}
```

## Range Filter Query Clause
Number and date fields in ranges, using the operators gt gte lt lte
```json
{ "range" : { "age" : { "gt" : 30 } } }

{
  "range": {
    "born" : {
       "gte": "01/01/2012",
       "lte": "2013",
       "format": "dd/MM/yyyy||yyyy"
    }
  }
}
```

## Query in filter context
### No scores are calculated: yes or no
The __query__ parameter indicates query context.
The __bool__ and two __match__ clauses are used in query context, which means that they are used to score how well each document matches.
The __filter__ parameter indicates  	__*filter context*__. Its term and range clauses are used in filter context. They will filter out documents which do not match, but they will	__*not affect the score*__ for matching documents.
__Must__ clause is not required (score == 0.0)

```json
GET /.kibana/_search
{
  "query": {
    "bool": {
      "must": [
        {"match": {"type" : "ui-metric"}},
        {"match": {"ui-metric.count" : "1"}}
      ],
      "filter": [
        {"range": {"updated_at": {"gte": "2020-04-01"}}}
      ]
    }
  }
}
```

## Bool Query Clause
Are built from other query clauses are called compound query clauses. <sup> Note that compound query clauses can also be comprised of other compound query clauses, allowing for multi-layer nesting <sup>.

The three supported boolean operators are __must__ (and) __must_not__ (not) and __should__ (or)
```json
{
    "bool": {
        "must":     { "term": { "tag":    "math" }},
        "must_not": { "term": { "tag":    "probability"  }},
        "should": [
                    { "term": { "favorite": true   }},
                    { "term": { "unread":  true   }}
        ]
    }
}
```

## Combining Analyzed Search With Filters
Example: query to find all posts by performing an analyzed search for “Probability Theory” but we only want posts with 20 or more upvotes and not those with that tag “frequentist”.
```json
{
   "filtered": {
     "query": { "match": { "body": "Probability Theory" }},
     "filter": {
        "bool": {
          "must": {
            "range":  { "upvotes" : { "gt" : 20 } }
           },
          "must_not": { "term":  { "tag": "frequentist" } }
        }
     }
  }
}
```

[Source: Understanding the Elasticsearch Query DSL](https://medium.com/@User3141592/understanding-the-elasticsearch-query-dsl-ce1d67f1aa5b)