Application: ELK Stack

  • Elasticsearch - distributed NoSQL database
  • Logstash - ingests streams of activity data
  • Kibana - Visualisation / Dashboard

Fundamental concepts

Source: architecture

The act of storing data in Elasticsearch is called indexing.

An index is a collection of documents and each document is a collection of fields, which are the key-value pairs that contain your data. Every index has some properties like mappings, settings, and aliases.

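In older versions of Elasticsearch, a document belongs to a type, and those types live inside an index (mapping types were deprecated in 6.x and removed in later releases, so a current index holds documents directly). We can draw a parallel to a traditional relational database: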

Relational DB ⇒ Databases ⇒ Tables ⇒ Rows ⇒ Columns

Elasticsearch ⇒ Indices ⇒ Types ⇒ Documents ⇒ Fields

In Elasticsearch, the term document has a specific meaning. It refers to the top-level, or root object that is serialized into JSON and stored in Elasticsearch under a unique ID.
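To make the index → document → field hierarchy concrete, here is a minimal in-memory sketch. It is not how Elasticsearch stores data internally; the `indices` dictionary and the `index_document` / `get_document` helpers are hypothetical names used only for illustration.

```python
import json

# Hypothetical stand-in for a cluster: index name -> {document ID -> JSON string}
indices = {}


def index_document(index_name, doc_id, document):
    """Serialize the root object to JSON and store it under a unique ID."""
    serialized = json.dumps(document)
    indices.setdefault(index_name, {})[doc_id] = serialized
    return serialized


def get_document(index_name, doc_id):
    """Load the stored JSON back into a Python object."""
    return json.loads(indices[index_name][doc_id])


# A document is a root object whose keys are its fields.
index_document("blog", "1", {"title": "Hello", "tags": ["intro", "news"]})
print(get_document("blog", "1")["title"])
```

Indexing a second document under the same ID would overwrite the first, which mirrors the idea that the ID identifies exactly one document within an index.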

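Elasticsearch lets you insert documents without a predefined schema: field types are inferred from the first document that contains them (dynamic mapping). In an RDBMS, by contrast, you need to define tables in advance.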
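The idea of inferring a schema from the data itself can be sketched as follows. This is a simplified illustration, not Elasticsearch's actual implementation: the function names are hypothetical, and real dynamic mapping does more (for example, it can detect dates in strings and adds a `keyword` sub-field to text fields).

```python
def infer_field_type(value):
    """Guess an Elasticsearch-style field type from a JSON-compatible value."""
    # bool must be checked before int, because bool is a subclass of int in Python.
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "float"
    if isinstance(value, str):
        return "text"
    if isinstance(value, dict):
        return "object"
    if isinstance(value, list) and value:
        # An array takes the type of its first element.
        return infer_field_type(value[0])
    raise TypeError(f"cannot infer a type for {value!r}")


def infer_mapping(document):
    """Build a field -> type mapping, skipping nulls and empty arrays."""
    return {
        field: infer_field_type(value)
        for field, value in document.items()
        if value is not None and value != []
    }


print(infer_mapping({"name": "Ada", "age": 36, "active": True, "tags": ["a", "b"]}))
```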

Inverted index

Relational databases add an index, such as a B-tree index, to specific columns in order to improve the speed of data retrieval. Elasticsearch uses a structure called an inverted index for exactly the same purpose.

By default, every field in a document is indexed (has an inverted index) and is thus searchable, which is what makes full-text search possible. A field without an inverted index is not searchable.

An inverted index consists of a list of all the unique words that appear in any document, and for each word, a list of the documents in which it appears.
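The structure described above can be sketched in a few lines. This is a toy version under simple assumptions: tokens are lowercased runs of word characters, and a query matches only documents containing every query term. Real analyzers also apply steps such as stemming, which this sketch omits; all function names are illustrative.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Split text into lowercase word tokens."""
    return re.findall(r"\w+", text.lower())


def build_inverted_index(docs):
    """Map each unique term to the set of document IDs that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return dict(index)


def search(index, query):
    """Return the IDs of documents that contain every term in the query."""
    terms = tokenize(query)
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


docs = {1: "The quick brown fox", 2: "Quick brown dogs", 3: "The lazy dog"}
index = build_inverted_index(docs)
print(sorted(search(index, "quick brown")))  # prints [1, 2]
```

Note that "dog" and "dogs" end up as separate terms here, which is one reason a real search engine normalizes words before indexing them.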

Summary

  • Elasticsearch gives us Google-like features
    • Scalable ingest / data size / search performance
    • Accessible through a "REST API"
  • Can be used as a full-text "search engine"
  • Can be used as a scalable NoSQL database
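As an illustration of the REST API, the sketch below builds (but does not send) two HTTP requests using only the Python standard library: one that indexes a document and one that runs a full-text `match` query. The base URL assumes a local node on the default port 9200, and the index name, document ID, and field names are made up for the example.

```python
import json
import urllib.request

# Assumption: a local Elasticsearch node listening on the default port.
BASE_URL = "http://localhost:9200"


def build_index_request(index, doc_id, document):
    """Build a PUT request that stores a JSON document under the given ID."""
    return urllib.request.Request(
        url=f"{BASE_URL}/{index}/_doc/{doc_id}",
        data=json.dumps(document).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


def build_search_request(index, field, text):
    """Build a search request that runs a match query on one field."""
    body = {"query": {"match": {field: text}}}
    return urllib.request.Request(
        url=f"{BASE_URL}/{index}/_search",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_index_request("blog", "1", {"title": "Hello Elasticsearch"})
print(req.get_method(), req.full_url)
```

Passing either request to `urllib.request.urlopen` would send it, which requires a running cluster (and credentials, if security is enabled).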