Elastic Stack – Introduction and Core Concepts

Elastic Stack

  • A suite of products: Elasticsearch, Kibana, Beats and Logstash.
  • Together they reliably and securely take in data from any source, in any format, and then let you search, analyze and visualize it in real time.
  • Refer Here for a short history of the Elastic Stack
  • Refer Here for a basic overview of the Elastic Stack in the monitoring/logging use case

Elastic Stack Components

[Image: Elastic Stack components]

note: This image is from elastic.co blogs

Elasticsearch

  • Core of the Elastic Stack.
  • Is used to store and perform analytics on the data.
  • Built on Apache Lucene.

Benefits

  • Schema-less and document-oriented:
    • No strict structure is enforced; any JSON document can be stored.
    • A document is roughly equivalent to a record in a relational database table.
  • Searching:
    • Elasticsearch excels at full-text search.
    • Full-text search means searching through all the terms of all the documents in the store (much like how Google Search works, rather than a database SELECT query).
  • Analytics:
    • Elasticsearch supports a wide variety of aggregations for analytics.
  • API and client libraries:
    • Elasticsearch has client libraries for most popular languages.
    • Elasticsearch has a rich REST API that works over HTTP.
  • Horizontal scaling:
    • The number of Elasticsearch nodes can be scaled out easily.
    • Adding a node is as easy as starting a new node in the same network, with very little extra configuration.
  • Fault-tolerant:
    • Clusters can continue running even when there is a failure.
    • In node-failure scenarios, replica shards held on other nodes take over, and the data is replicated again to the remaining nodes in the cluster.
    • In network-failure scenarios, a new node is elected as master to keep the cluster running.
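
To make the full-text search idea above more concrete, here is a minimal Python sketch of an inverted index, the kind of term-to-document lookup structure that Lucene-based search relies on. It is only an illustration: the tokenizer, the function names and the sample sentences are assumptions made for this example, and real Elasticsearch analysis and scoring are far more involved.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_inverted_index(docs):
    """Map each term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def search(index, query):
    """Return the ids of documents that contain every term in the query."""
    terms = tokenize(query)
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


# Hypothetical sample documents, used only for this illustration.
docs = {
    1: "Elasticsearch is built on Apache Lucene",
    2: "Kibana visualizes data stored in Elasticsearch",
    3: "Logstash and Beats ship data",
}
index = build_inverted_index(docs)
print(sorted(search(index, "elasticsearch data")))
```

The lookup never scans the documents themselves; it only intersects the sets stored per term, which is why this kind of search stays fast as the number of documents grows.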

Installation of Elasticsearch

  • In this series I will be using a CentOS 7 machine launched in the AWS cloud with 2 vCPUs and 8 GB of RAM. The Elasticsearch version will be 7.4, installed from the RPM package.
  • Create an EC2 machine with the CentOS 7 AMI and ensure that at least ports 9200 and 22 are open in the security group.
  • Once the machine is created, SSH into the CentOS 7 machine.
  • Installation steps are referred from here
  • Execute the following commands
sudo rpm --import https://artifacts.elastic.co/GPG-KEY-elasticsearch
wget https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-7.4.1-x86_64.rpm
wget https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-7.4.1-x86_64.rpm.sha512
shasum -a 512 -c elasticsearch-7.4.1-x86_64.rpm.sha512
sudo rpm --install elasticsearch-7.4.1-x86_64.rpm
  • One important point: the JVM heap should get no more than 50% of the system’s memory, so that the rest stays available to the operating system’s file cache. On this 8 GB machine that means 4 GB: edit the file /etc/elasticsearch/jvm.options and change -Xms1g to -Xms4g and -Xmx1g to -Xmx4g. Please find the related documentation
-Xms4g
-Xmx4g
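
The heap-sizing rule above can be sketched as a small Python helper. This is a rough sketch under stated assumptions, not an official formula: the function names are made up for this example, and the 31 GB cap is an assumed conservative value for staying below the roughly 32 GB compressed-pointer threshold of the JVM.

```python
def recommended_heap_gb(total_ram_gb, cap_gb=31):
    """Return a heap size in whole GB: half of the RAM, capped at cap_gb.

    cap_gb=31 is an assumption chosen to stay under the ~32 GB threshold.
    """
    if total_ram_gb < 2:
        raise ValueError("expected at least 2 GB of RAM")
    return min(total_ram_gb // 2, cap_gb)


def jvm_heap_options(total_ram_gb):
    """Build the two jvm.options lines, with minimum and maximum heap equal."""
    heap = recommended_heap_gb(total_ram_gb)
    return [f"-Xms{heap}g", f"-Xmx{heap}g"]


# The 8 GB machine used in this post.
print("\n".join(jvm_heap_options(8)))
```

For the 8 GB machine this yields the same -Xms4g and -Xmx4g values used in this post. Keeping both values equal avoids the heap being resized at runtime.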
  • Now change the node name and the network configuration in /etc/elasticsearch/elasticsearch.yml by setting the following values, removing the # at the beginning of each line to uncomment it
    • cluster.name: direct-devops
    • node.name: node1
    • cluster.initial_master_nodes: ["<private ip of elastic search>"]
    • network.host: _site_
    • http.port: 9200
    • discovery.seed_hosts: ["127.0.0.1", "[::1]", "<private ip of ec2>"]
  • Find the whole elasticsearch.yml file below
# ======================== Elasticsearch Configuration =========================
#
# NOTE: Elasticsearch comes with reasonable defaults for most settings.
#       Before you set out to tweak and tune the configuration, make sure you
#       understand what are you trying to accomplish and the consequences.
#
# The primary way of configuring a node is via this file. This template lists
# the most important settings you may want to configure for a production cluster.
#
# Please consult the documentation for further information on configuration options:
# https://www.elastic.co/guide/en/elasticsearch/reference/index.html
#
# ---------------------------------- Cluster -----------------------------------
#
# Use a descriptive name for your cluster:
#
cluster.name: direct-devops
#
# ------------------------------------ Node ------------------------------------
#
# Use a descriptive name for the node:
#
node.name: node1
#
# Add custom attributes to the node:
#
#node.attr.rack: r1
#
# ----------------------------------- Paths ------------------------------------
#
# Path to directory where to store the data (separate multiple locations by comma):
#
path.data: /var/lib/elasticsearch
#
# Path to log files:
#
path.logs: /var/log/elasticsearch
#
# ----------------------------------- Memory -----------------------------------
#
# Lock the memory on startup:
#
#bootstrap.memory_lock: true
#
# Make sure that the heap size is set to about half the memory available
# on the system and that the owner of the process is allowed to use this
# limit.
#
# Elasticsearch performs poorly when the system is swapping the memory.
#
# ---------------------------------- Network -----------------------------------
#
# Set the bind address to a specific IP (IPv4 or IPv6):
#
network.host: _site_
#
# Set a custom port for HTTP:
#
http.port: 9200
#
# For more information, consult the network module documentation.
#
# --------------------------------- Discovery ----------------------------------
#
# Pass an initial list of hosts to perform discovery when this node is started:
# The default list of hosts is ["127.0.0.1", "[::1]"]
#
discovery.seed_hosts: ["127.0.0.1", "[::1]", "172.31.23.231"]

#
# Bootstrap the cluster using an initial set of master-eligible nodes:
#
cluster.initial_master_nodes: ["172.31.23.231"]
#
# For more information, consult the discovery and cluster formation module documentation.
#
# ---------------------------------- Gateway -----------------------------------
#
# Block initial recovery after a full cluster restart until N nodes are started:
#
#gateway.recover_after_nodes: 3
#
# For more information, consult the gateway module documentation.
#
# ---------------------------------- Various -----------------------------------
#
# Require explicit names when deleting indices:
#
#action.destructive_requires_name: true
  • For more info on configuration refer here
  • Now execute the following commands to start the Elasticsearch service
sudo systemctl daemon-reload
sudo systemctl restart elasticsearch
  • If Elasticsearch has started successfully, execute the following commands in the SSH terminal and observe the results
curl -XGET '<private ip>:9200/_cluster/health?pretty'
curl -XGET '<private ip>:9200/_cluster/stats?human&pretty'
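
As a rough aid for reading the output of the cluster health call, the Python sketch below turns a health response into a one-line summary. The field names (cluster_name, status, number_of_nodes, unassigned_shards) are fields the cluster health API returns, but the sample payload is an illustrative assumption rather than output captured from the machine in this post, and the function name is made up for the example.

```python
import json

# Meaning of the three cluster health colours.
STATUS_MEANING = {
    "green": "all primary and replica shards are allocated",
    "yellow": "all primaries are allocated, but some replicas are not",
    "red": "at least one primary shard is not allocated",
}


def summarize_health(raw_json):
    """Turn a _cluster/health JSON response into a short readable line."""
    health = json.loads(raw_json)
    status = health["status"]
    return (
        f'{health["cluster_name"]}: {status} ({STATUS_MEANING[status]}); '
        f'nodes={health["number_of_nodes"]}, '
        f'unassigned_shards={health["unassigned_shards"]}'
    )


# Illustrative payload only; a real response contains more fields.
sample = json.dumps({
    "cluster_name": "direct-devops",
    "status": "yellow",
    "number_of_nodes": 1,
    "unassigned_shards": 1,
})
print(summarize_health(sample))
```

On a single-node setup like this one, a yellow status is expected as soon as an index with replicas exists, because a replica is never placed on the same node as its primary.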
  • To continue understanding Elasticsearch, let’s install the Kibana console UI.

Kibana

  • As we already know, Kibana is a visualization tool, so let’s go ahead and install it

Kibana Installation

  • I will be using a CentOS 7 EC2 machine of type t2.micro (1 vCPU, 1 GB RAM)
  • Kibana installation is referred from here
  • Log in to the created EC2 machine and execute the following commands in the terminal
sudo rpm --import https://artifacts.elastic.co/GPG-KEY-elasticsearch
wget https://artifacts.elastic.co/downloads/kibana/kibana-7.4.1-x86_64.rpm
shasum -a 512 kibana-7.4.1-x86_64.rpm
sudo rpm --install kibana-7.4.1-x86_64.rpm
  • Now edit the Kibana config file /etc/kibana/kibana.yml: change elasticsearch.hosts to ["http://<private ip of elastic search>:9200"] and server.host to "0.0.0.0".
  • For more info on configuration refer here
  • Find the kibana.yml used by me below
# Kibana is served by a back end server. This setting specifies the port to use.
#server.port: 5601

# Specifies the address to which the Kibana server will bind. IP addresses and host names are both valid values.
# The default is 'localhost', which usually means remote machines will not be able to connect.
# To allow connections from remote users, set this parameter to a non-loopback address.
server.host: "0.0.0.0"

# Enables you to specify a path to mount Kibana at if you are running behind a proxy.
# Use the `server.rewriteBasePath` setting to tell Kibana if it should remove the basePath
# from requests it receives, and to prevent a deprecation warning at startup.
# This setting cannot end in a slash.
#server.basePath: ""

# Specifies whether Kibana should rewrite requests that are prefixed with
# `server.basePath` or require that they are rewritten by your reverse proxy.
# This setting was effectively always `false` before Kibana 6.3 and will
# default to `true` starting in Kibana 7.0.
#server.rewriteBasePath: false

# The maximum payload size in bytes for incoming server requests.
#server.maxPayloadBytes: 1048576

# The Kibana server's name.  This is used for display purposes.
#server.name: "your-hostname"

# The URLs of the Elasticsearch instances to use for all your queries.
elasticsearch.hosts: ["http://172.31.23.231:9200"]

# When this setting's value is true Kibana uses the hostname specified in the server.host
# setting. When the value of this setting is false, Kibana uses the hostname of the host
# that connects to this Kibana instance.
#elasticsearch.preserveHost: true

# Kibana uses an index in Elasticsearch to store saved searches, visualizations and
# dashboards. Kibana creates a new index if the index doesn't already exist.
#kibana.index: ".kibana"

# The default application to load.
#kibana.defaultAppId: "home"

# If your Elasticsearch is protected with basic authentication, these settings provide
# the username and password that the Kibana server uses to perform maintenance on the Kibana
# index at startup. Your Kibana users still need to authenticate with Elasticsearch, which
# is proxied through the Kibana server.
#elasticsearch.username: "kibana"
#elasticsearch.password: "pass"

# Enables SSL and paths to the PEM-format SSL certificate and SSL key files, respectively.
# These settings enable SSL for outgoing requests from the Kibana server to the browser.
#server.ssl.enabled: false
#server.ssl.certificate: /path/to/your/server.crt
#server.ssl.key: /path/to/your/server.key

# Optional settings that provide the paths to the PEM-format SSL certificate and key files.
# These files validate that your Elasticsearch backend uses the same key files.
#elasticsearch.ssl.certificate: /path/to/your/client.crt
#elasticsearch.ssl.key: /path/to/your/client.key

# Optional setting that enables you to specify a path to the PEM file for the certificate
# authority for your Elasticsearch instance.
#elasticsearch.ssl.certificateAuthorities: [ "/path/to/your/CA.pem" ]

# To disregard the validity of SSL certificates, change this setting's value to 'none'.
#elasticsearch.ssl.verificationMode: full

# Time in milliseconds to wait for Elasticsearch to respond to pings. Defaults to the value of
# the elasticsearch.requestTimeout setting.
#elasticsearch.pingTimeout: 1500

# Time in milliseconds to wait for responses from the back end or Elasticsearch. This value
# must be a positive integer.
#elasticsearch.requestTimeout: 30000

# List of Kibana client-side headers to send to Elasticsearch. To send *no* client-side
# headers, set this value to [] (an empty list).
#elasticsearch.requestHeadersWhitelist: [ authorization ]

# Header names and values that are sent to Elasticsearch. Any custom headers cannot be overwritten
# by client-side headers, regardless of the elasticsearch.requestHeadersWhitelist configuration.
#elasticsearch.customHeaders: {}

# Time in milliseconds for Elasticsearch to wait for responses from shards. Set to 0 to disable.
#elasticsearch.shardTimeout: 30000

# Time in milliseconds to wait for Elasticsearch at Kibana startup before retrying.
#elasticsearch.startupTimeout: 5000

# Logs queries sent to Elasticsearch. Requires logging.verbose set to true.
#elasticsearch.logQueries: false

# Specifies the path where Kibana creates the process ID file.
#pid.file: /var/run/kibana.pid

# Enables you specify a file where Kibana stores log output.
#logging.dest: stdout

# Set the value of this setting to true to suppress all logging output.
#logging.silent: false

# Set the value of this setting to true to suppress all logging output other than error messages.
#logging.quiet: false

# Set the value of this setting to true to log all events, including system usage information
# and all requests.
#logging.verbose: false

# Set the interval in milliseconds to sample system and process performance
# metrics. Minimum is 100ms. Defaults to 5000.
#ops.interval: 5000

# Specifies locale to be used for all localizable strings, dates and number formats.
# Supported languages are the following: English - en , by default , Chinese - zh-CN .
#i18n.locale: "en"

  • Start Kibana by executing
sudo systemctl start kibana
  • Now navigate to http://<public ip kibana>:5601 (port 5601 must be open in the security group) and you should see the home page

  • We cannot do any visualizations yet, because we don’t have any data so far

  • We can still use the Dev Tools section to test the Elasticsearch APIs. Click on the Dev Tools section and you will be taken to the Console

  • We will be using this Console for understanding Elasticsearch
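
Requests typed in the Console are written as an HTTP method followed by a path, for example GET _cluster/health?pretty. The Python sketch below shows how such a line corresponds to the curl commands used earlier in this post. The function name is made up for this example, and it only handles single-line requests without a JSON body.

```python
def console_to_curl(console_request, host):
    """Translate a one-line Console request into the equivalent curl command."""
    method, _, path = console_request.strip().partition(" ")
    path = path.strip()
    if not method or not path:
        raise ValueError("expected '<METHOD> <path>'")
    # The Console accepts paths with or without a leading slash.
    return f"curl -X{method.upper()} '{host}/{path.lstrip('/')}'"


print(console_to_curl("GET _cluster/health?pretty", "<private ip>:9200"))
```

The printed command matches the cluster health check that was run from the SSH terminal, which is a quick way to see that the Console is just a convenient front end to the same REST API.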

Core Concepts of Elasticsearch

  • Relational databases have rows, columns, tables and schemas, whereas Elasticsearch is a document-oriented store that holds JSON documents. These JSON documents are organized into indexes and types. Note that mapping types are deprecated in Elasticsearch 7.x, where every document has the single type _doc.

Document

  • Data in Elasticsearch is stored as JSON
  • A document is very similar to a record in a relational database
  • A sample document
{
    "id": 1,
    "name": "khaja",
    "blog": "https://directdevops.blog",
    "organization": "QualityThought",
    "courses": ["AWS", "Azure", "DevOps", "Linux", "Windows", "Python"]
}
  • In addition to the fields sent by the user, Elasticsearch stores internal metadata fields. These fields are as follows
    • _id: unique identifier of the document within its index, just like a primary key in a database. It can be auto-generated or specified by the user
    • _type: type of the document
    • _index: name of the index the document belongs to
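
The relationship between the user's fields and the metadata fields can be sketched with a tiny in-memory stand-in for an index. This is not how Elasticsearch stores data: the class, the index name bloggers and the uuid-based ids are assumptions for illustration only. The _source field, which holds the original JSON, and the _doc type are what Elasticsearch 7.x returns when a document is fetched.

```python
import uuid


class TinyIndex:
    """In-memory stand-in for one index; not how Elasticsearch stores data."""

    def __init__(self, name):
        self.name = name
        self._docs = {}

    def index(self, source, doc_id=None):
        """Store a document and return its _id (generated when not given)."""
        if doc_id is None:
            # Stand-in for the ids Elasticsearch generates automatically.
            doc_id = uuid.uuid4().hex
        self._docs[str(doc_id)] = source
        return str(doc_id)

    def get(self, doc_id):
        """Return the stored document wrapped in its metadata fields."""
        return {
            "_index": self.name,
            "_type": "_doc",
            "_id": str(doc_id),
            "_source": self._docs[str(doc_id)],
        }


# "bloggers" is a hypothetical index name.
blog = TinyIndex("bloggers")
blog.index({"name": "khaja", "blog": "https://directdevops.blog"}, doc_id=1)
auto_id = blog.index({"name": "another author"})
print(blog.get(1))
```

The first document gets the id chosen by the caller, the second one gets a generated id, and in both cases the user's own fields come back untouched under _source.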

Indexes

  • An index is a container that stores and manages documents of a single type in Elasticsearch.

Types

  • Documents with a mostly common set of fields are grouped under one type.

Relation Between Index, Type and Documents

[Image: relation between index, type and documents]

Nodes and Cluster

  • Since Elasticsearch is a distributed system, it has nodes and clusters
  • A node is a single server running Elasticsearch
  • A cluster is formed by one or more nodes. Every Elasticsearch node is always part of a cluster, even if it is the only node in it.

Shards and replicas

  • An index consists of documents of some type. Shards help distribute an index over the cluster.
  • Shards divide the documents of a single index across multiple nodes.
  • The process of dividing data among shards is called sharding. Sharding is Elasticsearch's way of achieving scaling and parallelism.
  • Before version 7.0 every index had five primary shards by default. From 7.0 onwards (including the 7.4 used here) the default is one primary shard with one replica.
  • You can specify the number of shards while creating an index
  • Since systems in a cluster might fail, copies of each shard, called replica shards or replicas, are created and stored on other nodes in the cluster.
  • Thanks to replicas, Elasticsearch keeps running despite such a failure.
  • Replicas also serve queries
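
How documents end up on a particular shard can be sketched in a few lines of Python. The idea is a simplified form of the documented routing rule, shard = hash(routing value) % number of primary shards, where the routing value defaults to the document's _id. Elasticsearch uses its own murmur3-based hash; zlib.crc32 here is only a stand-in, and the document ids and the shard count of five are made up for the illustration.

```python
import zlib
from collections import Counter


def shard_for(routing_value, num_primary_shards):
    """Pick a shard as hash(routing value) % number of primary shards."""
    # crc32 is a stand-in hash; Elasticsearch uses a murmur3-based one.
    digest = zlib.crc32(str(routing_value).encode("utf-8"))
    return digest % num_primary_shards


# Hypothetical document ids spread over five primary shards.
doc_ids = [f"doc-{n}" for n in range(1000)]
per_shard = Counter(shard_for(doc_id, 5) for doc_id in doc_ids)
for shard in sorted(per_shard):
    print(f"shard {shard}: {per_shard[shard]} documents")
```

Because the shard is derived from the number of primary shards, changing that number would send existing ids to different shards. That is why the primary shard count is fixed when the index is created, while the number of replicas can be changed later.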
