Elastic Stack
- A suite of products: Elasticsearch, Kibana, Beats, and Logstash.
- Together they reliably and securely take data from any source, in any format, then let you search, analyze, and visualize it in real time.
- Refer here for a short history of the Elastic Stack.
- Refer here for a basic overview of the Elastic Stack in a monitoring/logging use case.
Elastic Stack Components
(Image: Elastic Stack components, from the elastic.co blog)
Elasticsearch
- The core of the Elastic Stack.
- Used to store and perform analytics on data.
- Built on Apache Lucene.
Benefits
- Schema-less and document-oriented:
- No strict structure is imposed; any JSON document can be stored.
- A document is roughly equivalent to a record in a relational database table.
- Searching:
- Elasticsearch excels at full-text search (see the sketch after the end of this list).
- Full-text search means searching through all the terms of all the documents in the database (much like how Google Search works, rather than a database SELECT query).
- Analytics:
- Elasticsearch supports a wide variety of aggregations for analytics.
- API and Client Libraries:
- Elasticsearch has client libraries for almost all popular languages.
- Elasticsearch has a rich REST API that works over HTTP.
- Horizontal Scaling:
- The number of Elasticsearch nodes can be scaled out easily.
- Adding a new node is as easy as starting one on the same network, with virtually no extra configuration.
- Fault-tolerant:
- The cluster can continue running even when there is a failure.
- If a node fails, its data is already replicated to other nodes in the cluster, so nothing is lost.
- If the master node becomes unreachable (for example, due to a network failure), a new master is elected to keep the cluster running.
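To make the searching and analytics points above concrete, here is a minimal sketch of a full-text query combined with an aggregation, sent over the REST API. The index name blogposts and the field names body and author.keyword are assumptions for illustration:
# Full-text "match" query plus a "terms" aggregation in one request
curl -XGET 'localhost:9200/blogposts/_search?pretty' -H 'Content-Type: application/json' -d'
{
  "query": { "match": { "body": "elastic stack" } },
  "aggs": { "posts_per_author": { "terms": { "field": "author.keyword" } } }
}'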
Installation of Elasticsearch
- In this series I will be using a CentOS 7 machine launched in the AWS cloud with 2 vCPUs and 8 GB of RAM. The Elasticsearch version will be 7.4, installed from the RPM package.
- Create an EC2 machine with a CentOS 7 AMI and ensure at least ports 9200 and 22 are open in the Security Group.
- Once the machine is created, SSH into the CentOS 7 machine.
- The installation steps are referred from here.
- Execute the following commands:
sudo rpm --import https://artifacts.elastic.co/GPG-KEY-elasticsearch
wget https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-7.4.1-x86_64.rpm
wget https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-7.4.1-x86_64.rpm.sha512
shasum -a 512 -c elasticsearch-7.4.1-x86_64.rpm.sha512
sudo rpm --install elasticsearch-7.4.1-x86_64.rpm
- One important point: about 50% of the system's memory should be allocated to the JVM heap. Since this machine has 8 GB of RAM, edit the file /etc/elasticsearch/jvm.options and change -Xms1g to -Xms4g and -Xmx1g to -Xmx4g. See the related documentation here.
-Xms4g
-Xmx4g
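Once the service is started (a later step below), you can confirm the new heap ceiling took effect; a minimal check, assuming the node answers on localhost:9200:
# The nodes stats API reports the JVM heap limit per node;
# with -Xmx4g the heap_max should read roughly 4 GB
curl -XGET 'localhost:9200/_nodes/stats/jvm?human&pretty' | grep heap_max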
- Now change the node name and the network configuration in /etc/elasticsearch/elasticsearch.yml by setting the following (remove the leading # to uncomment each line):
cluster.name: direct-devops
node.name: node1
cluster.initial_master_nodes: ["<private ip of elasticsearch>"]
network.host: _site_
http.port: 9200
discovery.seed_hosts: ["127.0.0.1", "[::1]", "<private ip of ec2>"]
- Find the whole elasticsearch.yml file below
# ======================== Elasticsearch Configuration =========================
#
# NOTE: Elasticsearch comes with reasonable defaults for most settings.
# Before you set out to tweak and tune the configuration, make sure you
# understand what are you trying to accomplish and the consequences.
#
# The primary way of configuring a node is via this file. This template lists
# the most important settings you may want to configure for a production cluster.
#
# Please consult the documentation for further information on configuration options:
# https://www.elastic.co/guide/en/elasticsearch/reference/index.html
#
# ---------------------------------- Cluster -----------------------------------
#
# Use a descriptive name for your cluster:
#
cluster.name: direct-devops
#
# ------------------------------------ Node ------------------------------------
#
# Use a descriptive name for the node:
#
node.name: node1
#
# Add custom attributes to the node:
#
#node.attr.rack: r1
#
# ----------------------------------- Paths ------------------------------------
#
# Path to directory where to store the data (separate multiple locations by comma):
#
path.data: /var/lib/elasticsearch
#
# Path to log files:
#
path.logs: /var/log/elasticsearch
#
# ----------------------------------- Memory -----------------------------------
#
# Lock the memory on startup:
#
#bootstrap.memory_lock: true
#
# Make sure that the heap size is set to about half the memory available
# on the system and that the owner of the process is allowed to use this
# limit.
#
# Elasticsearch performs poorly when the system is swapping the memory.
#
# ---------------------------------- Network -----------------------------------
#
# Set the bind address to a specific IP (IPv4 or IPv6):
#
network.host: _site_
#
# Set a custom port for HTTP:
#
http.port: 9200
#
# For more information, consult the network module documentation.
#
# --------------------------------- Discovery ----------------------------------
#
# Pass an initial list of hosts to perform discovery when this node is started:
# The default list of hosts is ["127.0.0.1", "[::1]"]
#
discovery.seed_hosts: ["127.0.0.1", "[::1]", "172.31.23.231"]
#
# Bootstrap the cluster using an initial set of master-eligible nodes:
#
#cluster.initial_master_nodes: ["node-1", "node-2"]
#
# For more information, consult the discovery and cluster formation module documentation.
#
# ---------------------------------- Gateway -----------------------------------
#
# Block initial recovery after a full cluster restart until N nodes are started:
#
#gateway.recover_after_nodes: 3
#
# For more information, consult the gateway module documentation.
#
# ---------------------------------- Various -----------------------------------
#
# Require explicit names when deleting indices:
#
#action.destructive_requires_name: true
- For more info on configuration options, refer here.
- Now execute the following commands to start the Elasticsearch service:
sudo systemctl daemon-reload
sudo systemctl restart elasticsearch
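Optionally, have Elasticsearch start automatically after a reboot (standard systemd usage):
sudo systemctl enable elasticsearch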
- If Elasticsearch has started successfully, execute the following commands in the SSH terminal and observe the results:
curl -XGET '<private ip>:9200/_cluster/health?pretty'
curl -XGET '<private ip>:9200/_cluster/stats?human&pretty'
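If everything is healthy, the health call returns a small JSON document; an abridged, illustrative response (your values will differ):
{
  "cluster_name" : "direct-devops",
  "status" : "green",
  "number_of_nodes" : 1,
  "number_of_data_nodes" : 1,
  "active_shards_percent_as_number" : 100.0
}
A yellow status is also common on a single-node cluster once indices with replicas exist, because the replica shards have no second node to live on.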
- To continue exploring Elasticsearch, let's install Kibana and use its Console UI.
Kibana
- As we already know, Kibana is a visualization tool; let's go ahead and install it.
Kibana Installation
- I will be using a CentOS 7 EC2 machine of type t2.micro (1 vCPU, 1 GB RAM).
- The Kibana installation steps are referred from here.
- Log in to the created EC2 machine and execute the following commands on the terminal:
sudo rpm --import https://artifacts.elastic.co/GPG-KEY-elasticsearch
wget https://artifacts.elastic.co/downloads/kibana/kibana-7.4.1-x86_64.rpm
wget https://artifacts.elastic.co/downloads/kibana/kibana-7.4.1-x86_64.rpm.sha512
shasum -a 512 -c kibana-7.4.1-x86_64.rpm.sha512
sudo rpm --install kibana-7.4.1-x86_64.rpm
- Now configure Kibana in /etc/kibana/kibana.yml: change elasticsearch.hosts to ["http://<private ip of elasticsearch>:9200"] and server.host to "0.0.0.0". For more info on configuration refer here. Find the kibana.yml used by me below:
# Kibana is served by a back end server. This setting specifies the port to use.
#server.port: 5601
# Specifies the address to which the Kibana server will bind. IP addresses and host names are both valid values.
# The default is 'localhost', which usually means remote machines will not be able to connect.
# To allow connections from remote users, set this parameter to a non-loopback address.
server.host: "0.0.0.0"
# Enables you to specify a path to mount Kibana at if you are running behind a proxy.
# Use the `server.rewriteBasePath` setting to tell Kibana if it should remove the basePath
# from requests it receives, and to prevent a deprecation warning at startup.
# This setting cannot end in a slash.
#server.basePath: ""
# Specifies whether Kibana should rewrite requests that are prefixed with
# `server.basePath` or require that they are rewritten by your reverse proxy.
# This setting was effectively always `false` before Kibana 6.3 and will
# default to `true` starting in Kibana 7.0.
#server.rewriteBasePath: false
# The maximum payload size in bytes for incoming server requests.
#server.maxPayloadBytes: 1048576
# The Kibana server's name. This is used for display purposes.
#server.name: "your-hostname"
# The URLs of the Elasticsearch instances to use for all your queries.
elasticsearch.hosts: ["http://172.31.23.231:9200"]
# When this setting's value is true Kibana uses the hostname specified in the server.host
# setting. When the value of this setting is false, Kibana uses the hostname of the host
# that connects to this Kibana instance.
#elasticsearch.preserveHost: true
# Kibana uses an index in Elasticsearch to store saved searches, visualizations and
# dashboards. Kibana creates a new index if the index doesn't already exist.
#kibana.index: ".kibana"
# The default application to load.
#kibana.defaultAppId: "home"
# If your Elasticsearch is protected with basic authentication, these settings provide
# the username and password that the Kibana server uses to perform maintenance on the Kibana
# index at startup. Your Kibana users still need to authenticate with Elasticsearch, which
# is proxied through the Kibana server.
#elasticsearch.username: "kibana"
#elasticsearch.password: "pass"
# Enables SSL and paths to the PEM-format SSL certificate and SSL key files, respectively.
# These settings enable SSL for outgoing requests from the Kibana server to the browser.
#server.ssl.enabled: false
#server.ssl.certificate: /path/to/your/server.crt
#server.ssl.key: /path/to/your/server.key
# Optional settings that provide the paths to the PEM-format SSL certificate and key files.
# These files validate that your Elasticsearch backend uses the same key files.
#elasticsearch.ssl.certificate: /path/to/your/client.crt
#elasticsearch.ssl.key: /path/to/your/client.key
# Optional setting that enables you to specify a path to the PEM file for the certificate
# authority for your Elasticsearch instance.
#elasticsearch.ssl.certificateAuthorities: [ "/path/to/your/CA.pem" ]
# To disregard the validity of SSL certificates, change this setting's value to 'none'.
#elasticsearch.ssl.verificationMode: full
# Time in milliseconds to wait for Elasticsearch to respond to pings. Defaults to the value of
# the elasticsearch.requestTimeout setting.
#elasticsearch.pingTimeout: 1500
# Time in milliseconds to wait for responses from the back end or Elasticsearch. This value
# must be a positive integer.
#elasticsearch.requestTimeout: 30000
# List of Kibana client-side headers to send to Elasticsearch. To send *no* client-side
# headers, set this value to [] (an empty list).
#elasticsearch.requestHeadersWhitelist: [ authorization ]
# Header names and values that are sent to Elasticsearch. Any custom headers cannot be overwritten
# by client-side headers, regardless of the elasticsearch.requestHeadersWhitelist configuration.
#elasticsearch.customHeaders: {}
# Time in milliseconds for Elasticsearch to wait for responses from shards. Set to 0 to disable.
#elasticsearch.shardTimeout: 30000
# Time in milliseconds to wait for Elasticsearch at Kibana startup before retrying.
#elasticsearch.startupTimeout: 5000
# Logs queries sent to Elasticsearch. Requires logging.verbose set to true.
#elasticsearch.logQueries: false
# Specifies the path where Kibana creates the process ID file.
#pid.file: /var/run/kibana.pid
# Enables you specify a file where Kibana stores log output.
#logging.dest: stdout
# Set the value of this setting to true to suppress all logging output.
#logging.silent: false
# Set the value of this setting to true to suppress all logging output other than error messages.
#logging.quiet: false
# Set the value of this setting to true to log all events, including system usage information
# and all requests.
#logging.verbose: false
# Set the interval in milliseconds to sample system and process performance
# metrics. Minimum is 100ms. Defaults to 5000.
#ops.interval: 5000
# Specifies locale to be used for all localizable strings, dates and number formats.
# Supported languages are the following: English - en , by default , Chinese - zh-CN .
#i18n.locale: "en"
- Start Kibana by executing:
sudo systemctl start kibana
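As with Elasticsearch, you can optionally enable the service at boot and confirm it is running (standard systemd commands):
sudo systemctl enable kibana
sudo systemctl status kibana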
- Now navigate to http://<public ip of kibana>:5601 and you should see the home page.
- We cannot do any visualizations yet, because we don't have any data so far.
- We can still use the Dev Tools section to test the Elasticsearch APIs. Click on Dev Tools and you will be navigated to the Console.
- We will be using this Console for understanding Elasticsearch.
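In the Console, requests are written as an HTTP method plus a path, with no curl boilerplate; Kibana forwards them to the configured Elasticsearch hosts. For example:
GET _cluster/health

GET _cat/indices?v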
Core Concepts of Elastic Search
- Relational databases have rows, columns, tables, and schemas, whereas Elasticsearch is a document-oriented store of JSON documents. These JSON documents are organized into indexes and types.
Document
- Data in Elasticsearch is stored in JSON format.
- A document is very similar to a record in a relational database.
- A sample document:
{
  "id": 1,
  "name": "khaja",
  "blog": "https://directdevops.blog",
  "organization": "QualityThought",
  "courses": ["AWS", "Azure", "DevOps", "Linux", "Windows", "Python"]
}
- In addition to the fields sent by the user, Elasticsearch stores internal metadata fields. These fields are as follows:
- _id: unique identifier of the document, just like a primary key in a database table. It can be auto-generated or specified by the user.
- _type: type of the document.
- _index: index name of the document.
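A quick way to see these metadata fields is to index a document and fetch it back from the Console; a minimal sketch, where the index name trainers is a hypothetical example:
PUT trainers/_doc/1
{ "name": "khaja", "blog": "https://directdevops.blog" }

GET trainers/_doc/1
The response to the GET wraps the original fields under _source, alongside _index, _type, and _id.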
Indexes
- An index is a container that stores and manages documents of a single type in Elasticsearch.
Types
- Documents with a mostly common set of fields are grouped under one type.
Relation Between Index, Type and Documents
- An index holds a type, and the type groups the documents, so the containment order is index → type → document.
Nodes and Cluster
- Since Elasticsearch is a distributed system, it has nodes and clusters.
- A node is a single Elasticsearch server.
- A cluster is formed by one or more nodes; every Elasticsearch node is always part of a cluster.
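You can list the nodes that currently form the cluster from the Console (the same call also works with curl against port 9200):
GET _cat/nodes?v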
Shards and Replicas
- An index consists of documents of some type; shards help in distributing an index over the cluster.
- Shards divide the documents of a single index over multiple nodes.
- The process of dividing data among shards is called sharding; it is Elasticsearch's way of achieving scale and parallelism.
- In Elasticsearch 7.x every index is configured to have one primary shard by default (earlier versions defaulted to five).
- You can specify the number of shards while creating an index (see the sketch after this list).
- Since nodes in a cluster might fail, replica shards (replicas) are created and stored across the cluster.
- Thanks to replicas, Elasticsearch keeps running despite such failures.
- Replicas also serve queries, which increases read throughput.
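Here is a minimal sketch of creating an index with explicit shard and replica counts from the Kibana Console (the index name my-index is a hypothetical example):
PUT my-index
{
  "settings": {
    "number_of_shards": 3,
    "number_of_replicas": 1
  }
}
With 3 primary shards and 1 replica each, the cluster will try to place 6 shards in total across the available nodes.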