DevOps Classroom Series – 21/Jul/2020

Analyzing Log Data

  • Each application/system generates logs whenever an event occurs. These logs contain rich information about the state & behavior of your application.
  • With so many logs, collecting them, extracting the relevant information & analyzing it is a challenge.
  • Logs from different applications/systems will not have the same format.
  • Let's look at two different log formats
# event log format
<event>
  <occurred> 7:53 AM 7/21/2020 </occurred>
  <message> User admin created </message>
  <type> INFO </type>
</event>

# text log format
7:53 AM 7/21/2020 org.qt.ecommerce.com INFO user admin created
  • To analyze logs, we need a tool that can extract the logs, convert/transform them into a common format, and store them (ETL), and that is where Logstash comes into the picture (a preview pipeline sketch follows this list).
  • Typical reasons for using logs
    • Troubleshooting
    • To understand system/application behavior
    • Auditing
    • Predictive Analysis
  • Challenges with Logs
    • No common/consistent format
    • Logs are decentralized.
    • No consistent time formats
    • Data is unstructured
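  • As a preview of how Logstash tackles the format problem, below is a minimal pipeline sketch that parses the text log format shown above into structured fields with the grok filter (the field names hour, minute, ampm, date, source, level and logmessage are illustrative choices, not required names):
# grok.conf - parse the sample text log line into named fields
input {
    stdin {}
}
filter {
    grok {
        # stock grok patterns split out the timestamp, host, log level and message
        match => { "message" => "%{HOUR:hour}:%{MINUTE:minute} %{WORD:ampm} %{DATE_US:date} %{HOSTNAME:source} %{LOGLEVEL:level} %{GREEDYDATA:logmessage}" }
    }
}
output {
    stdout {}
}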

Logstash

  • Logstash is a popular open-source data collection engine with real-time pipelining capabilities. Logstash lets us easily build a pipeline that collects data from various sources, then parses, enriches, and unifies it before storing it in a wide variety of destinations.
  • Installation: Refer Here. Logstash runs on the JVM, so install a JDK first:
sudo apt-get install openjdk-8-jdk -y
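  • With Java in place, a common way to install Logstash on Ubuntu is from the Elastic apt repository (a sketch; the 7.x repository shown was the current series when these notes were written):
# add the Elastic signing key and repository, then install logstash
wget -qO - https://artifacts.elastic.co/GPG-KEY-elasticsearch | sudo apt-key add -
echo "deb https://artifacts.elastic.co/packages/7.x/apt stable main" | sudo tee /etc/apt/sources.list.d/elastic-7.x.list
sudo apt-get update
sudo apt-get install logstash -y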


Architecture

  • Logstash event processing pipeline has 3 stages
    • Inputs
    • Filters
    • Outputs
  • Inputs & outputs are required, whereas filters are optional.
  • This functionality of inputs, outputs and filters in Logstash is provided by Logstash plugins.
  • Logstash uses in-memory bounded queues between pipeline stages (Input to Filter and Filter to Output) by default. To persist the queue to disk and prevent data loss in case of failure, in the logstash.yml file:
    • Change the queue.type property to persisted
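    • A minimal sketch of that setting (the /etc/logstash/logstash.yml path applies to apt installs; queue.max_bytes is an optional, related setting):
# /etc/logstash/logstash.yml
queue.type: persisted
# optionally cap the disk space the queue may use (1024mb is the documented default)
# queue.max_bytes: 1024mb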
  • The Logstash pipeline format is:
input {
    <any input plugins>
}
filter {
    <any filter plugins>
}
output {
    <any output plugins>
}
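  • For instance, a sketch with real plugin names slotted into each section (the file path and tag are illustrative):
input {
    file {
        # tail a log file, starting from its beginning on first read
        path => "/var/log/syslog"
        start_position => "beginning"
    }
}
filter {
    mutate {
        # add_tag is a common option available on every filter plugin
        add_tag => ["classroom"]
    }
}
output {
    stdout {}
}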
  • Generally we create a .conf file and store it in the Logstash configuration directory (/etc/logstash/conf.d on apt installs)
  • Let's find the configuration files and Logstash binaries (LOGSTASH_HOME) directories. On apt installs, the binaries live under /usr/share/logstash and the configuration under /etc/logstash.
  • Let's create a basic pipeline using the command line
cd /usr/share/logstash/bin/
sudo ./logstash -e 'input { stdin {} } output { stdout {} }'


  • Let's create a basic configuration file called simple.conf
input {
    # read events from the terminal
    stdin {}
}

filter {
    mutate {
        # upper-case the contents of the message field
        uppercase => ["message"]
    }
}

output {
    stdout {
        # rubydebug prints each event with all of its fields
        codec => rubydebug
    }
}
  • Now let's run Logstash; any line typed on stdin should come back as an event whose message field is upper-cased
sudo ./logstash -f simple.conf


Logstash plugins

  • Logstash has a rich collection of input, filter, codec and output plugins. Plugins are available as self-contained packages called gems.
  • If you want to verify the list of plugins that are part of the Logstash installation:
sudo ./logstash-plugin list                  # all installed plugins
sudo ./logstash-plugin list --group input    # only input plugins
sudo ./logstash-plugin list --group filter   # only filter plugins
sudo ./logstash-plugin list --group output   # only output plugins
sudo ./logstash-plugin list 'grok'           # plugins whose name contains 'grok'


  • If needed, additional plugins can be installed using
sudo ./logstash-plugin install logstash-output-email
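  • Once installed, the plugin can be used like any other output; a minimal sketch of the email output (the addresses and subject are placeholders, and a real setup also needs SMTP options such as address, port and credentials):
output {
    email {
        # hypothetical recipient and subject, for illustration only
        to => "ops@example.com"
        subject => "Logstash alert"
        body => "%{message}"
    }
}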


  • Plugins can be updated using
sudo ./logstash-plugin update
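
  • To update a single plugin, pass its name (here, the plugin installed above):
sudo ./logstash-plugin update logstash-output-email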
