Analyzing Log Data
- Each application/system generates logs whenever an event occurs. These logs contain rich information about the state & behavior of your application.
- With so many logs, collecting them, extracting the relevant information & analyzing it would be a challenge.
- Logs in all applications/systems will not have the same format.
- Let's look at two different formats of logs
# event log format
<event>
<occurred> 7:53 AM 7/21/2020 </occurred>
<message> User admin created </message>
<type> INFO </type>
</event>
# text log format
7:53 AM 7/21/2020 org.qt.ecommerce.com INFO user admin created
- To analyze logs, we need a tool that can Extract the logs, Transform them into some common format and then store them (ETL) & that is where Logstash comes into the picture.
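As a sketch of what such an ETL pipeline can look like in Logstash (the file path, grok pattern and Elasticsearch endpoint below are illustrative assumptions, not taken from this document):

```conf
# hypothetical pipeline.conf: Extract from a log file,
# Transform with a grok filter, Load into Elasticsearch
input {
  file {
    path => "/var/log/app/app.log"   # assumed log location
  }
}
filter {
  grok {
    # split each line into a log level and the rest of the message
    match => { "message" => "%{LOGLEVEL:level} %{GREEDYDATA:msg}" }
  }
}
output {
  elasticsearch {
    hosts => ["localhost:9200"]      # assumed Elasticsearch endpoint
  }
}
```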
- Typical reasons for using logs
- Troubleshooting
- To understand system/application behavior
- Auditing
- Predictive Analysis
- Challenges with Logs
- No common/consistent format
- Logs are decentralized.
- No consistent time formats
- Data is unstructured
Logstash
- Logstash is a popular open source data collection engine with real-time pipelining capabilities. Logstash allows us to easily build a pipeline that collects data from various sources, then parses, enriches, unifies and stores it in a wide variety of destinations

- Installation: Refer Here
sudo apt-get install openjdk-8-jdk -y

Architecture
- Logstash event processing pipeline has 3 stages
- Inputs
- Filters
- Outputs

- Inputs & outputs are required, whereas filters are optional.
- This Input, Filter and Output functionality in Logstash is provided by Logstash plugins
- Logstash uses in-memory bounded queues between pipeline stages by default (Input to Filter and Filter to Output). To persist the queue to a file and prevent data loss in case of failure, in the logstash.yml file
- Change the queue.type property to persisted
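In logstash.yml this looks like the snippet below (the queue.max_bytes line is an optional tuning setting, shown here only as an illustration):

```yaml
# logstash.yml
queue.type: persisted
# optional: limit the total size of the on-disk queue
queue.max_bytes: 1024mb
```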
- Logstash pipeline format is
input {
<any input plugins>
}
filter {
<any filter plugins>
}
output {
<any output plugins>
}
- Generally we create a .conf file and store it in the Logstash configuration directory
- Let's find the configuration files and Logstash binaries (LOGSTASH_HOME) directories

- Let's create a basic pipeline using the command line
cd /usr/share/logstash/bin/
sudo ./logstash -e "input {stdin {} } output { stdout{} } "

- Let's create a basic configuration file called simple.conf
input {
stdin {}
}
filter {
mutate {
uppercase => ["message" ]
}
}
output {
stdout {
codec => rubydebug
}
}
- Now let's run Logstash
sudo ./logstash -f simple.conf
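Typing a line such as hello logstash at the prompt should print the event with the message uppercased by the mutate filter. The rubydebug codec output resembles the sketch below (field names are the standard ones Logstash adds; the host and timestamp values will differ on your machine):

```
{
      "@timestamp" => 2020-07-21T07:53:00.000Z,
        "@version" => "1",
            "host" => "ubuntu",
         "message" => "HELLO LOGSTASH"
}
```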

Logstash plugins
- Logstash has a rich collection of input, filter, codec and output plugins. Plugins are available as self-contained packages called gems
- If you want to verify the list of plugins that are part of the Logstash installation
sudo ./logstash-plugin list
sudo ./logstash-plugin list --group input
sudo ./logstash-plugin list --group filter
sudo ./logstash-plugin list --group output
sudo ./logstash-plugin list 'grok'
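As an illustration of a filter plugin, a grok filter could parse the text log format shown earlier into named fields. The pattern below is a hedged sketch built from standard grok pattern names (TIME, DATE_US, HOSTNAME, LOGLEVEL, GREEDYDATA); verify it against your actual log lines:

```conf
filter {
  grok {
    # parses lines like:
    # 7:53 AM 7/21/2020 org.qt.ecommerce.com INFO user admin created
    match => {
      "message" => "%{TIME:time} %{WORD:ampm} %{DATE_US:date} %{HOSTNAME:source} %{LOGLEVEL:level} %{GREEDYDATA:msg}"
    }
  }
}
```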

- If needed, plugins can be installed by using
sudo ./logstash-plugin install logstash-output-email
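Once installed, the plugin can be used like any other output plugin. The snippet below is an assumed usage sketch: the recipient address is hypothetical, and the option names should be checked against the logstash-output-email documentation:

```conf
output {
  email {
    to      => "ops@example.com"   # hypothetical recipient
    subject => "Logstash alert"
    body    => "%{message}"        # send the event's message field
  }
}
```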

- Plugins can be updated using
sudo ./logstash-plugin update
- Documentation of plugins
