Analyzing Log Data
- Log Analysis Challenges:
    - Logs are records of incidents or observations, generated by a wide variety of sources such as systems, applications, devices, and so on
    - A log entry is typically made of two parts
        - timestamp
        - data
    - Logs are used for the following reasons
        - Troubleshooting
        - Auditing
        - Predictive analytics
    - Some challenges with logs are
        - No common/consistent format
        - Logs are decentralized
        - Data is unstructured
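To make the timestamp + data structure concrete, here is a minimal sketch that splits a log line into those two parts. The log line itself is hypothetical, and real formats vary widely (which is exactly the "no common/consistent format" challenge above):

```shell
# A typical log line: a timestamp followed by free-form data
line="2024-05-01T12:30:45Z ERROR payment-service: connection timed out"

# Split on the first space: everything before it is the timestamp,
# everything after it is the unstructured data portion
timestamp="${line%% *}"   # strip everything from the first space onward
data="${line#* }"         # strip everything up to and including the first space

echo "$timestamp"   # 2024-05-01T12:30:45Z
echo "$data"        # ERROR payment-service: connection timed out
```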
Logstash
- Logstash is a very popular open-source data collection engine with real-time pipelining capabilities.
- Logstash has input, filter, and output plugins, and it does the work of an ETL engine: extracting information from the inputs, transforming it according to the filters, and loading it into the output plugins.
- Features:
    - Pluggable data pipeline architecture
    - Extensibility
    - Centralized data processing
    - Variety and volume
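As a sketch of how the three plugin stages map to ETL, a hypothetical pipeline configuration might look like the following (the grok pattern and field names here are illustrative assumptions, not from the original notes):

```conf
# Extract: read events from standard input
input {
  stdin { }
}
# Transform: grok is a common filter plugin that parses
# unstructured text into named fields
filter {
  grok {
    match => { "message" => "%{TIMESTAMP_ISO8601:timestamp} %{GREEDYDATA:msg}" }
  }
}
# Load: write the structured event to standard output
output {
  stdout { codec => rubydebug }
}
```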
Lab Environment
- We will create two Linux VMs:
    - one VM for Logstash
    - one VM for Elasticsearch and Kibana
- Let's create an Ubuntu 20 server and install Logstash Refer Here
- Once you install Logstash:
    - `/usr/share/logstash` will be the folder containing the executables
    - `/etc/logstash` will be the folder for configuring Logstash
- Now let's try to run Logstash with the input configured as stdin and the output as stdout

```shell
cd /usr/share/logstash
sudo bin/logstash -e "input { stdin { } } output { stdout { } }"
```

- Let's try to create a configuration for a Logstash data pipeline
    - input: stdin
    - output: a file

```shell
sudo bin/logstash -e "input { stdin { } } output { file { path => '/tmp/output' } }"
```

- Running Logstash from the command line for every pipeline is not sensible, so we will learn how to do this with configuration files.
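As a preview of the configuration-file approach, the stdin-to-file pipeline above could be saved as a file (the filename and the `conf.d` location are assumptions based on the typical Logstash package layout):

```conf
# /etc/logstash/conf.d/stdin-to-file.conf (hypothetical filename)
input {
  stdin { }
}
output {
  file {
    path => "/tmp/output"
  }
}
```

and then run with `sudo bin/logstash -f /etc/logstash/conf.d/stdin-to-file.conf` instead of passing the pipeline inline with `-e`.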
