Every application or system generates logs whenever an event occurs. These logs contain rich information about the state and behavior of your application.
With so many logs, collecting them, extracting the relevant information, and analyzing it is a challenge.
Logs from different applications and systems do not share the same format.
Let's look at two different log formats:
# event log format
<event>
<occured> 7:53 AM 7/21/2020 </occured>
<message> User admin created </message>
<type> INFO </type>
</event>
# text log format
7:53 AM 7/21/2020 org.qt.ecommerce.com INFO user admin created
To analyze logs, we need a tool that can extract them, transform them into a common format, and then store (load) them, a process known as ETL. That is where Logstash comes into the picture.
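To make the "common format" idea concrete, here is a minimal Python sketch that parses both sample formats into the same dictionary shape. It is only an illustration of the transform step, not how Logstash does it internally. The function names (parse_event_log, parse_text_log, to_iso), the output field names, and the choice of ISO 8601 timestamps are all assumptions made for this example.

```python
import xml.etree.ElementTree as ET
from datetime import datetime

# Samples modeled on the two log formats shown above.
EVENT_LOG = """<event>
<occured> 7:53 AM 7/21/2020 </occured>
<message> User admin created </message>
<type> INFO </type>
</event>"""

TEXT_LOG = "7:53 AM 7/21/2020 org.qt.ecommerce.com INFO user admin created"

# Assumed layout of the timestamp in both samples, e.g. "7:53 AM 7/21/2020".
TIME_FORMAT = "%I:%M %p %m/%d/%Y"


def to_iso(raw):
    """Convert the sample timestamp into an ISO 8601 string."""
    return datetime.strptime(raw.strip(), TIME_FORMAT).isoformat()


def parse_event_log(xml_text):
    """Turn the XML-style event into the common record shape."""
    root = ET.fromstring(xml_text)
    return {
        "timestamp": to_iso(root.findtext("occured")),
        "source": None,  # the XML sample carries no source field
        "level": root.findtext("type").strip(),
        "message": root.findtext("message").strip(),
    }


def parse_text_log(line):
    """Turn the space-separated text line into the common record shape."""
    # Fields: clock, AM/PM, date, source, level, then the free-form message.
    clock, meridiem, date, source, level, message = line.split(maxsplit=5)
    return {
        "timestamp": to_iso(f"{clock} {meridiem} {date}"),
        "source": source,
        "level": level,
        "message": message,
    }


event_record = parse_event_log(EVENT_LOG)
text_record = parse_text_log(TEXT_LOG)
print(event_record)
print(text_record)
```

Both records end up with the same keys and the same timestamp format, so a single query or dashboard can work across the two sources.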
Typical reasons for using logs
Troubleshooting
To understand system/application behavior
Auditing
Predictive Analysis
Challenges with Logs
No common/consistent format
Logs are decentralized.
No consistent time formats
Data is unstructured
Logstash
Logstash is a popular open source data collection engine with real-time pipelining capabilities. It lets us build a pipeline that collects data from various sources, parses, enriches, and unifies it, and stores it in a wide variety of destinations.
A Logstash pipeline has three stages: inputs, filters, and outputs. Inputs and outputs are required, whereas filters are optional.
The functionality of inputs, filters, and outputs in Logstash is provided by Logstash plugins.
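The input, filter, output structure can be sketched in a few lines of Python. This is a conceptual model only, not Logstash code: run_pipeline and the sample stage functions are hypothetical names, and plain Python callables stand in for plugins. It shows that a pipeline needs at least one input and one output, while the list of filters may be empty.

```python
def run_pipeline(inputs, outputs, filters=()):
    """Pass every event from each input through the filters, then to each output."""
    if not inputs or not outputs:
        raise ValueError("a pipeline needs at least one input and one output")
    for source in inputs:
        for event in source():
            for apply_filter in filters:
                event = apply_filter(event)
                if event is None:  # a filter may drop an event
                    break
            else:
                # Reached only when no filter dropped the event.
                for emit in outputs:
                    emit(event)


def sample_input():
    """Hypothetical input stage that yields two events."""
    yield {"message": "User admin created", "level": "info"}
    yield {"message": "heartbeat", "level": "debug"}


def uppercase_level(event):
    """Hypothetical filter that normalizes the level field."""
    return {**event, "level": event["level"].upper()}


def drop_debug(event):
    """Hypothetical filter that discards debug events."""
    return None if event["level"] == "DEBUG" else event


filtered = []
run_pipeline([sample_input], [filtered.append], [uppercase_level, drop_debug])

unfiltered = []
run_pipeline([sample_input], [unfiltered.append])  # filters are optional

print(filtered)
print(len(unfiltered))
```

Swapping a stage means passing a different callable, which mirrors how a Logstash pipeline is assembled from interchangeable plugins.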
Logstash uses in-memory bounded queues between pipeline stages by default (input to filter and filter to output). To persist these queues to disk and prevent data loss in case of failure, set queue.type: persisted in the logstash.yml file.
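The idea of bounded in-memory queues between stages can be illustrated with Python's standard queue module. This is a rough sketch of the concept, assuming one thread per stage and a tiny queue size; it does not reflect Logstash's actual implementation, and all function names here are made up for the example.

```python
import queue
import threading

SENTINEL = object()  # marks the end of the stream


def input_stage(lines, out_q):
    for line in lines:
        out_q.put(line)  # blocks while the queue is full (back-pressure)
    out_q.put(SENTINEL)


def filter_stage(in_q, out_q):
    while (item := in_q.get()) is not SENTINEL:
        out_q.put(item.strip().upper())
    out_q.put(SENTINEL)


def output_stage(in_q, sink):
    while (item := in_q.get()) is not SENTINEL:
        sink.append(item)


def run(lines, queue_size=2):
    """Run the three stages, connected by two bounded in-memory queues."""
    input_to_filter = queue.Queue(maxsize=queue_size)
    filter_to_output = queue.Queue(maxsize=queue_size)
    sink = []
    workers = [
        threading.Thread(target=input_stage, args=(lines, input_to_filter)),
        threading.Thread(target=filter_stage, args=(input_to_filter, filter_to_output)),
        threading.Thread(target=output_stage, args=(filter_to_output, sink)),
    ]
    for worker in workers:
        worker.start()
    for worker in workers:
        worker.join()
    return sink


result = run([" a ", "b", "c ", "d", "e"])
print(result)
```

Because the queues live only in memory, anything still sitting in them is lost if the process stops unexpectedly. That is the gap a queue persisted to disk is meant to close.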
Logstash has a rich collection of input, filter, codec, and output plugins. Plugins are available as self-contained packages called gems.
To see which plugins are part of your Logstash installation, use the logstash-plugin list command:
sudo ./logstash-plugin list
sudo ./logstash-plugin list --group input
sudo ./logstash-plugin list --group filter
sudo ./logstash-plugin list --group output
sudo ./logstash-plugin list 'grok'