Scenario
- Our organization (LearningThoughts) has deployed an application that is used by customers
- The Application high level architecture is as shown below
- The customers of our product are
- The LearningThoughts DevOps team deploys the application into customer environments (Azure, AWS, GCP, on-premises)
- Failures & Troubleshooting: How can LT handle failures and troubleshoot errors?
- Initial Responses
- To fix or troubleshoot an issue, we first need to understand what is happening and where things have gone wrong.
- Every application logs some information, so this is the place to start.
- But if the logs are created locally on each server, they are difficult to manage. It is better to have centralized logging, where all the applications write their log data to one central location
- Refer Here for sample apache logs
- It would be good if the logs could be queried and analyzed to find patterns
- Visualizations of the log data would be even better.
- Basic Server Monitoring => Whether the server is up or not
- Log analysis and server monitoring are what LearningThoughts needs to troubleshoot errors.
- Understanding usage patterns also helps us come up with new features that improve the user experience.
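
To make the two needs above concrete (querying logs for patterns, and checking whether a server is up), here is a minimal Python sketch. It is not the tooling this scenario ends up using. It assumes access logs in the Apache Common Log Format, and the sample lines, function names, and the host and port in the usage section are all hypothetical and chosen only for illustration.

```python
import re
import socket
from collections import Counter

# Assumed layout (Apache Common Log Format):
# host ident user [time] "request" status size
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+)'
)


def parse_line(line):
    """Return a dict of fields for one access-log line, or None if it does not match."""
    match = LOG_PATTERN.match(line)
    return match.groupdict() if match else None


def count_statuses(lines):
    """Count how often each HTTP status code appears in the given lines."""
    counts = Counter()
    for line in lines:
        entry = parse_line(line)
        if entry is not None:
            counts[entry["status"]] += 1
    return counts


def is_server_up(host, port, timeout=2.0):
    """Basic availability check: can a TCP connection be opened to host:port?"""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# Illustrative sample lines, not taken from a real server
SAMPLE_LINES = [
    '192.0.2.10 - - [10/Oct/2023:13:55:36 +0000] "GET /index.html HTTP/1.1" 200 2326',
    '192.0.2.11 - - [10/Oct/2023:13:55:40 +0000] "GET /missing HTTP/1.1" 404 153',
    '192.0.2.10 - - [10/Oct/2023:13:56:01 +0000] "POST /login HTTP/1.1" 500 0',
    '192.0.2.12 - - [10/Oct/2023:13:56:05 +0000] "GET /index.html HTTP/1.1" 200 2326',
]

if __name__ == "__main__":
    # Which status codes occur, and how often?
    print(count_statuses(SAMPLE_LINES))
    # Hypothetical target: a web server on the local machine
    print(is_server_up("127.0.0.1", 80))
```

A script like this has to be run on every server and its output collected by hand, which is the management problem that centralized logging and monitoring are meant to remove.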