Automating log, trace and metric collection in ec2 for AWS
- While creating the infrastructure necessary for your application, we define it as infrastructure as code (IaC) in Terraform/CloudFormation
- The log, metric and trace collection described here also has to be configured from that IaC, not by hand on each instance
Solution (Terraform):
To automate log management in EC2 to Amazon CloudWatch using Terraform, you need to set up the following:
- IAM Role for EC2: Attach a policy to allow the instance to send logs to CloudWatch.
- CloudWatch Log Group: Create a log group where the logs will be stored.
- CloudWatch Agent Configuration: Configure the CloudWatch agent to collect logs and send them to the log group.
- Install and Configure CloudWatch Agent on EC2: Automate this using user data.
- Here’s a Terraform script to accomplish this:
Terraform Script
terraform {
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = ">= 4.0"
    }
  }
}

provider "aws" {
  region = "us-east-1" # Specify your region
}

# IAM Role for EC2: only the EC2 service is allowed to assume it
resource "aws_iam_role" "ec2_cloudwatch_role" {
  name = "ec2-cloudwatch-role"
  assume_role_policy = jsonencode({
    Version = "2012-10-17",
    Statement = [
      {
        Action = "sts:AssumeRole",
        Effect = "Allow",
        Principal = {
          Service = "ec2.amazonaws.com"
        }
      }
    ]
  })
}

# IAM Policy to Allow Logs to CloudWatch
# Write actions are scoped to the log group created below (and its streams);
# DescribeLogGroups is an account-level call and needs Resource "*".
resource "aws_iam_policy" "cloudwatch_policy" {
  name        = "cloudwatch-log-policy"
  description = "Allow EC2 to write logs to CloudWatch"
  policy = jsonencode({
    Version = "2012-10-17",
    Statement = [
      {
        Action = [
          "logs:CreateLogGroup",
          "logs:CreateLogStream",
          "logs:PutLogEvents"
        ],
        Effect = "Allow",
        Resource = [
          aws_cloudwatch_log_group.app_logs.arn,
          "${aws_cloudwatch_log_group.app_logs.arn}:*"
        ]
      },
      {
        Action = [
          "logs:DescribeLogGroups",
          "logs:DescribeLogStreams"
        ],
        Effect   = "Allow",
        Resource = "*"
      }
    ]
  })
}

# Attach the Policy to the Role
resource "aws_iam_role_policy_attachment" "attach_cloudwatch_policy" {
  role       = aws_iam_role.ec2_cloudwatch_role.name
  policy_arn = aws_iam_policy.cloudwatch_policy.arn
}

# EC2 Instance Profile (the wrapper EC2 needs to hand the role to the instance)
resource "aws_iam_instance_profile" "ec2_instance_profile" {
  name = "ec2-instance-profile"
  role = aws_iam_role.ec2_cloudwatch_role.name
}

# CloudWatch Log Group
resource "aws_cloudwatch_log_group" "app_logs" {
  name              = "/ec2/app-logs"
  retention_in_days = 7
}

# Security Group for EC2
resource "aws_security_group" "ec2_sg" {
  name_prefix = "ec2-sg"
  ingress {
    from_port   = 22
    to_port     = 22
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"] # Restrict to your admin CIDR; this exposes SSH to the whole internet
  }
  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }
}

# User Data Script for CloudWatch Agent (Amazon Linux 2 package name and paths).
# A plain heredoc replaces the deprecated template_file data source; Terraform
# interpolates ${...} before the script reaches the instance, while
# {instance_id} is left for the agent to substitute at runtime.
locals {
  user_data = <<-EOT
    #!/bin/bash
    set -e
    yum install -y amazon-cloudwatch-agent
    cat <<EOF > /opt/aws/amazon-cloudwatch-agent/bin/config.json
    {
      "logs": {
        "logs_collected": {
          "files": {
            "collect_list": [
              {
                "file_path": "/var/log/messages",
                "log_group_name": "${aws_cloudwatch_log_group.app_logs.name}",
                "log_stream_name": "{instance_id}/messages",
                "timestamp_format": "%b %d %H:%M:%S"
              }
            ]
          }
        }
      }
    }
    EOF
    # fetch-config loads the file and -s starts the agent; "-a start" on its own ignores -c
    /opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent-ctl \
      -a fetch-config -m ec2 -s \
      -c file:/opt/aws/amazon-cloudwatch-agent/bin/config.json
  EOT
}

# EC2 Instance
resource "aws_instance" "ec2_instance" {
  ami                    = "ami-0c02fb55956c7d316" # Replace with your AMI (the user data above assumes Amazon Linux 2)
  instance_type          = "t2.micro"
  iam_instance_profile   = aws_iam_instance_profile.ec2_instance_profile.name
  vpc_security_group_ids = [aws_security_group.ec2_sg.id]
  user_data              = local.user_data
  tags = {
    Name = "EC2-CloudWatch-Logs"
  }
}
Steps to Execute
- Save the script in a .tf file.
- Run terraform init to initialize Terraform.
- Run terraform plan to review the planned changes.
- Run terraform apply to create the resources.
Explanation of Key Components
- IAM Role and Policy:
• Allows the EC2 instance to push logs to CloudWatch.
- CloudWatch Log Group:
• Acts as a destination for logs.
- User Data Script:
• Installs and configures the CloudWatch agent to send logs.
- EC2 Instance:
• Launches an instance with the required IAM role and the CloudWatch agent pre-configured.
Idea
- Every EC2-based deployment should have the agent installed with the necessary JSON configuration to send logs, metrics and traces to CloudWatch
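The script above only ships one log file and keeps the agent configuration inside user data. The following is an added example (not part of the original notes) showing the same idea extended to logs plus host metrics, with the agent configuration stored in SSM Parameter Store so every instance fetches the same JSON and the IAM permissions stay scoped. It assumes the role, log group and instance from the script above exist under the same names; the parameter name `AmazonCloudWatch-app-config`, the metric namespace `App/EC2` and the second log file are my own choices.

```hcl
# Added example, not from the original notes. Assumes aws_iam_role.ec2_cloudwatch_role,
# aws_cloudwatch_log_group.app_logs and aws_instance.ec2_instance from the script above.

# Agent configuration kept in SSM Parameter Store (parameter name is assumed)
resource "aws_ssm_parameter" "cw_agent_config" {
  name = "AmazonCloudWatch-app-config"
  type = "String"
  value = jsonencode({
    logs = {
      logs_collected = {
        files = {
          collect_list = [
            {
              file_path        = "/var/log/messages"
              log_group_name   = aws_cloudwatch_log_group.app_logs.name
              log_stream_name  = "{instance_id}/messages"
              timestamp_format = "%b %d %H:%M:%S"
            },
            {
              # auth/sshd events: useful for detecting failed and unexpected logins
              file_path        = "/var/log/secure"
              log_group_name   = aws_cloudwatch_log_group.app_logs.name
              log_stream_name  = "{instance_id}/secure"
              timestamp_format = "%b %d %H:%M:%S"
            }
          ]
        }
      }
    }
    metrics = {
      namespace = "App/EC2" # assumed namespace
      # "$$" escapes Terraform interpolation; the agent sees ${aws:InstanceId}
      append_dimensions = { InstanceId = "$${aws:InstanceId}" }
      metrics_collected = {
        mem  = { measurement = ["mem_used_percent"], metrics_collection_interval = 60 }
        disk = { measurement = ["used_percent"], resources = ["/"], metrics_collection_interval = 60 }
      }
    }
  })
}

# Extra permissions the agent needs for this setup, scoped as tightly as the APIs allow
data "aws_iam_policy_document" "cw_agent" {
  statement {
    sid       = "ReadAgentConfig"
    actions   = ["ssm:GetParameter"]
    resources = [aws_ssm_parameter.cw_agent_config.arn]
  }
  statement {
    sid       = "PutMetrics"
    actions   = ["cloudwatch:PutMetricData"]
    resources = ["*"] # PutMetricData is not resource-scoped; limit it by namespace instead
    condition {
      test     = "StringEquals"
      variable = "cloudwatch:namespace"
      values   = ["App/EC2"]
    }
  }
}

resource "aws_iam_role_policy" "cw_agent" {
  name   = "cloudwatch-agent-config-and-metrics"
  role   = aws_iam_role.ec2_cloudwatch_role.id
  policy = data.aws_iam_policy_document.cw_agent.json
}

# User data that pulls the configuration from SSM instead of embedding it.
# Use this as the user_data of aws_instance.ec2_instance in place of local.user_data.
locals {
  user_data_ssm = <<-EOT
    #!/bin/bash
    set -euo pipefail
    yum install -y amazon-cloudwatch-agent
    /opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent-ctl \
      -a fetch-config -m ec2 -s \
      -c ssm:${aws_ssm_parameter.cw_agent_config.name}
  EOT
}
```

Keeping the JSON in SSM means a change to the collect list is a `terraform apply` on the parameter followed by a `fetch-config` on the fleet, with no instance replacement. If you prefer the AWS-managed policy `CloudWatchAgentServerPolicy` over the explicit document above, note that it is broader (it also grants `ec2:DescribeTags` and `ec2:DescribeVolumes`, which the agent only needs for tag-based dimensions).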
Azure Insights
- Watch the classroom recording for
- Azure Insights
- VM Insights:
- Process information
- Detailed information about the OS, processes, network etc.
- Application Insights:
- Along with all the generic information, we also get traces
- Container Insights:
- This gives tracing information for containers in AKS
- As of today, AWS supports
- Container Insights
- Lambda Insights
- CloudWatch Insights
Troubleshooting EC2 instances
- Once the EC2 instance is launched, AWS performs status checks
- What are the status checks actually checking in EC2?
- System Status Checks:
- Hardware failures (the underlying host, not the guest)
- When they occur, the resolution is to stop/start or reboot the instance so it lands on healthy hardware, or raise an AWS support ticket if it keeps recurring.
- Instance Status Checks:
- Unable to boot
- Firewall issues
- Resolutions:
- Check the file /etc/fstab
- Check the network firewall rules
- EC2 Serial Console:
- AWS supports a serial console, which is a low-level interface with boot logs; connecting to it is like connecting to the instance at boot, without going through the network
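The resolutions listed above (restart on a system check failure, reboot or investigate on an instance check failure) can be automated with CloudWatch alarms that carry EC2 actions. This is an added example, not part of the original notes; it assumes `aws_instance.ec2_instance` from the Terraform script above and an SNS topic ARN supplied through an assumed variable `alert_topic_arn` so an operator is told each time an action fires.

```hcl
# Added example, not from the original notes. Assumes aws_instance.ec2_instance
# from the script above; var.alert_topic_arn is an assumed input variable.

variable "alert_topic_arn" {
  description = "SNS topic that receives status-check alarm notifications (assumed)"
  type        = string
}

data "aws_region" "current" {}

# System status check = host-level failure: the EC2 "recover" action moves the
# instance to healthy hardware (same effect as a stop/start, same instance ID).
resource "aws_cloudwatch_metric_alarm" "system_check" {
  alarm_name          = "ec2-system-status-failed-${aws_instance.ec2_instance.id}"
  namespace           = "AWS/EC2"
  metric_name         = "StatusCheckFailed_System"
  dimensions          = { InstanceId = aws_instance.ec2_instance.id }
  statistic           = "Maximum"
  period              = 60
  evaluation_periods  = 2 # two consecutive failed minutes before acting
  threshold           = 0
  comparison_operator = "GreaterThanThreshold"
  treat_missing_data  = "missing"
  alarm_actions = [
    "arn:aws:automate:${data.aws_region.current.name}:ec2:recover",
    var.alert_topic_arn,
  ]
}

# Instance status check = guest-level failure (boot, fstab, firewall): a reboot
# is the first automated step; persistent failures still need the serial console.
resource "aws_cloudwatch_metric_alarm" "instance_check" {
  alarm_name          = "ec2-instance-status-failed-${aws_instance.ec2_instance.id}"
  namespace           = "AWS/EC2"
  metric_name         = "StatusCheckFailed_Instance"
  dimensions          = { InstanceId = aws_instance.ec2_instance.id }
  statistic           = "Maximum"
  period              = 60
  evaluation_periods  = 3 # give the guest a little longer than the host check
  threshold           = 0
  comparison_operator = "GreaterThanThreshold"
  treat_missing_data  = "missing"
  alarm_actions = [
    "arn:aws:automate:${data.aws_region.current.name}:ec2:reboot",
    var.alert_topic_arn,
  ]
}
```

The recover action only applies to instance types and configurations that EC2 supports for automatic recovery (instance-store-backed instances, for example, are excluded), so check the instance type before relying on it; the SNS notification is there so that a recovery or reboot never happens silently.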
Troubleshooting Azure VMs
- For the basics, check Resource Health
- The Azure VM serial console is enabled by default; if an Azure VM is in a pending state, connect to the serial console and run low-level commands
- The serial log is also enabled by default; OS logs/diagnostics can be enabled separately.
