Amazon CloudWatch Agent - Collecting Metrics on EC2

The Amazon CloudWatch Agent is a lightweight and flexible monitoring agent provided by Amazon Web Services (AWS) that allows you to collect and publish system-level metrics, logs, and custom metrics from your EC2 instances to Amazon CloudWatch.

This agent simplifies the process of monitoring your infrastructure and applications running on EC2 instances, providing you with valuable insights and enabling you to take proactive actions to optimize performance and troubleshoot issues.

The article explains the important role of the CloudWatch agent for monitoring EC2 instances, and how to install and configure it on Amazon Linux 2. It also covers advanced features, best practices, and optimization techniques for using the agent.

Introduction

The Amazon CloudWatch agent is essential for monitoring and optimizing your EC2 instances. By installing and configuring the agent, you gain the ability to collect and publish system-level metrics, logs, and custom metrics to Amazon CloudWatch.

This allows you to gain valuable insights into the health and performance of your infrastructure, enabling you to proactively manage and troubleshoot issues.

Importance of Monitoring and Collecting Metrics

Without the CloudWatch Agent, you would miss out on real-time visibility into system-level metrics, logs, and custom metrics.

This means you would lack the ability to identify performance issues, optimize resource allocation, troubleshoot problems efficiently, and centralize logs for easy analysis. The agent also enables the collection of custom metrics, providing deeper insights into application behavior. Overall, not using the CloudWatch agent would result in a lack of comprehensive monitoring capabilities, hindering your ability to ensure the smooth operation of your EC2 instances.

Benefits of Using Amazon CloudWatch Agent

With the CloudWatch agent, you can easily monitor CPU utilization, memory usage, disk space, network utilization, and more. Additionally, you can centralize your logs in CloudWatch Logs, making it easier to search, analyze, and troubleshoot issues across your entire infrastructure.

The agent also supports the collection of custom metrics, giving you the flexibility to monitor application-specific metrics. By leveraging the CloudWatch agent, you can optimize performance, identify bottlenecks, set alarms, create dashboards, and integrate with other AWS services, ensuring the smooth operation of your EC2 instances.

Installation and Configuration

Using Amazon Linux 2, you can easily install the CloudWatch agent package using the package manager.

Prerequisites for Installing Amazon CloudWatch Agent

Make sure that the IAM role attached to the instance has the CloudWatchAgentServerPolicy attached. This permission is required to use the Amazon CloudWatch Agent on your servers.

If you're not too familiar with AWS IAM, have a read of our in-depth guides for roles and policies.
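If the role already exists, attaching the managed policy can be done with a single AWS CLI call. The following is a minimal sketch that wraps the call in a shell function. The function name attach_agent_policy and the role name in the usage line are placeholders, and the sketch assumes the AWS CLI is installed and configured with permission to modify IAM roles.

```shell
# Attach the AWS-managed CloudWatchAgentServerPolicy to an existing IAM role.
# $1: name of the role behind the instance profile (placeholder, supply your own).
attach_agent_policy() {
  aws iam attach-role-policy \
    --role-name "$1" \
    --policy-arn "arn:aws:iam::aws:policy/CloudWatchAgentServerPolicy"
}

# Usage (hypothetical role name):
#   attach_agent_policy "my-ec2-monitoring-role"
```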

Step-By-Step Installation Guide

Here are the steps to install the CloudWatch agent:

  1. Connect to your Amazon Linux 2 instance using SSH or any other remote access method.

  2. Update the installed packages by running the following command:

     sudo yum update -y
    
  3. Install the CloudWatch agent package by running the following command:

     sudo yum install -y amazon-cloudwatch-agent
    
  4. Once the installation is complete, you can configure the CloudWatch agent to collect and send metrics and logs to CloudWatch. The configuration file for the agent is located at /opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.json.

  5. Edit the configuration file using a text editor of your choice. You can specify the metrics and logs you want to collect, as well as the destination in CloudWatch where you want to send them.

  6. Save the changes to the configuration file.

  7. Start the CloudWatch agent by running the following command:

     sudo /opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent-ctl -a fetch-config -m ec2 -c file:/opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.json -s
    

    This command fetches the configuration from the specified file and starts the CloudWatch agent.

  8. Verify that the CloudWatch agent is running by checking its status:

     sudo /opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent-ctl \
         -m ec2 -a status
    

    If the agent is running successfully, you should see a message indicating its status.

That's it! You have successfully installed and configured the CloudWatch agent on your Amazon Linux 2 instance. Based on your configuration, it will start collecting and sending metrics and logs to CloudWatch.

For other operating systems like Windows Server, there are multiple installation methods available. You can download and install the agent from the command line using an Amazon S3 download link, use AWS Systems Manager, or use an AWS CloudFormation template.

Configuration Options and Customization

The Amazon CloudWatch agent provides various configuration options and customizations to tailor the monitoring experience for your EC2 instances. Here are some key features:

  1. Metrics Collection: You can configure the agent to collect system-level metrics such as CPU utilization, memory usage, disk space, network utilization, and more. Additionally, you can define and collect custom metrics specific to your applications.

  2. Logs Collection: The agent allows you to specify log files or log groups to collect and send to CloudWatch Logs. You can customize log file paths, patterns, and filters to capture specific log data.

  3. Log Streaming: The agent streams the collected logs to CloudWatch Logs. From there, subscription filters can forward them to an intermediary service like Amazon Kinesis Data Firehose for further processing or archival.

  4. Metric Filters: Once logs are in CloudWatch Logs, metric filters on the log group let you define rules to extract specific data from logs and create custom metrics based on that data.

  5. Alarms and Notifications: On top of the metrics the agent publishes, you can set up CloudWatch alarms based on metric thresholds and configure notifications to trigger actions when specific conditions are met. This enables proactive monitoring and alerting for critical events.

  6. Configuration File: The agent uses a configuration file that allows you to define settings such as metrics, logs, log file paths, log patterns, and more. You can modify this file to customize the agent's behavior.

These configuration options and customizations provide flexibility in tailoring the CloudWatch agent to your specific monitoring requirements, ensuring you capture the right metrics and logs for effective analysis and troubleshooting.

Collecting Metrics and Logs

To successfully run the CloudWatch agent on a server, it is essential to create a configuration file specifically for the agent. This file will contain all the necessary settings and parameters required for the agent to function properly.

The CloudWatch Agent Configuration File

The JSON file used for agent configuration contains specifications for everything the agent is responsible for collecting. This includes:

  • default and custom metrics

  • logs

  • traces

There are two ways to create this file:

  • using the wizard, or

  • creating it manually

The wizard provides a user-friendly interface for creating the configuration file, while manual creation allows for more control over the collected metrics and the ability to specify metrics not available through the wizard.

If you choose to create or modify the file manually, the process becomes more complex but grants greater flexibility. It is recommended to initially use the wizard to create the configuration file and then make manual modifications as needed.

After making changes to the agent configuration file, it is necessary to restart the agent for the changes to take effect.
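A hand-written configuration can be quite small. The sketch below prints one that collects a few CPU and disk metrics for the root volume. The function name print_minimal_agent_config and the target file name my-config.json are assumptions made for illustration, and the selection of metrics is only an example.

```shell
# Print a small agent configuration to stdout.
# The heredoc delimiter is quoted, so ${aws:InstanceId} is written literally
# and resolved later by the agent, not by the shell.
print_minimal_agent_config() {
  cat <<'EOF'
{
  "agent": {
    "metrics_collection_interval": 60
  },
  "metrics": {
    "append_dimensions": {
      "InstanceId": "${aws:InstanceId}"
    },
    "metrics_collected": {
      "cpu": {
        "measurement": ["cpu_usage_active", "cpu_usage_idle"],
        "totalcpu": true
      },
      "disk": {
        "measurement": ["free", "used"],
        "resources": ["/"]
      }
    }
  }
}
EOF
}

# Usage (hypothetical file name), followed by a fetch-config run to load it:
#   print_minimal_agent_config | sudo tee /opt/aws/amazon-cloudwatch-agent/etc/my-config.json
```

Because no namespace is set, the metrics should appear under the agent's default CWAgent namespace.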

When using the wizard, you simply start it with the following command:

sudo /opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent-config-wizard

It will guide you through the process and ask a few questions, including:

  1. Are you installing the agent on an EC2 instance or on-premises?

  2. Is the server running Linux or Windows Server?

  3. Do you want to send log files to CloudWatch Logs? If yes, do you have a configuration file?

  4. What retention period do you want for log files?

  5. Do you want to monitor default metrics or customize the list?

The wizard will automatically detect credentials in your credentials file (~/.aws/credentials). It will specifically search for a profile named AmazonCloudWatchAgent, but you can choose which credentials to use.

After completing the wizard, the configuration file can be found at /opt/aws/amazon-cloudwatch-agent/bin/config.json on Amazon Linux. If you've created it manually, it may have another name and location.

Starting the Agent

To start the CloudWatch agent on an Amazon EC2 instance using the command line, use the following command:

sudo /opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent-ctl \
    -a fetch-config -m ec2 -s \
    -c file:/opt/aws/amazon-cloudwatch-agent/bin/config.json

Replace the path after file: with the location of your own configuration file if it differs.

Collecting Metrics

By default, the CloudWatch agent collects a lot of metrics you can use out of the box. This includes, but is not limited to:

Metric | Description | Unit
------ | ----------- | ----
cpu_time_active | Time the CPU is active in any capacity, measured in hundredths of a second. | None
cpu_time_guest | Time the CPU is running a virtual CPU for a guest OS, measured in hundredths of a second. | None
cpu_time_idle | Time the CPU is idle, measured in hundredths of a second. | None
cpu_time_iowait | Time the CPU is waiting for I/O operations to complete, measured in hundredths of a second. | None
cpu_time_system | Time the CPU is in system mode, measured in hundredths of a second. | None
cpu_usage_active | Percentage of time that the CPU is active in any capacity. | Percent
cpu_usage_idle | Percentage of time that the CPU is idle. | Percent
cpu_usage_system | Percentage of time that the CPU is in system mode. | Percent
disk_free | Free space on the disks. | Bytes
disk_used | Used space on the disks. | Bytes

These metrics are crucial for monitoring the performance of a system. They provide insights into the CPU's activity, its idle time, the system mode, and the disk usage.

All of these metrics are visually displayed in the CloudWatch metrics section. You can also create your own Amazon CloudWatch dashboard and add custom visualizations based on your most important metrics.

We are not confined to predefined metrics; we can also develop custom metrics, designed specifically for any particular use case.
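One common route for custom metrics is the agent's StatsD listener: the application sends small text lines over UDP and the agent forwards them to CloudWatch. The sketch below only formats such a line. The function name statsd_counter and the metric name myapp.orders are hypothetical, and the usage comment assumes that the StatsD section is enabled in your agent configuration and that a netcat with UDP support is installed.

```shell
# Format one StatsD counter line, for example "myapp.orders:1|c".
# $1: metric name (hypothetical), $2: increment value.
statsd_counter() {
  printf '%s:%s|c\n' "$1" "$2"
}

# Usage, assuming the agent configuration contains
#   "metrics_collected": { "statsd": { "service_address": ":8125" } }
# inside its "metrics" section:
#   statsd_counter "myapp.orders" 1 | nc -u -w1 127.0.0.1 8125
```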

Setting up Log Collection

In your CloudWatch agent configuration file, you can specify which logs should be forwarded to CloudWatch.

As an example, we'll have a look at the following configuration:

{
  "agent": {
    "metrics_collection_interval": 60,
    "run_as_user": "root"
  },
  "logs": {
    "logs_collected": {
      "files": {
        "collect_list": [
          {
            "file_path": "/var/logs/application.log",
            "log_group_name": "/aws/ec2/my-app/application",
            "log_stream_name": "{instance_id}",
            "filters": [
              {
                "type": "include",
                "expression": "(.*)"
              }
            ]
          },
          {
            "file_path": "/var/log/httpd/access_log",
            "log_group_name": "/aws/ec2/my-app/access-errors",
            "log_stream_name": "{instance_id}",
            "filters": [
              {
                "type": "include",
                "expression": "(\\\"\\s(4|5)\\d{2})"
              }
            ]
          }
        ]
      }
    }
  }
}

Let's have a look at the collect_list section, which defines an array of log files to be collected. Each log file has the following properties:

  • file_path - The path to the log file on the instance.

  • log_group_name - The name of the log group in CloudWatch Logs where the logs will be stored.

  • log_stream_name - The name of the log stream in CloudWatch Logs where the logs will be stored. {instance_id} is a placeholder that will be replaced with the actual instance ID.

  • filters - An array of filters to apply to the log data. In this example, there is one filter specified for each log file. The type can be set to include, meaning that only log entries matching the filter expression are sent to the log stream, or to exclude, meaning that matching entries are dropped. The expression is a regular expression that defines the filter criteria.

With this configuration, we are collecting data from two log files: application.log, which is generated by our application, and access_log, which is written by httpd and records successful and unsuccessful HTTP requests.

The matching entries will now be forwarded to CloudWatch, allowing us to browse them whenever necessary. Using CloudWatch Logs Insights, we can also employ sophisticated filters to locate pertinent information or extract fine-grained data when we need it.
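Queries can also be started from the command line. The sketch below starts a query over the last hour of a log group with the AWS CLI. The function name start_recent_errors_query is a placeholder and the query string is only an illustration that looks for 5xx status codes, so adapt it to your own log format.

```shell
# Start a CloudWatch Logs Insights query over the last hour of a log group.
# $1: log group name. Prints the CLI response, which contains the query ID.
start_recent_errors_query() {
  now=$(date +%s)
  aws logs start-query \
    --log-group-name "$1" \
    --start-time "$((now - 3600))" \
    --end-time "$now" \
    --query-string 'fields @timestamp, @message | filter @message like /" 5[0-9]{2}/ | sort @timestamp desc | limit 20'
}

# Usage (log group name taken from the example configuration):
#   start_recent_errors_query "/aws/ec2/my-app/access-errors"
# Fetch the results afterwards with:
#   aws logs get-query-results --query-id <query-id>
```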

Filtering and Transforming Log Data

The example already shows how to filter for specific data: each log file can have one or more filters, each of type include or exclude. Take the filter for the access logs: its expression property is the regular expression (\"\s(4|5)\d{2}).

This matches a double quote followed by whitespace and a three-digit number starting with 4 or 5, which in the access log is the HTTP status code that follows the quoted request line. This means that log entries with HTTP status codes indicating client or server errors will be included in the log stream.

By using these filters and expressions, you can selectively include or exclude log entries based on specific patterns or criteria. This allows you to filter out irrelevant or sensitive data and focus on the log entries that are most important for monitoring and analysis.

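Note that the agent's filters only decide whether an entry is forwarded; they do not rewrite it. To extract specific fields or change the format of log entries, you typically process the data after it reaches CloudWatch Logs, for example with CloudWatch Logs Insights queries or metric filters.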
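Before deploying a filter, it can help to try the expression against a few sample lines locally. The sketch below approximates the access log filter in POSIX extended regular expressions, because grep -E does not understand \s and \d. The function name matches_error_filter and the two sample log lines are made up for illustration, and the agent's own regular expression engine may differ in details.

```shell
# Return success if a log line would pass the example "include" filter.
# Agent expression: (\"\s(4|5)\d{2}), approximated here in POSIX ERE.
matches_error_filter() {
  printf '%s\n' "$1" | grep -Eq '"[[:space:]](4|5)[0-9]{2}'
}

# Made-up sample lines in the common access log format.
ok_line='203.0.113.7 - - [10/Oct/2023:13:55:36 +0000] "GET /index.html HTTP/1.1" 200 1024'
err_line='203.0.113.7 - - [10/Oct/2023:13:55:40 +0000] "GET /missing HTTP/1.1" 404 196'

matches_error_filter "$ok_line"  && echo "forwarded: 200" || echo "dropped: 200"
matches_error_filter "$err_line" && echo "forwarded: 404" || echo "dropped: 404"
```

With these two samples, the script prints "dropped: 200" followed by "forwarded: 404".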

Advanced Features

Additional features can significantly improve your experience with the CloudWatch Agent. Two noteworthy aspects include high-resolution metrics and the ability to utilize AWS Systems Manager for storing configuration files.

Enabling High-Resolution Metrics

The Amazon CloudWatch Agent provides the ability to collect high-resolution metrics from your EC2 instances and on-premises servers. Without the agent, EC2 publishes its built-in metrics at a five-minute interval (one minute with detailed monitoring). The agent collects its metrics every 60 seconds by default, and you can lower that interval below one minute, down to one second, to publish high-resolution metrics. This allows you to monitor your resources with greater granularity and detect any performance issues more quickly.

Enabling high-resolution metrics is a straightforward process. You set metrics_collection_interval in the agent's configuration file to a value below 60 seconds, either globally in the agent section or for an individual metric type. Once enabled, the agent will start collecting metrics at the desired resolution and send them to CloudWatch for further analysis and visualization.
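In the configuration file, the resolution is controlled by metrics_collection_interval, and the AWS documentation treats values below 60 seconds as high-resolution. The sketch below prints a fragment that collects one CPU metric every 10 seconds. The function name print_high_resolution_config is a placeholder and the 10-second interval is an arbitrary example. Keep in mind that high-resolution metrics are billed as custom metrics and a shorter interval means more data points.

```shell
# Print a configuration fragment that collects cpu_usage_active every 10 seconds.
# The per-metric interval overrides the global one from the "agent" section.
print_high_resolution_config() {
  cat <<'EOF'
{
  "metrics": {
    "metrics_collected": {
      "cpu": {
        "measurement": ["cpu_usage_active"],
        "metrics_collection_interval": 10,
        "totalcpu": true
      }
    }
  }
}
EOF
}

# Usage: merge the "cpu" section into your existing configuration file,
# then reload the agent with its fetch-config command.
#   print_high_resolution_config
```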

Managing Agent Configurations with AWS Systems Manager

Managing the configurations of multiple CloudWatch Agents can be a challenging task, especially when dealing with a large number of instances. AWS Systems Manager provides a solution to this problem by offering a centralized management approach for agent configurations.

With AWS Systems Manager, you can store the desired agent configuration as a parameter in Parameter Store. Parameters keep a version history, allowing you to easily roll back to previous configurations if needed. You can also apply these configurations to multiple instances simultaneously, for example with Run Command, ensuring consistency across your environment.

Furthermore, AWS Systems Manager provides a user-friendly interface for managing agent configurations, making it easy to update, deploy, and monitor changes. You can also schedule configuration updates to occur at specific times, reducing the impact on your resources during peak usage periods.
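As a rough sketch of that workflow from the command line, the two functions below store a local configuration file in Parameter Store and load it on an instance. The function names, the parameter name, and the file path are placeholders. The parameter name keeps the AmazonCloudWatch- prefix because the managed agent policy is scoped to parameters with that prefix; verify this against the policy attached to your own role.

```shell
# Store a local agent configuration file in Systems Manager Parameter Store.
# $1: parameter name, $2: path to the JSON configuration file.
store_agent_config() {
  aws ssm put-parameter \
    --name "$1" \
    --type String \
    --overwrite \
    --value "file://$2"
}

# On an instance: load the configuration from the parameter and (re)start the agent.
# $1: parameter name.
apply_agent_config() {
  sudo /opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent-ctl \
    -a fetch-config -m ec2 -s -c "ssm:$1"
}

# Usage (hypothetical parameter name and file):
#   store_agent_config "AmazonCloudWatch-linux-demo" ./config.json
#   apply_agent_config "AmazonCloudWatch-linux-demo"
```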

Best Practices and Optimization

Simply forwarding logs and gathering metrics won't be beneficial unless you utilize them.

Setting up Alarms and Notifications

Setting up alarms and notifications is a crucial aspect of monitoring your resources effectively with the Amazon CloudWatch Agent. Alarms allow you to define thresholds for specific metrics and trigger notifications when those thresholds are breached. This enables you to proactively respond to any performance issues or anomalies in your environment.

To set up alarms and notifications, you can use the CloudWatch console or the AWS Command Line Interface (CLI). You can define the conditions for triggering an alarm, such as CPU utilization exceeding a certain percentage or disk space utilization reaching a specific threshold. Once an alarm is triggered, you can configure notifications to be sent via email, SMS, or other supported channels.

By setting up alarms and notifications, you can stay informed about the health and performance of your resources in real-time. This allows you to take immediate action and resolve any issues before they impact your applications or users.
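With the AWS CLI, such an alarm could look like the sketch below, which notifies an Amazon SNS topic when the agent's cpu_usage_active metric averages above 80 percent for two five-minute periods. The function name, threshold, and periods are illustrative choices. The single InstanceId dimension is an assumption: an alarm only fires if its dimensions exactly match the ones your agent publishes, so check the metric in the console first.

```shell
# Create an alarm on the agent's cpu_usage_active metric for one instance.
# $1: instance ID, $2: ARN of the SNS topic to notify.
create_cpu_alarm() {
  aws cloudwatch put-metric-alarm \
    --alarm-name "high-cpu-$1" \
    --namespace "CWAgent" \
    --metric-name "cpu_usage_active" \
    --dimensions "Name=InstanceId,Value=$1" \
    --statistic Average \
    --period 300 \
    --evaluation-periods 2 \
    --threshold 80 \
    --comparison-operator GreaterThanThreshold \
    --alarm-actions "$2"
}

# Usage (hypothetical instance ID and topic ARN):
#   create_cpu_alarm "i-0123456789abcdef0" "arn:aws:sns:us-east-1:123456789012:ops-alerts"
```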

Implementing Automated Scaling Based on Metrics

Automated scaling based on metrics is a powerful feature that builds on the metrics CloudWatch receives, including those published by the CloudWatch Agent. It allows you to dynamically adjust the capacity of your resources based on predefined thresholds, ensuring optimal performance and cost efficiency.

With automated scaling, you can define scaling policies that specify how your resources should scale in response to changes in specific metrics. For example, you can configure a scaling policy to add additional instances when CPU utilization exceeds a certain threshold and remove instances when CPU utilization drops below another threshold.

To implement automated scaling, you can use AWS Auto Scaling, which integrates seamlessly with CloudWatch. AWS Auto Scaling allows you to define scaling policies and target tracking configurations that automatically adjust the number of instances based on the desired metric values.

By implementing automated scaling, you can ensure that your resources are always right-sized to handle the workload efficiently. This not only improves the performance of your applications but also helps optimize costs by scaling resources up or down as needed.
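A target tracking policy can be attached to an existing Auto Scaling group with one CLI call, sketched below. The function name and the group name in the usage line are placeholders. For simplicity the sketch uses the predefined ASGAverageCPUUtilization metric rather than a metric published by the agent; tracking an agent metric would require a customized metric specification instead.

```shell
# Attach a target tracking policy to an existing Auto Scaling group.
# $1: Auto Scaling group name, $2: target average CPU utilization in percent.
put_cpu_target_tracking_policy() {
  aws autoscaling put-scaling-policy \
    --auto-scaling-group-name "$1" \
    --policy-name "cpu-target-$2" \
    --policy-type TargetTrackingScaling \
    --target-tracking-configuration \
      "{\"PredefinedMetricSpecification\":{\"PredefinedMetricType\":\"ASGAverageCPUUtilization\"},\"TargetValue\":$2}"
}

# Usage (hypothetical group name, 50 percent target):
#   put_cpu_target_tracking_policy "my-app-asg" 50
```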

Important Disclaimer: Keep an Eye on Data Ingestion and Retention

CloudWatch can become expensive very quickly. Setting proper retention periods and avoiding excessive data ingestion in CloudWatch is important primarily to control costs.

  • Log Retention: CloudWatch charges for the amount of data ingested and stored. By setting appropriate retention periods, you can ensure that you retain logs for the necessary duration without incurring unnecessary costs for retaining logs that are no longer needed. This helps optimize your spending on CloudWatch.

  • Data Volume: The more data you ingest into CloudWatch, the higher the costs will be. By avoiding excessive data ingestion, you can prevent unnecessary charges from accumulating. This is particularly important if you have a high volume of logs or if your applications generate a large amount of data.

  • Data Filtering: Not all logs and data have the same level of importance or need to be retained for the same duration. Use the agent's include and exclude filters to drop less critical entries before they are ingested, and give less critical log groups a shorter retention period. This reduces the amount of data being ingested and stored, resulting in cost savings.
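Retention is set per log group. A minimal sketch with the AWS CLI follows. The function name set_log_retention is a placeholder, the log group name comes from the example configuration, and the 30-day value is only an example. CloudWatch Logs accepts only specific retention values, such as 7, 30, or 90 days.

```shell
# Set how long CloudWatch Logs keeps events in a log group.
# $1: log group name, $2: retention in days (must be one of the values CloudWatch accepts).
set_log_retention() {
  aws logs put-retention-policy \
    --log-group-name "$1" \
    --retention-in-days "$2"
}

# Usage:
#   set_log_retention "/aws/ec2/my-app/application" 30
```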

Conclusion

In summary, the Amazon CloudWatch Agent is an essential tool for monitoring and optimizing your EC2 instances. By installing and configuring the agent, you can collect system-level metrics, logs, and custom metrics, providing valuable insights into your infrastructure's health and performance.

With high-resolution metrics from the agent, combined with CloudWatch alarms, notifications, and metric-based automated scaling, you can proactively manage and troubleshoot issues, ensuring the smooth operation of your EC2 instances.

Further Resources for CloudWatch

We're highly interested in CloudWatch, as we believe it's an essential service alongside AWS IAM, being a part of every application. Frequently, it is overlooked and undervalued.

If you'd like to learn more about CloudWatch, check out the related articles on our blog.

Frequently Asked Questions

  1. What is the Amazon CloudWatch Agent?
    The Amazon CloudWatch Agent is a monitoring agent provided by AWS that collects and publishes system-level metrics, logs, and custom metrics from EC2 instances to Amazon CloudWatch.

  2. How do I install the CloudWatch Agent on Amazon Linux 2?
    Connect to your Amazon Linux 2 instance, update the installed packages, install the CloudWatch agent package, configure the agent using the configuration file, and start the agent using the provided commands.

  3. Can I collect custom metrics with the CloudWatch Agent?
    Yes, the CloudWatch Agent allows you to define and collect custom metrics specific to your applications, providing deeper insights into application behavior.

  4. How do I set up log collection with the CloudWatch Agent?
    In the agent's configuration file, specify which logs should be forwarded to CloudWatch by defining log files, log group names, log stream names, and filters for log data.

  5. What is automated scaling based on metrics?
    Automated scaling based on metrics is a feature that allows you to dynamically adjust the capacity of your resources based on predefined thresholds, ensuring optimal performance and cost efficiency. This can be achieved using AWS Auto Scaling integrated with CloudWatch.
