Visual Regression Testing with AWS CloudWatch Canaries

Keeping your website consistently up and performing well is essential, not optional.

For those using AWS, CloudWatch Synthetic Canaries offer a smart way to monitor any site or API. This helps in troubleshooting and maintaining the health of your site.

In this article, we'll explore how to deploy an application using SST Ion and monitor it using AWS Synthetic Canaries.

As always, this article comes with a repository that is ready to be deployed into your own AWS account so you can explore and learn.

The Application Setup

The application is pretty straightforward. It's a simple web app built with Astro.

Diagram illustrating a user interacting with an AWS account, showing data flow through CloudFront, S3, and CloudWatch services.

We’re using SST Ion, a flexible infrastructure as code (IaC) framework that's perfect for deploying modern web application frameworks.

SST Ion simplifies deploying cloud applications, allowing developers to focus more on code and less on infrastructure management.

SST vs. SST Ion: Ion is the new engine of SST. It's built on top of Pulumi and Terraform, which makes it easy for us to extend its functionality and build our own "stacks". You can read more here.

For our example app, which showcases how Canaries work at AWS, you can check out our GitHub repository.

Here’s a quick overview of how to deploy this site using SST:

  1. Clone the Repository: Start by cloning the repository to your local machine.

     git clone https://github.com/awsfundamentals-hq/cloudwatch-synthetics-visual-regression-tests
     cd cloudwatch-synthetics-visual-regression-tests
    
  2. Install Dependencies: Ensure SST Ion is installed and install all the dependencies for the app itself.

     curl -fsSL https://ion.sst.dev/install | bash
     npm install
    
     # alternatively, you can also use a package manager like homebrew
     brew install sst/tap/sst
     brew upgrade sst
    
  3. Deploy the Application: Use SST to deploy your application to AWS.

     sst deploy
    
     # Note: make sure the correct AWS CLI profile is active before you deploy
    

The deployment builds the application and deploys it to AWS using S3 and CloudFront. Our script also deploys the canary into your AWS account and starts it.

After deploying, SST will provide a URL where your site is hosted. This site serves as a perfect candidate for our monitoring tutorial using AWS Synthetic Canaries.

Monitoring with AWS Synthetic Canaries

Now, let’s talk about monitoring. AWS Synthetic Canaries are not just for checking APIs but are incredibly effective for monitoring websites.

This is because canaries can collect screenshots and compare them with each other.

The script is very simple:

const synthetics = require('Synthetics');
const log = require('SyntheticsLogger');
const config = synthetics.getConfiguration();

const title = 'visual-regression-check';

config.disableStepScreenshots();
config.withVisualCompareWithBaseRun(true);
config.withVisualVarianceThresholdPercentage(0);

const takeScreenshot = async () => {
  await synthetics.executeStep(title, async () => {
    let page = await synthetics.getPage();
    await page.goto(process.env.SITE_URL, { waitUntil: 'networkidle0' });
    await new Promise((r) => setTimeout(r, 2500));
    let pageTitle = await page.title();
    log.info('Page title: ' + pageTitle);
    await synthetics.takeScreenshot(title, 'loaded');
  });
};

exports.handler = async () => {
  await takeScreenshot();
};

The only thing the script needs to do is:

  1. Enable the visual comparison mode by setting the necessary configuration flags (visualCompareWithBaseRun and visualVarianceThresholdPercentage).

  2. Go to the site passed via the environment variable SITE_URL.

  3. Wait for it to fully load (waitUntil: networkidle0, followed by a fixed 2.5-second pause).

  4. Take a screenshot.
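
The script passes process.env.SITE_URL straight to page.goto, so a missing or malformed value only surfaces as a navigation error. A minimal sketch of a guard that fails early with a clearer message could look like this. The helper name resolveSiteUrl is hypothetical and not part of the repository; it only relies on the URL class built into Node.js.

```javascript
// Hypothetical helper (not part of the repository): validate SITE_URL
// before handing it to page.goto().
function resolveSiteUrl(env = process.env) {
  const raw = env.SITE_URL;
  if (!raw) {
    throw new Error('SITE_URL is not set');
  }

  let url;
  try {
    url = new URL(raw); // throws a TypeError on malformed input
  } catch (err) {
    throw new Error(`SITE_URL is not a valid URL: ${raw}`);
  }

  // Only web URLs make sense for a browser-based canary.
  if (url.protocol !== 'https:' && url.protocol !== 'http:') {
    throw new Error(`SITE_URL must use http or https, got ${url.protocol}`);
  }
  return url.href;
}
```

Inside the canary, the call would then become page.goto(resolveSiteUrl(), { waitUntil: 'networkidle0' }).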

If you jump to the CloudWatch console, you'll find our new canary in the section Application Signals > Synthetic Canaries.

Screenshot of a CloudWatch Synthetics Overview dashboard displaying the status of canary runs, with a pie chart showing one passed test and a timeline graph indicating test results over time. The sidebar highlights "Application Signals" and "Synthetics Canaries" options.

The first run will collect a baseline and store it in S3. Further runs will then compare this initial screenshot to the current one.

By default, after deploying with SST, the canaries will start to run. The first run will be triggered immediately, but you can always start or stop the canary in the console via the Actions dropdown.

A screenshot showing a success message for starting the canary visual-regressions in a CloudWatch Synthetics Canaries interface, with options to view in service map, perform actions, and set auto-refresh timing.

Once a run has completed, we'll see it in the Availability tab.

Screenshot of a software interface showing a "Canary runs" graph with 100% availability at 7:14 AM on May 3, 2024, and a section indicating "No issues" found in the selected time range.

There we'll also find the Screenshots tab. This will show us the screenshots (in our case just one) that the canary has collected.

Screenshot of a webpage discussing "Canaries" as automated scripts in cloud monitoring, with a diagram explaining the process and a navigation bar showing a "Step passed" notification for a visual regression check.

If this was the first run, this first screenshot will be set as the baseline. Each of our further runs will be compared with this screenshot to detect any (unwanted) changes.

Let's test exactly that by changing our frontend's content. We can do this, for example, by changing our content's max-width to a much smaller value:

main {
  /* 200px instead of 800px */
  max-width: 200px;
  margin: auto;
  padding: 20px;
}

This change will cause the comparison to fail. So let's deploy our changes to AWS with sst deploy and wait for the next run.

After the run has finished, we should see the failure.

A screenshot of a monitoring dashboard showing "Canary runs" with a graph displaying data points indicating pass and fail statuses over time. A highlighted issue box reports a "Visual variation of 13.21% detected for screenshot" with a timestamp of May 3, 2024, at 7:46 AM.

CloudWatch has detected the change. Another look at the Screenshots tab shows us the expected result. CloudWatch also provides us with another screenshot, in which the actual differences are highlighted in yellow:

A screenshot showing a user interface for a visual regression check tool, indicating a step failure due to a 13.21% visual variance. The interface includes details about the test, links, and a timestamp of the last update.

In our example change, the actual difference was 13.21%. The threshold at which CloudWatch starts to fail the check can be set via the configuration property visualVarianceThresholdPercentage. In our example, we went with 0 to fail at any change. In a real-world scenario, this value should be higher as very small layout adjustments shouldn't result in a failed regression test.
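
To make the threshold more concrete, here is a simplified sketch of how a variance percentage could be derived from two screenshots and checked against a threshold. This is an illustration under stated assumptions, not CloudWatch's actual algorithm: the function names are hypothetical, the images are assumed to be raw RGBA buffers of equal size, and a run is assumed to pass while the variance does not exceed the threshold.

```javascript
// Simplified illustration only: the real comparison happens inside the
// Synthetics runtime. All names here are hypothetical.

// Returns the percentage of pixels that differ between two RGBA buffers
// of identical dimensions (4 bytes per pixel).
function visualVariancePercentage(baseline, current) {
  if (baseline.length !== current.length || baseline.length % 4 !== 0) {
    throw new Error('Buffers must be RGBA data of the same size');
  }
  const totalPixels = baseline.length / 4;
  if (totalPixels === 0) return 0;

  let changedPixels = 0;
  for (let i = 0; i < baseline.length; i += 4) {
    const differs =
      baseline[i] !== current[i] ||
      baseline[i + 1] !== current[i + 1] ||
      baseline[i + 2] !== current[i + 2] ||
      baseline[i + 3] !== current[i + 3];
    if (differs) changedPixels += 1;
  }
  return (changedPixels / totalPixels) * 100;
}

// Assumption: a run passes while the variance does not exceed the threshold.
function passesVisualCheck(variance, thresholdPercentage) {
  return variance <= thresholdPercentage;
}

// Example: a 2x2 white image in which one of the four pixels changes.
const baseline = Uint8Array.from([
  255, 255, 255, 255, 255, 255, 255, 255,
  255, 255, 255, 255, 255, 255, 255, 255,
]);
const current = Uint8Array.from(baseline);
current[0] = 0; // alter the red channel of the first pixel

const variance = visualVariancePercentage(baseline, current);
console.log(variance); // 25
console.log(passesVisualCheck(variance, 0)); // false
console.log(passesVisualCheck(variance, 30)); // true
```

With a threshold of 0, as in our canary, even a single changed pixel fails the check. A higher threshold tolerates small shifts but would still catch a change as large as the 13.21% above, as long as the threshold stays below that value.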

Continuing with our failed check: what if this is exactly what we want it to be?

If we don't restore the page to the look and content it had in the first screenshot that CloudWatch took, every subsequent run will fail.

But we can also tell CloudWatch that this should now be the new baseline to compare against.

Let's click on Actions > Edit and scroll down to the Visual Monitoring section.

A screenshot of a web interface titled "visual-regressions" showing various sections including a summary of the latest run marked as "Failed", issues in a selected time range, and a success percentage. A dropdown menu with options like Start, Stop, Edit, Delete, and Clone is highlighted on the right side of the screen.

Here, we can enable CloudWatch to use the next run's screenshot to overwrite the existing base image.

A screenshot of a user interface for visual monitoring with options to edit baseline screenshots, set next run as new baseline, and export excluded regions.

With that, the next run becomes the new baseline and the following runs should be green again.

Line graph showing the results of canary runs over time, with most data points in blue indicating "Passed" and one red point at 8:00 AM indicating "Failed".
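
The baseline workflow described in this section (the first run sets the baseline, later runs are compared against it, and a one-time switch promotes the next run to the new baseline) can be sketched as a small in-memory model. The class below is hypothetical and only mirrors the behavior described here: screenshots are represented by plain strings, and the comparison is a strict equality check, which corresponds to a variance threshold of 0.

```javascript
// Hypothetical in-memory model of the baseline workflow; screenshots are
// represented by plain strings instead of image data.
class BaselineTracker {
  constructor() {
    this.baseline = null;
    this.useNextRunAsBaseline = false;
  }

  // Mirrors the console option that promotes the next run's screenshot.
  requestNewBaseline() {
    this.useNextRunAsBaseline = true;
  }

  // Returns 'BASELINE_SET', 'PASSED' or 'FAILED' for a single run.
  run(screenshot) {
    if (this.baseline === null || this.useNextRunAsBaseline) {
      this.baseline = screenshot;
      this.useNextRunAsBaseline = false; // the switch applies to one run only
      return 'BASELINE_SET';
    }
    return screenshot === this.baseline ? 'PASSED' : 'FAILED';
  }
}

const tracker = new BaselineTracker();
console.log(tracker.run('wide-layout')); // BASELINE_SET
console.log(tracker.run('wide-layout')); // PASSED
console.log(tracker.run('narrow-layout')); // FAILED
tracker.requestNewBaseline();
console.log(tracker.run('narrow-layout')); // BASELINE_SET
console.log(tracker.run('narrow-layout')); // PASSED
```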

Conclusion

By using AWS Synthetic Canaries, you’re not just monitoring the uptime of your site. You can also actively check for the expected look and feel. This helps to quickly spot and address visual issues in any web application.

This visual regression feature even comes with a blueprint, making it easy to get started.