In large-scale automation, a flaky test is a test that fails non-deterministically. Simply put, if the application under test both passes and fails the same test, that test is flaky: the results differ from run to run without any discernible pattern.
So, if you get a string of passes followed by a random failure (or a string of failures followed by a random pass), you are looking at a flaky test condition. Crucially, no changes were made to the code base between the attempts, which is what makes the behavior abnormal.
Large-scale automation testing has made the process faster and more comprehensive. However, there are scenarios where your test automation behaves unpredictably, and the reason could be test flakiness. As per Google, it takes around 1.5 times longer to fix a flaky test than a non-flaky one.
The causes of test flakiness can vary, ranging from concurrency issues to test environment instability. However, recognizing flaky tests in the early stages is essential for maintaining the health of your CI/CD pipeline.
Why Are Flaky Tests a Problem?
Let’s say you run a test and it fails, so you look into the code for changes. You find nothing, so you give it another try and re-run it without any changes. This time, out of nowhere, the test passes. After multiple runs, you conclude that, yes, it’s a flaky test condition. But the real problem remains: how will you know whether the code base is actually passing or failing? That’s where the danger lies.
According to a study by Microsoft, flaky tests cost the company more than a million dollars’ worth of developer time each year. In test automation, the ultimate goal is to exercise the code base in every way possible, so that developers, managers, and clients alike can trust the results and pass the build on with confidence.
In some cases, you might keep trying until the test passes and then deliver it to the client. However, if the software actually has an issue, it could harm both you and the company.
How to Identify Flaky Tests?
The most common sign that you are dealing with a flaky test is getting different test results without any significant change to the code. Flaky tests are inconsistent: sometimes they pass and sometimes they fail against the same codebase. This is different from failing tests, which fail consistently due to an issue in the code, the environment, or the test itself.
So, if you are getting alternating test results, that is a major sign that you have flaky tests to deal with. There are some other signs as well, like:
- Tests that rely on external dependencies, like databases, services, or networks, may exhibit intermittent failures. This indicates flakiness due to external dependencies.
- Sometimes, tests may fail only under specific timing conditions or when they are executed in a certain order, for example when execution is faster or slower than usual. This also indicates test flakiness.
- Another type of test flakiness is called branch flakiness. In this, the feature branch passes individually; however, as soon as it merges with the main branch, the test ultimately fails.
What Are the Causes of Flaky Tests?
Well, let’s come to the causes behind a flaky test. This matters because a survey suggests that around 50% of developers and testers experience flaky tests at least once a week. Here are some common causes of test flakiness:
1. Data Hardcoding
Data hardcoding is quite a common cause of test flakiness. Let’s say a test automates signing up, followed by other steps. If you have hardcoded the test data, the test may fail because using the same username for every signup will make the application under test reject the request: a user with that username has already been created.
Moreover, there can be other such scenarios as well. For example, in e-commerce test automation, checkout tests are quite common. In such tests, the goal is to check out an item successfully. However, if the same item is added to the cart every time, the test may start failing as soon as the item’s stock count reaches zero. In such cases, make sure that you do not hardcode data in tests.
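One simple guard is to generate unique test data per run instead of hardcoding it. A minimal sketch in Python; the signup step here is hypothetical, standing in for whatever your framework actually drives:

```python
import uuid

def unique_username(prefix: str = "testuser") -> str:
    """Generate a username that is unique on every test run."""
    return f"{prefix}-{uuid.uuid4().hex[:8]}"

# Hypothetical signup data: each run uses fresh values, so the
# application never rejects a duplicate username.
username = unique_username()
password = "S3cure!" + uuid.uuid4().hex[:6]
```

Because the data is regenerated on every run, repeated executions cannot collide with state left behind by earlier runs.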
2. Isolation of Test Resources in Parallel Testing
Parallel tests are run to speed up large-scale test automation. In such cases, the resources of an application under testing are used for multiple tests simultaneously. While this is a great way to complete testing faster, resource allocation can be an issue when performing parallel executions.
Several tests run simultaneously, and the resources are shared between them. For example, if a Login action needs to be performed, the clickable button should be available on time. However, if one test reaches the stage where it needs to click the button just as another test arrives at the same stage, resource contention can cause one or more tests to fail.
Here, automatic waiting is the best solution. Automatic waiting conditions like “page.WaitForSelector()” let you specify the element to wait for and the state it should be in before the test proceeds.
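The same idea can be sketched outside any browser framework as a small polling helper: instead of sleeping for a fixed interval, poll a condition until it holds or a timeout expires. This is a generic illustration, not Playwright’s own implementation:

```python
import threading
import time

def wait_for(condition, timeout: float = 5.0, interval: float = 0.05) -> bool:
    """Poll `condition` until it returns True or `timeout` elapses."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if condition():
            return True
        time.sleep(interval)
    return False

# Simulated element that becomes clickable shortly after the test starts.
state = {"clickable": False}
threading.Timer(0.2, lambda: state.update(clickable=True)).start()

ready = wait_for(lambda: state["clickable"], timeout=2.0)
assert ready  # proceeds as soon as the element is ready, with no fixed sleep
```

The helper returns as soon as the condition holds, so fast runs stay fast while slow runs still succeed within the timeout.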
3. Dynamic Content Management
Dynamic content, i.e., content that frequently changes on web applications due to the use of dynamic elements, can also cause test flakiness. Traditional locator strategies look for the exact same element every time. In many runs the element is found as expected, but when it changes slightly, being dynamic, the test fails, even though the application itself is behaving correctly.
However, you can use locator strategies where a subtle change in the element doesn’t break the test. For example, if an element’s label changes from “Buy” to “Buy Now,” a partial-text match on “Buy” will still find it; alternatively, you can anchor your locators to static elements.
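The tolerant-matching idea can be shown in plain Python. Here `find_button` is a hypothetical helper over a list of label strings, standing in for whatever locator API your framework provides:

```python
def find_button(labels, wanted: str):
    """Return the first label starting with `wanted`, tolerating
    suffix changes such as 'Buy' -> 'Buy Now'."""
    for label in labels:
        if label.startswith(wanted):
            return label
    return None

labels = ["Add to Cart", "Buy Now", "Checkout"]
match = find_button(labels, "Buy")  # still found after the label changed
```

An exact-match lookup for “Buy” would miss “Buy Now” and fail the test; the prefix match survives the change.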
4. State Clean-Up Before the Next Test
In any test automation, the final step of each test should be state clean-up. It clears the state left behind by the previous test so that the next one starts completely fresh. This includes clearing the browser’s local storage, cookies, etc., before starting the next test.
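A sketch of the idea with a simulated browser state; the `browser` dict here is a stand-in for real cookies and local storage:

```python
from contextlib import contextmanager

# Simulated browser state standing in for cookies/local storage.
browser = {"cookies": {}, "local_storage": {}}

@contextmanager
def clean_state():
    """Run a test body, then wipe shared state so the next test starts fresh."""
    try:
        yield browser
    finally:
        browser["cookies"].clear()
        browser["local_storage"].clear()

with clean_state() as b:
    b["cookies"]["session"] = "abc123"  # the test pollutes shared state...

assert browser["cookies"] == {}         # ...but it is wiped afterwards
```

Because the clean-up sits in a `finally` block, it runs even when the test body fails, so a crashing test cannot leak state into the next one.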
These are some of the common reasons why flaky tests occur. But how should you approach fixing them? First, determine whether the issue really is a flaky test; if it is, identify the cause. Let’s explore the approaches.
How to Fix the Flakiness?
Fixing flaky tests and managing the process requires a strategic approach: identifying root causes and then implementing remediation effectively. Let’s look at the top approaches you can use to handle flaky tests:
1. Find and Isolate the Cause
Given the randomness of flaky tests, finding the root cause is essential, and that’s not easy. A practical approach is a binary search: selectively run subsets of tests to pinpoint the particular test or environment condition causing the flakiness, gradually narrowing the scope until you find the problematic component.
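The narrowing-down step can be automated. A simplified sketch, assuming a single culprit whose mere presence makes a subset fail; `run_subset` here simulates such a suite and is not a real API:

```python
def find_culprit(tests, run_subset):
    """Binary-search the test list for the one test whose presence
    makes the subset fail (run_subset returns True on a passing run)."""
    lo, hi = 0, len(tests)
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if not run_subset(tests[lo:mid]):  # failure reproduces in first half
            hi = mid
        else:                              # culprit must be in second half
            lo = mid
    return tests[lo]

tests = [f"test_{i}" for i in range(8)]
run_subset = lambda subset: "test_5" not in subset  # passes unless culprit runs
culprit = find_culprit(tests, run_subset)
```

Each iteration halves the search space, so even a large suite needs only a logarithmic number of subset runs to isolate the offender.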
2. Analyze the Test Logs
It is always best practice to keep a detailed log of tests to capture the key steps, inputs, and outputs. This makes it easy to analyze and review the logs to identify patterns in failure. This information helps in isolating and addressing specific issues.
3. Isolate Flaky Tests From External Dependencies
Mock or stub the external services and dependencies during the tests to minimize reliance on live external systems. Keeping those tests separate from the variability of external components makes your testing environment more stable and predictable.
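For example, with Python’s standard `unittest.mock` you can replace a live HTTP call with a canned response. The `get_price` function and its injected `http_get` dependency are hypothetical, chosen only to illustrate the pattern:

```python
from unittest.mock import Mock

def get_price(http_get, item_id: str) -> float:
    """Fetch a price via an injected HTTP client (hypothetical API)."""
    response = http_get(f"/items/{item_id}")
    return response["price"]

# Stub the external service: the test no longer depends on a live system.
fake_http_get = Mock(return_value={"price": 9.99})
price = get_price(fake_http_get, "sku-42")

fake_http_get.assert_called_once_with("/items/sku-42")
```

Because the dependency is injected, the production code is unchanged while the test runs against a fully deterministic stand-in.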
4. Adjust Timing and Waits
Rather than relying on fixed time intervals, fine-tune the wait conditions using explicit waits based on specific conditions. Implement dynamic waiting strategies to synchronize with the application’s state changes.
5. Ensure a Clean Testing Environment
Include a pre-test environment configuration step to guarantee a consistent and clean state before testing. This ensures the testing environment itself has no issues, limiting your root-cause search to the code itself. For this, you can perform database resets, cache clearing, and similar actions to guarantee a reliable starting point.
6. Refactor Tests for Determinism
To fix flakiness, it is best to remove non-deterministic elements, like random data or dependencies on external states, from the tests. Test execution should be consistent, so ensure that tests set up and maintain the state they require.
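Random inputs are a typical non-deterministic element; pinning the seed makes them reproducible across runs. A minimal sketch, where the "order IDs" are illustrative test data:

```python
import random

def sample_order_ids(n: int, seed: int = 1234):
    """Deterministic 'random' data: same seed, same sequence, every run."""
    rng = random.Random(seed)  # private RNG, so other code can't disturb it
    return [rng.randint(1000, 9999) for _ in range(n)]

ids_run1 = sample_order_ids(5)
ids_run2 = sample_order_ids(5)
assert ids_run1 == ids_run2  # identical on every run
```

Using a private `random.Random` instance rather than the module-level functions also keeps the sequence stable even if other tests call `random` in between.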
7. Identify Concurrency Issues
Look for shared resources or critical sections that cause contention among the concurrent tests. To address the issue, implement a locking mechanism or other concurrency control strategy to prevent interference and ensure isolation.
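In Python, a `threading.Lock` around the critical section is the simplest such mechanism. A sketch with a shared counter that concurrent workers would otherwise race on:

```python
import threading

counter = 0
lock = threading.Lock()

def increment(times: int):
    global counter
    for _ in range(times):
        with lock:          # critical section: one thread at a time
            counter += 1

threads = [threading.Thread(target=increment, args=(10_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

assert counter == 40_000    # deterministic with the lock in place
```

Without the lock, interleaved read-modify-write operations can lose updates, and the final count varies between runs, which is exactly the kind of non-determinism that shows up as flakiness.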
Tools and Frameworks to Use to Find and Manage Flaky Tests
Finding and handling flaky tests is not easy. Hence, there are a range of tools and frameworks available that can be used, from the detection of these tests to their mitigation. Here are some tools that can be used for managing flakiness in testing automation:
- Flaky Test Handler (for JUnit): This plugin automatically reruns tests to distinguish flaky tests from tests that fail consistently.
- pytest-rerunfailures (for Python): This is a pytest plugin that reruns failed tests to detect flakiness.
- TestNG (for Java): This tool offers built-in support for rerunning failed tests to identify flaky tests.
- Jenkins: This tool supports the Flaky Test Handler plugin, which can help identify and manage flaky tests as part of the CI pipeline.
- Buildkite: This tool offers test analytics and enables automatic retrying of flaky tests with detailed reporting.
- Quarantine (various languages): Some CI (Continuous Integration) systems can be configured with a Quarantine feature that isolates the flaky tests from the testing suite until they are fixed.
- WireMock (for JVM): This tool mocks HTTP(S) services, which helps isolate tests from external dependencies that could cause flakiness.
- Mockito (for Java): This is a mocking framework that ensures unit tests focus on the code being tested rather than its external dependencies.
- Kubernetes: While it is not directly associated with resolving test flakiness, containerization tools like Docker and orchestration tools like Kubernetes help ensure consistency across testing environments.
- SonarQube: This tool offers static code analysis, which helps identify flaws in the code, such as reliance on unordered collections.
Wrapping Up
Test flakiness is quite common, but it should not be ignored, as it can lead to reliability issues. In most cases, test flakiness arises from factors that may not occur in real environments. However, it is crucial to identify the root cause and provide the client with a build that resolves all issues. Eliminating test flakiness in large-scale automation testing not only enhances reliability but also helps prevent similar problems in the future.