Sequential Tests Lead To The Wrong Decision
Management Summary
Are you aware of the disruptive factors that come with sequential tests? In contrast to “real” testing (in which the test variants run in parallel at the same time), a sequential test runs the variants one after the other, so many time-dependent factors can significantly influence the result. To name a few:
- Different user behavior depending on season, day of the week, etc.
- Events such as the Soccer World Cup, Christmas, etc.
- Campaigns run by competitors
- Weather (rain, sunshine, etc.)
- Different traffic
- …
For example, both your own campaigns and those of the competition can affect the test variants. The test users of variant A would see the campaign, but the test users of variant B would not, because the campaign is no longer running by the time variant B goes live.
What effects do temporal disturbance factors have?
To be clear: they can cause you to unintentionally choose a bad variant as the test winner. To illustrate this danger, here is a possible scenario:
Imagine that you want to test the layout of a Valentine’s Day campaign in the colors red (=control variant) and pink (=test variant) against each other to check which of the two colors better symbolizes this special day for the user and leads to more conversions.
In October, the campaign runs with the red layout and generates a conversion rate of 0.6%. In November, the pink layout runs and achieves a conversion rate of 2%. Even if the outcome already seems obvious, a quick check of the statistical significance using a significance calculator shows that the pink layout performs better with a probability of 99%. So it’s clear, right? Unfortunately not!
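For reference, the kind of check a significance calculator performs can be sketched as a one-sided two-proportion z-test. The visitor count of 5,000 per month is an assumption made for illustration (the example above gives only the conversion rates), and the function name `two_proportion_z_test` is hypothetical. The "confidence" value is 1 minus the one-sided p-value, which is roughly what many calculators report.

```python
import math

def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Return (z, confidence) for the claim that B converts better than A."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled conversion rate under the null hypothesis "no difference"
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Standard normal CDF at z, i.e. 1 minus the one-sided p-value
    confidence = 0.5 * (1 + math.erf(z / math.sqrt(2)))
    return z, confidence

visitors = 5000  # assumed traffic per month, not given in the example
red_conversions = 30    # 0.6% of 5,000 (red layout, October)
pink_conversions = 100  # 2% of 5,000 (pink layout, November)

z, confidence = two_proportion_z_test(red_conversions, visitors,
                                      pink_conversions, visitors)
print(f"z = {z:.2f}, confidence = {confidence:.4f}")
```

Under these assumed numbers the confidence is well above 99%. Note that the test only says the two measured rates differ; it cannot tell whether the layout or the month caused the difference.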
I’ll now show you what the result would have looked like if you had carried out a “real” test and tested the two variants against each other in parallel at the same time.
In this case, the pink variant would have been ahead of the red variant in October. In November, the result would have been reversed and the red variant would have had the higher conversion rate.
Sequential tests can quickly lead to distorted results due to temporal disruptive factors. A direct comparison of the test variants is not possible because they are not exposed to the same influencing factors. There is no guarantee that the same traffic will come to the test site. The result therefore cannot be attributed solely to the changed variants. In addition, you cannot tell in advance how long the test variants would have to run one after the other before the result is statistically meaningful.
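The Valentine's Day scenario can be written out as a small sketch. Only two of the four conversion rates come from the example (red in October at 0.6% and pink in November at 2%); the other two are assumed values chosen to match the parallel result described above, and the names `RATES`, `sequential_winner` and `parallel_winner` are hypothetical.

```python
# Conversion rate per (month, variant). Illustrative numbers only.
RATES = {
    ("October", "red"): 0.006,    # from the example
    ("October", "pink"): 0.008,   # assumed
    ("November", "red"): 0.024,   # assumed
    ("November", "pink"): 0.020,  # from the example
}

def sequential_winner():
    """Compare red in October with pink in November (sequential test)."""
    red = RATES[("October", "red")]
    pink = RATES[("November", "pink")]
    return "pink" if pink > red else "red"

def parallel_winner(month):
    """Compare both variants within the same month (parallel A/B test)."""
    red = RATES[(month, "red")]
    pink = RATES[(month, "pink")]
    return "pink" if pink > red else "red"

print("Sequential test winner:", sequential_winner())
for month in ("October", "November"):
    print(f"Parallel test winner in {month}:", parallel_winner(month))
```

With these numbers the sequential comparison declares pink the winner, while the parallel comparison shows pink ahead in October and red ahead in November. The large gap in the sequential test mostly reflects the difference between the two months, not the difference between the two layouts.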
Conclusion – Can I trust sequential testing?
To reduce temporal disruptive factors and thus the probability of errors, always test the different variants against each other at the same time in an A/B or multivariate test!
You can find out how this works in the article “Use conversion optimization to gain a competitive advantage”.