When Is Your Test Result Significant? The Statistics Behind the A/B Test

15.07.2015

Reading time: 9 min.


Management Summary

In this blog post, we show you how to calculate significance in three steps. It is always good to know how such a result comes about and what is behind it. That is why we take a look at how significance calculators work.

“Don’t trust any statistics you haven’t falsified yourself,” Winston Churchill most likely never said. Still, you often hear this sentence in everyday life when the validity of statistical results is questioned. And indeed: it never hurts to know how such a result came about, especially if you run A/B tests for conversion optimization yourself.

In the endless expanses of the Internet, there are numerous online calculators that determine whether the difference between the conversion rates of two test variants is significant. Our significance calculator also shows you at a glance which result is significant and which uplift or downlift could have occurred by chance.

However, we want to take a look at how these calculators work. In this spirit: “Don’t trust any statistics that you haven’t calculated yourself!”

Step 1: Observed and expected values

Most significance calculators use the chi² test (chi-square test, pronounced “kai-square”) for their calculation. How it works and how it is calculated is shown here using a fictitious example:

	Visitors	Conversions	Conversion rate
Variant A	9,998	1,001	10.01%
Variant B	10,002	1,087	10.87%

The conversion rates are obtained by dividing the observed conversions of a variant by the total number of visitors to that variant and multiplying the result by 100%. For variant A this gives a conversion rate of 10.01%, for variant B 10.87%. The relative amount by which variant B outperforms variant A is called the uplift. It is calculated as follows:

$$\text{Uplift} = \frac{CR_B - CR_A}{CR_A} \times 100\% = \frac{10.868\% - 10.012\%}{10.012\%} \times 100\% \approx 8.55\%$$

So variant B has an 8.55% higher conversion rate than variant A. However, this does not automatically mean that variant B is better than variant A. The difference in conversion rates can also arise by chance. This random deviation is called “statistical noise”.
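To make the arithmetic concrete, here is a minimal Python sketch of the conversion rate and uplift calculation. The helper names `conversion_rate` and `uplift` are our own, and the sketch assumes 10,002 visitors for variant B, as in the crosstab further down.

```python
# Observed data from the example table
visitors_a, conversions_a = 9_998, 1_001
visitors_b, conversions_b = 10_002, 1_087


def conversion_rate(conversions: int, visitors: int) -> float:
    """Conversion rate in percent: conversions / visitors * 100%."""
    return conversions / visitors * 100


def uplift(cr_a: float, cr_b: float) -> float:
    """Relative uplift of variant B over variant A, in percent."""
    return (cr_b - cr_a) / cr_a * 100


cr_a = conversion_rate(conversions_a, visitors_a)
cr_b = conversion_rate(conversions_b, visitors_b)
print(f"CR A: {cr_a:.2f}%  CR B: {cr_b:.2f}%  Uplift: {uplift(cr_a, cr_b):+.2f}%")
# prints "CR A: 10.01%  CR B: 10.87%  Uplift: +8.55%"
```

The uplift is computed from the unrounded rates. Plugging in the rounded 10.87% and 10.01% instead would give about 8.59%, which is why rounding is best left to the very end.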

For example, if you roll a die 600 times, you can expect a six to come up 100 times. In practice, however, you will very likely observe a number that deviates from this: maybe a six is rolled 110 times, maybe only 92 times. Only if we rolled the die an infinite number of times could we be sure that exactly one sixth of all rolls show a 6.
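Statistical noise is easy to see in a small simulation. The following Python sketch is our own illustration, using only the standard `random` module with an arbitrary fixed seed: it repeats the 600-roll experiment ten times and then rolls 60,000 times.

```python
import random


def count_sixes(rolls: int, rng: random.Random) -> int:
    """Roll a fair six-sided die `rolls` times and count how often a 6 comes up."""
    return sum(1 for _ in range(rolls) if rng.randint(1, 6) == 6)


rng = random.Random(2015)  # arbitrary fixed seed so the run is reproducible

# Ten repetitions of 600 rolls; the expected count is 100 each time
counts = [count_sixes(600, rng) for _ in range(10)]
print(counts)

# With many more rolls, the share of sixes moves closer to 1/6
share = count_sixes(60_000, rng) / 60_000
print(f"share of sixes in 60,000 rolls: {share:.4f} (1/6 is about 0.1667)")
```

The ten counts scatter around 100 rather than hitting it exactly, while the long-run share lies much closer to 1/6. The exact numbers depend on the seed.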

However, since an infinite number of observations is practically impossible, we must accept that the observed value may differ from the true value. To judge whether variant B actually performs better than variant A, we therefore have to find out how likely it is that the difference is due to statistical noise alone.

To do this, a crosstab is created that contains the observed values of converted and non-converted visitors:

	Visitors	Conversions	No conversions	Conversion rate
Variant A	9,998	1,001	8,997	10.01%
Variant B	10,002	1,087	8,915	10.87%

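The next step is to compare these observed values with the values one would expect if there were no difference between the two variants. The expected values are based on the common conversion rate of both variants:

$$CR_{\text{common}} = \frac{1{,}001 + 1{,}087}{9{,}998 + 10{,}002} = \frac{2{,}088}{20{,}000} = 10.44\%$$

The number of visitors to each variant is now multiplied by this common conversion rate. This gives the expected conversions of each variant:

$$E_{\text{conv},A} = 9{,}998 \times 10.44\% \approx 1{,}044 \qquad E_{\text{conv},B} = 10{,}002 \times 10.44\% \approx 1{,}044$$

The expected non-conversions are obtained by subtracting the expected conversions from each variant's number of visitors:

$$E_{\text{no conv},A} = 9{,}998 - 1{,}044 = 8{,}954 \qquad E_{\text{no conv},B} = 10{,}002 - 1{,}044 = 8{,}958$$

We now enter these values into our crosstab: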

	Visitors	Conversions (observed)	Conversions (expected)	No conversions (observed)	No conversions (expected)	CR
Variant A	9,998	1,001	1,044	8,997	8,954	10.01%
Variant B	10,002	1,087	1,044	8,915	8,958	10.87%
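The expected values in the crosstab can be reproduced with a few lines of Python. This is a sketch based on the example numbers above; the dictionary layout is just one way of organizing the data.

```python
# Observed values from the crosstab
visitors = {"A": 9_998, "B": 10_002}
conversions = {"A": 1_001, "B": 1_087}

# Common conversion rate, assuming there is no difference between A and B
common_rate = sum(conversions.values()) / sum(visitors.values())

# Expected conversions: visitors * common rate; expected non-conversions: the rest
expected_conv = {v: visitors[v] * common_rate for v in visitors}
expected_non = {v: visitors[v] - expected_conv[v] for v in visitors}

print(f"common conversion rate: {common_rate:.2%}")
for v in visitors:
    print(f"{v}: expected conversions {expected_conv[v]:,.0f}, "
          f"expected non-conversions {expected_non[v]:,.0f}")
```

The `,.0f` format rounds to whole visitors, which reproduces the crosstab: 1,044 and 8,954 for variant A, 1,044 and 8,958 for variant B, at a common conversion rate of 10.44%.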

Step 2: Calculate deviation values

Now we calculate, for each observed value, its deviation from the expected value. This is done with the formula:

$$\text{Deviation} = \frac{(O - E)^2}{E}$$

where O is the observed value and E the expected value. By squaring the difference, we ensure that it does not matter whether the observed value lies below or above the expected value, so positive and negative deviations cannot cancel out or weaken each other. Squaring also gives larger deviations more weight than small ones.
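Both effects of squaring can be checked directly. In this Python sketch, `deviation` is a hypothetical helper name, and the value 1,034 is a made-up observation chosen to lie 10 below the expected 1,044.

```python
def deviation(observed: float, expected: float) -> float:
    """Chi-square contribution of one cell: (observed - expected)^2 / expected."""
    return (observed - expected) ** 2 / expected


# Squaring makes the sign irrelevant: 43 below and 43 above the expected value count the same
print(deviation(1_001, 1_044), deviation(1_087, 1_044))

# Squaring weights large deviations disproportionately:
# a gap of 43 instead of 10 (4.3x) gives 4.3^2 = 18.49x the value
print(deviation(1_034, 1_044), deviation(1_001, 1_044))
```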

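By dividing by the expected value, we accept a slightly larger absolute deviation for high expected values than for low ones. For the conversions of variant A this gives:

$$\frac{(1{,}001 - 1{,}044)^2}{1{,}044} = \frac{1{,}849}{1{,}044} \approx 1.7711$$

We proceed in the same way for the conversions of variant B and for the non-conversions of both variants. Adding all four values together gives the test value for the chi² test:

$$\chi^2 = \frac{(1{,}001 - 1{,}044)^2}{1{,}044} + \frac{(1{,}087 - 1{,}044)^2}{1{,}044} + \frac{(8{,}997 - 8{,}954)^2}{8{,}954} + \frac{(8{,}915 - 8{,}958)^2}{8{,}958} \approx 1.7711 + 1.7711 + 0.2065 + 0.2064 \approx 3.955$$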
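Summing the four deviation values can be sketched as a small `chi_square` helper (our own name). Besides the rounded expected values from the crosstab, the sketch also uses the unrounded ones to show how much the rounding matters.

```python
def chi_square(observed: list[float], expected: list[float]) -> float:
    """Sum of (O - E)^2 / E over all cells of the crosstab."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Cell order: conversions A, conversions B, non-conversions A, non-conversions B
observed = [1_001, 1_087, 8_997, 8_915]
expected_rounded = [1_044, 1_044, 8_954, 8_958]  # as in the crosstab

# Same cells with unrounded expected values (common rate 2,088 / 20,000 = 10.44%)
rate = 2_088 / 20_000
expected_exact = [9_998 * rate, 10_002 * rate, 9_998 * (1 - rate), 10_002 * (1 - rate)]

print(f"{chi_square(observed, expected_rounded):.3f}")  # 3.955
print(f"{chi_square(observed, expected_exact):.3f}")    # 3.917
```

With the rounded expected values from the crosstab the result is 3.955; with unrounded expected values it is about 3.917. Both exceed the 95% threshold of 3.84 used in the next step, so the conclusion does not depend on the rounding.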

Step 3: Check for significance

In general, the higher this value, the more likely it is that the two variants actually differ. A statement with 100% certainty is not possible, so a confidence level has to be chosen. It expresses how sure we want to be that an observed difference is not just statistical noise. For the chosen confidence level, a chi² distribution table tells you how high the test value must be at least for the difference between the two variants to count as significant. For a crosstab with two variants and two outcomes (one degree of freedom), common confidence levels and the associated minimum test values are:

Confidence	Minimum test value	Note
90%	2.71	marginally significant
95%	3.84	significant
99%	6.63	very significant
99.9%	10.83	highly significant
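Instead of looking them up in a table, the minimum test values can also be computed. A 2×2 crosstab has (2 − 1) × (2 − 1) = 1 degree of freedom, and with one degree of freedom the chi² distribution is the distribution of a squared standard normal variable. This Python sketch relies only on `statistics.NormalDist` from the standard library; `chi2_critical_df1` is a hypothetical helper name.

```python
from statistics import NormalDist


def chi2_critical_df1(confidence: float) -> float:
    """Minimum chi-square test value for 1 degree of freedom.

    For df = 1, a chi-square variable is the square of a standard normal one,
    so the critical value is the squared two-sided z quantile.
    """
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return z ** 2


for confidence in (0.90, 0.95, 0.99, 0.999):
    print(f"{confidence:.1%}: {chi2_critical_df1(confidence):.2f}")
```

It prints 2.71, 3.84, 6.63 and 10.83 for the four confidence levels, matching the table. With more degrees of freedom (for example, more than two variants) this shortcut no longer applies and a proper chi² quantile function is needed.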

We decide on a confidence level of 95%, which requires a test value of at least 3.84. With 3.955 we exceed this minimum, so at the 95% confidence level we can state: Variants A and B differ significantly from each other. The measured uplift is most likely due to a genuinely higher conversion rate of variant B and not just to statistical noise.

What we cannot say for sure, however, is that the uplift is exactly +8.55%. This is the most likely value of the uplift, but it is itself subject to statistical noise. The true uplift is very likely close to this value, though.
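The same decision can be expressed as a p-value: the probability of a test value at least this large if the two variants did not differ. Here is a minimal Python sketch for one degree of freedom, again relying only on `statistics.NormalDist` (the function names are our own).

```python
from math import sqrt
from statistics import NormalDist


def p_value_df1(chi2: float) -> float:
    """p-value of a chi-square statistic with 1 degree of freedom.

    Uses P(chi2_1 >= x) = P(|Z| >= sqrt(x)) = 2 * (1 - Phi(sqrt(x))).
    """
    return 2 * (1 - NormalDist().cdf(sqrt(chi2)))


def is_significant(chi2: float, confidence: float = 0.95) -> bool:
    """True if the p-value is below the significance level 1 - confidence."""
    return p_value_df1(chi2) < 1 - confidence


# Test values with rounded and with unrounded expected values
for chi2 in (3.955, 3.917):
    print(f"chi2 = {chi2}: p = {p_value_df1(chi2):.4f}, "
          f"significant at 95%: {is_significant(chi2)}")
```

Both test values give a p-value just below 0.05 (roughly 0.047 and 0.048), so they are significant at 95% but not at 99%. If you cross-check with SciPy's `scipy.stats.chi2_contingency`, pass `correction=False` to get the uncorrected statistic of about 3.917; by default it applies Yates' continuity correction to 2×2 tables, which yields a smaller value (about 3.83 here, just below 3.84).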

Is all this too complicated for you? Then just use the e-dialog significance calculator.

Are you not sure what to do with the calculated result? Or do you not know what to test?
We are happy to help you! Write to us at kontakt@e-dialog.group

