‘Chance to beat control’ vs. ‘Significance’

In Webtrends Optimize we use two different statistical measurements to help us analyse our test results.

Chance to beat control

Chance to beat control tells us how likely a variant (experiment) is to outperform the control, and so indicates the direction of the change.

We interpret the percentages in the following way:

Between 0.01% and 5% chance to beat control = strong negative trend compared to control

Between 5% and 20% = moderate negative trend compared to control

Between 20% and 80% = no significant trend compared to control

Between 80% and 95% = moderate positive trend compared to control

Between 95% and 99.99% = strong positive trend compared to control
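
The bands above can be sketched as a small lookup. This is only an illustration, not Webtrends Optimize code: the function name classify_trend is hypothetical, and because the bands share their boundary values (5%, 20%, 80%, 95%), the sketch assumes a value sitting exactly on a boundary falls into the weaker of the two bands.

```python
def classify_trend(chance_to_beat_control: float) -> str:
    """Map a chance-to-beat-control percentage (0-100) to a trend label.

    Assumption: a value exactly on a boundary (5, 20, 80, 95) is placed
    in the weaker band, since the article does not say which band owns it.
    """
    if not 0.0 <= chance_to_beat_control <= 100.0:
        raise ValueError("expected a percentage between 0 and 100")
    if chance_to_beat_control < 5.0:
        return "strong negative"
    if chance_to_beat_control < 20.0:
        return "moderate negative"
    if chance_to_beat_control <= 80.0:
        return "no significant trend"
    if chance_to_beat_control <= 95.0:
        return "moderate positive"
    return "strong positive"


if __name__ == "__main__":
    for value in (2.5, 12.0, 50.0, 88.0, 99.2):
        print(f"{value}% -> {classify_trend(value)}")
```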



Significance

Think of the question “Is the test statistically significant?” as asking whether we can say, with statistical confidence, that any performance difference between the variation and control is real and not due to random sampling error. Webtrends Optimize currently uses a 95% confidence level to calculate significance.
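
To make the 95% confidence level concrete, here is a minimal sketch of one common way to test significance between two conversion rates: a two-sided, pooled two-proportion z-test. The article does not say which test Webtrends Optimize actually runs, so the choice of test, the function name is_significant and its parameters are all assumptions made for illustration.

```python
import math


def is_significant(control_conversions: int, control_visitors: int,
                   variant_conversions: int, variant_visitors: int,
                   confidence: float = 0.95) -> bool:
    """Two-sided, pooled two-proportion z-test (an assumed method)."""
    rate_control = control_conversions / control_visitors
    rate_variant = variant_conversions / variant_visitors

    # Pooled conversion rate under the hypothesis of "no real difference".
    pooled = (control_conversions + variant_conversions) / (
        control_visitors + variant_visitors)
    std_error = math.sqrt(
        pooled * (1 - pooled) * (1 / control_visitors + 1 / variant_visitors))
    if std_error == 0:
        return False  # no variation at all, so nothing to detect

    z = (rate_variant - rate_control) / std_error
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return p_value < 1 - confidence


if __name__ == "__main__":
    print(is_significant(100, 1000, 150, 1000))  # variant clearly better
    print(is_significant(150, 1000, 100, 1000))  # variant clearly worse
    print(is_significant(100, 1000, 105, 1000))  # too close to call
```

Note that the first two calls give the same answer even though the variant wins in one and loses in the other: a two-sided test of this kind reports that a difference exists, not which way it points.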


So which one should we use?

Suppose you only know the test is ‘significant’. You cannot yet determine whether the variation is successful, because you don’t know whether the difference is positive or negative. The variation could be performing 50% worse than the control, or 50% better. We don’t know – this statistic doesn’t answer that.

Similarly, suppose you only know that the chance to beat control is 95%, 99.9%, or even 99.999999%. You cannot yet determine whether the variation is successful, because you don’t have any measure of confidence (or significance) in that difference. The variation may have had only 5 observations, 4 of which were conversions and 1 of which was just a page view. It would be folly to declare “success” based on such a small sample. In other words, the chance to beat control statistic carries no information about whether the “chance” is repeatable or consistent.
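
One way to see why 4 conversions from 5 observations is too little to go on is to put a 95% confidence interval around that conversion rate. The sketch below uses the Wilson score interval, which is a standard textbook choice and not something the article attributes to Webtrends Optimize. The function name wilson_interval is hypothetical.

```python
import math


def wilson_interval(conversions: int, observations: int,
                    z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a conversion rate (z=1.96 is roughly 95%)."""
    if observations <= 0:
        raise ValueError("need at least one observation")
    p_hat = conversions / observations
    denominator = 1 + z ** 2 / observations
    centre = (p_hat + z ** 2 / (2 * observations)) / denominator
    half_width = z * math.sqrt(
        p_hat * (1 - p_hat) / observations
        + z ** 2 / (4 * observations ** 2)) / denominator
    return centre - half_width, centre + half_width


if __name__ == "__main__":
    for conversions, observations in ((4, 5), (4000, 5000)):
        low, high = wilson_interval(conversions, observations)
        print(f"{conversions}/{observations}: {low:.1%} to {high:.1%}")
```

Both samples have the same observed rate of 80%, but with 5 observations the plausible range runs from roughly 38% to 96%, while with 5,000 observations it narrows to about a percentage point either side of 80%.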

This is why, ideally, we need both statistics before officially declaring success.


Using ‘Chance to beat control’ when significance hasn’t been reached

As mentioned above, ideally we’d use both metrics in conjunction with each other to determine a test result. That said, we live in the real world. There are times when a test has been running for a number of weeks or months, has large sample sizes, and shows a strong positive or negative chance to beat control that looks to have stabilised, yet is still reported as ‘not significant’.

In these scenarios we can still draw some conclusions based on the chance to beat control, as long as we’re aware that, statistically speaking, the test hasn’t actually reached significance.