Statistical Significance and Clinical Importance
1. Core Knowledge:
Significance is defined as the quality of being important. In medicine, we distinguish between statistical significance and clinical importance.
Statistical Significance. Medical studies are carried out on selected samples of people, but the goal is to apply the findings to another population (e.g., your patients). Naturally, a concern is that the sample used in the study could provide misleading results. Perhaps it was a very small sample; perhaps it was a biased sample that is not equivalent to the people you are treating; perhaps the sample was large enough, but by chance or bad luck it contained people who gave wacky results.
Statistical significance considers the first and third of these concerns. The middle one, bias, cannot be detected by mathematical deductive logic: it needs detailed information on the way the sample was chosen. This is dealt with in the notes on bias.
Consider a study that shows a new therapy to be superior to the existing therapy. Statistical significance testing estimates the probability that a result like the one observed could have arisen merely by chance, and so would not be repeated if the study were re-done. From the notes on the logic of experimentation, you will recall that this depends on the sample size (the bigger the sample, the more confident you can be that it produces trustworthy results) and on the size of the difference observed: if the study showed a huge difference between new and old therapies, the result is more likely to be real. Link: more on the statistical power of a study
Statistical significance in hypothesis testing is expressed in terms of a probability (hence that little letter "p"). By convention the threshold is set at 5%, or p < 0.05: there is only a 5% chance that a difference of the size found in your study, or a greater difference, would occur by chance if there were actually no difference in the whole population. (In other words, you accept a 5% risk of drawing a false positive conclusion about the new therapy.) The 5% value is arbitrary and is not chosen in terms of the actual magnitude of the effect seen in the study. Results are said to be "statistically significant" if the probability that the result is compatible with the null hypothesis is very small.
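To make the idea concrete, here is a minimal sketch of a two-sided significance test for a difference in means, using the normal (z) approximation. All the numbers are hypothetical, chosen only to illustrate how a p-value relates to the 0.05 convention.

```python
import math

def two_sample_z_p_value(mean1, mean2, se_diff):
    """Two-sided p-value for a difference in means, given the
    standard error of the difference (normal approximation)."""
    z = abs(mean1 - mean2) / se_diff
    # P(|Z| >= z) under the null hypothesis of no true difference
    return math.erfc(z / math.sqrt(2))

# Hypothetical example: new therapy lowers BP by 8 mm Hg, old therapy
# by 5 mm Hg, with a standard error of 1.2 for the difference.
p = two_sample_z_p_value(8.0, 5.0, 1.2)
print(round(p, 3))  # prints 0.012, below the conventional 0.05 cut-off
```

A p-value of about 0.012 means a difference this large would arise by chance alone in roughly 1 study in 80 if the therapies were truly equivalent.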
Crucial Point: testing statistical significance is all about the likelihood of a chance finding that will not hold up in future replications. Significance does not tell us directly how big the difference was.
Clinical significance, or clinical importance: Is the difference between new and old therapy found in the study large enough for you to alter your practice? Because there is always a leap of faith in applying the results of a study to your patients (who, after all, were not in the study), perhaps a small improvement in the new therapy is not sufficient to cause you to alter your clinical approach. Note that you would almost certainly not alter your approach if the study results were not statistically significant (i.e. could well have been due to chance). But when is the difference between two therapies large enough for you to alter your practice?
Statistics cannot fully answer this question. It is one of clinical judgment, considering the magnitude of benefit of each treatment, the respective profiles of side effects of the two treatments, their relative costs, your comfort with prescribing a new therapy, the patient's preferences, and so on. But we can provide different ways of illustrating the benefit of treatments, in terms of the Number Needed to Treat. Yet another example of science offering only partial guidance to the art of medicine.
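The Number Needed to Treat mentioned above is simply the reciprocal of the absolute risk reduction. A small sketch, with hypothetical event rates, shows the calculation:

```python
def number_needed_to_treat(control_event_rate, treated_event_rate):
    """NNT = 1 / absolute risk reduction (ARR).
    Event rates are proportions between 0 and 1."""
    arr = control_event_rate - treated_event_rate
    if arr <= 0:
        raise ValueError("treatment shows no benefit; NNT is undefined")
    return 1.0 / arr

# Hypothetical example: 20% of control patients have the adverse outcome
# versus 15% of treated patients -> ARR = 0.05, so NNT is about 20:
# roughly 20 patients must be treated to prevent one adverse outcome.
print(round(number_needed_to_treat(0.20, 0.15)))  # prints 20
```

Whether an NNT of 20 justifies changing practice still depends on the seriousness of the outcome, side effects, and cost, which is precisely the clinical judgment described above.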
A partial way out of this uncertainty is to express study results using confidence intervals instead of significance levels. A confidence interval shows the range within which the true population value is likely to lie. An example: a study showed a statistically significant impact (p < 0.03) of Transcendental Meditation on reducing systolic BP compared to controls. The mean reduction was 7 mm Hg (95% CI 4, 10). Instead of a significance test merely telling us that this result would occur only 3% of the time by chance alone, the confidence interval gives our best estimate of the size of the population effect, together with a range that would contain the true value in 95% of studies like this one. This seems more informative for the clinician.
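Under a normal approximation, a 95% confidence interval is simply the estimate plus or minus 1.96 standard errors. The sketch below uses a standard error of about 1.53, back-calculated from the interval in the example above; that value is an assumption for illustration, not a figure reported by the study.

```python
def confidence_interval_95(estimate, standard_error):
    """95% CI under a normal approximation: estimate +/- 1.96 * SE."""
    margin = 1.96 * standard_error
    return (estimate - margin, estimate + margin)

# Hypothetical numbers echoing the example above: a mean reduction of
# 7 mm Hg with an assumed standard error of 1.53 reproduces an
# interval of roughly (4, 10).
low, high = confidence_interval_95(7.0, 1.53)
print(round(low, 1), round(high, 1))  # prints 4.0 10.0
```

Note how the interval carries the clinically useful information directly: the reduction is probably at least 4 mm Hg and could be as much as 10 mm Hg.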
An important idea to grasp is that if a study is very large, its result may be statistically significant (= unlikely to be due to chance), and yet the deviation from the null hypothesis may be too small to be of any clinical interest. Conversely, the result may not be statistically significant because the study was so small (or "underpowered"), and yet the difference is large and would seem potentially important from a clinical point of view. You would then be wise to do another, perhaps larger, study.
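The large-study half of this point can be demonstrated numerically. The standard error shrinks as the sample grows, so a fixed, clinically trivial difference eventually becomes "significant". The figures below (a 1 mm Hg difference in blood pressure, standard deviation 15) are hypothetical, chosen only to show the effect of sample size.

```python
import math

def p_value_for_mean_difference(diff, sd, n_per_group):
    """Two-sided p-value for a difference between two group means,
    assuming equal-sized groups with a common standard deviation
    (normal approximation)."""
    se = sd * math.sqrt(2.0 / n_per_group)  # SE shrinks as n grows
    z = abs(diff) / se
    return math.erfc(z / math.sqrt(2))

# The same trivial 1 mm Hg difference, at two sample sizes:
small_study = p_value_for_mean_difference(1.0, 15.0, 100)
huge_study = p_value_for_mean_difference(1.0, 15.0, 5000)
print(small_study > 0.05, huge_study < 0.05)  # prints True True
```

The small study is not significant; the huge study is highly significant, yet a 1 mm Hg reduction in blood pressure would change no one's practice. Significance and importance are separate questions.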
Measures of the impact of an exposure
The Statistical Power of a study
Article on the interpretation of statistical significance tests
"Statistics are like bikinis. What they reveal is suggestive but what they conceal is vital" Aaron Levenstein