Sigma (σ) is a Greek letter often used in science to indicate the confidence level of a measurement, e.g., of an experiment or observation. It is the symbol for the standard deviation, a measure of the width of a distribution function that indicates the spread of the distribution, i.e., to what degree it is bunched versus spread out. The domain of the distribution function implied by an experimental result is some measurable quantity, such as a length or a temperature, and the standard deviation is expressed in the units of this quantity, e.g., "with a sigma of 2.3 meters".
The common use of sigma mentioned above is to express a confidence level: how unlikely it is that some experimental result is merely random, based upon the distribution function of possible erroneous measurement results. In this case, the quantity being expressed, rather than a physical quantity (such as length or temperature), is the likelihood that the answer is not merely chance, a probability between 0 and 1. Both the physical quantity being measured and the instruments can include some randomness, which can be the limiting factor on the reliability of the result. If something about the distribution of this randomness is known, the significance of a result can be expressed in units of sigma, with more sigmas meaning less likely to be merely random, e.g., "a 2σ result" (i.e., at least two standard deviations away from the result expected from mere chance). A well-known source of randomness is limited sample size, i.e., the number of independent measurements, whose effect often has a distribution that can be characterized. If the measurement result is one that would very rarely happen due to the expected measurement errors (this rarity expressed as some number of sigmas), then it looks like the result wasn't a fluke. Calculating and citing such a sigma count is based on a particular kind of distribution function, a normal distribution, which is what is expected if the randomness is made up of many small independent randomizing phenomena.
An experimental result may be quoted as being reliable to a certain number of sigmas, such as 4σ or 6σ. Some loose grammatical usage has grown around the term and symbol: the meaning of "a confidence level of 5 sigmas" or "a 5-sigma confidence level" is analogous to "a ladder of 10 feet" or "a 10-foot ladder", but phrases like "a sigma 4 result" or "sigma=4" are occasionally used and generally understood, though they diverge from grammatical usage regarding units.
In some branches of science, a discovery is not claimed until a 5σ confidence level is achieved. Even with that, 2σ is still useful as a hint that you may be on to something. In many cases, getting more data by repeating the tests would raise the sigma count if the 2σ result is in fact real, assuming there is no systematic error, i.e., some non-random mechanism for producing errors, such as a mis-calibrated instrument. It is important to keep in mind that a cited confidence presumes the errors are known to be random, i.e., the errors are correctly understood to that extent. Undiscovered factors can make the cited confidence spurious.
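As a rough illustration of why more data can raise the sigma count: assuming independent measurements with equal random error and no systematic error, the uncertainty of an average shrinks as 1/√N, so the significance of a real, fixed-size effect grows as √N. The sketch below rests on those assumptions; the function names and the numbers are hypothetical and not taken from any particular experiment.

```python
import math

def scaled_significance(sigma_now: float, data_factor: float) -> float:
    """Expected sigma count after collecting `data_factor` times as much data.

    Assumption for this sketch: the effect is real and of fixed size, the
    measurements are independent with equal random error, and there is no
    systematic error.
    """
    return sigma_now * math.sqrt(data_factor)

def data_factor_needed(sigma_now: float, sigma_target: float) -> float:
    """Factor by which the amount of data must grow to reach `sigma_target`."""
    return (sigma_target / sigma_now) ** 2

print(scaled_significance(2.0, 4.0))  # 4.0
print(data_factor_needed(2.0, 5.0))   # 6.25
```

Under these assumptions, turning a 2σ hint into a 5σ result takes about 6.25 times as much data. If the original 2σ result was a fluke, the sigma count would not be expected to grow this way.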
Some sigma values expressed as percentage confidence levels, i.e., the confidence that the measurement wasn't a fluke, assuming a normal distribution:
Confidence in sigmas | Percent confidence (two-sided: within that many sigmas of the mean) | Percent confidence (one-sided: clearly to a specific side of the mean) |
1σ | 68.2689492137086% | 84.1344746068543% |
2σ | 95.4499736103642% | 97.7249868051821% |
3σ | 99.7300203936740% | 99.8650101968370% |
4σ | 99.9936657516334% | 99.9968328758167% |
5σ | 99.9999426696856% | 99.9999713348428% |
6σ | 99.9999998026825% | 99.9999999013413% |
7σ | 99.9999999997440% | 99.9999999998720% |
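The percentages in the table follow from the normal distribution, and can be reproduced approximately with the error function. The following is a minimal sketch using Python's standard `math.erf`; the function names are made up for illustration, and ordinary floating-point arithmetic will not reproduce every digit of the table at the highest sigma counts.

```python
import math

def two_sided_confidence(n_sigma: float) -> float:
    """Probability that a normal variable lies within n_sigma of its mean."""
    return math.erf(n_sigma / math.sqrt(2))

def one_sided_confidence(n_sigma: float) -> float:
    """Probability that a normal variable lies below mean + n_sigma."""
    return 0.5 * (1.0 + math.erf(n_sigma / math.sqrt(2)))

# Print one row per sigma count, in the same column order as the table.
for n in range(1, 8):
    two = 100 * two_sided_confidence(n)
    one = 100 * one_sided_confidence(n)
    print(f"{n} sigma | {two:.8f}% | {one:.8f}%")
```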
Example: suppose you have a pill that you believe will lower someone's body temperature. You give it to two people and find their body temperatures slightly lower, so it seems to work. But if people's temperatures always vary slightly, the result could be random: each temperature either goes up or down, and, by analogy, getting heads twice when flipping a coin twice is not overly rare. The chance of both people having a lower temperature due to such randomness is 1/4, i.e., there's a 75% chance against such a result if it were random, so confidence in the result is 1 sigma (at least 68% chance that it is not a fluke).
If the sample were five people and all five got a lower temperature, the chance of it being random would be 1/32 (i.e., a 96.875% chance against it being random), so confidence is 2 sigmas (at least 95% chance).
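The arithmetic of the pill example can be sketched in code: the chance of a fluke is 0.5 raised to the number of people, and that chance can be converted to a sigma count by inverting the normal distribution. The sketch below uses `statistics.NormalDist` from the Python standard library and the two-sided convention (the first percentage column of the table); the function name `sigma_from_chance` is hypothetical.

```python
from statistics import NormalDist

def sigma_from_chance(p_fluke: float) -> float:
    """Two-sided sigma count whose confidence level equals 1 - p_fluke.

    Assumes a normal distribution and 0 < p_fluke < 1.
    """
    confidence = 1.0 - p_fluke
    # Within +/- n sigma holds `confidence`, so the upper tail starts at
    # the cumulative probability (1 + confidence) / 2.
    return NormalDist().inv_cdf((1.0 + confidence) / 2.0)

for people in (2, 5):
    p_fluke = 0.5 ** people  # every temperature drops purely by chance
    print(people, p_fluke, round(sigma_from_chance(p_fluke), 2))
```

For two people the sigma count comes out a little above 1, and for five people a little above 2, matching the "at least 68%" and "at least 95%" statements above.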
Another example: if an instrument pointed at the sky indicates a blip (some sort of signal that doesn't look like "clear sky"), and you think it might be some real astronomical entity, and you know that the instrument produces some random results, you might use your knowledge of the instrument's errors to calculate the number of sigmas. Perhaps you tested the instrument on a known-clear piece of sky and noted the random results, and in only 5 out of 100 times, it produced a blip this extreme, suggesting a confidence level of roughly 2 sigmas. Note that there may be other sources of error, e.g., atmospheric conditions. A cited confidence level is for whatever types of errors were considered, but often some types of errors are too difficult to deal with and some may not have crossed the experimenter's mind.
In real life, more mathematical statistics may be involved: for example, the pill may be assumed to work on only some percentage of people, and math is needed to calculate a confidence level based upon that hypothesis.
Currently in some sciences there is a reaction against the way conclusions are based upon sigma, particularly using a specific threshold (number of sigmas) as a test of whether some measurement "proves" some hypothesis (or indicates the hypothesis is of further interest). One issue is drawing the conclusion that a hypothesis is disproved if an experiment's confidence level doesn't reach the threshold. Another issue is that doing lots of experiments in search of some particular significance threshold raises the likelihood of some spurious results, whether or not the same experiment is being repeated, and whether the experiments are done sequentially or simultaneously: if you perform a study searching for something significant by measuring 20 separate quantities, there's a good chance one of them will show a "2σ result", even if each of the 20 is merely varying randomly.
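The "20 separate quantities" point can be checked with a short calculation. The sketch below assumes the quantities are independent and normally distributed, and uses the two-sided convention for a "2σ result"; the function name is hypothetical.

```python
import math

def chance_of_spurious_hit(n_quantities: int, n_sigma: float = 2.0) -> float:
    """Chance that at least one of n independent, purely random quantities
    deviates by n_sigma or more (two-sided, normal distribution assumed)."""
    # Chance that a single random quantity shows such a deviation.
    p_single = 1.0 - math.erf(n_sigma / math.sqrt(2))
    # Complement of "none of the quantities shows it".
    return 1.0 - (1.0 - p_single) ** n_quantities

print(round(chance_of_spurious_hit(20), 2))
```

Under these assumptions the chance is about 0.6, i.e., a spurious "2σ result" somewhere among the 20 quantities is more likely than not.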