Statistics

Warning

This page is under construction

Overview of CMS techniques

CMS searches typically define an observable, or a set of observables, that is used to test for the presence of signal events. Any observable can serve this purpose, preferably one that highlights unique features of the signal process. Signal extraction is based on maximum likelihood fits that compare "data" (either collision data or pseudodata sampled from a test distribution) to the signal (\(s\)) and background (\(b\)) predictions, with the signal scaled by an unknown ratio \(\mu\), often called the signal strength. The observed event counts are assumed to follow a Poisson distribution, and all predictions depend on nuisance parameters, \(\theta\), that are given default values \(\tilde{\theta}\) and assigned probability density functions (\(p\)). The likelihood function can be written as:

\[ \mathcal{L}(\mathrm{data}\vert \mu,\theta) = \mathrm{Poisson}(\mathrm{data}\vert \mu\cdot s(\theta) + b(\theta))\cdot p(\tilde{\theta}\vert\theta). \]
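To make the formula concrete, here is a minimal sketch of the log-likelihood for a single counting bin with one nuisance parameter. Everything specific in it is an assumption chosen for illustration and not taken from any CMS analysis: the function names, the yields `s0` and `b0`, the choice that only the background depends on \(\theta\) (through a factor `kappa**theta`), and the unit-Gaussian constraint term standing in for \(p(\tilde{\theta}\vert\theta)\).

```python
import math


def poisson_logpmf(n, lam):
    """Natural log of Poisson(n | lam) for n >= 0 and lam > 0."""
    return n * math.log(lam) - lam - math.lgamma(n + 1)


def log_likelihood(n_obs, mu, theta, s0=5.0, b0=10.0, kappa=1.10, theta_tilde=0.0):
    """ln L(data | mu, theta) for one bin (illustrative assumptions only).

    s(theta) = s0                  signal taken as independent of theta
    b(theta) = b0 * kappa**theta   background with a 10% normalization uncertainty
    p(theta_tilde | theta)         unit Gaussian in (theta_tilde - theta)
    """
    expected = mu * s0 + b0 * kappa ** theta
    constraint = -0.5 * (theta_tilde - theta) ** 2 - 0.5 * math.log(2.0 * math.pi)
    return poisson_logpmf(n_obs, expected) + constraint


# With 15 observed events, mu = 1 (expectation 15) is favored over mu = 0 (expectation 10).
print(log_likelihood(15, mu=1.0, theta=0.0) > log_likelihood(15, mu=0.0, theta=0.0))
```

A real analysis has many bins and many nuisance parameters, so the Poisson term becomes a product over bins and the constraint term a product over nuisances; the structure of the function stays the same.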

Systematic uncertainties are incorporated into the fit as nuisance parameters. Lognormal probability distributions are assigned to uncertainties that affect only the normalization of a histogram or rate of a predicted event yield, and Gaussian probability distributions are typically assigned to uncertainties provided as histograms that affect the shape of a distribution. You can learn about several typical sources of uncertainty in CMS analyses in the Systematics section of the Guide.
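The difference between the two kinds of nuisance parameter can be sketched in a few lines. The helper names below are hypothetical, and the shape interpolation is a simple piecewise-linear scheme chosen for brevity; fitting tools generally use smoother interpolation, so treat this as an illustration of the idea, not as the procedure used in any specific framework.

```python
def scale_lognormal(nominal, kappa, theta):
    """Normalization uncertainty: the yield is multiplied by kappa**theta.

    theta = +1 or -1 corresponds to a one-standard-deviation shift, and the
    yield stays positive for any value of theta.
    """
    return nominal * kappa ** theta


def morph_shape(nominal, up, down, theta):
    """Shape uncertainty: per-bin linear interpolation between templates.

    `up` and `down` are the histograms for theta = +1 and theta = -1.
    Bin contents are clipped at zero (an assumption of this sketch).
    """
    target = up if theta >= 0 else down
    return [max(n + abs(theta) * (t - n), 0.0) for n, t in zip(nominal, target)]


nominal = [10.0, 20.0]
up = [12.0, 22.0]
down = [9.0, 18.0]

print(scale_lognormal(100.0, 1.10, 1.0))   # yield shifted up by one standard deviation
print(morph_shape(nominal, up, down, 0.5))  # halfway between nominal and the "up" template
```

In a fit, `theta` for a shape uncertainty would carry a Gaussian constraint, while the `kappa**theta` form with a Gaussian-constrained `theta` is what makes the normalization effect lognormal.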

Observed and expected limits on the signal strength \(\mu\) are extracted by testing the compatibility of the observed data with a background-only (\(\mu = 0\)) hypothesis and with a signal+background hypothesis. The most common statistical method within CMS is the CLs method (Read, 2002 and Junk, 1999), which can be used to obtain an upper limit on \(\mu\) at the 95% confidence level using the profile likelihood test statistic (Cowan, 2010), typically evaluated in the asymptotic approximation.
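As a rough illustration of the CLs procedure, the sketch below computes an asymptotic 95% CL upper limit on \(\mu\) for a single counting bin with no nuisance parameters. It is a simplified stand-in, not the implementation used by CMS tools: with no nuisance parameters there is nothing to profile, the test statistic is the one-sided \(q_\mu\) without a lower boundary on \(\hat{\mu}\), and the function names and example yields are hypothetical.

```python
import math


def norm_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))


def q_mu(mu, n, s, b):
    """One-sided test statistic -2 ln lambda(mu) for one Poisson bin.

    The best-fit signal strength is mu_hat = (n - b) / s; q_mu is set to
    zero when mu_hat > mu, so upward fluctuations do not count against mu.
    """
    if (n - b) / s > mu:
        return 0.0
    lam = mu * s + b
    log_term = n * math.log(n / lam) if n > 0 else 0.0
    return max(2.0 * (lam - n + log_term), 0.0)


def cls(mu, n_obs, s, b):
    """Asymptotic CLs = CL_{s+b} / CL_b for the tested value of mu."""
    sqrt_q = math.sqrt(q_mu(mu, n_obs, s, b))
    sqrt_qa = math.sqrt(q_mu(mu, b, s, b))  # background-only Asimov dataset: n = b
    cl_sb = norm_cdf(-sqrt_q)               # equals 1 - Phi(sqrt(q_mu))
    cl_b = norm_cdf(sqrt_qa - sqrt_q)
    return cl_sb / cl_b


def upper_limit(n_obs, s, b, alpha=0.05, tol=1e-9):
    """Find the value of mu with CLs = alpha by bisection."""
    lo = max((n_obs - b) / s, 0.0)
    hi = lo + 1.0
    while cls(hi, n_obs, s, b) > alpha:
        hi *= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if cls(mid, n_obs, s, b) > alpha:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Hypothetical yields: 5 expected signal events (at mu = 1) and 10 background events.
observed = upper_limit(n_obs=15, s=5.0, b=10.0)
expected = upper_limit(n_obs=10.0, s=5.0, b=10.0)  # median expected: data equal to b
print(f"observed limit on mu: {observed:.3f}, expected limit: {expected:.3f}")
```

Values of \(\mu\) above the returned limit have CLs below 0.05 and are excluded at the 95% confidence level. The median expected limit comes from replacing the observed count with the background expectation, which is why the second call passes `n_obs=10.0`.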

The "Higgs Combine" software framework used by the CMS experiment to compute limits is built on the RooFit and RooStats packages and implements statistical procedures developed for combining ATLAS and CMS Higgs boson measurements.

Tutorials

Many tutorials and lectures on statistical interpretation of LHC data are available online. Some selected highlights are listed here.