CMS Open Data
The CMS experiment at CERN has been releasing research-quality data from particle collisions at the LHC since 2014. Almost all data from the first LHC run in 2010–2012 ("Run1"), together with the corresponding simulated samples, are in the public domain, and several scientific studies have been performed using these data. The first data from the second LHC run in 2015–2018 ("Run2") were released in 2021.
Open data are released after an embargo period of six years, which allows the collaboration to understand the detector performance and to exploit the scientific potential of these data. The embargo also provides the time needed to reprocess the data with the best available knowledge before the release.
The first release of each year’s data consists of 50% of the integrated luminosity recorded by the experiment, and the remaining data will be released within ten years, unless active analysis is still ongoing. However, the amount of open data will be limited to 20% of the data with a similar centre-of-mass energy and collision type while such data are still planned to be taken. This approach allows for a fairly prompt release of the data after a major reprocessing once the reconstruction has been optimised, but still guarantees that the collaboration will have the opportunity to complete the planned studies with the complete dataset first.
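The release rules described above can be read as simple arithmetic. The following is only an illustrative sketch of one possible reading, not the policy itself: the function name `open_luminosity`, its parameters, and the luminosity values in the usage note are hypothetical. It also assumes that the ten-year limit is counted from the year of recording and that the 20% cap applies on top of every other rule. The policy document remains the authoritative source.

```python
# Illustrative reading of the release rules; all names are hypothetical.
EMBARGO_YEARS = 6             # embargo period before the first release
FULL_RELEASE_YEARS = 10       # assumed to be counted from the year of recording
FIRST_RELEASE_FRACTION = 0.5  # share of a year's integrated luminosity released first
CAP_FRACTION = 0.2            # cap while similar data are still planned to be taken


def open_luminosity(year_lumi, similar_lumi_total, years_since_recording,
                    similar_data_planned, analysis_ongoing=False):
    """Return the integrated luminosity of one year's data that may be open.

    year_lumi: integrated luminosity recorded in that year.
    similar_lumi_total: integrated luminosity of all data with a similar
        centre-of-mass energy and collision type (same units as year_lumi).
    """
    # Nothing is released during the embargo period.
    if years_since_recording < EMBARGO_YEARS:
        return 0.0

    # After ten years the full dataset is released, unless analysis is ongoing.
    if years_since_recording >= FULL_RELEASE_YEARS and not analysis_ongoing:
        allowed = year_lumi
    else:
        allowed = FIRST_RELEASE_FRACTION * year_lumi

    # While similar data are still planned, open data are capped at 20%
    # of the data with a similar energy and collision type.
    if similar_data_planned:
        allowed = min(allowed, CAP_FRACTION * similar_lumi_total)

    return allowed
```

For instance, with a hypothetical year of 10 units of integrated luminosity, `open_luminosity(10.0, 100.0, 6, False)` gives the 50% first release, while `open_luminosity(10.0, 20.0, 6, True)` is limited by the 20% cap on the similar data.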
The open data releases are regulated in the CMS data preservation, re-use and open access policy.
The CERN open data portal includes a brief description of CMS open data and of the different tools available to analyze them. The main points are:
- the released data are the same as those used by the CMS collaboration, with all their complexity
- some CMS-specific software is needed, and is available, to get started with these data
- a computing environment compatible with the data and the software needed for their analysis is provided.
Experimental particle physics data are complex, and studying them requires a solid understanding of the underlying physics, knowledge of the different detector systems involved in data taking, and some mastery of data handling. Some of these challenges have been addressed in this note, and this guide is part of the measures taken to improve the usability of CMS open data.