Analysis Methods for Searches
A blind search for new physics is an analysis technique in which the actual data in the region where the expected signal may be present are not studied until the analysis has been finalized. Blind analyses are used where it is important to reduce subjective biases in choosing selection cuts. The technique proceeds through the following steps:
- Establish expectations for signal signatures and define the signal region in data where such signatures might be present. It is assumed that the analysers do not look at this region, i.e. that they blind it.
- Create Monte Carlo events and optimize the selection cuts to increase the signal-over-background ratio for the expected signal. If Monte Carlo models for background events do not exist, are not reliable, or have insufficient statistics, one uses various "control" regions defined using actual data. A control region should reproduce the detector-level characteristics of the events, but should not include possible signal events. A common way to implement this is to fill a histogram in all regions except the region where the signal is expected.
- When all analysers agree on the selection requirements defined in the previous step, apply these requirements to the data. This step is called "unblinding". The region of the histogram that was previously left empty is then populated by the events that pass the requirements.
- Report the observation in the signal region and publish the results.
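The steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the procedure of any particular experiment: each event is reduced to a hypothetical pair (x, mass), where x is a single discriminating variable; the cut on x is optimized on Monte Carlo only, using the figure of merit s/√b (a common choice; the text above speaks only of the signal-over-background ratio); and blinding is implemented by never filling histogram bins that overlap the signal window. The names `optimize_cut`, `select` and `fill_histogram` are illustrative.

```python
import bisect
import math
from typing import List, Optional, Sequence, Tuple

# Hypothetical event record: (discriminating variable x, invariant mass in GeV)
Event = Tuple[float, float]


def optimize_cut(sig_mc: Sequence[float], bkg_mc: Sequence[float],
                 thresholds: Sequence[float]) -> Tuple[float, float]:
    """Step 2: choose the threshold on x that maximizes s/sqrt(b),
    using simulated (Monte Carlo) signal and background only."""
    best_cut, best_fom = thresholds[0], float("-inf")
    for cut in thresholds:
        s = sum(1 for x in sig_mc if x > cut)
        b = sum(1 for x in bkg_mc if x > cut)
        if b == 0:
            continue  # s/sqrt(b) undefined; skip thresholds with no background left
        fom = s / math.sqrt(b)
        if fom > best_fom:
            best_cut, best_fom = cut, fom
    return best_cut, best_fom


def select(events: Sequence[Event], cut: float) -> List[float]:
    """Step 3: apply the frozen requirement x > cut and return the masses."""
    return [m for x, m in events if x > cut]


def fill_histogram(masses: Sequence[float], edges: Sequence[float],
                   blind_window: Optional[Tuple[float, float]] = None
                   ) -> List[Optional[int]]:
    """Fill a mass histogram. Bins overlapping blind_window stay None
    (never filled); calling with blind_window=None is the unblinding."""
    n_bins = len(edges) - 1
    counts: List[Optional[int]] = [0] * n_bins
    if blind_window is not None:
        lo, hi = blind_window
        for i in range(n_bins):
            if edges[i] < hi and edges[i + 1] > lo:
                counts[i] = None  # bin touches the signal region: blinded
    for m in masses:
        i = bisect.bisect_right(edges, m) - 1  # bin i covers [edges[i], edges[i+1])
        if 0 <= i < n_bins and counts[i] is not None:
            counts[i] += 1
    return counts
```

Before unblinding, `fill_histogram(select(data, cut), edges, blind_window=(lo, hi))` returns `None` for the signal-region bins, so only the sidebands can be inspected; unblinding corresponds to calling it again with `blind_window=None` once the cut is frozen. Nothing in such code prevents an analyser from simply dropping the `blind_window` argument, which is why blinding ultimately rests on the analysers' discipline.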
Blind analysis is becoming increasingly accepted at large particle-collision experiments. At the LHC, most publications that report searches for physics beyond the Standard Model apply this technique.
Despite the popularity of "blinding" the signal region, this technique suffers from several shortcomings:
- Generally, an analysis method should avoid depending on psychological, sociological and other factors of a "human" nature. Building assumptions about the moral quality, unintentional behavior and level of professionalism of data analysers into the design of an analysis method can be a rather costly way of restricting the range of possible studies (discussed later), and may treat unfairly experienced scientists who would never consider modifying selection cuts based on a visual outcome. In particular, a blind analysis assumes that analysers are (intentionally or unintentionally) prone to "manufacturing" selection requirements that create spurious features which can be perceived as genuine signal events. Including such "human" factors in the definition of an analysis technique should not be a high priority for a rigorous definition of a scientific method.
- Likewise, an analysis method cannot rely on the honesty of analysers who are not supposed to look at the data, while they do need data to build various control regions, or to fill histograms with data while skipping the region where a signal may show up. It is not clear how the requirement of "not looking" at the part of a histogram that may include the "blinded signal region" can be technically enforced. Thus, in the end, one must still fully rely on the honesty of analysers who resist the temptation to glimpse the signal region while running over the data and filling many other related histograms. In some cases, such an expectation of "honest behavior" is not so different from the situation in which an analyser looks at the signal region but refrains from playing with cuts to manufacture a signal. A common joke among physicists is that blind analysis is good for "inexperienced but honest analysers" (i.e. students).
- High-energy physics has relatively little experience in making truly unexpected discoveries using the blinding procedure. In fact, historically, blind analysis has not been fully employed in actual discoveries of new physics. It was applied for the Higgs boson discovery, but one can argue that this was a special case in which all parameters of the Higgs boson were known except for its mass. One can also argue that the Higgs boson was expected by many when it was discovered in 2012. There is no doubt that the Higgs boson would have been discovered anyway without blinding, using the simple H→γγ channel, which relies entirely on performance-driven selection cuts that cannot be changed arbitrarily. The two-photon invariant mass is a very basic variable used in Standard Model measurements and could not have been overlooked.
As a final remark, one can also argue that it is inconceivable that a signature of a completely new particle observed after unblinding the data would be sent for publication without additional checks. For a significant claim of new physics, a rigorous set of cross-checks has to follow the unblinding, and such checks typically involve independent analysers and the actual data, thus defeating the purpose of the blinding procedure itself and returning to the "open-eyes" search discussed next.
There are many cases in which searches are executed without blinding a signal region in data. Such searches were especially common in the past, when no Monte Carlo simulations were available (and, ironically, when most of the discoveries in high-energy physics were made). For example, fully unblinded (or "open-eyes") searches can be performed when:
- Monte Carlo simulations for the signal regions are not available, or "control" regions in data cannot be constructed.
- All selection cuts are well defined on the basis of performance requirements and cannot be modified under any circumstances. The signal region is simple and does not require complex kinematic selection cuts for optimization.
Examples of fully unblinded analyses include searches in the invariant masses of jets and particles (such as leptons, photons, etc.). In such searches, a control region based on data cannot be constructed or is hard to construct, while the selection and reconstruction of jets and identified particles are well established on the basis of the optimal performance of the detectors for such objects. These selection and identification requirements cannot be changed arbitrarily based on observations in the signal region. Such "open-eyes" searches have their advantages and disadvantages. The main advantage is that the procedure does not involve subjective factors, i.e. the moral quality and level of professionalism of the analysers, who in a blind analysis are expected to be fully honest in claiming that they did not look at the signal region. The fact that the analysers use established object selection is easy to verify. Another advantage of the "open-eyes" approach is that it opens up the opportunity to look at data from an unexpected perspective, at signatures that are hard to foresee in the context of model expectations.
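Both the two-photon mass used in the H→γγ channel and the dijet mass mentioned above are computed directly from the reconstructed four-momenta of the two objects, with no tunable parameters. A minimal sketch, assuming each object is given as (pT, η, φ, m) in GeV with natural units (c = 1); the names `four_vector` and `invariant_mass` are illustrative:

```python
import math
from typing import Tuple

# Hypothetical object record: (pT [GeV], pseudorapidity eta, azimuth phi [rad], mass [GeV])
PhysObject = Tuple[float, float, float, float]


def four_vector(obj: PhysObject) -> Tuple[float, float, float, float]:
    """Convert (pT, eta, phi, m) to Cartesian (E, px, py, pz)."""
    pt, eta, phi, m = obj
    px = pt * math.cos(phi)
    py = pt * math.sin(phi)
    pz = pt * math.sinh(eta)
    e = math.sqrt(px * px + py * py + pz * pz + m * m)
    return e, px, py, pz


def invariant_mass(a: PhysObject, b: PhysObject) -> float:
    """m^2 = (E1 + E2)^2 - |p1 + p2|^2; clamp tiny negative values from rounding."""
    e1, px1, py1, pz1 = four_vector(a)
    e2, px2, py2, pz2 = four_vector(b)
    e, px, py, pz = e1 + e2, px1 + px2, py1 + py2, pz1 + pz2
    m2 = e * e - (px * px + py * py + pz * pz)
    return math.sqrt(max(m2, 0.0))
```

For massless photons this reduces to m² = 2 pT1 pT2 (cosh Δη − cos Δφ); for example, two back-to-back photons with pT = 62.5 GeV at η = 0 give m = 125 GeV.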
The main critique of the "open-eyes" search is that the analyser's event selection may still be influenced by their expectations, and even experienced scientists may not notice how a simple requirement, which may initially look totally uncontroversial, introduces a feature in data that can be interpreted as a signal. If a search involves non-standard selections beyond those established by independent performance groups prior to the search, these selection cuts should be thoroughly reviewed and cross-checked using toy Monte Carlo simulations (if realistic MC is not available), or kinematic arguments should be given showing that the selection does not influence the final observation. The easiest way to avoid this problem is to avoid any selection requirements that may influence the signal region in data, and to use only well-established and recommended event selection and object identification criteria for constructing the signal region.
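The toy Monte Carlo cross-check suggested above can be sketched as follows. All ingredients are illustrative assumptions rather than a model of any real detector: two massless objects with exponentially falling pT (mean 50 GeV), uniform η in [-2.5, 2.5] and uniform φ, and a hypothetical requirement pT > 60 GeV on both objects. Comparing the mass histograms before and after the requirement shows whether the cut makes the selection efficiency depend on mass; a rising efficiency multiplied by a falling spectrum can produce a broad maximum that has nothing to do with a signal.

```python
import bisect
import math
import random
from typing import List, Tuple


def toy_pair(rng: random.Random, mean_pt: float = 50.0,
             eta_max: float = 2.5) -> Tuple[float, float, float]:
    """One toy background event with two massless objects.
    Returns (pt1, pt2, mass) in GeV."""
    pt1 = rng.expovariate(1.0 / mean_pt)  # exponentially falling pT spectrum
    pt2 = rng.expovariate(1.0 / mean_pt)
    d_eta = rng.uniform(-eta_max, eta_max) - rng.uniform(-eta_max, eta_max)
    d_phi = rng.uniform(0.0, 2.0 * math.pi) - rng.uniform(0.0, 2.0 * math.pi)
    # For massless objects: m^2 = 2 pT1 pT2 (cosh(d_eta) - cos(d_phi))
    m2 = 2.0 * pt1 * pt2 * (math.cosh(d_eta) - math.cos(d_phi))
    return pt1, pt2, math.sqrt(max(m2, 0.0))


def sculpting_study(n_toys: int, pt_cut: float, edges: List[float],
                    seed: int = 12345) -> Tuple[List[int], List[int]]:
    """Fill toy mass histograms before and after requiring pT > pt_cut
    on both objects; the ratio per bin is the selection efficiency."""
    rng = random.Random(seed)  # fixed seed for a reproducible toy
    n_bins = len(edges) - 1
    before = [0] * n_bins
    after = [0] * n_bins
    for _ in range(n_toys):
        pt1, pt2, m = toy_pair(rng)
        i = bisect.bisect_right(edges, m) - 1
        if not 0 <= i < n_bins:
            continue  # outside the histogram range
        before[i] += 1
        if pt1 > pt_cut and pt2 > pt_cut:
            after[i] += 1
    return before, after


if __name__ == "__main__":
    edges = [0.0, 50.0, 100.0, 150.0, 200.0, 300.0, 400.0]
    before, after = sculpting_study(50000, 60.0, edges)
    for lo, hi, nb, na in zip(edges[:-1], edges[1:], before, after):
        eff = na / nb if nb else float("nan")
        print(f"{lo:5.0f}-{hi:5.0f} GeV  before={nb:6d}  after={na:6d}  eff={eff:.3f}")
```

In this toy the efficiency in the lowest mass bin comes out well below that in the highest bins, i.e. the innocuous-looking pT requirement produces a kinematic turn-on in the mass spectrum, which should be understood before any feature near it is interpreted.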
Model-independent searches are searches that use multiple selection criteria to reduce known background events, such as those from Standard Model processes. Such searches can be blinded or unblinded. For example, searches in the invariant mass of two jets are typically regarded as model-independent if no selections tuned to particular exotic models are applied. In this example, blinding is less common, given the inclusive style of such searches and the fact that jets are typically reconstructed using a well-defined jet reconstruction procedure designed for a broad range of general measurements.
General searches are a technique in which an unblinded analysis is applied to data using multiple observables, and these observables are compared with the Standard Model predictions. Typically, the unblinded character is justified by the use of well-defined event and object reconstruction procedures that are also used for other measurements, i.e. procedures unrelated to the searches. For example, calculating the rates (or cross sections) of events with different numbers of well-isolated leptons and comparing these rates with the Standard Model expectations is a typical example of a general search.
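A general search of this kind can be sketched as a set of counting experiments, one per lepton-multiplicity category. The sketch below is a simplified illustration that assumes the Standard Model expectation b in each category is known exactly (no systematic uncertainty) and reports, for each category, the Poisson probability of observing at least the measured number of events; `poisson_p_value` and `scan_categories` are illustrative names.

```python
import math
from typing import Dict


def poisson_p_value(n_obs: int, b: float) -> float:
    """Probability of observing n_obs or more events when the expected
    count (Standard Model prediction) is b, assuming a Poisson distribution."""
    if n_obs <= 0:
        return 1.0
    term = math.exp(-b)  # P(N = 0)
    cdf = term
    for k in range(1, n_obs):
        term *= b / k  # P(N = k) obtained from P(N = k - 1)
        cdf += term
    return max(0.0, 1.0 - cdf)


def scan_categories(observed: Dict[int, int],
                    expected: Dict[int, float]) -> Dict[int, float]:
    """Map each lepton multiplicity to its excess p-value."""
    return {n_lep: poisson_p_value(observed[n_lep], expected[n_lep])
            for n_lep in sorted(observed)}
```

Computing one minus the cumulative sum is adequate for moderate p-values but loses precision for very small ones, where summing the upper tail directly is preferable. When many categories are scanned, some small local p-values are expected by chance (the look-elsewhere effect), so the smallest one should not be interpreted on its own.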