pidcalib2

Name	pidcalib2
Version	1.3.6
home_page	https://gitlab.cern.ch/lhcb-rta/pidcalib2
Summary	A set of tools for estimating LHCb PID efficiencies
upload_time	2025-02-19 13:18:54
maintainer	None
docs_url	None
author	Daniel Cervenkov
requires_python	>=3.6
license	GNU General Public License v3 (GPLv3)

# PIDCalib2

A set of software tools for estimating LHCb PID efficiencies.

The package includes several user-callable modules:
- [`make_eff_hists`](#make_eff_hists) creates histograms that can be used to estimate the PID efficiency of a user's sample
- [`ref_calib`](#ref_calib) calculates the LHCb PID efficiency of a user reference sample
- `merge_trees` merges two ROOT files with compatible `TTree`s
- [`plot_calib_distributions`](#plot_calib_distributions) allows you to plot distributions of variables in the calibration datasets
- [`pklhisto2root`](#pklhisto2root) converts [Pickled](https://docs.python.org/3/library/pickle.html) [boost-histogram](https://github.com/scikit-hep/boost-histogram)s to ROOT histograms

The term "reference dataset/sample" refers to the user's dataset to which they want to assign PID efficiencies. The "calibration datasets/samples" are the special, centrally managed samples used internally by PIDCalib for PID efficiency estimation. The `--sample` argument always concerns these calibration samples.

Slides with additional information, example output, and plots are available on [Indico](https://indico.cern.ch/event/1055804/contributions/4440878/attachments/2277451/3869206/Run12_210707_v2.pdf).

## Setup

When working on a computer where the LHCb software stack is available (LXPLUS, university cluster, etc.), one can set up PIDCalib2 by running
```sh
lb-conda pidcalib bash
```
After this, the following commands will be available
```sh
pidcalib2.make_eff_hists
pidcalib2.ref_calib
pidcalib2.merge_trees
pidcalib2.plot_calib_distributions
pidcalib2.pklhisto2root
```
You can skip the bash invocation and join the setup and run phases into a single command
```sh
lb-conda pidcalib pidcalib2.make_eff_hists
```
To run `make_eff_hists`, you will need access to CERN EOS. You don't need to do anything special on LXPLUS. On other machines, you will usually need to obtain a Kerberos ticket by running
```sh
kinit [username]@CERN.CH
```

### Installing from PyPI

The PIDCalib2 package is available on [PyPI](https://pypi.org/project/pidcalib2/). It can be installed on any computer via `pip` simply by running (preferably in a virtual environment; see [`venv`](https://docs.python.org/3/library/venv.html))
```sh
pip install pidcalib2
```
Note that this will install the [`xrootd`](https://pypi.org/project/xrootd/) *Python bindings*. One also has to install XRootD itself for the bindings to work. See [this page](https://xrootd.slac.stanford.edu/index.html) for XRootD releases and instructions.

## `make_eff_hists`

This module creates histograms that can be used to estimate the PID efficiency of a user's sample.

Reading all the relevant calibration files can take a long time. When running a configuration for the first time, we recommend using the `--max-files 1` option. This will limit PIDCalib2 to reading just a single calibration file. Such a test will reveal any problems with, e.g., missing variables quickly. Keep in mind that you might get a warning about empty bins in the total histogram as you are reading a small subset of the calibration data. For the purposes of a quick test, this warning can be safely ignored.

### Options

To get a usage message listing all the options, their descriptions, and default values, type
```sh
pidcalib2.make_eff_hists --help
```

The calibration files to be processed are determined by the `sample`, `magnet`, and `particle` options. All the valid combinations can be listed by running
```sh
pidcalib2.make_eff_hists --list configs
```

Aliases for standard variables are defined to simplify the commands. We recommend users use the aliases when specifying variables. When you use a name that isn't an alias, a warning message like the following will show up in the log.
```
'probe_PIDK' is not a known PID variable alias, using raw variable
```
All aliases can be listed by running
```sh
pidcalib2.make_eff_hists --list aliases
```
Note that there are many more variables than there are aliases. If you want to find a variable for which no alias exists, you can check one of the calibration files yourself. The paths to the calibration files are printed when the `--verbose` option is specified. Alternatively, you can simply guess the name; if it doesn't exist, PIDCalib2 will let you know and might provide a list of similar names that do exist.

A file with alternative binnings can be specified using `--binning-file`. The file must contain valid JSON specifying bin edges. For example, two-bin binnings for particle `Pi`, variables `P` and `PT` can be defined as
```json
{"Pi": {"P": [10000, 15000, 30000], "PT": [6000, 10000, 20000]}}
```
An arbitrary number of binnings can be defined in a single file.
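
As a minimal sketch, a binning file like the one above could be generated and sanity-checked with Python's standard library before it is passed to `--binning-file`. The helper name `write_binning_file` and the requirement that edges increase strictly are illustrative assumptions, not part of PIDCalib2; the file is written to a temporary directory so the sketch has no side effects on your working directory.

```python
import json
import tempfile
from pathlib import Path


def write_binning_file(binnings, path):
    """Validate bin edges and write them as JSON (hypothetical helper)."""
    for particle, variables in binnings.items():
        for variable, edges in variables.items():
            # N bins need N + 1 edges, so at least two edges are required.
            if len(edges) < 2:
                raise ValueError(f"{particle}/{variable}: need at least two bin edges")
            # Assumption: edges are expected to be strictly increasing.
            if any(low >= high for low, high in zip(edges, edges[1:])):
                raise ValueError(f"{particle}/{variable}: bin edges must increase")
    path = Path(path)
    path.write_text(json.dumps(binnings, indent=2))
    return path


# Same two-bin binnings as in the JSON example above.
binnings = {"Pi": {"P": [10000, 15000, 30000], "PT": [6000, 10000, 20000]}}
binning_path = write_binning_file(binnings, Path(tempfile.mkdtemp()) / "my_binning.json")
print(binning_path.read_text())
```

Validating the edges up front is a design choice: a malformed file is then caught before any calibration data is read.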

Complex cut expressions can be created by chaining simpler expressions using `&` (logical and) and `|` (logical or). One can also use standard mathematical symbols, like `*`, `/`, `+`, `-`, `(`, `)`. Whitespace does not matter.
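
When cuts are assembled programmatically, the chaining described above can be sketched as a plain string join. `combine_cuts` is a hypothetical helper, not a PIDCalib2 function; it uses a single operator per call so that no assumption about the precedence of `&` versus `|` is needed.

```python
def combine_cuts(cuts, operator="&"):
    """Join simple cut expressions into one string (hypothetical helper)."""
    if operator not in ("&", "|"):
        raise ValueError("operator must be '&' or '|'")
    # Whitespace around the operator is only for readability.
    return f" {operator} ".join(cuts)


track_cut = combine_cuts(["IsMuon==0", "Brunel_PT>250", "trackcharge==-1"])
print(track_cut)
```

The resulting string can be passed as a single quoted argument to `--cut` or `--pid-cut`.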

### Examples
- Create a single 3D efficiency histogram for a single PID cut
  ```sh
  pidcalib2.make_eff_hists --sample Turbo18 --magnet up --particle Pi --pid-cut "DLLK > 4" --bin-var P --bin-var ETA --bin-var nSPDhits --output-dir pidcalib_output
  ```

- Create multiple histograms in one run (most of the time is spent reading in data, so specifying multiple cuts is much faster than running `make_eff_hists` sequentially)
  ```sh
  pidcalib2.make_eff_hists --sample Turbo16 --magnet up --particle Pi --pid-cut "DLLK > 0" --pid-cut "DLLK > 4" --pid-cut "DLLK > 6" --bin-var P --bin-var ETA --bin-var nSPDhits --output-dir pidcalib_output
  ```

- Create a single efficiency histogram for complex cuts using only negatively charged tracks
  ```sh
  pidcalib2.make_eff_hists --sample Turbo18 --magnet up --particle Pi --pid-cut "MC15TuneV1_ProbNNp*(1-MC15TuneV1_ProbNNpi)*(1-MC15TuneV1_ProbNNk) < 0.5 & DLLK < 3" --cut "IsMuon==0 & Brunel_PT>250 & trackcharge==-1" --bin-var P --bin-var ETA --bin-var nSPDhits --output-dir pidcalib_output
  ```
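
For scripted workflows, commands like the ones above could be assembled in Python. The following is only a sketch: `make_eff_hists_command` is a hypothetical helper, and it uses nothing beyond the options already shown in this section (`--sample`, `--magnet`, `--particle`, `--pid-cut`, `--bin-var`, `--output-dir`, `--max-files`).

```python
import shlex


def make_eff_hists_command(sample, magnet, particle, pid_cuts, bin_vars,
                           output_dir, max_files=None):
    """Build the argument list for pidcalib2.make_eff_hists (hypothetical helper)."""
    args = ["pidcalib2.make_eff_hists",
            "--sample", sample, "--magnet", magnet, "--particle", particle]
    for cut in pid_cuts:       # one --pid-cut per requested histogram
        args += ["--pid-cut", cut]
    for var in bin_vars:       # one --bin-var per histogram axis
        args += ["--bin-var", var]
    args += ["--output-dir", output_dir]
    if max_files is not None:  # e.g. 1 for a quick first test
        args += ["--max-files", str(max_files)]
    return args


cmd = make_eff_hists_command(
    "Turbo16", "up", "Pi",
    pid_cuts=["DLLK > 0", "DLLK > 4", "DLLK > 6"],
    bin_vars=["P", "ETA", "nSPDhits"],
    output_dir="pidcalib_output",
    max_files=1,
)
# Shell-quoted form, suitable for pasting into a terminal.
command_line = " ".join(shlex.quote(arg) for arg in cmd)
print(command_line)
```

Keeping the command as a list means it could be handed to `subprocess.run(cmd, check=True)` on a machine where PIDCalib2 is set up, without any manual quoting of the cut expressions.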

### Caveats

Not all datasets have all the variables, and in some cases, the same variable is named differently (e.g., `probe_Brunel_IPCHI2` is named `probe_Brunel_MINIPCHI2` in certain electron samples). The aliases correspond to the most common names, but you might need to check the calibration files if PIDCalib2 can't find the variable you need.

## `ref_calib`

This module uses the histograms created by `make_eff_hists` to assign efficiency to events in a reference sample supplied by the user. Adding efficiency to the user-supplied file requires PyROOT and is optional.

The module works in two steps:

1. Calculate the efficiency and save it as a TTree in a separate file.
2. Optionally copy the efficiency TTree to the reference file and make it a friend of the user's TTree. The user must request the step by specifying `--merge` on the command line.

Be aware that `--merge` will modify your file. Use with caution.
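
One cautious approach is to copy a local reference file before running with `--merge`. The sketch below uses a hypothetical helper, `backup_file`, and demonstrates it on a throwaway placeholder file; it applies only to local files, not to remote `root://` paths.

```python
import shutil
import tempfile
from pathlib import Path


def backup_file(path, suffix=".bak"):
    """Copy `path` next to itself before it gets modified (hypothetical helper)."""
    source = Path(path)
    backup = source.with_name(source.name + suffix)
    if backup.exists():
        raise FileExistsError(f"refusing to overwrite existing backup {backup}")
    shutil.copy2(source, backup)  # copy2 also preserves timestamps
    return backup


# Demonstration on a placeholder; in practice you would pass your own
# reference file, e.g. backup_file("data/user_ntuple.root").
demo_file = Path(tempfile.mkdtemp()) / "user_ntuple.root"
demo_file.write_bytes(b"placeholder, not a real ROOT file")
backup = backup_file(demo_file)
print(backup.name)
```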

### Options

The `sample` and `magnet` options are used solely to select the correct PID efficiency histograms. They should therefore mirror the options used when running `make_eff_hists`.

`bin-vars` must be a dictionary that relates the binning variables (or aliases) used to make the efficiency histograms with the variables in the reference sample. We assume that the reference sample branch names have the format `[ParticleName]_[VariableName]`. E.g., `D0_K_calcETA` corresponds to a particle named `D0_K` and variable `calcETA`. If the user wants to estimate PID efficiency of their sample using 1D binning, where `calcETA` corresponds to the `ETA` binning variable alias of the calibration sample, they should specify `--bin-vars '{"ETA": "calcETA"}'`.
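
The naming convention can be made concrete with a short sketch that lists the branches a given `bin-vars` dictionary implies for each particle prefix. `expected_branches` is a hypothetical helper written for illustration; it only applies the `[ParticleName]_[VariableName]` pattern described above.

```python
def expected_branches(particle_prefixes, bin_vars):
    """List the reference-sample branches implied by the naming convention."""
    return [f"{prefix}_{ref_var}"
            for prefix in particle_prefixes
            for ref_var in bin_vars.values()]


# Binning variable (or alias) -> variable name in the reference sample.
bin_vars = {"ETA": "calcETA"}
branches = expected_branches(["D0_K"], bin_vars)
print(branches)
```

Comparing such a list with the branches actually present in your ntuple is a quick way to spot a mismatched variable name before running `ref_calib`.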

`ref-file` is the user's reference file to which they want to assign PID efficiencies. The parameter can be a local file or a remote file, e.g., on EOS (`--ref-file root://eoslhcb.cern.ch//eos/lhcb/user/a/anonymous/tuple.root`).

`ref-pars` must be a dictionary of particles from the reference sample to apply cuts to. The keys represent the particle branch name prefix (`D0_K` in the previous example), and the values passed are a list containing particle type and PID cut, e.g. `'{"D0_K" : ["K", "DLLK > 4"], "D0_Pi" : ["Pi", "DLLK < 4"]}'`.
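
Because both `bin-vars` and `ref-pars` are passed as JSON strings, quoting them by hand is error-prone. As a sketch, they could be serialized with `json.dumps` and shell-quoted with `shlex.quote`; the variable names below are illustrative and the dictionaries repeat the examples from the text.

```python
import json
import shlex

bin_vars = {"ETA": "calcETA"}
ref_pars = {"D0_K": ["K", "DLLK > 4"], "D0_Pi": ["Pi", "DLLK < 4"]}

# json.dumps always emits double-quoted keys and strings, i.e. valid JSON.
ref_calib_args = [
    "--bin-vars", json.dumps(bin_vars),
    "--ref-pars", json.dumps(ref_pars),
]
quoted = " ".join(shlex.quote(arg) for arg in ref_calib_args)
print(quoted)
```

The printed fragment can be appended to a `pidcalib2.ref_calib` command line together with the remaining options.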

The `--merge` option will copy the PID efficiency tree to your input file and make the PID efficiency tree a "Friend" of your input tree. Then you can treat your input tree as if it had the PID efficiency branches itself. E.g., `input_tree->Draw("PIDCalibEff")` should work. ROOT's "Friend" mechanism is an efficient way to add branches from one tree to another. Take a look [here](https://root.cern.ch/root/htmldoc/guides/users-guide/Trees.html#example-3-adding-friends-to-trees) if you would like to know more.

### Examples
- Evaluate efficiency of a single PID cut and save it to `user_ntuple_PID_eff.root` without adding it to `user_ntuple.root`
  ```sh
  pidcalib2.ref_calib --sample Turbo18 --magnet up --ref-file data/user_ntuple.root --histo-dir pidcalib_output --bin-vars '{"P": "mom", "ETA": "Eta", "nSPDhits": "nSPDhits"}' --ref-pars '{"Bach": ["K", "DLLK > 4"]}' --output-file user_ntuple_PID_eff.root
  ```
- Evaluate efficiency of a single PID cut and add it to the reference file `user_ntuple.root`
  ```sh
  pidcalib2.ref_calib --sample Turbo18 --magnet up --ref-file data/user_ntuple.root --histo-dir pidcalib_output --bin-vars '{"P": "mom", "ETA": "Eta", "nSPDhits": "nSPDhits"}' --ref-pars '{"Bach": ["K", "DLLK > 4"]}' --output-file user_ntuple_PID_eff.root --merge
  ```
- Evaluate efficiency of multiple PID cuts and add them to the reference file
  ```sh
  pidcalib2.ref_calib --sample Turbo18 --magnet up --ref-file data/user_ntuple.root --histo-dir pidcalib_output --bin-vars '{"P": "P", "ETA": "ETA", "nSPDhits": "nSPDHits"}' --ref-pars '{"Bach": ["K", "DLLK > 4"], "SPi": ["Pi", "DLLK < 0"]}' --output-file user_ntuple_PID_eff.root --merge
  ```

### Caveats

You might notice that some events in your reference sample have `PIDCalibEff`, `PIDCalibErr`, or both set to -999.
- `PIDCalibEff` is -999 when, for at least one track,
  - The event is out of binning range
  - The relevant bin in the efficiency histogram has no events whatsoever
  - The efficiency is negative
- `PIDCalibErr` is -999 when, for at least one track,
  - The event is out of binning range
  - The relevant bin in the efficiency histogram has no events whatsoever
  - The relevant bin in the efficiency histogram has no events passing PID cuts
  - The efficiency is negative
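
Downstream code usually needs to treat -999 as a flag rather than as a number. The sketch below averages efficiencies while skipping flagged events; the toy lists stand in for the `PIDCalibEff` and `PIDCalibErr` branches, and dropping events where only the uncertainty is flagged is an assumption of this example, since how to handle such events is an analysis decision.

```python
SENTINEL = -999  # value described above for unusable efficiencies or errors


def mean_valid_efficiency(effs, errs):
    """Average efficiencies, skipping events flagged with the sentinel."""
    valid = [eff for eff, err in zip(effs, errs)
             if eff != SENTINEL and err != SENTINEL]
    if not valid:
        raise ValueError("no events with a valid efficiency")
    n_skipped = len(effs) - len(valid)
    return sum(valid) / len(valid), n_skipped


# Toy values, not real PIDCalib2 output.
pid_eff = [0.5, -999, 0.75, 1.0]
pid_err = [0.01, -999, -999, 0.02]
mean_eff, n_skipped = mean_valid_efficiency(pid_eff, pid_err)
print(mean_eff, n_skipped)
```

Reporting the number of skipped events alongside the average makes it easier to notice when a binning leaves a large part of the sample without a usable efficiency.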

Because of `double` → `float` conversion in the original PIDCalib, tiny discrepancies (<1e−3 relative difference) in the efficiencies and/or uncertainties are to be expected.

A bug in the original PIDCalib caused the electron calibration datasets to be read twice, resulting in incorrect efficiency map uncertainties.

The original PIDCalib didn't apply the correct cuts to Omega samples (`K_Omega` and `K_DD`), leading to nonsensical efficiency maps.

### Electrons

To use the efficiency tables for electrons in 2024, one should run a command similar to:

```sh
pidcalib2.ref_calib  --sample 2024_WithUT_block1_Tables_2brem --magnet up --ref-file "data/user_ntuple.root" --histo-dir /eos/lhcb/wg/rta/WP4/PIDCalib2_ElectronTables --bin-vars "{'P' : 'P'}" --ref-pars '{"eprobe": ["e", "ProbNNe > 0.2"]}' --output-file user_ntuple_PID_eff.root -v
```

The important differences here are:
- The `--sample` must include the bremsstrahlung category at the end. The available categories are `0brem`, `1brem_tag`, `1brem_probe` and `2brem`. In the future, we plan to merge these four categories into two, because the relevant distinction is whether the probe electron has brem or not.
- The `--magnet` is labeled as `up`, but the tables actually include both MagUp and MagDown.
- The `--histo-dir` must always be `/eos/lhcb/wg/rta/WP4/PIDCalib2_ElectronTables`, as that is where the efficiency tables are stored.
- The `--bin-vars` must always be `"{'P' : 'P'}"`. The binning used to compute the efficiencies is `{"e": {"P": [0,4375,8750,13125,17500,20625,23750,26875,30000,35000,40000,45000,50000,62500,75000,87500,100000]}}`.
- The `--ref-pars` have to match the values of the computed tables, that is: `"{DLLe: 0, 5}"` and `"{ProbNNe: 0.2, 0.8}"`.
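
To check which momentum bin of the electron tables a given track would fall into, the bin edges quoted above can be searched with `bisect`. This is only an illustrative sketch: `find_bin` is a hypothetical helper, and the half-open `[low, high)` convention at the bin edges is an assumption that may differ from what PIDCalib2 does internally.

```python
from bisect import bisect_right

# Bin edges for "P" quoted in the list above.
ELECTRON_P_EDGES = [0, 4375, 8750, 13125, 17500, 20625, 23750, 26875, 30000,
                    35000, 40000, 45000, 50000, 62500, 75000, 87500, 100000]


def find_bin(value, edges):
    """Return the index of the bin containing `value`, or None if outside.

    Assumption: bins are half-open, [low, high).
    """
    if value < edges[0] or value >= edges[-1]:
        return None
    return bisect_right(edges, value) - 1


print(find_bin(10000, ELECTRON_P_EDGES))   # lies between 8750 and 13125
print(find_bin(120000, ELECTRON_P_EDGES))  # above the last edge
```

Tracks for which the lookup returns `None` lie outside the binning range, which is one of the situations in which `ref_calib` assigns -999.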

## `plot_calib_distributions`

This tool allows you to plot distributions of variables in the calibration datasets. You can supply the same cuts and custom binnings that you would use for `make_eff_hists`. If you wish to plot a variable for which no binning exists, a uniform binning with 50 bins will be used. You can change the number of bins using `--bins` and force a uniform binning even if another binning is defined via `--force-uniform`.

A plot for every requested variable will be created in the `--output-dir` directory. The format of the plots can be controlled by `--format`. Furthermore, `plot_calib_distributions.pkl` will be saved in the same directory, containing all the histograms, should the user want to make the plots manually.
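
For manual plotting, the saved pickle file has to be loaded first. The sketch below shows the generic loading step with Python's `pickle` module. The internal layout of `plot_calib_distributions.pkl` is not described here, so the example writes a stand-in file containing a plain dictionary (an assumption made purely so the code runs) and only inspects the type of what was loaded. Reading a real file would presumably also require `boost-histogram` to be installed.

```python
import pickle
import tempfile
from pathlib import Path


def load_pickle(path):
    """Load a pickled object; only use this on files you trust."""
    with open(path, "rb") as handle:
        return pickle.load(handle)


# Stand-in file so the sketch runs without PIDCalib2 output.
# The dictionary layout is an assumption, not the real file format.
demo_path = Path(tempfile.mkdtemp()) / "plot_calib_distributions.pkl"
demo_path.write_bytes(pickle.dumps({"P": [1, 2, 3]}))

content = load_pickle(demo_path)
print(type(content).__name__)
```

Inspecting the type and, for containers, the keys is a reasonable first step before writing any plotting code against the real file.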

### Examples

- Create plots of the variables DLLK and P using 1 calibration file
  ```sh
  pidcalib2.plot_calib_distributions --sample Turbo18 --magnet up --particle Pi --bin-var DLLK --bin-var P --output-dir pidcalib_output --max-files 1
  ```
- Create PDF plots of variable P with 95 uniform bins
  ```sh
  pidcalib2.plot_calib_distributions --sample Turbo18 --magnet up --particle Pi --bin-var P --output-dir pidcalib_output --max-files 1 --format pdf --force-uniform --bins 95
  ```
- Create plots of variable P using custom binning
  ```sh
  pidcalib2.plot_calib_distributions --sample Turbo18 --magnet up --particle Pi --bin-var P --output-dir pidcalib_output --max-files 1 --format png --binning-file my_binning.json
  ```

## `pklhisto2root`

This tool converts pickled PIDCalib2 histograms to `TH*D` and saves them in a ROOT file. It can be used on histograms produced by `make_eff_hists` or `plot_calib_distributions`. Note that ROOT supports only 1-, 2-, and 3-dimensional histograms; attempting to convert higher-dimensional histograms will fail.

### Example

- Convert pickled boost_histograms from `make_eff_hists` to ROOT
  ```sh
  pidcalib2.pklhisto2root "pidcalib_output/effhists-Turbo18-up-Pi-DLLK>4-P.ETA.nSPDhits.pkl"
  ```
  This will translate the histograms and save them to `pidcalib_output/effhists-Turbo18-up-Pi-DLLK>4-P.ETA.nSPDhits.root`.
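
In a script that post-processes the converted file, the output location can be derived from the input path. The sketch below mirrors the behaviour shown in the example, where only the final `.pkl` extension is replaced by `.root`; `root_output_path` is a hypothetical helper, not part of PIDCalib2.

```python
from pathlib import Path


def root_output_path(pkl_path):
    """Swap the final .pkl extension for .root, as in the example above."""
    path = Path(pkl_path)
    if path.suffix != ".pkl":
        raise ValueError(f"expected a .pkl file, got {path.name}")
    # Only the last suffix changes; the dots inside "P.ETA.nSPDhits" are kept.
    return path.with_suffix(".root")


pkl = "pidcalib_output/effhists-Turbo18-up-Pi-DLLK>4-P.ETA.nSPDhits.pkl"
print(root_output_path(pkl).as_posix())
```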

## Development

### With lb-conda

On machines where `lb-conda` is available, you may use the `pidcalib` environment for PIDCalib2 development. This is mainly useful for small modifications and only if you don't need to add any new dependencies.

1. Clone the repository from [GitLab](https://gitlab.cern.ch/lhcb-rta/pidcalib2)
2. Enter the PIDCalib2 directory
   ```sh
   cd pidcalib2
   ```
3. Start a new BASH shell within the `pidcalib` environment
   ```sh
   lb-conda pidcalib bash
   ```
4. Run your *local* PIDCalib2 code
   ```sh
   cd src
   python -m pidcalib2.make_eff_hists -h
   ```

### Without lb-conda

This is a more versatile (if convoluted) method. It gives you full control of the dev environment and the ability to use IDEs, etc.

1. Clone the repository from [GitLab](https://gitlab.cern.ch/lhcb-rta/pidcalib2)
2. Enter the PIDCalib2 directory
   ```sh
   cd pidcalib2
   ```
3. (Optional) Set up a virtual environment
   ```sh
   python3 -m venv .venv
   source .venv/bin/activate
   ```
4. Install pinned dependencies
   ```sh
   pip install -r requirements-dev.txt
   ```
5. Install `xrootd` (possibly manually; see this [issue](https://github.com/xrootd/xrootd/issues/1397))
6. Run the tests
   ```sh
   pytest
   ```
7. Run the modules
   ```sh
   cd src
   python3 -m pidcalib2.make_eff_hists -h
   ```

### Tips

Certain tests can be excluded using markers like this
  ```sh
  pytest -m "not xrootd"
  ```
See available markers by running `pytest --markers` (the list will start with PIDCalib2 custom markers, then it will include all the pytest built-in markers).

## Links

- [PIDGen2](https://gitlab.cern.ch/lhcb-rta/pidgen2) - a tool to resample MC PID variables based on distributions from data calibration samples

Raw data

            {
    "_id": null,
    "home_page": "https://gitlab.cern.ch/lhcb-rta/pidcalib2",
    "name": "pidcalib2",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.6",
    "maintainer_email": null,
    "keywords": null,
    "author": "Daniel Cervenkov",
    "author_email": "daniel.cervenkov@cern.ch",
    "download_url": "https://files.pythonhosted.org/packages/54/ed/548626caba7cd5a11fe21526b3a61a335c8927fecbf567f94c7a2b40eac0/pidcalib2-1.3.6.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "GNU General Public License v3 (GPLv3)",
    "summary": "A set of tools for estimating LHCb PID efficiencies",
    "version": "1.3.6",
    "project_urls": {
        "Bug Tracker": "https://gitlab.cern.ch/lhcb-rta/pidcalib2/issues",
        "Homepage": "https://gitlab.cern.ch/lhcb-rta/pidcalib2"
    },
    "split_keywords": [],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "9c70027d9f52bb2f7db5e8261fd49a3baa6bef6a78c3be020cedcc57a42c9748",
                "md5": "620779e8c9cca494ae727dbc42ba1819",
                "sha256": "6d53cbc1f70190a8156dd8e402db1d58d8398340713a674bdf7d1f94eec055bf"
            },
            "downloads": -1,
            "filename": "pidcalib2-1.3.6-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "620779e8c9cca494ae727dbc42ba1819",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.6",
            "size": 1821394,
            "upload_time": "2025-02-19T13:18:52",
            "upload_time_iso_8601": "2025-02-19T13:18:52.349339Z",
            "url": "https://files.pythonhosted.org/packages/9c/70/027d9f52bb2f7db5e8261fd49a3baa6bef6a78c3be020cedcc57a42c9748/pidcalib2-1.3.6-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "54ed548626caba7cd5a11fe21526b3a61a335c8927fecbf567f94c7a2b40eac0",
                "md5": "4c52e48f2d51991b451e95f13caef574",
                "sha256": "e0937d21d38e42a07ec1ea31c7b8dcd445cd6f858a697df6fe3b7680aead7457"
            },
            "downloads": -1,
            "filename": "pidcalib2-1.3.6.tar.gz",
            "has_sig": false,
            "md5_digest": "4c52e48f2d51991b451e95f13caef574",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.6",
            "size": 1965047,
            "upload_time": "2025-02-19T13:18:54",
            "upload_time_iso_8601": "2025-02-19T13:18:54.618230Z",
            "url": "https://files.pythonhosted.org/packages/54/ed/548626caba7cd5a11fe21526b3a61a335c8927fecbf567f94c7a2b40eac0/pidcalib2-1.3.6.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2025-02-19 13:18:54",
    "github": false,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "lcname": "pidcalib2"
}

Daniel Cervenkov