Related pull request with the code: Add `TRestExpressionEvaluationProcess` by Vindaar · Pull Request #25 · rest-for-physics/framework · GitHub
Over the last few days I’ve built something that, as far as I understand, fills a niche I’ve heard talked about multiple times. Namely: “do what `TRestAnalysisPlot` does, but keep the results in the tree”. Or in more concrete terms: evaluate arbitrary expressions yielding boolean or float values for each event in a tree and create new observables from those values, based on strings stored in an RML file.
I wrote a small standalone header-only library to handle the actual parsing and evaluation of the given strings, because that functionality is completely independent of REST and thus there is no need to force it into a REST process or similar. There are of course much fancier libraries of this kind out there, but I thought it would be better to write our own so as not to add another real external dependency (I’ve put the code into the `external` dependency more for convenience, and because I was unsure where else to place it, than anything else). Aside from that, once the ROOT dataframe implementation is wrapped into REST all of this becomes obsolete anyways.
Since we are stuck on C++11 I had to implement a basic `Either` type in the library. `std::variant` would be a reasonable choice, but that was only added in C++17.
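To give an idea of what such a type involves, here is a minimal sketch of a C++11 `Either`. It is not the code from the library: the class layout, the `Left`/`Right` factory names, the `Value` typedef and the `kindOf` helper are all made up for illustration, and using it to carry either a float or a bool result is only an assumption about a plausible use.

```cpp
#include <cassert>
#include <new>
#include <string>
#include <utility>

// Illustrative C++11 Either<L, R>: holds exactly one of two alternatives.
// Hypothetical sketch; the library's real type may look quite different.
template <typename L, typename R>
class Either {
    struct LeftTag {};
    struct RightTag {};
    bool isLeft_;
    union {  // unrestricted union (C++11): members are constructed manually
        L left_;
        R right_;
    };
    Either(LeftTag, L v) : isLeft_(true) { new (&left_) L(std::move(v)); }
    Either(RightTag, R v) : isLeft_(false) { new (&right_) R(std::move(v)); }
    void construct(const Either& o) {
        if (o.isLeft_) new (&left_) L(o.left_);
        else new (&right_) R(o.right_);
        isLeft_ = o.isLeft_;
    }
    void destroy() {
        if (isLeft_) left_.~L();
        else right_.~R();
    }

public:
    static Either Left(L v) { return Either(LeftTag(), std::move(v)); }
    static Either Right(R v) { return Either(RightTag(), std::move(v)); }

    Either(const Either& o) { construct(o); }
    Either& operator=(const Either& o) {
        // Destroy-then-copy is not exception-safe; acceptable for a sketch.
        if (this != &o) { destroy(); construct(o); }
        return *this;
    }
    ~Either() { destroy(); }

    bool isLeft() const { return isLeft_; }
    bool isRight() const { return !isLeft_; }
    const L& left() const { assert(isLeft_); return left_; }
    const R& right() const { assert(!isLeft_); return right_; }
};

// Hypothetical use: an evaluation result that is either a float or a bool.
typedef Either<double, bool> Value;

// Report which alternative a result currently holds.
inline std::string kindOf(const Value& v) { return v.isLeft() ? "float" : "bool"; }
```

The tag-dispatched private constructors keep `Left` and `Right` unambiguous even when both alternatives are arithmetic types, which a pair of plain converting constructors would not.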
Hopefully this can be of use to some people. At the moment it may be a bit brittle around the edges. There are test cases, but they could be
Explanation taken from the docstring in TRestExpressionEvaluationProcess.cxx
Both floating-point and boolean expressions are supported. For boolean results the data is stored as integers for better compatibility with other boolean variables used in REST.
The return type is determined automatically from the expression.
The expression strings are defined within a special `<expressionset>` tag in the RML file under this process, to allow for iteration over all child tags. Within that tag an arbitrary number of `<item>` tags can be added, each of which must have a `name` and an `expr` field. The `name` field is the name of the resulting observable. The `expr` field uses a syntax similar to the ROOT cut strings given to `TTree::Draw` to filter a tree before drawing it, with a few small differences (see below).
<addProcess type="TRestExpressionEvaluationProcess" name="my custom expression" value="ON" verboseLevel="silent">
<expressionset>
<!-- a simple, constant boolean expression -->
<item name="boolExpr" expr="5<10"/>
<!-- a simple, constant float expression -->
<item name="floatExpr" expr="5 * 10"/>
<!-- a float expression using an existing REST observable -->
<item name="floatExprObs" expr="hitsAna_energy / 1000.0"/>
<!-- a bool expression using an existing REST observable -->
<item name="boolExprObs" expr="hitsAna_energy < 5000.0"/>
<!-- a bool expression of multiple statements combined by an `and` -->
<item name="boolAndExprMultiple" expr="hitsAna_energy < 5000.0 and tckAna_nTracks_X==1"/>
<!-- a bool expression of multiple statements combined by an `or` -->
<item name="boolOrExprMultiple" expr="hitsAna_energy < 5000.0 or tckAna_nTracks_X==1"/>
<!-- a complicated expression of nested parenthesis and multiple expressions -->
<item name="boolComplicated" expr="(hitsAna_energy / 1000.0) < 5.0 or (tckAna_nTracks_X==1 and tckAna_nTracks_Y==1)"/>
</expressionset>
</addProcess>
The main differences from ROOT cut strings are that 1. the strings are not limited to boolean expressions and 2. the boolean operators `&&` and `||` are instead written as `and` and `or`, respectively. The latter is because a bare `&` is not allowed in XML attribute values, so parsing such strings breaks (or the user has to write `&amp;`, which is kind of unacceptable). Each expression is stored as REST metadata in the form of a lisp-like representation of the input expression. For example, the last `boolComplicated` example is represented as:
(|| (< (/ hitsAna_energy 1000.0) 5.0) (&& (== tckAna_nTracks_X 1) (== tckAna_nTracks_Y 1)))
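To illustrate how an infix string can be turned into this kind of prefix form, here is a small recursive-descent sketch. It is not the parser from the library: the names `tokenize`, `SexprParser` and `toSexpr` are hypothetical, the operator precedence (`or` < `and` < comparison < `+ -` < `* /`) is an assumption, and it only builds the string representation without evaluating anything.

```cpp
#include <cctype>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Split the input into identifiers, numbers and one- or two-character operators.
std::vector<std::string> tokenize(const std::string& s) {
    std::vector<std::string> out;
    std::size_t i = 0;
    while (i < s.size()) {
        const unsigned char c = static_cast<unsigned char>(s[i]);
        if (std::isspace(c)) { ++i; continue; }
        std::size_t j = i;
        if (std::isalpha(c) || c == '_') {
            while (j < s.size() && (std::isalnum(static_cast<unsigned char>(s[j])) || s[j] == '_')) ++j;
        } else if (std::isdigit(c) || c == '.') {
            while (j < s.size() && (std::isdigit(static_cast<unsigned char>(s[j])) || s[j] == '.')) ++j;
        } else {
            j = i + 1;
            // Two-character comparison operators: <=, >=, ==, !=
            if (j < s.size() && s[j] == '=' && std::string("<>=!").find(s[i]) != std::string::npos) ++j;
        }
        out.push_back(s.substr(i, j - i));
        i = j;
    }
    return out;
}

// Hypothetical recursive-descent parser that emits a lisp-like string.
class SexprParser {
public:
    explicit SexprParser(const std::vector<std::string>& toks) : toks_(toks), pos_(0) {}
    std::string parse() {
        const std::string result = parseOr();
        if (pos_ != toks_.size()) throw std::runtime_error("unexpected token: " + toks_[pos_]);
        return result;
    }

private:
    std::vector<std::string> toks_;
    std::size_t pos_;

    bool peekIs(const std::string& t) const { return pos_ < toks_.size() && toks_[pos_] == t; }
    static std::string node(const std::string& op, const std::string& a, const std::string& b) {
        return "(" + op + " " + a + " " + b + ")";
    }
    std::string parseOr() {  // lowest precedence: `or` is written out as ||
        std::string lhs = parseAnd();
        while (peekIs("or")) { ++pos_; lhs = node("||", lhs, parseAnd()); }
        return lhs;
    }
    std::string parseAnd() {  // `and` is written out as &&
        std::string lhs = parseCmp();
        while (peekIs("and")) { ++pos_; lhs = node("&&", lhs, parseCmp()); }
        return lhs;
    }
    std::string parseCmp() {  // at most one comparison per level
        const std::string lhs = parseSum();
        static const char* const ops[] = {"<", ">", "<=", ">=", "==", "!="};
        for (std::size_t k = 0; k < sizeof(ops) / sizeof(ops[0]); ++k) {
            if (peekIs(ops[k])) { ++pos_; return node(ops[k], lhs, parseSum()); }
        }
        return lhs;
    }
    std::string parseSum() {
        std::string lhs = parseProduct();
        while (peekIs("+") || peekIs("-")) {
            const std::string op = toks_[pos_++];
            lhs = node(op, lhs, parseProduct());
        }
        return lhs;
    }
    std::string parseProduct() {
        std::string lhs = parsePrimary();
        while (peekIs("*") || peekIs("/")) {
            const std::string op = toks_[pos_++];
            lhs = node(op, lhs, parsePrimary());
        }
        return lhs;
    }
    std::string parsePrimary() {  // number, observable name, or parenthesised expression
        if (pos_ >= toks_.size()) throw std::runtime_error("unexpected end of expression");
        const std::string t = toks_[pos_++];
        if (t == "(") {
            const std::string inner = parseOr();
            if (!peekIs(")")) throw std::runtime_error("missing ')'");
            ++pos_;
            return inner;
        }
        const unsigned char c = static_cast<unsigned char>(t[0]);
        if (!(std::isalnum(c) || c == '_' || c == '.')) throw std::runtime_error("unexpected token: " + t);
        return t;
    }
};

// Convenience wrapper: infix expression string in, prefix string out.
std::string toSexpr(const std::string& expr) { return SexprParser(tokenize(expr)).parse(); }
```

With these definitions, `toSexpr("5 * 10")` gives `(* 5 10)`. Parentheses in the input only steer the grouping; they leave no trace in the output beyond the nesting itself.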
Final words
Note that this process does not perform any kind of filtering or similar. It only creates masks (for boolean expressions) or computes new float values. It is up to the user to combine this with e.g. `TRestAnalysisPlot` to allow plotting of filtered data or complex expressions.
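To make the “mask, not filter” distinction concrete, here is a tiny sketch of what a consumer of such a mask has to do. It uses plain `std::vector`s rather than any REST or ROOT class, and the helper name `selectByMask` is hypothetical.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Keep only those values whose per-event mask entry is non-zero.
// The mask is a vector of ints because boolean observables are stored as integers.
std::vector<double> selectByMask(const std::vector<double>& values,
                                 const std::vector<int>& mask) {
    if (values.size() != mask.size())
        throw std::invalid_argument("values and mask must have one entry per event");
    std::vector<double> kept;
    for (std::size_t i = 0; i < values.size(); ++i) {
        if (mask[i] != 0) kept.push_back(values[i]);
    }
    return kept;
}
```

The process itself stops at producing the mask column; the selection step shown here is what happens downstream, at plotting or analysis time.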
In addition, because the expressions are evaluated at runtime, this process is always going to be slower than a native REST process, so it should not be overused. Computations that are done a lot (i.e. in a systematic fashion over many RML files) should be implemented natively. It does, however, provide a way to quickly glean insights into the data (and store the applied transformations as metadata) without having to write a ROOT macro or even a REST process.
Finally, some functionality has not been implemented so far, namely application of pre-defined mathematical functions (e.g. `sqrt`, `exp` etc.) and computing powers (e.g. `myObservable^2`). These can be added rather easily, if this process is something that is considered of value.
(btw: why the heck are people running long running simulation jobs on sultan? Isn’t there a proper cluster for this kind of thing?)