RESTFramework: Usage of restMergeFiles

REST version : v2.2.12
REST commit : 2af7a44e

Hello,
I am working with a huge number of small files resulting from the same restG4 rml file. Each one of these generally contains from zero to two recorded events (since the interaction probability is very small). I would like to know if restMergeFiles can be used for combining them into a single file and, if possible, how to do it.
I’m sorry if this is written somewhere, but I searched both the forum and the REST Framework webpage and didn’t find any documentation regarding this.

restMergeFiles "*.root" output.root
Note that the quotes around the pattern are required.
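To see why the quotes matter: without them the shell expands the wildcard before restMergeFiles runs, so the macro receives a list of filenames instead of the pattern it expects. A quick shell illustration, independent of REST itself:

```shell
# In a directory containing a.root and b.root:
touch a.root b.root

# Unquoted: the shell expands the glob, so the command receives
# each matching filename as a separate argument.
printf '%s\n' *.root      # prints: a.root  b.root

# Quoted: the glob is not expanded; the command receives the
# literal pattern "*.root" as a single argument.
printf '%s\n' "*.root"    # prints: *.root
```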

Hi, thanks.
Actually, I tested a bunch of combinations, but completely forgot about quotes. From the code in REST_MergeFiles.C I guess this is the general syntax for list input, isn't it? (Just to know for future cases.)

For me, the idea behind this is to avoid merging files in the future, and instead to have the proper tools to visualise/analyse/process data from different files as one, such as TRestAnalysisPlot for plotting several files together.

I am not sure about merging different restG4 files and then post-processing them with other REST processes. This should not be done, because each file is identified by a run number, and after merging, several run numbers end up in one single file.

If possible, I would increase the simulation time of each restG4 launch, multiplying the number of simulated events by 100. That way you will get 100 files with about 100 events in each file; since the relative statistical error scales as 1/√N, 10k events in your ROI is a reasonable number to reach a 1% statistical error.

Then, for 100 files it is not hard, for the moment, to have a shell script that launches restManager on those files.

restManager --c g4Ana.rml --f file$runNumber$.root
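A minimal sketch of such a launcher script, assuming the file naming from the command above (the leading `echo` makes it a dry run that only prints the commands; remove it to actually launch the jobs):

```shell
#!/bin/sh
# Dry-run launcher: print one restManager command per restG4 output
# file matching the assumed naming pattern. Remove the "echo" to
# actually process each file.
for f in file*.root; do
    echo restManager --c g4Ana.rml --f "$f"
done
```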

Then, generate a <TRestAnalysisPlot> with common variables.

In the future, we will have a tool that makes this more interactive.

Hi Javier, the problem is that the amount of initial events is already large.
I’m running simulations for “reaaaally low background” contamination, through a thick lead shielding (this is for the electronics placement at babyIAXO). This means the number of initial events I need for getting statistics in the desired energy region ranges from 250M to 500M, depending on the isotope.
I’m running these simulations for 8 different isotopes coming from 5 different setup positions. So I end up with 40 different restG4 runs of 250M-500M initial events each for every complete setup I want to test, which normally takes around 2 weeks.
I was doing this with 100M-event files a couple of months ago, because I was using the queuing system at cierzo. But since I moved to v2.2.10 (and newer versions) I had to drop cierzo, and with it the queuing system, and I had to start running these simulations via ssh (since I cannot run them locally in reasonable time). Maintaining an ssh connection for as long as these simulations take has proved mostly unfeasible (due to network instability issues in our offices), which is why I decided to chop them down into 100k-event simulations and merge the resulting files afterwards.
I have also thought about the restManager approach you mention, but I remember restManager having problems with files containing 0 events with deposited energy (which is the case for most of my files). If it works, I can try doing it this way.

This being said, it could be that I’m going through a lot of unnecessary struggle because I’m not really experienced with this kind of hundreds-of-millions-of-events simulation. If you see that I’m doing something stupid and there’s an easy way to solve this, please just let me know.

The simulation of radiation transfer through a thick shielding is computationally expensive. The problem should be treated as 2 independent problems. First, one calculates the gamma transfer through the shielding using whatever initial generator and isotopes you need. Second, you take the resulting output spectrum from that first simulation and plug it into a second simulation that registers the hits in your detector. The first simulation should also contain a few shortcuts to speed it up, such as propagating through several layers of the lead. This was already discussed with @lobis and it might evolve into an independent REST program, restG4Transfer, that performs the first simulation.

Of course, this is strongly dependent on the geometry of your simulations. For PandaX-III CDR we used a biasing technique to simulate the gamma transfer through the water shielding.

How is the version of REST connected with that?

I believe there are ways to execute a program and keep it running even after you close your ssh session. But of course, the appropriate solution is to use the cluster’s queuing system.

I will try to explore the possibility of dividing it into two different tasks, as you suggest.

I am sharing a common user account, with a common bash profile, with other people from Zaragoza, and I don’t know if I can add a new version of REST only for myself.

I guess it should be possible, but right now I don’t have much time to invest in diving deeper into it.

Use Option 1 (nohup), given in the second answer of this post.
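For reference, a minimal sketch of the nohup approach applied to the commands in this thread (the rml and file names are placeholders taken from above): nohup detaches the job from the terminal, so it keeps running after the ssh session drops.

```shell
# Launch restManager detached from the terminal: nohup makes the
# process ignore the hangup signal sent when the ssh session closes,
# and "&" puts it in the background. All output goes to run.log.
nohup restManager --c g4Ana.rml --f file001.root > run.log 2>&1 &
echo "Job started with PID $!"   # keep the PID to monitor or kill the job later
```

You can check progress later with `tail -f run.log`; `screen` or `tmux` are alternatives that additionally let you reattach to the running session.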