Segmentation fault about 50% of the time when launching restManager

REST version : v2.2.11
REST commit : 0fbd8405

I get the following error output when launching restManager. The strange thing is that it only happens about 50% of the time. It happens at the end of the process execution.

Has anyone been able to reproduce a similar problem?

In my case, I am launching restManager --c axionGenerator.rml from the RestAxionLib repository, branch 2.2.12.

              [==                          TRestProcessRunner: 1000 processed events                           ==]              
-- Info : Total processing time : 507.428 ms                                                        
-- Info : Average read time from disk (per event) : 0.23639 ms                                      
-- Info : Average process time (per event) : 0.267183 ms                                            
-- Info : Average write time to disk (per event) : 0.003855 ms                                      
====================================================================================================
                                     Configuring output file, merging thread files together                                     
-- Info : Creating file : /Users/javi/git/rest/libraries/axion/examples/./Run_[fExperimentName].root

 *** Break *** segmentation violation
[/usr/lib/system/libsystem_platform.dylib] _sigtramp (no debug info)
[<unknown binary>] (no debug info)
[/usr/local/lib/libRIO.so] TBufferFile::WriteFastArray(char const*, int) /Users/javi/git/root/io/io/src/TBufferFile.cxx:1968
[/usr/local/lib/libRIO.so] TGenCollectionStreamer::WriteObjects(int, TBuffer&) /Users/javi/git/root/io/io/src/TGenCollectionStreamer.cxx:978
[/usr/local/lib/libRIO.so] TCollectionStreamer::Streamer(TBuffer&, void*, int, TClass*) /Users/javi/git/root/build/include/TVirtualCollectionProxy.h:67
[/usr/local/lib/libRIO.so] TBufferFile::WriteFastArray(void*, TClass const*, int, TMemberStreamer*) /Users/javi/git/root/io/io/src/TBufferFile.cxx:2255
[/usr/local/lib/libRIO.so] int TStreamerInfo::WriteBufferAux<char**>(TBuffer&, char** const&, TStreamerInfo::TCompInfo* const*, int, int, int, int, int) /Users/javi/git/root/io/io/src/TStreamerInfoWriteBuffer.cxx:0
[/usr/local/lib/libRIO.so] TStreamerInfoActions::GenericWriteAction(TBuffer&, void*, TStreamerInfoActions::TConfiguration const*) /Users/javi/git/root/io/io/src/TStreamerInfoActions.cxx:192
[/usr/local/lib/libRIO.so] TBufferFile::ApplySequence(TStreamerInfoActions::TActionSequence const&, void*) /Library/Developer/CommandLineTools/usr/bin/../include/c++/v1/iterator:1390
[/usr/local/lib/libRIO.so] TBufferFile::WriteClassBuffer(TClass const*, void*) /Users/javi/git/root/io/io/src/TBufferFile.cxx:3536
[/Users/javi/git/rest/install/lib/libRestAxion.dylib] TRestAxionMagneticField::Streamer(TBuffer&) (no debug info)
[/usr/local/lib/libRIO.so] TKey::TKey(TObject const*, char const*, int, TDirectory*) /Users/javi/git/root/io/io/src/TKey.cxx:252
[/usr/local/lib/libRIO.so] TFile::CreateKey(TDirectory*, TObject const*, char const*, int) /Users/javi/git/root/io/io/src/TFile.cxx:1013
[/usr/local/lib/libRIO.so] TDirectoryFile::WriteTObject(TObject const*, char const*, char const*, int) /Users/javi/git/root/io/io/src/TDirectoryFile.cxx:0
[/usr/local/lib/libCore.so] TObject::Write(char const*, int, int) const /Users/javi/git/root/core/base/src/TObject.cxx:0
[/Users/javi/git/rest/install/lib/libRestCore.dylib] TRestMetadata::Write(char const*, int, int) (no debug info)
[/Users/javi/git/rest/install/lib/libRestCore.dylib] TRestRun::WriteWithDataBase(int, bool) (no debug info)
[/Users/javi/git/rest/install/lib/libRestCore.dylib] TRestRun::FormOutputFile(std::__1::vector<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, std::__1::allocator<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > > >, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >) (no debug info)
[/Users/javi/git/rest/install/lib/libRestCore.dylib] TRestProcessRunner::ConfigOutputFile() (no debug info)
[/Users/javi/git/rest/install/lib/libRestCore.dylib] TRestProcessRunner::RunProcess() (no debug info)
[<unknown binary>] (no debug info)
[/usr/local/lib/libCling.so] TClingCallFunc::exec(void*, void*) /Users/javi/git/root/interpreter/llvm/src/include/llvm/ADT/SmallVector.h:88
[/usr/local/lib/libCling.so] TCling::Execute(TObject*, TClass*, char const*, char const*, bool, int*) /Users/javi/git/root/interpreter/llvm/src/include/llvm/ADT/SmallVector.h:115
[/usr/local/lib/libCling.so] TCling::Execute(TObject*, TClass*, char const*, char const*, int*) /Users/javi/git/root/core/metacling/src/TCling.cxx:4759
[/Users/javi/git/rest/install/lib/libRestCore.dylib] TRestTask::RunTask(TRestManager*) (no debug info)
[/Users/javi/git/rest/install/lib/libRestCore.dylib] TRestManager::ReadConfig(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, TiXmlElement*) (no debug info)
[/Users/javi/git/rest/install/lib/libRestCore.dylib] TRestManager::InitFromConfigFile() (no debug info)
[/Users/javi/git/rest/install/lib/libRestCore.dylib] TRestMetadata::LoadConfigFromFile(TiXmlElement*, TiXmlElement*, std::__1::vector<TiXmlElement*, std::__1::allocator<TiXmlElement*> >) (no debug info)
[/Users/javi/git/rest/install/lib/libRestCore.dylib] TRestMetadata::LoadConfigFromFile(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >) (no debug info)
[/Users/javi/git/rest/install/bin/restManager] main (no debug info)
[/usr/lib/system/libdyld.dylib] start (no debug info)

It looks like the error comes from TRestRun::WriteWithDataBase(). I don’t know which metadata class fails. Can you provide more details?

It is TRestAxionMagneticField from RestAxionLib.

I can reproduce the problem in a ROOT session.

TFile *f = new TFile( "out.root", "RECREATE" );
TRestAxionMagneticField *mF = new TRestAxionMagneticField("metadata.rml");
mF->Write("test");

will produce the following error output. Any clue where the problem is coming from?

root [2] mF->Write("test")

 *** Break *** segmentation violation
[/usr/lib/system/libsystem_platform.dylib] _sigtramp (no debug info)
[<unknown binary>] (no debug info)
[/usr/local/lib/libRIO.so] TBufferFile::WriteFastArray(char const*, int) /Users/javi/git/root/io/io/src/TBufferFile.cxx:1968
[/usr/local/lib/libRIO.so] TGenCollectionStreamer::WriteObjects(int, TBuffer&) /Users/javi/git/root/io/io/src/TGenCollectionStreamer.cxx:978
[/usr/local/lib/libRIO.so] TCollectionStreamer::Streamer(TBuffer&, void*, int, TClass*) /Users/javi/git/root/build/include/TVirtualCollectionProxy.h:67
[/usr/local/lib/libRIO.so] TBufferFile::WriteFastArray(void*, TClass const*, int, TMemberStreamer*) /Users/javi/git/root/io/io/src/TBufferFile.cxx:2255
[/usr/local/lib/libRIO.so] int TStreamerInfo::WriteBufferAux<char**>(TBuffer&, char** const&, TStreamerInfo::TCompInfo* const*, int, int, int, int, int) /Users/javi/git/root/io/io/src/TStreamerInfoWriteBuffer.cxx:0
[/usr/local/lib/libRIO.so] TStreamerInfoActions::GenericWriteAction(TBuffer&, void*, TStreamerInfoActions::TConfiguration const*) /Users/javi/git/root/io/io/src/TStreamerInfoActions.cxx:192
[/usr/local/lib/libRIO.so] TBufferFile::ApplySequence(TStreamerInfoActions::TActionSequence const&, void*) /Library/Developer/CommandLineTools/usr/bin/../include/c++/v1/iterator:1390
[/usr/local/lib/libRIO.so] TBufferFile::WriteClassBuffer(TClass const*, void*) /Users/javi/git/root/io/io/src/TBufferFile.cxx:3536
[/Users/javi/git/rest/install/lib/libRestAxion.dylib] TRestAxionMagneticField::Streamer(TBuffer&) (no debug info)
[/usr/local/lib/libRIO.so] TKey::TKey(TObject const*, char const*, int, TDirectory*) /Users/javi/git/root/io/io/src/TKey.cxx:252
[/usr/local/lib/libRIO.so] TFile::CreateKey(TDirectory*, TObject const*, char const*, int) /Users/javi/git/root/io/io/src/TFile.cxx:1013
[/usr/local/lib/libRIO.so] TDirectoryFile::WriteTObject(TObject const*, char const*, char const*, int) /Users/javi/git/root/io/io/src/TDirectoryFile.cxx:0
[/usr/local/lib/libCore.so] TObject::Write(char const*, int, int) const /Users/javi/git/root/core/base/src/TObject.cxx:0
[/Users/javi/git/rest/install/lib/libRestCore.dylib] TRestMetadata::Write(char const*, int, int) (no debug info)
[<unknown binary>] (no debug info)
[/usr/local/lib/libCling.so] cling::Interpreter::RunFunction(clang::FunctionDecl const*, cling::Value*) (no debug info)
[/usr/local/lib/libCling.so] cling::Interpreter::EvaluateInternal(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, cling::CompilationOptions, cling::Value*, cling::Transaction**, unsigned long) (no debug info)
[/usr/local/lib/libCling.so] cling::Interpreter::process(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, cling::Value*, cling::Transaction**, bool) (no debug info)
[/usr/local/lib/libCling.so] cling::MetaProcessor::process(llvm::StringRef, cling::Interpreter::CompilationResult&, cling::Value*, bool) (no debug info)
[/usr/local/lib/libCling.so] HandleInterpreterException(cling::MetaProcessor*, char const*, cling::Interpreter::CompilationResult&, cling::Value*) /Users/javi/git/root/core/metacling/src/TCling.cxx:2172
[/usr/local/lib/libCling.so] TCling::ProcessLine(char const*, TInterpreter::EErrorCode*) /Users/javi/git/root/core/metacling/src/TCling.cxx:0
[/usr/local/lib/libRint.so] TRint::ProcessLineNr(char const*, char const*, int*) /Users/javi/git/root/core/rint/src/TRint.cxx:746
[/usr/local/lib/libRint.so] TRint::HandleTermInput() /Users/javi/git/root/core/rint/src/TRint.cxx:608
[/usr/local/lib/libCore.so] TUnixSystem::CheckDescriptors() /Users/javi/git/root/core/unix/src/TUnixSystem.cxx:1324
[/usr/local/lib/libCore.so] TMacOSXSystem::DispatchOneEvent(bool) /Users/javi/git/root/core/macosx/src/TMacOSXSystem.mm:378
[/usr/local/lib/libCore.so] TSystem::InnerLoop() /Users/javi/git/root/core/base/src/TSystem.cxx:413
[/usr/local/lib/libCore.so] TSystem::Run() /Users/javi/git/root/core/base/src/TSystem.cxx:363
[/usr/local/lib/libCore.so] TApplication::Run(bool) /Users/javi/git/root/core/base/src/TApplication.cxx:1184
[/usr/local/lib/libRint.so] TRint::Run(bool) /Users/javi/git/root/core/rint/src/TRint.cxx:463
[/Users/javi/git/rest/install/bin/restRoot] main (no debug info)
[/usr/lib/system/libdyld.dylib] start (no debug info)

Ok, the problem seems solved now.

We had some members defined like this:

Double_t var1;
Double_t var2;
...

When I added //< after each declaration, everything was fixed.

Double_t var1; //<
Double_t var2; //<
...

I didn’t know this was so important. I assumed that if I skipped it, the default would simply be //<.

Wow, didn’t know this was of such importance!

Maybe it is not the //< symbol that makes the difference.

Now I remember. I also ran into this strange problem some time ago. It may be due to a mismatch of the file timestamps between two operating systems. Sometimes when you update a header file and recompile with make, CINT doesn’t re-run. Then the program still uses the old streamer method for the new class. And yes, the segmentation fault occurs in the Write() method.

The solution is to clear the build directory and compile everything again.


Ok, it happened to me again, so the problem is not coming from //<. When I removed the build directory and recompiled from scratch, the problem was solved.

Now I am modifying the header, adding and removing members, to try to reproduce the problem, but I cannot get the intermittent segmentation fault back.

Perhaps, increasing the version in ClassDef will force CINT to re-run?

However, I always compile on the same system. Do you mean between two operating systems, such as macOS, Windows and Linux? Where is the system timestamp taken into account?

Then that is not the case. Another possibility is that you terminated a -jN compilation with Ctrl-C. I am just guessing…

Ok, that might have happened to me, but I would not have thought of it. I will keep an eye on it.

So, in the end it seems the problem was still there, @eve95, and it is not related to make or to properly cleaning the build directory.

The problem is connected with adding a Garfield object to the members of the class. Inside TRestAxionMagneticField we place the following member, which is not stored by the streamer:

#ifdef USE_Garfield
    Garfield::Sensor* fSetOfField;  //!
#endif

I imagine the problem is due to the fact that the Sensor type is not recognised by reflection methods?

If I place the Garfield member at the bottom, as I did in the following commit, the problem is solved!

However, this is not a good solution: we cannot force users to place unidentified types at the bottom of the class.


If you comment the object with //!, it will indeed not be recognized by the reflection methods. But I believe REST’s reflection methods have nothing to do with object saving, so I still think the problem is in the make process. If you move the

Garfield::Sensor* fSetOfField;

line back to the old place, will the problem occur again?

I am 100% sure that placing the Garfield::Sensor* fSetOfField; in first position inside the members of the class will cause restManager --c axionGenerator.rml to crash about 50% of the time.

Placing it last avoids the segmentation fault.

I moved that member up to first and down to last position several times, and then I wondered what happens in between, and surprise! If I put it after TVector3 there is no segmentation fault. If I put it before TVector3 there is a segmentation fault.

This will not produce a seg.fault:

std::vector<TVector3> fPositions; //<;
#ifdef USE_Garfield
Garfield::Sensor* fSetOfField; //!
#endif
std::vector<TString> fFileNames; //<;

This will produce a seg.fault:

#ifdef USE_Garfield
Garfield::Sensor* fSetOfField; //!
#endif
std::vector<TVector3> fPositions; //<;
std::vector<TString> fFileNames; //<;

I have another hypothesis.

You added the #ifdef USE_Garfield guard in the class definition, but CINT never sees this definition, so it omits this class member. Therefore, from CINT’s point of view, the class members are:

A a;
B b;
...

but they actually are:

A a;
Garfield::Sensor* fSetOfField;
B b;
...

CINT thinks it should save the first and second class members, but a and b are actually the first and third class members. This causes the segmentation fault.

To test this, you can try adding //! to all the data members after Garfield::Sensor* fSetOfField;. Then the problem should be gone.
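
The hypothesis above can be illustrated without ROOT. The following is only a minimal sketch, not REST code: the struct names, the plain double arrays standing in for A and B, and the Sensor stand-in are all hypothetical. It shows that a member which exists in one view of the class but not in the other shifts the offset of every member declared after it, while members declared before it are unaffected.

```cpp
#include <cstddef>

struct Sensor;  // hypothetical stand-in for Garfield::Sensor (pointer use only)

// Layout as seen when USE_Garfield is NOT defined (the member is omitted).
struct LayoutWithoutGarfield {
    double a[3];
    double b[3];
};

// Layout as compiled into the library when USE_Garfield IS defined.
struct LayoutWithGarfield {
    double a[3];
    Sensor* fSetOfField;
    double b[3];
};

// Offset difference for the member declared BEFORE the conditional pointer.
constexpr std::size_t shiftOfA() {
    return offsetof(LayoutWithGarfield, a) - offsetof(LayoutWithoutGarfield, a);
}

// Offset difference for the member declared AFTER the conditional pointer.
constexpr std::size_t shiftOfB() {
    return offsetof(LayoutWithGarfield, b) - offsetof(LayoutWithoutGarfield, b);
}

// a stays where it was; b moves by at least the size of one pointer.
static_assert(shiftOfA() == 0, "members before the pointer are not shifted");
static_assert(shiftOfB() >= sizeof(Sensor*), "members after the pointer are shifted");
```

If the streamer uses the first layout while the object in memory follows the second, it would read b at the wrong address. That would be consistent with the observation that the crash disappears when the pointer is declared after the persistent members.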

In TRestGas I don’t use the #ifdef USE_Garfield mark in the header. The alternative is to use a forward declaration, which I think is more elegant, and also prevents the problem. It is in line 52 in

It is better because the class structure remains unchanged when you switch the Garfield-dependent compilation on or off.
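
A minimal sketch of the forward-declaration approach, under stated assumptions: MagneticFieldSketch is a hypothetical, simplified stand-in (ROOT base classes and ClassDef are omitted, and the accessor names are made up for illustration). The point is only that the pointer member is declared unconditionally, so the class layout does not depend on whether USE_Garfield is defined.

```cpp
#include <cstddef>
#include <vector>

namespace Garfield {
class Sensor;  // forward declaration; no Garfield header is needed for a pointer
}

// Hypothetical stand-in for a metadata class, not the real REST class.
class MagneticFieldSketch {
   public:
    bool HasSensor() const { return fSetOfField != nullptr; }
    void AddPosition(double x) { fPositions.push_back(x); }
    std::size_t GetNumberOfPositions() const { return fPositions.size(); }

   private:
    // Declared with no #ifdef, so every view of the class has the same members.
    Garfield::Sensor* fSetOfField = nullptr;  //! transient, not streamed
    std::vector<double> fPositions;           //<
};
```

With this layout, only the source file that actually creates or dereferences the sensor would need the Garfield headers, and that is where an #ifdef USE_Garfield guard could go instead of in the class definition.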


Ok, that explains the problem, thanks!