ROOTANA
The new MIDAS analyzer was written to combine the good ideas from the existing analyzers and to correct some of the known problems:
TBW - explain creating of analyzer:
TBW - explain running of analyzer:
TBW - explain that all histograms are automatically exported to web browser
TBW
An analyzer for a non-trivial experiment may be quite complicated. To manage this complexity one can arrange the code in independent analyzer modules. To communicate results between modules one can use ordinary C++ coding or one can use the mechanism of flow objects described in the next section.
A typical analyzer module may perform several duties:
An analyzer may be used to process just one data file, a sequence of data files from the same run (subrun files) or several different runs.
To correctly manage the lifetime (creation and destruction) of all data objects, the analyzer uses "run objects".
A "run object" holds all the data (histograms, canvases, C++ structures, etc.) for the run currently being processed. These data are created when the run starts and are destroyed when the run ends, encouraging a coding style in which pointers to deleted objects are not accidentally kept and used, which would lead to memory corruption and crashes.
The analyzer framework manages (creates and destroys) run objects using the factory pattern. In the typical case, the user run object is connected to the framework by creating a TAFactory object (written explicitly or using TAFactoryTemplate&lt;T&gt;) and passing it to the TARegister object via static initialization.
Previous analyzer frameworks did not use this type of "run object" to manage per-run data and instead encouraged the use of global variables for ROOT file objects, ROOT histograms, etc. Together with ROOT's idiosyncratic memory management, where some ROOT objects live in normal memory and behave like normal C++ objects while other ROOT objects live "inside" ROOT file objects and "vanish" when the ROOT file is closed, this made it very easy to write analyzers that crash at the end of a run or crash when switching from one run to the next (a big problem when writing online analyzers).
An analyzer for a non-trivial experiment may have several analyzer modules performing different tasks separately (an ADC module may unpack and calibrate ADC data, a TDC module may unpack and sort TDC data). The results of these modules are often C++ objects (an array of ADC pulse heights, an array of TDC hit times). To pass these C++ objects to the next analyzer module, where the data can be combined together, one can use the "flow event" (the bank structure of MIDAS events is inconvenient for handling C++ objects).
C++ objects that will be passed between modules should extend the class TAFlowEvent (as demonstrated by Object1 and Object2 in manalyzer_example_flow.cxx). The TAFlowEvent object maintains a simple linked list of all flow objects. Each module Analyze() method has access to all existing flow objects and can add new flow objects as desired. The flow event and all the flow objects are automatically deleted after the last analyzer module Analyze() method is completed.
Here are some examples taken from manalyzer_example_flow.cxx:
If desired, one can use the function AnalyzeFlowEvent() to separate the analysis of flow events from the analysis of MIDAS events. For example, one can unpack MIDAS data banks into C++ structures stored in a flow event in the function Analyze() of one module and process the data and fill the histograms in the function AnalyzeFlowEvent() of a different module (separate data unpacking module and data analysis module).
Some experiments may have multiple physics events stored inside a single MIDAS event.
To handle this situation in the manalyzer, one would implement each physics event as a flow event. Then one would have the data unpacker module process the MIDAS event (in the Analyze() method), unpack all the multiple physics events into flow events and queue the flow events for further analysis by calling the runinfo->AddToFlowQueue() method. For example, like this:
In the data unpacker Analyze() method:

   // unpack_next_physics_event() is assumed to return NULL when no
   // complete physics events remain in this MIDAS event
   while (TAFlowEvent* e = unpack_next_physics_event(midas_event)) {
      runinfo->AddToFlowQueue(e);
   }
After manalyzer finishes processing the current MIDAS event, it will proceed with processing the queued flow events. Each queued flow event is processed the same way as a normal MIDAS event, except that the Analyze() method is not called (there is no MIDAS event!), so only the AnalyzeFlowEvent() method will be used. The flags work the same way, and one can chain additional flow objects to the flow event as it passes from one module to the next. At the very end, the flow event is automatically deleted.
After all queued flow events are processed, manalyzer will continue with processing the next MIDAS event.
In the multithreaded mode, the flow event queue works slightly differently: instead of using a special flow event queue, flow events are passed directly to the multithreading system.
The flow event queue can also be used to process any events remaining buffered or queued after the last MIDAS event was processed by using the PreEndRun() method as described in the next section.
Sometimes physics events need to be generated and processed at the end of a run after all MIDAS events have already been processed (after the last call to Analyze()), but it is too late to do this in the final EndRun() call.
This happens when MIDAS events contain a continuous stream of data and the stream unpacker has to maintain a buffer of incomplete data between Analyze() calls.
This also happens when the analyzer contains an event builder component which may contain a buffer for incomplete or pending physics events.
To ensure that all of this buffered data is analyzed and no unprocessed data is left behind, use the PreEndRun() method.
The PreEndRun() method is called after all MIDAS events have been processed but before the final EndRun().
It gives the data unpacker or the event builder module an opportunity to unpack the remaining physics events into flow events and queue them for analysis by calling runinfo->AddToFlowQueue().
After calling the PreEndRun() method for all modules, the accumulated flow events are processed by calling the AnalyzeFlowEvent() method of each module, as described in the section about the flow event queue.
The user analysis code in the Analyze() method can influence data processing by the manalyzer framework by manipulating the TAFlags:
The "run info" object has the information about the currently analyzed run - run number, file name, etc.
This object is created when a new run is started (before calling the first BeginRun() method) and is destroyed at the end of a run (after calling the last EndRun() method).
Ownership of this object remains with the analyzer framework. User code should not keep a pointer to it or to any of its components. (A pointer to this object is passed to all user methods).
When analyzing multiple subrun files, no new runinfo objects are created, but the current file name is always updated when switching from one subrun file to the next.
TARunInfo data members:
TARunInfo methods:
ROOT-related functions provided by the older analyzers via global variables are consolidated in the ROOT helper object with improved object lifetime management.
(If manalyzer is built without HAVE_ROOT, the ROOT helper object is not available).
The ROOT helper object is created and destroyed at the same time as the "run info" object.
Ownership of the TARootHelper object remains with the manalyzer framework; user code should not save pointers to it or to any of its components. A pointer to this object is always available to all user methods via "runinfo->fRoot".
TARootHelper data members:
Multithreading works by giving each module its own thread and passing flow events between these threads via queues.
Running with the flag --mt enables PER-MODULE multithreading, e.g.:
Multithreading configuration settings can be changed with the flags:
Each flow queue has a std::mutex lock. The queue is locked when it is read, when the front is popped, or when data is pushed to the back. Each module's thread reads from its own queue and writes to the next module's queue. There is a global lock for processing, TAMultithreadHelper::gfLock, to be used whenever running code that is not thread safe (the ROOT fitting libraries, for example).
Place locks around any root fitting functions:
Destructors for data placed into the flow must be implemented in the flow classes, not inside your modules. Do not put local variables into the flow; instead, allocate objects with new and insert pointers to them.
Remember: once you put data into the flow, the module it came from no longer has any connection to it.
Debugging advice:
TBW
TBW
TBW
TBW