ROOT Open Projects
Opportunities for collaboration and contributions
4/11/25
5/11/25
RNTuple & I/O
S3 Backend for RNTuple
C++ HTTP
Responsible: Jakob Blomer
Target: Master Thesis / Tech / ATLAS Quali Task (perhaps related to ATLAS cloud R&D)
6-12 months
ROOT areas: RNTuple, I/O
The RNTuple I/O provides the data format and basic storage stack for HL-LHC data. It has been designed to allow for exchangeable storage backends, including file system access, XRootD access, and object stores. Proof-of-concept implementations for DAOS and S3 object stores exist but have not been pursued further. The goal of this project is to develop a robust (pre-release quality) implementation for storing RNTuple data in S3, the standard cloud storage protocol. In particular, the project needs to design and implement a URL scheme to address RNTuple objects and an efficient HTTP base layer to access objects and byte ranges. The project should include benchmarks comparing S3 to XRootD and remote file system access in the same data center (e.g., using the AGC benchmark), and a benchmark measuring the effect of network latency on end-to-end performance.
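As a rough illustration of the two building blocks named above, the following sketch parses an object address and builds the value of an HTTP Range header. The s3://<bucket>/<key> form, the S3Location struct and both function names are assumptions made for this example; they are not an existing ROOT interface, and the actual URL scheme is part of the project's design work. The only fixed fact used here is that HTTP byte ranges are inclusive on both ends.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical decomposition of an object address; not an existing ROOT type.
struct S3Location {
   std::string bucket;
   std::string key;
};

// Parse "s3://<bucket>/<key>" (assumed scheme, for illustration only).
// Returns false if the URL does not have that shape.
bool ParseS3Url(const std::string &url, S3Location &out)
{
   const std::string prefix = "s3://";
   if (url.compare(0, prefix.size(), prefix) != 0)
      return false;
   const std::string::size_type slash = url.find('/', prefix.size());
   // Reject a missing slash, an empty bucket and an empty key.
   if (slash == std::string::npos || slash == prefix.size() || slash + 1 == url.size())
      return false;
   out.bucket = url.substr(prefix.size(), slash - prefix.size());
   out.key = url.substr(slash + 1);
   return true;
}

// Value of an HTTP Range header for `length` bytes starting at `offset`.
// The last byte position is inclusive, hence the "- 1".
std::string MakeRangeHeader(std::uint64_t offset, std::uint64_t length)
{
   assert(length > 0);
   return "bytes=" + std::to_string(offset) + "-" + std::to_string(offset + length - 1);
}

// Usage: call from main() to run the checks.
void DemoS3Addressing()
{
   S3Location loc;
   const bool ok = ParseS3Url("s3://mybucket/data/ntuple/page0", loc);
   assert(ok && loc.bucket == "mybucket" && loc.key == "data/ntuple/page0");
   assert(MakeRangeHeader(100, 50) == "bytes=100-149"); // bytes 100..149
   (void)ok;
}
```

A real backend would additionally need request signing, retries and the coalescing of adjacent ranges, none of which is shown here.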
Graceful crash recovery of RNTuple write streams
C++
Responsible: Jakob Blomer
Target: Master Thesis / Tech
12 months
ROOT areas: RNTuple, I/O
The RNTuple I/O provides the data format and basic storage stack for HL-LHC data. Its current uses include analysis and reconstruction workflows, with some early prototyping of data acquisition use cases. For data acquisition in particular, where failing jobs cannot simply be restarted, the capability to recover written data after an unexpected termination of the writer process is critical. The RNTuple design allows for that feature provided that a snapshot of the meta-data (the on-disk location of pages) is regularly flushed to disk (not only at the end). The project should design and implement a mechanism to automatically flush a consistent state of the meta-data during writing to a location that can be found after a crash (e.g., a well-known byte offset such as every 100MB). An extensive test suite should be developed alongside the feature.
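To make the "well-known byte offset" idea concrete, here is a minimal sketch of how a recovery tool might enumerate the places to look for a snapshot, newest first. It assumes, purely for illustration, that the writer places a snapshot at every non-zero multiple of a fixed interval; the function name and this layout are hypothetical and not part of the RNTuple format.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Offsets at which a recovery tool would look for a meta-data snapshot,
// newest (largest offset) first. Assumption: snapshots start at every
// non-zero multiple of `interval` bytes.
std::vector<std::uint64_t> CandidateSnapshotOffsets(std::uint64_t fileSize, std::uint64_t interval)
{
   assert(interval > 0);
   std::vector<std::uint64_t> offsets;
   for (std::uint64_t k = fileSize / interval; k >= 1; --k) {
      const std::uint64_t offset = k * interval;
      // A snapshot that would start exactly at the end of the file has no bytes.
      if (offset < fileSize)
         offsets.push_back(offset);
   }
   return offsets;
}

// Usage: call from main() to run the checks.
void DemoSnapshotOffsets()
{
   // A 350-byte file with a 100-byte interval: try 300, then 200, then 100.
   const std::vector<std::uint64_t> expected{300, 200, 100};
   assert(CandidateSnapshotOffsets(350, 100) == expected);
}
```

In practice each candidate would still have to be validated (for instance with a magic number and a checksum) before it is trusted, since the crash may have interrupted the most recent flush.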
Abstract Object Access in ROOT Schema Evolution
C++
Responsible: Jakob Blomer
Target: Grad
18-24 months
ROOT areas: RNTuple, I/O
Scope: The ROOT schema evolution system enables users to define I/O customization rules that map on-disk types to their evolved versions in-memory. A limitation of the current approach is the fact that the on-disk information (e.g., class members) needs to be accessed in the form of existing C++ types. Effectively, that may require users of complex event data models to preserve the history of the class layouts. This project aims at developing an extension to the ROOT schema evolution that allows access to on-disk classes in an abstract way, as a combination of PoDs, abstract records and abstract collections. As a result, certain schema evolution cases related to the change of class shapes (e.g., moving data members in the class hierarchy) should be greatly simplified.
Python Interface
ROOT in free-threaded Python
CPython Cling Thread-safety
Responsible: Jonas Rembser
Target: GSoC student
3 months
ROOT areas: Python Interface, Compiler Technology
- Description with motivation: Free-threaded (a.k.a. “GIL-less”) Python has been available since Python 3.13 and is marked as non-experimental as of Python 3.14. It could become the default one day. Most libraries in the Python ecosystem make an effort to support free-threaded Python builds to improve the performance of multithreaded code. Supporting free-threading is a challenge for Python modules that ship their own CPython extensions, such as ROOT. However, ROOT users would greatly benefit from true multithreading at the Python level, so they don’t have to implement thread management on the C++ level on top of ROOT (e.g. with std::thread). Therefore, supporting free-threaded Python builds could make ROOT user code more performant while staying “pythonic”. Another clear advantage is that multiple C++ threads can call back into Python functions concurrently, which has applications in RDataFrame or RooFit.
- Scope and objective: The objective is to make the ROOT Python interface work with free-threaded Python builds, initially by putting locks on the ROOT side where required. To test and demonstrate this, code examples should be written and benchmarked. The stretch goal is to update ROOT such that more of the interpreter infrastructure is thread-safe so locks can be gradually avoided.
- Suggested skills and experience: This is for a person who is ready to deep-dive into CPython extensions and also learn about Cling to understand thread-safety in ROOT.
W-mass analysis with RooFit
Physics Statistical Analysis Python C++
Responsible: Jonas Rembser
Target: Summer student or master thesis
3-4 months
ROOT areas: Statistical Interpretation
- Description with motivation: The recent W-mass measurement by CMS uses a custom fitting framework for binned likelihood fits, called rabbit (https://github.com/WMass/rabbit). It is optimized for binned likelihood fits with template models in the limit of large statistics, using automatic differentiation and minimizers other than Minuit 2 (which is used by RooFit). However, these specialized frameworks are often a problem for combined fits where other measurements are expressed in RooFit. Therefore, we would like to understand what RooFit is missing to support the W-mass analysis use-case with satisfactory performance, which should inspire several improvements in RooFit.
- Scope and objective: The rabbit framework provides public test examples in its repository, inspired by the W-mass measurement. These examples should be implemented in RooFit to compare performance and usability. For a fair comparison, the RooFit likelihood should also be minimized with the scipy minimizers that rabbit uses, and vice versa (rabbit likelihood with Minuit 2). This conveniently stress-tests the pythonizations of both RooFit and Minuit 2. The outcome should be a concise analysis of the differences in implementation and performance, including recommendations to RooFit developers on how to reduce performance differences. The stretch goal is to implement these optimizations in RooFit itself.
- Suggested skills and experience: Physics student with solid knowledge of statistical analysis and Python. C++ would be a bonus.
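For orientation, the quantity minimized in a binned template fit can be sketched in a few lines. The example below uses the textbook Poisson negative log-likelihood with a single signal-strength parameter, where the expected yield in bin i is mu * sig_i + bkg_i and terms independent of mu are dropped. It is a simplified illustration, not the implementation used by rabbit or RooFit, and the function name is hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Negative log-likelihood of a binned Poisson template fit, up to a constant
// that does not depend on mu. Expected yield per bin: nu_i = mu * sig_i + bkg_i.
double BinnedNll(double mu, const std::vector<double> &observed,
                 const std::vector<double> &sig, const std::vector<double> &bkg)
{
   assert(observed.size() == sig.size() && observed.size() == bkg.size());
   double nll = 0.0;
   for (std::size_t i = 0; i < observed.size(); ++i) {
      const double nu = mu * sig[i] + bkg[i];
      assert(nu > 0.0); // the logarithm below needs a positive expectation
      nll += nu - observed[i] * std::log(nu);
   }
   return nll;
}

// Usage: call from main() to run the checks.
void DemoBinnedNll()
{
   // The observed counts equal the prediction at mu = 1, so mu = 1 must give
   // a smaller NLL than nearby values.
   const std::vector<double> obs{10, 20}, sig{5, 10}, bkg{5, 10};
   assert(BinnedNll(1.0, obs, sig, bkg) < BinnedNll(0.5, obs, sig, bkg));
   assert(BinnedNll(1.0, obs, sig, bkg) < BinnedNll(1.5, obs, sig, bkg));
}
```

A realistic model adds nuisance parameters for systematic uncertainties, which is where automatic differentiation and the choice of minimizer start to matter for performance.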
Compiler Technology
Improve robustness of dictionary to module lookups in ROOT
Clang LLVM Cling ROOT
Responsible: Vassil Vassilev
Target: GSoC Student
3-4 months
ROOT areas: Compiler Technology
The LHC smashes groups of protons together at close to the speed of light: 40 million times per second and with seven times the energy of the most powerful accelerators built up to now. Many of these will just be glancing blows but some will be head-on collisions and very energetic. When this happens some of the energy of the collision is turned into mass and previously unobserved, short-lived particles, which could give clues about how Nature behaves at a fundamental level, fly out and into the detector. Our work includes the experimental discovery of the Higgs boson, which led to the award of a Nobel prize for the underlying theory that predicted the Higgs boson as an important piece of the standard model theory of particle physics.
CMS is a particle detector that is designed to see a wide range of particles and phenomena produced in high-energy collisions in the LHC. Like a cylindrical onion, different layers of detectors measure the different particles, and use this key data to build up a picture of events at the heart of the collision. CMSSW is a collection of software for the CMS experiment. It is responsible for the collection and processing of information about the particle collisions at the detector. CMSSW uses the ROOT framework to provide support for data storage and processing. ROOT relies on Cling, Clang, and LLVM for automatically building an efficient I/O representation of the necessary C++ objects. The I/O properties of each object are described in a compilable C++ file called a /dictionary/. ROOT’s I/O dictionary system relies on C++ modules to improve the overall memory footprint when being used.
The few run-time failures in the modules integration builds of CMSSW are due to dictionaries that cannot be found in the modules system. These dictionaries are present, as the mainstream system is able to find them using a broader search. The modules setup in ROOT needs to be extended to include a dictionary extension to track dictionary<->module mappings for C++ entities that introduce synonyms rather than declarations (e.g., using MyVector = std::vector<A<B>>, where the dictionaries of A and B are elsewhere).
- If there is an alias declaration of the kind using MyVector = std::vector<A<B>>, we should store its ODRHash in the respective dictionary file as a number attached to a special variable which can be retrieved at symbol scanning time.
- Track down the test failures of CMSSW and check if the proposed implementation works.
- Develop tutorials and documentation.
- Present the work at the relevant meetings and conferences.
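The alias case described above can be illustrated with a small, self-contained sketch. It shows that an alias such as MyVector names an existing type rather than declaring a new one, which is why a lookup keyed on declarations alone can miss it. The side table and the module name below are hypothetical stand-ins for whatever mapping the project ends up storing (the task list suggests an ODRHash); they are not ROOT's actual data structures.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <type_traits>
#include <vector>

struct B {};
template <typename T>
struct A {
   T value;
};

// The alias introduces a synonym for an existing type, not a new type.
using MyVector = std::vector<A<B>>;
static_assert(std::is_same<MyVector, std::vector<A<B>>>::value,
              "alias and target denote the same type");

// Hypothetical side table: spelled alias name -> module holding the dictionary.
// Returns an empty string if the alias is unknown.
std::string FindModuleForAlias(const std::map<std::string, std::string> &table,
                               const std::string &alias)
{
   const auto it = table.find(alias);
   return it == table.end() ? std::string() : it->second;
}

// Usage: call from main() to run the checks.
void DemoAliasLookup()
{
   // "MyDictModule" is a made-up module name for illustration.
   const std::map<std::string, std::string> table{{"MyVector", "MyDictModule"}};
   assert(FindModuleForAlias(table, "MyVector") == "MyDictModule");
   assert(FindModuleForAlias(table, "Unknown").empty());
}
```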
Implement CppInterOp API exposing memory, ownership and thread safety information
Clang LLVM Cling cppyy
Responsible: Vassil Vassilev
Target: GSoC Student
3-4 months
ROOT areas: Compiler Technology
Incremental compilation pipelines process code chunk-by-chunk by building an ever-growing translation unit. Code is then lowered into the LLVM IR and subsequently run by the LLVM JIT. Such a pipeline allows creation of efficient interpreters. The interpreter enables interactive exploration and makes the C++ language more user friendly. The incremental compilation mode is used by the interactive C++ interpreter, Cling, initially developed to enable interactive high-energy physics analysis in a C++ environment.
Clang and LLVM provide access to C++ from other programming languages, but currently only expose the declared public interfaces of such C++ code even when they have parsed implementation details directly. Both the high-level and the low-level program representations have enough information to capture and expose more of such details to improve language interoperability. Examples include details of memory management, ownership transfer, thread safety, externalized side-effects, etc. For example, if memory is allocated and returned, the caller needs to take ownership; if a function is pure, it can be elided; if a call provides access to a data member, it can be reduced to an address lookup.
The goal of this project is to develop an API for CppInterOp which is capable of extracting and exposing such information from the AST or from JIT-ed code and use it in cppyy (Python-C++ language bindings) as an exemplar. If time permits, extend the work to persistify this information across translation units and use it on code compiled with Clang.
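The three examples in the previous paragraph correspond to ordinary C++ functions like the ones below. The sketch only shows what kind of fact a binding layer would want to learn about each function; the Particle type and the function names are invented for illustration, and nothing here is CppInterOp API.

```cpp
#include <cassert>
#include <cstddef>

struct Particle {
   double pt;
   int charge;
};

// Allocates and returns: the caller has to take ownership and delete the object.
Particle *MakeParticle(double pt, int charge)
{
   return new Particle{pt, charge};
}

// Pure: the result depends only on the argument and there are no side effects,
// so a call whose result is unused can be dropped.
int Square(int x)
{
   return x * x;
}

// Accessor: equivalent to reading at a fixed offset from the object address.
double &GetPt(Particle &p)
{
   return p.pt;
}

// Usage: call from main() to run the checks.
void DemoInteropFacts()
{
   Particle *p = MakeParticle(1.5, -1); // the caller now owns *p
   char *base = reinterpret_cast<char *>(p);
   // The accessor yields exactly the address "object + offsetof(pt)".
   assert(reinterpret_cast<char *>(&GetPt(*p)) == base + offsetof(Particle, pt));
   assert(Square(4) == 16);
   (void)base;
   delete p; // ownership obligation fulfilled
}
```

None of these properties is visible in the function signatures alone, which is the gap the project description refers to.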
Task ideas and expected results:
- Collect and categorize possible exposed interop information kinds.
- Write one or more facilities to extract necessary implementation details.
- Design a language-independent interface to expose this information.
- Integrate the work in clang-repl and Cling.
- Implement and demonstrate its use in cppyy as an exemplar.
- Present the work at the relevant meetings and conferences.