Job Title: Summer Intern – Scientific Software Development
The Scientific Software Infrastructure department of the Scientific Computing Division at Fermi National Accelerator Laboratory is leading efforts to advance the tools and infrastructure for high-energy physics (HEP) data analysis. We have a wide variety of projects and are looking for motivated students to work on an exciting project involving big data and high-performance computing.
Project 1: We are investigating distributed data processing frameworks for HEP data analysis, with HPC facilities as the target platforms. Our specific use case is the analysis of neutrino interaction data from NOvA and liquid argon experiments. The summer project will involve understanding the current data format as it exists in HDF5 and the analysis tasks that are needed, implementing those tasks with technologies such as MPI through Python APIs (h5py, mpi4py, and numpy), and completing a comprehensive evaluation of performance and ease of use. We expect that by the end of the internship the student will understand a broad range of HEP analysis operations and will have implemented a real scientific use case. We also expect the student to write a technical report, preferably a paper for a conference.
Preferred skills for this position are:
- Experience with Python or C++
- Use of MPI
- Understanding of distributed systems
- Experience with high-performance I/O libraries such as HDF5
Project 2: Our software systems and libraries play an important part in Fermilab's research programs and in the wider scientific community. The focus of this work will be on features of the art framework. The art framework plays a major role in data organization and storage, as well as in the configuration, coordination, and execution of scientific algorithms. It is also the primary workflow tool for moving data through sequences of algorithms. art is written in state-of-the-art C++. It provides a full set of tools for developing data analysis algorithms, including object-based data modeling. Our goal for this summer project is to introduce a tabular data modeling and management layer into art. The layer will provide a way to introduce Python algorithms, with data access using numpy. It must also provide an excellent C++ interface, with multidimensional array support and efficient access to libraries such as Eigen and Armadillo.
Experience with Python and C++ is required.
Project 3: We are responsible for a configuration language developed in-house: the Fermilab Hierarchical Configuration Language (FHiCL). FHiCL shares some characteristics with YAML. It is used to configure invocations of art, a C++14 event-based processing framework, so that they can carry out a wide range of tasks using user-written and other third-party code, and it is also used to configure the behavior of that code. Currently, the reference implementation is written in C++ using the Boost Spirit parser and Phoenix toolkits. We are looking for a student interested in domain-specific language parsing to investigate the use of ANTLR4 to generate FHiCL bindings for multiple languages, including a prospective replacement for the C++ reference implementation in the art framework.
Experience with C++ and any parser generator (e.g., yacc, bison, or ANTLR) is required.