Table of contents

About Features Requirements Compiling Documentation Code examples

About the library

A C++ template library for working with particles and grids and solving PDEs in a parallel and memory efficient way with up to thousands of tasks. The library uses MPI, OpenMP (or both) for the parallelization, though everything also works without an MPI compiler. We take care of most of the communication between tasks under the cover so the user don't have to deal with that. It is made mainly for analyzing the large scale structure of the Universe, but it is written in general so it can be useful outside of cosmology. It is not meant to be a replacement for common cosmology codes that does similar things (like initial condition generation, simple N-body simuations, halo and void finders, reconstruction etc.) although these things are very easy to do with the library. The goal of the library is rather provide users with the basic tools needed to working in parallel so that one don't have to start by making the tools before doing the science and to implement general parallel algorithms without making (too many) assumptions about how they are to be used and let the user take all these decitions thus making it easier to do new things without having to modify the whole code (as is usually the case with codes made by physicists). The classes we provide are therefore templated on dimension (well knowing that nobody is going to use it for anything else that $N=2$ or $N=3$) and the types and the algorithms are templated on classes that determine how to process and compile up the data it computes. To be able to do Fourier transforms we settle for a simple slab-based domain decomposition for the parallelization. The library contains algorithms for assigning particles to grids, linking particles together in groups, general watershed, tesselations, fourier transforms, computing correlation functions and polyspectra, a general multigrid solver, simple N-body simulations and many more things. We also provide wrappers of methods in libraries like GSL, CGAL and LUA to more easily being able to perform various tasks.

Code on Github


The library contains ~25.000 lines of code (and documentation inside the code) including ~10.000 lines worth of examples. Some of the things provided in the library:

  • A grid class $\texttt{FFTWGrid}$ for holding real and complex data (float, double, long double) shared among tasks built around FFTW for doing in-place Fourier transforms with MPI, threads or both.
  • A general grid class $\texttt{MPIGrid}$ for holding any data shared between tasks.
  • A general particle container $\texttt{MPIParticles}$ for holding particles shared between tasks and communicating them if they leave their domain.
  • A class, $\texttt{MPIPeriodicDelaunay}$, for performing parallel Delauney tesselation in a periodic box with the associated Voronoi diagram (using CGAL). Simple interface for adding data to the vertices.
  • A general multigrid solver for solving PDEs with periodic or Dirichlet boundary conditions on a compact domain with a simple interface for defining the equation.
  • Watershed algorithm based on the Voronoi diagram of the particles. User decides what quantity to watershed on.
  • Correlation functions in real space: general pair counting algorithm over particles. User provided a function determining what to compute for the pairs.
  • Correlation functions in Fourier space: power-spectrum via naive binning, with interlaced grids or with direct summation.
  • Correlation functions multipoles in Fourier space with fixed line of sight direction (though easy to generalize to survey).
  • Polyspectra in Fourier space: bispectrum, trispectrum, ...
  • Friends of Friends algorithm for linking together particles that are close to eachother in groups. User provides class determining what to compute for these groups.
  • N-body methods for computing forces and moving particles which can be used to perform simple N-body simulations. Contains methods for computing modified gravity N-body equations either exactly for a particular model or approximating it using the screening method of Winther & Ferreira 2015.
  • Hankel transforms (FFTLog).
  • Generating gaussian random fields in real or fourier space.
  • Generating non-local gaussian random fields (local, equilateral, orthogonal or generic $f_{\rm NL}$) in real or fourier space.
  • Generating particle distributions of a random field.
  • Lagrangian perturbation theory: generating 1LPT, 2LPT and 3LPT displacement fields (and with support for scaledependent growth). Augmented LPT fields. Computing growth-factors. Combined with the N-body methods its straight-forward to use this to create a COLA N-body solver.
  • Wrapper over GSL for more easily solve ODEs and make 1D and 2D splines.
  • Wrapper over LUA for reading LUA parameterfiles.
  • Typical grid operations like smoothing, whitening, convolutions of a field in fourier space or real space with arbitrary kernel.
  • Reading and writing common file-types Gadget and Ramses files, ascii files, etc.
  • Simple units class for getting and converting physical constants to any units.
  • Methods for monitoring memory allocations in the code, perform timings, producing plots (Matplotlib) etc.
Some more cosmology focused things:
  • Solvers for modified gravity equations ($f(R)$, galileon, symmetron, ...).
  • Density and RSD reconstruction algorithms.
  • Taking galaxies to cartesian coordinates and placing them in a box.
  • Producing particles in redshiftspace.
  • Simple HOD method for generating galaxy mocks.
  • Tidal tensor classification (Cosmic Web) of points in a grid.
  • Background cosmology, recombination history (Saha+Peebles) and integration of perturbation (only made for $\Lambda$CDM).
  • And we also include a Einstein-Boltzmann solver for computing matter and CMB power-spectra (though this is only for teaching purposes so its always better to use CLASS/CAMB).
Most of the these things are made for simulation data, but a lot of them work on survey data also (some things require some small adjustments). A few of the features above (like FFTog [1],[2] and matplotlib-cpp) are outright stolen from the web. For code that is taken from the web the header files should contain the proper acknowledgements and the appropriate licences should hopefuly be included, but if you see anything missing please let me know.


You will need atleast a C++14 compatible compiler to compile the library, but C++17 (-std=c++1z so gcc 7, clang 3.9, icpc 19, MSVC 19.11) is highly reccomended as we use a lot of if constexpr to do compile-time computations to make things more efficient. With some small tweeks and loss of generality it is also possible to just us a C++11 compiler, but too much work for me to bother. Parts of the code can also be compiled without any external libraries, but to take full advantage of the features in the code you should atleast have the FFTW3 library (with MPI support) and GSL installed. With these two libraries installed you can use ~95% of FML. In addition to this the other two libraries we use in just a few places are CGAL and LUA. You should also have a MPI complient compiler though the code can be compiled and run in serial.

  • FFTW verison 3+ : Required to be able to do Fourier transforms and the related algorithms that use Fourier transforms. The grid class FFTWGrid can be compiled without it.
  • CGAL version 5+ : Required to do tesselations and the related algorithms that rely on this.
  • GSL version 2+ : Required to solve ODEs, make splines, random numbers (though if we don't have this we use C++ <random> instead) and linear algebra (just a few places).
  • LUA version 5+ : Required to use LuaFileParser to read parameterfiles. We don't use this much.


Most of the library is in forms of header files that you just need to include in your code. A few of these needs to be compiled. We could have compiled this up to a shared library, but given the options the user have (with MPI or without, with some external libraries or without, etc.) its better to compiler these files together with your code. For this see the included Makefiles for how to do this. Having tons of define statements in the code sucks, but its the only good/normal way of being able to use the code without having all the features. In the general Makefile you have alot of options to choose from, here is a list of the most important ones:

  • USE_MPI : Compile with MPI support. You will need to have a MPI compiler to use this.
  • USE_OMP : Compile with OpenMP support. Can be used together with MPI for which each MPI task will use OMP_NUM_THREADS threads in certain loops.
  • USE_FFTW : If you have the FFTW3 library. You will have to provide the include and lib paths in the Makefile. If used together with USE_MPI you will need to have compiled the FFTW3 library with MPI support.
  • USE_FFTW_THREADS : If you have the FFTW3 library. Use threads to parallelize the FFTs. Can be used together with USE_MPI.
  • USE_GSL : If you have the GSL library. You will have to provide the include and lib paths in the Makefile.
  • USE_CGAL : If you have the CGAL library. You will have to provide the include and lib paths in the Makefile.
  • USE_LUA : If you have the LUA library. You will have to provide the include and lib paths in the Makefile.
  • USE_PYTHON : If you have Python with Matplotlib installed. You will have to provide the include and lib paths in the Makefile.
Some define statements that can be added to change how the code works:
  • SINGLE_PRECISION_FFTW : Use float instead of double for FFTWGrid and FFTs in general. FFTW needs to be compiled with float support to use this.
  • LONG_DOUBLE_PRECISION_FFTW : Use long double instead of double for FFTWGrid and FFTs in general. FFTW needs to be compiled with long double support to use this.
  • NO_AUTO_FML_SETUP : FML, MPI and FFTW is automatically initialized and finalized in the code. If you don't want this add this define. NB: if you use this you should know how FFTW should be initialized with MPI and/or threads to avoid issues. You also need to call FML::init_fml after initializing MPI and FFTW.
  • USE_MEMORYLOG : Log allocations over a certain size (see the MemoryLog header). Useful to map out the memory footprint of the code. Currently only a few classes implement this.
In addition to these we have some Makefile options that can be useful:
  • USE_SANITIZER : Compile with -fsanitize=address to check for bad memory accesses. Useful for debugging.
  • USE_DEBUG : Do some extra asserts, print some more info from the algorithms while running.
You will also have to provide the include path to the folder containing FML/ and in VPATH provide the path to the *.cpp files that needs to be compiled with the code.

Code examples

Almost every folder in the code contains an [example] folder showing how to one can use the methods and classes. Here are a few of those:

Some random examples just for showing some of the methods in the library:

// Headers we need to include
#include <FML/FFTWGrid/FFTWGrid.h>
#include <FML/MPIParticles/MPIParticles.h>
#include <FML/Smoothing/SmoothingFourier.h>
#include <FML/FriendsOfFriends/FoF.h>
#include <FML/ComputePowerSpectra/ComputePowerSpectrum.h>
#include <FML/RandomFields/NonLocalGaussianRandomField.h>

// The particles class
template< NDIM > 
class MyParticle {
  double pos[NDIM];
  double vel[NDIM];
  double get_pos() { return pos; }
  double get_vel() { return vel; }
  constexpr int get_ndim() { return NDIM; }

// Initialization of the library, FFTW and MPI is automatically taken care of
int main(){

  // The dimension we are working in and an alias for the particle class
  const int NDIM = 3;
  using Particle = MyParticle< NDIM >;
  FML::PARTICLE::info< Particle >();

  // Create a regular set of 32^3 particles spread across tasks
  // The local task is responsible for the x-domain [FML::xmin_domain, FML::xmax_domain) in [0,1)
  FML::PARTICLE::MPIParticles< Particle > part;
  const int Npart_1D = 32;
  const double buffer_factor = 2.0; // Allocate 2 times as many particles on each task
            FML::xmax_domain);; // Show info about the particle container

  // Randomly change the coordinates of all the particles 
  // which will make some leave the local domain
  for (auto & p : part) {
      auto pos = FML::PARTICLE::GetPos(p);
      for(int idim = 0; idim < NDIM; idim++)
        pos[idim] = (rand() % 1000) / 1000.0;

  // Communiate so that all particles end up on the right task

  // Compute the power-spectrum of the particles using interlacing for alias reduction
  const int Nmesh = 32;
  const bool interlacing = true;
  const std::string density_assignment_method = "CIC"; // NGP, CIC, TSC, PCS, PQS, ...
  FML::CORRELATIONFUNCTIONS::PowerSpectrumBinning<NDIM> pofk(Nmesh / 2);

  const double box = 100.0; // The physical size of the box
  pofk.scale(box);          // To physical units

  // Compute a density field of the particles
  auto nleftright = FML::INTERPOLATION::get_extra_slices_needed_for_density_assignment(density_assignment_method);
  FML::GRID::FFTWGrid<NDIM> density_field(Nmesh, nleftright.first, nleftright.second);
  FML::INTERPOLATION::particles_to_grid<NDIM, Particle>(
  // Transform density field to fourier space (FFTW real-to-complex)
  // Show info about the grid  

  // Smooth the density field by multiplying by a fourier-space filter
  const double smoothing_scale = 0.01; // In units of the boxsize
  const double smoothing_method = "gaussian";

  // Compute the 1LPT displacement field
  FML::GRID::FFTWGrid<NDIM> phi_1LPT_fourier;
  std::vector< FML::GRID::FFTWGrid<NDIM> &gT; psi_1LPT_real;
  // To free up the memory just call free; 
  for(auto &g : psi_1LPT_real);
  // Take a single N-body time-step
  const double timestep = 1e-3; // dt_code = H0 dt / a^2
  const double OmegaM = 0.3;
  const double a = 1.0;
  const double norm_poisson_equation = 1.5 * OmegaM * a;

  // Do Friend of Friend finding
  const double linking_length = 0.3;
  const double mean_particle_separation = 1.0 / std::pow(part.get_npart_total(), 1.0 / NDIM);
  const double fof_distance = linking_length * mean_particle_separation;
  const int n_min_FoF_group = 20;
  const bool periodic_box = true;

  using FoFHalo = FML::FOF::FoFHalo< Particle, NDIM>
  std::vector<FoFHalo> FoFGroups;
  FML::FOF::FriendsOfFriends<Particle, NDIM>(

  // Generate a non-gaussian random field 
  FML::GRID::FFTWGrid<NDIM> random_fourier_field(Nmesh);
  FML::RANDOM::RandomGenerator rng;
  const double fNL = 100.0;
  const std::string type_of_fnl = "local";
  const bool fix_amplitude = false; // Fix the norm and only do random phases?
  std::function<double(double)> Pofk_of_kBox_over_volume = [&](double kBox) {
        if (kBox == 0.0) return 0.0;
        double k = kBox / box;                        // k in physical units
        double volume = std::pow(box, Ndim);          // Volume of the box in physical units
        double pofk = 1e-6 * std::pow(1.0 / k, Ndim); // pofk in physical units
        return pofk / volume;                         // Dimensionless P(k)/V
  // Compute the bispectrum
  const double kmin = 0.0;
  const double kmax = 2.0 * M_PI * Nmesh/2;
  const int nbin = 16;
  FML::CORRELATIONFUNCTIONS::BispectrumBinning<NDIM> bofk(kmin, kmax, nbin);
  bofk.scale(box); // To physical units

  // Fourier transform density field to real-space