Dependencies

MPI
OpenMP
CUDA (Optional)

The non-GPU implementation is intended primarily for testing and development and is unlikely to achieve optimal performance.

Installation

CUDA + OpenMP (for NVIDIA V100 with compute capability 7.0):

./autogen.sh
CXXFLAGS="--forward-unknown-to-host-compiler -x cu -ccbin mpic++ -gencode=arch=compute_70,code=sm_70 -g" \
LDFLAGS="--forward-unknown-to-host-compiler -link -ccbin mpic++ -gencode=arch=compute_70,code=sm_70 -g -ldl" \
CXX="nvcc" \
./configure --prefix=/path/to/install/dir --enable-openmp --enable-cuda
make -j6
make install
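
The -gencode values above are specific to the V100. For a different GPU, the same flag can be derived from that device's compute capability. The following is a minimal sketch, not part of this project's build system: gencode_flag is a hypothetical helper, and the nvidia-smi compute_cap query field is assumed to be available, which requires a reasonably recent NVIDIA driver. If nvidia-smi is missing or the query returns nothing, the script falls back to 7.0.

```shell
#!/bin/sh
# Hypothetical helper: turn a compute capability such as "7.0"
# into the matching nvcc -gencode flag.
gencode_flag() {
    cc=$(printf '%s' "$1" | tr -d '. ')   # "7.0" -> "70"
    printf '%s\n' "-gencode=arch=compute_${cc},code=sm_${cc}"
}

# Query the first GPU if nvidia-smi is available (assumes the driver
# supports the compute_cap field); otherwise keep CAP empty.
CAP=""
if command -v nvidia-smi >/dev/null 2>&1; then
    CAP=$(nvidia-smi --query-gpu=compute_cap --format=csv,noheader 2>/dev/null | head -n 1)
fi

# Fall back to the V100 value used above when nothing was detected.
CAP=${CAP:-7.0}
GENCODE=$(gencode_flag "$CAP")
echo "$GENCODE"
```

The printed value can then replace -gencode=arch=compute_70,code=sm_70 in both CXXFLAGS and LDFLAGS before running ./configure.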

Pure OpenMP:

./autogen.sh
CXXFLAGS="-g" \
LDFLAGS="-g" \
CXX="mpic++" \
./configure --prefix=/path/to/install/dir --enable-openmp
make -j6
make install