## Dependencies * MPI * OpenMP * CUDA (Optional) The non-GPU implementation is primarily for testing and development, and is not likely to achieve optimal performance ## Installation CUDA + OpenMP (for NVIDIA V100 with compute capability 7.0): ./autogen.sh CXXFLAGS="--forward-unknown-to-host-compiler -x cu -ccbin mpic++ -gencode=arch=compute_70,code=sm_70 -g" \ LDFLAGS="--forward-unknown-to-host-compiler -link -ccbin mpic++ -gencode=arch=compute_70,code=sm_70 -g -ldl" \ CXX="nvcc" \ ./configure --prefix=/path/to/install/dir --enable-openmp --enable-cuda` make -j6 make install Pure OpenMP: ./autogen.sh CXXFLAGS=" -g" \ LDFLAGS="-g" \ CXX="mpic++" \ ./configure --prefix=/path/to/install/dir --enable-openmp make -j6 make install