Dependencies
MPI
Accelerator API, one of:
CUDA
HIP
SYCL
OpenMP (CPU-threading only, for testing/development – not performant!)
Host-side thread API (optional)
OpenMP
BLAS library for high-performance linear algebra (optional, recommended)
cuBLAS (CUDA)
rocBLAS (HIP)
oneMKL (SYCL)
Installation
CUDA + OpenMP (for NVIDIA V100 with compute capability 7.0):
./autogen.sh
CXXFLAGS="--forward-unknown-to-host-compiler -x cu -ccbin mpic++ -gencode=arch=compute_70,code=sm_70 -g" \
LDFLAGS="--forward-unknown-to-host-compiler -link -ccbin mpic++ -gencode=arch=compute_70,code=sm_70 -g -ldl" \
CXX="nvcc" \
./configure --prefix=/path/to/install/dir --enable-openmp --enable-cuda
make -j6
make install
Optionally enable cuBLAS with --enable-cublas
Pure OpenMP:
./autogen.sh
CXXFLAGS=" -g" \
LDFLAGS="-g" \
CXX="mpic++" \
./configure --prefix=/path/to/install/dir --enable-openmp
make -j6
make install
OLCF Frontier:
./autogen.sh
module load PrgEnv-amd rocm craype-accel-amd-gfx90a
CXX=CC \
CXXFLAGS="-x hip -D__HIP_ARCH_GFX90A__=1 --offload-arch=gfx90a -O3" \
./configure --prefix=/path/to/install/dir --enable-hip --enable-openmp --enable-rocblas
make -j 8
make install
ALCF Aurora:
./autogen.sh
CXX=mpic++ \
CXXFLAGS="-O3 -g -fsycl -fsycl-targets=spir64" \
LDFLAGS="-O3 -g -fsycl -fsycl-targets=spir64" \
./configure --prefix=/path/to/install/dir --enable-sycl --enable-onemkl