BLAS and LAPACK

Matrix multiplication, matrix decomposition and other linear algebra operations are handled by external BLAS/LAPACK libraries, such as Intel MKL, AMD AOCL, OpenBLAS, etc. MOCCA only acts as the translation layer between its C++ objects and the BLAS/LAPACK library. Any library is supported as long as it provides the standard CBLAS and LAPACKE interfaces. MOCCA also has dedicated integration with Intel MKL.

CMake is the recommended way to build your project, but GNU Make and other build systems can be used as well. When using CMake, set the variable MOCCA_BLAS to intel-mkl, amd-aocl or openblas before calling find_package(). The MOCCA configuration script then links your project to the correct libraries automatically. You can use the environment variables AOCLROOT and MKLROOT to indicate the installation paths of AMD AOCL and Intel MKL, respectively.

For other build systems, define the preprocessor macro MOCCA_BLAS as MKL (for Intel MKL) or CBLAS (for all other BLAS libraries) before including any MOCCA header. You must then link your project with the appropriate libraries yourself. If MOCCA_BLAS is not specified, the default is Intel MKL (intel-mkl in CMake, MKL as the macro value).
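The reason the macro must be defined before the first include can be sketched with a stand-in header. Everything below is hypothetical: `DEMO_BLAS`, `DEMO_BLAS_MKL`, `DEMO_BLAS_CBLAS` and `backend_name()` are illustrative names, not MOCCA identifiers, and MOCCA's real headers may implement the selection differently. In an actual project the first line would instead be `#define MOCCA_BLAS CBLAS` (or the equivalent `-DMOCCA_BLAS=CBLAS` compiler flag), placed ahead of any MOCCA include.

```cpp
#include <string_view>

// --- User code: choose the backend BEFORE the "header" below is seen. ---
// Hypothetical stand-in for `#define MOCCA_BLAS CBLAS`.
#define DEMO_BLAS DEMO_BLAS_CBLAS

// --- Stand-in for a library header (assumed layout, not MOCCA's). ---
#define DEMO_BLAS_MKL   1
#define DEMO_BLAS_CBLAS 2

#ifndef DEMO_BLAS
#  define DEMO_BLAS DEMO_BLAS_MKL  // mirrors the documented default of Intel MKL
#endif

#if DEMO_BLAS == DEMO_BLAS_MKL
constexpr std::string_view backend_name() { return "MKL"; }
#elif DEMO_BLAS == DEMO_BLAS_CBLAS
constexpr std::string_view backend_name() { return "CBLAS"; }
#else
#  error "Unknown BLAS backend selected"
#endif
```

Because the header falls back to its default when the macro is undefined, a definition placed after the include would come too late to change the selected backend.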

For sparse BLAS operations, MOCCA uses the Intel MKL implementation if it is available. Otherwise, it falls back to its built-in kernels.
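As a rough picture of the work a sparse kernel has to do, the sketch below computes \(\vec{u} = \alpha \mathbf{S} \vec{v} + \beta \vec{u}\) for a matrix stored in compressed sparse row (CSR) form. It is not MOCCA's built-in kernel: the `CsrMatrix` struct and the `csr_matvec` function are hypothetical names introduced only for this illustration.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical CSR container used only for this sketch (not a MOCCA type).
struct CsrMatrix {
    std::size_t rows;                  // number of rows
    std::vector<std::size_t> row_ptr;  // size rows + 1; row i spans [row_ptr[i], row_ptr[i + 1])
    std::vector<std::size_t> col_idx;  // column index of each stored value
    std::vector<float> values;         // stored (nonzero) values
};

// Computes u = alpha * S * v + beta * u, visiting only the stored entries of S.
void csr_matvec(const CsrMatrix& S, float alpha, const std::vector<float>& v,
                float beta, std::vector<float>& u) {
    for (std::size_t i = 0; i < S.rows; ++i) {
        float acc = 0.0f;
        for (std::size_t p = S.row_ptr[i]; p < S.row_ptr[i + 1]; ++p) {
            acc += S.values[p] * v[S.col_idx[p]];
        }
        u[i] = alpha * acc + beta * u[i];
    }
}
```

The inner loop touches only the stored entries of each row, which is why a dedicated sparse kernel is preferable to converting the matrix to dense form first.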

Warning

BLAS/LAPACK routines only work with floating-point numbers.

Note

Intel(R) MKL and AMD AOCL are proprietary software and it is the responsibility of users to buy or register for community (free) licenses for their products.

Note

In newer versions of Intel MKL (v2020.2 and up), most routines seem to perform well on both Intel and AMD processors. Nevertheless, if you encounter performance issues on AMD CPUs, you can follow the instructions in Daniel’s blog to force Intel MKL to use a more efficient code path. In older versions of Intel MKL, set the environment variable MKL_DEBUG_CPU_TYPE=5 before running your program to force MKL to use the AVX2 code path on AMD processors (see Puget Systems’ blog post for more information).

Matrix Multiplication

Currently, the mult() method supports the following operations between matrices:

  • \(\mathbf{C} = \alpha \mathbf{A} \mathbf{B} + \beta \mathbf{C}\)

  • \(\mathbf{C} = \alpha \mathbf{A}^\intercal \mathbf{B} + \beta \mathbf{C}\)

  • \(\mathbf{C} = \alpha \mathbf{A} \mathbf{B}^\intercal + \beta \mathbf{C}\)

  • \(\mathbf{C} = \alpha \mathbf{A}^\intercal \mathbf{B}^\intercal + \beta \mathbf{C}\)

  • \(\mathbf{C} = \alpha \mathbf{S} \mathbf{B} + \beta \mathbf{C}\)

  • \(\mathbf{C} = \alpha \mathbf{B} \mathbf{S} + \beta \mathbf{C}\)

And the following operations for multiplying matrices and vectors:

  • \(\vec{u} = \alpha \mathbf{A} \vec{v} + \beta \vec{u}\)

  • \(\vec{u} = \alpha \mathbf{A}^\intercal \vec{v} + \beta \vec{u}\)

  • \(\vec{u} = \alpha \mathbf{S} \vec{v} + \beta \vec{u}\)

where \(\alpha, \beta\) are scalars; \(\mathbf{A}, \mathbf{B}, \mathbf{C}\) are dense Matrix objects; \(\mathbf{S}\) is a CSR Matrix; and \(\vec{u}, \vec{v}\) are dense Vector objects, either row or column. Except for the transpose, no matrix or vector expressions are allowed as arguments to the routine. Be careful with Aliasing: the output must not also appear as an input, so \(\mathbf{A} = \mathbf{A} \mathbf{B}\) is not safe!

using namespace mocca;

Matrix<float> A(n, k);
Matrix<float> B(k, m);
Matrix<float> C(n, m);
Vector<float> v(k);
Vector<float> u(n);
//...//

// Calculates C = A x B
mult(A, B, C);

// Calculates C = 2 * A x B + 4 * C
mult(A, B, C, {.alpha = 2.0f, .beta = 4.0f});

// Calculates u = A x v
mult(A, v, u);
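To make the role of the two scalars concrete, the following reference loop spells out \(\mathbf{C} = \alpha \mathbf{A} \mathbf{B} + \beta \mathbf{C}\) element by element. It is a minimal sketch for checking results by hand, not how MOCCA or a BLAS library implements the product. The function `gemm_reference` and the flat row-major storage are assumptions made for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical reference helper (not part of MOCCA).
// Computes C = alpha * A * B + beta * C for row-major matrices in flat arrays,
// where A is n x k, B is k x m and C is n x m.
void gemm_reference(std::size_t n, std::size_t k, std::size_t m,
                    float alpha, const std::vector<float>& A,
                    const std::vector<float>& B,
                    float beta, std::vector<float>& C) {
    // Guard against mismatched dimensions before touching any element.
    assert(A.size() == n * k && B.size() == k * m && C.size() == n * m);
    for (std::size_t i = 0; i < n; ++i) {
        for (std::size_t j = 0; j < m; ++j) {
            float acc = 0.0f;
            for (std::size_t p = 0; p < k; ++p) {
                acc += A[i * k + p] * B[p * m + j];
            }
            // The previous content of C is scaled by beta and accumulated.
            C[i * m + j] = alpha * acc + beta * C[i * m + j];
        }
    }
}
```

With \(\beta \neq 0\) the previous content of \(\mathbf{C}\) contributes to the result, so \(\mathbf{C}\) must hold meaningful values before the call.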

Note

Column and row vectors are automatically transposed to the correct orientation before the multiplication.
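The aliasing caveat for \(\mathbf{A} = \mathbf{A} \mathbf{B}\) can be reproduced with a naive kernel: once the output buffer is also an input, entries are overwritten while later entries still need the original values. The sketch below is purely illustrative. `naive_multiply`, `multiply_in_place` and `multiply_with_temporary` are hypothetical helpers, and a real BLAS library may fail in a different way when its arguments alias.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical naive kernel (not MOCCA code): C = A * B for square n x n
// row-major matrices. Raw pointers are used so that C is allowed to alias A.
void naive_multiply(std::size_t n, const float* A, const float* B, float* C) {
    for (std::size_t i = 0; i < n; ++i) {
        for (std::size_t j = 0; j < n; ++j) {
            float acc = 0.0f;
            for (std::size_t p = 0; p < n; ++p) {
                acc += A[i * n + p] * B[p * n + j];
            }
            C[i * n + j] = acc;  // if C aliases A, this clobbers an input still in use
        }
    }
}

// Aliased call: the result is written into A while A is still being read.
std::vector<float> multiply_in_place(std::vector<float> A,
                                     const std::vector<float>& B, std::size_t n) {
    naive_multiply(n, A.data(), B.data(), A.data());
    return A;
}

// Safe call: the result goes to a separate buffer.
std::vector<float> multiply_with_temporary(const std::vector<float>& A,
                                           const std::vector<float>& B, std::size_t n) {
    std::vector<float> C(n * n, 0.0f);
    naive_multiply(n, A.data(), B.data(), C.data());
    return C;
}
```

For the row-major 2 x 2 inputs A = {1, 2, 3, 4} and B = {0, 1, 1, 0}, the safe variant returns {2, 1, 4, 3}, while the aliased variant returns {2, 2, 4, 4}. Writing the product to a separate matrix and assigning it afterwards avoids the problem.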