BLAS and LAPACK
Matrix multiplication, matrix decomposition, and other linear algebra operations are handled by external BLAS/LAPACK libraries such as Intel MKL, AMD AOCL, or OpenBLAS. MOCCA only acts as the translation layer between its C++ objects and the BLAS/LAPACK library. Any library is supported as long as it provides the standard CBLAS and LAPACKE interfaces. MOCCA also integrates with Intel MKL.
It is recommended to use CMake to build your project, but GNU Make and other build systems can be used as well. When using CMake, set the variable MOCCA_BLAS to intel-mkl, amd-aocl, or openblas before calling find_package(). The MOCCA configuration script will then automatically link your project to the correct libraries. You can use the environment variables AOCLROOT and MKLROOT to indicate the installation paths of AMD AOCL and Intel MKL, respectively.
For other build systems, define the preprocessor macro MOCCA_BLAS as MKL (for Intel MKL) or CBLAS (for all other BLAS libraries) before including any MOCCA header. You must then link your project with the appropriate libraries yourself. If MOCCA_BLAS is not specified, the default is intel-mkl (CMake variable) or MKL (preprocessor macro).
For sparse BLAS operations, MOCCA uses the Intel MKL implementation if it is available. Otherwise, it falls back to the built-in kernels.
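For readers unfamiliar with the CSR layout, the following is a minimal sketch of the computation a sparse matrix-vector kernel performs, \(\vec{u} = \alpha \mathbf{S} \vec{v} + \beta \vec{u}\). It is neither MOCCA's built-in kernel nor the Intel MKL one: the names CsrMatrix and csr_matvec are hypothetical and exist only for this illustration, and the defaults alpha = 1, beta = 0 are an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical CSR container for illustration only; not a MOCCA type.
struct CsrMatrix {
    std::size_t rows;
    std::size_t cols;
    std::vector<std::size_t> row_ptr;  // size rows + 1; row i spans [row_ptr[i], row_ptr[i + 1])
    std::vector<std::size_t> col_idx;  // column index of each stored value
    std::vector<float> values;         // stored (non-zero) values, row by row
};

// Computes u = alpha * S * v + beta * u for a CSR matrix S.
void csr_matvec(const CsrMatrix& S, const std::vector<float>& v,
                std::vector<float>& u, float alpha = 1.0f, float beta = 0.0f) {
    assert(S.row_ptr.size() == S.rows + 1);
    assert(v.size() == S.cols && u.size() == S.rows);
    for (std::size_t i = 0; i < S.rows; ++i) {
        float acc = 0.0f;
        // Only the stored entries of row i contribute to the dot product.
        for (std::size_t p = S.row_ptr[i]; p < S.row_ptr[i + 1]; ++p) {
            acc += S.values[p] * v[S.col_idx[p]];
        }
        // BLAS convention: with beta == 0 the old content of u is not read.
        u[i] = (beta == 0.0f) ? alpha * acc : alpha * acc + beta * u[i];
    }
}
```

The work per row is proportional to the number of stored entries in that row, which is why a dedicated sparse kernel is preferable to converting the matrix to dense form first.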
Warning
BLAS/LAPACK routines only work with floating-point numbers.
Note
Intel(R) MKL and AMD AOCL are proprietary software and it is the responsibility of users to buy or register for community (free) licenses for their products.
Note
In newer versions of Intel MKL (v2020.2 and up), most routines seem to perform well on both Intel and AMD processors. Nevertheless, if you encounter performance issues on AMD CPUs, you can follow the instructions in Daniel’s blog to force Intel MKL to use a more efficient code path. In older versions of Intel MKL, set the environment variable MKL_DEBUG_CPU_TYPE=5 before running your program to force MKL to use the AVX2 code path on AMD processors (see Puget Systems’ blog post for more information).
Matrix Multiplication
Currently, the mult() method supports the following operations between matrices:
\(\mathbf{C} = \alpha \mathbf{A} \mathbf{B} + \beta \mathbf{C}\)
\(\mathbf{C} = \alpha \mathbf{A}^\intercal \mathbf{B} + \beta \mathbf{C}\)
\(\mathbf{C} = \alpha \mathbf{A} \mathbf{B}^\intercal + \beta \mathbf{C}\)
\(\mathbf{C} = \alpha \mathbf{A}^\intercal \mathbf{B}^\intercal + \beta \mathbf{C}\)
\(\mathbf{C} = \alpha \mathbf{S} \mathbf{B} + \beta \mathbf{C}\)
\(\mathbf{C} = \alpha \mathbf{B} \mathbf{S} + \beta \mathbf{C}\)
And the following operations for multiplying matrices and vectors:
\(\vec{u} = \alpha \mathbf{A} \vec{v} + \beta \vec{u}\)
\(\vec{u} = \alpha \mathbf{A}^\intercal \vec{v} + \beta \vec{u}\)
\(\vec{u} = \alpha \mathbf{S} \vec{v} + \beta \vec{u}\)
where \(\alpha, \beta\) are scalars; \(\mathbf{A}, \mathbf{B}, \mathbf{C}\) are dense Matrix objects; \(\mathbf{S}\) is a CSR Matrix; and \(\vec{u}, \vec{v}\) are dense row or column Vector objects. Except for the transpose, no matrix or vector expressions are allowed as arguments to the routine. Be careful with aliasing: the output must not be one of the inputs, so \(\mathbf{A} = \mathbf{A} \mathbf{B}\) is not safe!
using namespace mocca;
Matrix<float> A(n, k);
Matrix<float> B(k, m);
Matrix<float> C(n, m);
Vector<float> v(k);
Vector<float> u(n);
//...//
// Calculates C = A x B
mult(A, B, C);
// Calculates C = 2 * A x B + 4 * C;
mult(A, B, C, {.alpha = 2.0f, .beta = 4.0f});
// Calculates u = A x v
mult(A, v, u);
Note
Column and row Vector objects are automatically transposed to the correct orientation before the multiplication.
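To make the update \(\mathbf{C} = \alpha \mathbf{A} \mathbf{B} + \beta \mathbf{C}\) and the aliasing warning concrete, here is a minimal plain C++ sketch on row-major buffers. It does not use MOCCA or any BLAS library: Dense, gemm_reference, and multiply_in_place_safe are hypothetical names introduced for illustration, and the naive triple loop only stands in for whatever the underlying library actually does.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical row-major dense matrix for illustration only; not a MOCCA type.
struct Dense {
    std::size_t rows;
    std::size_t cols;
    std::vector<float> data;  // element (i, j) is stored at data[i * cols + j]
    float& at(std::size_t i, std::size_t j) { return data[i * cols + j]; }
    float at(std::size_t i, std::size_t j) const { return data[i * cols + j]; }
};

// Computes C = alpha * A * B + beta * C with a naive triple loop.
// C is written while A and B are still being read, so C must be a distinct object.
void gemm_reference(const Dense& A, const Dense& B, Dense& C,
                    float alpha = 1.0f, float beta = 0.0f) {
    assert(A.cols == B.rows && C.rows == A.rows && C.cols == B.cols);
    for (std::size_t i = 0; i < A.rows; ++i) {
        for (std::size_t j = 0; j < B.cols; ++j) {
            float acc = 0.0f;
            for (std::size_t p = 0; p < A.cols; ++p) {
                acc += A.at(i, p) * B.at(p, j);
            }
            // BLAS convention: with beta == 0 the old content of C is not read.
            C.at(i, j) = (beta == 0.0f) ? alpha * acc : alpha * acc + beta * C.at(i, j);
        }
    }
}

// Safe way to obtain A = A * B: multiply into a temporary, then move it into A.
void multiply_in_place_safe(Dense& A, const Dense& B) {
    Dense T{A.rows, B.cols, std::vector<float>(A.rows * B.cols, 0.0f)};
    gemm_reference(A, B, T);
    A = std::move(T);
}
```

With A = [[1, 2], [3, 4]] and B = [[0, 1], [1, 0]], calling gemm_reference(A, B, A) in this sketch produces [[2, 2], [4, 4]] instead of the correct [[2, 1], [4, 3]], because entries of A are overwritten before they have been fully used. Passing a separate output matrix, as multiply_in_place_safe does, avoids the problem.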