# BLAS and LAPACK

Matrix multiplication, matrix decomposition and other linear algebra operations are handled by external BLAS/LAPACK libraries such as Intel MKL, AMD AOCL or OpenBLAS. MOCCA only acts as the translation layer between its C++ objects and the BLAS/LAPACK library. Any library is supported as long as it provides the standard `CBLAS` and `LAPACKE` interfaces. MOCCA also has integration with Intel MKL.

It is **recommended** to use CMake to build your project, but GNU Make and other build systems can be used as well. When using CMake, set the variable `MOCCA_BLAS` to `intel-mkl`, `amd-aocl` or `openblas` before calling `find_package()`. The MOCCA configuration script will automatically link your project to the correct libraries. You can use the environment variables `AOCLROOT` and `MKLROOT` to indicate the installation paths of AMD AOCL and Intel MKL, respectively.

For other build systems, define the preprocessor macro `MOCCA_BLAS` as `MKL` (for Intel MKL) or `CBLAS` (for all other BLAS libraries) **before** including **any** MOCCA header. You **must** then link your project with the appropriate libraries yourself. If `MOCCA_BLAS` is not specified, the default is `intel-mkl` (CMake variable) or `MKL` (preprocessor macro).
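The ordering requirement follows from how the preprocessor works: a header can only see macros that were defined before it is included. The sketch below imitates this in a single file. The names `DEMO_BLAS`, `DEMO_BACKEND_MKL` and `DEMO_BACKEND_CBLAS` are hypothetical stand-ins, not MOCCA's actual macros, and the part marked as a header only illustrates the general pattern of defaulting a backend macro. MOCCA's real headers may be organised differently.

```cpp
#include <cassert>
#include <string>

// Hypothetical backend identifiers (stand-ins, not MOCCA's real macros).
#define DEMO_BACKEND_MKL 1
#define DEMO_BACKEND_CBLAS 2

// User code: the choice must appear BEFORE the header is processed.
#define DEMO_BLAS DEMO_BACKEND_CBLAS

// ---- begin: what a library header might do (illustrative only) ----
#ifndef DEMO_BLAS
#define DEMO_BLAS DEMO_BACKEND_MKL  // default when the user chose nothing
#endif

#if DEMO_BLAS == DEMO_BACKEND_MKL
inline std::string demo_backend_name() { return "MKL"; }
#elif DEMO_BLAS == DEMO_BACKEND_CBLAS
inline std::string demo_backend_name() { return "CBLAS"; }
#else
#error "Unknown DEMO_BLAS value"
#endif
// ---- end of illustrative header ----

// Small self-check: the user's choice above should have taken effect.
inline void check_backend_selection() {
    assert(demo_backend_name() == "CBLAS");
}
```

If the `#define DEMO_BLAS ...` line were moved below the header part, the default branch would already have been taken and the later definition would have no effect on the selection. With GCC or Clang, the same kind of definition can also be passed on the command line with `-D`, which guarantees it is seen before every include.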

For sparse BLAS operations, MOCCA uses the Intel MKL implementation if it is available. Otherwise, it falls back to its built-in kernels.

Warning

BLAS/LAPACK routines only work with floating-point numbers.

Note

Intel(R) MKL and AMD AOCL are proprietary software, and it is the users' responsibility to buy or register for community (free) licenses for these products.

Note

In newer versions of Intel MKL (v2020.2 and up), most routines seem to perform well on both Intel and AMD processors. Nevertheless, if you encounter performance issues on AMD CPUs, you can follow the instructions in Daniel’s blog to force Intel MKL to use a more efficient code path. In older versions of Intel MKL, set the environment variable `MKL_DEBUG_CPU_TYPE=5` before running your program to force MKL to use the AVX2 code path on AMD processors (see Puget Systems’ blog post for more information).

## Matrix Multiplication

Currently, the `mult()` function supports the following operations between matrices:

\(\mathbf{C} = \alpha \mathbf{A} \mathbf{B} + \beta \mathbf{C}\)

\(\mathbf{C} = \alpha \mathbf{A}^\intercal \mathbf{B} + \beta \mathbf{C}\)

\(\mathbf{C} = \alpha \mathbf{A} \mathbf{B}^\intercal + \beta \mathbf{C}\)

\(\mathbf{C} = \alpha \mathbf{A}^\intercal \mathbf{B}^\intercal + \beta \mathbf{C}\)

\(\mathbf{C} = \alpha \mathbf{S} \mathbf{B} + \beta \mathbf{C}\)

\(\mathbf{C} = \alpha \mathbf{B} \mathbf{S} + \beta \mathbf{C}\)

And the following operations for multiplying matrices and vectors:

\(\vec{u} = \alpha \mathbf{A} \vec{v} + \beta \vec{u}\)

\(\vec{u} = \alpha \mathbf{A}^\intercal \vec{v} + \beta \vec{u}\)

\(\vec{u} = \alpha \mathbf{S} \vec{v} + \beta \vec{u}\)

where \(\alpha, \beta\) are scalars; \(\mathbf{A}, \mathbf{B}, \mathbf{C}\) are dense Matrix objects; \(\mathbf{S}\) is a CSR Matrix; and \(\vec{u}, \vec{v}\) are dense row or column Vector objects. Except for the transpose, no matrix or vector expression is allowed as an argument to the routine. *Be careful* with aliasing: \(\mathbf{A} = \mathbf{A} \mathbf{B}\) is not safe, because the output matrix is also one of the inputs!

```
using namespace mocca;
// n, k and m are the matrix dimensions, defined elsewhere
Matrix<float> A(n, k);
Matrix<float> B(k, m);
Matrix<float> C(n, m);
Vector<float> v(k);
Vector<float> u(n);
// ... fill A, B, C and v ...
// Calculates C = A x B
mult(A, B, C);
// Calculates C = 2 * A x B + 4 * C
mult(A, B, C, {.alpha = 2.0f, .beta = 4.0f});
// Calculates u = A x v
mult(A, v, u);
```
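For reference, the dense update \(\mathbf{C} = \alpha \, \mathrm{op}(\mathbf{A}) \, \mathrm{op}(\mathbf{B}) + \beta \mathbf{C}\) can be written out in plain C++ without any BLAS library. The sketch below is only meant to make the arithmetic explicit. `RefMatrix` and `ref_mult` are hypothetical names introduced for this illustration, the row-major layout is an assumption, and the defaults \(\alpha = 1\), \(\beta = 0\) are chosen so that a plain call computes \(\mathbf{C} = \mathbf{A}\mathbf{B}\) as in the first call above. It says nothing about how MOCCA or a BLAS library implements the operation internally.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical row-major dense matrix, used only for this illustration.
// It is unrelated to mocca::Matrix.
struct RefMatrix {
    std::size_t rows;
    std::size_t cols;
    std::vector<float> data;

    RefMatrix(std::size_t r, std::size_t c, float fill = 0.0f)
        : rows(r), cols(c), data(r * c, fill) {}

    float& operator()(std::size_t i, std::size_t j) { return data[i * cols + j]; }
    float operator()(std::size_t i, std::size_t j) const { return data[i * cols + j]; }
};

// Reference for C = alpha * op(A) * op(B) + beta * C,
// where op(X) is X, or X transposed when the matching flag is true.
inline void ref_mult(const RefMatrix& A, bool transA,
                     const RefMatrix& B, bool transB,
                     RefMatrix& C, float alpha = 1.0f, float beta = 0.0f) {
    const std::size_t n = transA ? A.cols : A.rows;  // rows of op(A)
    const std::size_t k = transA ? A.rows : A.cols;  // columns of op(A)
    const std::size_t m = transB ? B.rows : B.cols;  // columns of op(B)
    assert((transB ? B.cols : B.rows) == k);         // inner dimensions agree
    assert(C.rows == n && C.cols == m);              // output shape matches

    for (std::size_t i = 0; i < n; ++i) {
        for (std::size_t j = 0; j < m; ++j) {
            float acc = 0.0f;
            for (std::size_t p = 0; p < k; ++p) {
                const float a = transA ? A(p, i) : A(i, p);
                const float b = transB ? B(j, p) : B(p, j);
                acc += a * b;
            }
            // When beta is zero, the old value of C is not read at all.
            C(i, j) = (beta == 0.0f) ? alpha * acc : alpha * acc + beta * C(i, j);
        }
    }
}
```

With A = [[1, 2], [3, 4]], B = [[5, 6], [7, 8]] and C initially filled with ones, `ref_mult(A, false, B, false, C, 2.0f, 4.0f)` leaves C = [[42, 48], [90, 104]].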

Note

Row and column Vector objects are automatically transposed to the correct orientation before the multiplication.
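The aliasing warning in this section can be illustrated without MOCCA. In the sketch below, a naive kernel writes each output entry as soon as it is computed. When the output buffer is the same as the left-hand input, entries that are still needed get overwritten and the result is wrong. `naive_mult` and `mult_in_place_safe` are hypothetical helpers written for this example, and the kernel is only a stand-in to show the effect of overlap, not how MOCCA or any BLAS library works internally.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Mat = std::vector<float>;  // n x n matrix stored row-major

// Naive kernel: out = a * b for n x n matrices.
// Only mathematically correct when `out` does not overlap `a` or `b`.
inline void naive_mult(const float* a, const float* b, float* out, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        for (std::size_t j = 0; j < n; ++j) {
            float acc = 0.0f;
            for (std::size_t p = 0; p < n; ++p) {
                acc += a[i * n + p] * b[p * n + j];
            }
            out[i * n + j] = acc;  // with aliasing, this clobbers an input entry
        }
    }
}

// Safe way to get A = A * B: compute into a temporary, then swap it in.
inline void mult_in_place_safe(Mat& A, const Mat& B, std::size_t n) {
    assert(A.size() == n * n && B.size() == n * n);
    Mat tmp(n * n);
    naive_mult(A.data(), B.data(), tmp.data(), n);
    A.swap(tmp);
}
```

For A = [[1, 2], [3, 4]] and B = [[0, 1], [1, 0]], the safe version yields [[2, 1], [4, 3]], whereas calling `naive_mult` with A as both input and output yields [[2, 2], [4, 4]]. With a routine such as `mult()`, the equivalent precaution is to pass a separate output matrix.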