# Matrix and Vector Operations#

The MOCCA library was designed around Expression Templates [1] [2] in order to allow users to write mathematical expressions as close to the textbooks as possible. These expressions are then lazily evaluated to achieve high performance. In other words, some operators (e.g., `+`

) and methods (e.g., `mat.T()`

) return an abstract object representing the operation without performing any actual computation. The computation only happens later, when the entire expression is evaluated, usually in the assignment operator `=`

. Although this sounds inefficient, modern compilers can optimize all the abstraction away and generate a perfectly optimized code. For example, the code

```
Matrix<float> vec1(n);
Matrix<float> vec2(n);
Matrix<float> result(n);
// ...
result = 5 * vec1 + vec2;
```

is transformed in a single for-loop, so the vectors are traversed only once and no temporary is created. Without any further optimizations (e.g., SIMD), the loop looks like this:

```
for (int i = 0; i < n; ++i) {
result[i] = 5 * vec1[i] + vec2[i];
}
```

Therefore, you should not be afraid to write mathematical expressions with many terms. This will only give MOCCA more opportunities to optimize the code.

Warning

Due to the SIMD optimization, expressions can **only** contain matrix or vectors with the same type, e.g., `float`

.

**References:**

## Element-wise Operations#

In MOCCA, the C++ operators `+`

, `-`

, `*`

and `/`

express element-wise operations between matrices, vectors and scalars. Naturally, both the left and right-hand sides of the operation must have the same number of rows and columns (see Broadcast when mixing vectors and matrices). All element-wise operations can be applied for both dense and sparse datatypes.

Addition:

`mat1 + mat2`

,`vec1 + vec2`

,`mat + vec`

,`scalar + mat`

and`scalar + vec`

Subtraction:

`mat1 - mat2`

,`vec1 - vec2`

,`mat - vec`

,`scalar - mat`

and`scalar - vec`

Multiplication:

`mat1 * mat2`

,`vec1 * vec2`

,`mat * vec`

,`scalar * mat`

and`scalar * vec`

Division [*] :

`mat1 / mat2`

,`vec1 / vec2`

,`mat / vec`

,`scalar / mat`

and`scalar / vec`

Unary Minus:

`-mat`

and`-vec`

Compound operators:

`+=`

,`-=`

,`*=`

and`/=`

with the same rules as above.

Note

See BLAS and LAPACK for calculating the product between two matrices as defined in linear algebra.

Some mathematical functions can also be applied element-wise:

Absolute Value:

`abs(vec)`

and`abs(mat)`

Square Root:

`sqrt(vec)`

and`sqrt(mat)`

y-th Power:

`pow(vec, y)`

and`pow(mat, y)`

Exponential:

`exp(vec)`

and`exp(mat)`

```
using namespace mocca;
Matrix<int> mat1 = {{1, 2},
{3, 4}};
Matrix<int> mat2 = {{0, 1},
{4, 7}};
Matrix<int> res(2, 2);
res = mat1 * mat2;
std::cout << "mat1 * mat2 =" << std::endl;
std::cout << res << std::endl << std::endl;
res += 3;
std::cout << "res += 3" << std::endl;
std::cout << res << std::endl << std::endl;
```

Output::

```
mat1 * mat2 =
[[ 0 2 ]
[ 12 28 ]]
res += 3
[[ 3 5 ]
[ 15 31 ]]
```

Note

Naturally, the right-hand side of a division cannot be sparse.

Note

Element-wise unary operations (e.g., `csr + 5`

or `abs(csr)`

) will **only** affect the nonzero entries of the sparse expression (i.e., an expression that includes a CSR Matrix or Sparse Vector)

You can also mix objects with different storage schemes:

```
using namespace mocca;
Matrix<int> dense1(nrows, ncols);
Matrix<int> dense2(nrows, ncols);
CSRMatrix<int> csr(nrows, ncols);
Matrix<int> dense3(nrows, ncols);
//...
dense3 = dense1 + csr + dense2;
```

However, performance-wise, it is better to calculate the previous example in two steps:

```
dense3 = dense1 + dense2;
dense3 += csr;
```

In this way, your code fully exploits the higher performance of the dense storages (e.g., no indirection, SIMD, etc.) and only have to pay the cost of the slow evaluation of the sparse expression on the few non-zero entries in the `csr`

matrix.

## User-defined Operations#

You can also create custom element-wise operations. There are two kinds: unary and binary operations. The first applies an element-wise operation (.e.g., `log()`

) over a container, while the second, is an element-wise operation between two objects (.e.g, an addition). However, the functor that implements the target operation must accept both a scalar and a `mocca::simd::Register`

. For example, for a custom unary operation,

```
using namespace mocca;
template<typename T>
struct custom_op
{
T operator()(T x)
{
return std::log(x);
}
simd::Register<T> operator()(const simd::Register<T>& x)
{
return simd::log(x);
}
};
```

Naturally, for a custom binary operation, the functor should accept two arguments:

```
template<typename T>
struct custom_op
{
T operator()(T x, T y);
simd::Register<T> operator()(const simd::Register<T>& x, const simd::Register<T>& y);
};
```

After you created the functor, you can call either `unary_op()`

or `binary_op()`

to apply the operation over the target object(s).

```
Matrix<float> A = {{0.4, 0.5, 0.6},
{0.2, 0.6, 1.6},
{2.4, 1.1, 0.05}};
Matrix<float> B = unary_op(A, custom_op<float>());
std::cout << "log(A) = " << B << std::endl;
```

Output::

```
log(A) =
[[ -0.9162907 -0.6931472 -0.5108256 ]
[ -1.609438 -0.5108256 0.47000363 ]
[ 0.8754688 0.095310204 -2.9957323 ]]
```

Note

The user-defined operation obeys the same principles as a built-in operation, for example, an unary operation only affects the nonzero entries in a sparse expression and binary operations can trigger a Broadcast when mixing vectors and matrices.

## Broadcast#

In a *Broadcast* [3] operation, a row or column Vector is replicated over one direction to be interpreted as a matrix. MOCCA will implicitly broadcast a vector expression (i.e., an expression only containing vectors) to match the dimension of the target matrix when using the operators `+`

, `-`

, `*`

and `/`

or evaluating the expression to a matrix. For example,

```
using namespace mocca;
Matrix<float> A = {{1, 2, 3},
{4, 5, 6},
{7, 8, 9}};
Vector<float> v = {5, 2, 10};
Matrix<float> B(3, 3);
B = v;
std::cout << "Broadcast in the assignment:" << std::endl;
std::cout << B << std::endl;
std::cout << "A / v =" << std::endl;
std::cout << B << std::endl;
```

Output:

```
Broadcast in the assignment:
[[ 5 5 5 ]
[ 2 2 2 ]
[ 10 10 10 ]]
A / v =
[[ 0.2 0.4 0.6 ]
[ 2 2.5 3 ]
[ 0.7 0.8 0.9 ]]
```

There are two equivalent ways to interpret the broadcast in this example. It divides each column of the matrix `A`

by the vector `v`

. The second interpretation is that the vector `v`

is replicated 3 times to form a 3x3 matrix and then the program divides `A`

by this matrix (the operator `/`

denote element-wise division):

Note that the target matrix can be either sparse or dense, but only Vector objects can be broadcasted. The direction of the broadcast is determined by the orientation of the vector: column vectors are always broadcasted over the matrix columns, while row vectors are over the matrix rows. Naturally, transposing the vector changes the direction of the broadcast.

Note

Before broadcasting a vector and assigning it to a matrix, you **must** set the matrix to the correct dimensions.

A broadcast also occurs when performing element-wise operations (`+`

, `-`

, `*`

and `/`

) between row and column vectors. For example,

```
using namespace mocca;
Vector<float> v = {6, 3, 12};
RowVector<float> rv = {5, 0.5, 1, 5};
Matrix<float> A = v * rv;
Matrix<float> B = outer(v, rv);
std::cout << "v * rv =" << std::endl;
std::cout << A << std::endl << std::endl;
std::cout << "is equivalent to" << std::endl << std::endl;
std::cout << "outer(v, rv) =" << std::endl;
std::cout << B << std::endl;
```

Output::

```
v * rv =
[[ 30 3 6 30 ]
[ 15 1.5 3 15 ]
[ 60 6 12 60 ]]
is equivalent to
outer(v, rv) =
[[ 30 3 6 30 ]
[ 15 1.5 3 15 ]
[ 60 6 12 60 ]]
```

In this case, both the column vector `v`

and the row vector `rv`

are replicated 4 and 3 times, respectively, to form two 3x4 matrices, which are then multiplied together in an element-wise fashion. This is equivalent to computing the `outer`

product between the two vectors.

Note

Assigning a row vector to a column vector, or vice-versa, will not trigger a broadcast. Instead, the source vector will be automatically transposed to match the orientation of the destination vector.

**Reference:**

‘NumPy Documentation: Broadcasting’. https://numpy.org/doc/stable/user/basics.broadcasting.html

## Transpose#

The transpose \(\mathbf{A}^T\) of a matrix or vector \(\mathbf{A}\) can be calculated by calling the methods `A.T()`

or `A.transpose()`

. This operation is **only** valid for dense datatypes.

Note

Transposing a Vector only change the orientation of the vector (e.g., column -> row vector) wihout performing any computation. This can be used for controlling the direction of the Broadcast.

```
using namespace mocca;
Matrix<float> A = {{11, 12, 13, 14},
{21, 22, 23, 24},
{31, 32, 33, 34}};
Matrix<float> B = A.T();
std::cout << "A.T() =" << std::endl;
std::cout << B << std::endl;
```

Output::

```
A.T() =
[[ 11 21 31 ]
[ 12 22 32 ]
[ 13 23 33 ]
[ 14 24 34 ]]
```

Similar to the other operations, the `transpose()`

will create an abstract object without doing the actual transposition. For example, in the `B = A.T()`

expression, the transpose only will be evaluated when assigning the results to `B`

. This creates a problem when trying to assign the transpose to itself: `A = A.T()`

. MOCCA will start to write the results to `A`

before completing the evaluation of the transposition. Therefore, the `A = A.T()`

do not replace the `A`

with its transpose `A.T`

as expected:

```
using namespace mocca;
Matrix<float> A = {{11, 12, 13},
{21, 22, 23},
{31, 32, 33}};
// NEVER DO THIS!
A = A.T();
std::cout << "A.T() =" << std::endl;
std::cout << A << std::endl;
```

which produces

```
A.T() =
[[ 11 21 31 ]
[ 21 22 32 ]
[ 31 32 33 ]]
```

But the correct output is

```
A.T() =
[[ 11 21 31 ]
[ 12 22 32 ]
[ 13 23 33 ]]
```

We denote this problem as Aliasing. In “debug” mode (i.e., with `MOCCA_DEBUG = 1`

), MOCCA will automatically detect the presence of aliasing in the expression and will throw a `std::runtime_error`

exception accordingly.

For *in-place* transposition (i.e., `A = A.T()`

) use the transpose_inplace() method instead.

```
using namespace mocca;
Matrix<float> A = {{11, 12, 13},
{21, 22, 23},
{31, 32, 33}};
A.transpose_inplace();
std::cout << "A.T() =" << std::endl;
std::cout << A << std::endl;
```

Output:

```
A.T() =
[[ 11 21 31 ]
[ 12 22 32 ]
[ 13 23 33 ]]
```

## Diagonal#

You can use the `diag()`

method to extract the diagonal of an expression and the `diag_mat()`

method to construct a diagonal matrix from a vector expression. The diagonal matrix is considered a sparse expression.

```
using namespace mocca;
Matrix<float> A = {{1, 2, 3},
{4, 5, 6},
{7, 8, 9}};
Vector<float> A_diag = diag(A);
Matrix<float> D = diag_mat(A_diag);
std::cout << "Diagonal of A:" << A_diag << std::endl;
std::cout << "Diagonal Matrix from diag(A): " << D << std::endl;
return 0;
```

Output:

```
Diagonal of A:
[ 1
5
9 ]
Diagonal Matrix from diag(A):
[[ 1 0 0 ]
[ 0 5 0 ]
[ 0 0 9 ]]
```

Note

Note that you cannot access individual entries of the objects returned by the methods `diag()`

and `diag_mat()`

nor you can assign values to them.

## Reductions#

A *Reduction* operation combines all the entries of a matrix, vector or related expression into a single scalar value. Here is the list of all the reductions available:

Sum:

`sum()`

Product:

`prod()`

Minimum Value:

`min()`

Maximum Value:

`max()`

Note

Currently, there is no way to define a custom reduction operation.

One of the most common reductions is the `sum()`

, which returns the sum of all entries in the expression.

```
using namespace mocca;
Matrix<float> A = {{1, 4, 0, 3},
{2, 1, 2, 2},
{5, 10, 1, 3}};
Vector<float> v = {2, 3, 0};
float sum_A = sum(A);
float sum_Av = sum(A + v);
std::cout << "The sum of all entries of A is " << sum_A << std::endl;
std::cout << "The sum of all entries of (A + v) is " << sum_Av << std::endl;
```

Output:

```
The sum of all entries of A is 34
The sum of all entries of (A + v) is 54
```

*Partial Reductions* are also supported. A partial reduction calculates the reduction for each row or column of the matrix expression and then stores the results in a column/row vector [4]. The axis of the partial reduction is defined by the tags `rowwise`

, `colwise`

and `no_axis`

. A row-wise reduction will always returns a column vector, while a column-wise reduction, a row vector. For example,

```
using namespace mocca;
Matrix<float> A = {{1, 4, 0, 3},
{2, 1, 2, 2},
{5, 10, 1, 3}};
float max_A = max(A, no_axis);
Vector<float> max_row_A = max(A, rowwise);
RowVector<float> max_col_A = max(A, colwise);
std::cout << "Maximum value among all entries is " << max_A << std::endl << std::endl;
std::cout << "Maximum value per row is " << std::endl << max_row_A << std::endl << std::endl;
std::cout << "Maximum value per col is " << std::endl << max_col_A << std::endl;
```

Output:

```
Maximum value per row is
[ 4
2
10 ]
Maximum value per col is
[ 5 10 2 3 ]
```

Note

Due to the CSR storage format, CSR Matrix **only** supports row-wise and full reductions. This operation also **cannot** be applied to sparse expressions.

Note

Like all other operations, the partial reduction is only evaluated during the assignment and no additional storage is required. However, if the partial reduction is followed by a broadcast, the result of the partial reduction may be stored into a temporary object to avoid repeating the computation during the broadcast.

## Norm#

MOCCA can compute the following norms for any vector expression:

`norm_l1(x)`

= \(\|\vec{x}\|_1 = |x_1| + |x_2| + \dots + |x_n|\) (\(l^1\) norm)`norm_l2(x)`

= \(\|\vec{x}\|_2 = \sqrt{x_1^2 + x_2^2 + \dots + x_n^2}\) (\(l^2\) norm)`norm_max(x)`

= \(\|\vec{x}\|_\infty = max(|x_1|, |x_2|, \dots, |x_n|)\) (\(l^\infty\) norm)

These norms can also be applied to matrices. In this case, a \(N \times M\) matrix is seen as a vector with \(N \cdot M\) entries. Therefore, the `norm_l2(x)`

will return the *Frobenius* or *Hilbert-Schmidt* norm of the matrix. MOCCA can also calculate the norm of each row or column of the matrix using the tags `rowwise`

, `colwise`

and `no_axis`

.

```
using namespace mocca;
Matrix<float> A = {{1, 4, 0},
{2, 1, 2},
{5, 10, 1}};
float norm_l2_A = norm_l2(A);
Vector<float> norm_l2_A_row = norm_l2(A, rowwise);
std::cout << "||A||_2 = " << norm_l2_A << std::endl << std::endl;
std::cout << "L2 norm of each row:" << std::endl;
std::cout << norm_l2_A_row << std::endl;
```

```
||A||_2 = 12.3288
L2 norm of each row:
[4.12311
3
11.225 ]
```

## Scalar and Outer Products#

The dot (or scalar) product of two vectors can be calculated using the `dot()`

method. Likewise, the outer product can be computed with the `outer`

method.

```
using namespace mocca;
Vector<float> v1 = {2, 3, 1, 3};
Vector<float> v2 = {5, 2, 1, 1};
float scalar_prod = dot(v1, v2);
Matrix<float> outer_prod = outer(v1, v2);
std::cout << "dot(v1, v2) = " << scalar_prod << std::endl << std::endl;
std::cout << "outer(v1, v2) = " << std::endl;
std::cout << outer_prod << std::endl;
```

Output:

```
dot(v1, v2) = 20
outer(v1, v2) =
[[ 10 4 2 2 ]
[ 15 6 3 3 ]
[ 5 2 1 1 ]
[ 15 6 3 3 ]]
```

## Aliasing#

In MOCCA, we say that an expression has *aliasing* when a matrix or vector appears on both sides of the assignment operation, for instance, `mat1 = mat1 + mat2`

or `mat1 = mat1.T()`

. Although in the first case, the aliasing is harmless, the second one will produce incorrect results. To better explain, let us recall the example from the Transpose:

```
using namespace mocca;
Matrix<float> A = {{11, 12, 13},
{21, 22, 23},
{31, 32, 33}};
// NEVER DO THIS!
A = A.T();
std::cout << "A.T() =" << std::endl;
std::cout << A << std::endl;
```

Output:

```
A.T() =
[[ 11 21 31 ]
[ 21 22 32 ]
[ 31 32 33 ]]
```

The Aliasing problem arises due to the lazy evaluation, i.e., the expression is only evaluated during the assignment. In this case, MOCCA will transpose a block of `A`

and then immediately write the results to `A`

, so when it moves to next block, it will use the new values of `A`

instead of the old ones. For example, MOCCA will first compute `A(0, 1) = A(1, 0)`

and then `A(1, 0) = A(0, 1)`

, losing the original value of `A(1, 0)`

.

Naturally, if a matrix or vector `a`

is resized (which erases all the content from `a`

) before the assignment and the expression has aliasing, the results will be incorrect. This is another example of harmful aliasing.

In general, the aliasing can only detect during the runtime. In the default mode (“debug”), MOCCA will analyze the expression during the runtime and determine if it has harmful aliasing or not. For example, the example above will result in the following exception:

```
terminate called after throwing an instance of 'std::runtime_error'
what(): Error 17: Aliasing detected when evaluating the expression!
```

Besides built-in methods, such as `transpose_inplace()`

, the only way to solve the aliasing issue is to evaluate the expression to a temporary and then move the results from the temporary to the object. This can be done with `eval()`

routine. Still, the built-in methods, when available, are usually more efficient, while indicating more clearly your intention.

```
using namespace mocca;
Matrix<float> A = {{11, 12, 13},
{21, 22, 23},
{31, 32, 33}};
A = eval(A.T());
std::cout << "A.T() =" << std::endl;
std::cout << A << std::endl;
```

Output:

```
A.T() =
[[ 11 21 31 ]
[ 12 22 32 ]
[ 13 23 33 ]]
```

## Expression Validation#

MOCCA will check if the created expression is valid or not. When possible, it will check during the compile time, producing compilation errors if it encounters any anomaly. These errors can be ugly and long, but they help to detect issues with your code. For example, mixing different types in the expression

```
using namespace mocca;
Matrix<float> A(3, 3, 1);
Matrix<double> B(3, 3, 2);
Matrix<double> C = A + B;
```

will result in the following compilation error:

```
./MOCCA/mocca/matexpr/ewise_binary_op.h:113:5: error: static_assert failed due to requirement
'std::is_same_v<float, double>' "Error 6: A expression cannot contain objects with different types!"
```

Many validation checks, however, cannot be done during the compilation, for example, checking if all dimensions of the objects in the expression match. In these cases, MOCCA will run runtime assertions to determine if the expression is valid and will throw C++ exceptions if it detects an illegal operation if it is in the “debug” mode.

```
using namespace mocca;
Matrix<float> A(4, 4, 1);
Matrix<double> B(3, 3, 2);
Matrix<double> C = A + B;
```

Output:

```
terminate called after throwing an instance of 'std::runtime_error'
what(): Error 20: Mismatch matrix dimensions!
```

Runtime assertions can be disabled by writing `#define MOCCA_DEBUG 0`

**before** including any MOCCA header.