Matrix and Vector Operations

The MOCCA library was designed around Expression Templates [1] [2] in order to allow users to write mathematical expressions as close to the textbooks as possible. These expressions are then lazily evaluated to achieve high performance. In other words, some operators (e.g., +) and methods (e.g., mat.T()) return an abstract object representing the operation without performing any actual computation. The computation only happens later, when the entire expression is evaluated, usually in the assignment operator =. Although this sounds inefficient, modern compilers can optimize all the abstraction away and generate a perfectly optimized code. For example, the code

Matrix<float> vec1(n);
Matrix<float> vec2(n);
Matrix<float> result(n);
// ...
result = 5 * vec1 + vec2;

is transformed in a single for-loop, so the vectors are traversed only once and no temporary is created. Without any further optimizations (e.g., SIMD), the loop looks like this:

for (int i = 0; i < n; ++i) {
   result[i] = 5 * vec1[i] + vec2[i];
}

Therefore, you should not be afraid to write mathematical expressions with many terms. This will only give MOCCA more opportunities to optimize the code.

Warning

Due to the SIMD optimization, expressions can only contain matrix or vectors with the same type, e.g., float.

References:

Element-wise Operations

In MOCCA, the C++ operators +, -, * and / express element-wise operations between matrices, vectors and scalars. Naturally, both the left and right-hand sides of the operation must have the same number of rows and columns (see Broadcast when mixing vectors and matrices). All element-wise operations can be applied for both dense and sparse datatypes.

  • Addition: mat1 + mat2, vec1 + vec2, mat + vec, scalar + mat and scalar + vec

  • Subtraction: mat1 - mat2, vec1 - vec2, mat - vec, scalar - mat and scalar - vec

  • Multiplication: mat1 * mat2, vec1 * vec2, mat * vec, scalar * mat and scalar * vec

  • Division [*] : mat1 / mat2, vec1 / vec2, mat / vec, scalar / mat and scalar / vec

  • Unary Minus: -mat and -vec

  • Compound operators: +=, -=, *= and /= with the same rules as above.

Note

See BLAS and LAPACK for calculating the product between two matrices as defined in linear algebra.

Some mathematical functions can also be applied element-wise:

  • Absolute Value: abs(vec) and abs(mat)

  • Square Root: sqrt(vec) and sqrt(mat)

  • y-th Power: pow(vec, y) and pow(mat, y)

  • Exponential: exp(vec) and exp(mat)

using namespace mocca;

Matrix<int> mat1 = {{1, 2},
                    {3, 4}};

Matrix<int> mat2 = {{0, 1},
                    {4, 7}};

Matrix<int> res(2, 2);

res = mat1 * mat2;

std::cout << "mat1 * mat2 =" << std::endl;
std::cout << res << std::endl << std::endl;

res += 3;

std::cout << "res += 3" << std::endl;
std::cout << res << std::endl << std::endl;

Output::

mat1 * mat2 =
[[    0     2 ]
 [   12    28 ]]

res += 3
[[    3     5 ]
 [   15    31 ]]

Note

Note

Element-wise unary operations (e.g., csr + 5 or abs(csr)) will only affect the nonzero entries of the sparse expression (i.e., an expression that includes a CSR Matrix or Sparse Vector)

You can also mix objects with different storage schemes:

using namespace mocca;

Matrix<int> dense1(nrows, ncols);
Matrix<int> dense2(nrows, ncols);
CSRMatrix<int> csr(nrows, ncols);
Matrix<int> dense3(nrows, ncols);

//...
dense3 = dense1 + csr + dense2;

However, performance-wise, it is better to calculate the previous example in two steps:

dense3 = dense1 + dense2;
dense3 += csr;

In this way, your code fully exploits the higher performance of the dense storages (e.g., no indirection, SIMD, etc.) and only have to pay the cost of the slow evaluation of the sparse expression on the few non-zero entries in the csr matrix.

User-defined Operations

You can also create custom element-wise operations. There are two kinds: unary and binary operations. The first applies an element-wise operation (.e.g., log()) over a container, while the second, is an element-wise operation between two objects (.e.g, an addition). However, the functor that implements the target operation must accept both a scalar and a mocca::simd::Register. For example, for a custom unary operation,

using namespace mocca;

template<typename T>
struct custom_op
{
   T operator()(T x)
   {
      return std::log(x);
   }

   simd::Register<T> operator()(const simd::Register<T>& x)
   {
      return simd::log(x);
   }
};

Naturally, for a custom binary operation, the functor should accept two arguments:

template<typename T>
struct custom_op
{
   T operator()(T x, T y);
   simd::Register<T> operator()(const simd::Register<T>& x, const simd::Register<T>& y);
};

After you created the functor, you can call either unary_op() or binary_op() to apply the operation over the target object(s).

Matrix<float> A = {{0.4, 0.5, 0.6},
                   {0.2, 0.6, 1.6},
                   {2.4, 1.1, 0.05}};

Matrix<float> B = unary_op(A, custom_op<float>());

std::cout << "log(A) = " << B << std::endl;

Output::

log(A) =
[[ -0.9162907 -0.6931472 -0.5108256 ]
 [ -1.609438 -0.5108256 0.47000363 ]
 [ 0.8754688 0.095310204 -2.9957323 ]]

Note

The user-defined operation obeys the same principles as a built-in operation, for example, an unary operation only affects the nonzero entries in a sparse expression and binary operations can trigger a Broadcast when mixing vectors and matrices.

Broadcast

In a Broadcast [3] operation, a row or column Vector is replicated over one direction to be interpreted as a matrix. MOCCA will implicitly broadcast a vector expression (i.e., an expression only containing vectors) to match the dimension of the target matrix when using the operators +, -, * and / or evaluating the expression to a matrix. For example,

using namespace mocca;

Matrix<float> A = {{1, 2, 3},
                   {4, 5, 6},
                   {7, 8, 9}};

Vector<float> v = {5, 2, 10};
Matrix<float> B(3, 3);

B = v;
std::cout << "Broadcast in the assignment:" << std::endl;
std::cout << B << std::endl;

std::cout << "A / v =" << std::endl;
std::cout << B << std::endl;

Output:

Broadcast in the assignment:
[[   5     5     5   ]
 [   2     2     2   ]
 [  10    10    10   ]]

A / v =
[[  0.2   0.4   0.6  ]
 [   2    2.5    3   ]
 [  0.7   0.8   0.9  ]]

There are two equivalent ways to interpret the broadcast in this example. It divides each column of the matrix A by the vector v. The second interpretation is that the vector v is replicated 3 times to form a 3x3 matrix and then the program divides A by this matrix (the operator / denote element-wise division):

\[\begin{split}\begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \end{bmatrix} / \begin{bmatrix} 5 & 5 & 5 \\ 2 & 2 & 2 \\ 10 & 10 & 10 \end{bmatrix} = \begin{bmatrix} 0.2 & 0.4 & 0.6 \\ 2 & 2.5 & 3 \\ 0.7 & 0.8 & 0.9 \end{bmatrix}\end{split}\]

Note that the target matrix can be either sparse or dense, but only Vector objects can be broadcasted. The direction of the broadcast is determined by the orientation of the vector: column vectors are always broadcasted over the matrix columns, while row vectors are over the matrix rows. Naturally, transposing the vector changes the direction of the broadcast.

Note

Before broadcasting a vector and assigning it to a matrix, you must set the matrix to the correct dimensions.

A broadcast also occurs when performing element-wise operations (+, -, * and /) between row and column vectors. For example,

using namespace mocca;

Vector<float> v = {6, 3, 12};
RowVector<float> rv = {5, 0.5, 1, 5};
Matrix<float> A = v * rv;
Matrix<float> B = outer(v, rv);

std::cout << "v * rv =" << std::endl;
std::cout << A << std::endl << std::endl;

std::cout << "is equivalent to" << std::endl << std::endl;

std::cout << "outer(v, rv) =" << std::endl;
std::cout << B << std::endl;

Output::

v * rv =
[[   30     3     6    30 ]
 [   15   1.5     3    15 ]
 [   60     6    12    60 ]]


is equivalent to

outer(v, rv) =
[[   30     3     6    30 ]
 [   15   1.5     3    15 ]
 [   60     6    12    60 ]]

In this case, both the column vector v and the row vector rv are replicated 4 and 3 times, respectively, to form two 3x4 matrices, which are then multiplied together in an element-wise fashion. This is equivalent to computing the outer product between the two vectors.

Note

Assigning a row vector to a column vector, or vice-versa, will not trigger a broadcast. Instead, the source vector will be automatically transposed to match the orientation of the destination vector.

Reference:

Transpose

The transpose \(\mathbf{A}^T\) of a matrix or vector \(\mathbf{A}\) can be calculated by calling the methods A.T() or A.transpose(). This operation is only valid for dense datatypes.

Note

Transposing a Vector only change the orientation of the vector (e.g., column -> row vector) wihout performing any computation. This can be used for controlling the direction of the Broadcast.

using namespace mocca;

Matrix<float> A = {{11, 12, 13, 14},
                   {21, 22, 23, 24},
                   {31, 32, 33, 34}};

Matrix<float> B = A.T();

std::cout << "A.T() =" << std::endl;
std::cout << B << std::endl;

Output::

A.T() =
[[   11    21    31 ]
 [   12    22    32 ]
 [   13    23    33 ]
 [   14    24    34 ]]

Similar to the other operations, the transpose() will create an abstract object without doing the actual transposition. For example, in the B = A.T() expression, the transpose only will be evaluated when assigning the results to B. This creates a problem when trying to assign the transpose to itself: A = A.T(). MOCCA will start to write the results to A before completing the evaluation of the transposition. Therefore, the A = A.T() do not replace the A with its transpose A.T as expected:

using namespace mocca;

Matrix<float> A = {{11, 12, 13},
                   {21, 22, 23},
                   {31, 32, 33}};

// NEVER DO THIS!
A = A.T();

std::cout << "A.T() =" << std::endl;
std::cout << A << std::endl;

which produces

A.T() =
[[  11    21    31 ]
 [  21    22    32 ]
 [  31    32    33 ]]

But the correct output is

A.T() =
[[  11    21    31 ]
 [  12    22    32 ]
 [  13    23    33 ]]

We denote this problem as Aliasing. In “debug” mode (i.e., with MOCCA_DEBUG = 1), MOCCA will automatically detect the presence of aliasing in the expression and will throw a std::runtime_error exception accordingly.

For in-place transposition (i.e., A = A.T()) use the transpose_inplace() method instead.

using namespace mocca;

Matrix<float> A = {{11, 12, 13},
                   {21, 22, 23},
                   {31, 32, 33}};

A.transpose_inplace();

std::cout << "A.T() =" << std::endl;
std::cout << A << std::endl;

Output:

A.T() =
[[   11    21    31 ]
 [   12    22    32 ]
 [   13    23    33 ]]

Diagonal

You can use the diag() method to extract the diagonal of an expression and the diag_mat() method to construct a diagonal matrix from a vector expression. The diagonal matrix is considered a sparse expression.

using namespace mocca;

Matrix<float> A = {{1, 2, 3},
                   {4, 5, 6},
                   {7, 8, 9}};

Vector<float> A_diag = diag(A);
Matrix<float> D = diag_mat(A_diag);

std::cout << "Diagonal of A:" << A_diag << std::endl;
std::cout << "Diagonal Matrix from diag(A): " << D << std::endl;

return 0;

Output:

Diagonal of A:
[   1
    5
    9   ]

Diagonal Matrix from diag(A):
[[   1     0     0   ]
 [   0     5     0   ]
 [   0     0     9   ]]

Note

Note that you cannot access individual entries of the objects returned by the methods diag() and diag_mat() nor you can assign values to them.

Reductions

A Reduction operation combines all the entries of a matrix, vector or related expression into a single scalar value. Here is the list of all the reductions available:

  • Sum: sum()

  • Product: prod()

  • Minimum Value: min()

  • Maximum Value: max()

Note

Currently, there is no way to define a custom reduction operation.

One of the most common reductions is the sum(), which returns the sum of all entries in the expression.

using namespace mocca;

Matrix<float> A = {{1, 4, 0, 3},
                   {2, 1, 2, 2},
                   {5, 10, 1, 3}};
Vector<float> v = {2, 3, 0};

float sum_A = sum(A);
float sum_Av = sum(A + v);

std::cout << "The sum of all entries of A is " << sum_A << std::endl;
std::cout << "The sum of all entries of (A + v) is " << sum_Av << std::endl;

Output:

The sum of all entries of A is 34
The sum of all entries of (A + v) is 54

Partial Reductions are also supported. A partial reduction calculates the reduction for each row or column of the matrix expression and then stores the results in a column/row vector [4]. The axis of the partial reduction is defined by the tags rowwise, colwise and no_axis. A row-wise reduction will always returns a column vector, while a column-wise reduction, a row vector. For example,

using namespace mocca;

Matrix<float> A = {{1, 4, 0, 3},
                   {2, 1, 2, 2},
                   {5, 10, 1, 3}};

float max_A = max(A, no_axis);
Vector<float> max_row_A = max(A, rowwise);
RowVector<float> max_col_A = max(A, colwise);

std::cout << "Maximum value among all entries is " << max_A << std::endl << std::endl;
std::cout << "Maximum value per row is " << std::endl << max_row_A << std::endl << std::endl;
std::cout << "Maximum value per col is " << std::endl << max_col_A << std::endl;

Output:

Maximum value per row is
[    4
    2
   10     ]


Maximum value per col is
[    5    10     2      3     ]

Note

Due to the CSR storage format, CSR Matrix only supports row-wise and full reductions. This operation also cannot be applied to sparse expressions.

Note

Norm

MOCCA can compute the following norms for any vector expression:

  • norm_l1(x) = \(\|\vec{x}\|_1 = |x_1| + |x_2| + \dots + |x_n|\) (\(l^1\) norm)

  • norm_l2(x) = \(\|\vec{x}\|_2 = \sqrt{x_1^2 + x_2^2 + \dots + x_n^2}\) (\(l^2\) norm)

  • norm_max(x) = \(\|\vec{x}\|_\infty = max(|x_1|, |x_2|, \dots, |x_n|)\) (\(l^\infty\) norm)

These norms can also be applied to matrices. In this case, a \(N \times M\) matrix is seen as a vector with \(N \cdot M\) entries. Therefore, the norm_l2(x) will return the Frobenius or Hilbert-Schmidt norm of the matrix. MOCCA can also calculate the norm of each row or column of the matrix using the tags rowwise, colwise and no_axis.

using namespace mocca;

Matrix<float> A = {{1, 4, 0},
                   {2, 1, 2},
                   {5, 10, 1}};

float norm_l2_A = norm_l2(A);
Vector<float> norm_l2_A_row = norm_l2(A, rowwise);

std::cout << "||A||_2 = " << norm_l2_A << std::endl << std::endl;
std::cout << "L2 norm of each row:" << std::endl;
std::cout << norm_l2_A_row << std::endl;
||A||_2 = 12.3288

L2 norm of each row:
[4.12311
   3
 11.225  ]

Scalar and Outer Products

The dot (or scalar) product of two vectors can be calculated using the dot() method. Likewise, the outer product can be computed with the outer method.

using namespace mocca;

Vector<float> v1 = {2, 3, 1, 3};
Vector<float> v2 = {5, 2, 1, 1};
float scalar_prod = dot(v1, v2);
Matrix<float> outer_prod = outer(v1, v2);

std::cout << "dot(v1, v2) = " << scalar_prod << std::endl << std::endl;
std::cout << "outer(v1, v2) = " << std::endl;
std::cout << outer_prod << std::endl;

Output:

dot(v1, v2) = 20

outer(v1, v2) =
[[   10     4     2     2 ]
 [   15     6     3     3 ]
 [    5     2     1     1 ]
 [   15     6     3     3 ]]

Aliasing

In MOCCA, we say that an expression has aliasing when a matrix or vector appears on both sides of the assignment operation, for instance, mat1 = mat1 + mat2 or mat1 = mat1.T(). Although in the first case, the aliasing is harmless, the second one will produce incorrect results. To better explain, let us recall the example from the Transpose:

using namespace mocca;

Matrix<float> A = {{11, 12, 13},
                   {21, 22, 23},
                   {31, 32, 33}};

// NEVER DO THIS!
A = A.T();

std::cout << "A.T() =" << std::endl;
std::cout << A << std::endl;

Output:

A.T() =
[[  11    21    31 ]
 [  21    22    32 ]
 [  31    32    33 ]]

The Aliasing problem arises due to the lazy evaluation, i.e., the expression is only evaluated during the assignment. In this case, MOCCA will transpose a block of A and then immediately write the results to A, so when it moves to next block, it will use the new values of A instead of the old ones. For example, MOCCA will first compute A(0, 1) = A(1, 0) and then A(1, 0) = A(0, 1), losing the original value of A(1, 0).

Naturally, if a matrix or vector a is resized (which erases all the content from a) before the assignment and the expression has aliasing, the results will be incorrect. This is another example of harmful aliasing.

In general, the aliasing can only detect during the runtime. In the default mode (“debug”), MOCCA will analyze the expression during the runtime and determine if it has harmful aliasing or not. For example, the example above will result in the following exception:

terminate called after throwing an instance of 'std::runtime_error'
   what():  Error 17: Aliasing detected when evaluating the expression!

Besides built-in methods, such as transpose_inplace(), the only way to solve the aliasing issue is to evaluate the expression to a temporary and then move the results from the temporary to the object. This can be done with eval() routine. Still, the built-in methods, when available, are usually more efficient, while indicating more clearly your intention.

using namespace mocca;

Matrix<float> A = {{11, 12, 13},
                   {21, 22, 23},
                   {31, 32, 33}};

A = eval(A.T());

std::cout << "A.T() =" << std::endl;
std::cout << A << std::endl;

Output:

A.T() =
[[   11    21    31 ]
 [   12    22    32 ]
 [   13    23    33 ]]

Expression Validation

MOCCA will check if the created expression is valid or not. When possible, it will check during the compile time, producing compilation errors if it encounters any anomaly. These errors can be ugly and long, but they help to detect issues with your code. For example, mixing different types in the expression

using namespace mocca;

Matrix<float> A(3, 3, 1);
Matrix<double> B(3, 3, 2);
Matrix<double> C = A + B;

will result in the following compilation error:

./MOCCA/mocca/matexpr/ewise_binary_op.h:113:5: error: static_assert failed due to requirement
'std::is_same_v<float, double>' "Error 6: A expression cannot contain objects with different types!"

Many validation checks, however, cannot be done during the compilation, for example, checking if all dimensions of the objects in the expression match. In these cases, MOCCA will run runtime assertions to determine if the expression is valid and will throw C++ exceptions if it detects an illegal operation if it is in the “debug” mode.

using namespace mocca;

Matrix<float> A(4, 4, 1);
Matrix<double> B(3, 3, 2);
Matrix<double> C = A + B;

Output:

terminate called after throwing an instance of 'std::runtime_error'
what():  Error 20: Mismatch matrix dimensions!

Runtime assertions can be disabled by writing #define MOCCA_DEBUG 0 before including any MOCCA header.