.. _api_simd:

==================
SIMD Primitives
==================

To facilitate SIMD programming, MOCCA provides a simple interface over the raw SIMD intrinsics and data types, along with the most common methods and operations. Here, we define three data types:

- ``Register`` - A SIMD register. The width of the register and the number of lanes depend on the target instruction set (e.g., AVX2) and on the underlying type (e.g., ``double``, ``float``, ``int``).
- ``Mask`` - A mask over a SIMD register.
- ``Tile`` - A group of SIMD registers forming a square matrix. The size of the ``Tile`` depends on the number of lanes in the register.

.. note::

   There is no automatic type conversion between SIMD types.

.. .. doxygennamespace:: mocca::simd
..    :project: mocca
..    :members:

.. Type Cast
.. ---------
..
.. .. cpp:function:: template Register rebind(const Register& a)
..
..    Rebinds a SIMD register from the type `From` to the type `To`. This routine does **not** convert the data between the types.
..
.. Type Conversion
.. ---------------
..
.. .. cpp:function:: template Register to_float(const Register& vec)
..
..    Converts the integer in each lane of `vec` to a floating-point number with the same number of bytes (e.g., `int -> float`).
..
.. .. cpp:function:: template Register to_integer(const Register& vec)
..
..    Converts the floating-point number in each lane of `vec` to an integer with the same number of bytes (e.g., `float -> int`).
..
.. .. cpp:function:: template Register convert_to(const Register& vec)
..
..    Converts each lane of `vec` from the type `From` to the type `To`. The types `From` and `To` **must** have the same number of bytes (e.g., `float` and `int`).
..
.. Initialization
.. ---------------
..
.. .. cpp:function:: template Register zeros()
..
..    Initializes a Register with all bits set to ``0``.
..
.. .. cpp:function:: template Register ones()
..
..    Initializes a Register with all bits set to ``1``.
..
.. .. cpp:function:: template Register set_all(T val)
..
..    Initializes a Register with every lane set to ``val``.
..
.. .. cpp:function:: template Register set(Args... args)
..
..    Initializes a Register with its lanes set to `args`, i.e., `[x1, x2, ..., xn]`.
..
.. .. cpp:function:: template Register sign_bit()
..
..    Initializes a Register in which only the most significant bit of each lane is set to ``1``.
..
.. Memory Operations
.. -----------------
..
.. .. cpp:function:: template Register load(T* ptr)
..
..    Loads a data block from the memory position `ptr`. The address **must** be aligned to a 32-byte boundary.
..
.. .. cpp:function:: template Register loadu(T* ptr)
..
..    Loads a data block from the memory position `ptr`. There is no alignment requirement.
..
.. .. cpp:function:: template Register mask_load(T* ptr, const Mask256& m)
..
..    Conditionally loads data from the memory position `ptr` according to the mask `m`. The skipped lanes are set to `0`.
..
.. .. cpp:function:: template Register strided_load(T* ptr, index_t stride)
..
..    Loads a data block from memory, starting at `ptr`. Consecutive elements of the block are spaced `stride` apart in memory.
..
.. .. cpp:function:: template void store(const Register& vec, T* ptr)
..
..    Stores data from `vec` to the memory position `ptr`. The address **must** be aligned to a 32-byte boundary.
..
.. .. cpp:function:: template void storeu(const Register& vec, T* ptr)
..
..    Stores data from `vec` to the memory position `ptr`. There is no alignment requirement.
..
.. .. cpp:function:: template void mask_store(const Register& vec, T* ptr, const Mask& m)
..
..    Conditionally stores data from `vec` to the memory position `ptr` according to the mask `m`.
..
.. .. cpp:function:: template void strided_store(const Register& reg, T* ptr, index_t stride)
..
..    Stores data from `reg` to a memory block, starting at `ptr`. Consecutive elements of the block are spaced `stride` apart in memory.
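
The sketch below models the ``Register``, ``Mask``, and ``Tile`` concepts described above in portable C++, so their shape is easy to see. It is not MOCCA's implementation. The ``sketch`` namespace, the 32-byte (AVX2-like) register width, the per-lane ``bool`` mask, and the simplified ``loadu``, ``mask_load``, ``strided_load``, and ``mask_store`` signatures are all assumptions made for illustration, and plain loops stand in for the intrinsics a real implementation would use. The stride of ``strided_load`` is assumed to count elements, not bytes.

.. code-block:: cpp

   #include <array>
   #include <cassert>
   #include <cstddef>
   #include <type_traits>

   // Hypothetical, portable model of the concepts on this page; NOT MOCCA's API.
   namespace sketch {

   // Assumption: a 256-bit (32-byte) register, as on AVX2.
   constexpr std::size_t kRegisterBytes = 32;

   // A register holds as many lanes of T as fit in kRegisterBytes.
   template <typename T>
   struct Register {
       static constexpr std::size_t lanes = kRegisterBytes / sizeof(T);
       std::array<T, lanes> v{};  // value-initialized: every lane starts at zero
   };

   // One flag per lane: true means the lane takes part in the operation.
   template <typename T>
   struct Mask {
       std::array<bool, Register<T>::lanes> on{};
   };

   // A square tile: `lanes` registers, each holding `lanes` lanes.
   template <typename T>
   struct Tile {
       std::array<Register<T>, Register<T>::lanes> rows{};
   };

   // Different element types give unrelated register types: no implicit conversion.
   static_assert(!std::is_convertible<Register<int>, Register<float>>::value,
                 "register types must not convert into one another automatically");

   // Unaligned load of one full register from ptr[0 .. lanes-1].
   template <typename T>
   Register<T> loadu(const T* ptr) {
       assert(ptr != nullptr);
       Register<T> r;
       for (std::size_t i = 0; i < Register<T>::lanes; ++i) r.v[i] = ptr[i];
       return r;
   }

   // Masked load: disabled lanes stay zero and their memory is never read.
   template <typename T>
   Register<T> mask_load(const T* ptr, const Mask<T>& m) {
       Register<T> r;
       for (std::size_t i = 0; i < Register<T>::lanes; ++i)
           if (m.on[i]) r.v[i] = ptr[i];
       return r;
   }

   // Strided load: lane i reads ptr[i * stride] (stride counted in elements).
   template <typename T>
   Register<T> strided_load(const T* ptr, std::ptrdiff_t stride) {
       Register<T> r;
       for (std::size_t i = 0; i < Register<T>::lanes; ++i)
           r.v[i] = ptr[static_cast<std::ptrdiff_t>(i) * stride];
       return r;
   }

   // Masked store: only enabled lanes are written back to memory.
   template <typename T>
   void mask_store(const Register<T>& reg, T* ptr, const Mask<T>& m) {
       for (std::size_t i = 0; i < Register<T>::lanes; ++i)
           if (m.on[i]) ptr[i] = reg.v[i];
   }

   }  // namespace sketch

Under these assumptions, ``Register<double>`` has 4 lanes and ``Register<float>`` has 8, so a ``Tile<double>`` holds 4 registers of 4 lanes each. The ``static_assert`` mirrors the note above: a ``Register<int>`` cannot silently become a ``Register<float>``, so any conversion has to be written out explicitly.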