This document provides detailed documentation for all public functions in the sv128 simulated vector library.
struct sv_mask {
    bool data[VECTOR_WIDTH];
};

A vector mask with VECTOR_WIDTH boolean lanes used for conditional operations. Each lane can be either true or false, controlling which vector lanes participate in masked operations.

struct sv_int4 {
    int data[VECTOR_WIDTH];
};

A vector register containing VECTOR_WIDTH integer values. Used for integer vector operations.

struct sv_float4 {
    float data[VECTOR_WIDTH];
};

A vector register containing VECTOR_WIDTH floating-point values. Used for float vector operations.

All sv128 operations record a simulated latency based on Intel SSE/AVX-512 reference values. Each operation adds its full instruction latency (not its pipelined throughput) to the running total, so using the full SIMD width keeps the latency per result low.
| Category | Operation | Latency |
|---|---|---|
| Memory (load) | sv_load_int, sv_load_float | 7 cycles |
| Memory (store) | sv_store_int, sv_store_float | 4 cycles |
| Set / broadcast | sv_set_int, sv_set_float | 1 cycle |
| Broadcast masked | sv_set1_int, sv_set1_float | 3 cycles |
| Integer add / sub / abs | sv_int_add, sv_int_sub, sv_int_abs | 1 cycle |
| Integer min / max | sv_int_min, sv_int_max | 1 cycle |
| Integer multiply | sv_int_mul | 3 cycles |
| Integer divide | sv_int_div | 20 cycles (software-emulated) |
| Float add / sub / mul | sv_float_add, sv_float_sub, sv_float_mul | 4 cycles |
| Float abs / min / max | sv_float_abs, sv_float_min, sv_float_max | 1 cycle |
| Float divide | sv_float_div | 11 cycles |
| Float sqrt | sv_float_sqrt | 14 cycles |
| Shuffle / hadd | sv_float_hadd, sv_float_interleave | 5 cycles |
| Comparisons (int & float) | sv_int_eq/lt/le/gt/ge, sv_float_eq/lt/le/gt/ge | 3 cycles (masked) |
| Mask ops | sv_init_ones, sv_mask_and/or/not, sv_mask_all/any, sv_cntbits | 1 cycle |
| Mask init | sv_mask_all_true | 0 cycles |

Masked vs. unmasked: Arithmetic, memory, and comparison operations are masked: only the lanes enabled by the mask are active and counted as utilized. The sv_set_int / sv_set_float constructors and the mask operations are unmasked: they always operate on all lanes.
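
As a rough illustration of how the per-operation latencies in the table add up, the sketch below totals the cost of a simple vectorized loop. It is not part of sv128: `OpCount`, `total_latency`, and `scale_loop_latency` are hypothetical names, VECTOR_WIDTH is assumed to be 4, and the only values taken from this reference are the cycle counts in the table.

```cpp
#include <vector>

// Hypothetical helper, not part of sv128: one entry per kind of operation
// issued, with the per-operation latency taken from the latency table.
struct OpCount {
    long long latency_cycles; // latency of one such operation
    long long count;          // how many times it was issued
};

// Sums latency * count, mirroring the statement that each operation
// adds its full latency to the running total.
long long total_latency(const std::vector<OpCount>& ops) {
    long long total = 0;
    for (const OpCount& op : ops) {
        total += op.latency_cycles * op.count;
    }
    return total;
}

// Example: scaling n floats (n a multiple of 4) with one sv_load_float,
// one sv_float_mul, and one sv_store_float per group of 4 elements.
long long scale_loop_latency(long long n) {
    const long long iterations = n / 4; // assumes VECTOR_WIDTH == 4
    return total_latency({
        {7, iterations}, // sv_load_float
        {4, iterations}, // sv_float_mul
        {4, iterations}, // sv_store_float
    });
}
```

Under these assumptions, 1024 floats take 256 iterations at 7 + 4 + 4 = 15 cycles each, for 3840 cycles in total.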

void sv_logger_init();
Description: Resets all performance counters to zero.
Parameters: None
Return Value: None
Example:
sv_logger_init(); // Reset performance counters

void sv_logger_print_stats();
Description: Prints a summary of collected performance statistics including total instructions, utilized lanes, lane utilization rate, and total latency.
Parameters: None
Return Value: None
Example:
sv_logger_print_stats(); // Display performance summary

long long sv_logger_get_total_instructions();
Description: Returns the total number of vector instructions recorded since the last sv_logger_init() call.
Parameters: None
Return Value: The total instruction count as a long long.
Example:
long long instructions = sv_logger_get_total_instructions();
std::cout << "Instructions executed: " << instructions << std::endl;

long long sv_logger_get_utilized_lanes();
Description: Returns the total number of vector lanes that were active (not masked off) across all recorded instructions since the last sv_logger_init() call.
Parameters: None
Return Value: The total count of utilized lanes as a long long.
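
The statistics summary mentions a lane utilization rate. One plausible way to derive such a rate from the two counters is sketched below. The formula (utilized lanes divided by instructions times the vector width) is an assumption, not something this reference states, and `lane_utilization` is a hypothetical helper rather than an sv128 function.

```cpp
// Hypothetical helper, not part of sv128. Assumes the utilization rate is
// utilized lanes / (instructions * vector width); the library may define
// it differently.
double lane_utilization(long long utilized_lanes,
                        long long total_instructions,
                        int vector_width) {
    const long long capacity = total_instructions * vector_width;
    if (capacity <= 0) {
        return 0.0; // nothing recorded yet; avoid dividing by zero
    }
    return static_cast<double>(utilized_lanes) / static_cast<double>(capacity);
}
```

With the getters documented here it could be called as `lane_utilization(sv_logger_get_utilized_lanes(), sv_logger_get_total_instructions(), VECTOR_WIDTH)`.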
Example:
long long active_lanes = sv_logger_get_utilized_lanes();
std::cout << "Active lanes processed: " << active_lanes << std::endl;

long long sv_logger_get_total_latency();
Description: Returns the total simulated latency accumulated across all recorded operations since the last sv_logger_init() call. Each sv128 operation contributes a fixed latency based on Intel SSE/AVX-512 reference values (e.g. float add/mul = 4 cycles, float div = 11 cycles, int add = 1 cycle).
Parameters: None
Return Value: The total latency as a long long (in cycles).
Example:
long long cycles = sv_logger_get_total_latency();
long long utilized = sv_logger_get_utilized_lanes();
double throughput = (double)utilized / (double)cycles; // lanes per cycle
std::cout << "Throughput: " << throughput << " lanes/cycle" << std::endl;

sv_int4 sv_load_int(sv_int4 passthru, const int* mem_addr, sv_mask mask);
Description: Loads VECTOR_WIDTH consecutive integers from memory into a vector register. The operation is only performed on lanes where the mask is true. For lanes where the mask is false, the result comes from the passthru vector.
Parameters:
passthru: The vector to use for lanes that are masked off
mem_addr: Pointer to the memory location to load from
mask: Mask controlling which lanes to operate on
Return Value: Vector containing the loaded integer values for active lanes and passthru values for inactive lanes
Latency: 7 cycles (masked — active lanes only)
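
The passthru rule described above can be pictured with a small stand-in written against plain arrays. This is a sketch of the documented behavior, not the sv128 source: `masked_load` and `kWidth` are hypothetical names, and VECTOR_WIDTH is assumed to be 4.

```cpp
#include <array>

constexpr int kWidth = 4; // assumed value of VECTOR_WIDTH

// Stand-in for the documented masked-load behavior; not the sv128 source.
// Active lanes read from memory, inactive lanes keep the passthru value.
std::array<int, kWidth> masked_load(const std::array<int, kWidth>& passthru,
                                    const int* mem_addr,
                                    const std::array<bool, kWidth>& mask) {
    std::array<int, kWidth> result = passthru;
    for (int lane = 0; lane < kWidth; ++lane) {
        if (mask[lane]) {
            result[lane] = mem_addr[lane]; // this sketch reads memory only for active lanes
        }
    }
    return result;
}
```

Whether sv128 itself skips the memory read for inactive lanes is not stated in this reference; the sketch only reproduces the documented result.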
Example:
int array[4] = {1, 2, 3, 4};
sv_int4 passthru = sv_set_int(10, 20, 30, 40);
sv_mask mask = sv_init_ones(2); // [T, T, F, F]
sv_int4 vec = sv_load_int(passthru, array, mask); // [1, 2, 30, 40]

void sv_store_int(int* mem_addr, sv_int4 a, sv_mask mask);
Description: Stores vector register lanes to consecutive memory locations. The operation is only performed on lanes where the mask is true. Memory locations corresponding to inactive lanes remain unchanged.
Parameters:
mem_addr: Pointer to the memory location to store to
a: Vector register to store
mask: Mask controlling which lanes to operate on
Return Value: None
Latency: 4 cycles (masked — active lanes only)
Example:
int result[4] = {10, 20, 30, 40}; // Initial values
sv_mask mask = sv_init_ones(2); // [T, T, F, F]
sv_store_int(result, vec, mask); // result becomes [1, 2, 30, 40]

sv_int4 sv_set_int(int i0, int i1, int i2, int i3);
Description: Creates a vector with explicitly provided values for each lane.
Parameters:
i0,i1,i2,i3: Values for each vector lane
Return Value: Vector with the specified values
Latency: 1 cycle (unmasked — all lanes)
Example:
sv_int4 vec = sv_set_int(10, 20, 30, 40);

sv_int4 sv_set1_int(sv_int4 passthru, int val, sv_mask mask);
Description: Creates a vector with specified lanes set to the same value. The operation is only performed on lanes where the mask is true. For lanes where the mask is false, the result comes from the passthru vector.
Parameters:
passthru: The vector to use for lanes that are masked off
val: Value to broadcast to active lanes
mask: Mask controlling which lanes to operate on
Return Value: Vector with active lanes set to the specified value and inactive lanes from passthru
Latency: 3 cycles (masked — active lanes only)
Example:
sv_int4 passthru = sv_set_int(1, 2, 3, 4);
sv_mask mask = sv_init_ones(2); // [T, T, F, F]
sv_int4 vec = sv_set1_int(passthru, 42, mask); // [42, 42, 3, 4]

sv_float4 sv_load_float(sv_float4 passthru, const float* mem_addr, sv_mask mask);
Description: Loads VECTOR_WIDTH consecutive floats from memory into a vector register. The operation is only performed on lanes where the mask is true. For lanes where the mask is false, the result comes from the passthru vector.
Parameters:
passthru: The vector to use for lanes that are masked off
mem_addr: Pointer to the memory location to load from
mask: Mask controlling which lanes to operate on
Return Value: Vector containing the loaded float values for active lanes and passthru values for inactive lanes
Latency: 7 cycles (masked — active lanes only)
Example:
float array[4] = {1.5f, 2.5f, 3.5f, 4.5f};
sv_float4 passthru = sv_set_float(10.0f, 20.0f, 30.0f, 40.0f);
sv_mask mask = sv_init_ones(3); // [T, T, T, F]
sv_float4 vec = sv_load_float(passthru, array, mask); // [1.5, 2.5, 3.5, 40.0]

void sv_store_float(float* mem_addr, sv_float4 a, sv_mask mask);
Description: Stores float vector register lanes to consecutive memory locations. The operation is only performed on lanes where the mask is true. Memory locations corresponding to inactive lanes remain unchanged.
Parameters:
mem_addr: Pointer to the memory location to store to
a: Vector register to store
mask: Mask controlling which lanes to operate on
Return Value: None
Latency: 4 cycles (masked — active lanes only)
Example:
float result[4] = {10.0f, 20.0f, 30.0f, 40.0f}; // Initial values
sv_mask mask = sv_init_ones(2); // [T, T, F, F]
sv_store_float(result, vec, mask); // result becomes [1.5, 2.5, 30.0, 40.0]

sv_float4 sv_set_float(float f0, float f1, float f2, float f3);
Description: Creates a float vector with explicitly provided values for each lane.
Parameters:
f0,f1,f2,f3: Values for each vector lane
Return Value: Vector with the specified values
Latency: 1 cycle (unmasked — all lanes)
Example:
sv_float4 vec = sv_set_float(1.0f, 2.0f, 3.0f, 4.0f);

sv_float4 sv_set1_float(sv_float4 passthru, float val, sv_mask mask);
Description: Creates a float vector with specified lanes set to the same value. The operation is only performed on lanes where the mask is true. For lanes where the mask is false, the result comes from the passthru vector.
Parameters:
passthru: The vector to use for lanes that are masked off
val: Value to broadcast to active lanes
mask: Mask controlling which lanes to operate on
Return Value: Vector with active lanes set to the specified value and inactive lanes from passthru
Latency: 3 cycles (masked — active lanes only)
Example:
sv_float4 passthru = sv_set_float(1.0f, 2.0f, 3.0f, 4.0f);
sv_mask mask = sv_init_ones(3); // [T, T, T, F]
sv_float4 vec = sv_set1_float(passthru, 3.14f, mask); // [3.14, 3.14, 3.14, 4.0]

sv_int4 sv_int_add(sv_int4 a, sv_int4 b, sv_mask mask);
Description: Performs element-wise addition of two integer vectors. The operation is only performed on lanes where the mask is true. For lanes where the mask is false, the result comes from the first operand (a).
Parameters:
a: First vector operand (also provides values for masked-off lanes)
b: Second vector operand
mask: Mask controlling which lanes to operate on
Return Value: Vector containing the element-wise sum for active lanes and values from 'a' for inactive lanes
Latency: 1 cycle (masked — active lanes only)
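
The two-operand arithmetic functions in this reference all share the same masking rule: active lanes receive the computed value and inactive lanes pass through the value from a. The sketch below expresses that rule once over plain arrays. It is a stand-in for illustration, not the sv128 source; `masked_binary`, `masked_add`, `IntVec`, `Mask`, and `kWidth` are hypothetical names, and VECTOR_WIDTH is assumed to be 4.

```cpp
#include <array>
#include <functional>

constexpr int kWidth = 4; // assumed value of VECTOR_WIDTH
using IntVec = std::array<int, kWidth>;
using Mask = std::array<bool, kWidth>;

// Stand-in for the masking rule shared by the two-operand arithmetic
// functions; not the sv128 source. Active lanes get op(a, b), inactive
// lanes pass through the value from a.
IntVec masked_binary(const IntVec& a, const IntVec& b, const Mask& mask,
                     const std::function<int(int, int)>& op) {
    IntVec result = a;
    for (int lane = 0; lane < kWidth; ++lane) {
        if (mask[lane]) {
            result[lane] = op(a[lane], b[lane]);
        }
    }
    return result;
}

// Masked addition expressed through the generic helper.
IntVec masked_add(const IntVec& a, const IntVec& b, const Mask& mask) {
    return masked_binary(a, b, mask, std::plus<int>());
}
```

Passing `std::minus<int>()`, `std::multiplies<int>()`, or `std::divides<int>()` instead gives the same pass-through pattern for subtraction, multiplication, and division.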
Example:
sv_int4 a = sv_set_int(1, 2, 3, 4);
sv_int4 b = sv_set_int(5, 6, 7, 8);
sv_mask mask = sv_init_ones(2); // [T, T, F, F]
sv_int4 result = sv_int_add(a, b, mask); // [6, 8, 3, 4]

sv_int4 sv_int_sub(sv_int4 a, sv_int4 b, sv_mask mask);
Description: Performs element-wise subtraction of two integer vectors. The operation is only performed on lanes where the mask is true. For lanes where the mask is false, the result comes from the first operand (a).
Parameters:
a: First vector operand (also provides values for masked-off lanes)
b: Second vector operand
mask: Mask controlling which lanes to operate on
Return Value: Vector containing the element-wise difference for active lanes and values from 'a' for inactive lanes
Latency: 1 cycle (masked — active lanes only)
Example:
sv_int4 result = sv_int_sub(b, a, mask); // [4, 4, 7, 8]

sv_int4 sv_int_mul(sv_int4 a, sv_int4 b, sv_mask mask);
Description: Performs element-wise multiplication of two integer vectors. The operation is only performed on lanes where the mask is true. For lanes where the mask is false, the result comes from the first operand (a).
Parameters:
a: First vector operand (also provides values for masked-off lanes)
b: Second vector operand
mask: Mask controlling which lanes to operate on
Return Value: Vector containing the element-wise product for active lanes and values from 'a' for inactive lanes
Latency: 3 cycles (masked — active lanes only)
Example:
sv_int4 result = sv_int_mul(a, b, mask); // [5, 12, 3, 4]

sv_int4 sv_int_div(sv_int4 a, sv_int4 b, sv_mask mask);
Description: Performs element-wise division of two integer vectors. The operation is only performed on lanes where the mask is true. For lanes where the mask is false, the result comes from the first operand (a).
Parameters:
a: Dividend vector (also provides values for masked-off lanes)
b: Divisor vector
mask: Mask controlling which lanes to operate on
Return Value: Vector containing the element-wise quotient for active lanes and values from 'a' for inactive lanes
Latency: 20 cycles (masked — software-emulated integer division)
Example:
sv_int4 result = sv_int_div(b, a, mask); // [5, 3, 7, 8]

sv_int4 sv_int_abs(sv_int4 a, sv_mask mask);
Description: Computes the absolute value of each element in an integer vector. The operation is only performed on lanes where the mask is true. For lanes where the mask is false, the result comes from the input vector (a).
Parameters:
a: Input vector (also provides values for masked-off lanes)
mask: Mask controlling which lanes to operate on
Return Value: Vector containing the absolute values for active lanes and original values from 'a' for inactive lanes
Latency: 1 cycle (masked — active lanes only)
Example:
sv_int4 negative = sv_set_int(-1, -2, 3, -4);
sv_mask mask = sv_init_ones(3); // [T, T, T, F]
sv_int4 result = sv_int_abs(negative, mask); // [1, 2, 3, -4]

sv_float4 sv_float_add(sv_float4 a, sv_float4 b, sv_mask mask);
Description: Performs element-wise addition of two float vectors. The operation is only performed on lanes where the mask is true. For lanes where the mask is false, the result comes from the first operand (a).
Parameters:
a: First vector operand (also provides values for masked-off lanes)
b: Second vector operand
mask: Mask controlling which lanes to operate on
Return Value: Vector containing the element-wise sum for active lanes and values from 'a' for inactive lanes
Latency: 4 cycles (masked — active lanes only)
Example:
sv_float4 a = sv_set_float(1.5f, 2.5f, 3.5f, 4.5f);
sv_float4 b = sv_set_float(0.5f, 1.0f, 2.0f, 3.0f);
sv_mask mask = sv_init_ones(3); // [T, T, T, F]
sv_float4 result = sv_float_add(a, b, mask); // [2.0, 3.5, 5.5, 4.5]

sv_float4 sv_float_sub(sv_float4 a, sv_float4 b, sv_mask mask);
Description: Performs element-wise subtraction of two float vectors. The operation is only performed on lanes where the mask is true. For lanes where the mask is false, the result comes from the first operand (a).
Parameters:
a: First vector operand (also provides values for masked-off lanes)
b: Second vector operand
mask: Mask controlling which lanes to operate on
Return Value: Vector containing the element-wise difference for active lanes and values from 'a' for inactive lanes
Latency: 4 cycles (masked — active lanes only)
Example:
sv_float4 result = sv_float_sub(a, b, mask); // [1.0, 1.5, 1.5, 4.5]

sv_float4 sv_float_mul(sv_float4 a, sv_float4 b, sv_mask mask);
Description: Performs element-wise multiplication of two float vectors. The operation is only performed on lanes where the mask is true. For lanes where the mask is false, the result comes from the first operand (a).
Parameters:
a: First vector operand (also provides values for masked-off lanes)
b: Second vector operand
mask: Mask controlling which lanes to operate on
Return Value: Vector containing the element-wise product for active lanes and values from 'a' for inactive lanes
Latency: 4 cycles (masked — active lanes only)
Example:
sv_float4 result = sv_float_mul(a, b, mask); // [0.75, 2.5, 7.0, 4.5]

sv_float4 sv_float_div(sv_float4 a, sv_float4 b, sv_mask mask);
Description: Performs element-wise division of two float vectors. The operation is only performed on lanes where the mask is true. For lanes where the mask is false, the result comes from the first operand (a).
Parameters:
a: Dividend vector (also provides values for masked-off lanes)
b: Divisor vector
mask: Mask controlling which lanes to operate on
Return Value: Vector containing the element-wise quotient for active lanes and values from 'a' for inactive lanes
Latency: 11 cycles (masked — active lanes only)
Example:
sv_float4 result = sv_float_div(a, b, mask); // [3.0, 2.5, 1.75, 4.5]

sv_float4 sv_float_abs(sv_float4 a, sv_mask mask);
Description: Computes the absolute value of each element in a float vector. The operation is only performed on lanes where the mask is true. For lanes where the mask is false, the result comes from the input vector (a).
Parameters:
a: Input vector (also provides values for masked-off lanes)
mask: Mask controlling which lanes to operate on
Return Value: Vector containing the absolute values for active lanes and original values from 'a' for inactive lanes
Latency: 1 cycle (masked — active lanes only)
Example:
sv_float4 negative = sv_set_float(-1.5f, -2.5f, 3.5f, -4.5f);
sv_mask mask = sv_init_ones(3); // [T, T, T, F]
sv_float4 result = sv_float_abs(negative, mask); // [1.5, 2.5, 3.5, -4.5]

sv_float4 sv_float_sqrt(sv_float4 a, sv_mask mask);
Description: Computes the square root of each element in a float vector. The operation is only performed on lanes where the mask is true. For lanes where the mask is false, the result comes from the input vector (a).
Parameters:
a: Input vector (also provides values for masked-off lanes)
mask: Mask controlling which lanes to operate on
Return Value: Vector containing the square roots for active lanes and original values from 'a' for inactive lanes
Latency: 14 cycles (masked — active lanes only)
Example:
sv_float4 squares = sv_set_float(1.0f, 4.0f, 9.0f, 16.0f);
sv_mask mask = sv_init_ones(3); // [T, T, T, F]
sv_float4 result = sv_float_sqrt(squares, mask); // [1.0, 2.0, 3.0, 16.0]

sv_int4 sv_int_min(sv_int4 a, sv_int4 b, sv_mask mask);
Description: Computes the element-wise minimum of two integer vectors. The operation is only performed on lanes where the mask is true. For lanes where the mask is false, the result comes from the first operand (a).
Parameters:
a: First vector operand (also provides values for masked-off lanes)
b: Second vector operand
mask: Mask controlling which lanes to operate on
Return Value: Vector containing the element-wise minimum for active lanes and values from 'a' for inactive lanes
Latency: 1 cycle (masked — active lanes only)
Example:
sv_int4 a = sv_set_int(5, 2, 8, 1);
sv_int4 b = sv_set_int(3, 6, 4, 9);
sv_mask mask = sv_init_ones(3); // [T, T, T, F]
sv_int4 result = sv_int_min(a, b, mask); // [3, 2, 4, 1]

sv_int4 sv_int_max(sv_int4 a, sv_int4 b, sv_mask mask);
Description: Computes the element-wise maximum of two integer vectors. The operation is only performed on lanes where the mask is true. For lanes where the mask is false, the result comes from the first operand (a).
Parameters:
a: First vector operand (also provides values for masked-off lanes)
b: Second vector operand
mask: Mask controlling which lanes to operate on
Return Value: Vector containing the element-wise maximum for active lanes and values from 'a' for inactive lanes
Latency: 1 cycle (masked — active lanes only)
Example:
sv_int4 result = sv_int_max(a, b, mask); // [5, 6, 8, 1]

sv_float4 sv_float_min(sv_float4 a, sv_float4 b, sv_mask mask);
Description: Computes the element-wise minimum of two float vectors. The operation is only performed on lanes where the mask is true. For lanes where the mask is false, the result comes from the first operand (a).
Parameters:
a: First vector operand (also provides values for masked-off lanes)
b: Second vector operand
mask: Mask controlling which lanes to operate on
Return Value: Vector containing the element-wise minimum for active lanes and values from 'a' for inactive lanes
Latency: 1 cycle (masked — active lanes only)
Example:
sv_float4 a = sv_set_float(5.5f, 2.1f, 8.3f, 1.7f);
sv_float4 b = sv_set_float(3.2f, 6.8f, 4.1f, 9.5f);
sv_mask mask = sv_init_ones(3); // [T, T, T, F]
sv_float4 result = sv_float_min(a, b, mask); // [3.2, 2.1, 4.1, 1.7]

sv_float4 sv_float_max(sv_float4 a, sv_float4 b, sv_mask mask);
Description: Computes the element-wise maximum of two float vectors. The operation is only performed on lanes where the mask is true. For lanes where the mask is false, the result comes from the first operand (a).
Parameters:
a: First vector operand (also provides values for masked-off lanes)
b: Second vector operand
mask: Mask controlling which lanes to operate on
Return Value: Vector containing the element-wise maximum for active lanes and values from 'a' for inactive lanes
Latency: 1 cycle (masked — active lanes only)
Example:
sv_float4 result = sv_float_max(a, b, mask); // [5.5, 6.8, 8.3, 1.7]

sv_float4 sv_float_hadd(sv_float4 a, sv_mask mask);
Description: Performs horizontal addition on pairs of adjacent elements. Transforms [a,b,c,d] to [a+b, a+b, c+d, c+d]. The operation is only performed on lanes where the mask is true. For lanes where the mask is false, the result comes from the input vector (a).
Parameters:
a: Input vector (also provides values for masked-off lanes)
mask: Mask controlling which lanes to operate on
Return Value: Vector with horizontal sums for active lanes and original values from 'a' for inactive lanes
Latency: 5 cycles (masked — active lanes only)
Example:
sv_float4 input = sv_set_float(1.0f, 2.0f, 3.0f, 4.0f);
sv_mask mask = sv_init_ones(2); // [T, T, F, F]
sv_float4 result = sv_float_hadd(input, mask); // [3.0, 3.0, 3.0, 4.0]

sv_float4 sv_float_interleave(sv_float4 a, sv_mask mask);
Description: Interleaves elements by swapping the middle two elements. Transforms [a,b,c,d] to [a,c,b,d]. The operation is only performed on lanes where the mask is true. For lanes where the mask is false, the result comes from the input vector (a).
Parameters:
a: Input vector (also provides values for masked-off lanes)
mask: Mask controlling which lanes to operate on
Return Value: Vector with interleaved elements for active lanes and original values from 'a' for inactive lanes
Latency: 5 cycles (masked — active lanes only)
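
One use for the two lane transforms documented here is summing all four lanes of a vector: hadd, then interleave, then hadd again leaves the total in every lane. The sketch below checks that sequence with stand-ins over plain arrays. It is not the sv128 source; `hadd`, `interleave`, `horizontal_sum`, `FloatVec`, `Mask`, and `kWidth` are hypothetical names, and VECTOR_WIDTH is assumed to be 4.

```cpp
#include <array>

constexpr int kWidth = 4; // assumed value of VECTOR_WIDTH
using FloatVec = std::array<float, kWidth>;
using Mask = std::array<bool, kWidth>;

// Stand-in for the documented hadd transform; not the sv128 source.
// [a, b, c, d] -> [a+b, a+b, c+d, c+d] on active lanes.
FloatVec hadd(const FloatVec& v, const Mask& mask) {
    const FloatVec full = {v[0] + v[1], v[0] + v[1], v[2] + v[3], v[2] + v[3]};
    FloatVec result = v;
    for (int lane = 0; lane < kWidth; ++lane) {
        if (mask[lane]) result[lane] = full[lane];
    }
    return result;
}

// Stand-in for the documented interleave transform.
// [a, b, c, d] -> [a, c, b, d] on active lanes.
FloatVec interleave(const FloatVec& v, const Mask& mask) {
    const FloatVec full = {v[0], v[2], v[1], v[3]};
    FloatVec result = v;
    for (int lane = 0; lane < kWidth; ++lane) {
        if (mask[lane]) result[lane] = full[lane];
    }
    return result;
}

// Sums all four lanes with every lane active.
float horizontal_sum(const FloatVec& v) {
    const Mask all = {true, true, true, true};
    FloatVec t = hadd(v, all); // [a+b, a+b, c+d, c+d]
    t = interleave(t, all);    // [a+b, c+d, a+b, c+d]
    t = hadd(t, all);          // a+b+c+d in every lane
    return t[0];
}
```

With the latencies listed in this reference, the equivalent sv128 sequence would cost 5 + 5 + 5 = 15 cycles.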
Example:
sv_float4 input = sv_set_float(1.0f, 2.0f, 3.0f, 4.0f);
sv_mask mask = sv_init_ones(4); // [T, T, T, T]
sv_float4 result = sv_float_interleave(input, mask); // [1.0, 3.0, 2.0, 4.0]

sv_mask sv_int_eq(sv_int4 a, sv_int4 b, sv_mask mask);
Description: Performs element-wise equality comparison between two integer vectors.
Parameters:
a: First vector operand
b: Second vector operand
mask: Mask controlling which lanes to compare; inactive lanes output false (zeroing)
Return Value: Mask indicating which lanes are equal
Latency: 3 cycles (masked — active lanes only)
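
Unlike the arithmetic functions, which pass through a value from an operand, the comparisons zero their inactive lanes. The stand-in below shows that rule over plain arrays. It is a sketch of the documented behavior, not the sv128 source; `masked_eq`, `IntVec`, `Mask`, and `kWidth` are hypothetical names, and VECTOR_WIDTH is assumed to be 4.

```cpp
#include <array>

constexpr int kWidth = 4; // assumed value of VECTOR_WIDTH
using IntVec = std::array<int, kWidth>;
using Mask = std::array<bool, kWidth>;

// Stand-in for the documented zeroing behavior of the comparisons; not the
// sv128 source. Inactive lanes are false regardless of the operand values.
Mask masked_eq(const IntVec& a, const IntVec& b, const Mask& mask) {
    Mask result = {false, false, false, false};
    for (int lane = 0; lane < kWidth; ++lane) {
        if (mask[lane]) {
            result[lane] = (a[lane] == b[lane]);
        }
    }
    return result;
}
```

Because inactive lanes come out false, the result equals the input mask ANDed with the lane-wise comparison, so it can be used directly as the mask of a following operation inside a nested condition.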
Example:
sv_int4 a = sv_set_int(1, 2, 3, 4);
sv_int4 b = sv_set_int(1, 0, 3, 5);
sv_mask all_true = sv_mask_all_true(); // [T, T, T, T]
sv_mask result = sv_int_eq(a, b, all_true); // [T, F, T, F]

sv_mask sv_int_lt(sv_int4 a, sv_int4 b, sv_mask mask);
Description: Performs element-wise less-than comparison between two integer vectors.
Parameters:
a: First vector operand
b: Second vector operand
mask: Mask controlling which lanes to compare; inactive lanes output false (zeroing)
Return Value: Mask indicating which lanes of a are less than b
Latency: 3 cycles (masked — active lanes only)
Example:
sv_mask result = sv_int_lt(a, b, all_true); // [F, F, F, T]

sv_mask sv_int_gt(sv_int4 a, sv_int4 b, sv_mask mask);
Description: Performs element-wise greater-than comparison between two integer vectors.
Parameters:
a: First vector operand
b: Second vector operand
mask: Mask controlling which lanes to compare; inactive lanes output false (zeroing)
Return Value: Mask indicating which lanes of a are greater than b
Latency: 3 cycles (masked — active lanes only)
Example:
sv_mask result = sv_int_gt(a, b, all_true); // [F, T, F, F]

sv_mask sv_int_le(sv_int4 a, sv_int4 b, sv_mask mask);
Description: Performs element-wise less-than-or-equal comparison between two integer vectors.
Parameters:
a: First vector operand
b: Second vector operand
mask: Mask controlling which lanes to compare; inactive lanes output false (zeroing)
Return Value: Mask indicating which lanes of a are less than or equal to b
Latency: 3 cycles (masked — active lanes only)
Example:
sv_int4 a = sv_set_int(1, 2, 3, 4);
sv_int4 b = sv_set_int(1, 0, 3, 5);
sv_mask all_true = sv_mask_all_true(); // [T, T, T, T]
sv_mask result = sv_int_le(a, b, all_true); // [T, F, T, T]

sv_mask sv_int_ge(sv_int4 a, sv_int4 b, sv_mask mask);
Description: Performs element-wise greater-than-or-equal comparison between two integer vectors.
Parameters:
a: First vector operand
b: Second vector operand
mask: Mask controlling which lanes to compare; inactive lanes output false (zeroing)
Return Value: Mask indicating which lanes of a are greater than or equal to b
Latency: 3 cycles (masked — active lanes only)
Example:
sv_mask result = sv_int_ge(a, b, all_true); // [T, T, T, F]

sv_mask sv_float_eq(sv_float4 a, sv_float4 b, sv_mask mask);
Description: Performs element-wise equality comparison between two float vectors.
Parameters:
a: First vector operand
b: Second vector operand
mask: Mask controlling which lanes to compare; inactive lanes output false (zeroing)
Return Value: Mask indicating which lanes are equal
Latency: 3 cycles (masked — active lanes only)
Example:
sv_float4 a = sv_set_float(1.0f, 2.0f, 3.0f, 4.0f);
sv_float4 b = sv_set_float(1.0f, 0.0f, 3.0f, 5.0f);
sv_mask all_true = sv_mask_all_true(); // [T, T, T, T]
sv_mask result = sv_float_eq(a, b, all_true); // [T, F, T, F]

sv_mask sv_float_lt(sv_float4 a, sv_float4 b, sv_mask mask);
Description: Performs element-wise less-than comparison between two float vectors.
Parameters:
a: First vector operand
b: Second vector operand
mask: Mask controlling which lanes to compare; inactive lanes output false (zeroing)
Return Value: Mask indicating which lanes of a are less than b
Latency: 3 cycles (masked — active lanes only)
Example:
sv_mask result = sv_float_lt(a, b, all_true); // [F, F, F, T]

sv_mask sv_float_gt(sv_float4 a, sv_float4 b, sv_mask mask);
Description: Performs element-wise greater-than comparison between two float vectors.
Parameters:
a: First vector operand
b: Second vector operand
mask: Mask controlling which lanes to compare; inactive lanes output false (zeroing)
Return Value: Mask indicating which lanes of a are greater than b
Latency: 3 cycles (masked — active lanes only)
Example:
sv_mask result = sv_float_gt(a, b, all_true); // [F, T, F, F]

sv_mask sv_float_le(sv_float4 a, sv_float4 b, sv_mask mask);
Description: Performs element-wise less-than-or-equal comparison between two float vectors.
Parameters:
a: First vector operand
b: Second vector operand
mask: Mask controlling which lanes to compare; inactive lanes output false (zeroing)
Return Value: Mask indicating which lanes of a are less than or equal to b
Latency: 3 cycles (masked — active lanes only)
Example:
sv_float4 a = sv_set_float(1.0f, 2.0f, 3.0f, 4.0f);
sv_float4 b = sv_set_float(1.0f, 0.0f, 3.0f, 5.0f);
sv_mask all_true = sv_mask_all_true(); // [T, T, T, T]
sv_mask result = sv_float_le(a, b, all_true); // [T, F, T, T]

sv_mask sv_float_ge(sv_float4 a, sv_float4 b, sv_mask mask);
Description: Performs element-wise greater-than-or-equal comparison between two float vectors.
Parameters:
a: First vector operand
b: Second vector operand
mask: Mask controlling which lanes to compare; inactive lanes output false (zeroing)
Return Value: Mask indicating which lanes of a are greater than or equal to b
Latency: 3 cycles (masked — active lanes only)
Example:
sv_mask result = sv_float_ge(a, b, all_true); // [T, T, T, F]

sv_mask sv_mask_all_true();
Description: Creates a mask with all lanes set to true.
Parameters: None
Return Value: Mask with all lanes set to true
Latency: 0 cycles (no counter recorded)
Example:
sv_mask mask = sv_mask_all_true(); // [T, T, T, T]

sv_mask sv_init_ones(int first_n);
Description: Creates a mask with the first n lanes set to true and the remaining lanes set to false.
Parameters:
first_n: Number of lanes to set to true (from the beginning)
Return Value: Mask with the specified pattern
Latency: 1 cycle (unmasked — all lanes)
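
A common use for a prefix mask of this kind is the last iteration of a loop whose length is not a multiple of the vector width. The sketch below shows the idea over plain arrays. It is a stand-in, not the sv128 source: `init_ones`, `add_arrays`, `Mask`, and `kWidth` are hypothetical names, VECTOR_WIDTH is assumed to be 4, and both inputs are assumed to have the same length.

```cpp
#include <algorithm>
#include <array>
#include <vector>

constexpr int kWidth = 4; // assumed value of VECTOR_WIDTH
using Mask = std::array<bool, kWidth>;

// Stand-in for the documented prefix mask: the first first_n lanes are
// true, the rest false.
Mask init_ones(int first_n) {
    Mask m{};
    for (int lane = 0; lane < kWidth; ++lane) {
        m[lane] = lane < first_n;
    }
    return m;
}

// Adds two equal-length arrays in groups of kWidth, masking off the lanes
// that would fall past the end in the final group.
std::vector<int> add_arrays(const std::vector<int>& x,
                            const std::vector<int>& y) {
    const int n = static_cast<int>(x.size());
    std::vector<int> out(x.size(), 0);
    for (int i = 0; i < n; i += kWidth) {
        // Clamp explicitly: how sv_init_ones treats counts above the
        // vector width is not specified in this reference.
        const Mask mask = init_ones(std::min(kWidth, n - i));
        for (int lane = 0; lane < kWidth; ++lane) {
            if (mask[lane]) {
                out[i + lane] = x[i + lane] + y[i + lane];
            }
        }
    }
    return out;
}
```

In sv128 terms the inner loop would correspond to a masked load of each input, a masked add, and a masked store, all sharing the same tail mask.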
Example:
sv_mask mask = sv_init_ones(2); // [T, T, F, F]

sv_mask sv_mask_not(sv_mask a);
Description: Performs logical NOT operation on each lane of a mask.
Parameters:
a: Input mask
Return Value: Mask with inverted values
Latency: 1 cycle (unmasked — all lanes)
Example:
sv_mask input = sv_init_ones(2); // [T, T, F, F]
sv_mask result = sv_mask_not(input); // [F, F, T, T]

sv_mask sv_mask_or(sv_mask a, sv_mask b);
Description: Performs element-wise logical OR operation on two masks.
Parameters:
a: First mask operand
b: Second mask operand
Return Value: Mask containing the OR results
Latency: 1 cycle (unmasked — all lanes)
Example:
sv_mask a = sv_init_ones(2); // [T, T, F, F]
sv_mask b = sv_init_ones(3); // [T, T, T, F]
sv_mask result = sv_mask_or(a, b); // [T, T, T, F]

sv_mask sv_mask_and(sv_mask a, sv_mask b);
Description: Performs element-wise logical AND operation on two masks.
Parameters:
a: First mask operand
b: Second mask operand
Return Value: Mask containing the AND results
Latency: 1 cycle (unmasked — all lanes)
Example:
sv_mask result = sv_mask_and(a, b); // [T, T, F, F]

bool sv_mask_all(sv_mask a);
Description: Checks if all lanes in a mask are true.
Parameters:
a: Input mask
Return Value: True if all lanes are true, false otherwise
Latency: 1 cycle (unmasked — all lanes)
Example:
sv_mask mask1 = sv_mask_all_true(); // [T, T, T, T]
sv_mask mask2 = sv_init_ones(3); // [T, T, T, F]
bool result1 = sv_mask_all(mask1); // true
bool result2 = sv_mask_all(mask2); // false

bool sv_mask_any(sv_mask a);
Description: Checks if any lane in a mask is true.
Parameters:
a: Input mask
Return Value: True if at least one lane is true, false otherwise
Latency: 1 cycle (unmasked — all lanes)
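
An "any lane true" test is typically what drives a loop in which lanes need different numbers of iterations: the loop keeps running while at least one lane is still active. The sketch below illustrates that pattern over plain arrays. It is a stand-in, not the sv128 source; `mask_any`, `lane_pow`, `IntVec`, `Mask`, and `kWidth` are hypothetical names, VECTOR_WIDTH is assumed to be 4, and exponents are assumed to be non-negative.

```cpp
#include <array>

constexpr int kWidth = 4; // assumed value of VECTOR_WIDTH
using IntVec = std::array<int, kWidth>;
using Mask = std::array<bool, kWidth>;

// Stand-in for an "any lane true" check; not the sv128 source.
bool mask_any(const Mask& m) {
    for (bool lane : m) {
        if (lane) return true;
    }
    return false;
}

// Raises each base to its own non-negative exponent. Lanes with small
// exponents finish early and are masked off while the others continue.
IntVec lane_pow(const IntVec& base, IntVec exponent) {
    IntVec result = {1, 1, 1, 1};
    Mask active{};
    for (int lane = 0; lane < kWidth; ++lane) {
        active[lane] = exponent[lane] > 0;
    }
    while (mask_any(active)) {
        for (int lane = 0; lane < kWidth; ++lane) {
            if (!active[lane]) continue;       // masked-off lane stays untouched
            result[lane] *= base[lane];        // plays the role of a masked multiply
            exponent[lane] -= 1;               // plays the role of a masked subtract
            active[lane] = exponent[lane] > 0; // plays the role of a masked compare
        }
    }
    return result;
}
```

The number of loop iterations is set by the lane with the largest exponent, so lanes that finish early are masked off for the remaining iterations and no longer count as active.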
Example:
sv_mask mask1 = sv_init_ones(1); // [T, F, F, F]
sv_mask mask2 = sv_mask_not(sv_mask_all_true()); // [F, F, F, F]
bool result1 = sv_mask_any(mask1); // true
bool result2 = sv_mask_any(mask2); // false

int sv_cntbits(sv_mask a);
Description: Counts the number of true lanes in a mask.
Parameters:
a: Input mask
Return Value: Number of true lanes
Latency: 1 cycle (unmasked — all lanes)
Example:
sv_mask mask = sv_init_ones(3); // [T, T, T, F]
int count = sv_cntbits(mask); // 3

std::ostream& operator<<(std::ostream& os, const sv_int4& v);
Description: Stream insertion operator for printing integer vectors in a readable format.
Example:
sv_int4 vec = sv_set_int(1, 2, 3, 4);
std::cout << vec; // Output: [1, 2, 3, 4]

std::ostream& operator<<(std::ostream& os, const sv_float4& v);
Description: Stream insertion operator for printing float vectors in a readable format.
Example:
sv_float4 vec = sv_set_float(1.5f, 2.5f, 3.5f, 4.5f);
std::cout << vec; // Output: [1.5, 2.5, 3.5, 4.5]

std::ostream& operator<<(std::ostream& os, const sv_mask& m);
Description: Stream insertion operator for printing masks in a readable format using T/F notation.
Example:
sv_mask mask = sv_init_ones(2);
std::cout << mask; // Output: [T, T, F, F]