Module naming_conventions

Available on crate feature dep_safe_arch only.

An explanation of the crate’s naming conventions.

This crate attempts to follow the general naming scheme of verb_type when the operation is “simple”, and verb_description_words_type when the operation (op) needs to be more specific than normal. Like this:

  • add_m128
  • add_saturating_i8_m128i

§Types

Currently, only x86 and x86_64 types are supported. Among those types:

  • m128 and m256 are always considered to hold f32 lanes.
  • m128d and m256d are always considered to hold f64 lanes.
  • m128i and m256i hold integer data, but each op specifies what lane width of integers the operation uses.
  • If the type has _s on the end then it’s a “scalar” operation that affects just the lowest lane. The other lanes are generally copied forward from one of the inputs, though the details there vary from op to op.
  • The SIMD types are often referred to as “registers” because each SIMD typed value represents exactly one CPU register when you’re doing work.
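
To make the lane model above concrete, here is a minimal plain-Rust sketch. It does not use the crate's types: `Bits128` and `add_lowest_lane` are hypothetical names invented for illustration, lanes are assumed to be little-endian, and the "scalar" helper assumes the upper lanes are copied from `a` (the real details vary from op to op).

```rust
/// Hypothetical stand-in for a 128-bit integer register.
/// This is NOT the crate's `m128i`; it only models "16 bytes of data".
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct Bits128(pub [u8; 16]);

impl Bits128 {
    /// View the 16 bytes as four little-endian `i32` lanes.
    pub fn as_i32_lanes(self) -> [i32; 4] {
        let mut out = [0i32; 4];
        for (i, chunk) in self.0.chunks_exact(4).enumerate() {
            out[i] = i32::from_le_bytes([chunk[0], chunk[1], chunk[2], chunk[3]]);
        }
        out
    }

    /// View the same 16 bytes as eight little-endian `i16` lanes.
    pub fn as_i16_lanes(self) -> [i16; 8] {
        let mut out = [0i16; 8];
        for (i, chunk) in self.0.chunks_exact(2).enumerate() {
            out[i] = i16::from_le_bytes([chunk[0], chunk[1]]);
        }
        out
    }
}

/// Model of a "scalar" (`_s`) op on four `f32` lanes: only lane 0 is
/// computed. Assumption: the other lanes are copied forward from `a`.
pub fn add_lowest_lane(a: [f32; 4], b: [f32; 4]) -> [f32; 4] {
    let mut out = a;
    out[0] = a[0] + b[0];
    out
}
```

The same bytes give different lane values depending on the width an op chooses, which is why each integer op has to name its lane type.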

§Operations

There are many operations that can be performed. When possible, safe_arch tries to follow normal Rust naming (e.g. adding is still add and left shifting is still shl), but if an operation doesn’t normally exist at all in Rust then we basically have to make something up.

Many operations have more than one variant, such as add and also add_saturating. In this case, safe_arch puts the “core operation” first and then any “modifiers” go after, which isn’t how you might normally say it in English, but it makes the list of functions sort better.
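
As a rough illustration of a core operation versus a modified variant, the sketch below models sixteen `i8` lanes with a plain array and applies the standard library's `wrapping_add` and `saturating_add` per lane. The function names are hypothetical and only mimic the "core op first, modifier after" ordering; they are not the crate's functions.

```rust
/// Lane-wise wrapping add over sixteen `i8` lanes (scalar model).
pub fn add_wrapping_i8(a: [i8; 16], b: [i8; 16]) -> [i8; 16] {
    std::array::from_fn(|i| a[i].wrapping_add(b[i]))
}

/// Lane-wise saturating add: results clamp to `i8::MIN..=i8::MAX`
/// instead of wrapping around.
pub fn add_saturating_i8(a: [i8; 16], b: [i8; 16]) -> [i8; 16] {
    std::array::from_fn(|i| a[i].saturating_add(b[i]))
}
```

With every lane set to 120 and 10, the wrapping version yields -126 in each lane while the saturating version yields 127.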

As a general note on SIMD terminology: When an operation uses the same indexed lane in two different registers to determine the output, that is a “vertical” operation. When an operation uses more than one lane in the same register to determine the output, that is a “horizontal” operation.

  • Vertical: out[0] = a[0] + b[0], out[1] = a[1] + b[1]
  • Horizontal: out[0] = a[0] + a[1], out[1] = b[0] + b[1]
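
The two formulas above can be written out as a scalar model. This sketch assumes two `f64` lanes per register so that it matches the formulas exactly; `add_vertical` and `add_horizontal` are illustrative names, not the crate's API.

```rust
/// Vertical add: lane `i` of the output depends on lane `i` of both inputs.
pub fn add_vertical(a: [f64; 2], b: [f64; 2]) -> [f64; 2] {
    [a[0] + b[0], a[1] + b[1]]
}

/// Horizontal add: each output lane sums adjacent lanes of a single input.
pub fn add_horizontal(a: [f64; 2], b: [f64; 2]) -> [f64; 2] {
    [a[0] + a[1], b[0] + b[1]]
}
```

For `a = [1.0, 2.0]` and `b = [10.0, 20.0]`, the vertical add gives `[11.0, 22.0]` and the horizontal add gives `[3.0, 30.0]`.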

§Operation Glossary

Here follows the list of all the main operations and their explanations.

  • abs: Absolute value (wrapping).
  • add: Addition. This is “wrapping” by default, though some other types of addition are available. Remember that wrapping signed addition is the same as wrapping unsigned addition.
  • average: Averages the two inputs.
  • bitand: Bitwise And, a & b, like the trait.
  • bitandnot: Bitwise (!a) & b. This seems a little funny at first but it’s useful for clearing bits. The output will be based on the b side’s bit pattern, but with all active bits in a cleared:
    • bitandnot(0b0010, 0b1011) == 0b1001
  • bitor: Bitwise Or, a | b, like the trait.
  • bitxor: Bitwise eXclusive Or, a ^ b, like the trait.
  • blend: Merge the data lanes of two SIMD values by taking either the b value or a value for each lane. Depending on the instruction, the blend mask can be either an immediate or a runtime value.
  • cast: Convert between data types while preserving the exact bit patterns, like how transmute works.
  • ceil: “Ceiling”, rounds towards positive infinity.
  • cmp: Numeric comparisons of various kinds. This generally gives “mask” output where the output value is of the same data type as the inputs, but with all the bits in a “true” lane as 1 and all the bits in a “false” lane as 0. Remember that with floating point values all 1s bits is a NaN, and with signed integers all 1s bits is -1.
    • An “Ordered comparison” checks if neither floating point value is NaN.
    • An “Unordered comparison” checks if either floating point value is NaN.
  • convert: This does some sort of numeric type change. The details can vary wildly. Generally, if the number of lanes goes down then the lowest lanes will be kept. If the number of lanes goes up then the new high lanes will be zero.
  • div: Division.
  • dot_product: This works like the matrix math operation. The lanes are multiplied and then the results are summed up into a single value.
  • duplicate: Copy the even or odd indexed lanes to the other set of lanes. Eg, [1, 2, 3, 4] becomes [1, 1, 3, 3] or [2, 2, 4, 4].
  • extract: Get a value from the lane of a SIMD type into a scalar type.
  • floor: Rounds towards negative infinity.
  • fused: All the fused operations are a multiply as well as some sort of adding or subtracting. The details depend on which fused operation you select. The benefit of a fused operation over a non-fused one is that it can compute slightly faster than doing the mul and add separately, and the result can also be more accurate because the intermediate product is not rounded before the add.
  • insert: The opposite of extract, this puts a new value into a particular lane of a SIMD type.
  • load: Reads an address and makes a SIMD register value. The details can vary because there’s more than one type of load, but generally this is a &T -> U style operation.
  • max: Picks the larger value from each of the two inputs.
  • min: Picks the smaller value from each of the two inputs.
  • mul: Multiplication. For floating point this is just “normal” multiplication, but for integer types you tend to have some options. An integer multiplication of X bits will produce a 2X bit output, so generally you’ll get to pick if you want to keep the high half of that, the low half of that (a normal “wrapping” mul), or “widen” the outputs to be all the bits at the expense of not multiplying half the lanes.
  • pack: Take the integers in the a and b inputs, reduce them to fit within the half-sized integer type (eg: i16 to i8), and pack them all together into the output.
  • population: The “population” operations refer to the bits within an integer. They either count those bits or adjust them in various ways.
  • rdrand: Use the hardware RNG to make a random value of the given length.
  • rdseed: Use the hardware RNG to make a random seed of the given length. This is less commonly available, but theoretically an improvement over rdrand in that if you have to combine more than one usage of this operation to make your full seed size then the guess difficulty rises at a multiplicative rate instead of just an additive rate. For example, two u64 outputs concatenated to a single u128 have a guess difficulty of 2^64 * 2^64 = 2^128 with rdseed but only 2^64 + 2^64 = 2^65 with rdrand.
  • read_timestamp_counter: Lets you read the CPU’s cycle counter, which doesn’t strictly mean anything in particular since the CPU’s clock rate isn’t even stable over time, but you might find it interesting as an approximation during benchmarks, or something like that.
  • reciprocal: Turns x into 1/x. Can also be combined with a sqrt operation.
  • round: Convert floating point values to whole numbers, according to one of several available methods.
  • set: Places a list of scalar values into the lanes of a SIMD value. Conceptually similar to how building an array works in Rust.
  • splat: Not generally an operation of its own, but a modifier to other operations such as load and set. This will copy a given value across a SIMD type as many times as it can be copied. For example, a 32-bit value splatted into a 128-bit register will be copied four times.
  • shl: Bit shift left. New bits shifted in are always 0. Because the shift is the same for both signed and unsigned values, this crate simply marks left shift as always being an unsigned operation.
    • You can shift by an immediate value (“imm”), all lanes by the same value (“all”), or each lane by its own value (“each”).
  • shr: Bit shift right. This comes in two forms: “Arithmetic” shifts shift in the starting sign bit (which preserves the sign of the value), and “Logical” shifts shift in 0 regardless of the starting sign bit (so the result ends up being positive). With normal Rust types, signed integers use arithmetic shifts and unsigned integers use logical shifts, so these functions are marked as being for signed or unsigned integers appropriately.
    • As with shl, you can shift by an immediate value (“imm”), all lanes by the same value (“all”), or each lane by its own value (“each”).
  • sign_apply: Multiplies one set of values by the signum (1, 0, or -1) of another set of values.
  • sqrt: Square Root.
  • store: Writes a SIMD value to a memory location.
  • string_search: A rather specialized instruction that lets you do byte based searching within a register. This lets you do some very high speed searching through ASCII strings when the stars align.
  • sub: Subtract.
  • shuffle: This lets you re-order the data lanes. Sometimes x86/x64 calls this “shuffle” and sometimes “permute”, and there’s no particular reasoning behind the different names, so we just call them all shuffle.
    • shuffle_{args}_{lane-type}_{lane-sources}_{simd-type}.
    • “args” is the input arguments: a (one arg) or ab (two args), then either v (runtime-varying) or i (immediate). All the immediate shuffles are macros, of course.
    • “lane type” is f32, f64, i8, etc. If there’s a z after the type then you’ll also be able to zero an output position instead of making it come from a particular source lane.
    • “lane sources” is generally either “all” which means that all lanes can go to all other lanes, or “half” which means that each half of the lanes is isolated from the other half, and you can’t cross data between the two halves, only within a half (this is how most of the 256-bit x86/x64 shuffles work).
  • unpack: Takes a SIMD value and gets out some of the lanes while widening them, such as converting i16 to i32.
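
A few of the glossary entries are easier to follow with a scalar model. The sketch below uses plain integers and `[i32; 4]` arrays in place of SIMD registers; every function name here is hypothetical and chosen only to echo the glossary terms, so treat it as an illustration of the semantics described above rather than the crate's API.

```rust
/// `bitandnot`: `(!a) & b`, clearing in `b` every bit that is set in `a`.
pub fn bitandnot(a: u32, b: u32) -> u32 {
    !a & b
}

/// `cmp` (greater-than) with "mask" output: a true lane is all 1 bits
/// (which is -1 for a signed integer), a false lane is all 0 bits.
pub fn cmp_gt_mask(a: [i32; 4], b: [i32; 4]) -> [i32; 4] {
    std::array::from_fn(|i| if a[i] > b[i] { -1 } else { 0 })
}

/// `blend` with a runtime mask: take the `b` lane where the mask is
/// all 1s, otherwise keep the `a` lane.
pub fn blend_by_mask(a: [i32; 4], b: [i32; 4], mask: [i32; 4]) -> [i32; 4] {
    std::array::from_fn(|i| (mask[i] & b[i]) | (!mask[i] & a[i]))
}

/// Lane-wise `max` built from the two pieces above: where `a > b`
/// the mask selects `a`, elsewhere `b` is kept.
pub fn max_via_mask(a: [i32; 4], b: [i32; 4]) -> [i32; 4] {
    blend_by_mask(b, a, cmp_gt_mask(a, b))
}

/// `mul` on one `i16` lane: the full product needs 32 bits, so return
/// (low half, high half). The low half is the normal wrapping result.
pub fn mul_i16_halves(a: i16, b: i16) -> (i16, i16) {
    let full = i32::from(a) * i32::from(b);
    (full as i16, (full >> 16) as i16)
}

/// Arithmetic `shr`: shifts in copies of the sign bit.
/// `n` must be less than 32.
pub fn shr_arithmetic(x: i32, n: u32) -> i32 {
    x >> n
}

/// Logical `shr`: shifts in zeros regardless of the sign bit.
/// `n` must be less than 32.
pub fn shr_logical(x: i32, n: u32) -> i32 {
    ((x as u32) >> n) as i32
}
```

For instance, `bitandnot(0b0010, 0b1011)` gives `0b1001` as in the glossary, `shr_arithmetic(-8, 1)` gives `-4`, and `shr_logical(-8, 1)` gives the positive value `0x7FFF_FFFC`.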