Module naming_conventions
Available on crate feature dep_safe_arch only.
An explanation of the crate’s naming conventions.
This crate attempts to follow the general naming scheme of verb_type when the operation is “simple”, and verb_description_words_type when the operation (op) needs to be more specific than normal. Like this:
- add_m128
- add_saturating_i8_m128i
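As a rough illustration of how the two name shapes read, the sketch below models both operations on plain arrays. The array types and the model_ prefix are assumptions made for this example: they stand in for m128 and m128i and are not the crate's own implementations.

```rust
/// Scalar model of a "simple" op (verb_type): lane-wise f32 addition.
/// `[f32; 4]` is a stand-in for `m128` (assumption for illustration).
pub fn model_add_m128(a: [f32; 4], b: [f32; 4]) -> [f32; 4] {
    let mut out = [0.0_f32; 4];
    for i in 0..4 {
        out[i] = a[i] + b[i];
    }
    out
}

/// Scalar model of a more specific op (verb_description_words_type):
/// lane-wise i8 addition that saturates at the i8 limits instead of wrapping.
/// `[i8; 16]` is a stand-in for `m128i` (assumption for illustration).
pub fn model_add_saturating_i8_m128i(a: [i8; 16], b: [i8; 16]) -> [i8; 16] {
    let mut out = [0_i8; 16];
    for i in 0..16 {
        out[i] = a[i].saturating_add(b[i]);
    }
    out
}
```

Read the second name from left to right: the verb (add), the description words (saturating, i8 lanes), and finally the SIMD type (m128i).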
§Types
Currently, only x86 and x86_64 types are supported. Among those types:
- m128 and m256 are always considered to hold f32 lanes.
- m128d and m256d are always considered to hold f64 lanes.
- m128i and m256i hold integer data, but each op specifies what lane width of integers the operation uses.
- If the type has _s on the end then it’s a “scalar” operation that affects just the lowest lane. The other lanes are generally copied forward from one of the inputs, though the details there vary from op to op.
- The SIMD types are often referred to as “registers” because each SIMD typed value represents exactly one CPU register when you’re doing work.
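The “scalar” behaviour can be sketched on a plain array. This is a minimal model, not the crate's code: the function name is hypothetical, `[f32; 4]` stands in for m128, and copying the upper lanes from the first input is an assumption, since the text notes that the source of the other lanes varies from op to op.

```rust
/// Scalar model of an `_s` ("scalar") addition on four f32 lanes.
/// Only lane 0 is computed; lanes 1, 2 and 3 are copied from `a`.
/// Copying from `a` is an assumption made for this sketch.
pub fn model_add_m128_s(a: [f32; 4], b: [f32; 4]) -> [f32; 4] {
    [a[0] + b[0], a[1], a[2], a[3]]
}
```

With a = [1.0, 2.0, 3.0, 4.0] and b = [10.0, 20.0, 30.0, 40.0], only the lowest lane changes and the model returns [11.0, 2.0, 3.0, 4.0].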
§Operations
There are many operations that can be performed. When possible, safe_arch tries to follow normal Rust naming (eg: adding is still add and left shifting is still shl), but if an operation doesn’t normally exist at all in Rust then we basically have to make something up.
Many operations have more than one variant, such as add and also add_saturating. In this case, safe_arch puts the “core operation” first and then any “modifiers” go after, which isn’t how you might normally say it in English, but it makes the list of functions sort better.
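The sorting benefit is easy to check with a small experiment. In the sketch below, the name lists are illustrative: they were chosen to fit the pattern described above and have not been checked against the crate's actual function list, and the English-order names are purely hypothetical.

```rust
/// Sorts names alphabetically, the way a generated function list would.
pub fn sorted(mut names: Vec<&str>) -> Vec<&str> {
    names.sort();
    names
}

/// Names with the core operation first and the modifiers after it.
pub fn core_first() -> Vec<&'static str> {
    vec![
        "sub_saturating_i8_m128i",
        "add_i8_m128i",
        "sub_i8_m128i",
        "add_saturating_i8_m128i",
    ]
}

/// Hypothetical English-order names, shown only for contrast.
pub fn english_order() -> Vec<&'static str> {
    vec![
        "saturating_sub_i8_m128i",
        "add_i8_m128i",
        "sub_i8_m128i",
        "saturating_add_i8_m128i",
    ]
}
```

Sorting the core-first names keeps both add variants next to each other, followed by both sub variants. Sorting the English-order names instead places the two saturating variants between add and sub, which separates each variant from its core operation.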
As a general note on SIMD terminology: When an operation uses the same indexed lane in two different registers to determine the output, that is a “vertical” operation. When an operation uses more than one lane in the same register to determine the output, that is a “horizontal” operation.
- Vertical: out[0] = a[0] + b[0], out[1] = a[1] + b[1]
- Horizontal: out[0] = a[0] + a[1], out[1] = b[0] + b[1]
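The two formulas above can be written out directly for two-lane values. This is a minimal sketch with hypothetical function names, using `[f64; 2]` as a stand-in for a two-lane register.

```rust
/// Vertical add: each output lane combines the same lane index of `a` and `b`.
pub fn vertical_add(a: [f64; 2], b: [f64; 2]) -> [f64; 2] {
    [a[0] + b[0], a[1] + b[1]]
}

/// Horizontal add: each output lane combines neighbouring lanes of one input.
/// Lane 0 comes from within `a`, lane 1 from within `b`.
pub fn horizontal_add(a: [f64; 2], b: [f64; 2]) -> [f64; 2] {
    [a[0] + a[1], b[0] + b[1]]
}
```

With a = [1.0, 2.0] and b = [10.0, 20.0], the vertical add returns [11.0, 22.0] and the horizontal add returns [3.0, 30.0].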
§Operation Glossary
Here follows the list of all the main operations and their explanations.
- abs: Absolute value (wrapping).
- add: Addition. This is “wrapping” by default, though some other types of addition are available. Remember that wrapping signed addition is the same as wrapping unsigned addition.
- average: Averages the two inputs.
- bitand: Bitwise And, a & b, like the trait.
- bitandnot: Bitwise (!a) & b. This seems a little funny at first but it’s useful for clearing bits. The output will be based on the b side’s bit pattern, but with all active bits in a cleared: bitandnot(0b0010, 0b1011) == 0b1001
- bitor: Bitwise Or, a | b, like the trait.
- bitxor: Bitwise eXclusive Or, a ^ b, like the trait.
- blend: Merge the data lanes of two SIMD values by taking either the b value or a value for each lane. Depending on the instruction, the blend mask can be either an immediate or a runtime value.
- cast: Convert between data types while preserving the exact bit patterns, like how transmute works.
- ceil: “Ceiling”, rounds towards positive infinity.
- cmp: Numeric comparisons of various kinds. This generally gives “mask” output where the output value is of the same data type as the inputs, but with all the bits in a “true” lane as 1 and all the bits in a “false” lane as 0. Remember that with floating point values all 1s bits is a NaN, and with signed integers all 1s bits is -1.
  - An “Ordered comparison” checks if neither floating point value is NaN.
  - An “Unordered comparison” checks if either floating point value is NaN.
- convert: This does some sort of numeric type change. The details can vary wildly. Generally, if the number of lanes goes down then the lowest lanes will be kept. If the number of lanes goes up then the new high lanes will be zero.
- div: Division.
- dot_product: This works like the matrix math operation. The lanes are multiplied and then the results are summed up into a single value.
- duplicate: Copy the even or odd indexed lanes to the other set of lanes. Eg, [1, 2, 3, 4] becomes [1, 1, 3, 3] or [2, 2, 4, 4].
- extract: Get a value from the lane of a SIMD type into a scalar type.
- floor: Rounds towards negative infinity.
- fused: All the fused operations are a multiply as well as some sort of adding or subtracting. The details depend on which fused operation you select. The benefit of a fused operation over a non-fused one is that it can compute slightly faster than doing the mul and add separately, and the result can also have higher accuracy.
- insert: The opposite of extract, this puts a new value into a particular lane of a SIMD type.
- load: Reads an address and makes a SIMD register value. The details can vary because there’s more than one type of load, but generally this is a &T -> U style operation.
- max: Picks the larger value from each of the two inputs.
- min: Picks the smaller value from each of the two inputs.
- mul: Multiplication. For floating point this is just “normal” multiplication, but for integer types you tend to have some options. An integer multiplication of X bits will produce a 2X bit output, so generally you’ll get to pick if you want to keep the high half of that, the low half of that (a normal “wrapping” mul), or “widen” the outputs to be all the bits at the expense of not multiplying half the lanes.
- pack: Take the integers in the a and b inputs, reduce them to fit within the half-sized integer type (eg: i16 to i8), and pack them all together into the output.
- population: The “population” operations refer to the bits within an integer. Either counting them or adjusting them in various ways.
- rdrand: Use the hardware RNG to make a random value of the given length.
- rdseed: Use the hardware RNG to make a random seed of the given length. This is less commonly available, but theoretically an improvement over rdrand in that if you have to combine more than one usage of this operation to make your full seed size then the guess difficulty rises at a multiplicative rate instead of just an additive rate. For example, two u64 outputs concatenated to a single u128 have a guess difficulty of 2^64 * 2^64 = 2^128 with rdseed but only 2^64 + 2^64 = 2^65 with rdrand.
- read_timestamp_counter: Lets you read the CPU’s cycle counter, which doesn’t strictly mean anything in particular since even the CPU’s clock rate isn’t stable over time, but you might find it interesting as an approximation during benchmarks, or something like that.
- reciprocal: Turns x into 1/x. Can also be combined with a sqrt operation.
- round: Convert floating point values to whole numbers, according to one of several available methods.
- set: Places a list of scalar values into the lanes of a SIMD value. Conceptually similar to how building an array works in Rust.
- splat: Not generally an operation of its own, but a modifier to other operations such as load and set. This will copy a given value across a SIMD type as many times as it can be copied. For example, a 32-bit value splatted into a 128-bit register will be copied four times.
- shl: Bit shift left. New bits shifted in are always 0. Because the shift is the same for both signed and unsigned values, this crate simply marks left shift as always being an unsigned operation.
  - You can shift by an immediate value (“imm”), all lanes by the same value (“all”), or each lane by its own value (“each”).
- shr: Bit shift right. This comes in two forms: “Arithmetic” shifts shift in the starting sign bit (which preserves the sign of the value), and “Logical” shifts shift in 0 regardless of the starting sign bit (so the result ends up being positive). With normal Rust types, signed integers use arithmetic shifts and unsigned integers use logical shifts, so these functions are marked as being for signed or unsigned integers appropriately.
  - As with shl, you can shift by an immediate value (“imm”), all lanes by the same value (“all”), or each lane by its own value (“each”).
- sign_apply: Multiplies one set of values by the signum (1, 0, or -1) of another set of values.
- sqrt: Square Root.
- store: Writes a SIMD value to a memory location.
- string_search: A rather specialized instruction that lets you do byte based searching within a register. This lets you do some very high speed searching through ASCII strings when the stars align.
- sub: Subtract.
- shuffle: This lets you re-order the data lanes. Sometimes x86/x64 calls this “shuffle”, and sometimes it’s called “permute”, and there’s no particular reasoning behind the different names, so we just call them all shuffle.
  - The naming pattern is shuffle_{args}_{lane-type}_{lane-sources}_{simd-type}.
  - “args” is the input arguments: a (one arg) or ab (two args), then either v (runtime-varying) or i (immediate). All the immediate shuffles are macros, of course.
  - “lane type” is f32, f64, i8, etc. If there’s a z after the type then you’ll also be able to zero an output position instead of making it come from a particular source lane.
  - “lane sources” is generally either “all” which means that all lanes can go to all other lanes, or “half” which means that each half of the lanes is isolated from the other half, and you can’t cross data between the two halves, only within a half (this is how most of the 256-bit x86/x64 shuffles work).
- unpack: Takes a SIMD value and gets out some of the lanes while widening them, such as converting i16 to i32.
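Several glossary entries are easier to follow with a scalar model. The sketch below is not the crate's API: the Lanes alias and every model_ function are hypothetical names, and `[i32; 4]` stands in for four 32-bit integer lanes. It shows a cmp mask, bitandnot, a runtime-mask blend built from the bitwise ops, a max built from cmp plus blend, and the two kinds of shr.

```rust
/// Stand-in for four 32-bit integer lanes (assumption for illustration).
pub type Lanes = [i32; 4];

/// cmp: "greater than" comparison producing a mask.
/// True lanes are all 1 bits (-1 as i32), false lanes are all 0 bits.
pub fn model_cmp_gt_mask(a: Lanes, b: Lanes) -> Lanes {
    let mut out = [0; 4];
    for i in 0..4 {
        out[i] = if a[i] > b[i] { -1 } else { 0 };
    }
    out
}

/// bitandnot: `(!a) & b` per lane, clearing in `b` every bit that is set in `a`.
pub fn model_bitandnot(a: Lanes, b: Lanes) -> Lanes {
    let mut out = [0; 4];
    for i in 0..4 {
        out[i] = !a[i] & b[i];
    }
    out
}

/// blend with a runtime mask: take `b` where the mask lane is all 1s, else `a`.
/// Built here as `(mask & b) | ((!mask) & a)`.
pub fn model_blend_varying(a: Lanes, b: Lanes, mask: Lanes) -> Lanes {
    let kept_a = model_bitandnot(mask, a);
    let mut out = [0; 4];
    for i in 0..4 {
        out[i] = (mask[i] & b[i]) | kept_a[i];
    }
    out
}

/// max expressed as cmp + blend: where `a > b` pick `a`, otherwise pick `b`.
pub fn model_max(a: Lanes, b: Lanes) -> Lanes {
    model_blend_varying(b, a, model_cmp_gt_mask(a, b))
}

/// Arithmetic shift right: shifts in copies of the sign bit (signed integers).
/// `n` must be less than 32.
pub fn model_shr_arithmetic(x: i32, n: u32) -> i32 {
    x >> n
}

/// Logical shift right: always shifts in 0 bits (unsigned integers).
/// `n` must be less than 32.
pub fn model_shr_logical(x: u32, n: u32) -> u32 {
    x >> n
}
```

For a = [1, 5, 3, 0] and b = [2, 4, 3, -1], the mask is [0, -1, 0, -1] and the max is [2, 5, 3, 0]. Shifting -8 right by one gives -4 arithmetically, while the same bit pattern shifted logically gives 0x7FFF_FFFC, a positive value.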