SIMD Math
Nitpick's SIMD math support (v0.55.6) extends the scalar math compiler builtins to simd<flt64, N> types, enabling vectorized transcendental computation using lane-wise LLVM expansion.
Overview
SIMD math operations apply a scalar math function to each lane of a SIMD vector independently. For functions with native LLVM vector intrinsics (sqrt, abs), the compiler emits a single vector instruction. For transcendentals (sin, cos, exp, log), the compiler performs lane-wise expansion: it extracts the N lanes, makes N scalar calls, and repacks the results into a vector.
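The lane-wise strategy can be modelled outside the compiler. The following is a minimal Python sketch of the semantics only, not Nitpick code and not the compiler's actual lowering; the helper name `lanewise` is hypothetical.

```python
import math

def lanewise(scalar_fn, vector):
    """Model of lane-wise expansion: one scalar call per lane, then repack."""
    lanes = list(vector)                      # unpack the N lanes
    results = [scalar_fn(x) for x in lanes]   # N independent scalar calls
    return tuple(results)                     # repack into an N-lane vector

angles = (0.0, math.pi / 6, math.pi / 4, math.pi / 3)
sins = lanewise(math.sin, angles)
print(sins)
```

Because each lane is computed by its own scalar call, the result for one lane never depends on the values in the other lanes.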
Supported Functions
All scalar math builtins that accept flt64 also accept simd<flt64, N>:
| Function | LLVM Strategy | Notes |
|---|---|---|
| sqrt(v) | llvm.sqrt.v*f64 (single vector instruction) | Fastest |
| abs(v) | llvm.fabs.v*f64 (single vector instruction) | Fastest |
| sin(v) | Lane-wise expansion | N scalar calls |
| cos(v) | Lane-wise expansion | N scalar calls |
| tan(v) | Lane-wise expansion | N scalar calls |
| exp(v) | Lane-wise expansion | N scalar calls |
| exp2(v) | Lane-wise expansion | N scalar calls |
| log(v) | Lane-wise expansion | N scalar calls |
| log2(v) | Lane-wise expansion | N scalar calls |
| log10(v) | Lane-wise expansion | N scalar calls |
| pow(v, e) | Lane-wise expansion | Both v and e may be SIMD |
| floor(v) | Lane-wise expansion | |
| ceil(v) | Lane-wise expansion | |
| round(v) | Lane-wise expansion | |
| trunc(v) | Lane-wise expansion | |
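The pow entry is the only two-operand function in the table. The Python sketch below models how a lane-wise pow could pair up lanes. It assumes that a scalar exponent is broadcast to every lane, which the table does not state explicitly, and the helper name `pow_lanes` is hypothetical.

```python
import math

def pow_lanes(v, e):
    """Lane-wise pow. A scalar exponent is broadcast to all lanes (assumed behaviour)."""
    exps = list(e) if isinstance(e, (list, tuple)) else [e] * len(v)
    if len(exps) != len(v):
        raise ValueError("base and exponent must have the same lane count")
    # One scalar pow call per lane, pairing base lane i with exponent lane i.
    return tuple(math.pow(x, y) for x, y in zip(v, exps))

squares = pow_lanes((1.0, 2.0, 3.0, 4.0), 2.0)                  # scalar exponent
mixed = pow_lanes((2.0, 2.0, 2.0, 2.0), (0.0, 1.0, 2.0, 3.0))   # SIMD exponent
print(squares, mixed)
```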
Basic Usage
// 4-lane SIMD
simd<flt64, 4>:angles = { 0.0, PI()/6.0, PI()/4.0, PI()/3.0 };
simd<flt64, 4>:sins = sin(angles); // { 0, 0.5, 0.707, 0.866 }
simd<flt64, 4>:coss = cos(angles); // { 1, 0.866, 0.707, 0.5 }
// 8-lane SIMD
simd<flt64, 8>:vals = { 1.0, 2.0, 4.0, 8.0, 16.0, 32.0, 64.0, 128.0 };
simd<flt64, 8>:logs = log2(vals); // { 0, 1, 2, 3, 4, 5, 6, 7 }
Pythagorean Identity — Verified Lane-Wise
The identity sin(x)² + cos(x)² == 1 holds lane-by-lane within floating-point precision:
simd<flt64, 4>:x = { 0.1, 0.5, 1.2, 3.0 };
simd<flt64, 4>:s = sin(x);
simd<flt64, 4>:c = cos(x);
simd<flt64, 4>:pyth = s * s + c * c;
// pyth ≈ { 1.0, 1.0, 1.0, 1.0 } within 1e-15
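The same check can be reproduced against a host libm. This is a small Python sketch that uses the inputs above and measures the largest per-lane deviation from 1.0. It exercises Python's `math` module rather than Nitpick's builtins, so it only illustrates the expected precision.

```python
import math

x = (0.1, 0.5, 1.2, 3.0)
s = tuple(math.sin(v) for v in x)
c = tuple(math.cos(v) for v in x)

# Lane-wise s*s + c*c, mirroring the SIMD expression above.
pyth = tuple(si * si + ci * ci for si, ci in zip(s, c))

# Largest deviation from 1.0 across all lanes.
max_error = max(abs(p - 1.0) for p in pyth)
print(max_error)
```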
Element Access
SIMD results can be extracted by index:
simd<flt64, 4>:r = sqrt({ 1.0, 4.0, 9.0, 16.0 });
flt64:lane0 = r[0]; // 1.0
flt64:lane1 = r[1]; // 2.0
flt64:lane2 = r[2]; // 3.0
flt64:lane3 = r[3]; // 4.0
Arithmetic on SIMD Results
SIMD math results compose with standard SIMD arithmetic:
simd<flt64, 4>:a = { 1.0, 2.0, 3.0, 4.0 };
simd<flt64, 4>:b = { 5.0, 6.0, 7.0, 8.0 };
// Combine trig and arithmetic
simd<flt64, 4>:sin_sum = sin(a) * cos(b) + cos(a) * sin(b);
// sin_sum[i] ≈ sin(a[i] + b[i]) (angle sum identity, up to rounding)
NaN and Infinity Handling
SIMD math functions follow IEEE 754 for each lane independently:
simd<flt64, 4>:v = { -1.0, 0.0, 1.0, 4.0 };
simd<flt64, 4>:r = sqrt(v);
// r[0] is NaN (sqrt of a negative number; NaN never compares equal to anything)
// r[1] == 0.0
// r[2] == 1.0
// r[3] == 2.0
NaN in one lane does not affect other lanes.
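The per-lane behaviour, including the infinity and signed-zero cases, can be sketched in Python. This is a model of the IEEE 754 results rather than Nitpick code. The helper `ieee_sqrt` is hypothetical and exists only because Python's `math.sqrt` raises an exception on negative input instead of returning NaN.

```python
import math

def ieee_sqrt(x):
    """Scalar sqrt with IEEE 754 results (NaN for negative input, no exception)."""
    if x < 0.0:
        return math.nan
    return math.sqrt(x)   # handles +inf, NaN and -0.0 per IEEE 754

v = (-1.0, 0.0, 1.0, 4.0)
r = tuple(ieee_sqrt(x) for x in v)            # each lane evaluated independently

specials = (math.inf, -math.inf, math.nan, -0.0)
rs = tuple(ieee_sqrt(x) for x in specials)    # (inf, nan, nan, -0.0)
print(r, rs)
```

Only the lane that received an invalid input becomes NaN; the remaining lanes carry ordinary results.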
Performance Notes
Vector-Native Functions (sqrt, abs)
For sqrt and abs, the compiler emits a single vector instruction on AVX2, AVX-512, and NEON targets (e.g., vsqrtpd on x86-64, fsqrt on ARM64). One instruction handles all N lanes, so when the data is already in SIMD registers the cost per vector is comparable to that of a single scalar call.
Lane-Wise Expansion (sin, cos, exp, log, ...)
For transcendentals, the compiler unrolls to N scalar libm calls. This is semantically correct but does not benefit from SIMD-width acceleration unless the target supports SVML (Intel Short Vector Math Library) or similar.
Optimization hint: On targets where SVML is available, LLVM's auto-vectorizer can sometimes replace the unrolled scalar calls with a single vector libm call. This only happens when compiling with -O2 or -O3.
Preferred SIMD Width
| Target | Recommended Width | Reason |
|---|---|---|
| x86-64 + AVX2 | simd<flt64, 4> | 256-bit registers |
| x86-64 + AVX-512 | simd<flt64, 8> | 512-bit registers |
| ARM64 + NEON | simd<flt64, 2> | 128-bit registers |
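The recommended widths follow from dividing the register size by the 64-bit width of a flt64 lane. A short Python sketch of that arithmetic, using the register sizes from the table (the names `REGISTER_BITS` and `preferred_lanes` are hypothetical):

```python
FLT64_BITS = 64  # one flt64 lane occupies 64 bits

# Register widths taken from the table above.
REGISTER_BITS = {
    "x86-64 + AVX2": 256,
    "x86-64 + AVX-512": 512,
    "ARM64 + NEON": 128,
}

def preferred_lanes(register_bits, element_bits=FLT64_BITS):
    """Number of lanes that exactly fill one vector register."""
    return register_bits // element_bits

widths = {target: preferred_lanes(bits) for target, bits in REGISTER_BITS.items()}
print(widths)
```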
SIMD vs. Scalar Math Summary
| Scenario | Recommendation |
|---|---|
| Single value computation | Scalar builtins |
| Batched trig (sin/cos of N angles) | simd<flt64, 4> or simd<flt64, 8> |
| Batched sqrt / abs | SIMD — direct vector instruction |
| Loop over large arrays | SIMD for throughput, scalar for correctness verification |
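For the large-array case, one common pattern is to process the data in vector-width chunks, handle the leftover elements individually, and compare against a plain scalar loop. The Python sketch below models that pattern under the assumption that each chunk corresponds to one SIMD vector. The function name `simd_map` is hypothetical and the code is not Nitpick syntax.

```python
import math

def simd_map(scalar_fn, data, width=4):
    """Apply scalar_fn over data in width-sized chunks, with a scalar tail."""
    out = []
    full = len(data) - len(data) % width      # length covered by whole chunks
    for i in range(0, full, width):
        chunk = data[i:i + width]             # one SIMD vector's worth of lanes
        out.extend(scalar_fn(x) for x in chunk)
    for x in data[full:]:                     # leftover elements, one at a time
        out.append(scalar_fn(x))
    return out

data = [0.1 * i for i in range(10)]
batched = simd_map(math.sin, data, width=4)
reference = [math.sin(x) for x in data]       # scalar path as the correctness check
print(batched == reference)
```

Keeping the scalar loop around as a reference makes it cheap to confirm that the batched path produces the same values lane for lane.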
Related
- standard_library/math.md — scalar math reference
- types/fix256.md — deterministic fixed-point alternative
- verification/02_rules_and_limits.md — formal verification of math properties