SIMD Math

Nitpick's SIMD math support (v0.55.6) extends the scalar math compiler builtins to simd<flt64, N> types, enabling vectorized transcendental computation using lane-wise LLVM expansion.

Overview

SIMD math operations apply a scalar math function to each lane of a SIMD vector independently. For sqrt and abs, the compiler emits a single LLVM vector intrinsic, which lowers to one vector instruction on targets that support it. For every other function (sin, cos, exp, log, and the rest of the table below), the compiler performs lane-wise expansion: it extracts each of the N lanes, makes N scalar calls, and repacks the results into a vector.
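
The extract, call, and repack steps can be modelled in a few lines. This is a Python sketch of the semantics only, not the code the compiler generates; `Vec` and `lanewise` are hypothetical names, and a tuple stands in for simd<flt64, N>.

```python
import math
from typing import Callable, Tuple

Vec = Tuple[float, ...]  # stand-in for simd<flt64, N>; not a Nitpick type


def lanewise(f: Callable[[float], float], v: Vec) -> Vec:
    """Model of lane-wise expansion: extract, call scalar f, repack."""
    lanes = []
    for i in range(len(v)):   # a compiler would unroll this loop N times
        x = v[i]              # extract lane i
        lanes.append(f(x))    # one scalar call per lane
    return tuple(lanes)       # repack into a vector of the same width


angles = (0.0, math.pi / 6, math.pi / 4, math.pi / 3)
sins = lanewise(math.sin, angles)
print([round(s, 3) for s in sins])  # [0.0, 0.5, 0.707, 0.866]
```

Each lane depends only on its own input, which is why the result width always matches the input width and why no lane can influence another.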

Supported Functions

All scalar math builtins that accept flt64 also accept simd<flt64, N>:

Function	LLVM Strategy	Notes
`sqrt(v)`	`llvm.sqrt.v*f64` — single vector instruction	Fastest
`abs(v)`	`llvm.fabs.v*f64` — single vector instruction	Fastest
`sin(v)`	Lane-wise expansion	N scalar calls
`cos(v)`	Lane-wise expansion	N scalar calls
`tan(v)`	Lane-wise expansion	N scalar calls
`exp(v)`	Lane-wise expansion	N scalar calls
`exp2(v)`	Lane-wise expansion	N scalar calls
`log(v)`	Lane-wise expansion	N scalar calls
`log2(v)`	Lane-wise expansion	N scalar calls
`log10(v)`	Lane-wise expansion	N scalar calls
`pow(v, e)`	Lane-wise expansion	Both `v` and `e` may be SIMD
`floor(v)`	Lane-wise expansion	N scalar calls
`ceil(v)`	Lane-wise expansion	N scalar calls
`round(v)`	Lane-wise expansion	N scalar calls
`trunc(v)`	Lane-wise expansion	N scalar calls
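
For the two-operand case, lane i of the base is paired with lane i of the exponent. The Python sketch below models that pairing; `pow_lanes` is a hypothetical helper, and the requirement that both vectors have the same width is an assumption that matches the simd<flt64, N> typing shown on this page.

```python
import math


def pow_lanes(v, e):
    """Model of pow(v, e) when both operands are same-width vectors."""
    if len(v) != len(e):
        raise ValueError("lane counts must match")
    # Lane i of the result is computed only from v[i] and e[i].
    return tuple(math.pow(x, y) for x, y in zip(v, e))


bases = (2.0, 3.0, 4.0, 5.0)
exps = (3.0, 2.0, 1.0, 0.0)
powers = pow_lanes(bases, exps)
print(powers)  # lane i holds bases[i] raised to exps[i]
```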

Basic Usage

// 4-lane SIMD
simd<flt64, 4>:angles = { 0.0, PI()/6.0, PI()/4.0, PI()/3.0 };
simd<flt64, 4>:sins   = sin(angles);   // { 0, 0.5, 0.707, 0.866 }
simd<flt64, 4>:coss   = cos(angles);   // { 1, 0.866, 0.707, 0.5 }

// 8-lane SIMD
simd<flt64, 8>:vals = { 1.0, 2.0, 4.0, 8.0, 16.0, 32.0, 64.0, 128.0 };
simd<flt64, 8>:logs = log2(vals);      // { 0, 1, 2, 3, 4, 5, 6, 7 }

Pythagorean Identity — Verified Lane-Wise

The identity sin(x)² + cos(x)² == 1 holds lane-by-lane within floating-point precision:

simd<flt64, 4>:x = { 0.1, 0.5, 1.2, 3.0 };
simd<flt64, 4>:s = sin(x);
simd<flt64, 4>:c = cos(x);
simd<flt64, 4>:pyth = s * s + c * c;
// pyth ≈ { 1.0, 1.0, 1.0, 1.0 } within 1e-15
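
A tolerance check like the one in the comment above can be written as a per-lane comparison. The sketch below is a Python model with a hypothetical helper name, `all_close_to_one`; it uses the 1e-15 bound quoted above and does not depend on any Nitpick API.

```python
import math


def all_close_to_one(x, tol=1e-15):
    """True if sin(x_i)^2 + cos(x_i)^2 is within tol of 1 for every lane."""
    for xi in x:
        s, c = math.sin(xi), math.cos(xi)
        if abs(s * s + c * c - 1.0) > tol:
            return False
    return True


x = (0.1, 0.5, 1.2, 3.0)
print(all_close_to_one(x))  # True
```

Comparing against a tolerance rather than testing for exact equality matters here, because rounding in the two squares and the addition can leave a lane a few units in the last place away from 1.0.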

Element Access

SIMD results can be extracted by index:

simd<flt64, 4>:r = sqrt({ 1.0, 4.0, 9.0, 16.0 });
flt64:lane0 = r[0];  // 1.0
flt64:lane1 = r[1];  // 2.0
flt64:lane2 = r[2];  // 3.0
flt64:lane3 = r[3];  // 4.0

Arithmetic on SIMD Results

SIMD math results compose with standard SIMD arithmetic:

simd<flt64, 4>:a = { 1.0, 2.0, 3.0, 4.0 };
simd<flt64, 4>:b = { 5.0, 6.0, 7.0, 8.0 };

// Combine trig and arithmetic (angle-sum identity)
simd<flt64, 4>:sin_sum = sin(a) * cos(b) + cos(a) * sin(b);
// sin_sum[i] ≈ sin(a[i] + b[i]) within floating-point precision

NaN and Infinity Handling

SIMD math functions follow IEEE 754 for each lane independently:

simd<flt64, 4>:v = { -1.0, 0.0, 1.0, 4.0 };
simd<flt64, 4>:r = sqrt(v);
// r[0] is NaN    (sqrt of a negative number; NaN never compares equal with ==)
// r[1] == 0.0
// r[2] == 1.0
// r[3] == 2.0

NaN in one lane does not affect other lanes.
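
The per-lane behaviour can be modelled as follows. This is a Python sketch; `ieee_sqrt` is a hypothetical wrapper, needed only because Python's math.sqrt raises an exception for negative input instead of returning NaN as IEEE 754 specifies.

```python
import math


def ieee_sqrt(x: float) -> float:
    """Scalar sqrt that returns NaN for negative input, as IEEE 754 does."""
    if x < 0.0:
        return math.nan
    return math.sqrt(x)  # NaN and +inf inputs pass through unchanged


v = (-1.0, 0.0, 1.0, 4.0)
r = tuple(ieee_sqrt(x) for x in v)

print(math.isnan(r[0]))  # True
print(r[0] == r[0])      # False: NaN never compares equal, even to itself
print(r[1:])             # (0.0, 1.0, 2.0)
```

Because NaN fails every equality test, a check for an invalid lane should use an is-NaN predicate rather than a comparison against a NaN constant.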

Performance Notes

Vector-Native Functions (`sqrt`, `abs`)

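For sqrt and abs, the compiler emits a single vector operation on AVX2, AVX-512, and NEON targets (for example, vsqrtpd on x86-64 or fsqrt on ARM64). One instruction processes all N lanes, so when the data is already in SIMD registers the cost per element is typically a fraction of a scalar call.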

Lane-Wise Expansion (`sin`, `cos`, `exp`, `log`, ...)

For transcendentals, the compiler unrolls to N scalar libm calls. This is semantically correct but does not benefit from SIMD-width acceleration unless the target supports SVML (Intel Short Vector Math Library) or similar.

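Optimization hint: On targets with SVML available, LLVM's vectorizers can sometimes replace the unrolled scalar calls with a single vector math library call at -O2 or -O3. In LLVM this also depends on a vector math library being selected (clang exposes this as -fveclib=SVML), so the optimization level alone may not be enough.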

Preferred SIMD Width

Target	Recommended Width	Reason
x86-64 + AVX2	`simd<flt64, 4>`	256-bit registers
x86-64 + AVX-512	`simd<flt64, 8>`	512-bit registers
ARM64 + NEON	`simd<flt64, 2>`	128-bit registers
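
The widths in the table follow from dividing the register width by the 64-bit element size. A minimal Python sketch of that arithmetic, with a hypothetical helper name:

```python
def lanes_per_register(register_bits: int, element_bits: int = 64) -> int:
    """Number of elements of the given size that fit in one vector register."""
    if register_bits % element_bits != 0:
        raise ValueError("register width must be a multiple of element width")
    return register_bits // element_bits


for target, bits in [("AVX2", 256), ("AVX-512", 512), ("NEON", 128)]:
    print(target, lanes_per_register(bits))
# AVX2 4
# AVX-512 8
# NEON 2
```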

SIMD vs. Scalar Math Summary

Scenario	Recommendation
Single value computation	Scalar builtins
Batched trig (sin/cos of N angles)	`simd<flt64, 4>` or `simd<flt64, 8>`
Batched sqrt / abs	SIMD — direct vector instruction
Loop over large arrays	SIMD for throughput, scalar for correctness verification
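
The last row describes a common pattern: process the array in vector-sized chunks, finish any remainder with scalar calls, and compare against a plain scalar loop. The Python sketch below models that pattern; `sin_chunked` and the chunk width of 4 are illustrative assumptions, and the per-chunk generator stands in for a single sin call on a simd<flt64, 4> value.

```python
import math


def sin_chunked(data, width=4):
    """Process data in width-sized chunks, then finish the tail lane by lane."""
    out = []
    full = len(data) - len(data) % width           # elements covered by full chunks
    for start in range(0, full, width):
        chunk = tuple(data[start:start + width])   # load one vector
        out.extend(math.sin(x) for x in chunk)     # stands in for sin(chunk)
    out.extend(math.sin(x) for x in data[full:])   # scalar remainder
    return out


data = [0.01 * i for i in range(10)]               # 10 is not a multiple of 4
reference = [math.sin(x) for x in data]            # scalar reference loop
result = sin_chunked(data)

ok = all(math.isclose(a, b, rel_tol=1e-12, abs_tol=1e-15)
         for a, b in zip(result, reference))
print(len(result), ok)  # 10 True
```

The comparison uses a tolerance because a vector math library, where one is in use, may round differently from the scalar libm by a few units in the last place.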

See Also

standard_library/math.md — scalar math reference
types/fix256.md — deterministic fixed-point alternative
verification/02_rules_and_limits.md — formal verification of math properties