
Binary Serialization (binary)

The binary module provides a structured, endian-aware serialization framework for Nitpick. It is primarily used to encode native values (integers, floats, booleans, and strings) into contiguous byte sequences suitable for disk storage, inter-process communication, or network I/O.

In the current Nitpick specification, binary buffers are managed as opaque int64 handles through the stdlib/binary.npk module rather than as a first-class language type.

Overview

Unlike string (which is designed for textual operations but carries no specific character encoding) or buffer (which provides raw, unencoded contiguous memory), a binary object is inherently stateful. Every binary object implicitly tracks:

- its underlying data block,
- len, the number of bytes written so far,
- the allocated capacity, which grows as data is appended,
- pos, the read cursor.

Writes always append at offset len and increase it, while reads advance the internal pos cursor.

Creating a Binary Buffer

To interact with binary functions, you must import the standard module and use the bin_new allocator, which returns an opaque int64 handle representing the buffer.

use "binary.npk".*;

func:main = int32() {
    // Allocate a new binary buffer (returns an int64 handle)
    int64:buf = raw bin_new();

    // Do operations...

    // Free the buffer when done
    drop bin_free(buf);
    exit(0);
};

Writing Data

Writes append data to the end of the buffer (at offset len), and the internal capacity grows automatically as needed. All integers and floats are encoded in little-endian byte order.

// Write primitives
drop bin_write_int8(buf, 10i32);
drop bin_write_int16(buf, 2000i32);
drop bin_write_int32(buf, 300000i32);
drop bin_write_int64(buf, 9000000000i64);

drop bin_write_flt32(buf, 3.14f64);
drop bin_write_flt64(buf, 2.7182818f64);

drop bin_write_bool(buf, 1i32); // written as a single byte, 1 (true)

Note: bin_write_flt32 accepts a flt64 argument for ergonomic compatibility, but it down-casts the value and serializes only 4 bytes, so values such as 3.14 lose precision. bin_write_bool accepts an int32 but writes a single byte (1 or 0).
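To make the little-endian layout concrete, the following Python sketch (using the standard struct module, not part of Nitpick) packs the same values in the same order as the write sequence above. It assumes the writes are laid out back to back with no padding or alignment; expected_layout is a hypothetical helper name.

```python
import struct

def expected_layout() -> bytes:
    """Bytes the write sequence above should produce, assuming tight little-endian packing."""
    out = bytearray()
    out += struct.pack("<b", 10)           # bin_write_int8: 1 byte
    out += struct.pack("<h", 2000)         # bin_write_int16: 2 bytes
    out += struct.pack("<i", 300000)       # bin_write_int32: 4 bytes
    out += struct.pack("<q", 9000000000)   # bin_write_int64: 8 bytes
    out += struct.pack("<f", 3.14)         # bin_write_flt32: down-cast to 4 bytes
    out += struct.pack("<d", 2.7182818)    # bin_write_flt64: 8 bytes
    out += struct.pack("<B", 1)            # bin_write_bool: a single byte, 1
    return bytes(out)
```

Under that assumption the buffer holds 28 bytes (1 + 2 + 4 + 8 + 4 + 8 + 1), the int16 value 2000 appears as the bytes d0 07, and unpacking the 4-byte float gives roughly 3.1400001 rather than 3.14.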

Reading Data

Reads decode bytes back into Nitpick values. Each read decodes the bytes starting at pos and advances pos by the number of bytes consumed.

Before reading, make sure the cursor is where you expect it. When reading back data you have just written, seek to the beginning of the buffer first.

// Seek back to the start of the buffer
drop bin_seek(buf, 0i64);

// Reading primitives
int32:val1 = raw bin_read_int8(buf);    // reads 1 byte, returns sign-extended int32
int32:val2 = raw bin_read_int16(buf);   // reads 2 bytes
int32:val3 = raw bin_read_int32(buf);   // reads 4 bytes
int64:val4 = raw bin_read_int64(buf);   // reads 8 bytes

flt64:f1 = raw bin_read_flt32(buf);     // reads 4 bytes, returns up-casted flt64
flt64:f2 = raw bin_read_flt64(buf);     // reads 8 bytes

int32:b1 = raw bin_read_bool(buf);      // reads 1 byte (1 or 0)
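The cursor behaviour described above can be modelled in a few lines of Python. BinReader is a hypothetical sketch, not a Nitpick API: it decodes little-endian values from a bytes object and advances pos by each value's width. One difference to keep in mind is that struct.unpack_from raises struct.error on a short read, whereas the Nitpick reads are not strictly bounds-checked (see Error Handling).

```python
import struct

class BinReader:
    """Hypothetical Python model of a binary read cursor (not part of Nitpick)."""

    def __init__(self, data: bytes) -> None:
        self.data = data
        self.pos = 0  # read cursor starts at offset 0

    def _take(self, fmt: str):
        size = struct.calcsize(fmt)
        (value,) = struct.unpack_from(fmt, self.data, self.pos)
        self.pos += size  # advance by the number of bytes consumed
        return value

    def read_int8(self) -> int:
        return self._take("<b")   # signed, so 0xFF decodes as -1

    def read_int16(self) -> int:
        return self._take("<h")

    def read_int32(self) -> int:
        return self._take("<i")

    def read_int64(self) -> int:
        return self._take("<q")

    def read_flt32(self) -> float:
        return self._take("<f")   # Python floats are doubles, so this is an up-cast

    def read_flt64(self) -> float:
        return self._take("<d")

    def read_bool(self) -> int:
        return self._take("<B")   # stored as a single byte, 1 or 0
```

Reading the fields in the same order they were written is what keeps pos aligned with field boundaries; reading them in a different order silently decodes the wrong bytes.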

String Encoding

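Strings in binary are length-prefixed, not null-terminated. Because the decoder reads the length before the payload, it can allocate the result in a single, correctly sized step without scanning for a terminator, and strings may safely contain embedded null bytes.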

When bin_write_str is invoked:

1. The string's length is written as an 8-byte (int64) little-endian integer.
2. The raw bytes of the string are written immediately after the prefix.

drop bin_write_str(buf, "Hello World");

// Resets cursor to 0
drop bin_seek(buf, 0i64);

// Automatically decodes the 8-byte prefix and reads 11 bytes
string:s = raw bin_read_str(buf);
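The length-prefixed format is easy to reproduce outside Nitpick, for example when another program has to read or produce these strings. This Python sketch uses hypothetical helper names and treats strings as raw bytes, since the page states that string carries no specific character encoding; it assumes the prefix counts bytes, not characters.

```python
import struct

def encode_str(payload: bytes) -> bytes:
    # 8-byte little-endian length prefix, then the raw bytes; no terminator
    return struct.pack("<q", len(payload)) + payload

def decode_str(data: bytes, pos: int = 0) -> tuple[bytes, int]:
    # Read the prefix first, then slice exactly that many bytes
    (length,) = struct.unpack_from("<q", data, pos)
    start = pos + 8
    end = start + length
    if end > len(data):
        raise ValueError("truncated string payload")
    return data[start:end], end  # payload and the position after it
```

Encoding "Hello World" yields 19 bytes (8 for the prefix, 11 for the payload), and a payload such as b"a\x00b" round-trips intact because nothing scans for a null terminator.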

File I/O

Nitpick supports direct interaction between binary buffers and the filesystem.

// Write the current buffer to a binary file
drop bin_to_file(buf, "data.bin");

// Read a binary file directly into a new buffer
int64:file_buf = raw bin_from_file("data.bin");
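Assuming bin_to_file writes exactly the len payload bytes with no file header (the API table only says it writes the buffer payload), a file such as data.bin can be inspected with ordinary tools. The Python hex dump below is hypothetical helper code, not part of Nitpick.

```python
from pathlib import Path

def format_hex(data: bytes, width: int = 16) -> list[str]:
    """Render bytes as 'offset  hex bytes' lines, width bytes per line."""
    lines = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        lines.append(f"{offset:08x}  {chunk.hex(' ')}")
    return lines

def dump_file(path: str) -> None:
    # Assumes the file contains only the raw buffer payload, with no header
    for line in format_hex(Path(path).read_bytes()):
        print(line)
```

Calling dump_file("data.bin") after the bin_to_file call above prints the buffer's bytes with their offsets, which makes it easy to check the little-endian layout by eye.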

Cursor Control

You can inspect the read cursor and payload size with bin_pos, bin_size, and bin_remaining, and reposition the cursor with bin_seek (see the API table below).
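The bookkeeping behind bin_size, bin_pos, bin_remaining, and bin_seek can be sketched in Python. BinCursor is a hypothetical model, not a Nitpick type: it mirrors remaining = len - pos and the documented clamping of seek offsets past len. Clamping negative offsets to 0 is an extra assumption that the page does not specify.

```python
class BinCursor:
    """Hypothetical model of the binary cursor bookkeeping."""

    def __init__(self, data: bytes) -> None:
        self.data = bytearray(data)
        self.pos = 0

    def size(self) -> int:
        return len(self.data)              # like bin_size: total bytes written

    def remaining(self) -> int:
        return len(self.data) - self.pos   # like bin_remaining: len - pos

    def seek(self, offset: int) -> None:
        # Offsets past len clamp to len (documented); negatives clamp to 0 (assumed)
        self.pos = max(0, min(offset, len(self.data)))

    def append(self, chunk: bytes) -> None:
        self.data += chunk                 # writes grow len but leave pos unchanged
```

Because writes never move pos, appending to a buffer you are reading simply increases bin_remaining.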

Error Handling

Bounds handling in the binary module differs between seeks and reads. If a seek offset goes beyond len, the cursor is clamped to len.

Reads are not clamped the same way. If a read runs past the end of the data (that is, bin_remaining() is smaller than the number of bytes the read needs), it may read undefined memory at the libc level, because k-semantics does not strictly enforce boundary offsets here. Before consuming a payload, use bin_remaining to confirm that enough bytes are available.
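One way to apply that advice is to compare the remaining byte count with the width of each value before decoding it, and fail loudly instead of reading past the end. The Python helper below sketches that pattern; read_checked is a hypothetical name, and in Nitpick the equivalent check is a comparison against bin_remaining.

```python
import struct

def read_checked(data: bytes, pos: int, fmt: str) -> tuple[object, int]:
    """Decode one value at pos, refusing to read past the end of data."""
    needed = struct.calcsize(fmt)
    remaining = len(data) - pos          # same quantity as bin_remaining
    if remaining < needed:
        raise ValueError(f"truncated payload: need {needed} bytes, have {remaining}")
    (value,) = struct.unpack_from(fmt, data, pos)
    return value, pos + needed           # value and the advanced cursor
```

Raising an error turns a silent out-of-bounds read into an explicit failure that the caller can report, which is the same intent as the bin_remaining guard shown under Error Handling Patterns.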

Complete API Table

| Function Signature | Description |
|---|---|
| bin_new() -> int64 | Allocates a new empty binary buffer and returns its handle. |
| bin_free(h) | Deallocates a buffer and its underlying data block. |
| bin_write_int8(h, v) | Writes 1 byte to the end of the buffer. |
| bin_write_int16(h, v) | Writes 2 bytes (little-endian) to the end. |
| bin_write_int32(h, v) | Writes 4 bytes (little-endian) to the end. |
| bin_write_int64(h, v) | Writes 8 bytes (little-endian) to the end. |
| bin_write_flt32(h, v) | Writes a down-cast 4-byte IEEE 754 float. |
| bin_write_flt64(h, v) | Writes an 8-byte IEEE 754 float. |
| bin_write_bool(h, v) | Writes a single byte (0 or 1). |
| bin_write_str(h, s) | Writes an 8-byte length prefix followed by the string payload. |
| bin_read_int8(h) -> int32 | Reads 1 byte, returning a sign-extended int32. |
| bin_read_int16(h) -> int32 | Reads 2 bytes (little-endian). |
| bin_read_int32(h) -> int32 | Reads 4 bytes (little-endian). |
| bin_read_int64(h) -> int64 | Reads 8 bytes (little-endian). |
| bin_read_flt32(h) -> flt64 | Reads 4 bytes, returning an up-cast flt64. |
| bin_read_flt64(h) -> flt64 | Reads 8 bytes. |
| bin_read_bool(h) -> int32 | Reads 1 byte (returns 0 or 1). |
| bin_read_str(h) -> string | Reads an 8-byte length prefix, then returns the allocated string. |
| bin_seek(h, pos) | Sets the read cursor (clamped to the length bounds). |
| bin_size(h) -> int64 | Returns the total number of bytes written. |
| bin_pos(h) -> int64 | Returns the current read cursor position. |
| bin_remaining(h) -> int64 | Returns the number of bytes left to read (len - pos). |
| bin_from_file(path) -> int64 | Reads a file from disk into a newly allocated buffer handle. |
| bin_to_file(h, path) | Writes the buffer payload to a file on disk. |
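Every field in the API table has a fixed width except strings, which take an 8-byte prefix plus the payload, so the size of a record can be computed before reading it and compared against bin_remaining. A Python sketch of that arithmetic; FIELD_SIZES, encoded_size, and the type labels are illustrative names, and string values are treated as raw bytes.

```python
FIELD_SIZES = {
    "int8": 1, "int16": 2, "int32": 4, "int64": 8,
    "flt32": 4, "flt64": 8, "bool": 1,
}

def encoded_size(fields: list[tuple[str, object]]) -> int:
    """Total bytes for (type, value) pairs; 'str' values are raw bytes."""
    total = 0
    for kind, value in fields:
        if kind == "str":
            total += 8 + len(value)   # int64 length prefix + payload
        else:
            total += FIELD_SIZES[kind]
    return total
```

For example, an int32, an int64, and the string "hi" occupy 4 + 8 + (8 + 2) = 22 bytes.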

Examples

Defining a Network Packet Sequence:

use "binary.npk".*;

func:build_packet = int64(int32:type, int64:timestamp, string:payload) {
    int64:h = raw bin_new();
    drop bin_write_int32(h, type);
    drop bin_write_int64(h, timestamp);
    drop bin_write_str(h, payload);
    pass h;
};
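A peer written in another language can reproduce and parse the packet built by build_packet byte for byte. This Python sketch assumes the layout described on this page (int32 type, int64 timestamp, then an int64 length prefix and the payload bytes, all little-endian and tightly packed); encode_packet and decode_packet are hypothetical names.

```python
import struct

HEADER = "<iqq"  # int32 type, int64 timestamp, int64 payload length

def encode_packet(ptype: int, timestamp: int, payload: bytes) -> bytes:
    # Fixed fields first, then the raw payload bytes
    return struct.pack(HEADER, ptype, timestamp, len(payload)) + payload

def decode_packet(data: bytes) -> tuple[int, int, bytes]:
    header_size = struct.calcsize(HEADER)  # 20 bytes of fixed fields
    if len(data) < header_size:
        raise ValueError("truncated packet header")
    ptype, timestamp, length = struct.unpack_from(HEADER, data, 0)
    if len(data) - header_size < length:
        raise ValueError("truncated packet payload")
    return ptype, timestamp, data[header_size:header_size + length]
```

A packet carrying the 4-byte payload b"ping" is 24 bytes long; cutting off even one byte makes decode_packet reject it instead of returning a short payload.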

Complete Encode/Decode and File I/O Example:

use "binary.npk".*;

struct:User = {
    int32:id;
    string:name;
};

func:main = int32() {
    // 1. Encode
    User:u = { id: 42i32, name: "Alice" };
    int64:buf_out = raw bin_new();
    drop bin_write_int32(buf_out, u.id);
    drop bin_write_str(buf_out, u.name);

    // 2. Save to file
    drop bin_to_file(buf_out, "user.dat");
    drop bin_free(buf_out);

    // 3. Load from file
    int64:buf_in = raw bin_from_file("user.dat");

    // 4. Decode
    User:loaded = {
        id: raw bin_read_int32(buf_in),
        name: raw bin_read_str(buf_in)
    };

    drop bin_free(buf_in);
    exit 0;
};

Error Handling Patterns

bin_seek clamps offsets past len to len. Reads past len may return zero defaults rather than faulting, but as described under Error Handling above, reads are not strictly bounds-checked, so do not rely on this. Check bin_remaining() explicitly before performing grouped deserializations.

int64:rem = raw bin_remaining(buf);
if (rem < 8i64) {
    // Cannot safely read an int64 payload
    println("Error: truncated payload");
    exit 1;
}
int64:val = raw bin_read_int64(buf);

Performance Notes

Known Limitations

binary vs buffer

A buffer acts as a raw mutable memory region. You access specific byte offsets, deal heavily in pointer arithmetic, and interface closely with C FFI structures or memory pages.

The binary module represents an abstracted serialization stream. It handles layout, endianness, capacity reallocation, and encoding on the developer's behalf. Use binary when serializing values to disk or the network, and use buffer when you just need raw, unmanaged memory blocks.