KaisarCode

ngram.c

ngram.c - Sliding-window n-gram traversal

ngram.c is a minimalist C library and CLI for descending sliding-window n-gram traversal of text. It enables semantic analysis by emitting token spans and executing commands for each chunk, designed as a composable native primitive for the KaisarCode ecosystem.


CLI

Traverse text and emit n-gram chunks based on token window constraints.

Examples

Basic n-gram extraction (default 1-5 tokens):

./bin/x86_64/linux/ngram "The quick brown fox"

Extraction with custom window size and separators:

./bin/x86_64/linux/ngram "The quick brown fox" --max 3 --min 2 --sep " ,"

Execute a command for each chunk and close span on stdout:

./bin/x86_64/linux/ngram "The quick brown fox" --cmd "grep -q fox"

Standard input processing:

echo "The quick brown fox" | ./bin/x86_64/linux/ngram

Parameters

FlagDescription
--max, -max <n>Maximum tokens per block
--min, -min <n>Minimum tokens per block
--sep, -sep <s>Custom separator characters
--cmd, -cmd <cmd>Execute command for each chunk
--help, -hShow help and usage
--version, -vShow version

Output

Chunks are printed to stdout, one per line:

The quick brown fox
The quick brown
quick brown fox
The quick
quick brown
brown fox

Public API

#include "ngram.h"

int my_visitor(const kc_ngram_chunk_t *chunk, void *context) {
    printf("%.*s\n", (int)(chunk->byte_end - chunk->byte_start), chunk->input + chunk->byte_start);
    return 0; // 1 to close span, -1 to abort
}

kc_ngram_options_t options;
kc_ngram_options_default(&options);
options.max_tokens = 3;

kc_ngram_execute("The quick brown fox", &options, my_visitor, NULL);

Build

Compiled artifacts are generated under bin/{arch}/{platform}/ for the host architecture running the build.

make clean && make

Multiarch Builds

The project is prepared to build artifacts for multiple architectures under bin/{arch}/{platform}/. A plain make builds only the current host architecture, while the targets below build the full matrix or a specific target.

make all
make x86_64/linux
make x86_64/windows
make i686/linux
make i686/windows
make aarch64/linux
make aarch64/android
make armv7/linux
make armv7/android
make armv7hf/linux
make riscv64/linux
make powerpc64le/linux
make mips/linux
make mipsel/linux
make mips64el/linux
make s390x/linux
make loongarch64/linux

License

GPLv3

This project is distributed under the GNU General Public License version 3 (GPLv3).


Repo

GitHub: kaisarcode/ngram.c