ngram.c
ngram.c - Sliding-window n-gram traversal
ngram.c is a minimalist C library and CLI for descending sliding-window n-gram traversal of text. It emits token spans and can execute a command for each chunk, which makes it usable as a building block for semantic analysis. It is designed as a composable native primitive for the KaisarCode ecosystem.
CLI
Traverse text and emit n-gram chunks based on token window constraints.
Examples
Basic n-gram extraction (default 1-5 tokens):
./bin/x86_64/linux/ngram "The quick brown fox"
Extraction with custom window size and separators:
./bin/x86_64/linux/ngram "The quick brown fox" --max 3 --min 2 --sep " ,"
Execute a command for each chunk and close span on stdout:
./bin/x86_64/linux/ngram "The quick brown fox" --cmd "grep -q fox"
Standard input processing:
echo "The quick brown fox" | ./bin/x86_64/linux/ngram
Parameters
| Flag | Description |
|---|---|
| --max, -max <n> | Maximum tokens per block |
| --min, -min <n> | Minimum tokens per block |
| --sep, -sep <s> | Custom separator characters |
| --cmd, -cmd <cmd> | Execute command for each chunk |
| --help, -h | Show help and usage |
| --version, -v | Show version |
Output
Chunks are printed to stdout, one per line, longest windows first:
The quick brown fox
The quick brown
quick brown fox
The quick
quick brown
brown fox
Public API
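#include <stdio.h>
#include "ngram.h"

/* Visitor called for each chunk: prints the chunk's byte range of the input. */
int my_visitor(const kc_ngram_chunk_t *chunk, void *context) {
    (void)context; /* unused in this example */
    printf("%.*s\n", (int)(chunk->byte_end - chunk->byte_start), chunk->input + chunk->byte_start);
    return 0; // 1 to close span, -1 to abort
}

int main(void) {
    kc_ngram_options_t options;
    kc_ngram_options_default(&options); /* start from the default options */
    options.max_tokens = 3;             /* limit windows to 3 tokens */
    kc_ngram_execute("The quick brown fox", &options, my_visitor, NULL);
    return 0;
}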
Build
Compiled artifacts are generated under bin/{arch}/{platform}/ for the host architecture running the build.
make clean && make
Multiarch Builds
The Makefile can also build artifacts for other architectures under bin/{arch}/{platform}/. A plain make builds only the current host architecture; make all builds the full matrix, and each of the remaining targets builds a single architecture/platform pair.
make all
make x86_64/linux
make x86_64/windows
make i686/linux
make i686/windows
make aarch64/linux
make aarch64/android
make armv7/linux
make armv7/android
make armv7hf/linux
make riscv64/linux
make powerpc64le/linux
make mips/linux
make mipsel/linux
make mips64el/linux
make s390x/linux
make loongarch64/linux
Beta Notice
This is a beta project tested only on Debian x86_64. It was created out of a personal need for these libraries, but no guarantees are provided regarding its stability or future support. You are free to test it, use it, and modify it as you please.
If you'd like to reach out, you can send an email to [email protected]. Please note that I do not accept pull requests; the goal is to avoid long-term dependency on platforms like GitHub, and I do not maintain fixed infrastructure to guarantee long-term stability for these projects.
Repo
You can download the repository and read the most up-to-date documentation directly from its official source.
GitHub: kaisarcode/ngram.c
License
This project is distributed under the GNU General Public License version 3 (GPLv3).
