ipa.c - Schema-Driven Semantic Island Parser
ipa.c is a compact, stand-alone island parser library and CLI. Given a schema of named nodes and example descriptions, it finds which nodes match spans of natural language text, what they emit, and how they break down into child nodes.
CLI
Build mode
Compile a schema manifest into a binary resource:
./bin/x86_64/linux/ipa path/to/schema.json -b
The compiled resource is written to the same directory with the .json extension replaced by .ipa.
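The output naming rule can be sketched in C as follows. This is only an illustration: `ipa_output_path` is a hypothetical helper, not a function from ipa.h, and rejecting inputs that do not end in .json is an assumption of the sketch rather than documented CLI behavior.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper, not part of ipa.h: derive the compiled resource
 * path from a manifest path by swapping a trailing ".json" for ".ipa".
 * Returns 0 on success, -1 if the input does not end in ".json" (or has
 * no stem before it) or if the output buffer is too small.
 */
static int ipa_output_path(const char *manifest, char *out, size_t out_size)
{
    static const char json_ext[] = ".json";
    static const char ipa_ext[] = ".ipa";
    const size_t json_len = sizeof(json_ext) - 1;
    const size_t ipa_len = sizeof(ipa_ext) - 1;
    size_t len, stem;

    assert(manifest != NULL && out != NULL);

    len = strlen(manifest);
    if (len <= json_len || strcmp(manifest + len - json_len, json_ext) != 0)
        return -1;

    stem = len - json_len; /* length of everything before ".json" */
    if (stem + ipa_len + 1 > out_size)
        return -1;

    memcpy(out, manifest, stem);
    memcpy(out + stem, ipa_ext, ipa_len + 1); /* also copies the NUL */
    return 0;
}
```

With this helper, path/to/schema.json maps to path/to/schema.ipa, which matches the paths used in the build and parse examples.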
Parse mode
Match input text against a compiled schema:
./bin/x86_64/linux/ipa path/to/schema.ipa "install firefox"
Pipe input through standard input:
echo "install firefox" | ./bin/x86_64/linux/ipa path/to/schema.ipa
Output
Parse results are written to stdout as a JSON object:
{
"ok": true,
"matches": [
{
"id": "install_command",
"span": "install firefox",
"score": 0.9821,
"emit": {"action": "install"},
"children": [
{"id": "software", "span": "firefox", "score": 0.9934, "emit": {"software": "firefox"}}
]
}
]
}
When no node matches above the threshold, matches is empty and ok is false.
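To make the nesting of matches and children concrete, here is a minimal C sketch that walks such a tree depth-first. The `match_t` struct is a hypothetical mirror of the JSON shape above; it is not the layout of `kc_ipa_result_t`, which this document does not describe.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical mirror of one entry in "matches"; not the layout in ipa.h. */
typedef struct match {
    const char *id;
    const char *span;
    double score;
    const struct match *children;
    size_t child_count;
} match_t;

/*
 * Append one "id: span" line per match to out (which must already hold a
 * NUL-terminated string), indenting two spaces per nesting level.
 * Returns the number of matches visited, children included.
 */
static size_t match_dump(const match_t *m, size_t count, int depth,
                         char *out, size_t out_size)
{
    size_t visited = 0;
    size_t i;

    assert(out != NULL && out_size > 0);

    for (i = 0; i < count; i++) {
        size_t used = strlen(out);
        snprintf(out + used, out_size - used, "%*s%s: %s\n",
                 depth * 2, "", m[i].id, m[i].span);
        /* Children are refinements of the parent's span, so recurse. */
        visited += 1 + match_dump(m[i].children, m[i].child_count,
                                  depth + 1, out, out_size);
    }
    return visited;
}
```

Applied to the example output, the walk visits install_command first and then its software child, one indentation level deeper.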
Parameters
| Flag | Description |
|---|---|
| -b, --build | Compile schema.json into schema.ipa |
| -h, --help | Show help and usage |
| -v, --version | Show version |
Exit codes
| Code | Meaning |
|---|---|
| 0 | Success |
| 1 | Runtime or parse error |
| 2 | Build or schema error |
Schema manifest
Schemas are defined as JSON manifests whose schema field is set to ipa.map.v1:
{
"schema": "ipa.map.v1",
"id": "myschema",
"vendor": "myvendor",
"version": "1.0.0",
"defaults": {
"node_threshold": 0.80,
"min_ngram": 1,
"max_ngram": 5
},
"nodes": [
{
"id": "install_command",
"descriptions": [
"install firefox",
"install chrome",
"setup software",
"add application"
],
"emit": {
"action": "install"
},
"children": [
{
"id": "software",
"descriptions": ["firefox", "chrome", "brave", "vlc"],
"emit": {
"software": "$raw"
}
}
]
}
]
}
Nodes
Each node has an id, a list of descriptions used to generate its embedding, an optional emit map of key-value pairs, and an optional children array of child nodes.
The parser scans the input with n-gram windows of descending size. When a span scores above the threshold, the span is closed (smaller windows do not subdivide it again) and the match is recorded. If the matched node has children, the same span is then scanned recursively against the child index to find refinements.
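The scanning order described above can be sketched in C as follows. This is a simplified illustration, not the library's implementation: `toy_score` replaces embedding similarity with exact string comparison, the input is assumed to be pre-tokenized on whitespace, and every name in the block (`span_t`, `scan_islands`, `join_tokens`, `MAX_TOKENS`) is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#define MAX_TOKENS 32
#define SPAN_TEXT_MAX 256

typedef struct {
    size_t start; /* index of the first token in the span */
    size_t len;   /* number of tokens in the span */
} span_t;

/* Toy scorer standing in for embedding similarity: 1.0 on an exact
 * match with any description, 0.0 otherwise. */
static double toy_score(const char *text, const char *const *descs,
                        size_t n_descs)
{
    size_t i;
    for (i = 0; i < n_descs; i++)
        if (strcmp(text, descs[i]) == 0)
            return 1.0;
    return 0.0;
}

/* Join tokens[start .. start+len) with single spaces into buf. */
static void join_tokens(const char *const *tokens, size_t start, size_t len,
                        char *buf, size_t buf_size)
{
    size_t i, used = 0;
    buf[0] = '\0';
    for (i = 0; i < len && used < buf_size - 1; i++) {
        int n = snprintf(buf + used, buf_size - used, "%s%s",
                         i ? " " : "", tokens[start + i]);
        if (n < 0)
            break;
        used += (size_t)n;
    }
}

/*
 * Slide windows from max_ngram down to min_ngram tokens. A window whose
 * score exceeds the threshold is recorded and its tokens are closed, so
 * smaller windows cannot subdivide it. Returns the number of spans found.
 */
static size_t scan_islands(const char *const *tokens, size_t n_tokens,
                           const char *const *descs, size_t n_descs,
                           size_t min_ngram, size_t max_ngram,
                           double threshold, span_t *out, size_t out_cap)
{
    int closed[MAX_TOKENS] = {0};
    char text[SPAN_TEXT_MAX];
    size_t found = 0, n, i, k;

    assert(n_tokens <= MAX_TOKENS && min_ngram >= 1);

    if (max_ngram > n_tokens)
        max_ngram = n_tokens;

    for (n = max_ngram; n >= min_ngram; n--) {
        for (i = 0; i + n <= n_tokens; i++) {
            int overlaps = 0;
            for (k = i; k < i + n; k++)
                overlaps |= closed[k];
            if (overlaps)
                continue; /* part of this window is already matched */
            join_tokens(tokens, i, n, text, sizeof text);
            if (toy_score(text, descs, n_descs) > threshold
                && found < out_cap) {
                for (k = i; k < i + n; k++)
                    closed[k] = 1;
                out[found].start = i;
                out[found].len = n;
                found++;
            }
        }
    }
    return found;
}
```

Refinement against children can be modeled by calling `scan_islands` again on only the tokens of a matched span, passing the child node's descriptions. For the tokens "please install firefox now", the first pass closes "install firefox", and a second pass over those two tokens against the software descriptions isolates "firefox".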
Emit values
Emit values are literal strings. The special value "$raw" is resolved at runtime to the raw text of the matched span.
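A minimal C sketch of this resolution rule follows. `resolve_emit` is a hypothetical helper, not a function from ipa.h, and it assumes that only a value equal to "$raw" in its entirety is substituted; whether the library also expands "$raw" inside longer strings is not stated here.

```c
#include <assert.h>
#include <string.h>

/*
 * Hypothetical helper: return the string an emit value resolves to.
 * The exact value "$raw" maps to the matched span text; any other
 * value is treated as a literal and returned unchanged.
 */
static const char *resolve_emit(const char *value, const char *span)
{
    assert(value != NULL && span != NULL);
    return strcmp(value, "$raw") == 0 ? span : value;
}
```

In the manifest above, the software node's "$raw" resolves to "firefox" when that span matches, while install_command's "install" is emitted as written.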
Public API
#include <stdio.h>   /* printf */
#include <string.h>  /* strdup */
#include "ipa.h"
/* Open a compiled schema resource */
kc_ipa_options_t opts = kc_ipa_options_default();
opts.schema_path = strdup("path/to/schema.ipa");
kc_ipa_schema_t *schema = NULL;
kc_ipa_open(&schema, &opts);
kc_ipa_listen_signals(schema);
/* Parse input */
kc_ipa_result_t result;
kc_ipa_parse(schema, "install firefox", &result);
/* Use result */
if (result.ok) {
printf("id: %s\n", result.matches[0].id);
}
/* Free result and schema */
kc_ipa_result_free(&result);
kc_ipa_close(schema);
kc_ipa_options_free(&opts);
Lifecycle
kc_ipa_open() - maps the .ipa resource into memory and builds runtime HNSW indexes.
kc_ipa_parse() - scans input for semantic islands and scores them against the schema. Thread-safe.
kc_ipa_result_free() - releases all strings and arrays owned by a result.
kc_ipa_close() - releases all resources owned by the schema handle.
kc_ipa_build() - compiles a schema.json manifest into a .ipa binary resource.
Build
Compiled artifacts are generated under bin/{arch}/{platform}/ for the host architecture running the build. The build compiles the vendored source snapshots from lib/, including lib/ggml and lib/model.gguf, and does not require external kclib archives.
make clean && make
Multiarch Builds
The project can build artifacts for multiple architectures under bin/{arch}/{platform}/. A plain make builds only the current host architecture; the targets below build either the full matrix (make all) or a single architecture/platform pair.
make all
make x86_64/linux
make x86_64/windows
make i686/linux
make i686/windows
make aarch64/linux
make aarch64/android
make armv7/linux
make armv7/android
make armv7hf/linux
make riscv64/linux
make powerpc64le/linux
make mips/linux
make mipsel/linux
make mips64el/linux
make s390x/linux
make loongarch64/linux
Dependencies
| Path | Description |
|---|---|
| lib/ggml/ | Tensor computation library for machine learning |
| lib/emb.h, lib/libemb.c | Vector embeddings library |
| lib/hnsw.h, lib/libhnsw.c | Approximate nearest neighbor search library |
| lib/mmap.h, lib/libmmap.c | Memory-mapped I/O library |
| lib/ngram.h | N-gram text processing library |
| lib/model.gguf | Embedded model weights |
Beta Notice
This is a beta project tested only on Debian x86_64. It was created out of a personal need for these libraries, but no guarantees are provided regarding its stability or future support. You are free to test it, use it, and modify it as you please.
If you'd like to reach out, you can send an email to [email protected]. Please note that I do not accept pull requests; the goal is to avoid long-term dependency on platforms like GitHub, and I do not maintain fixed infrastructure to guarantee long-term stability for these projects.
Repo
You can download the repository and read the most up-to-date documentation directly from its official source.
GitHub: kaisarcode/ipa.c
License
This project is distributed under the GNU General Public License version 3 (GPLv3).
