KaisarCode

tpm.c

tpm.c - Text Profile Matcher

tpm.c tells you how similar a piece of text is to a reference text. Create a profile from representative samples, then score any input from 0.0 (dissimilar) to 1.0 (very similar).


CLI

Example

Create a map file with Python code:

cat > python.map <<EOF
def hello():
    print("hello world")

Score Python input against it:

echo "def foo(): pass" | ./bin/x86_64/linux/tpm python.map 0.999992


### Demo: use cases

Create profiles for different domains and compare how inputs score against
matching vs mismatching profiles.

**Natural language - English vs Spanish:**

cat > english.map <<EOF The quick brown fox jumps over the lazy dog. Pack my box with five dozen liquor jugs. The five boxing wizards jump quickly. EOF

cat > spanish.map <<EOF El zorro marron salta sobre el perro perezoso. Los exploradores descubrieron una nueva especie. Las civilizaciones antiguas construyeron estructuras magnificas. EOF


| Input | vs english.map | vs spanish.map |
| :---- | :------------: | :------------: |
| "The quick brown fox jumps over the lazy dog near the river bank." | 0.445819 | 0.000334 |
| "El zorro marron salta sobre el perro perezoso cerca del arroyo." | 0.001779 | 0.131115 |

**Programming language - Python vs JavaScript:**

Tells Python and JS syntax apart. Python input scores higher on the Python
profile than on the JS profile.

| Input | vs python.map | vs javascript.map |
| :---- | :-----------: | :---------------: |
| `def add(a, b): return a + b` | 0.164708 | 0.048800 |
| `function add(a, b) { return a + b; }` | 0.055775 | 0.298180 |

**Log format - Apache vs Syslog:**

Distinguishes HTTP access logs from system logs by their line structure.

| Input | vs apache.map | vs syslog.map |
| :---- | :-----------: | :-----------: |
| '127.0.0.1 - admin [12/May/2026:08:30:00 +0000] "GET /dashboard HTTP/1.1" 200 4567' | 0.390322 | 0.000875 |
| 'May 12 08:30:00 laptop kernel: PCI device enabled for power management' | 0.001641 | 0.016910 |

### Parameters

| Flag | Description |
| :--- | :--- |
| `-n <size>` | N-gram size (default 3, max 8) |
| `-h`, `--help` | Show help and usage |
| `-v`, `--version` | Show version |

### Input

Map file: one or more lines of representative text. Stdin: text to score.

### Output

A single line with the score formatted to six decimal places:

0.763264


### Exit codes

| Code | Meaning |
| :--- | :------ |
| 0 | Success |
| 1 | Error (missing map, invalid args, I/O failure, profile build error) |

---

## Public API

### Types

typedef struct kc_tpm kc_tpm_t;


### Status codes

| Symbol | Value |
| :----- | :---- |
| `KC_TPM_OK` | 0 |
| `KC_TPM_ERROR` | -1 |

### Functions

| Function | Returns | Description |
| :------- | :------ | :---------- |
| `kc_tpm_open(void)` | `kc_tpm_t *` | Allocate a new context. Returns NULL on failure. |
| `kc_tpm_build(tpm, map_text, ngram_size)` | `int` | Build an n-gram profile from map text. `ngram_size` must be 1–8. |
| `kc_tpm_score(tpm, input_text)` | `double` | Score input text against the built profile. Returns 0.0–1.0. |
| `kc_tpm_close(tpm)` | `void` | Free the context. Safe on NULL. |

### Lifecycle

kc_tpm_t *t = kc_tpm_open(); kc_tpm_build(t, map_text, 3); double score = kc_tpm_score(t, input_text); kc_tpm_close(t);


---

## Build

Compiled artifacts are generated under `bin/{arch}/{platform}/` for the host
architecture running the build.

make clean && make


### Multiarch Builds

The project is prepared to build artifacts for multiple architectures under
`bin/{arch}/{platform}/`. A plain `make` builds only the current host
architecture, while the targets below build the full matrix or a specific
target.

make all make x86_64/linux make x86_64/windows make i686/linux make i686/windows make aarch64/linux make aarch64/android make armv7/linux make armv7/android make armv7hf/linux make riscv64/linux make powerpc64le/linux make mips/linux make mipsel/linux make mips64el/linux make s390x/linux make loongarch64/linux


---

## License

[![GPLv3](https://www.gnu.org/graphics/gplv3-127x51.png)](https://www.gnu.org/licenses/gpl-3.0.html)

This project is distributed under the **GNU General Public License version 3 (GPLv3)**. 

---

## Repo

**GitHub:** [kaisarcode/tpm.c](https://github.com/kaisarcode/tpm.c)