KaisarCode

gguf.c - GGUF utilities

gguf.c is a library and CLI for working with GGUF model files. It parses model headers and metadata, loads tokenizers (GPT-2 BPE, SentencePiece, Unigram), and constructs transformer compute graphs via GGML for Llama, Mistral, Qwen2/2.5/3, Gemma, and GPT-2 architectures. It does not include an inference runtime, KV cache, or LoRA support, which makes it suitable for model inspection, tokenization, and graph prototyping without a full inference engine.
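
As a rough illustration of what header parsing involves, the sketch below reads the fixed-size part of a GGUF header from a byte buffer. It assumes the little-endian layout used by GGUF version 2 and later (4-byte magic "GGUF", 32-bit version, 64-bit tensor count, 64-bit metadata key/value count). The struct and function names are hypothetical and are not part of gguf.h; this is not the library's own parser.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical struct for this sketch; not a type from gguf.h. */
typedef struct {
    uint32_t version;
    uint64_t tensor_count;
    uint64_t metadata_kv_count;
} sketch_gguf_header_t;

/* Read little-endian integers byte by byte so the host byte order does not matter. */
static uint32_t sketch_read_le32(const unsigned char *p) {
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

static uint64_t sketch_read_le64(const unsigned char *p) {
    return (uint64_t)sketch_read_le32(p) |
           ((uint64_t)sketch_read_le32(p + 4) << 32);
}

/* Returns 0 on success, -1 if the buffer is too short or the magic is wrong.
 * Assumes GGUF v2+ (64-bit counts); version 1 files used 32-bit counts. */
int sketch_gguf_parse_header(const unsigned char *buf, size_t len,
                             sketch_gguf_header_t *out) {
    if (buf == NULL || out == NULL || len < 24) return -1;
    if (memcmp(buf, "GGUF", 4) != 0) return -1;
    out->version = sketch_read_le32(buf + 4);
    out->tensor_count = sketch_read_le64(buf + 8);
    out->metadata_kv_count = sketch_read_le64(buf + 16);
    return 0;
}
```

The metadata key/value pairs and tensor descriptors follow this fixed header; reading them is left out to keep the sketch short.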


Supported Architectures

| Architecture | Graph builder |
| --- | --- |
| llama, mistral, mixtral | kc_gguf_build_graph_llama |
| qwen2, qwen2.5, qwen3 | kc_gguf_build_graph_llama |
| gemma | kc_gguf_build_graph_gemma |
| gpt2 | kc_gguf_build_graph_gpt2 |

Quantization Types

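The --quantize / -q operation converts tensors from any supported source type to the requested target type, using F32 as an intermediate when needed. Dequantize-only types are accepted as sources but not as targets.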
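
The F32 round trip can be pictured with a toy example. The sketch below is not the library's code and does not use GGML's real block layouts: it assumes a simplified block of 32 values with a float scale (GGML's legacy types store an F16 scale and pack 4-bit values two per byte). All names are hypothetical. It shows only the routing idea: dequantize the source block to F32, then quantize the F32 values into the target range.

```c
#include <stdint.h>

#define SKETCH_BLOCK 32 /* elements per block in this toy layout */

/* Simplified block: one float scale plus 32 signed integers, unpacked. */
typedef struct {
    float d;
    int8_t q[SKETCH_BLOCK];
} sketch_block_t;

/* Symmetric absmax quantization of one block to integers in [-qmax, qmax]. */
static void sketch_quantize(const float *x, int qmax, sketch_block_t *b) {
    float amax = 0.0f;
    for (int i = 0; i < SKETCH_BLOCK; i++) {
        float a = x[i] < 0.0f ? -x[i] : x[i];
        if (a > amax) amax = a;
    }
    b->d = amax / (float)qmax;
    float inv = (b->d != 0.0f) ? 1.0f / b->d : 0.0f;
    for (int i = 0; i < SKETCH_BLOCK; i++) {
        float v = x[i] * inv;
        /* Round half away from zero without pulling in math.h. */
        b->q[i] = (int8_t)(v >= 0.0f ? v + 0.5f : v - 0.5f);
    }
}

/* Expand one block back to F32. */
static void sketch_dequantize(const sketch_block_t *b, float *x) {
    for (int i = 0; i < SKETCH_BLOCK; i++) x[i] = b->d * (float)b->q[i];
}

/* Convert between two quantized ranges by going through F32. */
void sketch_requantize(const sketch_block_t *src, int dst_qmax,
                       sketch_block_t *dst) {
    float f32[SKETCH_BLOCK];
    sketch_dequantize(src, f32);
    sketch_quantize(f32, dst_qmax, dst);
}
```

Calling sketch_requantize with a block quantized at qmax 127 and a dst_qmax of 7 mimics an 8-bit to 4-bit conversion. Each hop adds rounding error, so the result is slightly less accurate than quantizing the original F32 data directly.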

Float (no quantization)

| Type | Bits/element |
| --- | --- |
| F32 | 32 |
| F16 | 16 |
| BF16 | 16 |

Quantization targets

These types can be used as --quantize targets:

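| Type | Bits/element | Category |
| --- | --- | --- |
| Q1_0 | ~1 | 1-bit |
| Q4_0 | ~4 | Legacy |
| Q4_1 | ~4.5 | Legacy |
| Q5_0 | ~5 | Legacy |
| Q5_1 | ~5.5 | Legacy |
| Q8_0 | ~8 | Legacy |
| Q2_K | ~2 | K-quant |
| Q3_K | ~3 | K-quant |
| Q4_K | ~4 | K-quant |
| Q5_K | ~5 | K-quant |
| Q6_K | ~6 | K-quant |
| IQ2_S | ~2 | I-quant |
| IQ3_XXS | ~3 | I-quant |
| IQ3_S | ~3 | I-quant |
| IQ4_NL | ~4 | I-quant |
| IQ4_XS | ~4 | I-quant |
| TQ1_0 | ~1 | Ternary |
| TQ2_0 | ~2 | Ternary |
| MXFP4 | 4 | 4-bit format |
| NVFP4 | 4 | NV 4-bit format |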

Dequantize-only types

These types exist in GGUF files and can be dequantized to F32/F16, but cannot be used as quantization targets:

| Type | Bits/element | Category |
| --- | --- | --- |
| Q8_1 | ~8.5 | Legacy |
| IQ1_S | ~1 | I-quant |
| IQ1_M | ~1.5 | I-quant |
| IQ2_XXS | ~2 | I-quant |
| IQ2_XS | ~2 | I-quant |
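
To get a feel for what the approximate bits/element figures mean for storage, the following sketch estimates the size of a tensor from its element count. The helper name is hypothetical, and the estimate ignores block alignment, metadata, and any per-file overhead, so treat it as a back-of-the-envelope figure only.

```c
#include <stdint.h>

/* Approximate bytes needed for n_elements at the given bits/element,
 * rounded up to a whole byte. Hypothetical helper, not part of gguf.h. */
uint64_t sketch_tensor_bytes(uint64_t n_elements, double bits_per_element) {
    double bits = (double)n_elements * bits_per_element;
    return (uint64_t)((bits + 7.0) / 8.0);
}
```

For example, a 4096 x 4096 weight matrix (16,777,216 elements) takes 67,108,864 bytes at 32 bits/element and 9,437,184 bytes at 4.5 bits/element.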

CLI

Inspect model metadata, tokenize text, decode token IDs from standard input, or quantize a GGUF file.

Examples

Print model metadata:

./bin/x86_64/linux/gguf model.gguf --info

Tokenize text:

./bin/x86_64/linux/gguf model.gguf --tokenize "Hello world"

Decode token IDs (one per line, from stdin):

./bin/x86_64/linux/gguf model.gguf --detokenize < ids.txt
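
The input format for --detokenize is one decimal token ID per line. The sketch below shows one way such input could be read in C. It is not the CLI's actual parser; the function name, the handling of blank lines, and the error policy are assumptions made for illustration.

```c
#include <errno.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>

/* Skip spaces, tabs, and line endings. */
static const char *sketch_skip_ws(const char *p) {
    while (*p == ' ' || *p == '\t' || *p == '\r' || *p == '\n') p++;
    return p;
}

/* Reads one non-negative decimal token ID per line from `in` into `ids`.
 * Blank lines are skipped. Returns the number of IDs read, or -1 on a
 * malformed line, an out-of-range value, or more than max_ids entries.
 * Lines longer than the 64-byte buffer are not handled in this sketch. */
int sketch_read_token_ids(FILE *in, int *ids, int max_ids) {
    char line[64];
    int n = 0;
    while (fgets(line, sizeof line, in) != NULL) {
        char *end;
        errno = 0;
        long v = strtol(line, &end, 10);
        if (end == line) {
            /* No digits found: accept only whitespace-only lines. */
            if (*sketch_skip_ws(line) != '\0') return -1;
            continue;
        }
        if (errno == ERANGE || v < 0 || v > INT_MAX) return -1;
        if (*sketch_skip_ws(end) != '\0') return -1; /* trailing garbage */
        if (n >= max_ids) return -1;
        ids[n++] = (int)v;
    }
    return n;
}
```

Passing stdin as `in` matches the shell redirection shown above.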

Quantize a GGUF file:

./bin/x86_64/linux/gguf model.gguf -q Q4_K -o model-q4.gguf

Parameters

| Flag | Description |
| --- | --- |
| MODEL (positional) | GGUF model path (required) |
| -i, --info | Print model metadata |
| -t, --tokenize | Tokenize input text, print token IDs |
| -d, --detokenize | Decode token IDs from stdin to text |
| -q, --quantize TYPE | Quantize a GGUF file |
| -o, --output PATH | Output path for quantized file (required with --quantize) |
| -h, --help | Show help and usage |
| -v, --version | Show version |

Public API

#include "gguf.h"

Model loading

int kc_gguf_model_load(kc_gguf_model_t **out, const char *path);
void kc_gguf_model_free(kc_gguf_model_t *m);
const char *kc_gguf_error(const kc_gguf_model_t *m);

kc_gguf_model_t is a flat, exposed struct that holds model dimensions and weight tensor handles. It carries no inference state, KV cache, or LoRA data.

Metadata helpers

uint32_t kc_gguf_get_arch_u32(ctx, arch, field, def);
float   kc_gguf_get_arch_f32(ctx, arch, field, def);
uint32_t kc_gguf_get_kv_u32(ctx, key, def);
float   kc_gguf_get_kv_f32(ctx, key, def);
int     kc_gguf_get_arch(ctx, arch, arch_size);

Graph builders

struct ggml_tensor *kc_gguf_build_graph_impl(cctx, m, n_tokens, gf, embd_out, pos_out, params);
struct ggml_tensor *kc_gguf_build_graph_llama(cctx, m, n_tokens, gf, embd_out, pos_out);
struct ggml_tensor *kc_gguf_build_graph_gemma(cctx, m, n_tokens, gf, embd_out, pos_out);
struct ggml_tensor *kc_gguf_build_graph_gpt2(cctx, m, n_tokens, gf, embd_out, pos_out);

Tokenizer

int         kc_gguf_tokenizer_load(out, gguf, n_vocab, error, error_size);
int         kc_gguf_tokenizer_encode(tok, input, tokens, max_tokens, error, error_size);
const char *kc_gguf_tokenizer_decode(tok, id);
int         kc_gguf_tokenizer_bos(tok);
int         kc_gguf_tokenizer_eos(tok);
int         kc_gguf_tokenizer_add_bos(tok);
int         kc_gguf_tokenizer_unk(tok);
int         kc_gguf_tokenizer_is_eog(tok, id);
void        kc_gguf_tokenizer_free(tok);

Build

Compiled artifacts are generated under bin/{arch}/{platform}/ for the host architecture running the build.

make clean && make

Multiarch Builds

The project is prepared to build artifacts for multiple architectures under bin/{arch}/{platform}/. A plain make builds only the current host architecture.

make all
make x86_64/linux
make x86_64/windows
make i686/linux
make i686/windows
make aarch64/linux
make aarch64/android
make armv7/linux
make armv7/android
make armv7hf/linux
make riscv64/linux
make powerpc64le/linux
make mips/linux
make mipsel/linux
make mips64el/linux
make s390x/linux
make loongarch64/linux

Dependencies

| Path | Description |
| --- | --- |
| lib/ggml/ | Tensor computation library for machine learning |

Beta Notice

This is a beta project tested only on Debian x86_64. It was created out of a personal need for these utilities, but no guarantees are provided regarding its stability or future support. You are free to test it, use it, and modify it as you please.

If you'd like to reach out, you can send an email to [email protected]. Please note that I do not accept pull requests; the goal is to avoid long-term dependency on platforms like GitHub, and I do not maintain fixed infrastructure to guarantee long-term stability for these projects.


Repo

You can download the repository and read the most up-to-date documentation directly from its official source.

GitHub: kaisarcode/gguf.c

License

GPLv3

This project is distributed under the GNU General Public License version 3 (GPLv3).