gguf.c - GGUF utilities
gguf.c is a library and CLI for working with GGUF model files. It parses model headers and metadata, loads tokenizers (GPT-2 BPE, SentencePiece, Unigram), and constructs transformer compute graphs via GGML for Llama, Mistral, Qwen2/2.5/3, Gemma, and GPT-2 architectures. It provides no inference runtime, KV cache, or LoRA support, which makes it suitable for model inspection, tokenization, and graph prototyping without a full inference engine.
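To make "parses model headers" concrete, here is a minimal sketch of reading the fixed-size GGUF header from a byte buffer. It assumes a little-endian GGUF file of version 2 or later, where the magic bytes `GGUF` are followed by a 32-bit version and two 64-bit counts. The struct and function names are hypothetical and are not part of gguf.h; the library's own parser may work differently.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical struct for this sketch; not part of gguf.h. */
typedef struct {
    uint32_t version;       /* format version */
    uint64_t tensor_count;  /* number of tensors in the file */
    uint64_t kv_count;      /* number of metadata key/value pairs */
} sketch_gguf_header_t;

/* Read an n-byte little-endian unsigned integer (n <= 8). */
static uint64_t sketch_read_le(const unsigned char *p, size_t n) {
    uint64_t v = 0;
    for (size_t i = 0; i < n; i++) {
        v |= (uint64_t)p[i] << (8 * i);
    }
    return v;
}

/* Parse the 24-byte header. Returns 0 on success, -1 on error. */
static int sketch_parse_gguf_header(const unsigned char *buf, size_t len,
                                    sketch_gguf_header_t *out) {
    if (buf == NULL || out == NULL || len < 24) return -1;
    if (memcmp(buf, "GGUF", 4) != 0) return -1;  /* wrong magic */
    out->version = (uint32_t)sketch_read_le(buf + 4, 4);
    out->tensor_count = sketch_read_le(buf + 8, 8);
    out->kv_count = sketch_read_le(buf + 16, 8);
    return 0;
}
```

Reading byte by byte keeps the sketch independent of host endianness and alignment, at the cost of a few extra shifts.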
Supported Architectures
| Architecture | Graph builder |
|---|---|
| llama, mistral, mixtral | kc_gguf_build_graph_llama |
| qwen2, qwen2.5, qwen3 | kc_gguf_build_graph_llama |
| gemma | kc_gguf_build_graph_gemma |
| gpt2 | kc_gguf_build_graph_gpt2 |
Quantization Types
The --quantize / -q operation converts between supported types, using F32 as an intermediate when needed. Dequantize-only types are valid as sources but not as targets.
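As a rough illustration of what passing through F32 involves, the sketch below quantizes one block of 32 floats to 8-bit integers with a shared scale and then expands it back. It follows the general shape of block formats such as Q8_0, but it is not the gguf.c or GGML implementation: the struct, the function names, and the use of a plain float scale are assumptions made for this example.

```c
#include <stdint.h>

#define SKETCH_BLOCK 32  /* elements per block in this sketch */

/* Hypothetical block layout: one scale plus 32 signed 8-bit values. */
typedef struct {
    float d;                  /* per-block scale (kept as float here for simplicity) */
    int8_t q[SKETCH_BLOCK];   /* quantized values in [-127, 127] */
} sketch_block_q8_t;

/* Quantize 32 floats: scale so the largest magnitude maps to 127. */
static void sketch_quantize_block(const float *x, sketch_block_q8_t *b) {
    float amax = 0.0f;
    for (int i = 0; i < SKETCH_BLOCK; i++) {
        float a = x[i] < 0.0f ? -x[i] : x[i];
        if (a > amax) amax = a;
    }
    b->d = amax / 127.0f;
    float inv = (b->d != 0.0f) ? 1.0f / b->d : 0.0f;
    for (int i = 0; i < SKETCH_BLOCK; i++) {
        float v = x[i] * inv;
        /* round half away from zero without needing libm */
        b->q[i] = (int8_t)(v >= 0.0f ? v + 0.5f : v - 0.5f);
    }
}

/* Dequantize back to F32: multiply each integer by the block scale. */
static void sketch_dequantize_block(const sketch_block_q8_t *b, float *y) {
    for (int i = 0; i < SKETCH_BLOCK; i++) {
        y[i] = (float)b->q[i] * b->d;
    }
}
```

Converting between two quantized types can then be pictured as dequantizing each block to F32 and re-quantizing into the target layout, with a reconstruction error per element of about half the block scale in this scheme.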
Float (no quantization)
| Type | Bits/element |
|---|---|
| F32 | 32 |
| F16 | 16 |
| BF16 | 16 |
Quantization targets
These types can be used as --quantize targets:
| Type | Bits/element | Category |
|---|---|---|
| Q1_0 | ~1 | 1-bit |
| Q4_0 | ~4 | Legacy |
| Q4_1 | ~4.5 | Legacy |
| Q5_0 | ~5 | Legacy |
| Q5_1 | ~5.5 | Legacy |
| Q8_0 | ~8 | Legacy |
| Q2_K | ~2 | K-quant |
| Q3_K | ~3 | K-quant |
| Q4_K | ~4 | K-quant |
| Q5_K | ~5 | K-quant |
| Q6_K | ~6 | K-quant |
| IQ2_S | ~2 | I-quant |
| IQ3_XXS | ~3 | I-quant |
| IQ3_S | ~3 | I-quant |
| IQ4_NL | ~4 | I-quant |
| IQ4_XS | ~4 | I-quant |
| TQ1_0 | ~1 | Ternary |
| TQ2_0 | ~2 | Ternary |
| MXFP4 | 4 | 4-bit format |
| NVFP4 | 4 | NV 4-bit format |
Dequantize-only types
These types exist in GGUF files and can be dequantized to F32/F16, but cannot be used as quantization targets:
| Type | Bits/element | Category |
|---|---|---|
| Q8_1 | ~8.5 | Legacy |
| IQ1_S | ~1 | I-quant |
| IQ1_M | ~1.5 | I-quant |
| IQ2_XXS | ~2 | I-quant |
| IQ2_XS | ~2 | I-quant |
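The bits/element columns above can be turned into a back-of-the-envelope size estimate. The helper below is a hypothetical sketch, not part of gguf.c: it multiplies an element count by a nominal bits-per-element figure and rounds up to whole bytes. Because the table values are approximate and block formats also store per-block scales, real tensor sizes will differ somewhat.

```c
#include <stdint.h>

/* Estimate storage in bytes for n_elements at a nominal bits-per-element.
   Hypothetical helper for illustration; rounds up to a whole byte. */
static uint64_t sketch_estimate_bytes(uint64_t n_elements, double bits_per_element) {
    double bits = (double)n_elements * bits_per_element;
    return (uint64_t)((bits + 7.0) / 8.0);
}
```

For example, one million elements at a nominal 4 bits each come to 500000 bytes, against 4000000 bytes for the same tensor in F32.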
CLI
Inspect model metadata, tokenize text, decode token IDs from standard input, or quantize a GGUF file.
Examples
Print model metadata:
./bin/x86_64/linux/gguf model.gguf --info
Tokenize text:
./bin/x86_64/linux/gguf model.gguf --tokenize "Hello world"
Decode token IDs (one per line, from stdin):
./bin/x86_64/linux/gguf model.gguf --detokenize < ids.txt
Quantize a GGUF file:
./bin/x86_64/linux/gguf model.gguf -q Q4_K -o model-q4.gguf
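The --detokenize example reads token IDs one per line. As a sketch of how such input could be parsed, the function below converts a newline-separated text buffer into an array of IDs using `strtol`. The function name, the choice to reject negative or malformed values, and the return convention are assumptions for illustration; the CLI's actual handling of bad input is not documented here.

```c
#include <errno.h>
#include <limits.h>
#include <stddef.h>
#include <stdlib.h>

/* Parse whitespace-separated decimal IDs (typically one per line) into ids[].
   Returns the number of IDs parsed, or -1 on malformed input, a negative or
   out-of-range value, or more than max IDs. Hypothetical helper. */
static int sketch_parse_ids(const char *text, int *ids, int max) {
    int n = 0;
    const char *p = text;
    while (*p != '\0') {
        char *end;
        errno = 0;
        long v = strtol(p, &end, 10);  /* skips leading whitespace and newlines */
        if (end == p) {
            /* No digits found: accept only trailing whitespace. */
            while (*p == ' ' || *p == '\t' || *p == '\r' || *p == '\n') p++;
            if (*p != '\0') return -1;
            break;
        }
        if (errno == ERANGE || v < 0 || v > INT_MAX) return -1;
        if (n >= max) return -1;
        ids[n++] = (int)v;
        p = end;
    }
    return n;
}
```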
Parameters
| Flag | Description |
|---|---|
| MODEL (positional) | GGUF model path (required) |
| -i, --info | Print model metadata |
| -t, --tokenize | Tokenize input text, print token IDs |
| -d, --detokenize | Decode token IDs from stdin to text |
| -q, --quantize TYPE | Quantize a GGUF file |
| -o, --output PATH | Output path for quantized file (required with --quantize) |
| -h, --help | Show help and usage |
| -v, --version | Show version |
Public API
#include "gguf.h"
Model loading
int kc_gguf_model_load(kc_gguf_model_t **out, const char *path);
void kc_gguf_model_free(kc_gguf_model_t *m);
const char *kc_gguf_error(const kc_gguf_model_t *m);
kc_gguf_model_t is a flat, publicly exposed struct holding model dimensions and weight tensor handles. It carries no inference state, KV cache, or LoRA data.
Metadata helpers
uint32_t kc_gguf_get_arch_u32(ctx, arch, field, def);
float kc_gguf_get_arch_f32(ctx, arch, field, def);
uint32_t kc_gguf_get_kv_u32(ctx, key, def);
float kc_gguf_get_kv_f32(ctx, key, def);
int kc_gguf_get_arch(ctx, arch, arch_size);
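GGUF metadata conventionally namespaces architecture-specific values as `<arch>.<field>`, for example `llama.block_count`. The sketch below shows how an arch/field pair could be combined into such a key before a lookup. That the `kc_gguf_get_arch_*` helpers compose keys this way is an assumption, and `sketch_arch_key` is a hypothetical name, not part of gguf.h.

```c
#include <stddef.h>
#include <stdio.h>

/* Compose "<arch>.<field>" into buf.
   Returns 0 on success, -1 if the key does not fit or formatting fails. */
static int sketch_arch_key(char *buf, size_t size, const char *arch, const char *field) {
    int n = snprintf(buf, size, "%s.%s", arch, field);
    return (n < 0 || (size_t)n >= size) ? -1 : 0;
}
```

Under that assumption, a call such as kc_gguf_get_arch_u32(ctx, "llama", "block_count", 0) would look up the key `llama.block_count` and return the default when it is absent.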
Graph builders
struct ggml_tensor *kc_gguf_build_graph_impl(cctx, m, n_tokens, gf, embd_out, pos_out, params);
struct ggml_tensor *kc_gguf_build_graph_llama(cctx, m, n_tokens, gf, embd_out, pos_out);
struct ggml_tensor *kc_gguf_build_graph_gemma(cctx, m, n_tokens, gf, embd_out, pos_out);
struct ggml_tensor *kc_gguf_build_graph_gpt2(cctx, m, n_tokens, gf, embd_out, pos_out);
Tokenizer
int kc_gguf_tokenizer_load(out, gguf, n_vocab, error, error_size);
int kc_gguf_tokenizer_encode(tok, input, tokens, max_tokens, error, error_size);
const char *kc_gguf_tokenizer_decode(tok, id);
int kc_gguf_tokenizer_bos(tok);
int kc_gguf_tokenizer_eos(tok);
int kc_gguf_tokenizer_add_bos(tok);
int kc_gguf_tokenizer_unk(tok);
int kc_gguf_tokenizer_is_eog(tok, id);
void kc_gguf_tokenizer_free(tok);
Build
Compiled artifacts are generated under bin/{arch}/{platform}/ for the host architecture running the build.
make clean && make
Multiarch Builds
The project can build artifacts for multiple architectures under bin/{arch}/{platform}/. A plain make builds only the current host architecture; the targets below build the others.
make all
make x86_64/linux
make x86_64/windows
make i686/linux
make i686/windows
make aarch64/linux
make aarch64/android
make armv7/linux
make armv7/android
make armv7hf/linux
make riscv64/linux
make powerpc64le/linux
make mips/linux
make mipsel/linux
make mips64el/linux
make s390x/linux
make loongarch64/linux
Dependencies
| Path | Description |
|---|---|
| lib/ggml/ | Tensor computation library for machine learning |
Beta Notice
This is a beta project tested only on Debian x86_64. It was created out of a personal need for these utilities, but no guarantees are provided regarding its stability or future support. You are free to test it, use it, and modify it as you please.
If you'd like to reach out, you can send an email to [email protected]. Please note that I do not accept pull requests; the goal is to avoid long-term dependency on platforms like GitHub, and I do not maintain fixed infrastructure to guarantee long-term stability for these projects.
Repo
You can download the repository and read the most up-to-date documentation directly from its official source.
GitHub: kaisarcode/gguf.c
License
This project is distributed under the GNU General Public License version 3 (GPLv3).
