The agent-native tabular foundation model

Calibrated predictions for tabular data, inside the agent stack.

PredictLM is the smallest open-weight tabular foundation model with calibrated uncertainty out of the box. 13M parameters, Apache-2.0, runs on a laptop. Ships as a Python sklearn-style class and a Claude / Cursor / Continue MCP tool — give your LLM agent a real numerical co-processor instead of letting it hallucinate predictions. Deployable on your own infrastructure for healthcare, finance, and public-sector workloads that can't ship data offshore.


Parameters: 13M / 26M
License: Apache-2.0
Uncertainty: Calibrated
MCP tool: Day-1 ready

How it works

Three steps. One forward pass.

No training loop. No hyperparameter search. PredictLM treats your labeled rows as context and conditions on them the way a language model conditions on a prompt.

01 · Context
Pass a small set of labeled rows from your tabular dataset, typically 5 to 500. Mixed numeric and categorical features are handled natively.

02 · In-context inference
The transformer attends across features and rows simultaneously, building an implicit predictor for your dataset on the fly.

03 · Predict
Predictions for new rows come out immediately, with calibrated uncertainty: full predictive distributions for regression, class probabilities for classification.
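
To make the three steps concrete, here is a toy stand-in for in-context prediction. It is not PredictLM's architecture: it is a single softmax-attention read over the labeled context rows, which is equivalent to Nadaraya-Watson kernel regression. The function name `in_context_predict` and the `temperature` parameter are hypothetical. The sketch only illustrates the idea that labeled rows act like a prompt and no weights are updated.

```python
import math

def in_context_predict(context_X, context_y, query_x, temperature=1.0):
    """Toy in-context regressor: softmax attention over labeled context rows.

    Illustrative only; this is NOT PredictLM's architecture.
    """
    # Attention score = negative squared distance between the query and each context row.
    scores = [
        -sum((q - c) ** 2 for q, c in zip(query_x, row)) / temperature
        for row in context_X
    ]
    # Numerically stable softmax over the context rows.
    top = max(scores)
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    # Prediction = attention-weighted average of the context labels.
    return sum(w * y for w, y in zip(weights, context_y)) / total

# Step 1: labeled context rows. Steps 2 and 3: one pass, no training loop.
context_X = [[0.0], [1.0], [2.0], [3.0]]
context_y = [0.0, 2.0, 4.0, 6.0]
pred = in_context_predict(context_X, context_y, [1.5])
```

A lower `temperature` makes the read sharper, so the prediction leans more heavily on the nearest context rows.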

The model family

Two checkpoints. Same interface.

Base prioritizes raw accuracy. Mini is tuned to run on a laptop CPU — and stays statistically tied with Base on classification.

PredictLM

Base 26M

Highest accuracy

Parameters: 26M
Checkpoint: 105 MB
License: Apache-2.0

Evaluation on 25 OpenML datasets

Regression

R² 0.589 mean

R² 0.755 median

Classification

acc 0.685 mean

acc 0.799 median

Competitive with XGBoost on regression: the point estimate is ahead (mean R² 0.589 vs. 0.561), but the confidence intervals overlap, so the gap is within sampling noise.

Model card on Hugging Face

PredictLM

Mini 13M

Recommended · edge / CPU

Parameters: 13M
Checkpoint: ~54 MB
License: Apache-2.0

Versus Base — same 25-dataset benchmark

Classification

Tied delta −0.001

CI [−0.027, +0.029]

Regression

−4 pp R²

CI [−6.5, −1.5]

~95% of Base's quality at half the size. Runs comfortably on CPU.

Model card on Hugging Face
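
The cards above report confidence intervals for the Mini-versus-Base deltas, but the page does not say how they were computed. One common choice is a paired percentile bootstrap over per-dataset score differences. The sketch below assumes that method; the helper name `paired_bootstrap_ci` is hypothetical and the per-dataset scores are made-up placeholders, not benchmark results.

```python
import random

def paired_bootstrap_ci(scores_a, scores_b, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the mean paired difference (a - b)."""
    if len(scores_a) != len(scores_b):
        raise ValueError("scores must be paired per dataset")
    deltas = [a - b for a, b in zip(scores_a, scores_b)]
    rng = random.Random(seed)
    n = len(deltas)
    means = []
    for _ in range(n_resamples):
        # Resample datasets with replacement, keeping each pair together.
        sample = [deltas[rng.randrange(n)] for _ in range(n)]
        means.append(sum(sample) / n)
    means.sort()
    lo = means[int((alpha / 2) * n_resamples)]
    hi = means[int((1 - alpha / 2) * n_resamples) - 1]
    return sum(deltas) / n, lo, hi

# Placeholder per-dataset accuracies (hypothetical, for illustration only).
mini = [0.71, 0.80, 0.65, 0.90, 0.58]
base = [0.70, 0.82, 0.66, 0.89, 0.59]
mean_delta, lo, hi = paired_bootstrap_ci(mini, base)
tied = lo <= 0.0 <= hi  # an interval containing 0 means no detectable difference
```

Under this reading, "tied" means the interval for the delta straddles zero, as the classification interval [−0.027, +0.029] does, while the regression interval [−6.5, −1.5] lies entirely below zero.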

Quickstart

From `pip install` to predictions in under a minute.

Hugging Face-style `from_pretrained` loading. No custom training loop, no per-dataset config. Mini runs on CPU.

predict.py (Python)

# pip install predictlm
from predictlm import PredictLM

# One model object. The partner checkpoint is downloaded on the first
# .predict() call, and the package runs the published Duo + TTT ensemble
# under the hood.
model = PredictLM.from_pretrained("zerooneresearch/predictlm-mini-13m")

# Your data: ~50 labeled rows plus the new rows to predict.
# X_train, y_train, X_test are NumPy arrays or pandas objects.

# Just .fit().predict(). This default recipe is the one that scores
# 0.751 mean classification accuracy / 0.609 mean regression R²
# on the locked 25-dataset OpenML benchmark.
preds = model.fit(X_train, y_train).predict(X_test)

# Need single-model latency? Pass auto_duo=False at load:
# model = PredictLM.from_pretrained(..., auto_duo=False)  # ~0.673 mean cls. accuracy

Use cases

Built for the spots where traditional ML breaks down.

Tabular ML without training

Drop-in regression or classification on a brand-new dataset in seconds. Skip the hyperparameter search, skip the cross-validation harness — just call predict.

Small-data settings

Sweet spot is roughly 10–500 labeled rows — the regime where boosted trees often overfit and deep models simply don't converge.

Calibrated uncertainty

Our calibrated regression head returns a full predictive distribution over the target — not just a point estimate. Useful for risk-aware downstream decisions.
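
One way to sanity-check calibration on held-out rows is to measure empirical interval coverage: a nominal 90% predictive interval should contain the true target about 90% of the time. The sketch below assumes you already have lower and upper interval bounds for each row. How PredictLM exposes them is not shown on this page, so the inputs are plain lists, the helper name `interval_coverage` is hypothetical, and the numbers are made up.

```python
def interval_coverage(y_true, lower, upper):
    """Fraction of true targets that fall inside [lower, upper]."""
    if not (len(y_true) == len(lower) == len(upper)):
        raise ValueError("inputs must have the same length")
    hits = sum(1 for y, lo, hi in zip(y_true, lower, upper) if lo <= y <= hi)
    return hits / len(y_true)

# Made-up held-out targets and interval bounds, for illustration only.
y_true = [3.1, 4.8, 2.2, 7.5, 5.0]
lower = [2.5, 4.0, 2.5, 6.0, 4.1]
upper = [3.9, 5.5, 3.4, 8.2, 5.9]
coverage = interval_coverage(y_true, lower, upper)  # 4 of 5 rows are covered
```

Coverage well below the nominal level suggests overconfident intervals; coverage well above it suggests intervals that are wider than they need to be.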

Evaluation

Recipe comparison

Mean and median performance across the locked 25-dataset OpenML benchmark (10 regression + 10 classification, fair-set filter n_features ≤ 128, seed = 42). The PredictLM weights are the same in every row below — only the inference recipe changes.

Model	Params	Reg. R² (mean)	Reg. R² (median)	Cls. acc (mean)	Cls. acc (median)
PredictLM-Mini (zero-tune)	13M	0.536	0.657	0.673	0.747
PredictLM-Base (zero-tune)	26M	0.589	0.755	0.685	0.799
Mini + TTT	13M	0.595	0.661	0.742	0.733
Base + TTT	26M	0.608	0.671	0.748	0.756
Duo + TTT (w=0.40)	13M + 26M	0.609	0.673	0.751	0.763
XGBoost (baseline)	—	0.561	0.744	0.679	0.793

Per-column best: Duo + TTT leads on mean R² (0.609) and mean accuracy (0.751); zero-tune Base leads on median R² (0.755) and median accuracy (0.799). The three TTT rows use the same Mini and Base weights as the two zero-tune rows; only the inference recipe differs. TTT applies ~15 inner Adam steps of self-supervised fine-tuning on the user's in-context examples per task.
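
The Duo row blends the two checkpoints with weight w = 0.40. The page does not say which checkpoint receives that weight, or whether blending happens on probabilities or logits, so the sketch below makes assumptions: `w` is applied to Mini, `1 - w` to Base, and the blend is a convex combination of class probabilities. The function name `blend_probs` is hypothetical.

```python
def blend_probs(probs_mini, probs_base, w=0.40):
    """Convex combination of two class-probability vectors.

    Assumption: w weights Mini and (1 - w) weights Base; the source
    does not state which checkpoint the 0.40 applies to.
    """
    if len(probs_mini) != len(probs_base):
        raise ValueError("probability vectors must have the same length")
    return [w * m + (1.0 - w) * b for m, b in zip(probs_mini, probs_base)]

# Hypothetical per-class probabilities for a single row.
mini = [0.70, 0.20, 0.10]
base = [0.40, 0.50, 0.10]
blended = blend_probs(mini, base)
predicted_class = max(range(len(blended)), key=blended.__getitem__)
```

Because the weights sum to one, the blended vector is still a valid probability distribution whenever both inputs are.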

Citation

Citing PredictLM.

If PredictLM helps your work, a citation is appreciated.

predictlm.bib (BibTeX)

@misc{predictlm2026,
  author = {Svoboda, M. and collaborators},
  title  = {PredictLM: Tabular Foundation Models for In-Context Prediction},
  year   = {2026},
  url    = {https://huggingface.co/zerooneresearch}
}
