Zero One Research/PredictLM
The agent-native tabular foundation model · Built in the EU

Calibrated predictions for tabular data, inside the agent stack.

PredictLM is the smallest open-weight tabular foundation model with calibrated uncertainty out of the box. 13M parameters, Apache-2.0, runs on a laptop. Ships as a Python sklearn-style class and a Claude / Cursor / Continue MCP tool — give your LLM agent a real numerical co-processor instead of letting it hallucinate predictions. EU-based and GDPR-native — deployable for healthcare, finance, and public-sector workloads that can't ship data offshore.

  • Parameters: 13M / 26M
  • License: Apache-2.0
  • Uncertainty: Calibrated
  • MCP tool: Day-1 ready
How it works

Three steps. One forward pass.

No training loop. No hyperparameter search. PredictLM treats your labeled rows as context and conditions on them the way a language model conditions on a prompt.

  1. Context

    Pass a small set of labeled rows from your tabular dataset, typically 5 to 500. Mixed numeric and categorical features are handled natively.

  2. In-context inference

    The transformer attends across features and rows simultaneously, building an implicit predictor for your dataset on the fly.

  3. Predict

    Predictions for new rows come out immediately, with calibrated uncertainty: full predictive distributions for regression, class probabilities for classification.
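To make the "labeled rows as context" idea concrete, here is a minimal sketch of prediction without a training loop. It is an analogy only, not PredictLM's architecture: a single softmax attention step over context rows (equivalent to a Nadaraya-Watson kernel smoother), with no learned weights. The function name `in_context_predict` and the `temperature` parameter are hypothetical.

```python
import numpy as np

def in_context_predict(X_ctx, y_ctx, X_new, temperature=1.0):
    """Predict targets for X_new by attending over labeled context rows.

    Toy stand-in for in-context inference: nothing is fitted, the labeled
    rows are only read at prediction time.
    """
    X_ctx = np.asarray(X_ctx, dtype=float)
    y_ctx = np.asarray(y_ctx, dtype=float)
    X_new = np.asarray(X_new, dtype=float)

    # Standardize features using statistics of the context rows only.
    mu = X_ctx.mean(axis=0)
    sigma = X_ctx.std(axis=0) + 1e-8
    Zc = (X_ctx - mu) / sigma
    Zn = (X_new - mu) / sigma

    # Attention scores: negative squared distance from each query to each context row.
    d2 = ((Zn[:, None, :] - Zc[None, :, :]) ** 2).sum(axis=-1)
    scores = -d2 / temperature
    scores -= scores.max(axis=1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)

    # Each prediction is an attention-weighted average of the context labels.
    return weights @ y_ctx

# Synthetic example: 50 labeled rows drawn from y = sin(x).
rng = np.random.default_rng(0)
X_ctx = rng.uniform(-2, 2, size=(50, 1))
y_ctx = np.sin(X_ctx[:, 0])
X_new = np.array([[0.0], [1.0]])
preds = in_context_predict(X_ctx, y_ctx, X_new, temperature=0.05)
```

A real tabular foundation model replaces the fixed distance-based scores with learned attention across both rows and features, but the calling pattern is the same: context in, predictions out, one pass.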

The model family

Two checkpoints. Same interface.

Base prioritizes raw accuracy. Mini is tuned to run on a laptop CPU — and stays statistically tied with Base on classification.

PredictLM

Base 26M

Highest accuracy
  • Parameters: 26M
  • Checkpoint: 105 MB
  • License: Apache-2.0

Evaluation on 25 OpenML datasets

Regression: R² 0.589 mean, 0.755 median

Classification: accuracy 0.685 mean, 0.799 median

Competitive with XGBoost on regression: the point estimate is ahead, but the confidence intervals overlap, so the difference is within sampling noise.

PredictLM

Mini 13M

Recommended · edge / CPU
  • Parameters: 13M
  • Checkpoint: ~54 MB
  • License: Apache-2.0

Versus Base — same 25-dataset benchmark

Classification: tied, delta −0.001, CI [−0.027, +0.029]

Regression: −4 pp, CI [−6.5, −1.5] pp

~95% of Base's quality at half the size. Runs comfortably on CPU.
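The page does not say how these confidence intervals were computed. One common choice for comparing two models on a shared benchmark is a paired bootstrap over datasets, sketched below under that assumption. The function name `paired_bootstrap_ci` is hypothetical, and the per-dataset scores are illustrative placeholders, not PredictLM results.

```python
import numpy as np

def paired_bootstrap_ci(scores_a, scores_b, n_boot=10_000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the mean per-dataset difference (a - b)."""
    a = np.asarray(scores_a, dtype=float)
    b = np.asarray(scores_b, dtype=float)
    if a.shape != b.shape:
        raise ValueError("both models must be scored on the same datasets")

    deltas = a - b  # paired differences, one per dataset
    rng = np.random.default_rng(seed)

    # Resample datasets with replacement and recompute the mean delta each time.
    idx = rng.integers(0, len(deltas), size=(n_boot, len(deltas)))
    boot_means = deltas[idx].mean(axis=1)

    lo, hi = np.quantile(boot_means, [alpha / 2, 1 - alpha / 2])
    return float(deltas.mean()), float(lo), float(hi)

# Illustrative placeholder accuracies on six datasets (NOT real benchmark numbers).
mini = np.array([0.71, 0.64, 0.80, 0.55, 0.92, 0.68])
base = np.array([0.70, 0.66, 0.79, 0.57, 0.91, 0.69])

mean_delta, lo, hi = paired_bootstrap_ci(mini, base)
tied = lo <= 0.0 <= hi  # an interval that contains zero is read as a statistical tie
```

Under this reading, the classification interval straddles zero (a tie), while the regression interval lies entirely below zero (a real but small gap).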

Quickstart

From pip install to predictions in under a minute.

Hugging Face-style from_pretrained loading. No custom training loop, no per-dataset config. Mini runs on CPU.

predict.py (Python)
# pip install predictlm
from predictlm import PredictLM

# One model object — the partner ckpt is downloaded on first .predict()
# and the package runs the published Duo + TTT ensemble under the hood.
model = PredictLM.from_pretrained("zerooneresearch/predictlm-mini-13m")

# Your data: ~50 labeled rows + new rows to predict
# X_train, y_train, X_test as numpy arrays or pandas

# Just .fit().predict(). This default Duo + TTT recipe is the one that scores
# 0.751 cls accuracy / 0.609 reg R² (mean) on the locked 25-dataset OpenML benchmark.
preds = model.fit(X_train, y_train).predict(X_test)

# Need single-model latency? Pass auto_duo=False at load:
# model = PredictLM.from_pretrained(..., auto_duo=False)  # ~0.673 cls
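The quickstart assumes `X_train`, `y_train`, and `X_test` already exist. As a minimal sketch of how they might be prepared, the snippet below draws a ~50-row labeled context from a larger table using NumPy only. The helper `make_context_split` and the synthetic data are hypothetical, and the PredictLM call is left as a comment because it requires the package and a checkpoint download.

```python
import numpy as np

def make_context_split(X, y, n_context=50, seed=0):
    """Randomly pick n_context labeled rows as context; the rest become query rows."""
    X = np.asarray(X)
    y = np.asarray(y)
    if not 0 < n_context < len(X):
        raise ValueError("n_context must be between 1 and len(X) - 1")

    rng = np.random.default_rng(seed)
    perm = rng.permutation(len(X))
    ctx, qry = perm[:n_context], perm[n_context:]
    return X[ctx], y[ctx], X[qry], y[qry]

# Synthetic binary-classification table: 200 rows, 4 numeric features.
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 4))
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)

X_train, y_train, X_test, y_test = make_context_split(X, y, n_context=50)

# With the package installed, the quickstart call would then be:
# preds = model.fit(X_train, y_train).predict(X_test)
```

Holding back `y_test` makes it possible to score the predictions afterwards.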
Use cases

Built for the spots where traditional ML breaks down.

Tabular ML without training

Drop-in regression or classification on a brand-new dataset in seconds. Skip the hyperparameter search, skip the cross-validation harness — just call predict.

Small-data settings

Sweet spot is roughly 10–500 labeled rows — the regime where boosted trees often overfit and deep models simply don't converge.

Calibrated uncertainty

Our calibrated regression head returns a full predictive distribution over the target — not just a point estimate. Useful for risk-aware downstream decisions.
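One way to check what "calibrated" means in practice is empirical coverage: a nominal 90% prediction interval should contain the true target about 90% of the time. The sketch below assumes the predictive distribution is summarized as a Gaussian mean and standard deviation per row. That summary, the function name `interval_coverage`, and the synthetic data are all assumptions; the page does not document how PredictLM exposes its distributions.

```python
import numpy as np
from statistics import NormalDist

def interval_coverage(y_true, pred_mean, pred_std, level=0.9):
    """Fraction of targets that fall inside the central `level` Gaussian interval."""
    z = NormalDist().inv_cdf(0.5 + level / 2)  # e.g. about 1.645 for level=0.9
    lower = pred_mean - z * pred_std
    upper = pred_mean + z * pred_std
    inside = (y_true >= lower) & (y_true <= upper)
    return float(inside.mean())

# Synthetic predictions that are calibrated by construction.
rng = np.random.default_rng(0)
n = 5000
pred_mean = rng.normal(size=n)
pred_std = rng.uniform(0.5, 2.0, size=n)
y_true = pred_mean + pred_std * rng.normal(size=n)

coverage = interval_coverage(y_true, pred_mean, pred_std, level=0.9)

# Halving the reported std simulates an overconfident model: coverage drops well below 0.9.
overconfident = interval_coverage(y_true, pred_mean, 0.5 * pred_std, level=0.9)
```

For risk-aware decisions, the same interval bounds can drive a rule such as acting only when the whole interval sits on one side of a threshold.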

Evaluation

Recipe comparison

Mean and median performance across the locked 25-dataset OpenML benchmark (10 regression + 10 classification, fair-set filter n_features ≤ 128, seed = 42). The PredictLM weights are the same in every row below — only the inference recipe changes.

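| Model | Params | Reg. R² (mean) | Reg. R² (median) | Cls. acc (mean) | Cls. acc (median) |
|---|---|---|---|---|---|
| PredictLM-Mini (zero-tune) | 13M | 0.536 | 0.657 | 0.673 | 0.747 |
| PredictLM-Base (zero-tune) | 26M | 0.589 | **0.755** | 0.685 | **0.799** |
| Mini + TTT | 13M | 0.595 | 0.661 | 0.742 | 0.733 |
| Base + TTT | 26M | 0.608 | 0.671 | 0.748 | 0.756 |
| Duo + TTT (w=0.40) | 13M + 26M | **0.609** | 0.673 | **0.751** | 0.763 |
| XGBoost (baseline) | | 0.561 | 0.744 | 0.679 | 0.793 |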

Bold entries mark the per-column best. The three TTT rows use the same Mini and Base weights as the two zero-tune rows; only the inference recipe differs. For each task, TTT applies ~15 inner Adam steps of self-supervised fine-tuning on the user's in-context examples.
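The "Duo (w=0.40)" recipe combines the Mini and Base models, but the page does not state how. A minimal sketch, assuming a convex blend of class probabilities in which `w` is the weight on Mini and `1 - w` the weight on Base (the direction of the weight is a guess, and `blend_probabilities` is a hypothetical name):

```python
import numpy as np

def blend_probabilities(p_mini, p_base, w=0.40):
    """Convex blend of two models' class probabilities.

    ASSUMPTION: w weights the Mini model and (1 - w) the Base model.
    """
    p_mini = np.asarray(p_mini, dtype=float)
    p_base = np.asarray(p_base, dtype=float)
    if p_mini.shape != p_base.shape:
        raise ValueError("probability arrays must have the same shape")
    if not 0.0 <= w <= 1.0:
        raise ValueError("w must lie in [0, 1]")

    blended = w * p_mini + (1.0 - w) * p_base
    # Renormalize so each row still sums to 1 despite rounding in the inputs.
    return blended / blended.sum(axis=1, keepdims=True)

# Two rows, two classes, illustrative probabilities only.
p_mini = np.array([[0.7, 0.3], [0.4, 0.6]])
p_base = np.array([[0.6, 0.4], [0.2, 0.8]])

p_duo = blend_probabilities(p_mini, p_base, w=0.40)
labels = p_duo.argmax(axis=1)
```

For regression, the analogous blend would average the two models' point predictions with the same weights.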

Citation

Citing PredictLM.

If PredictLM helps your work, a citation is appreciated.

predictlm.bib (BibTeX)
@misc{predictlm2026,
  author = {Svoboda, M. and collaborators},
  title  = {PredictLM: Tabular Foundation Models for In-Context Prediction},
  year   = {2026},
  url    = {https://huggingface.co/zerooneresearch}
}