Recommend

Which setup should I run?

Setup

Get ready to run

One path: account, runner, first benchmark.

Overview

Find the setup to run next, then inspect the evidence behind it.

Start with Recommend for an answer-first setup choice. Explore public evidence or queue a benchmark when you already know what you want to inspect.

How evidence works

Evidence and setup status

Checking evidence and runner readiness.

Active runs: syncing
Verified results: usable in decisions
Open blockers: checking

Sign in: Account attached
Pair a runner: Local execution ready
Choose evidence: Recommendation ready
Run or compare: Next action

Recommend

Which setup should I run?

Recent runs

Tracked execution

More tools

Exports and community

Open exports and contributor activity

Download evidence snapshots or inspect community activity.

Top contributors

Community evidence stays cumulative and exportable.

Recommendations

Find the setup to run

Why this answer? Plot, table, caveats, and next benchmark.
Tradeoffs ready: Open for plot, table, caveats, and next run.
Question and filters: Known-good questions first, with light scope edits.
Advanced filters
Download data

Explore

Inspect families, setup matches, and evidence

Historical Results

Recent benchmark evidence

Model | Backend | Use Case | TTFT | Tok/s | Hardware | Capability | Verification

Compare

Choose between families, variants, and quants

Preset views

Start from a useful model-choice stance, then refine the exact variants or inspect individual runs.

Individual run comparison

Result

Family Explorer

Branches, quants, and nearby matches

Download data

Build

Build the next evidence run

Why run this benchmark

Run the benchmark that would change the answer.

Start from Recommend when possible; otherwise choose a model and evidence lane below.

1. Model
2. Benchmarks
3. Queue

Model

Choose the model first. The goal filter only narrows suggested starters and benchmark hints.

Use public artifacts without connecting Hugging Face.

Benchmark scope

Choose the evidence this run should produce.

Benchmark groups

Adjust related checks together.

Individual checks

Exact checks for this run.

Run details

Optional context for history.

Advanced overrides
Artifact and runtime

Only adjust these if you need an override.

Ontology hints

Most users should keep the inferred values.

Run plan JSON

Inspect or export the prepared plan.

Run plan

Ready to queue after preparation.

No run plan prepared yet.

Run Status

Active and recent runs

Recent runs

Live timeline

Saved plans

Reusable runs

My Runs

Contributor activity