LLM Bench

Generation Speed

~0 t/s

Prompt Processing Speed

~0 t/s

VRAM Budget

System / Background Apps 0 GB

Chat History Memory 0 GB

AI Brain / Weights 0 GB

Total Runtime Memory 0 GB

System RAM Spill 0 GB

MoE RAM Spill 0 GB

Run Mode

CLI Server

Executable Path

Environment

Model

Quantization

16000 Tokens

8k16k32k64k96k128k256k

VRAM Per GPU (GB)

GPU Count

System RAM (GB)

CPU Physical Cores

Flash Attention Disable KV Offload Disable Prompt Caching Enable Context Shift No MMAP MLock NUMA

Thread Priority

Split Mode

MoE CPU Expert Offload

Host

Port

Parallel Slots

Cache K

Cache V

GPU Bandwidth (GB/s)

RAM Bandwidth (GB/s)

RoPE Scaling

RoPE Freq Base

RoPE Freq Scale

Model Path

Jinja Template

LoRA Path

LoRA Scale

NUMA Mode

Threads Batch

Batch Size

Temp

Top P

Top K

Min P

Repeat Penalty

Mirostat

Benchmark Log

Model	Context	Prompt T/S	Gen T/S	Command