LLM Bench
Reset State
Generation Speed
~0 t/s
Prompt Processing Speed
~0 t/s
VRAM Budget
System / Background Apps
0 GB
Chat History Memory
0 GB
AI Brain / Weights
0 GB
Total Runtime Memory
0 GB
System RAM Spill
0 GB
MoE RAM Spill
0 GB
Run Mode
CLI
Server
Executable Path
Environment
GPU
CPU
Model
Quantization
Thinking Model Controls
Disable Reasoning
16000 Tokens
8k
16k
32k
64k
96k
128k
256k
VRAM Per GPU (GB)
GPU Count
System RAM (GB)
CPU Physical Cores
Flash Attention
Disable KV Offload
Disable Prompt Caching
Enable Context Shift
No MMAP
MLock
NUMA
Thread Priority
Normal
Medium
High
Realtime
Split Mode
Layer (Default)
Row
Tensor
MoE CPU Expert Offload
Host
Port
Parallel Slots
Cache K
f16
q8_0
q4_0
Cache V
f16
q8_0
q4_0
GPU Bandwidth (GB/s)
RAM Bandwidth (GB/s)
RoPE Scaling
None
Linear
Yarn
RoPE Freq Base
RoPE Freq Scale
Model Path
Jinja Template
LoRA Path
LoRA Scale
NUMA Mode
Distribute
Isolate
Numactl
Threads Batch
Batch Size
Temp
Top P
Top K
Min P
Repeat Penalty
Mirostat
0
1
2
Copy Command
Log Current Estimate
Benchmark Log
Copy Results
Model | Context | Prompt T/S | Gen T/S | Command