Bandit Lab - Multi-Armed Bandit Simulator

SIMULATION SETTINGS

2-10 slots

pulls

Random
Baseline UCB ε-Greedy Thompson
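
As a rough illustration of how the listed strategies choose an arm, here is a minimal Python sketch. It is not the simulator's own code. The function names, the use of a uniformly random selector as the baseline, the UCB1-style bonus c * sqrt(2 ln t / n), the Beta prior for Thompson sampling, and all parameter values are assumptions made for illustration.

```python
import math
import random

def select_random(counts, sums, rng):
    # Assumed baseline: pick any arm uniformly at random.
    return rng.randrange(len(counts))

def select_epsilon_greedy(counts, sums, rng, epsilon=0.1):
    # With probability epsilon explore; otherwise take the best empirical mean.
    if rng.random() < epsilon:
        return rng.randrange(len(counts))
    means = [s / n if n else 0.0 for s, n in zip(sums, counts)]
    return max(range(len(counts)), key=means.__getitem__)

def select_ucb1(counts, sums, rng, c=1.0):
    # Pull every arm once, then maximise mean + c * sqrt(2 ln t / n).
    for arm, n in enumerate(counts):
        if n == 0:
            return arm
    t = sum(counts)
    scores = [s / n + c * math.sqrt(2 * math.log(t) / n)
              for s, n in zip(sums, counts)]
    return max(range(len(counts)), key=scores.__getitem__)

def select_thompson(counts, sums, rng, prior=1.0):
    # Sample each arm's success rate from Beta(prior + wins, prior + losses).
    samples = [rng.betavariate(prior + s, prior + n - s)
               for s, n in zip(sums, counts)]
    return max(range(len(counts)), key=samples.__getitem__)

def run(probs, select, pulls=2000, seed=0):
    # Simulate Bernoulli arms with hidden probabilities `probs`.
    rng = random.Random(seed)
    counts = [0] * len(probs)   # pulls per arm
    sums = [0] * len(probs)     # rewards per arm
    for _ in range(pulls):
        arm = select(counts, sums, rng)
        reward = 1 if rng.random() < probs[arm] else 0
        counts[arm] += 1
        sums[arm] += reward
    return counts, sum(sums)

if __name__ == "__main__":
    for select in (select_random, select_epsilon_greedy,
                   select_ucb1, select_thompson):
        counts, total = run([0.2, 0.5, 0.8], select)
        print(select.__name__, counts, total)
```

Each selector only sees pull counts and reward sums, so the true probabilities stay hidden from the strategy, as they would in the simulator.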

1.0

0.10

1.0

Slot Machines (Arms)

Each arm has a fixed but initially unknown reward probability
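
One way to picture such an arm is as a Bernoulli reward source: each pull pays 1 with a hidden probability and 0 otherwise. The sketch below assumes this binary-reward model; the class name `BernoulliArm`, the helper `make_arms`, and the example probabilities are hypothetical and not taken from the simulator.

```python
import random

class BernoulliArm:
    """One slot machine: pays 1 with fixed probability p, else 0."""

    def __init__(self, p, rng):
        self.p = p        # hidden from the player in a real run
        self.rng = rng

    def pull(self):
        return 1 if self.rng.random() < self.p else 0

def make_arms(probs, seed=0):
    # All arms share one seeded generator so runs are repeatable.
    rng = random.Random(seed)
    return [BernoulliArm(p, rng) for p in probs]

arms = make_arms([0.2, 0.5, 0.8])
rewards = [arms[2].pull() for _ in range(1000)]
empirical_mean = sum(rewards) / len(rewards)
print(empirical_mean)  # should land near the hidden value 0.8
```

The player can only estimate each probability from observed rewards, which is why the estimate sharpens as the number of pulls grows.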


Total rewards accumulated over time

Regret: difference in reward from the optimal strategy
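
A common way to quantify this difference is expected cumulative regret: at each pull, add the gap between the best arm's reward probability and that of the arm actually chosen. The sketch below assumes that definition (the simulator may instead use realised rewards); the function name `cumulative_regret` and the sample data are hypothetical.

```python
def cumulative_regret(probs, chosen_arms):
    """Running sum of (best probability - chosen arm's probability)."""
    best = max(probs)
    total = 0.0
    curve = []
    for arm in chosen_arms:
        total += best - probs[arm]
        curve.append(total)
    return curve

# Two exploratory pulls on weaker arms, then two pulls on the best arm.
curve = cumulative_regret([0.2, 0.5, 0.8], [0, 1, 2, 2])
print(curve)  # rises while weaker arms are pulled, then stays flat
```

A strategy that always picked the best arm would keep this curve at zero, so a flattening curve indicates the strategy has settled on the best arm.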

Fraction of non-greedy actions taken over time
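
This fraction can be computed by replaying a run and checking, before each pull, whether the chosen arm had the highest empirical mean at that moment. The sketch below assumes that ties count as greedy and that unpulled arms have a mean of 0; the function name `non_greedy_fraction` and the sample history are hypothetical.

```python
def non_greedy_fraction(history, n_arms):
    """history: list of (arm, reward) pairs in the order they were pulled."""
    counts = [0] * n_arms
    sums = [0.0] * n_arms
    non_greedy = 0
    curve = []
    for t, (arm, reward) in enumerate(history, start=1):
        means = [s / n if n else 0.0 for s, n in zip(sums, counts)]
        # Non-greedy: some other arm currently looks strictly better.
        if means[arm] < max(means):
            non_greedy += 1
        counts[arm] += 1
        sums[arm] += reward
        curve.append(non_greedy / t)
    return curve

history = [(0, 1), (1, 0), (0, 1), (1, 1)]
curve = non_greedy_fraction(history, 2)
print(curve)
```

In this history the second and fourth pulls go to arm 1 while arm 0 has the higher empirical mean, so half of the four actions are non-greedy.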

Frequency of pulls per arm for each algorithm
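
For one algorithm, the per-arm frequencies can be derived from the sequence of chosen arms. This is a minimal sketch; the function name `pull_frequencies` and the sample sequence are hypothetical.

```python
from collections import Counter

def pull_frequencies(chosen_arms, n_arms):
    """Share of all pulls that went to each arm, indexed by arm number."""
    counts = Counter(chosen_arms)
    total = len(chosen_arms)
    return [counts[a] / total if total else 0.0 for a in range(n_arms)]

freqs = pull_frequencies([2, 2, 0, 2, 1, 2, 2, 2], 3)
print(freqs)  # [0.125, 0.125, 0.75]
```

Running this once per algorithm gives the data for a side-by-side comparison of how heavily each strategy concentrates on a single arm.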