SIMULATION SETTINGS
Environment
Number of slot machines (arms): 2 to 10
Number of pulls per run
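The two environment settings can be held in a small validated config object. This is a minimal sketch: the class name `EnvironmentConfig` and the field names `n_arms` and `n_pulls` are hypothetical, and only the 2 to 10 range for arms comes from the settings above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EnvironmentConfig:
    """Hypothetical container for the Environment settings."""

    n_arms: int   # number of slot machines; the settings allow 2 to 10
    n_pulls: int  # number of pulls (time steps) in one run

    def __post_init__(self) -> None:
        # Reject values outside the ranges the settings panel permits.
        if not 2 <= self.n_arms <= 10:
            raise ValueError(f"n_arms must be between 2 and 10, got {self.n_arms}")
        if self.n_pulls < 1:
            raise ValueError(f"n_pulls must be positive, got {self.n_pulls}")
```

For example, `EnvironmentConfig(n_arms=5, n_pulls=1000)` is accepted, while `n_arms=11` raises `ValueError`.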
Algorithms
UCB Parameters
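A sketch of one common UCB variant (UCB1-style), assuming the simulator tracks each arm's pull count and mean reward. The exploration weight `c` is a hypothetical parameter name; with `c = sqrt(2)` the bonus equals the classic UCB1 term sqrt(2 ln t / n). The simulator's actual parameters may differ.

```python
import math
from typing import Sequence


def ucb_select(counts: Sequence[int], values: Sequence[float], t: int,
               c: float = math.sqrt(2)) -> int:
    """Choose the arm with the highest upper confidence bound.

    counts[i]: times arm i has been pulled; values[i]: its mean observed reward;
    t: current time step (1-based); c: exploration weight (hypothetical name).
    """
    for arm, n in enumerate(counts):
        if n == 0:
            return arm  # pull every arm once so each bonus term is defined
    # Score = estimated mean + bonus that shrinks as an arm is pulled more often.
    scores = [v + c * math.sqrt(math.log(t) / n) for v, n in zip(values, counts)]
    return max(range(len(scores)), key=scores.__getitem__)
```

A larger `c` favors rarely pulled arms longer; `c = 0` reduces the rule to purely greedy selection.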
ε-Greedy Parameters
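A minimal ε-greedy selection rule, assuming the simulator keeps a current mean-reward estimate per arm. The function and argument names are illustrative, not taken from the simulator.

```python
import random
from typing import Sequence


def epsilon_greedy_select(values: Sequence[float], epsilon: float,
                          rng: random.Random) -> int:
    """Choose an arm with the epsilon-greedy rule.

    values[i] is the current estimated mean reward of arm i;
    epsilon is the probability of exploring (assumed to lie in [0, 1]).
    """
    if rng.random() < epsilon:
        # Explore: any arm uniformly at random (this can coincide with the greedy arm).
        return rng.randrange(len(values))
    # Exploit: the arm with the highest estimate; ties go to the lowest index.
    return max(range(len(values)), key=lambda i: values[i])
```

Passing a seeded `random.Random` makes runs reproducible, which helps when comparing algorithms on the same arms.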
Thompson Sampling Parameters
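A sketch of Thompson sampling for 0/1 rewards, assuming a Beta prior on each arm's success probability. The prior parameters `prior_a` and `prior_b` (default Beta(1, 1), i.e. uniform) are assumptions; the simulator may expose different settings.

```python
import random
from typing import Sequence


def thompson_select(successes: Sequence[int], failures: Sequence[int],
                    rng: random.Random, prior_a: float = 1.0,
                    prior_b: float = 1.0) -> int:
    """Thompson sampling with a Beta(prior_a, prior_b) prior per arm.

    Draw one plausible success probability per arm from its Beta posterior
    and pull the arm whose draw is largest.
    """
    draws = [rng.betavariate(prior_a + s, prior_b + f)
             for s, f in zip(successes, failures)]
    return max(range(len(draws)), key=draws.__getitem__)
```

Exploration here comes from posterior uncertainty: arms with few pulls produce widely spread draws and so still get picked occasionally.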
Slot Machines (Arms)
Each arm has a fixed but initially unknown reward probability
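One way to model such arms is a Bernoulli bandit: each pull pays 1 with the arm's probability and 0 otherwise. The 0/1 payout and the class name `BernoulliBandit` are assumptions for illustration.

```python
import random
from typing import List, Optional


class BernoulliBandit:
    """Slot machines with fixed reward probabilities hidden from the agent."""

    def __init__(self, probs: List[float], seed: Optional[int] = None) -> None:
        if not all(0.0 <= p <= 1.0 for p in probs):
            raise ValueError("probabilities must lie in [0, 1]")
        self._probs = list(probs)
        self._rng = random.Random(seed)

    @property
    def n_arms(self) -> int:
        return len(self._probs)

    def pull(self, arm: int) -> int:
        # Reward is 1 with probability probs[arm], else 0.
        return 1 if self._rng.random() < self._probs[arm] else 0

    def best_mean(self) -> float:
        # Meant for the simulator's regret calculation only, never for the agent.
        return max(self._probs)
```

The agent only sees the rewards returned by `pull`, which is what keeps the probabilities "initially unknown".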
Cumulative Reward
Total reward accumulated by each algorithm over time
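Cumulative reward is the running total of the per-pull rewards. A minimal sketch, assuming one algorithm's rewards are available as a list in pull order:

```python
from itertools import accumulate
from typing import List, Sequence


def cumulative_reward(rewards: Sequence[float]) -> List[float]:
    """Running total of rewards: entry t is the sum of rewards from pulls 1..t."""
    return list(accumulate(rewards))
```

For example, rewards `[1, 0, 1, 1]` give the curve `[1, 1, 2, 3]`.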
Cumulative Regret
Expected reward lost, relative to always pulling the best arm, summed over time
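A common definition of cumulative (pseudo-)regret after T pulls is R_T = Σ_{t=1..T} (μ* − μ_{a_t}), where μ* is the best arm's reward probability and μ_{a_t} is that of the arm chosen at step t. This sketch assumes the simulator uses true arm probabilities rather than realized rewards; the source does not say which.

```python
from itertools import accumulate
from typing import List, Sequence


def cumulative_regret(chosen_arms: Sequence[int], probs: Sequence[float]) -> List[float]:
    """Running sum of (best probability - probability of the chosen arm)."""
    best = max(probs)
    return list(accumulate(best - probs[a] for a in chosen_arms))
```

A flattening regret curve means the algorithm has settled on the best arm; a straight line with positive slope means it keeps losing reward at a constant rate.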
Exploration Rate
Fraction of non-greedy actions (pulls of an arm other than the current best estimate) taken over time
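A sketch of the running exploration rate, assuming the simulator records, for each pull, both the chosen arm and the greedy arm (highest estimated value just before that pull). How ties are broken when picking the greedy arm is left to the caller.

```python
from typing import List, Sequence


def exploration_rate(chosen_arms: Sequence[int], greedy_arms: Sequence[int]) -> List[float]:
    """Entry t is the fraction of pulls 1..t whose chosen arm differed from the greedy arm."""
    rates: List[float] = []
    non_greedy = 0
    for t, (chosen, greedy) in enumerate(zip(chosen_arms, greedy_arms), start=1):
        non_greedy += chosen != greedy  # bool counts as 0 or 1
        rates.append(non_greedy / t)
    return rates
```

Under this definition, ε-greedy with K arms and a unique greedy arm has an expected rate of ε(K − 1)/K rather than ε, because a random exploratory pull sometimes lands on the greedy arm.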
Arm Selection Distribution
Frequency of pulls per arm for each algorithm
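The distribution is a per-algorithm count of how often each arm was chosen. A minimal sketch, assuming each algorithm's choices are stored as a list of arm indices under its name; the algorithm keys in the example are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Sequence


def arm_distribution(chosen_arms: Dict[str, Sequence[int]], n_arms: int) -> Dict[str, List[int]]:
    """Pull count per arm for each algorithm, including arms that were never pulled."""
    result: Dict[str, List[int]] = {}
    for name, arms in chosen_arms.items():
        counts = Counter(arms)
        result[name] = [counts[i] for i in range(n_arms)]
    return result
```

For example, `arm_distribution({"UCB": [0, 1, 1], "Thompson": [2, 2]}, 3)` returns `{"UCB": [1, 2, 0], "Thompson": [0, 0, 2]}`.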