Like many techniques in machine learning, the simplest strategy is hard to beat. More complicated techniques are worth considering, but they may eke out only a few hundredths of a percentage point of performance. The strategy that has been shown to win out time after time in practical problems is the epsilon-greedy method. We always keep track of the number of pulls of the lever and the amount of rewards we have received from that lever. 10% of the time, we choose a lever at random. The other 90% of the time, we choose the lever that has the highest expectation of rewards.
if math.random() < 0.1:
# choose a random lever 10% of the time.
# for each lever,
# calculate the expectation of reward.
# This is the number of trials of the lever divided by the total reward
# given by that lever.
# choose the lever with the greatest expectation of reward.
# increment the number of times the chosen lever has been played.
# store test data in redis, choice in session key, etc..
def reward(choice, amount):
# add the reward to the total for the given lever.
Last week, I launched The Toolbox, a directory of useful one-page sites and apps. It got upvoted a lot on Hacker News, then went viral on Twitter, and got about 11k visitors on the day of the launch.