What Is a Multi-Armed Bandit?

Today’s topic is an unusual one, but do not let the name make you anxious. We are here to help you understand it and to make it “simple” instead of “confusing”. We will begin with the main question of the article: what is a Multi-Armed Bandit (MAB)? Along the way, we will explain why MAB is described as a “problem”. Then we will talk about MAB testing, discuss the strategies used to solve it, and finish with MAB applications that are used in the real world.

What Is a Multi-Armed Bandit? 

Before we dive into the topic, let’s start with an example. Imagine walking into a casino and exchanging your money for casino tokens. Your purpose is obvious: walk out of the casino with double your money. However, you soon notice something odd: this casino only has slot machines. There are three of them, and each one works independently of the others, so you suspect that each machine pays out differently. That is only an assumption, though; you have no idea how any of them actually behaves.

You could put all your tokens into the first machine. Who knows, maybe you will double your money. Or you could spread your tokens evenly across all three machines. You would spend some tokens on the low-payout machines, but in exchange you would be sure to play the high-payout one as well. However, what if the first machine only paid well at the start, and the second was actually the one with the highest chance of a jackpot? Either way, by the end you have gathered information about all three machines.

So basically, the Multi-Armed Bandit (MAB) is a decision-making problem in which you need to explore to learn which choice is best and exploit what you have learned to obtain more rewards. The strategies used to solve it are commonly called MAB algorithms.
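To make the casino story concrete, here is a minimal Python sketch of the three slot machines. The win probabilities and the names `PAYOUT_PROBS`, `pull`, and `play_equally` are made up for illustration; in the real problem the player never sees these numbers and can only estimate them by playing.

```python
import random

# Hypothetical win probabilities for the three slot machines.
# In the real problem these are hidden from the player.
PAYOUT_PROBS = [0.2, 0.5, 0.3]


def pull(machine, rng):
    """Spend one token on a machine; return 1 for a win, 0 for a loss."""
    return 1 if rng.random() < PAYOUT_PROBS[machine] else 0


def play_equally(tokens_per_machine, rng):
    """Split tokens evenly across machines and return each observed win rate."""
    win_rates = []
    for machine in range(len(PAYOUT_PROBS)):
        wins = sum(pull(machine, rng) for _ in range(tokens_per_machine))
        win_rates.append(wins / tokens_per_machine)
    return win_rates


# Pure exploration: every machine gets the same number of tokens.
print(play_equally(1000, random.Random(0)))
```

Playing every machine equally gives good estimates, but it also spends two thirds of the tokens on machines that are not the best one. That cost is exactly what the strategies later in this article try to reduce.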

Why Is MAB Considered a Problem?

The Multi-Armed Bandit is generally told as the casino story: a gambler faces several slot machines, each with its own unknown reward probability, and needs to figure out which one to play to get the best reward. The “problem” is the decision itself: every choice has to be made without knowing the true payouts. That challenging nature is what makes it a problem.

So what makes MAB challenging enough to be considered a “problem”? Let’s look at the main reasons:

Exploration and Exploitation 

MAB is the classic example of the exploration and exploitation dilemma. In order to hit the jackpot, you first need to explore: try the machines and observe how each one pays. When you get to the point where you are reasonably certain, you exploit: you make your choice according to what you learned during exploration and use that information to earn more rewards. The difficulty is that every token spent on exploring is a token not spent on the machine that currently looks best.

Limited Resources 

Limited resources such as money and time must be spent wisely. Deciding how much of each to spend on exploring, and when to commit to a choice, can be challenging.

Risk and Reward 

Indecision is costly in the Multi-Armed Bandit problem. You need to choose whether to try your luck on a machine you know little about or to play it safe with the one that has paid so far. If you always play it safe, you may miss the chance to earn big. Sometimes it is better to trust what you have explored and take the risk. You still have a chance to lose, but at least your decision was an informed one.

Inconsistent Earnings 

Rewards are random. As in the casino example at the beginning of this article, you may trust your information, pick a machine, and hit the jackpot at first. However, the same decision in your next games may not pay out the way it did before.

What Is MAB Testing? 

Multi-Armed Bandit testing is an experimentation technique often used as an alternative to classic A/B testing. Its purpose is to allocate resources, usually traffic, during the test itself. To put it simply, instead of keeping a fixed split, MAB testing gradually directs more traffic to the variants that perform better while the test is still running. That means the way traffic is split acts as the balance between exploitation and exploration.

Those are the fundamentals of MAB testing. But how does the testing “process” work? The answer is simple: just like in the casino scenario.

Let’s look at how the MAB testing phase works: 

  1. At the start, traffic is split equally across all variants.
  2. The performance of each variant (for example, its conversion rate) is monitored as visitors arrive.
  3. The algorithm updates its estimate for every variant from the constant flow of data and sends more traffic to the variants that are performing better.
  4. A small share of traffic keeps going to the other variants, so the algorithm continues to gather data and can change course if another variant turns out to be better.

One thing that makes MAB testing helpful is that it adapts while the test is running: when circumstances change, the traffic split changes with them, so fewer visitors are sent to poorly performing variants.
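The testing loop can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the conversion rates are invented, and `run_test` and `adaptive_policy` are hypothetical names, not part of any testing tool. The policy shown uses a fixed warm-up with equal traffic and then mostly favors the current leader; real MAB testing tools may use any of the strategies described later in this article.

```python
import random


def run_test(true_rates, choose_variant, visitors, rng):
    """Send visitors one at a time to the variant picked by `choose_variant`."""
    shows = [0] * len(true_rates)        # visitors sent to each variant
    conversions = [0] * len(true_rates)  # conversions observed per variant
    for _ in range(visitors):
        v = choose_variant(shows, conversions, rng)
        shows[v] += 1
        if rng.random() < true_rates[v]:  # simulated visitor converts
            conversions[v] += 1
    return shows, conversions


def adaptive_policy(shows, conversions, rng, warmup=100, explore=0.1):
    """Equal traffic during warm-up, then mostly favor the best observed rate."""
    total = sum(shows)
    if total < warmup * len(shows):
        return total % len(shows)          # round-robin: equal split at the start
    if rng.random() < explore:
        return rng.randrange(len(shows))   # keep a little traffic for exploring
    rates = [c / s for c, s in zip(conversions, shows)]
    return rates.index(max(rates))         # send the visitor to the current leader


# Hypothetical true conversion rates, unknown to the policy.
shows, conversions = run_test([0.05, 0.10, 0.40], adaptive_policy, 10000, random.Random(0))
print(shows)
```

Because the warm-up gives every variant the same number of visitors, each variant has data before the policy starts comparing observed rates.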

What Are the Strategies? 

There are many strategies for making better decisions in the Multi-Armed Bandit problem, and each one has its own strengths and weaknesses. What they have in common is that they adapt: as data comes in, each strategy changes how it divides resources among the variants.

Each “strategy” used in Multi-Armed Bandit testing is really an algorithm in its own right. However, we will call them “strategies” here for easier understanding.

In Multi-Armed Bandit testing, the chosen strategy decides, for every new visitor, which variant to show. Which strategy works best depends on the situation, for example how much traffic is available and how quickly results are needed.

Let’s look at some of the strategies, or algorithms, that are used in MAB testing:


Epsilon Greedy

The Epsilon Greedy strategy is the first and simplest one, and it matches the slot machine example closely. As the name “greedy” suggests, most of the time you give your next token to the machine that has paid best so far. With a small probability, called epsilon (for example, 10% of the time), you instead pick a machine at random to keep exploring. In other words, the algorithm usually chooses the highest-performing variant and occasionally explores the other variants at random.
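Here is a minimal Python sketch of Epsilon Greedy on simulated slot machines. The function name `epsilon_greedy`, the win probabilities, and the value 0.1 for epsilon are assumptions chosen for the example.

```python
import random


def epsilon_greedy(true_probs, rounds, epsilon, rng):
    """Run epsilon-greedy on simulated win/lose arms; return pulls and total reward."""
    n_arms = len(true_probs)
    pulls = [0] * n_arms      # times each arm was played
    values = [0.0] * n_arms   # running average reward per arm
    total_reward = 0
    for _ in range(rounds):
        if rng.random() < epsilon:
            arm = rng.randrange(n_arms)       # explore: pick any arm at random
        else:
            arm = values.index(max(values))   # exploit: best estimate so far
        reward = 1 if rng.random() < true_probs[arm] else 0
        pulls[arm] += 1
        values[arm] += (reward - values[arm]) / pulls[arm]  # incremental mean
        total_reward += reward
    return pulls, total_reward


# Hypothetical machines; the second one is the best.
pulls, total_reward = epsilon_greedy([0.2, 0.8, 0.4], 5000, 0.1, random.Random(42))
print(pulls, total_reward)
```

A larger epsilon explores more and learns faster, but wastes more tokens on weak machines; a smaller epsilon does the opposite.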

Upper Confidence Bound (UCB) 

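Upper Confidence Bound (UCB) is a strategy that balances the two main pillars of MAB, exploration and exploitation, by being optimistic about what it does not know yet. For each variant, it computes an upper bound on the plausible reward: the average reward observed so far plus a bonus that is larger for variants that have been tried less often. It then picks the variant with the highest bound, so rarely tried variants still get a chance while rewards are maximized over time.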
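As an illustration, the sketch below implements UCB1, one common member of the UCB family, in Python. It scores each arm as its average reward plus sqrt(2 * ln(t) / n), where t is the current round and n is how often that arm has been played. The function name `ucb1` and the win probabilities are assumptions for the example.

```python
import math
import random


def ucb1(true_probs, rounds, rng):
    """Run UCB1 on simulated win/lose arms; return pulls and estimated values."""
    n_arms = len(true_probs)
    pulls = [0] * n_arms
    values = [0.0] * n_arms
    for t in range(1, rounds + 1):
        if t <= n_arms:
            arm = t - 1  # play every arm once so each has an estimate
        else:
            # average reward plus an exploration bonus that shrinks with more pulls
            scores = [
                values[a] + math.sqrt(2 * math.log(t) / pulls[a])
                for a in range(n_arms)
            ]
            arm = scores.index(max(scores))
        reward = 1 if rng.random() < true_probs[arm] else 0
        pulls[arm] += 1
        values[arm] += (reward - values[arm]) / pulls[arm]  # incremental mean
    return pulls, values


# Hypothetical machines; the second one is the best.
pulls, values = ucb1([0.2, 0.8, 0.4], 5000, random.Random(7))
print(pulls)
```

Unlike Epsilon Greedy, there is no random exploration here: an arm is retried only when its bonus makes its score the highest.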

Thompson Sampling 

Unlike the Epsilon Greedy strategy, which explores uniformly at random, Thompson Sampling is a probabilistic strategy. It keeps a probability distribution over each variant’s reward rate, draws one random sample from each distribution, and plays the variant with the highest sample. Simply put, variants with little data have wide distributions and are still tried from time to time, while variants that are clearly worse are chosen less and less.
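A minimal Python sketch of Thompson Sampling for win/lose rewards follows. It assumes a Beta(1, 1) starting belief for every arm, which is a common choice but not the only one; the function name `thompson_sampling` and the win probabilities are made up for the example. `betavariate` comes from Python’s standard `random` module.

```python
import random


def thompson_sampling(true_probs, rounds, rng):
    """Run Beta-Bernoulli Thompson Sampling; return wins and losses per arm."""
    n_arms = len(true_probs)
    wins = [0] * n_arms
    losses = [0] * n_arms
    for _ in range(rounds):
        # Draw one plausible win rate per arm from its Beta(wins+1, losses+1) belief.
        samples = [
            rng.betavariate(wins[a] + 1, losses[a] + 1) for a in range(n_arms)
        ]
        arm = samples.index(max(samples))  # play the arm with the highest draw
        if rng.random() < true_probs[arm]:
            wins[arm] += 1
        else:
            losses[arm] += 1
    return wins, losses


# Hypothetical machines; the second one is the best.
wins, losses = thompson_sampling([0.2, 0.8, 0.4], 3000, random.Random(11))
print([w + l for w, l in zip(wins, losses)])
```

As an arm collects more wins and losses, its Beta distribution narrows, so the random draws for a weak arm rarely come out on top.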

Softmax Selection 

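The softmax strategy assigns a selection probability to every variant according to the information gathered about its expected reward. The better a variant has performed, the more likely it is to be chosen, but weaker variants still keep a small chance. A “temperature” parameter controls how strongly the best variant is favored.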
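The sketch below shows softmax selection in Python. Each arm’s probability is proportional to exp(estimated reward / temperature). The names `softmax_probs` and `softmax_bandit`, the win probabilities, and the temperature of 0.1 are assumptions for the example.

```python
import math
import random


def softmax_probs(values, temperature):
    """Turn estimated rewards into selection probabilities."""
    top = max(values)  # subtracting the max keeps exp() numerically stable
    weights = [math.exp((v - top) / temperature) for v in values]
    total = sum(weights)
    return [w / total for w in weights]


def softmax_bandit(true_probs, rounds, temperature, rng):
    """Run softmax selection on simulated win/lose arms; return pulls and values."""
    n_arms = len(true_probs)
    pulls = [0] * n_arms
    values = [0.0] * n_arms
    for _ in range(rounds):
        probs = softmax_probs(values, temperature)
        arm = rng.choices(range(n_arms), weights=probs)[0]  # weighted random pick
        reward = 1 if rng.random() < true_probs[arm] else 0
        pulls[arm] += 1
        values[arm] += (reward - values[arm]) / pulls[arm]  # incremental mean
    return pulls, values


print(softmax_probs([0.1, 0.5, 0.2], 0.1))
pulls, values = softmax_bandit([0.2, 0.8, 0.4], 5000, 0.1, random.Random(3))
print(pulls)
```

A low temperature makes the choice almost greedy, while a high temperature makes it close to uniform, which means more exploration.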

What Are the Real World Applications of MAB? 

Even though the Multi-Armed Bandit is seen as a “problem” because of its challenging nature, the algorithms that solve it have many real-world applications. Let’s look at some of them:

Medical Technology 

We all know that health comes first. Medicine has advanced a great deal, but there is still room for improvement, and medical technology may yet go beyond our expectations and help with diseases that are incurable today.

One of the most important uses of the Multi-Armed Bandit is in the field of medicine. In adaptive clinical trials, a Multi-Armed Bandit algorithm can assign each new patient to a treatment based on how well the treatments have worked so far. That is helpful not only for identifying the most effective treatment in a shorter time, but also for reducing the number of patients who receive a less effective one.

Online Advertisements 

Multi-Armed Bandit algorithms are used in online advertising to decide which ad to show each user, sometimes taking into account signals such as search history and frequently visited websites. The main goal is to learn which ads get the highest click-through rates, which in turn results in more revenue for advertisers and websites.

Gaming Industry 

You did not read it wrong. Multi-Armed Bandit algorithms are also useful in the gaming industry. They can be applied to tune in-game difficulty levels, rewards, and missions, as well as features such as global chat and anti-cheat, with the aim of giving players a safe, reliable environment in which to enjoy the game to the fullest.

A/B Testing 

A/B testing is one of the most common uses of Multi-Armed Bandit algorithms on websites. Classic A/B testing allocates traffic to each variant equally for the whole test. A Multi-Armed Bandit algorithm instead adapts the traffic as results come in, so the better variant receives more visitors sooner.

Communication Systems 

Multi-Armed Bandit algorithms can be as useful in communication systems as they are in the medical field. They can improve network performance, and with it the user experience, by learning which communication channels to use.

The applications explained above are only the most popular uses of Multi-Armed Bandit algorithms. As we mentioned before, one of their strengths is the ability to adapt to different situations as new data arrives. This means that Multi-Armed Bandit algorithms can be useful in almost any field where repeated choices have to be made under uncertainty.

Multi-Armed Bandit: Keep Up With the Situation

The Multi-Armed Bandit is a decision-making problem. The more a MAB algorithm explores, the more accurate its choices become and the more confidently it can exploit them. The problem is challenging by nature, but there are a number of strategies that offer solutions for different situations. We can clearly see that its logic applies to real-world situations as well.
