Is reinforcement (machine) learning overhyped?

Imagine you are about to sit down to play a game with a friend. But this isn’t just any friend – it’s a computer program that doesn’t know the rules of the game. It does, however, understand that it has a goal, and that goal is to win.

Because this friend doesn’t know the rules, it starts by making random moves. Some of them make absolutely no sense, and winning for you is easy. But let’s just say you enjoy playing with this friend so much that you decide to devote the rest of your life (and future lives if you believe in that idea) to exclusively playing this game.

The digital friend will eventually win because it gradually learns the winning moves required to beat you. This scenario may seem far-fetched, but it should give you a basic idea of how reinforcement learning (RL) – an area of machine learning (ML) – roughly works.

Just How Intelligent is Reinforcement Learning?

Human intelligence encompasses many characteristics, including the attainment of knowledge, a desire to expand intellectual capacity, and intuitive thinking. Our capacity for intelligence, however, was widely questioned when Garry Kasparov, a champion chess player, lost to an IBM computer named Deep Blue. The match captured the attention of the public, and doomsday scenarios depicting a world where robots rule humans took hold of mainstream consciousness.

Deep Blue, however, was not an average opponent. Playing against this program is analogous to a match with a thousand-year-old human who has devoted their entire life to continuously playing chess. Accordingly, Deep Blue was skilled in playing one specific game, not in other intellectual pursuits like playing an instrument, writing a book, conducting a scientific experiment, raising a child, or fixing a car.

In no way am I attempting to downplay the achievement of the creation of Deep Blue. Instead, I am simply suggesting that the idea that computers can surpass us in intellectual capability requires careful examination, starting with a breakdown of RL mechanics.

How Reinforcement Learning Works

As mentioned previously, RL is a subset of ML concerned with how intelligent agents should act in an environment to maximize cumulative reward.

In plain terms, RL agents are trained through a reward and punishment mechanism: they are rewarded for correct moves and punished for wrong ones. RL agents don’t “think” about the best action to take. Instead, they try an enormous number of moves through trial and error and gradually favor the ones that lead to higher rewards.
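To make the reward and punishment idea concrete, here is a minimal sketch of tabular Q-learning, one common RL algorithm. Everything in it is an assumption made for illustration: the five-cell corridor, the reward values (+1 for reaching the last cell, -0.01 for every other step), the learning parameters, and the function names. It is not how Deep Blue or any system mentioned in this article was built.

```python
import random


def train_corridor_agent(n_states=5, episodes=500, alpha=0.5,
                         gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Q-learning on a hypothetical 1-D corridor.

    The agent starts in cell 0. Reaching the last cell pays +1 (the reward);
    every other step costs -0.01 (a mild punishment).
    """
    rng = random.Random(seed)
    moves = (-1, +1)  # index 0 = step left, index 1 = step right
    goal = n_states - 1
    # q[state][action] is the agent's current estimate of long-term reward.
    q = [[0.0, 0.0] for _ in range(n_states)]

    for _ in range(episodes):
        state = 0
        for _ in range(100):  # cap the length of one episode
            # Usually take the best-known move; sometimes (or on a tie) guess.
            if rng.random() < epsilon or q[state][0] == q[state][1]:
                action = rng.randrange(2)
            else:
                action = 0 if q[state][0] > q[state][1] else 1

            next_state = min(max(state + moves[action], 0), goal)
            reward = 1.0 if next_state == goal else -0.01

            # Nudge the estimate toward "reward now + discounted best future".
            best_next = 0.0 if next_state == goal else max(q[next_state])
            q[state][action] += alpha * (reward + gamma * best_next
                                         - q[state][action])

            state = next_state
            if state == goal:
                break
    return q


def greedy_policy(q):
    """Best-known move for every non-goal cell."""
    return ["left" if left > right else "right" for left, right in q[:-1]]


if __name__ == "__main__":
    print(greedy_policy(train_corridor_agent()))
```

The agent starts out wandering at random, just like the digital friend in the opening example. After enough episodes, the table of estimates should point it toward the goal from every cell, even though nobody ever told it the rules.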

Drawbacks of Reinforcement Learning

The main drawback of reinforcement learning is the exorbitant amount of resources it requires to achieve its goal. This is illustrated by the success of RL in another game called Go, a popular two-player game where the goal is to use playing pieces (called stones) to maximize territory on a board while avoiding the loss of stones.

AlphaGo Master, a computer program that defeated human players in Go, required a massive investment that included many engineers, thousands of years’ worth of game-playing experience, and an astonishing 256 GPUs and 128,000 CPU cores. That’s a lot of energy to use in learning to win a game. This raises the question of whether it is rational to design AI that cannot think intuitively. Shouldn’t AI research attempt to mimic human intelligence?

One argument favoring RL is that we should not expect AI agents to behave like humans, and that its usefulness in solving complex problems warrants further development. On the other hand, an argument against RL is that AI research should focus on enabling machines to do things that only humans and animals are presently capable of doing. When viewed in that light, AI’s comparison to human intelligence is appropriate.

Quantum Reinforcement Learning

There’s an emerging field of reinforcement learning that purportedly solves some of the problems outlined above. Quantum reinforcement learning (QRL) has been studied as a way to speed up calculations.

Primarily, QRL should speed up learning by optimizing the exploration (finding strategies) and exploitation (picking the best strategy) phases. Current applications and proposed quantum algorithms already target problems such as database search and factoring large numbers into primes. While QRL still hasn’t arrived in a groundbreaking fashion, there’s an expectation that it may resolve some of the great challenges facing regular reinforcement learning.
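For readers unfamiliar with the exploration and exploitation trade-off, here is a rough sketch of how a classical (non-quantum) agent might handle it with an epsilon-greedy rule on a simple slot-machine problem. The win probabilities, the 10% exploration rate, and the function name are all hypothetical values chosen for illustration. Nothing here is quantum; it only shows the two phases that QRL is hoped to speed up.

```python
import random


def run_epsilon_greedy(win_probs, steps=5000, epsilon=0.1, seed=0):
    """Classical epsilon-greedy on a made-up multi-armed bandit.

    win_probs[i] is the (hidden) chance that option i pays a reward of 1.
    """
    rng = random.Random(seed)
    n_arms = len(win_probs)
    counts = [0] * n_arms        # how often each option was tried
    estimates = [0.0] * n_arms   # running average reward per option
    total_reward = 0

    for _ in range(steps):
        if rng.random() < epsilon:
            # Exploration: try a random option to learn more about it.
            arm = rng.randrange(n_arms)
        else:
            # Exploitation: pick the option that looks best so far.
            arm = max(range(n_arms), key=lambda i: estimates[i])

        reward = 1 if rng.random() < win_probs[arm] else 0
        counts[arm] += 1
        # Incremental update of the average reward for the chosen option.
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
        total_reward += reward

    return estimates, counts, total_reward


if __name__ == "__main__":
    estimates, counts, total = run_epsilon_greedy([0.2, 0.5, 0.8])
    print("pulls per option:", counts)
    print("total reward:", total)
```

With these assumed numbers, the agent should end up spending most of its pulls on the best option while still sampling the others occasionally. Raising `epsilon` means more exploring and less cashing in; lowering it means the opposite.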

Business Cases for RL

As I mentioned before, in no way do I want to undermine the importance of RL research and development. In fact, at Oxylabs, we have been working on RL models that will optimize web scraping resource allocation.

With that said, here is just a sample of some real-life uses for RL derived from a McKinsey report highlighting current use cases across a wide range of industries:

  1. Optimizing silicon and chip design, optimizing manufacturing processes, and improving yields for the semiconductor industry
  2. Increasing yields, optimizing logistics to reduce waste and costs, and improving margins in agriculture
  3. Reducing time to market for new systems in the aerospace and defense industries
  4. Optimizing design processes and increasing manufacturing yields for the automotive industries
  5. Increasing revenue through real-time trading and pricing strategies, improving the customer experience, and delivering advanced personalization to clients in financial services
  6. Optimizing mine design, managing power generation and applying holistic logistics scheduling to optimize operations, reduce costs and increase yields in mining
  7. Increasing yields through real-time monitoring and precision drilling, optimizing tanker routing and enabling predictive maintenance to prevent equipment failure and outages in the oil and gas industry
  8. Facilitating drug discovery, optimizing research processes, automating production and optimizing biologic methods for the pharmaceutical industry
  9. Optimizing supply chains, implementing advanced inventory modeling and delivering advanced personalizations for customers in the retail sector
  10. Optimizing and managing networks and applying customer personalization in the telecom industry
  11. Optimizing routing, network planning, and warehouse operations in transport and logistics
  12. Extracting data from websites with the use of next-generation proxies

Rethinking Reinforcement Learning

Reinforcement learning may be limited, but it’s hardly overrated. Moreover, as research and development into RL increases, so do potential use cases across almost every sector of the economy. Wide-scale adoption depends on several factors, including optimizing the design of algorithms, configuring learning environments, and the availability of computing power.

Aleksandras Šulženko

Product Owner at Oxylabs.io
