Recently, OpenAI has shown that applying spherical perturbation in parameter space provides an alternative to reinforcement learning[1]. This month, DeepMind and OpenAI, both leaders in RL, proposed methods that use perturbation in parameter space for exploration[2,3]. Let's see how this works and how it affects some off-the-shelf RL algorithms.


It all started with OpenAI's paper using ES (Evolution Strategies) to train a network, whether the problem is RL or SL[1]. ES is an optimization technique based on ideas of adaptation and evolution. It gives us a scalable method to train the NN; however, it's relatively data-inefficient compared to the gradient-descent family (which is why some people dislike it).
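
To make that concrete, here is a minimal NumPy sketch of the ES gradient estimator: sample Gaussian perturbations of the parameters, score each perturbed copy, and move the parameters toward the better-scoring ones. The function and hyperparameter names are mine, and the distributed/mirrored-sampling tricks from [1] are omitted.

```python
import numpy as np

def evolution_strategies(f, theta, sigma=0.1, alpha=0.01, n_pop=50, n_iters=200):
    """Basic ES: estimate the gradient of E[f(theta + sigma * eps)] from
    random Gaussian perturbations of the parameters and ascend it."""
    for _ in range(n_iters):
        eps = np.random.randn(n_pop, theta.size)            # one perturbation per population member
        rewards = np.array([f(theta + sigma * e) for e in eps])
        # standardize rewards so the update is insensitive to their scale
        adv = (rewards - rewards.mean()) / (rewards.std() + 1e-8)
        # gradient estimate: reward-weighted sum of the perturbations
        theta = theta + alpha / (n_pop * sigma) * eps.T @ adv
    return theta

# toy usage: maximize -||theta - 3||^2, so theta should move toward 3
theta_star = evolution_strategies(lambda th: -np.sum((th - 3.0) ** 2), np.zeros(5))
```

Note that only the scalar score of each perturbed copy is needed, never a backpropagated gradient, which is exactly what makes ES easy to parallelize and applicable to both RL and SL objectives.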

No matter which side you're on, you should agree that applying perturbation in parameter space does give us some insight into RL. It's reasonable to expect that combining the two can yield other powerful algorithms.

Let's first see some comparisons of RL and ES:

|                 | RL                                 | ES                                |
|-----------------|------------------------------------|-----------------------------------|
| sparse reward   | suffers from sparse rewards        | mitigates the sparse-reward issue |
| scalability     | A3C (saturates at about 16 cores)  | easily parallelized               |
| data efficiency | high (model-free RL is still low)  | very low                          |

I emphasize data efficiency in order to introduce the following methods, which use parameter-space perturbation for exploration. I'll mainly focus on DeepMind's paper[2], since its modification is easier to understand (I'll compare the two later).
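
As a preview, here is a minimal PyTorch sketch of the idea behind [2], using the simpler independent (non-factorized) noise variant; the class name, initialization, and sigma value are illustrative, not the exact recipe from the paper. Each weight gets a learned mean and standard deviation, and fresh Gaussian noise is drawn on every forward pass, so exploration comes from the perturbed parameters rather than from an epsilon-greedy action rule.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class NoisyLinear(nn.Module):
    """Linear layer whose weights and biases are perturbed by learned,
    per-parameter Gaussian noise (independent-noise variant)."""
    def __init__(self, in_features, out_features, sigma_init=0.017):
        super().__init__()
        self.mu_w = nn.Parameter(torch.empty(out_features, in_features).uniform_(-0.1, 0.1))
        self.sigma_w = nn.Parameter(torch.full((out_features, in_features), sigma_init))
        self.mu_b = nn.Parameter(torch.zeros(out_features))
        self.sigma_b = nn.Parameter(torch.full((out_features,), sigma_init))

    def forward(self, x):
        # fresh noise each forward pass; sigma is learned by gradient descent,
        # so the network can tune how much it explores
        eps_w = torch.randn_like(self.sigma_w)
        eps_b = torch.randn_like(self.sigma_b)
        weight = self.mu_w + self.sigma_w * eps_w
        bias = self.mu_b + self.sigma_b * eps_b
        return F.linear(x, weight, bias)
```

Dropping such a layer in place of the final linear layers of a DQN-style network is the kind of small modification that makes the DeepMind approach easy to follow.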

Reference

[1] Evolution Strategies as a Scalable Alternative to Reinforcement Learning
[2] Noisy Networks for Exploration
[3] Parameter Space Noise for Exploration