Scaling Reinforcement Learning Methods by Incremental Feature Dependency Discovery - Alborz Geramifard

Optimal planning under uncertainty is a challenging problem facing practitioners in real-world domains. Markov decision processes (MDPs) provide a mathematical framework for solving such problems, but for realistic multi-agent planning the size of the state space grows exponentially with the number of agents. Researchers quickly realized the limitations of tabular representations and introduced approximations to cope with this complexity and to boost learning speed through generalization. Finding the right representation is a critical milestone in scaling existing MDP solvers to larger domains.

Due to their ease of use and strong theoretical and empirical results, the linear family of function approximators has been favored in the literature. In this setting, the target function is approximated as a linear combination of a set of features. Many approaches try to find the right set of features offline, but online methods enjoy the advantage of adaptability to dynamic environments and, often, lower computational complexity. These methods have improved the learning speed of existing reinforcement learning (RL) algorithms in low-dimensional domains, yet existing online expansion methods do not scale well to high-dimensional problems.
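To make the setting concrete, here is a minimal sketch of linear value-function approximation with binary features. The feature map `phi` and the weights `theta` are illustrative placeholders, not the representation used in this work.

```python
import numpy as np

def phi(state, num_features=4):
    """Map a state (an integer here, for simplicity) to a binary feature vector."""
    features = np.zeros(num_features)
    features[state % num_features] = 1.0  # exactly one active binary feature
    return features

# Weights would normally be learned (e.g., by Sarsa); fixed here for illustration.
theta = np.array([0.5, -1.0, 2.0, 0.0])

def value(state):
    """Approximate value: the linear combination of the active features."""
    return float(np.dot(theta, phi(state)))
```

With a binary representation, evaluating the approximation reduces to summing the weights of the currently active features, which is what makes these methods cheap enough for online use.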

Our research explored the conjecture that one of the main difficulties limiting this scaling is that features defined over the full-dimensional state space often generalize poorly. Hence, we introduced incremental Feature Dependency Discovery (iFDD), a computationally inexpensive method for representational expansion that can be combined with any online, value-based RL method that uses binary features. Unlike other online expansion techniques, iFDD creates new features in low-dimensional subspaces of the full state space where the approximation error persists. We provided convergence and computational-complexity guarantees for iFDD.
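The expansion rule can be sketched as follows, under simplifying assumptions: candidate features are pairwise conjunctions of the currently active binary features, and a candidate is promoted to a full feature once the accumulated absolute approximation error observed while it is active crosses a threshold. The class name, threshold, and error-accumulation details below are illustrative, not the exact formulation from the thesis.

```python
from itertools import combinations
from collections import defaultdict

class IFDDSketch:
    """Illustrative sketch of iFDD-style feature discovery (names assumed)."""

    def __init__(self, threshold=1.0):
        self.threshold = threshold
        self.relevance = defaultdict(float)  # candidate pair -> accumulated |error|
        self.features = set()                # discovered conjunctive features

    def discover(self, active_base_features, td_error):
        """Accumulate error for active candidate conjunctions; promote persistent ones."""
        added = []
        for pair in combinations(sorted(active_base_features), 2):
            if pair in self.features:
                continue  # already part of the representation
            self.relevance[pair] += abs(td_error)
            if self.relevance[pair] > self.threshold:
                self.features.add(pair)
                added.append(pair)
        return added
```

Because each new feature is a conjunction of a few base features, it lives in a low-dimensional subspace of the full state space, which is the source of the improved generalization the conjecture above points to.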

The usability of our approach in large state spaces such as UAV mission planning was confirmed by comparing the effectiveness of iFDD with Sarsa (green) against representations that
  • use only the initial features (blue)
  • use the full tabular representation (red)
  • use two state-of-the-art representation-expansion methods: adaptive tile coding (ATC), which cuts the space into finer regions over time (cyan), and sparse distributed memories (SDM), which creates overlapping sets of regions (orange)

The figure above shows the first empirical domain, in which the goal is to provide surveillance of two targets using three fuel-limited UAVs. Moreover, the task requires at least one UAV to remain at the communication node to relay information back to the base. Notice how iFDD obtained much more cumulative reward (Y-axis) as it observed more interaction steps (X-axis).

The figure above depicts a rescue mission in which a heterogeneous team of UAVs plans to rescue as many people as possible, shown as positive numbers next to each node. To carry out a rescue at a particular node, the medic UAV must visit the node while the communication UAV is no farther than one edge from that node, in order to provide satellite information. For nodes with stochastic rescue outcomes, the numbers inside clouds represent the success rates. While the initial representation performed well initially, applying iFDD roughly doubled the number of survivors. Both the ATC and SDM methods led the agent to believe that sending out UAVs is dangerous and should be avoided, hence their zero performance.

The planning space for each of these domains exceeds one hundred million possibilities. By adding useful features, iFDD allowed the agent to learn the tasks substantially better than the other methods.

Related links