Optimal planning under uncertainty is a challenging problem for practitioners dealing with real-world domains. Markov decision processes (MDPs) provide a mathematical framework for solving these problems, but for realistic multi-agent planning the size of the state space grows exponentially with the number of agents. Researchers quickly recognized the limitations of tabular representations and introduced approximations to cope with this complexity and to speed up learning through generalization. Finding the right set of features is therefore a central challenge.

Due to their ease of use and their strong theoretical and empirical track record, the linear family of function approximators has been favored in the literature. In this setting, the target function is approximated as a linear combination of a set of features. Many approaches try to find the right set of features offline, but online methods have the advantage of adapting to dynamic environments, often at lower computational cost. These methods have improved the learning speed of existing reinforcement learning (RL) algorithms in low-dimensional domains, yet existing online expansion methods do not scale well to high-dimensional problems.
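To make the setting concrete, here is a minimal sketch of linear value-function approximation with binary features. The feature map `phi`, its toy encoding, and the weight vector `theta` are illustrative placeholders, not any specific published implementation:

```python
import numpy as np

NUM_FEATURES = 8  # hypothetical feature count for this toy example

def phi(state, num_features=NUM_FEATURES):
    """Map a state (an int here, for illustration) to a binary feature vector."""
    features = np.zeros(num_features)
    features[state % num_features] = 1.0  # toy encoding: one active feature
    return features

theta = np.zeros(NUM_FEATURES)  # weights learned by an RL algorithm

def value(state):
    # The target function is approximated as a linear combination of features.
    return float(np.dot(theta, phi(state)))
```

An online method would adjust `theta` after each transition (e.g., via a TD update) and, in the expansion setting discussed next, also grow the feature set itself.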

Our research has explored the conjecture that one of the main difficulties limiting this scaling is that features defined over the full-dimensional state space often generalize poorly. Hence, we introduced incremental Feature Dependency Discovery (iFDD), a computationally inexpensive method for representational expansion that can be combined with any online, value-based RL method that uses binary features. Unlike other online expansion techniques, iFDD creates new features in low-dimensional subspaces of the full state space where the approximation error persists. We provided convergence and computational-complexity guarantees for iFDD.
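The core expansion idea can be sketched as follows: accumulate the absolute TD error observed on conjunctions of co-active binary features, and promote a conjunction to a full feature once its accumulated error crosses a threshold. This is a simplified illustration of the mechanism, not the paper's exact algorithm; the threshold value and bookkeeping are hypothetical:

```python
from collections import defaultdict
from itertools import combinations

class IFDDExpander:
    """Simplified sketch of iFDD-style feature discovery over binary features."""

    def __init__(self, threshold=1.0):
        self.threshold = threshold
        self.relevance = defaultdict(float)  # candidate conjunction -> sum |TD error|
        self.features = set()                # discovered conjunction features

    def update(self, active_features, td_error):
        """active_features: indices of base features active in the current state."""
        for pair in combinations(sorted(active_features), 2):
            if pair in self.features:
                continue  # already a full feature
            self.relevance[pair] += abs(td_error)
            if self.relevance[pair] > self.threshold:
                # New feature lives in a low-dimensional (here 2-D) subspace
                # of the full state space where error keeps accumulating.
                self.features.add(pair)
        return self.features
```

Because candidates are only ever conjunctions of already-active features, each update touches a small number of pairs, which is what keeps the per-step cost low.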

The usability of our approach in large state spaces, such as UAV mission planning, was confirmed by comparing the effectiveness of iFDD with Sarsa (green) against representations that

- use only the initial features (blue)
- use the full tabular representation (red)
- use two state-of-the-art representation-expansion methods: adaptive tile coding (ATC), which cuts the space into finer regions over time (cyan), and sparse distributed memories (SDM), which creates overlapping sets of regions (orange)

The planning space for these domains exceeds one hundred million possibilities. By adding useful features, iFDD allowed the agent to learn the task substantially better than the other methods.