Optimal Cross-layer Wireless Control Policies using TD-Learning

Sean Meyn, Wei Chen and Daniel O'Neill

These plots illustrate the convexity, symmetry, and non-increasing properties of L* for both the single-flow and multiple-flow cases.

Quadratic cost


Distance between the relative value function computed by the value iteration algorithm (VIA) and its approximation using the proposed basis. The small error demonstrates the effectiveness of the basis.

VIA convergence


The corresponding control policies. As anticipated, the policy is zero when the state is large.


Abstract: We present an on-line cross-layer control technique to characterize and approximate optimal policies for wireless networks. Our approach combines network utility maximization and adaptive modulation over an infinite discrete-time horizon using a class of performance measures we call time-smoothed utility functions. We model the system as an average-cost Markov decision problem. Model approximations are used to find suitable basis functions for application of least-squares TD-learning techniques. The approach yields network control policies that learn the underlying characteristics of the random wireless channel and that approximately optimize network performance.
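
To make the least-squares TD-learning step concrete, here is a minimal Python sketch for the average-cost setting. It is not the model or the basis from the paper: the single-queue chain, the parameters a and mu, the cost c(x) = x, and the basis (x, x^2) are all assumptions chosen for illustration, and the names simulate_queue, basis and lstd_average_cost are hypothetical. The sketch estimates the average cost eta from one sample path, then solves the LSTD(0) normal equations for a linear approximation h(x) = theta_0 x + theta_1 x^2 of the relative value function.

```python
import random


def simulate_queue(a, mu, steps, seed=0):
    """Sample path of a toy queue (an assumption, not the paper's model).

    Each step the queue grows by one with probability a, shrinks by one
    with probability mu if it is nonempty, and otherwise stays put.
    Requires a + mu <= 1, and a < mu for stability.
    """
    rng = random.Random(seed)
    x, path = 0, [0]
    for _ in range(steps):
        u = rng.random()
        if u < a:
            x += 1
        elif u < a + mu and x > 0:
            x -= 1
        path.append(x)
    return path


def basis(x):
    """Hypothetical two-term basis (x, x^2); constants are omitted because
    the relative value function is only defined up to an additive constant."""
    return (float(x), float(x * x))


def lstd_average_cost(path, cost):
    """LSTD(0) for average cost: solve A theta = b with
    A = sum_t psi(X_t) (psi(X_t) - psi(X_{t+1}))^T and
    b = sum_t psi(X_t) (c(X_t) - eta), where eta is the sample-mean cost."""
    n = len(path) - 1
    eta = sum(cost(x) for x in path[:-1]) / n
    A = [[0.0, 0.0], [0.0, 0.0]]
    b = [0.0, 0.0]
    for x, x_next in zip(path[:-1], path[1:]):
        psi, psi_next = basis(x), basis(x_next)
        for i in range(2):
            b[i] += psi[i] * (cost(x) - eta)
            for j in range(2):
                A[i][j] += psi[i] * (psi[j] - psi_next[j])
    # Solve the 2x2 system by Cramer's rule.
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    theta = ((b[0] * A[1][1] - A[0][1] * b[1]) / det,
             (A[0][0] * b[1] - A[1][0] * b[0]) / det)
    return eta, theta


if __name__ == "__main__":
    path = simulate_queue(a=0.2, mu=0.6, steps=500_000, seed=1)
    eta_hat, theta_hat = lstd_average_cost(path, cost=lambda x: float(x))
    print("estimated average cost:", eta_hat)
    print("estimated coefficients (x, x^2):", theta_hat)
```

For this toy chain, Poisson's equation has the closed-form solution h(x) = (x^2 + x) / (2(mu - a)) with eta = a / (mu - a), so the basis contains the exact relative value function and the estimates can be checked against it. With a = 0.2 and mu = 0.6 the targets are eta = 0.5 and both coefficients equal to 1.25, up to sampling error. In a realistic model the basis would only approximate the value function, which is why the choice of basis matters.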

See also,

Chapter 11 of CTCN, and
