Temporal difference (TD) learning is a widely used algorithm for estimating the value function of a Markov decision process (MDP) under a given policy. Here, we consider TD learning with linear function approximation and a constant learning rate, and obtain bounds on its finite-time performance. Motivated by these bounds, we present a heuristic that adapts the learning rate to achieve fast convergence. Joint work with Lei Ying and Harsh Gupta.
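For attendees new to the setting, the following is a minimal sketch of the standard TD(0) update with linear function approximation and a constant learning rate. It is a textbook illustration, not the speaker's analysis or the adaptive heuristic from the talk; the function name `td0_linear`, its parameters, and the two-state chain in the usage note are all assumptions made up for this example.

```python
import random


def td0_linear(transitions, rewards, features, gamma, alpha, steps, seed=0):
    """TD(0) with linear function approximation and a constant step size.

    Hypothetical interface, for illustration only:
      transitions[s] -> list of (next_state, probability) pairs under the fixed policy
      rewards[s]     -> reward received when leaving state s
      features[s]    -> feature vector phi(s) as a list of floats
    The value estimate is V(s) ~= theta . phi(s).
    """
    rng = random.Random(seed)
    theta = [0.0] * len(features[0])
    state = 0
    for _ in range(steps):
        # Sample the next state of the Markov chain induced by the policy.
        next_states, probs = zip(*transitions[state])
        next_state = rng.choices(next_states, weights=probs)[0]

        # Current and next value estimates under the linear model.
        v = sum(t * f for t, f in zip(theta, features[state]))
        v_next = sum(t * f for t, f in zip(theta, features[next_state]))

        # TD error, then a step of constant size alpha along phi(state).
        td_error = rewards[state] + gamma * v_next - v
        theta = [t + alpha * td_error * f for t, f in zip(theta, features[state])]

        state = next_state
    return theta


if __name__ == "__main__":
    # Two-state chain with uniform transitions and one-hot features (assumed example).
    transitions = {0: [(0, 0.5), (1, 0.5)], 1: [(0, 0.5), (1, 0.5)]}
    rewards = {0: 1.0, 1: 0.0}
    features = {0: [1.0, 0.0], 1: [0.0, 1.0]}
    print(td0_linear(transitions, rewards, features, gamma=0.5, alpha=0.01, steps=20000))
```

With one-hot features, the exact values for this chain are 1.5 and 0.5. Because the learning rate is constant, the estimate does not settle exactly but keeps fluctuating in a neighborhood of those values, which is the kind of finite-time behavior the abstract refers to.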
R. Srikant is the Fredric G. and Elizabeth H. Nearing Endowed Professor of Electrical and Computer Engineering and the Coordinated Science Lab at the University of Illinois at Urbana-Champaign. His research interests are in the areas of applied probability, stochastic networks, and control theory, with applications to machine learning, cloud computing, and communication networks. He is the recipient of the 2019 IEEE Koji Kobayashi Computers and Communications Award and the 2015 IEEE INFOCOM Achievement Award. He has also received several Best Paper awards, including the 2017 Applied Probability Society Best Publication Award, the 2015 IEEE INFOCOM Best Paper Award, and the 2015 WiOpt Best Paper Award. He was the Editor-in-Chief of the IEEE/ACM Transactions on Networking from 2013 to 2017.
~ Seminars are open to the public. We hope you can join us! ~
Wednesday, October 30, 2019 at 2:00pm
Fascitelli Advanced Center for Engineering, 010C, 2 East Alumni Avenue