TRANSCRIPT
Reinforcement Learning
LU 1 - Introduction
Dr. Joschka Boedecker
AG Maschinelles Lernen und Natürlichsprachliche Systeme
Albert-Ludwigs-Universität Freiburg
Acknowledgement: slides courtesy of Martin Riedmiller and Martin Lauer
Prof. Dr. M. Riedmiller, Dr. M. Lauer, Dr. J. Boedecker, Machine Learning Lab, University of Freiburg. Reinforcement Learning.
Organisational issues
Dr. Joschka Boedecker
Room 00010, building [email protected]
Office hours: Tuesday 2 - 3 pm
no script - slides available online:
http://ml.informatik.uni-freiburg.de/teaching/ws1516/rl
Dates winter term 2015/2016
Lecture (3+1):
Monday, 14:00 (c.t.) - 15:30, SR 02-017, building 052
Wednesday, 16:00 (s.t.) - 17:30, SR 02-017, building 052
Exercise sessions on Wednesday, 16:00 - 17:30, interleaved with the lecture
starting Oct. 28
held by Jan Wülfing, [email protected]
Goal of this lecture
Introduction to the learning problem type Reinforcement Learning, and to the mathematical foundations of an independently learning system.
Goal of the first unit
Motivation, definition and differentiation
Outline
- Examples
- Solution approaches
- Machine Learning
- Reinforcement Learning
- Overview
Example Backgammon
Can a program independently learn Backgammon?
Learning from success (win) and failure (loss)
Neuro-Backgammon: playing at world-champion level (Tesauro, 1992)
Example pole balancing (control engineering)
Can a program independently learn balancing?
Learning from success and failure
Neural RL controller: noise, inaccuracies, unknown behaviour, non-linearities, ... (Riedmiller et al.)
Example robot soccer
Can programs independently learn how to cooperate?
Learning from success and failure
Cooperative RL agents: complexity, distributed intelligence, ... (Riedmiller et al.)
Example: Autonomous (e.g. humanoid) robots
Task: movement control similar to humans (walking, running, playing soccer, cycling, skiing, ...)
Input: image from a camera
Output: control signals to the joints
Problems:
- very complex
- consequences of actions hard to predict
- interference / noise
Example: Maze
The ’Agent Concept’
[Russell and Norvig 1995, page 33]: "An agent is anything that can be viewed as perceiving its environment through sensors and acting upon that environment through effectors."
examples:
- a human
- a robot arm
- an autonomous car
- a motor controller
- ...
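The perceive-act cycle in this definition can be sketched as a minimal loop. The `Agent` and `Environment` classes, the two actions, and the random placeholder policy below are purely illustrative assumptions, not part of the lecture:

```python
import random

class Agent:
    """Minimal agent in the Russell/Norvig sense: perceives, then acts."""
    def act(self, percept):
        # Placeholder policy: a random choice between two hypothetical actions.
        return random.choice(["left", "right"])

class Environment:
    """Toy environment: a position on a line that actions move by one step."""
    def __init__(self):
        self.position = 0
    def percept(self):
        return self.position               # what the sensors deliver
    def apply(self, action):               # what the effectors change
        self.position += 1 if action == "right" else -1

env, agent = Environment(), Agent()
for _ in range(10):                        # the perceive-act cycle
    env.apply(agent.act(env.percept()))
```

Any of the examples above fits this scheme once `percept` and `apply` are filled in with real sensors and effectors.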
Solution approaches in ’Artificial Intelligence’ (AI)
- Planning / search (e.g. A*, backtracking)
- Deduction (e.g. logic programming, predicate logic)
- Expert systems (knowledge provided by experts)
- Fuzzy control systems (fuzzy logic)
- Genetic algorithms (evolution of solutions)
- Machine Learning (e.g. reinforcement learning)
Types of learning (in humans)
- Learning from a teacher
- Structuring of objects
- Learning from experience
Types of Machine Learning (ML)
- Learning with a teacher. Supervised learning: examples of input / (target) output. Goal: generalization (in general not simply memorization).
- Structuring / recognition of correlations. Unsupervised learning: goal is clustering of similar data points, e.g. for preprocessing.
- Learning through reward / penalty. Reinforcement learning: prerequisite is the specification of a target goal (or of events to be avoided).
Machine Learning: ’ingredients’
1. Type of the learning problem (what is given / what is sought)
2. Representation of the learned solution knowledge: table, rules, linear mapping, neural network, ...
3. Solution process (observed data → solution): (heuristic) search, gradient descent, optimization techniques, ...
Not at all: ’For this problem I need a neural network’
Emphasis of the lecture: Reinforcement Learning
- No information regarding the solution strategy is required
- Independent learning of a strategy by smart trial of solutions ('trial and error')
- The biggest challenge for a learning system
- Representation of the solution knowledge using a function approximator (e.g. tables, linear models, neural networks)
RL using the example of autonomous robots
bad: damage (fall, ...)
good: task done successfully
better: fast / low-energy / smooth movements / ...
⇒ optimization!
Reinforcement Learning (RL)
Also called: learning from evaluations, autonomous learning, neuro-dynamic programming
- Defines a learning type, not a method! Central feature: an evaluating training signal, e.g. 'good' / 'bad'
- RL with immediate evaluation: decision → evaluation. Example: parameters for a basketball throw
- RL with rewards delayed in time: decision, decision, ..., decision → evaluation. Substantially harder, and interesting because of its versatile applications
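RL with immediate evaluation can be sketched as a simple action-value learner. The two 'throw parameter settings' and their reward values below are hypothetical, and epsilon-greedy action selection is just one common choice, not the lecture's prescription:

```python
import random

random.seed(0)
# Two hypothetical parameter settings for a basketball throw; each choice
# is immediately evaluated with a (here deterministic) reward.
rewards = [0.2, 0.8]
estimates = [0.0, 0.0]     # learned value estimate per action
counts = [0, 0]
epsilon = 0.1              # exploration rate

for _ in range(1000):
    if random.random() < epsilon:
        a = random.randrange(2)                        # explore
    else:
        a = max(range(2), key=lambda i: estimates[i])  # exploit
    r = rewards[a]                                     # immediate evaluation
    counts[a] += 1
    estimates[a] += (r - estimates[a]) / counts[a]     # incremental mean
```

Because the evaluation arrives right after each single decision, no credit has to be spread over a sequence; the delayed case below is what makes RL hard.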
Delayed RL
- Decision, decision, ..., decision → evaluation
- Examples: robotics, control systems, games (chess, backgammon)
- Basic problem: temporal credit assignment
- Basic architecture: actor-critic system
Multistage decision problems
Actor-critic system (Barto, Sutton, 1983)
Actor: in situation s, choose action u (strategy π : S → U)
Critic: 'distribution' of the external evaluation signal onto the single actions
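A minimal sketch of the critic's role, assuming a tabular TD(0) value estimate on a five-state chain with a single delayed reward at the end; the actor is omitted, and this is far simpler than the original AHC/ACE architecture:

```python
# Chain of 5 states; only reaching the final state yields reward 1.
# A tabular critic learns state values with the TD(0) update
#   V(s) <- V(s) + alpha * (r + gamma * V(s') - V(s)),
# thereby distributing the delayed evaluation onto earlier decisions.
n_states, alpha, gamma = 5, 0.1, 0.9
V = [0.0] * n_states

for _ in range(500):                       # repeated episodes along the chain
    for s in range(n_states - 1):
        s_next = s + 1
        terminal = (s_next == n_states - 1)
        r = 1.0 if terminal else 0.0
        target = r + (0.0 if terminal else gamma * V[s_next])
        V[s] += alpha * (target - V[s])    # TD error as training signal
```

After training, states closer to the reward carry higher values, which is exactly the temporal credit assignment the delayed setting requires.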
Reinforcement Learning
- 1959 Samuel's checker player: temporal difference (TD) methods
- 1968 Michie and Chambers: BOXES
- 1983 Barto and Sutton's AHC/ACE; 1987 Sutton's TD(λ)
- Early '90s: connection between dynamic programming (DP) and RL: Werbos, Sutton, Barto, Watkins, Singh, Bertsekas
- DP: classic optimization technique (late '50s: Bellman); too expensive for large tasks. Advantage: clean mathematical formulation, convergence results
- 2000 policy gradient methods (Sutton et al., Peters et al., ...)
- 2005 Fitted Q (batch DP method) (Ernst et al., Riedmiller, ...)
- Many examples of successful, or at least practically relevant, applications since
Other examples
field              | input           | output (actions)  | goal            | example
games              | board situation | valid move        | winning         | backgammon, chess
robotics           | sensor data     | control variable  | reference value | pendulum, robot soccer
sequence planning  | state           | candidate         | gain            | assembly line, mobile network
benchmark          | state           | direction         | goal position   | maze
Goal: Autonomous learning system
Approach - rough outline
- Formulation of the learning problem as an optimization task
- Solution by learning, based on the optimization technique of Dynamic Programming
- Difficulties:
  - very large state space
  - process behaviour unknown
- Application of approximation techniques (e.g. neural networks, ...)
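The DP building block of this approach can be sketched as value iteration on a tiny deterministic 'maze'. The five-state corridor, the step cost of -1, and the two actions below are illustrative assumptions, chosen so the whole computation fits in a few lines:

```python
# Value iteration (classic DP, Bellman) on a tiny deterministic 1-D 'maze':
# states 0..4, actions left/right, goal at state 4, cost -1 per step.
n, gamma = 5, 1.0
V = [0.0] * n

def step(s, a):                  # deterministic transition model
    return min(n - 1, s + 1) if a == "right" else max(0, s - 1)

for _ in range(50):              # Bellman backups until the values settle
    for s in range(n - 1):       # state 4 is terminal, its value stays 0
        V[s] = max(-1.0 + gamma * V[step(s, a)] for a in ("left", "right"))
```

The converged values count the (negated) distance to the goal, and the greedy action with respect to V is the optimal strategy. The two difficulties above are exactly what breaks this scheme in practice: the table V does not fit in memory for large state spaces, and `step` is unknown, which motivates the approximation techniques of the later parts.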
Outline of lecture
Part 1: Introduction
Part 2: Dynamic Programming
Markov Decision Problems, backwards DP, value iteration, policy iteration
Part 3: Approximate DP / Reinforcement Learning
Monte Carlo methods, stochastic approximation, TD(λ), Q-learning
Part 4: Advanced methods of Reinforcement Learning
Policy gradient methods, hierarchical methods, POMDPs, relational Reinforcement Learning
Part 5: Applications of Reinforcement Learning
Robot soccer, pendulum, RL competition
Further courses on machine learning
- Lecture: Machine Learning (summer term)
- Lab course: Deep Learning (Wed., 10-12)
- Bachelor / Master theses, team projects
Further readings
D. P. Bertsekas and J. N. Tsitsiklis. Neuro-Dynamic Programming. Athena Scientific, Belmont, Massachusetts, 1996.
R. S. Sutton and A. G. Barto. Reinforcement Learning: An Introduction. MIT Press, Cambridge, Massachusetts, 1998.
M. L. Puterman. Markov Decision Processes: Discrete Stochastic Dynamic Programming. John Wiley and Sons, New York, 1994.
L. P. Kaelbling, M. L. Littman, and A. W. Moore. Reinforcement Learning: A Survey. Journal of Artificial Intelligence Research, 4:237-285, 1996.
M. Wiering (ed.). Reinforcement Learning: State-of-the-Art. Springer, 2012.
WWW:
- http://www-all.cs.umass.edu/rlr/
- http://richsutton.com/RL-FAQ.html