
Reinforcement Learning_Code_Value Function Approximation

2023-04-08 11:30 Author: 別叫我小紅

The following results and code implement value function approximation, including Monte Carlo, Sarsa and deep Q-learning, in Gymnasium's Cart Pole environment.
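For orientation, a minimal interaction loop with this environment might look like the sketch below; the env id `CartPole-v1` and the random placeholder policy are assumptions, not the settings used by the agents that follow.

```python
import gymnasium as gym

# CartPole: state is a 4-dim vector (cart position/velocity, pole angle/velocity),
# actions are {0: push left, 1: push right}, reward is +1 per surviving step.
env = gym.make("CartPole-v1")
state, info = env.reset(seed=0)

done = False
score = 0.0
while not done:
    action = env.action_space.sample()  # random policy as a placeholder
    state, reward, terminated, truncated, info = env.step(action)
    done = terminated or truncated
    score += reward
env.close()
print(f"episode score: {score}")
```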


RESULTS:

Visualizations of (i) changes in scores, losses and epsilons, and (ii) animation results.

1. Monte Carlo

Fig. 1.1. Changes in scores, losses and epsilons.
Fig. 1.2. Animation results.

2. Sarsa

The original Sarsa, which is exactly what is used here, may need a replay buffer just as Q-learning does.

In the original implementations of Sarsa and Q-learning, the Q-value is updated every time an action is taken, which makes the algorithm extremely unstable.

So, to get better results, we update the Q-value from batches of past transitions instead of from each step in isolation, which means introducing experience replay.


Fig. 2.1. Changes in scores, losses and epsilons.
Fig. 2.2. Animation results.

3. Deep Q-learning

Here we use experience replay and fixed Q-targets.

Fig. 3.1. Changes in scores, losses and epsilons.
Fig. 3.2. Animation results.


CODE:

NetWork.py
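A minimal sketch of what the Q-network could be, assuming a small PyTorch MLP in the style of rainbow-is-all-you-need; the hidden size of 128 is an assumption.

```python
import torch
import torch.nn as nn


class Network(nn.Module):
    """MLP that maps a state to one estimated Q-value per action."""

    def __init__(self, in_dim: int, out_dim: int, hidden_dim: int = 128):
        super().__init__()
        self.layers = nn.Sequential(
            nn.Linear(in_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, out_dim),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.layers(x)
```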


MCAgent.py
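A minimal sketch of an every-visit Monte Carlo control agent with a neural Q-function; the learning rate and epsilon-decay schedule are assumptions. After each episode, the discounted return G_t is computed for every step and Q(s_t, a_t) is regressed toward it.

```python
import numpy as np
import torch
import torch.nn.functional as F

from NetWork import Network


class MCAgent:
    """Every-visit Monte Carlo control with a neural Q-function."""

    def __init__(self, obs_dim: int, act_dim: int, gamma: float = 0.99,
                 lr: float = 1e-3, epsilon: float = 1.0,
                 epsilon_decay: float = 0.995, min_epsilon: float = 0.05):
        self.q_net = Network(obs_dim, act_dim)
        self.optimizer = torch.optim.Adam(self.q_net.parameters(), lr=lr)
        self.gamma = gamma
        self.epsilon = epsilon
        self.epsilon_decay = epsilon_decay
        self.min_epsilon = min_epsilon
        self.act_dim = act_dim

    def select_action(self, state: np.ndarray) -> int:
        # epsilon-greedy over the current Q estimates
        if np.random.rand() < self.epsilon:
            return np.random.randint(self.act_dim)
        with torch.no_grad():
            q = self.q_net(torch.FloatTensor(state).unsqueeze(0))
        return int(q.argmax(dim=1).item())

    def update(self, states, actions, rewards) -> float:
        # Compute the discounted return G_t for every step of the episode,
        # then regress Q(s_t, a_t) toward G_t.
        returns, g = [], 0.0
        for r in reversed(rewards):
            g = r + self.gamma * g
            returns.insert(0, g)
        states_t = torch.FloatTensor(np.array(states))
        actions_t = torch.LongTensor(actions).unsqueeze(1)
        returns_t = torch.FloatTensor(returns).unsqueeze(1)

        q = self.q_net(states_t).gather(1, actions_t)
        loss = F.mse_loss(q, returns_t)
        self.optimizer.zero_grad()
        loss.backward()
        self.optimizer.step()
        self.epsilon = max(self.min_epsilon, self.epsilon * self.epsilon_decay)
        return loss.item()
```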


SarsaAgent.py
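A minimal sketch of the Sarsa agent with the replay buffer discussed above; all hyperparameters are assumptions. Note that the buffer must also store the next action a', since the Sarsa target is r + γ·Q(s', a') rather than the greedy max.

```python
import numpy as np
import torch
import torch.nn.functional as F

from NetWork import Network
from ReplayBuffer import ReplayBuffer


class SarsaAgent:
    """Sarsa with a neural Q-function, updated from replayed batches."""

    def __init__(self, obs_dim: int, act_dim: int, buffer_size: int = 10000,
                 batch_size: int = 32, gamma: float = 0.99, lr: float = 1e-3,
                 epsilon: float = 1.0, epsilon_decay: float = 0.995,
                 min_epsilon: float = 0.05):
        self.q_net = Network(obs_dim, act_dim)
        self.optimizer = torch.optim.Adam(self.q_net.parameters(), lr=lr)
        self.buffer = ReplayBuffer(obs_dim, buffer_size, batch_size)
        self.gamma = gamma
        self.epsilon = epsilon
        self.epsilon_decay = epsilon_decay
        self.min_epsilon = min_epsilon
        self.act_dim = act_dim

    def select_action(self, state: np.ndarray) -> int:
        # epsilon-greedy over the current Q estimates
        if np.random.rand() < self.epsilon:
            return np.random.randint(self.act_dim)
        with torch.no_grad():
            q = self.q_net(torch.FloatTensor(state).unsqueeze(0))
        return int(q.argmax(dim=1).item())

    def update(self) -> float:
        batch = self.buffer.sample_batch()
        state = torch.FloatTensor(batch["obs"])
        action = torch.LongTensor(batch["acts"]).unsqueeze(1)
        reward = torch.FloatTensor(batch["rews"]).unsqueeze(1)
        next_state = torch.FloatTensor(batch["next_obs"])
        next_action = torch.LongTensor(batch["next_acts"]).unsqueeze(1)
        done = torch.FloatTensor(batch["done"]).unsqueeze(1)

        q = self.q_net(state).gather(1, action)
        with torch.no_grad():
            # Sarsa target: bootstrap from the action actually taken next,
            # not the greedy max as Q-learning does.
            next_q = self.q_net(next_state).gather(1, next_action)
        target = reward + self.gamma * next_q * (1 - done)

        loss = F.mse_loss(q, target)
        self.optimizer.zero_grad()
        loss.backward()
        self.optimizer.step()
        self.epsilon = max(self.min_epsilon, self.epsilon * self.epsilon_decay)
        return loss.item()
```

During training, the next action has to be chosen before the transition is stored, e.g. `agent.buffer.store(state, action, reward, next_state, float(done), next_act=next_action)`.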


ReplayBuffer.py
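A minimal sketch of a ring-buffer replay memory, assuming NumPy arrays as in rainbow-is-all-you-need; as a design assumption here, it additionally stores the next action so that the same class serves both Sarsa and DQN (DQN simply leaves that field at its default).

```python
import numpy as np


class ReplayBuffer:
    """Fixed-size ring buffer of transitions with uniform random sampling."""

    def __init__(self, obs_dim: int, size: int, batch_size: int = 32):
        self.obs_buf = np.zeros((size, obs_dim), dtype=np.float32)
        self.next_obs_buf = np.zeros((size, obs_dim), dtype=np.float32)
        self.acts_buf = np.zeros(size, dtype=np.int64)
        self.next_acts_buf = np.zeros(size, dtype=np.int64)
        self.rews_buf = np.zeros(size, dtype=np.float32)
        self.done_buf = np.zeros(size, dtype=np.float32)
        self.max_size, self.batch_size = size, batch_size
        self.ptr, self.size = 0, 0

    def store(self, obs, act, rew, next_obs, done, next_act=0):
        # Overwrite the oldest transition once the buffer is full.
        self.obs_buf[self.ptr] = obs
        self.acts_buf[self.ptr] = act
        self.rews_buf[self.ptr] = rew
        self.next_obs_buf[self.ptr] = next_obs
        self.next_acts_buf[self.ptr] = next_act
        self.done_buf[self.ptr] = done
        self.ptr = (self.ptr + 1) % self.max_size
        self.size = min(self.size + 1, self.max_size)

    def sample_batch(self) -> dict:
        idxs = np.random.choice(self.size, self.batch_size, replace=False)
        return dict(obs=self.obs_buf[idxs], acts=self.acts_buf[idxs],
                    rews=self.rews_buf[idxs], next_obs=self.next_obs_buf[idxs],
                    next_acts=self.next_acts_buf[idxs], done=self.done_buf[idxs])

    def __len__(self) -> int:
        return self.size
```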


DQNAgent.py
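A minimal sketch of deep Q-learning with experience replay and fixed Q-targets, assuming a hard target-network update every fixed number of gradient steps (the update period of 100 is an assumption). Holding the target network constant between updates is what keeps the bootstrap target from chasing its own moving estimate.

```python
import numpy as np
import torch
import torch.nn.functional as F

from NetWork import Network
from ReplayBuffer import ReplayBuffer


class DQNAgent:
    """Deep Q-learning with experience replay and a fixed target network."""

    def __init__(self, obs_dim: int, act_dim: int, buffer_size: int = 10000,
                 batch_size: int = 32, gamma: float = 0.99, lr: float = 1e-3,
                 target_update: int = 100, epsilon: float = 1.0,
                 epsilon_decay: float = 0.995, min_epsilon: float = 0.05):
        self.q_net = Network(obs_dim, act_dim)
        self.target_net = Network(obs_dim, act_dim)
        self.target_net.load_state_dict(self.q_net.state_dict())
        self.target_net.eval()
        self.optimizer = torch.optim.Adam(self.q_net.parameters(), lr=lr)
        self.buffer = ReplayBuffer(obs_dim, buffer_size, batch_size)
        self.gamma = gamma
        self.target_update = target_update
        self.update_count = 0
        self.epsilon = epsilon
        self.epsilon_decay = epsilon_decay
        self.min_epsilon = min_epsilon
        self.act_dim = act_dim

    def select_action(self, state: np.ndarray) -> int:
        # epsilon-greedy over the current Q estimates
        if np.random.rand() < self.epsilon:
            return np.random.randint(self.act_dim)
        with torch.no_grad():
            q = self.q_net(torch.FloatTensor(state).unsqueeze(0))
        return int(q.argmax(dim=1).item())

    def update(self) -> float:
        batch = self.buffer.sample_batch()
        state = torch.FloatTensor(batch["obs"])
        action = torch.LongTensor(batch["acts"]).unsqueeze(1)
        reward = torch.FloatTensor(batch["rews"]).unsqueeze(1)
        next_state = torch.FloatTensor(batch["next_obs"])
        done = torch.FloatTensor(batch["done"]).unsqueeze(1)

        q = self.q_net(state).gather(1, action)
        with torch.no_grad():
            # Q-learning target: greedy max over the *fixed* target network.
            next_q = self.target_net(next_state).max(dim=1, keepdim=True)[0]
        target = reward + self.gamma * next_q * (1 - done)

        loss = F.mse_loss(q, target)
        self.optimizer.zero_grad()
        loss.backward()
        self.optimizer.step()

        # Periodically copy online weights into the target network.
        self.update_count += 1
        if self.update_count % self.target_update == 0:
            self.target_net.load_state_dict(self.q_net.state_dict())

        self.epsilon = max(self.min_epsilon, self.epsilon * self.epsilon_decay)
        return loss.item()
```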


train_and_test.py
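A minimal sketch of a training script tying the pieces together, shown here for the DQN agent; the episode count and warm-up threshold are assumptions. It records the scores, losses and epsilons plotted in the figures above.

```python
import gymnasium as gym

from DQNAgent import DQNAgent

# Placeholder hyperparameters, not the post's actual settings.
NUM_EPISODES = 300
WARMUP_TRANSITIONS = 500

env = gym.make("CartPole-v1")
obs_dim = env.observation_space.shape[0]
act_dim = env.action_space.n
agent = DQNAgent(obs_dim, act_dim)

scores, losses, epsilons = [], [], []
for episode in range(NUM_EPISODES):
    state, _ = env.reset()
    done, score = False, 0.0
    while not done:
        action = agent.select_action(state)
        next_state, reward, terminated, truncated, _ = env.step(action)
        done = terminated or truncated
        agent.buffer.store(state, action, reward, next_state, float(done))
        # Start learning only once the buffer holds enough transitions.
        if len(agent.buffer) >= WARMUP_TRANSITIONS:
            losses.append(agent.update())
        state = next_state
        score += reward
    scores.append(score)
    epsilons.append(agent.epsilon)
    print(f"episode {episode:4d}  score {score:5.1f}  epsilon {agent.epsilon:.3f}")
env.close()
```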


The above code is mainly based on rainbow-is-all-you-need [1] and extends the solutions to Monte Carlo and Sarsa.


Reference

[1] https://github.com/Curt-Park/rainbow-is-all-you-need

