
Reinforcement Learning_Code_Temporal Difference Learning_Frozen

2023-04-02 22:56 · Author: 別叫我小紅

Here is some terrible code: it contains a lot of redundancy, is not well object-oriented, and produces poor results. I hope I can draw lessons from it in the future.


RESULTS:

Visualizations of (i) action value tables and optimal actions, (ii) changes in steps and rewards with episodes, and (iii) animation results are shown below, respectively.

(Note that, due to some mistakes, the animation results may differ from what the action value tables suggest.)

1. Q-Learning (bootstrap, off-policy)
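As a reference for the update behind this section, here is a minimal sketch of the tabular, off-policy Q-Learning step: the bootstrap target uses the greedy maximum over next-state actions, regardless of which action the explorer actually takes. Variable names such as `qtable`, `lr`, and `gamma` are illustrative and not necessarily those used in QLearningLeaner.py.

```python
import numpy as np

def q_learning_update(qtable, state, action, reward, next_state, lr=0.1, gamma=0.99):
    """One off-policy TD(0) step: bootstrap from the greedy value of next_state."""
    td_target = reward + gamma * np.max(qtable[next_state])  # max over actions -> off-policy
    td_error = td_target - qtable[state, action]
    qtable[state, action] += lr * td_error
    return qtable
```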

(1) With Epsilon-greedy Explorer
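The epsilon-greedy explorer takes a random action with probability epsilon and otherwise acts greedily on the current Q-table. A minimal sketch, with names chosen for illustration rather than taken from EpsilonGreedyExplorer.py:

```python
import numpy as np

rng = np.random.default_rng()

def epsilon_greedy_action(qtable, state, n_actions, epsilon=0.1):
    """Explore with probability epsilon, otherwise exploit the current Q-table."""
    if rng.random() < epsilon:
        return int(rng.integers(n_actions))  # uniform random exploration
    return int(np.argmax(qtable[state]))     # greedy exploitation
```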

Fig. 1.(1).1. Action value tables and optimal actions with map_size = 4, 7, 9, 11.

Fig. 1.(1).2. Changes in steps and rewards with episodes.

Fig. 1.(1).3. Animation result with map_size = 4.

Fig. 1.(1).4. Animation result with map_size = 7.
Fig. 1.(1).5. Animation result with map_size = 9.
Fig. 1.(1).6. Animation result with map_size = 11.

(2) With Random Explorer



Fig. 1.(2).1. Action value tables and optimal actions with map_size = 4, 7, 9, 11.


Fig. 1.(2).2. Changes in steps and rewards with episodes.

From the steps results in Fig. 1.(2).2, we can see that the average number of steps hardly decreases over the episodes. This is likely caused by the random explorer, which simply chooses a random direction whenever it is asked to act and ignores any improvements already made to the target policy. A sketch of that explorer follows below.
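For comparison with the epsilon-greedy case, a random (uniform) explorer never exploits the Q-table at all. A minimal sketch, with illustrative names rather than the actual interface of UniformExplorer.py:

```python
import numpy as np

rng = np.random.default_rng()

def uniform_action(n_actions):
    """Behavior policy that never exploits: every action is equally likely,
    no matter how good the learned Q-table already is."""
    return int(rng.integers(n_actions))
```

Because Q-Learning is off-policy, it can still learn a sensible target policy from this purely random behavior; the steps and rewards logged per episode, however, are generated by the random behavior policy, which is why they show no improvement.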


Fig. 1.(2).3. Animation result with map_size = 11.

2. Sarsa (bootstrap, on-policy)
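Unlike Q-Learning, Sarsa bootstraps from the value of the action the behavior policy actually takes next, which makes it on-policy. A minimal sketch of the update, with illustrative names (SarsaAgent.py may differ):

```python
def sarsa_update(qtable, state, action, reward, next_state, next_action, lr=0.1, gamma=0.99):
    """One on-policy TD(0) step: bootstrap from the action actually selected next."""
    td_target = reward + gamma * qtable[next_state, next_action]  # no max -> on-policy
    td_error = td_target - qtable[state, action]
    qtable[state, action] += lr * td_error
    return qtable
```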


Fig. 2.1. Action value tables and optimal actions with map_size = 4, 7, 9, 11.


Fig. 2.2. Changes in steps and rewards with episodes.


Fig. 2.3. Animation result with map_size = 11.

3. Sarsa(λ) (bootstrap, on-policy)

Building on Sarsa, Sarsa(λ) introduces the backward view of temporal difference learning and maintains an eligibility trace.
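A minimal sketch of that backward-view update with an accumulating eligibility trace: one TD error updates every previously visited state-action pair in proportion to its trace. Names such as `etrace` and `lam` are illustrative and not necessarily those used in SarsaLambdaAgent.py.

```python
import numpy as np

def sarsa_lambda_update(qtable, etrace, state, action, reward, next_state, next_action,
                        lr=0.1, gamma=0.99, lam=0.9):
    """Backward-view Sarsa(lambda) step over numpy arrays qtable and etrace."""
    td_error = reward + gamma * qtable[next_state, next_action] - qtable[state, action]
    etrace[state, action] += 1.0        # accumulating trace for the visited pair
    qtable += lr * td_error * etrace    # propagate the TD error to all traced pairs
    etrace *= gamma * lam               # decay every trace
    return qtable, etrace
```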

Fig. 3.1. Action value tables and optimal actions with map_size = 4, 7, 9, 11.

Fig. 3.2. Changes in steps and rewards with episodes.


Fig. 3.3. Animation result with map_size = 11.

4. Monte Carlo (not bootstrap, on-policy)
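Monte Carlo does not bootstrap: it waits until the episode terminates and then moves each visited state-action value towards the actual sampled return. A minimal every-visit sketch with illustrative names (MonteCarolAgent.py may differ, e.g. by using first-visit updates):

```python
def monte_carlo_update(qtable, counts, episode, gamma=0.99):
    """Every-visit Monte Carlo update.

    episode: list of (state, action, reward) tuples from one finished episode.
    counts:  dict of visit counters used for the incremental mean.
    """
    g = 0.0
    for state, action, reward in reversed(episode):  # accumulate returns backwards
        g = reward + gamma * g
        counts[(state, action)] = counts.get((state, action), 0) + 1
        qtable[state, action] += (g - qtable[state, action]) / counts[(state, action)]
    return qtable
```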




Fig. 4.1. Action value tables and optimal actions with map_size = 4, 7, 9, 11.


Fig. 4.2. Changes in steps and rewards with episodes.



Fig. 4.3. Animation result with map_size = 11.

CODES:

FrozenLake_bench.py

Params.py


QLearningLeaner.py


EpsilonGreedyExplorer.py


UniformExplorer.py


SarsaAgent.py


SarsaLambdaAgent.py


MonteCarolAgent.py


Visualization.py


The above code is based on the Gymnasium documentation tutorial "Frozenlake benchmark" and extends it with solutions for the Sarsa, Sarsa(λ), and Monte Carlo algorithms.
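For context, this is roughly how a FrozenLake environment of a given map_size is created with Gymnasium, following the tutorial in [1]; the parameter values here are illustrative, not the exact settings of FrozenLake_bench.py or Params.py.

```python
import gymnasium as gym
from gymnasium.envs.toy_text.frozen_lake import generate_random_map

map_size = 7
env = gym.make(
    "FrozenLake-v1",
    desc=generate_random_map(size=map_size, p=0.9),  # random map, ~90% frozen tiles
    is_slippery=False,          # deterministic transitions; a slippery variant also exists
    render_mode="rgb_array",    # frames used for the animation results
)
obs, info = env.reset(seed=123)
```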


Reference

[1]?https://gymnasium.farama.org/tutorials/training_agents/FrozenLake_tuto/
