Reinforcement Learning_Code_Policy Gradient

2023-04-10 23:35 Author: 別叫我小紅

The following results and code implement policy gradient methods, including REINFORCE, in Gymnasium's Cart Pole environment.

RESULTS:

Visualizations of (i) changes in scores and losses, and (ii) animation results.

Since REINFORCE relies on Monte Carlo estimation of returns, which has high variance, it converges slowly and has not converged after 10 thousand steps.

However, the result is not bad, and the agent can hopefully score more than 200 points if it is trained for more steps.
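
To make the Monte Carlo part concrete, here is a minimal sketch of how REINFORCE can turn one finished episode into a loss. It is an illustration only, not the code from the files listed below: the function names `discounted_returns` and `reinforce_loss` and the discount factor `gamma=0.98` are assumptions, and plain Python lists stand in for tensors.

```python
import math


def discounted_returns(rewards, gamma=0.98):
    """Monte Carlo returns G_t = r_t + gamma * G_{t+1} for one finished episode.

    gamma=0.98 is an assumed default, not a value taken from the article.
    """
    returns = [0.0] * len(rewards)
    g = 0.0
    # Walk the episode backwards so each return reuses the one after it.
    for t in reversed(range(len(rewards))):
        g = rewards[t] + gamma * g
        returns[t] = g
    return returns


def reinforce_loss(log_probs, rewards, gamma=0.98):
    """Surrogate loss -sum_t G_t * log pi(a_t | s_t) for one episode.

    log_probs[t] is the log-probability the policy gave to the action
    actually taken at step t. Minimizing this loss raises the probability
    of actions that were followed by high returns.
    """
    returns = discounted_returns(rewards, gamma)
    return -sum(lp * g for lp, g in zip(log_probs, returns))


# Cart Pole gives a reward of 1 per step, so a 3-step episode looks like this.
rewards = [1.0, 1.0, 1.0]
print(discounted_returns(rewards, gamma=0.5))  # [1.75, 1.5, 1.0]
```

Because every return is computed from a single sampled episode, two episodes that start in the same state can produce very different values of G_t, which is one way to see why training is noisy and slow.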

Fig. 1. Changes in scores and losses.

Fig. 2. Animation results.


CODE:

NetWork.py


REINFORCEAgent.py


train_and_test.py


The above code is mainly based on Chapter 9 of Hands-on Reinforcement Learning [1] and my previous implementation of value function approximation with Monte Carlo [2].


Reference

[1] https://hrl.boyuai.com/

[2] https://www.bilibili.com/read/cv22924612


