
Reinforcement Learning_Code_Value Function Approximation

2023-04-08 11:30 Author: 別叫我小紅

The following results and code implement value function approximation, including Monte Carlo, Sarsa and deep Q-learning, in Gymnasium's Cart Pole environment.
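For orientation, a minimal interaction loop with this environment might look like the sketch below; the env id `CartPole-v1` and the random placeholder policy are assumptions, not the settings used by the agents that follow.

```python
import gymnasium as gym

# CartPole: state is a 4-dim vector (cart position/velocity, pole angle/velocity),
# actions are {0: push left, 1: push right}, reward is +1 per surviving step.
env = gym.make("CartPole-v1")
state, info = env.reset(seed=0)

done = False
score = 0.0
while not done:
    action = env.action_space.sample()  # random policy as a placeholder
    state, reward, terminated, truncated, info = env.step(action)
    done = terminated or truncated
    score += reward
env.close()
print(f"episode score: {score}")
```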


RESULTS:

Visualizations of (i) changes in scores, losses and epsilons, and (ii) animation results.

1. Monte Carlo

Fig. 1.1. Changes in scores, losses and epsilons.
Fig. 1.2. Animation results.

2. Sarsa

The original Sarsa, which is exactly what is used here, may need a replay buffer just as Q-learning does.

In the original implementations of Sarsa and Q-learning, the Q-value is updated every time an action is taken, which makes the algorithm extremely unstable.

So, to get better results, we update the Q-value from batches of past transitions instead of from each step in isolation, which means introducing experience replay.


Fig. 2.1. Changes in scores, losses and epsilons.
Fig. 2.2. Animation results.

3. Deep Q-learning

Here we use experience replay and fixed Q-targets.

Fig. 3.1. Changes in scores, losses and epsilons.
Fig. 3.2. Animation results.


CODE:

NetWork.py
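A minimal sketch of what the Q-network could be, assuming a small PyTorch MLP in the style of rainbow-is-all-you-need; the hidden size of 128 is an assumption.

```python
import torch
import torch.nn as nn


class Network(nn.Module):
    """MLP that maps a state to one estimated Q-value per action."""

    def __init__(self, in_dim: int, out_dim: int, hidden_dim: int = 128):
        super().__init__()
        self.layers = nn.Sequential(
            nn.Linear(in_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, out_dim),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.layers(x)
```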


MCAgent.py
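A minimal sketch of an every-visit Monte Carlo control agent with a neural Q-function; the learning rate and epsilon-decay schedule are assumptions. After each episode, the discounted return G_t is computed for every step and Q(s_t, a_t) is regressed toward it.

```python
import numpy as np
import torch
import torch.nn.functional as F

from NetWork import Network


class MCAgent:
    """Every-visit Monte Carlo control with a neural Q-function."""

    def __init__(self, obs_dim: int, act_dim: int, gamma: float = 0.99,
                 lr: float = 1e-3, epsilon: float = 1.0,
                 epsilon_decay: float = 0.995, min_epsilon: float = 0.05):
        self.q_net = Network(obs_dim, act_dim)
        self.optimizer = torch.optim.Adam(self.q_net.parameters(), lr=lr)
        self.gamma = gamma
        self.epsilon = epsilon
        self.epsilon_decay = epsilon_decay
        self.min_epsilon = min_epsilon
        self.act_dim = act_dim

    def select_action(self, state: np.ndarray) -> int:
        # epsilon-greedy over the current Q estimates
        if np.random.rand() < self.epsilon:
            return np.random.randint(self.act_dim)
        with torch.no_grad():
            q = self.q_net(torch.FloatTensor(state).unsqueeze(0))
        return int(q.argmax(dim=1).item())

    def update(self, states, actions, rewards) -> float:
        # Compute the discounted return G_t for every step of the episode,
        # then regress Q(s_t, a_t) toward G_t.
        returns, g = [], 0.0
        for r in reversed(rewards):
            g = r + self.gamma * g
            returns.insert(0, g)
        states_t = torch.FloatTensor(np.array(states))
        actions_t = torch.LongTensor(actions).unsqueeze(1)
        returns_t = torch.FloatTensor(returns).unsqueeze(1)

        q = self.q_net(states_t).gather(1, actions_t)
        loss = F.mse_loss(q, returns_t)
        self.optimizer.zero_grad()
        loss.backward()
        self.optimizer.step()
        self.epsilon = max(self.min_epsilon, self.epsilon * self.epsilon_decay)
        return loss.item()
```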


SarsaAgent.py
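A minimal sketch of the Sarsa agent with the replay buffer discussed above; all hyperparameters are assumptions. Note that the buffer must also store the next action a', since the Sarsa target is r + γ·Q(s', a') rather than the greedy max.

```python
import numpy as np
import torch
import torch.nn.functional as F

from NetWork import Network
from ReplayBuffer import ReplayBuffer


class SarsaAgent:
    """Sarsa with a neural Q-function, updated from replayed batches."""

    def __init__(self, obs_dim: int, act_dim: int, buffer_size: int = 10000,
                 batch_size: int = 32, gamma: float = 0.99, lr: float = 1e-3,
                 epsilon: float = 1.0, epsilon_decay: float = 0.995,
                 min_epsilon: float = 0.05):
        self.q_net = Network(obs_dim, act_dim)
        self.optimizer = torch.optim.Adam(self.q_net.parameters(), lr=lr)
        self.buffer = ReplayBuffer(obs_dim, buffer_size, batch_size)
        self.gamma = gamma
        self.epsilon = epsilon
        self.epsilon_decay = epsilon_decay
        self.min_epsilon = min_epsilon
        self.act_dim = act_dim

    def select_action(self, state: np.ndarray) -> int:
        # epsilon-greedy over the current Q estimates
        if np.random.rand() < self.epsilon:
            return np.random.randint(self.act_dim)
        with torch.no_grad():
            q = self.q_net(torch.FloatTensor(state).unsqueeze(0))
        return int(q.argmax(dim=1).item())

    def update(self) -> float:
        batch = self.buffer.sample_batch()
        state = torch.FloatTensor(batch["obs"])
        action = torch.LongTensor(batch["acts"]).unsqueeze(1)
        reward = torch.FloatTensor(batch["rews"]).unsqueeze(1)
        next_state = torch.FloatTensor(batch["next_obs"])
        next_action = torch.LongTensor(batch["next_acts"]).unsqueeze(1)
        done = torch.FloatTensor(batch["done"]).unsqueeze(1)

        q = self.q_net(state).gather(1, action)
        with torch.no_grad():
            # Sarsa target: bootstrap from the action actually taken next,
            # not the greedy max as Q-learning does.
            next_q = self.q_net(next_state).gather(1, next_action)
        target = reward + self.gamma * next_q * (1 - done)

        loss = F.mse_loss(q, target)
        self.optimizer.zero_grad()
        loss.backward()
        self.optimizer.step()
        self.epsilon = max(self.min_epsilon, self.epsilon * self.epsilon_decay)
        return loss.item()
```

During training, the next action has to be chosen before the transition is stored, e.g. `agent.buffer.store(state, action, reward, next_state, float(done), next_act=next_action)`.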


ReplayBuffer.py
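A minimal sketch of a ring-buffer replay memory, assuming NumPy arrays as in rainbow-is-all-you-need; as a design assumption here, it additionally stores the next action so that the same class serves both Sarsa and DQN (DQN simply leaves that field at its default).

```python
import numpy as np


class ReplayBuffer:
    """Fixed-size ring buffer of transitions with uniform random sampling."""

    def __init__(self, obs_dim: int, size: int, batch_size: int = 32):
        self.obs_buf = np.zeros((size, obs_dim), dtype=np.float32)
        self.next_obs_buf = np.zeros((size, obs_dim), dtype=np.float32)
        self.acts_buf = np.zeros(size, dtype=np.int64)
        self.next_acts_buf = np.zeros(size, dtype=np.int64)
        self.rews_buf = np.zeros(size, dtype=np.float32)
        self.done_buf = np.zeros(size, dtype=np.float32)
        self.max_size, self.batch_size = size, batch_size
        self.ptr, self.size = 0, 0

    def store(self, obs, act, rew, next_obs, done, next_act=0):
        # Overwrite the oldest transition once the buffer is full.
        self.obs_buf[self.ptr] = obs
        self.acts_buf[self.ptr] = act
        self.rews_buf[self.ptr] = rew
        self.next_obs_buf[self.ptr] = next_obs
        self.next_acts_buf[self.ptr] = next_act
        self.done_buf[self.ptr] = done
        self.ptr = (self.ptr + 1) % self.max_size
        self.size = min(self.size + 1, self.max_size)

    def sample_batch(self) -> dict:
        idxs = np.random.choice(self.size, self.batch_size, replace=False)
        return dict(obs=self.obs_buf[idxs], acts=self.acts_buf[idxs],
                    rews=self.rews_buf[idxs], next_obs=self.next_obs_buf[idxs],
                    next_acts=self.next_acts_buf[idxs], done=self.done_buf[idxs])

    def __len__(self) -> int:
        return self.size
```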


DQNAgent.py
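A minimal sketch of deep Q-learning with experience replay and fixed Q-targets, assuming a hard target-network update every fixed number of gradient steps (the update period of 100 is an assumption). Holding the target network constant between updates is what keeps the bootstrap target from chasing its own moving estimate.

```python
import numpy as np
import torch
import torch.nn.functional as F

from NetWork import Network
from ReplayBuffer import ReplayBuffer


class DQNAgent:
    """Deep Q-learning with experience replay and a fixed target network."""

    def __init__(self, obs_dim: int, act_dim: int, buffer_size: int = 10000,
                 batch_size: int = 32, gamma: float = 0.99, lr: float = 1e-3,
                 target_update: int = 100, epsilon: float = 1.0,
                 epsilon_decay: float = 0.995, min_epsilon: float = 0.05):
        self.q_net = Network(obs_dim, act_dim)
        self.target_net = Network(obs_dim, act_dim)
        self.target_net.load_state_dict(self.q_net.state_dict())
        self.target_net.eval()
        self.optimizer = torch.optim.Adam(self.q_net.parameters(), lr=lr)
        self.buffer = ReplayBuffer(obs_dim, buffer_size, batch_size)
        self.gamma = gamma
        self.target_update = target_update
        self.update_count = 0
        self.epsilon = epsilon
        self.epsilon_decay = epsilon_decay
        self.min_epsilon = min_epsilon
        self.act_dim = act_dim

    def select_action(self, state: np.ndarray) -> int:
        # epsilon-greedy over the current Q estimates
        if np.random.rand() < self.epsilon:
            return np.random.randint(self.act_dim)
        with torch.no_grad():
            q = self.q_net(torch.FloatTensor(state).unsqueeze(0))
        return int(q.argmax(dim=1).item())

    def update(self) -> float:
        batch = self.buffer.sample_batch()
        state = torch.FloatTensor(batch["obs"])
        action = torch.LongTensor(batch["acts"]).unsqueeze(1)
        reward = torch.FloatTensor(batch["rews"]).unsqueeze(1)
        next_state = torch.FloatTensor(batch["next_obs"])
        done = torch.FloatTensor(batch["done"]).unsqueeze(1)

        q = self.q_net(state).gather(1, action)
        with torch.no_grad():
            # Q-learning target: greedy max over the *fixed* target network.
            next_q = self.target_net(next_state).max(dim=1, keepdim=True)[0]
        target = reward + self.gamma * next_q * (1 - done)

        loss = F.mse_loss(q, target)
        self.optimizer.zero_grad()
        loss.backward()
        self.optimizer.step()

        # Periodically copy online weights into the target network.
        self.update_count += 1
        if self.update_count % self.target_update == 0:
            self.target_net.load_state_dict(self.q_net.state_dict())

        self.epsilon = max(self.min_epsilon, self.epsilon * self.epsilon_decay)
        return loss.item()
```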


train_and_test.py
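A minimal sketch of a training script tying the pieces together, shown here for the DQN agent; the episode count and warm-up threshold are assumptions. It records the scores, losses and epsilons plotted in the figures above.

```python
import gymnasium as gym

from DQNAgent import DQNAgent

# Placeholder hyperparameters, not the post's actual settings.
NUM_EPISODES = 300
WARMUP_TRANSITIONS = 500

env = gym.make("CartPole-v1")
obs_dim = env.observation_space.shape[0]
act_dim = env.action_space.n
agent = DQNAgent(obs_dim, act_dim)

scores, losses, epsilons = [], [], []
for episode in range(NUM_EPISODES):
    state, _ = env.reset()
    done, score = False, 0.0
    while not done:
        action = agent.select_action(state)
        next_state, reward, terminated, truncated, _ = env.step(action)
        done = terminated or truncated
        agent.buffer.store(state, action, reward, next_state, float(done))
        # Start learning only once the buffer holds enough transitions.
        if len(agent.buffer) >= WARMUP_TRANSITIONS:
            losses.append(agent.update())
        state = next_state
        score += reward
    scores.append(score)
    epsilons.append(agent.epsilon)
    print(f"episode {episode:4d}  score {score:5.1f}  epsilon {agent.epsilon:.3f}")
env.close()
```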


The above code is mainly based on rainbow-is-all-you-need [1] and extends the solutions to Monte Carlo and Sarsa.


Reference

[1] https://github.com/Curt-Park/rainbow-is-all-you-need

