
Attention as matrix multiplication

2023-02-27 20:38  Author: 學的很雜的一個人


Source: https://e2eml.school/transformers.html#softmax

Chinese-English bilingual version; the Chinese annotations were made with various translation tools plus a little of my own understanding.


Related articles are collected in the compilation: Transformers from Scratch (with Chinese annotations)

--------------------------------------------------------------------------------------------------------------------


Feature weights could be straightforward to build by counting how often each word pair/next word transition occurs in training, but attention masks are not.

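To make the contrast concrete, here is a minimal sketch (my own illustration, not code from the original article) of what counting word pair / next word transitions could look like; the toy vocabulary and corpus are invented for the example.

import numpy as np

# Hypothetical vocabulary and training text, just for illustration.
vocab = ["check", "battery", "program", "ran", "down", "out", "please"]
word_to_index = {w: i for i, w in enumerate(vocab)}
corpus = ["check", "battery", "ran", "down", "check", "program", "ran", "out"]

# counts[i, j] = how often word j immediately follows word i in the corpus.
counts = np.zeros((len(vocab), len(vocab)))
for current_word, next_word in zip(corpus, corpus[1:]):
    counts[word_to_index[current_word], word_to_index[next_word]] += 1

# Normalizing each row turns raw counts into next-word transition weights.
row_sums = counts.sum(axis=1, keepdims=True)
transition_weights = np.divide(counts, row_sums,
                               out=np.zeros_like(counts), where=row_sums > 0)
print(transition_weights)

Nothing in this counting procedure tells us which earlier words a mask should attend to, which is why the masks themselves need a different treatment.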

Up to this point, we've pulled the mask vector out of thin air.


How transformers find the relevant mask matters.


It would be natural to use some sort of lookup table, but now we are focusing hard on expressing everything as matrix multiplications.


We can use the same lookup method we introduced above by stacking the mask vectors for every word into a matrix and using the one-hot representation of the most recent word to pull out the relevant mask.


In the matrix showing the collection of mask vectors, we've only shown the one we're trying to pull out, for clarity.

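As a concrete illustration of this lookup, here is a small NumPy sketch; the five-word vocabulary and the mask values are made up, and I store one mask per row so that the one-hot product selects a row (this is my own example, not a figure from the article).

import numpy as np

vocab_size = 5

# One invented mask vector per word, stacked as the rows of a matrix:
# mask_matrix[j] is the attention mask associated with word j.
mask_matrix = np.array([
    [1.0, 0.0, 0.0, 1.0, 0.0],
    [0.0, 1.0, 0.0, 0.0, 1.0],
    [1.0, 1.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 1.0, 0.0],
    [0.0, 0.0, 0.0, 0.0, 1.0],
])

# One-hot representation of the most recent word (say, word index 2).
most_recent_word = np.zeros(vocab_size)
most_recent_word[2] = 1.0

# Matrix multiplication acts as a lookup: the one-hot vector picks out row 2.
relevant_mask = most_recent_word @ mask_matrix
print(relevant_mask)   # [1. 1. 0. 0. 0.]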

We're finally getting to the point where we can start tying into the paper.


This mask lookup is represented by the QK^T term in the attention equation.


The query Q represents the feature of interest and the matrix K represents the collection of masks.


Because it's stored with masks in columns, rather than rows, it needs to be transposed (with the T operator) before multiplying.

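A minimal sketch of the same lookup written in the QK^T form (again with invented values): K holds one mask per column, as described above, so transposing it puts the masks into rows, and each one-hot row of Q then pulls out the mask for its word.

import numpy as np

# K: each *column* is the mask vector for one word (invented values).
K = np.array([
    [1.0, 0.0, 1.0, 0.0, 0.0],
    [0.0, 1.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 1.0, 0.0],
    [1.0, 0.0, 0.0, 1.0, 0.0],
    [0.0, 1.0, 0.0, 0.0, 1.0],
])

# Q: each row is the one-hot query for one position in the sequence.
Q = np.array([
    [0.0, 0.0, 1.0, 0.0, 0.0],   # query for word 2
    [0.0, 0.0, 0.0, 0.0, 1.0],   # query for word 4
])

# Q @ K.T performs one mask lookup per row of Q.
masks = Q @ K.T
print(masks)
# [[1. 1. 0. 0. 0.]
#  [0. 0. 0. 0. 1.]]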

By the time we're all done, we'll make some important modifications to this, but at this level it captures the concept of a differentiable lookup table that transformers make use of.

