Word2Vec (4): PyTorch Implementation of Word2Vec with Softmax
This post implements the simplest versions of CBOW and skip-gram in PyTorch; the objective function is to minimize the negative log likelihood with a softmax.
CBOW
The idea of CBOW is to predict the middle center word from the context words on both sides; there are several context words, with the exact number determined by the window size.
- $V$: the vocabulary size
- $N$ : the embedding dimension
- $W$: the input side matrix which is $V \times N$
- each row is the $N$ dimension vector
- $\text{v}_{w_i}$ is the representation of the input word $w_i$
- $W'$: the output side matrix which is $N \times V$
- each column is the $N$ dimension vector
- $\text{v}'_{w_j}$ is the $j$-th column of the matrix $W'$ representing $w_j$
In the conditional probability $P(\text{center} \mid \text{context}; \theta)$, the center word ranges over a finite vocabulary, so it is a discrete distribution and the problem can be cast as multi-class classification.
Let $w_O$ denote the center word and $w_I$ the input context words; then the conditional probability is modeled with a softmax:
$P(w_O \mid w_I; \theta) = \cfrac{\exp(h^\top \text{v}'_{w_O})}{\sum_{w_i \in V} \exp(h^\top \text{v}'_{w_i})}$
- $h$ denotes the output of the hidden layer, whose value is the average of the input context word vectors, $\cfrac{1}{C}(\text{v}_{w_1} + \text{v}_{w_2} + \dots + \text{v}_{w_C})^\top$
Training maximizes the log of the conditional probability, i.e. $\text{maximize } \log P(w_O \mid w_I; \theta)$.
PyTorch CBOW + softmax
The CBOW + softmax model definition:
```python
class CBOWSoftmax(nn.Module):
    ...
```
- syn0 corresponds to the input-side embedding matrix $W$
- syn1 corresponds to the output-side embedding matrix $W'$
The loss is computed as
$- \log \cfrac{\exp(h^\top \text{v}'_{w_O})}{\sum_{w_i \in V} \exp(h^\top \text{v}'_{w_i})}$
Input: both the context and the center word are given as word indices.
Because the context consists of $C$ words (as set by the window size), there are $C$ word embeddings in total; the usual practice is to sum or average them. A sketch of such a model is shown below.
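Since the post only shows the class header above, here is a minimal illustrative sketch of a CBOW + softmax model. Only the class name and the syn0/syn1 attribute names come from the post; the constructor arguments, tensor shapes, and use of `F.cross_entropy` are my own assumptions.

```python
import torch.nn as nn
import torch.nn.functional as F

class CBOWSoftmax(nn.Module):
    """Minimal CBOW with a full-softmax loss (illustrative sketch, not the author's exact code)."""

    def __init__(self, vocab_size, embedding_dim):
        super().__init__()
        self.syn0 = nn.Embedding(vocab_size, embedding_dim)  # input-side matrix W (V x N)
        self.syn1 = nn.Embedding(vocab_size, embedding_dim)  # output-side matrix W' (stored row-wise, V x N)

    def forward(self, context, center):
        # context: (batch, C) word indices; center: (batch,) word indices
        h = self.syn0(context).mean(dim=1)        # average of the C context vectors -> (batch, N)
        scores = h @ self.syn1.weight.t()         # h^T v'_w for every word w -> (batch, V)
        return F.cross_entropy(scores, center)    # = -log softmax probability of the center word
```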
Training Stage
The training loop is omitted here; if you are interested, see the notebook on GitHub:
seed9D/hands-on-machine-learning
Extracting the embedding
Create a class that measures cosine similarity:
```python
class CosineSimilarity:
    ...
```
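The post only shows the class header; one possible minimal version looks like the following, assuming the embeddings are passed in as an L2-normalized $(V, N)$ tensor together with an idx2word list (everything besides the class name is my own naming).

```python
import torch

class CosineSimilarity:
    """Nearest-neighbour lookup over L2-normalized word embeddings (illustrative sketch)."""

    def __init__(self, embeddings, idx2word):
        # embeddings: (V, N) tensor, assumed already L2-normalized,
        # so a plain dot product equals cosine similarity
        self.embeddings = embeddings
        self.idx2word = idx2word
        self.word2idx = {w: i for i, w in enumerate(idx2word)}

    def most_similar(self, word, topk=10):
        query = self.embeddings[self.word2idx[word]]
        scores = self.embeddings @ query                      # cosine similarity to every word
        best = torch.topk(scores, topk + 1).indices.tolist()  # +1 because the word matches itself
        return [(self.idx2word[i], scores[i].item())
                for i in best if self.idx2word[i] != word][:topk]
```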
Only syn0 is used as the embedding; remember to L2-normalize it.
```python
syn0 = model.syn0.weight.data
```
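For example, with a hypothetical idx2word list from the data pipeline (not shown in the post):

```python
import torch.nn.functional as F

syn0 = model.syn0.weight.data
syn0 = F.normalize(syn0, dim=1)   # L2-normalize each row

sim = CosineSimilarity(syn0, idx2word)
print(sim.most_similar('jesus'))
print(sim.most_similar('christ'))
```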
The training corpus is the Bible, so let's just look at the words most similar to jesus and christ; I will refrain from commenting on the quality.
Skipgram
The idea of skip-gram is to use the center word to predict the context words on both sides.
- $V$: the vocabulary size
- $N$ : the embedding dimension
- $W$: the input side matrix which is $V \times N$
- each row is the $N$ dimension vector
- $\text{v}_{w_i}$ is the representation of the input word $w_i$
- $W'$: the output side matrix which is $N \times V$
- each column is the $N$ dimension vector
- $\text{v}'_{w_j}$ is the $j$-th column of the matrix $W'$ representing $w_j$
Let $w_I$ denote the input center word and $w_{O,j}$ the $j$-th context word of the target; then the conditional probability is
$P(w_{O,j} \mid w_I; \theta) = \cfrac{\exp(h^\top \text{v}'_{w_{O,j}})}{\sum_{w_i \in V} \exp(h^\top \text{v}'_{w_i})}$
- $h$ denotes the output of the hidden layer, which in skip-gram is simply $\text{v}_{w_I}$
Skip-gram's objective function minimizes the negative log likelihood summed over the $C$ context words of a training sample: $- \sum_{j=1}^{C} \log P(w_{O,j} \mid w_I; \theta)$.
PyTorch skipgram + softmax
Model
```python
class SkipgramSoftmax(nn.Module):
    ...
```
- syn0 corresponds to the input-side embedding matrix $W$
- syn1 corresponds to the output-side embedding matrix $W'$
In practice, each skip-gram training example only needs a (center word, context word) pair,
so the loss function is very simple to implement; see the sketch below.
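As with CBOW, only the class header is shown above; here is a minimal illustrative sketch under the same assumptions (one (center, context) index pair per example; only the class name and syn0/syn1 come from the post).

```python
import torch.nn as nn
import torch.nn.functional as F

class SkipgramSoftmax(nn.Module):
    """Minimal skip-gram with a full-softmax loss (illustrative sketch, not the author's exact code)."""

    def __init__(self, vocab_size, embedding_dim):
        super().__init__()
        self.syn0 = nn.Embedding(vocab_size, embedding_dim)  # input-side matrix W
        self.syn1 = nn.Embedding(vocab_size, embedding_dim)  # output-side matrix W'

    def forward(self, center, context):
        # center, context: (batch,) word indices -- one (center, context) pair per example
        h = self.syn0(center)                     # h is simply v_{w_I} -> (batch, N)
        scores = h @ self.syn1.weight.t()         # (batch, V)
        return F.cross_entropy(scores, context)   # -log softmax probability of the context word
```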
Training Stage
The training loop is omitted here; if you are interested, see the notebook on GitHub:
seed9D/hands-on-machine-learning
Evaluation
Extract the embedding; this time we try $(W + W')/2$ as the embedding.
```python
syn0 = model.syn0.weight.data
```
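A sketch of that averaging, assuming syn1 stores $W'$ with the same $V \times N$ layout as syn0:

```python
import torch.nn.functional as F

syn0 = model.syn0.weight.data
syn1 = model.syn1.weight.data

embedding = (syn0 + syn1) / 2               # (W + W') / 2
embedding = F.normalize(embedding, dim=1)   # L2-normalize each row before cosine similarity
```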
Looking again at the words most similar to jesus and christ, the results seem slightly better than CBOW's.
Reference
- https://lilianweng.github.io/lil-log/2017/10/15/learning-word-embedding.html
- https://towardsdatascience.com/implementing-word2vec-in-pytorch-skip-gram-model-e6bae040d2fb
- 基于PyTorch实现word2vec模型 https://lonepatient.top/2019/01/18/Pytorch-word2vec.htm
- Rong, X. (2014). word2vec Parameter Learning Explained, 1–21. Retrieved from http://arxiv.org/abs/1411.2738
- https://github.com/FraLotito/pytorch-continuous-bag-of-words/blob/master/cbow.py