Word2Vec (4): PyTorch Implementation of Word2Vec with Softmax

The simplest versions of CBOW and skipgram implemented in PyTorch, with an objective function that minimizes the negative log likelihood with softmax.

CBOW

The idea of CBOW is to use the context words on both sides to predict the center word in the middle; how many context words there are depends on the window size.
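As an illustration of the data layout, here is a sketch of how (context, center) training pairs could be built from a list of token ids; build_cbow_pairs and the toy example are made up for this post, the real preprocessing lives in the notebook:

def build_cbow_pairs(token_ids, window_size=2):
    # For each eligible position, the window_size ids on each side form the
    # context and the id at that position is the center word.
    pairs = []
    for i in range(window_size, len(token_ids) - window_size):
        context = token_ids[i - window_size:i] + token_ids[i + 1:i + window_size + 1]
        pairs.append((context, token_ids[i]))
    return pairs

# build_cbow_pairs([0, 1, 2, 3, 4], window_size=2) -> [([0, 1, 3, 4], 2)]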

  • $V$: the vocabulary size
  • $N$ : the embedding dimension
  • $W$: the input side matrix which is $V \times N$
    • each row is an $N$-dimensional vector
    • $\text{v}_{w_i}$ is the representation of the input word $w_i$
  • $W'$: the output side matrix, which is $N \times V$
    • each column is an $N$-dimensional vector
    • $\text{v}'_{w_j}$ is the $j$-th column of the matrix $W'$, representing $w_j$

In the conditional probability $P(\textit{center} \mid \textit{context}; \theta)$, the center word variable ranges over a finite vocabulary, so this is a discrete probability and the task can be cast as a multiclass classification problem.

Let $w_O$ denote the center word and $w_I$ the input context words; then

  • $h$ denotes the hidden-layer output, whose value is the average of the input context word vectors, $\cfrac{1}{C}\left(\text{v}_{w_1} + \text{v}_{w_2} + \dots + \text{v}_{w_C}\right)^\top$

During training we maximize the log of the conditional probability $P(w_O \mid w_I; \theta)$.
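Written out with the notation above, this conditional probability is the softmax over the whole vocabulary (the same expression shows up in the loss below):

$P(w_O \mid w_I; \theta) = \cfrac{\exp(h^\top \text{v}'_{w_O})}{\sum_{w_i \in V} \exp(h^\top \text{v}'_{w_i})}$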

Pytorch CBOW + softmax

CBOW + softmax model definition

import torch.nn as nn
import torch.nn.functional as F


class CBOWSoftmax(nn.Module):
    def __init__(self, vocab_size, embedding_dim):
        super().__init__()
        self.syn0 = nn.Embedding(vocab_size, embedding_dim)  # input-side matrix W
        self.syn1 = nn.Linear(embedding_dim, vocab_size)     # output-side matrix W'

    def forward(self, context, center):
        # context: [b_size, window_size]
        # center: [b_size, 1]
        embds = self.syn0(context).mean(dim=1)  # [b_size, embedding_dim]
        out = self.syn1(embds)                  # [b_size, vocab_size]

        log_probs = F.log_softmax(out, dim=1)
        loss = F.nll_loss(log_probs, center.view(-1), reduction='mean')
        return loss

  • syn0 corresponds to the input-side embedding matrix $W$

  • syn1 corresponds to the output-side embedding matrix $W'$

  • the loss computed is

    $- \log \cfrac{\exp(h^\top \text{v}'_{w_{O}})}{\sum_{w_i \in V} \exp(h^\top \text{v}'_{w_i})}$

  • input: both context and center contain word indices

  • because the context consists of N words (N being determined by the window size), there are N word embeddings in play; the usual choice is to sum or average them
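A quick forward pass with dummy index tensors (the batch size, vocabulary size, and window size below are arbitrary) shows how the model is called; since log_softmax followed by nll_loss is exactly cross entropy, F.cross_entropy(out, center.view(-1)) would give the same loss:

import torch

vocab_size, embedding_dim = 100, 16
model = CBOWSoftmax(vocab_size, embedding_dim)

context = torch.randint(0, vocab_size, (4, 5))  # [b_size=4, window_size=5]
center = torch.randint(0, vocab_size, (4, 1))   # [b_size=4, 1]

loss = model(context, center)  # scalar tensor
loss.backward()                # gradients flow into both syn0 and syn1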

Training Stage

The training loop is omitted here; if you're interested, see the notebook on GitHub:

seed9D/hands-on-machine-learning
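For completeness, a minimal training loop might look like the sketch below; the Adam optimizer, learning rate, batch size, and the random dummy pairs are assumptions for illustration, not values taken from the notebook:

import torch
import torch.optim as optim
from torch.utils.data import DataLoader, TensorDataset

# Dummy (context, center) index pairs just to make the sketch self-contained;
# in practice these come from the corpus preprocessing step.
contexts = torch.randint(0, vocab_size, (1000, 5))
centers = torch.randint(0, vocab_size, (1000, 1))
loader = DataLoader(TensorDataset(contexts, centers), batch_size=32, shuffle=True)

optimizer = optim.Adam(model.parameters(), lr=1e-3)  # assumed optimizer and lr

for epoch in range(5):  # assumed number of epochs
    for context, center in loader:
        optimizer.zero_grad()
        loss = model(context, center)
        loss.backward()
        optimizer.step()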

Extract the Embedding

Create a class for measuring cosine similarity:

import numpy as np


class CosineSimilarity:
    def __init__(self, word_embedding, idx_to_word_dict, word_to_idx_dict):
        self.word_embedding = word_embedding  # rows are L2-normalized already
        self.idx_to_word_dict = idx_to_word_dict
        self.word_to_idx_dict = word_to_idx_dict

    def get_synonym(self, word, topK=10):
        idx = self.word_to_idx_dict[word]
        embed = self.word_embedding[idx]

        # dot products of unit vectors are cosine similarities
        cos_similarity = self.word_embedding @ embed

        topK_index = np.argsort(-cos_similarity)[:topK]
        pairs = []
        for i in topK_index:
            w = self.idx_to_word_dict[i]
            pairs.append((w, cos_similarity[i]))
        return pairs

Only syn0 is used as the embedding; remember to L2-normalize it:

syn0 = model.syn0.weight.data

w2v_embedding = syn0
w2v_embedding = w2v_embedding.numpy()
l2norm = np.linalg.norm(w2v_embedding, 2, axis=1, keepdims=True)
w2v_embedding = w2v_embedding / l2norm
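Putting the pieces together, usage might look like this; idx_to_word and word_to_idx stand for the vocabulary mappings built during preprocessing (names assumed here):

cos_sim = CosineSimilarity(w2v_embedding, idx_to_word, word_to_idx)
for w, score in cos_sim.get_synonym('jesus', topK=10):
    print(f'{w}\t{score:.4f}')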

The training corpus is the Bible, so here is a quick look at the words most similar to jesus and christ; I'll withhold judgment on the quality.

[Figure: top similar words to "jesus" and "christ" under the CBOW embedding]

Skipgram

The idea of skipgram is to use the center word to predict the context words on both sides.

  • $V$: the vocabulary size
  • $N$ : the embedding dimension
  • $W$: the input side matrix which is $V \times N$
    • each row is an $N$-dimensional vector
    • $\text{v}_{w_i}$ is the representation of the input word $w_i$
  • $W'$: the output side matrix, which is $N \times V$
    • each column is an $N$-dimensional vector
    • $\text{v}'_{w_j}$ is the $j$-th column of the matrix $W'$, representing $w_j$

Let $w_I$ denote the input center word and $w_{O,j}$ the $j$-th context word of the target; then the conditional probability is

$P(w_{O,j} \mid w_I; \theta) = \cfrac{\exp(h^\top \text{v}'_{w_{O,j}})}{\sum_{w_i \in V} \exp(h^\top \text{v}'_{w_i})}$

  • $h$ denotes the hidden-layer output, which in skipgram is simply $\text{v}_{w_I}$

Skipgram's objective function maximizes the log probability of all $C$ context words given the center word, i.e. it minimizes $E = -\sum_{j=1}^{C} \log P(w_{O,j} \mid w_I; \theta)$.

Pytorch skipgram + softmax

Model definition

class SkipgramSoftmax(nn.Module):
    def __init__(self, vocab_size, embedding_dim):
        super().__init__()
        self.vocab_size = vocab_size
        self.embedding_dim = embedding_dim
        self.syn0 = nn.Embedding(vocab_size, embedding_dim)  # |V| x |N|
        self.syn1 = nn.Linear(embedding_dim, vocab_size)     # |N| x |V|

    def forward(self, center, context):
        # center: [b_size, 1]
        # context: [b_size, 1]
        embds = self.syn0(center.view(-1))  # [b_size, embedding_dim]
        out = self.syn1(embds)              # [b_size, vocab_size]
        log_probs = F.log_softmax(out, dim=1)
        loss = F.nll_loss(log_probs, context.view(-1), reduction='mean')
        return loss

  • syn0 corresponds to the input-side embedding matrix $W$
  • syn1 corresponds to the output-side embedding matrix $W'$

In practice, skipgram only needs a (center word, context word) pair for each training example,

so the loss function is very simple to implement.
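For comparison with the CBOW preprocessing sketch earlier, skipgram pairs could be built like this, one (center, context word) example per context position (build_skipgram_pairs is again a made-up helper):

def build_skipgram_pairs(token_ids, window_size=2):
    # Every (center, single context word) combination becomes its own example.
    pairs = []
    for i, center in enumerate(token_ids):
        lo = max(0, i - window_size)
        hi = min(len(token_ids), i + window_size + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, token_ids[j]))
    return pairs

# build_skipgram_pairs([0, 1, 2], window_size=1) -> [(0, 1), (1, 0), (1, 2), (2, 1)]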

Training Stage

The training loop is omitted here; if you're interested, see the notebook on GitHub:

seed9D/hands-on-machine-learning

Evaluation

Extract the embedding; this time, try $(W + W')/2$ as the embedding. (Note that syn1.weight has shape [vocab_size, embedding_dim], the same as syn0.weight, because nn.Linear stores its weight as [out_features, in_features], so the two matrices can be averaged directly.)

syn0 = model.syn0.weight.data
syn1 = model.syn1.weight.data

w2v_embedding = (syn0 + syn1) / 2
w2v_embedding = w2v_embedding.numpy()
l2norm = np.linalg.norm(w2v_embedding, 2, axis=1, keepdims=True)
w2v_embedding = w2v_embedding / l2norm

Looking again at the neighbours of jesus and christ, the results seem slightly better than CBOW's.

