Word2Vec (1): NLP Language Model

General Form

A language model assigns a probability $P(w_1, w_2, w_3, \dots, w_T)$ to a word sequence; expanding it with the chain rule gives $P(w_1, w_2, w_3, \dots, w_T) = P(w_1)P(w_2 \mid w_1)P(w_3 \mid w_1, w_2)\cdots P(w_T \mid w_1, \dots, w_{T-1})$

  • e.g. $P(\text{the cat sat}) = P(\text{the})\,P(\text{cat} \mid \text{the})\,P(\text{sat} \mid \text{the}, \text{cat})$

Ngram Model

Under the Markov assumption, word $i$ depends only on the $n$ consecutive words that include it, i.e. only on the preceding $n-1$ words.
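
Concretely, the chain-rule product above is then approximated as

$P(w_1, \dots, w_T) \approx \prod_{i=1}^{T} P(w_i \mid w_{i-n+1}, \dots, w_{i-1})$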

Each conditional probability is estimated from word frequencies counted in the corpus; note that an n-gram model conditions only on the preceding $n-1$ words.

If $n = 2$, this is called a bigram model:

The simplest implementation directly counts word frequencies in the corpus to obtain the conditional probabilities:

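For the bigram case, the count-based (maximum-likelihood) estimate is simply a ratio of corpus counts:

$P(w_i \mid w_{i-1}) = \dfrac{\text{count}(w_{i-1}, w_i)}{\text{count}(w_{i-1})}$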

However, such count-based estimates usually generalize poorly (any n-gram unseen in the corpus gets zero probability), so a neural network with a softmax layer is used instead to generalize the n-gram conditional probability $p(w_i \mid w_{i-1}, \cdots, w_{i-n+1})$.

Neural Network Implementation

In a neural network, we achieve the same objective with a softmax output layer:

$p(w_t \mid w_{t-1}, \cdots, w_{t-n+1}) = \dfrac{\exp({h^\top v'_{w_t}})}{\sum_{w_i \in V} \exp({h^\top v'_{w_i}})}$

  • $h$ is the output vector of the penultimate network layer
  • $v'_{w}$ is the output embedding of word $w$
  • the inner product $h^\top v'_{w_t}$ computes the unnormalized log probability of word $w_t$
  • the denominator normalizes these scores into a probability distribution by summing $\exp({h^\top v'_{w_i}})$ over all words in $V$, as sketched below
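
A minimal PyTorch sketch of this scoring step (the tensor names and sizes here are illustrative, not from the original post):

import torch
import torch.nn.functional as F

vocab_size, hidden_dim = 5, 3
h = torch.randn(hidden_dim)                       # output of the penultimate layer
output_emb = torch.randn(vocab_size, hidden_dim)  # one output embedding v'_w per vocabulary word

scores = output_emb @ h            # unnormalized log probabilities h^T v'_w, shape (vocab_size,)
probs = F.softmax(scores, dim=0)   # normalization over the vocabulary
print(probs.sum())                 # ≈ 1.0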

Implement the N-gram Model with PyTorch

Creating Corpus and Training Pairs

import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim

test_sentence = """When forty winters shall besiege thy brow,
And dig deep trenches in thy beauty's field,
Thy youth's proud livery so gazed on now,
Will be a totter'd weed of small worth held:
Then being asked, where all thy beauty lies,
Where all the treasure of thy lusty days;
To say, within thine own deep sunken eyes,
Were an all-eating shame, and thriftless praise.
How much more praise deserv'd thy beauty's use,
If thou couldst answer 'This fair child of mine
Shall sum my count, and make my old excuse,'
Proving his beauty by succession thine!
This were to be new made when thou art old,
And see thy blood warm when thou feel'st it cold.""".split()

# each training example pairs a 2-word context with the word that follows it
trigrams = [([test_sentence[i], test_sentence[i + 1]], test_sentence[i + 2])
            for i in range(len(test_sentence) - 2)]
vocab = set(test_sentence)
word_to_idx = {word: i for i, word in enumerate(vocab)}
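
For example, the first training pair built from the sonnet above is a 2-word context and its following word:

print(trigrams[0])  # (['When', 'forty'], 'winters')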

Define the N-gram Model

class NGramLanguageModel(nn.Module):
    def __init__(self, vocab_size, embedding_dim, context_size):
        super(NGramLanguageModel, self).__init__()
        self.embeddings = nn.Embedding(vocab_size, embedding_dim)
        self.linear1 = nn.Linear(context_size * embedding_dim, 128)
        self.linear2 = nn.Linear(128, vocab_size)

    def forward(self, inputs):
        # concatenate the context word embeddings into one (1, context_size * embedding_dim) vector
        embeds = self.embeddings(inputs).view(1, -1)
        out = F.relu(self.linear1(embeds))
        out = self.linear2(out)
        # log-probabilities over the whole vocabulary, shape (1, vocab_size)
        log_probs = F.log_softmax(out, dim=1)
        return log_probs
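
A quick sanity check of the forward pass before training (illustrative snippet; demo_net is a throwaway instance, not part of the original post):

demo_net = NGramLanguageModel(len(vocab), embedding_dim=10, context_size=2)
context_idxs = torch.tensor([word_to_idx[w] for w in ['When', 'forty']], dtype=torch.long)
print(demo_net(context_idxs).shape)  # torch.Size([1, len(vocab)]): one log-probability per vocabulary word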

Training

CONTEXT_SIZE = 2
EMBEDDING_DIM = 10
loss_function = nn.NLLLoss()
net = NGramLanguageModel(len(vocab), EMBEDDING_DIM, CONTEXT_SIZE)
optimizer = optim.SGD(net.parameters(), lr=0.001)
losses = []
for epoch in range(10):
    total_loss = 0
    for context, target in trigrams:
        # map the two context words to vocabulary indices
        context_idxs = torch.tensor([word_to_idx[w] for w in context], dtype=torch.long)

        net.zero_grad()

        log_probs = net(context_idxs)
        loss = loss_function(log_probs, torch.tensor([word_to_idx[target]], dtype=torch.long))
        loss.backward()
        optimizer.step()
        total_loss += loss.item()
    print("epoch {} loss {}".format(epoch, total_loss))
    losses.append(total_loss)
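
Once trained, the model can be used as a next-word predictor; a minimal sketch (the idx_to_word lookup is added here for illustration):

idx_to_word = {i: word for word, i in word_to_idx.items()}
with torch.no_grad():
    context_idxs = torch.tensor([word_to_idx[w] for w in ['When', 'forty']], dtype=torch.long)
    predicted = idx_to_word[net(context_idxs).argmax(dim=1).item()]
print(predicted)  # ideally 'winters', given enough training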

Fetch Embedding

# rows of the embedding table, one EMBEDDING_DIM-dimensional vector per vocabulary word
emb = net.embeddings(torch.arange(len(vocab))).detach().numpy()
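
The learned vectors can then be inspected, e.g. by comparing two words with cosine similarity (illustrative snippet, not from the original post):

import numpy as np

def cosine_similarity(u, v):
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

print(cosine_similarity(emb[word_to_idx['beauty']], emb[word_to_idx['praise']]))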

