Implementing Wide & Deep: from Training to Recommendation Ranking

Memorization & Generalization

If you have to pick the first deep ranking model to bridge a recommender system from traditional machine learning to DNNs, the top choice is Wide & Deep, proposed by Google in 2016. The name comes from its structure: a shallow model combined with a deep network.

Before moving to DNNs, we almost certainly have a traditional ranking model already running in production (e.g. FM, GBDT, ...), plus effective feature combinations accumulated over a long time. Jumping straight to a DNN would throw that experience away, and tuning the new model would take a long time.

With Wide & Deep, we simply put the features of the currently serving ranking model on the wide side, build a separate deep model on the deep side, and jointly train the two; this gives a seamless transition to DNNs.

So what do the wide and the deep network each represent?

Google's 2016 paper sums it up: the wide part handles memorization, the deep part handles generalization. Wide & Deep embodies Memorization & Generalization, which is in essence the classic Exploitation & Exploration problem in recommendation.

Wide

The wide side uses LR + cross products to construct non-linear features. Traditional shallow models have strong memory for feature pairs that have appeared before, making them well suited to exploiting known information.

Example: a user has installed app A and is now shown app B; the probability of installing B is high. The wide network captures and exploits the relationship between "installed A" and "impressed B".

Deep

The deep side uses a neural network's feature-extraction ability to automatically discover interactions between features. Even for pairs that never appeared (or appeared rarely) in the training samples, it can learn low-dimensional embeddings and generalize, which corresponds to "exploration".

How to handle sparse id features?

For high-dimensional, sparse id features, the deep side uses embeddings: each high-dimensional sparse id is mapped to a low-dimensional dense embedding and fed into the feedforward dense layers.
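A minimal sketch of this mapping, using nn.Embedding with dimensions borrowed from the movieId feature used later (the ids in the batch are made up):

```python
import torch
import torch.nn as nn

# A sparse id with 3707 possible values is mapped to a dense 50-dim vector
# by nn.Embedding, instead of a 3707-dim one-hot vector.
movie_embedding = nn.Embedding(num_embeddings=3707, embedding_dim=50)

movie_ids = torch.tensor([12, 305, 3706])  # a batch of label-encoded ids
dense = movie_embedding(movie_ids)         # [3, 50] dense matrix
print(dense.shape)  # torch.Size([3, 50])
```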

Implementing Wide & Deep in PyTorch

The dataset is MovieLens ml-1m, which can be found at the link below:

MovieLens

In recommendation, as a rule all features are turned into ids, and continuous features are bucketized and labeled. The features in the MovieLens dataset happen to be bucketized already, so we only need to assign labels.
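As a hedged illustration of the bucketize-then-label step (the bucket edges below are invented for the example, not MovieLens's actual scheme):

```python
import pandas as pd

# Hypothetical bucket edges for an "age" feature; each bucket becomes one label.
ages = pd.Series([3, 17, 24, 31, 45, 62])
age_bucket = pd.cut(ages, bins=[0, 18, 25, 35, 50, 100],
                    labels=["1", "18", "25", "35", "50"])
print(age_bucket.tolist())  # ['1', '1', '18', '25', '35', '50']
```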

Wide feature processing

from typing import List, Tuple

import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin


class CrossFeatures(BaseEstimator, TransformerMixin):
    def __init__(self, cross_col_pairs: List[Tuple[str, str]]):
        self.cross_col_pairs = cross_col_pairs

    def fit(self, df: pd.DataFrame, y=None):
        self.unique_columns_ = set()
        for pair in self.cross_col_pairs:
            self.unique_columns_.update(list(pair))

        self.crossed_colnames_ = []
        for cols in self.cross_col_pairs:
            new_colname = "_".join(cols)
            self.crossed_colnames_.append(new_colname)
        return self

    def transform(self, df: pd.DataFrame):
        # pandas requires a list (not a set) as a column indexer
        df_cross = df[list(self.unique_columns_)].copy().astype(str)

        for cols in self.cross_col_pairs:
            new_colname = "_".join(cols)
            df_cross[new_colname] = df_cross[cols[0]] + '-' + df_cross[cols[1]]
        return df_cross[self.crossed_colnames_]

  • class CrossFeatures crosses two features, i.e. the AND (co-occurrence) relation
wide_cols = ['gender', 'age', 'occupation', 'zipCode']
crossed_cols = [('gender', 'age'), ('gender', 'occupation'), ('age', 'occupation')]
wideGenerator = WideFeaturesGenerator(wide_cols, crossed_cols)
x_wide = wideGenerator.fit_transform(X)
  • We want to capture the relations "gender AND age", "gender AND occupation", and "age AND occupation"
  • No continuous features go on the wide side

Check the dimension of x_wide:

In:
print(x_wide.shape)
Out:
(1000209, 7)
  • The wide side produces 7 features in total, as expected: 3 crossed features plus 4 first-order features

In fact the features in x_wide are label-encoded: each dimension stores only a label value. Let's look at the global label values:

In:
print(wideGenerator.feature_dict_)
Out:
{'gender_F': 1,
'gender_M': 2,
'age_1': 3,
'age_56': 4,
'age_25': 5,
'age_50': 6,
'age_18': 7,
'age_45': 8,
'age_35': 9,
'occupation_10': 10,
'occupation_16': 11,
'occupation_12': 12,
'occupation_7': 13,
'occupation_1': 14,

.....
  • As shown, the wide-side features share one global label space

The actual dimension is:

In:
print(len(wideGenerator.feature_dict_))
Out:
3659
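The global labeling step can be sketched roughly like this (a simplified stand-in for whatever WideFeaturesGenerator does internally; ids start at 1 because the Wide model below reserves 0 for padding / unseen values):

```python
# Every "column_value" token across all wide columns shares one id space.
feature_dict = {}
for token in ["gender_F", "gender_M", "age_1", "gender_F"]:
    if token not in feature_dict:
        feature_dict[token] = len(feature_dict) + 1
print(feature_dict)  # {'gender_F': 1, 'gender_M': 2, 'age_1': 3}
```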

Deep feature processing

import numpy as np
from sklearn.exceptions import NotFittedError


class LabelEncoder(BaseEstimator, TransformerMixin):
    def __init__(self, columns_to_encode: List[str]):
        self.columns_to_encode = columns_to_encode

    def fit(self, df: pd.DataFrame, y=None):
        df_ = df[self.columns_to_encode].copy().astype('str')

        unique_column_vals = {col: df_[col].unique()
                              for col in self.columns_to_encode}

        self.encoding_dict_ = dict()
        for k, v in unique_column_vals.items():
            self.encoding_dict_[k] = {val: idx for idx, val in enumerate(v)}
            # reserve one extra label for values unseen at fit time
            self.encoding_dict_[k]['unseen'] = len(self.encoding_dict_[k])

        return self

    def transform(self, df: pd.DataFrame):
        try:
            self.encoding_dict_
        except AttributeError:
            raise NotFittedError(
                "This LabelEncoder instance is not fitted yet. "
                "Call 'fit' with appropriate arguments before using this LabelEncoder."
            )
        df_ = df.copy()
        df_[self.columns_to_encode] = df_[self.columns_to_encode].astype('str')

        for col, encoding_map in self.encoding_dict_.items():
            original_values = [f for f in encoding_map.keys() if f != 'unseen']
            # map values never seen during fit to the reserved 'unseen' label
            df_[col] = np.where(df_[col].isin(original_values), df_[col], 'unseen')
            df_[col] = df_[col].apply(lambda x: encoding_map[x])
        return df_


  • class LabelEncoder re-encodes id features, assigning each value a consecutive integer label

First, look at the number of distinct values of each id feature:

In:
for col in X:
    print(col, len(X[col].unique()))
Out:
userId 6040
movieId 3706
gender 2
age 7
occupation 21
zipCode 3439

This gives a rough idea of how large each id feature's embedding should be. Below are the embedding dimensions chosen for each id feature:

category_embed_dim_mapping = {
    'userId': 50,
    'movieId': 50,
    'gender': 2,
    'age': 2,
    'occupation': 5,
    'zipCode': 20
}
category_cols = list(category_embed_dim_mapping.keys())
continuous_cols = []
  • In recommendation, continuous features can usually be converted into id features (bucketize + label), so an empty continuous_cols is perfectly normal
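The dims above are hand-picked. One common rule of thumb, not what the mapping above uses, is roughly the fourth root of the feature's cardinality:

```python
# Distinct counts from the dataset inspection above; the 4th-root rule is
# just a heuristic starting point for choosing embedding dims.
distinct = {'userId': 6040, 'movieId': 3706, 'gender': 2,
            'age': 7, 'occupation': 21, 'zipCode': 3439}
suggested = {col: max(1, round(n ** 0.25)) for col, n in distinct.items()}
print(suggested)
```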

Convert the raw features into deep-side features:

deep_generator = DeepFeaturesGenerator(category_cols, continuous_cols)
df_deep = X[category_cols + continuous_cols].copy()
x_deep = deep_generator.fit_transform(X)

Check the dimension of x_deep:

In:
print(x_deep.shape)
Out:
(1000209, 6)

  • As expected: 6 features in total, all label-encoded id features

Now look at each feature's actual dimensionality (after one-hot expansion):

In:
print(deep_generator.embed_cols_unique_labels_)
Out:
{'userId': 6041,
'movieId': 3707,
'gender': 3,
'age': 8,
'occupation': 22,
'zipCode': 3440}
  • Using labels instead of one-hot encoding saves a large amount of storage

Wide & Deep Model Graph

Wide Model

The wide model is simple: just a shallow feedforward network.

import math

import torch
import torch.nn as nn


class Wide(nn.Module):
    def __init__(self, wide_dim: int, predict_dim: int = 1):
        super().__init__()
        # an Embedding lookup over the global wide-feature labels acts as a
        # sparse linear layer; index 0 is reserved for unseen cross features
        self.linear = nn.Embedding(wide_dim + 1, predict_dim, padding_idx=0)
        self.bias = nn.Parameter(torch.zeros(predict_dim))
        self._reset_parameters()

    def _reset_parameters(self):
        nn.init.kaiming_normal_(self.linear.weight, a=math.sqrt(5))
        fan_in, _ = nn.init._calculate_fan_in_and_fan_out(self.linear.weight)
        bound = 1 / math.sqrt(fan_in)
        nn.init.uniform_(self.bias, -bound, bound)

    def forward(self, X: torch.Tensor):
        # X: [b_size, num_of_wide_features]
        return self.linear(X.long()).sum(dim=1) + self.bias  # [b_size, predict_dim]
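Why an embedding lookup plus a sum implements the linear wide part: looking up rows i and j of the weight table and summing them equals wᵀx for a multi-hot x with ones at positions i and j. A tiny self-contained check:

```python
import torch
import torch.nn as nn

emb = nn.Embedding(10, 1)
idx = torch.tensor([[2, 5]])              # two active wide-feature labels
lookup_sum = emb(idx).sum(dim=1)          # [1, 1]: w_2 + w_5

one_hot = torch.zeros(1, 10)
one_hot[0, 2] = one_hot[0, 5] = 1.0
dense_dot = one_hot @ emb.weight          # the equivalent dense dot product
print(torch.allclose(lookup_sum, dense_dot))  # True
```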

Deep Model

The deep network is more involved: each id feature gets its own nn.Embedding, which turns the high-dimensional sparse vector into a low-dimensional dense embedding.

from typing import Dict, List, Tuple


class Deep(nn.Module):
    EMBEDDING_LAYER_PREFIX = "emb_layer"
    DENSE_LAYER_PREFIX = "dense_layer"

    def __init__(self,
                 columns_index: Dict[str, int],
                 embed_cols_info: List[Tuple[str, int, int]],  # (col_name, label_size, embedding_dim)
                 continuous_cols: List[str],
                 hidden_layer_neural: List[int],
                 hidden_layer_dropout: List[float],
                 embed_col_dropout: float = 0.0):
        super().__init__()
        self.columns_index = columns_index
        self.embed_cols_info = embed_cols_info
        self.continuous_cols = continuous_cols

        self.embed_layers = self._create_embed_layers(embed_cols_info)
        self.embed_dropout_layer = nn.Dropout(embed_col_dropout)

        self.hidden_layer_neural = self._update_hidden_layer_neural(hidden_layer_neural)

        self.dense_layer = self._create_dense_layer(hidden_layer_dropout)
        self.output_dim = hidden_layer_neural[-1]

    def _create_embed_layers(self, embed_cols_info: List[Tuple[str, int, int]]):
        return nn.ModuleDict({
            self.EMBEDDING_LAYER_PREFIX + '_' + col_name.replace(".", '_'): nn.Embedding(num_label, dim)
            for col_name, num_label, dim in embed_cols_info
        })

    def _create_dense_layer(self, hidden_layer_dropout):
        dense_sequential = nn.Sequential()
        for i in range(1, len(self.hidden_layer_neural)):
            dense_sequential.add_module(
                "{}_{}".format(self.DENSE_LAYER_PREFIX, i - 1),
                self._create_dense_component(
                    self.hidden_layer_neural[i - 1], self.hidden_layer_neural[i],
                    hidden_layer_dropout[i - 1], True)
            )
        return dense_sequential

    def _update_hidden_layer_neural(self, hidden_layer_neurals: List[int]):
        embed_dim = sum(embed[2] for embed in self.embed_cols_info)
        continuous_dim = len(self.continuous_cols)
        return [embed_dim + continuous_dim] + hidden_layer_neurals

    def _create_dense_component(self, input_dim: int, output_dim: int, dropout_ratio: float = 0.0, batch_norm=False):
        layers = [
            nn.Linear(input_dim, output_dim),
            nn.LeakyReLU(inplace=True)
        ]
        if batch_norm:
            layers.append(nn.BatchNorm1d(output_dim))
        layers.append(nn.Dropout(dropout_ratio))
        return nn.Sequential(*layers)

    def _get_embedding_layer(self, embed_col):
        embed_col = self.EMBEDDING_LAYER_PREFIX + '_' + embed_col.replace('.', '_')
        return self.embed_layers[embed_col]

    def forward(self, deep_input_x: torch.Tensor):
        embed_x = [
            self._get_embedding_layer(col)(deep_input_x[:, self.columns_index[col]].long())
            for col, _, _ in self.embed_cols_info
        ]
        embed_x = torch.cat(embed_x, 1)

        continuous_cols_idx = [self.columns_index[col] for col in self.continuous_cols]
        continuous_x = deep_input_x[:, continuous_cols_idx].float()

        x = torch.cat([embed_x, continuous_x], dim=1)
        return self.dense_layer(x)  # [b_size, hidden_layer_last_dim]

  • self.embed_layers holds the embedding table of each id feature
  • All id embeddings are concatenated along dim=1

Wide & Deep

Build an nn.Module to manage Wide and Deep:

class WideDeep(nn.Module):
    def __init__(self, wide: nn.Module, deep: nn.Module):
        super().__init__()
        deep = nn.Sequential(
            deep,
            nn.Linear(deep.output_dim, 1)
        )
        self.wide_deep = nn.ModuleDict({
            "wide": wide,
            "deep": deep
        })

    def forward(self, x_wide: torch.Tensor, x_deep: torch.Tensor):
        wide_out = self.wide_deep['wide'](x_wide)  # [b_size, 1]
        deep_out = self.wide_deep['deep'](x_deep)  # [b_size, 1]
        out = wide_out + deep_out                  # [b_size, 1]
        return out.view(-1)                        # [b_size]

    @torch.no_grad()
    def predict(self, x_wide: torch.Tensor, x_deep: torch.Tensor, threshold: float = 0.5):
        logistic = self.predict_probs(x_wide, x_deep)
        return (logistic > threshold).int()

    @torch.no_grad()
    def predict_probs(self, x_wide: torch.Tensor, x_deep: torch.Tensor):
        out = self.forward(x_wide, x_deep)
        return torch.sigmoid(out.view(-1).float())
  • The wide output and deep output are combined as follows:

    $\text{w}^T_{wide}[x, \phi(x)] + \text{w}^T_{deep}a^{(l_f)} +b$

    • $\text{w}^T_{deep}$ projects the deep network's output down to a 1-dimensional vector
    • $a^{(l_f)}$ is the output of the deep network's last layer

Building Model

Instantiate the wide & deep model:

embed_cols_info = [(col, deep_generator.embed_cols_unique_labels_[col], embed_dim)
                   for col, embed_dim in category_embed_dim_mapping.items()]
deep_column_idx = {col: i for i, col in enumerate(deep_generator.deep_cols)}
hidden_layers = [512, 256]
drop_out = [0.25, 0.2]

wide = Wide(wide_dim=np.unique(x_wide).shape[0], predict_dim=1)
deep = Deep(deep_column_idx, embed_cols_info, continuous_cols, hidden_layers, drop_out, 0.2)
wide_deep = WideDeep(wide, deep)

Look at embed_cols_info; entries are stored as ('column_name', 'input_dim', 'embedding_dim'):

[('userId', 6041, 50),
('movieId', 3707, 50),
('gender', 3, 2),
('age', 8, 2),
('occupation', 22, 5),
('zipCode', 3440, 20)]
  • userId would be 6041-dimensional after one-hot encoding, but the embedding brings it down to 50 dimensions

Training Stage

In the original Wide & Deep, the wide side is optimized with FTRL + L1 regularization to obtain sparsity, while the deep side uses the AdaGrad optimizer.

But the wide-side features used here are not large enough to show FTRL's sparsity benefits, so both the wide side and the deep side use the Adam optimizer.
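A minimal training-loop sketch under that assumption: one Adam optimizer over all parameters, BCE-with-logits on a binary click label. The real loop lives in the repo linked below; the `nn.Linear` here is a tiny stand-in for the WideDeep module so the sketch runs on its own:

```python
import torch
import torch.nn as nn

# Stand-in model and fake data (7 wide + 6 deep features, batch of 32).
model = nn.Linear(7 + 6, 1)
x = torch.randn(32, 13)
y = torch.randint(0, 2, (32,)).float()

optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
criterion = nn.BCEWithLogitsLoss()

for epoch in range(3):
    optimizer.zero_grad()
    logits = model(x).view(-1)        # WideDeep.forward also returns logits
    loss = criterion(logits, y)
    loss.backward()
    optimizer.step()
```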

For the Adam training procedure, see:

seed9D/hands-on-machine-learning

Online Ranking

In a recommender system, wide & deep serves as the ranking model: it scores a batch of recalled items to decide their order. See the post on the real-time recommendation strategy pipeline.

Usually the platform that trains the model and the platform that serves it are built in different languages, e.g. training in Python / Spark and serving in Java, which makes putting the model online difficult. Complex deep models in particular need strong engineering support to serve.

A common approach is to separate the model's embeddings from the multi-layer perceptron: the embeddings are stored in a DB and fetched by index, so only a simple MLP needs to run online, greatly reducing online computation.

Taking Google's Wide & Deep as an example, the red box marks the online-inference part:

(figure: Wide & Deep serving diagram, with the online-inference part highlighted)

The benefit of separating the embeddings from the feedforward network is that embedding training can be arbitrarily complex; at inference time the trained embeddings are simply fed in as input and the model works.

For demonstration purposes, the online ranking below is done in Python; in production you would implement it in another language.

At the online-serving stage we need three components:

  1. feature provider
    • serves the raw features of users and items
  2. embedding provider
    • serves the trained id embeddings; with a high enough level of abstraction, algorithm engineers never need to care about the storage backend
  3. ranker
    • scores the recalled items

Feature Provider

The feature provider supplies all raw features the model needs; after retrieval, the features still have to be processed into the model's input format.

class FeatureProvider:
    def __init__(self, df: pd.DataFrame, index_col_name: str):
        self.df = df.copy()
        self.index_col_name = index_col_name

    def query_features(self, index: List[int], cols: List[str]) -> pd.DataFrame:
        df = self.df
        return df[df[self.index_col_name].isin(index)][cols].copy()

We create user_feature_provider and item_feature_provider to serve user features and item features respectively:

user_df = pd.read_csv(user_data_path, sep="::", header=None, engine="python", names=user_columns)
user_df = user_df.reset_index().rename(columns={'index': 'userId'})

item_df = pd.read_csv(movie_data_path, sep="::", header=None, engine="python", names=movie_columns)
item_df = item_df.reset_index().rename(columns={'index': 'movieId'})

user_feature_provider = FeatureProvider(user_df, 'userId')
item_feature_provider = FeatureProvider(item_df, 'movieId')

Try retrieving age and gender for userId = [5, 10]:

In:
user_feature_provider.query_features(index=[5, 10], cols=['age', 'gender'])
Out:
age gender
25 M
35 F

Embedding Provider

The embedding provider serves the trained id embeddings:

class EmbeddingProvider:
    def __init__(self, embedding_dict: nn.ModuleDict, prefix: str = 'emb_layer'):
        self.prefix = prefix
        self.embedding_dict = embedding_dict

    def query_embedding(self, batch_labels: np.ndarray, label_order: List[str]) -> torch.Tensor:
        # batch_labels: [b_size, num of labels]
        label_order = [self.prefix + '_' + str(label) for label in label_order]

        batch_labels = torch.from_numpy(batch_labels).long()
        embed_X = [
            self.embedding_dict[label](batch_labels[:, idx])
            for idx, label in enumerate(label_order)
        ]
        return torch.cat(embed_X, 1)  # [b_size, sum of all embedding dims]

We take the trained embeddings out of the wide & deep model and put them in; this step simulates initializing the online embedding provider:

embedding_provider = EmbeddingProvider(wide_deep.wide_deep['deep'][0].embed_layers, 'emb_layer')

Print which id embeddings are available:

In:
print(embedding_provider.embedding_dict)
Out:
ModuleDict(
(emb_layer_age): Embedding(8, 2)
(emb_layer_gender): Embedding(3, 2)
(emb_layer_movieId): Embedding(3707, 50)
(emb_layer_occupation): Embedding(22, 5)
(emb_layer_userId): Embedding(6041, 50)
(emb_layer_zipCode): Embedding(3440, 20)
)
  • Embeddings are available for age, gender, movieId, occupation, userId, and zipCode

Ranker

The ranker scores the recalled items. Here our ranker implements wide & deep inference, or more precisely the MLP part of wide & deep:

class WideDeepOnlineRanker(nn.Module):
    def __init__(self, offlineWideDeep: nn.Module):
        super().__init__()
        self.deep_dense = self._fetch_deep_dense(offlineWideDeep)
        self.wide_part = offlineWideDeep.wide_deep['wide']

    def _fetch_deep_dense(self, offlineWideDeep):
        # copy the trained dense layers, dropping Dropout (not needed at inference)
        deep_dense = []
        for dense_layer in offlineWideDeep.wide_deep['deep'][0].dense_layer:
            new_layer = nn.Sequential(*[d for d in dense_layer if not isinstance(d, nn.Dropout)])
            deep_dense.append(new_layer)
        deep_dense.append(offlineWideDeep.wide_deep['deep'][1])
        return nn.Sequential(*deep_dense)

    def forward(self, wide_x: torch.Tensor, deep_x_embedding: torch.Tensor):
        wide_output = self.wide_part(wide_x.long())
        deep_output = self.deep_dense(deep_x_embedding)
        return (wide_output + deep_output).view(-1)

    @torch.no_grad()
    def scoring(self, wide_x: torch.Tensor, deep_x_embedding: torch.Tensor):
        self.eval()
        return self.forward(wide_x, deep_x_embedding).view(-1).float()

    @torch.no_grad()
    def predict_probs(self, wide_x: torch.Tensor, deep_x_embedding: torch.Tensor):
        self.eval()
        out = self.forward(wide_x, deep_x_embedding)
        return torch.sigmoid(out.view(-1)).float()
  • Implementing a production ranker is fairly involved; for demonstration we simulate it directly with a PyTorch nn.Module
  • All that matters is that the ranker is responsible for scoring; the internal implementation varies

Initialize the ranker with the offline-trained wide & deep model:

ranker = WideDeepOnlineRanker(wide_deep)

Recommendation Process

With the three components (feature provider, embedding provider, ranker) in place, we can score and rank the recalled items.

Simulate user_id = 100 with 30 recalled items:

In:
user_id = 100
match_itemIds = np.random.choice(item_df['movieId'], 30)
print(match_itemIds)
Out:
array([1866, 2421, 2869, 1217, 3535, 2535, 1542, 3569, 1884, 644, 1784,
3506, 3907, 1522, 803, 3148, 1715, 2055, 3442, 296, 3588, 3871,
1785, 628, 1527, 3401, 1921, 3449, 438, 842])

Retrieve the features the model needs for user_id and match_itemIds:

query_user_features_cols = ['userId', 'gender', 'age', 'occupation', 'zipCode']
query_item_features_cols = ['movieId']

user_primitive_feature = user_feature_provider.query_features([user_id], query_user_features_cols)
item_primitive_feature = item_feature_provider.query_features(list(match_itemIds), query_item_features_cols)

Print a few rows of item_primitive_feature:

	  movieId
293 296
434 438
623 628
639 644
793 803
831 842
1199 1217
1486 1522
1491 1527
1503 1542
  • The first column is the dataframe's own index
  • Note that the only item feature used is movieId; MovieLens actually also provides title and genre
    • title could be run through word2vec to build a title embedding
    • genre is a multi-label feature, essentially multi-hot encoding, which involves variable-length tensor input and is a bit fiddly to handle. Multi-label features are quite common in recommender systems, but for this demo we let it go
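One way such a multi-label feature could be handled (a hedged sketch, not part of this implementation; the genre ids and vocabulary size are made up) is nn.EmbeddingBag, which pools a variable number of genre embeddings into one fixed-size vector:

```python
import torch
import torch.nn as nn

# 18 hypothetical genre ids, each embedded to 4 dims, mean-pooled per movie.
genre_embedding = nn.EmbeddingBag(num_embeddings=18, embedding_dim=4, mode='mean')

# Two movies: the first has genres [0, 3, 7], the second only [2].
flat_genres = torch.tensor([0, 3, 7, 2])
offsets = torch.tensor([0, 3])            # where each movie's genre list starts
pooled = genre_embedding(flat_genres, offsets)
print(pooled.shape)  # torch.Size([2, 4])
```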

Next we convert the raw features into wide & deep input; online and offline feature processing must be identical. For this demo, assume the wide feature generator and deep feature generator are already implemented:

user_primitive_feature['join'] = 1
item_primitive_feature['join'] = 1
primitive_features = user_primitive_feature.merge(item_primitive_feature, on = ['join']).drop('join', axis=1)
processed_wide_features = wideGenerator.transform(primitive_features)
processed_deep_features = deep_generator.transform(primitive_features)

processed_wide_features = torch.from_numpy(processed_wide_features).long()

Print the dimensions of processed_wide_features and processed_deep_features:

In:
processed_wide_features.shape, processed_deep_features.shape
Out:
((30, 7), (30, 6))
  • The wide side uses 7 features; the deep side uses 6

All 6 features in processed_deep_features are embedding features; we fetch each one's embedding through the embedding provider:

embedding_label_order = ['userId', 'movieId', 'gender', 'age', 'occupation', 'zipCode']
embedding_features = embedding_provider.query_embedding(processed_deep_features, embedding_label_order)

Print the embedding_features dimension:

In:
print(embedding_features.shape)

Out:
torch.Size([30, 129])
  • The embedding provider even concatenates them for us.
    P.S. The concatenation should really be split out; it is not the embedding provider's responsibility

After this long stretch of feature processing, we can finally send everything to the ranker for scoring:

scores = ranker.predict_probs(processed_wide_features, embedding_features)

Sort the scores from high to low and print the corresponding movieId and movie title:

In:
sorted_index = np.argsort(scores.detach().numpy(), axis=0)[::-1]
for idx in sorted_index:
    movieId = match_itemIds[idx]
    title = item_feature_provider.query_features([movieId], ['title'])['title'].item()
    print("movieId:{} \t '{}' \t score:{}".format(movieId, title, round(scores[idx].item(), 3)))
Out:
movieId:1866 'Big Hit, The (1998)' score:0.77
movieId:1542 'Brassed Off (1996)' score:0.702
movieId:2869 'Separation, The (La Séparation) (1994)' score:0.553
movieId:3506 'North Dallas Forty (1979)' score:0.546
movieId:3588 'King of Marvin Gardens, The (1972)' score:0.473
movieId:438 'Cowboy Way, The (1994)' score:0.456
movieId:1921 'Pi (1998)' score:0.453
movieId:3148 'Cider House Rules, The (1999)' score:0.385
movieId:1527 'Fifth Element, The (1997)' score:0.367
movieId:803 'Walking and Talking (1996)' score:0.34
movieId:1884 'Fear and Loathing in Las Vegas (1998)' score:0.315
movieId:3535 'American Psycho (2000)' score:0.315
movieId:296 'Pulp Fiction (1994)' score:0.313
movieId:842 'Tales from the Crypt Presents: Bordello of Blood (1996)' score:0.289
movieId:3401 'Baby... Secret of the Lost Legend (1985)' score:0.286
movieId:644 'Happy Weekend (1996)' score:0.268
movieId:3907 'Prince of Central Park, The (1999)' score:0.259
movieId:3449 'Good Mother, The (1988)' score:0.226
movieId:1784 'As Good As It Gets (1997)' score:0.162
movieId:1522 'Ripe (1996)' score:0.115
movieId:3569 'Idiots, The (Idioterne) (1998)' score:0.09
movieId:628 'Primal Fear (1996)' score:0.09
movieId:2535 'Earthquake (1974)' score:0.072
movieId:1217 'Ran (1985)' score:0.07
movieId:3442 'Band of the Hand (1986)' score:0.063
movieId:2421 'Karate Kid, Part II, The (1986)' score:0.063
movieId:2055 'Hot Lead and Cold Feet (1978)' score:0.062
movieId:1715 'Office Killer (1997)' score:0.055
movieId:1785 'King of New York (1990)' score:0.034
movieId:3871 'Shane (1953)' score:0.02

And that completes the online ranking flow.

Last but not Least

In Google's paper, the optimizer for the wide side is FTRL with L1 regularization, chosen to make the wide-side feature weights very sparse. The features in this implementation expand to only a bit over ten thousand dimensions, so FTRL would not show much benefit here.

In industrial ranking models, the wide-side cross features are often at the scale of millions or even tens of millions of id cross features. Wide weights that large put heavy pressure on online storage and computation, which is why Google chose FTRL with L1 regularization to make the weights sparse: sparse weights amount to feature selection (keeping only the features with non-zero weights), reducing online lookups and computation.
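For reference, the FTRL-Proximal per-round update (from McMahan et al.) can be written as:

    $\mathbf{w}_{t+1} = \arg\min_{\mathbf{w}} \left( \mathbf{g}_{1:t} \cdot \mathbf{w} + \frac{1}{2}\sum_{s=1}^{t} \sigma_s \lVert \mathbf{w} - \mathbf{w}_s \rVert_2^2 + \lambda_1 \lVert \mathbf{w} \rVert_1 \right)$

    • $\mathbf{g}_{1:t}$ is the sum of gradients up to round $t$, and $\sigma_s$ encodes the per-coordinate learning-rate schedule
    • the $\lambda_1 \lVert \mathbf{w} \rVert_1$ term is what drives individual weights exactly to zero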
