## Memorization & Generalization

In recommender systems, if you have to pick the first deep ranking model for moving from traditional machine learning to DNNs, the top choice is Wide & Deep, proposed by Google in 2016. The name comes from its structure: a shallow model combined with a deep network.
Before moving to DNNs, we almost certainly have a traditional ranking model already running online (e.g. FM, GBDT, ...), plus effective feature combinations accumulated over a long time. Jumping straight to a DNN not only throws that experience and accumulation away, it also takes a long time to tune the new model.
With Wide & Deep, we simply put the features of the currently serving ranking model on the wide side, build a separate deep model on the deep side, and train the two jointly; the transition to DNNs becomes seamless.
So what do the wide and the deep network each stand for?

In the 2016 paper, Google summarized it as: the wide side handles memorization, the deep side handles generalization. Wide & Deep is an embodiment of memorization & generalization, which is essentially the classic exploitation & exploration problem in recommendation.
### Wide

The wide side uses LR plus cross-product transformations to construct non-linear features. Traditional shallow models memorize feature pairs that have appeared before very well, which makes them well suited to exploiting existing information.
For example: a user has installed app A and app B is now being shown; the probability that the user installs B is high. The wide network captures the relationship between installing A and the impression of B, and exploits it.
### Deep

The deep side leverages the feature-extraction ability of neural networks to automatically discover interactions between features. Even for pairs that never appear, or appear only rarely, in the training samples, it can learn low-dimensional embeddings and generalize, which is the "exploration" part.
### How to handle sparse id features?

For high-dimensional, sparse id features, the deep side uses embeddings: the high-dimensional sparse id is converted into a low-dimensional dense embedding that is fed into the feedforward dense layers.
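A minimal sketch of the idea (illustrative only; the vocabulary size and dimension below are made-up numbers, not taken from this post's model):

```python
import torch
import torch.nn as nn

# A 10,000-id vocabulary mapped to 16-dimensional dense vectors, instead of
# feeding 10,000-dimensional one-hot vectors into the network.
movie_embedding = nn.Embedding(num_embeddings=10_000, embedding_dim=16)

movie_ids = torch.tensor([3, 42, 9876])      # label-encoded sparse ids
dense_vectors = movie_embedding(movie_ids)   # shape (3, 16): dense and trainable
print(dense_vectors.shape)
```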
## Implement Wide & Deep in PyTorch

The dataset is MovieLens ml-1m, which can be found at the following link:
MovieLens
In recommendation, in principle every feature is converted to an id, and continuous features are bucketized and labeled. The features in the MovieLens dataset happen to be bucketized already, so we only need to assign labels to them.
### Wide feature processing

`CrossFeatures`:

```python
class CrossFeatures(BaseEstimator, TransformerMixin):
    def __init__(self, cross_col_pairs: List[Tuple[str, str]]):
        self.cross_col_pairs = cross_col_pairs

    def fit(self, df: pd.DataFrame, y=None):
        self.unique_columns_ = set()
        for pair in self.cross_col_pairs:
            self.unique_columns_.update(list(pair))

        self.crossed_colnamed_ = []
        for cols in self.cross_col_pairs:
            cols = list(cols)
            new_colname = "_".join(cols)
            self.crossed_colnamed_.append(new_colname)
        return self

    def transform(self, df: pd.DataFrame):
        # index with a list (a set is not a valid indexer in recent pandas)
        df_cross = df[list(self.unique_columns_)].copy().astype(str)
        for cols in self.cross_col_pairs:
            cols = list(cols)
            new_colname = "_".join(cols)
            df_cross[new_colname] = df_cross[cols[0]] + '-' + df_cross[cols[1]]
        return df_cross[self.crossed_colnamed_]
```
`CrossFeatures` crosses two features, i.e. the AND relationship of their co-occurrence.
```python
wide_cols = ['gender', 'age', 'occupation', 'zipCode']
crossed_cols = [('gender', 'age'), ('gender', 'occupation'), ('age', 'occupation')]
wideGenerator = WideFeaturesGenerator(wide_cols, crossed_cols)
x_wide = wideGenerator.fit_transform(X)
```
We want to capture the "gender AND age", "gender AND occupation", and "age AND occupation" relationships. Continuous features are not placed on the wide side.

Check the dimension of x_wide:
```python
In: print(x_wide.shape)

Out: (1000209, 7)
```
The wide side produces 7 features in total, as expected: 3 crossed features plus 4 first-order features.

In practice, though, the features in x_wide are label-encoded: each dimension stores only a label value. Take a look at the global label values:
```python
In: print(wideGenerator.feature_dict_)

Out:
{'gender_F': 1,
 'gender_M': 2,
 'age_1': 3,
 'age_56': 4,
 'age_25': 5,
 'age_50': 6,
 'age_18': 7,
 'age_45': 8,
 'age_35': 9,
 'occupation_10': 10,
 'occupation_16': 11,
 'occupation_12': 12,
 'occupation_7': 13,
 'occupation_1': 14,
 .....
```
The actual dimensionality is:
```python
In: print(len(wideGenerator.feature_dict_))

Out: 3659
```
### Deep feature processing

`LabelEncoder`:

```python
class LabelEncoder(BaseEstimator, TransformerMixin):
    def __init__(self, columns_to_encode: List[str]):
        self.columns_to_encode = columns_to_encode

    def fit(self, df: pd.DataFrame, y=None):
        # cast every column to str so all labels share one dtype
        df_ = df[self.columns_to_encode].copy().astype('str')

        unique_column_vals = {col: df_[col].unique() for col in self.columns_to_encode}
        self.encoding_dict_ = dict()
        for k, v in unique_column_vals.items():
            self.encoding_dict_[k] = {val: idx for idx, val in enumerate(v)}
            self.encoding_dict_[k]['unseen'] = len(self.encoding_dict_[k])
        return self

    def transform(self, df: pd.DataFrame):
        try:
            self.encoding_dict_
        except AttributeError:
            raise NotFittedError(
                "This LabelEncoder instance is not fitted yet. "
                "Call 'fit' with appropriate arguments before using this LabelEncoder.")

        df_ = df.copy()
        df_[self.columns_to_encode] = df_[self.columns_to_encode].astype('str')
        for col, encoding_map in self.encoding_dict_.items():
            # values never seen during fit are mapped to the reserved 'unseen' label
            original_value = [f for f in encoding_map.keys() if f != 'unseen']
            df_[col] = np.where(df_[col].isin(original_value), df_[col], 'unseen')
            df_[col] = df_[col].apply(lambda x: encoding_map[x])
        return df_
```
`LabelEncoder` re-encodes id features, assigning each value a consecutive integer label.

First, look at the number of distinct values of each id feature:
```python
In:
for col in X:
    print(col, len(X[col].unique()))

Out:
userId 6040
movieId 3706
gender 2
age 7
occupation 21
zipCode 3439
```
This gives us a rough idea of how large each id feature's embedding should be. Below are the embedding dimensions for each id feature:
```python
category_embed_dim_mapping = {
    'userId': 50,
    'movieId': 50,
    'gender': 2,
    'age': 2,
    'occupation': 5,
    'zipCode': 20
}
category_cols = list(category_embed_dim_mapping.keys())
continuous_cols = []
```
In recommendation, continuous features can usually be converted into id features (bucketize + label), so it is perfectly normal for continuous_cols to be empty.
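As a hedged aside, bucketizing a continuous feature before label encoding could look roughly like this (the `watch_minutes` column and bin edges are hypothetical, not part of this dataset):

```python
import pandas as pd

# Hypothetical continuous feature: minutes watched.
df = pd.DataFrame({'watch_minutes': [3, 17, 45, 90, 180]})

# Bucketize into a handful of bins; each bin label then behaves like any other id feature.
df['watch_minutes_bucket'] = pd.cut(
    df['watch_minutes'],
    bins=[0, 10, 30, 60, 120, float('inf')],
    labels=['0-10', '10-30', '30-60', '60-120', '120+']
)
print(df['watch_minutes_bucket'].tolist())
```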
Convert the raw features into deep-side features:
```python
deep_generator = DeepFeaturesGenerator(category_cols, continuous_cols)
df_deep = X[category_cols + continuous_cols].copy()
x_deep = deep_generator.fit_transform(X)
```
Check the dimension of x_deep:
```python
In: print(x_deep.shape)

Out: (1000209, 6)
```
As expected, there are 6 features in total, all label-encoded id features.

Now look at the actual dimensionality of each feature (after one-hot expansion):
```python
In: print(deep_generator.embed_cols_unique_labels_)

Out:
{'userId': 6041,
 'movieId': 3707,
 'gender': 3,
 'age': 8,
 'occupation': 22,
 'zipCode': 3440}
```
Using labels instead of one-hot encoding saves a large amount of storage.
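A back-of-the-envelope illustration of the saving (the dtypes and sizes below are assumptions made just for this estimate):

```python
import numpy as np

# 1,000,209 rows of userId stored as a dense one-hot matrix (6,041 columns)
# versus a single integer label per row.
n_rows, n_labels = 1_000_209, 6_041

one_hot_bytes = n_rows * n_labels * np.dtype(np.int8).itemsize
label_bytes = n_rows * np.dtype(np.int32).itemsize

print(f"one-hot: {one_hot_bytes / 1e9:.1f} GB")   # ~6.0 GB
print(f"labels : {label_bytes / 1e6:.1f} MB")     # ~4.0 MB
```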
## Wide & Deep Model Graph

### Wide Model

The wide model is simple: just a shallow feedforward network.
```python
class Wide(nn.Module):
    def __init__(self, wide_dim: int, predict_dim: int = 1):
        super().__init__()
        # the Embedding acts as a sparse lookup table of per-feature weights
        self.linear = nn.Embedding(wide_dim + 1, predict_dim, padding_idx=0)
        self.bias = nn.Parameter(torch.zeros(predict_dim))
        self._reset_parameters()

    def _reset_parameters(self):
        nn.init.kaiming_normal_(self.linear.weight, a=math.sqrt(5))
        fan_in, _ = nn.init._calculate_fan_in_and_fan_out(self.linear.weight)
        bound = 1 / math.sqrt(fan_in)
        nn.init.uniform_(self.bias, -bound, bound)

    def forward(self, X: torch.Tensor):
        # sum the weights of the active features, i.e. a linear model w·x + b
        return self.linear(X.long()).sum(dim=1) + self.bias
```
### Deep Model

The deep network is more involved: for every id feature it keeps a dedicated `nn.Embedding` that turns the high-dimensional sparse vector into a low-dimensional dense embedding.
`Deep`:

```python
class Deep(nn.Module):
    EMBEDDING_LAYER_PREFIX = "emb_layer"
    DENSE_LAYER_PREFIX = "dense_layer"

    def __init__(self, columns_index: Dict[str, int],
                 embed_cols_info: List[Tuple[str, int, int]],
                 continuous_cols: List[str],
                 hidden_layer_neural: List[int],
                 hidden_layer_dropout: List[float],
                 embed_col_dropout: float = 0.0):
        super().__init__()
        self.columns_index = columns_index
        self.embed_cols_info = embed_cols_info
        self.continuous_cols = continuous_cols
        # one dedicated nn.Embedding per id feature
        self.embed_layers = self._create_embed_layers(embed_cols_info)
        self.embed_dropout_layer = nn.Dropout(embed_col_dropout)
        self.hidden_layer_neural = self._update_hidden_layer_neural(hidden_layer_neural)
        self.dense_layer = self._create_dense_layer(hidden_layer_dropout)
        self.output_dim = hidden_layer_neural[-1]

    def _create_embed_layers(self, embed_cols_info: List[Tuple[str, int, int]]):
        return nn.ModuleDict({
            self.EMBEDDING_LAYER_PREFIX + '_' + col_name.replace(".", '_'): nn.Embedding(num_label, dim)
            for col_name, num_label, dim in embed_cols_info
        })

    def _create_dense_layer(self, hidden_layer_dropout):
        dense_sequential = nn.Sequential()
        for i in range(1, len(self.hidden_layer_neural)):
            dense_sequential.add_module(
                "{}_{}".format(self.DENSE_LAYER_PREFIX, i - 1),
                self._create_dense_component(
                    self.hidden_layer_neural[i - 1],
                    self.hidden_layer_neural[i],
                    hidden_layer_dropout[i - 1],
                    True
                )
            )
        return dense_sequential

    def _update_hidden_layer_neural(self, hidden_layer_neurals: List[int]):
        # the first dense layer takes the concatenated embeddings plus continuous features
        embed_dim = sum([embed[2] for embed in self.embed_cols_info])
        continuous_dim = len(self.continuous_cols)
        return [embed_dim + continuous_dim] + hidden_layer_neurals

    def _create_dense_component(self, input_dim: int, output_dim: int,
                                dropout_ratio: float = 0.0, batch_norm=False):
        layers = [
            nn.Linear(input_dim, output_dim),
            nn.LeakyReLU(inplace=True)
        ]
        if batch_norm:
            layers.append(nn.BatchNorm1d(output_dim))
        layers.append(nn.Dropout(dropout_ratio))
        return nn.Sequential(*layers)

    def __get_embeding_layer(self, embed_col):
        embed_col = self.EMBEDDING_LAYER_PREFIX + '_' + embed_col.replace('.', '_')
        return self.embed_layers[embed_col]

    def forward(self, deep_input_x: torch.Tensor):
        # look up each id feature's embedding, then concatenate along dim=1
        embed_x = [
            self.__get_embeding_layer(col)(deep_input_x[:, self.columns_index[col]].long())
            for col, _, _ in self.embed_cols_info
        ]
        embed_x = torch.cat(embed_x, 1)
        continuous_cols_idx = [self.columns_index[col] for col in self.continuous_cols]
        continuous_x = deep_input_x[:, continuous_cols_idx].float()
        x = torch.cat([embed_x, continuous_x], dim=1)
        return self.dense_layer(x)
```
`self.embed_layers` stores the embedding vectors of every id feature. All the id embeddings are concatenated along `dim=1`.
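A tiny illustration of that concatenation (the shapes are chosen to match the userId and gender embedding dimensions above):

```python
import torch

user_emb = torch.randn(30, 50)    # e.g. userId embeddings for a batch of 30
gender_emb = torch.randn(30, 2)   # e.g. gender embeddings for the same batch

# concatenating along dim=1 stacks the per-feature embeddings side by side
print(torch.cat([user_emb, gender_emb], dim=1).shape)   # torch.Size([30, 52])
```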
### Wide & Deep

Build an `nn.Module` that manages the Wide and Deep parts.
```python
class WideDeep(nn.Module):
    def __init__(self, wide: nn.Module, deep: nn.Module):
        super().__init__()
        # append a final linear layer so the deep tower outputs a single logit
        deep = nn.Sequential(
            deep,
            nn.Linear(deep.output_dim, 1)
        )
        self.wide_deep = nn.ModuleDict({
            "wide": wide,
            "deep": deep
        })

    def forward(self, x_wide: torch.Tensor, x_deep: torch.Tensor):
        wide_out = self.wide_deep['wide'](x_wide)
        deep_out = self.wide_deep['deep'](x_deep)
        out = wide_out + deep_out
        return out.view(-1)

    @torch.no_grad()
    def predict(self, x_wide: torch.Tensor, x_deep: torch.Tensor, threshold: float = 0.5):
        logistic = self.predict_probs(x_wide, x_deep)
        return (logistic > threshold).int()

    @torch.no_grad()
    def predict_probs(self, x_wide: torch.Tensor, x_deep: torch.Tensor):
        out = self.forward(x_wide, x_deep)
        return torch.sigmoid(out.view(-1).float())
```
## Building Model

Instantiate the Wide & Deep model.
```python
embed_cols_info = [(col, deep_generator.embed_cols_unique_labels_[col], embed_dim)
                   for col, embed_dim in category_embed_dim_mapping.items()]
deep_column_idx = {col: i for i, col in enumerate(deep_generator.deep_cols)}
hidden_layers = [512, 256]
drop_out = [0.25, 0.2]

wide = Wide(wide_dim=np.unique(x_wide).shape[0], predict_dim=1)
deep = Deep(deep_column_idx, embed_cols_info, continuous_cols, hidden_layers, drop_out, 0.2)
wide_deep = WideDeep(wide, deep)
```
Take a look at embed_cols_info, stored in the format ('column_name', 'input_dims', 'embedding_dims'):
```python
[('userId', 6041, 50),
 ('movieId', 3707, 50),
 ('gender', 3, 2),
 ('age', 8, 2),
 ('occupation', 22, 5),
 ('zipCode', 3440, 20)]
```
userId would have 6041 dimensions after one-hot encoding, but the embedding brings it down to 50 dimensions.
## Training Stage

In the original Wide & Deep, the wide side is optimized with FTRL + L1 regularization to obtain sparsity, while the deep side uses the AdaGrad optimizer.

However, the wide-side features we use here are not large enough to show off FTRL's sparsity effect, so both the wide side and the deep side use the Adam optimizer.

For the Adam training process, see:
seed9D/hands-on-machine-learning
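For reference, a minimal sketch of what a joint training step with Adam might look like (the `train_loader`, learning rate, and epoch count are hypothetical; see the repository above for the actual training code):

```python
import torch
import torch.nn as nn

criterion = nn.BCEWithLogitsLoss()
optimizer = torch.optim.Adam(wide_deep.parameters(), lr=1e-3)

for epoch in range(5):
    for x_wide_batch, x_deep_batch, y_batch in train_loader:
        optimizer.zero_grad()
        # wide and deep logits are summed inside the model, so one backward
        # pass updates both sides jointly
        logits = wide_deep(x_wide_batch, x_deep_batch)
        loss = criterion(logits, y_batch.float())
        loss.backward()
        optimizer.step()
```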
## Online Ranking

In a recommender system, Wide & Deep serves as the ranking model: it scores a batch of retrieved items to decide their order. See the post on the real-time recommendation pipeline (實時推薦策略流程).
The platform used to train a model and the one used to serve it are usually built in different languages, e.g. training in Python / Spark and serving in Java, which makes putting models online difficult. Complex deep models in particular require strong engineering support to serve.

A common approach is to split the model's embeddings from the multi-layer perceptron: the embeddings are stored in a DB and retrieved by index, so the online side only has to compute a simple MLP, which greatly reduces the online computation load.
Taking Google's Wide & Deep as an example, the red box marks the online inference part:

The benefit of separating the embeddings from the feedforward network online is that embedding training can be as complex as you like; at inference time the model runs by simply taking the trained embeddings as input.

For demonstration purposes the online ranking below is mocked entirely in Python; a production deployment would have to be developed separately in another language.
At the online serving stage, we need three components:

- feature provider
- embedding provider: provides the trained id embeddings; if the abstraction level is high enough, algorithm engineers do not need to care about the storage backend behind it
- ranker
### Feature Provider

The feature provider supplies all the raw features the model needs; after retrieving the requested features you still have to process them yourself into the model's input format.
```python
class FeatureProvider:
    def __init__(self, df: pd.DataFrame, index_col_name):
        self.df = df.copy()
        self.index_col_name = index_col_name

    def query_features(self, index: List[int], cols: List[str]) -> pd.DataFrame:
        df = self.df
        return df[df[self.index_col_name].isin(index)][cols].copy()
```
We create user_feature_provider and item_feature_provider to supply user features and item features respectively.
```python
user_df = pd.read_csv(user_data_path, sep="::", header=None, engine="python", names=user_columns)
user_df = user_df.reset_index().rename(columns={'index': 'userId'})
item_df = pd.read_csv(movie_data_path, sep="::", header=None, engine="python", names=movie_columns)
item_df = item_df.reset_index().rename(columns={'index': 'movieId'})

user_feature_provider = FeatureProvider(user_df, 'userId')
item_feature_provider = FeatureProvider(item_df, 'movieId')
```
Try retrieving age and gender for userId = [5, 10]:
```python
In: user_feature_provider.query_features(index=[5, 10], cols=['age', 'gender'])

Out:
   age gender
    25      M
    35      F
```
### Embedding Provider

The embedding provider supplies the already-trained id embeddings.
```python
class EmbeddingProvider:
    def __init__(self, embedding_dict: nn.ModuleDict, prefix='emb_layer'):
        self.prefix = prefix
        self.embedding_dict = embedding_dict

    def query_embedding(self, batch_labels: np.array, label_order: List[str]) -> torch.Tensor:
        label_order = [self.prefix + '_' + str(label) for label in label_order]
        batch_labels = torch.from_numpy(batch_labels).long()
        embed_X = [
            self.embedding_dict[label](batch_labels[:, idx])
            for idx, label in enumerate(label_order)
        ]
        return torch.cat(embed_X, 1)
```
We take the trained embeddings out of the Wide & Deep model and put them in; this step simulates the initialization of the online embedding provider.
```python
embedding_provider = EmbeddingProvider(wide_deep.wide_deep['deep'][0].embed_layers, 'emb_layer')
```
Print which id embeddings are available:
```python
In: print(embedding_provider.embedding_dict)

Out:
ModuleDict(
  (emb_layer_age): Embedding(8, 2)
  (emb_layer_gender): Embedding(3, 2)
  (emb_layer_movieId): Embedding(3707, 50)
  (emb_layer_occupation): Embedding(22, 5)
  (emb_layer_userId): Embedding(6041, 50)
  (emb_layer_zipCode): Embedding(3440, 20)
)
```
There are embeddings for age, gender, movieId, occupation, userId, and zipCode.
### Ranker

The ranker scores the retrieved items. Here our ranker implements Wide & Deep inference, or more precisely the MLP part of Wide & Deep.
```python
class WideDeepOnlineRanker(nn.Module):
    def __init__(self, offlineWideDeep: nn.Module):
        super().__init__()
        self.deep_dense = self._fetch_deep_dense(offlineWideDeep)
        self.wide_part = offlineWideDeep.wide_deep['wide']

    def _fetch_deep_dense(self, offlineWideDeep):
        # keep only the dense layers of the offline deep tower and drop the
        # Dropout layers, which are not needed at inference time
        deep_dense = []
        for dense_layer in offlineWideDeep.wide_deep['deep'][0].dense_layer:
            new_layer = nn.Sequential(*[d for d in dense_layer if not isinstance(d, nn.Dropout)])
            deep_dense.append(new_layer)
        deep_dense.append(offlineWideDeep.wide_deep['deep'][1])
        return nn.Sequential(*deep_dense)

    def forward(self, wide_x: torch.Tensor, deep_x_embedding: torch.Tensor):
        wide_output = self.wide_part(wide_x.long())
        deep_output = self.deep_dense(deep_x_embedding)
        return (wide_output + deep_output).view(-1)

    @torch.no_grad()
    def scoring(self, wide_x: torch.Tensor, deep_x_embedding: torch.Tensor):
        self.eval()
        return self.forward(wide_x, deep_x_embedding).view(-1).float()

    @torch.no_grad()
    def predict_probs(self, wide_x: torch.Tensor, deep_x_embedding: torch.Tensor):
        self.eval()
        out = self.forward(wide_x, deep_x_embedding)
        return torch.sigmoid(out.view(-1)).float()
```
Building a ranker for production is more involved; for demonstration we simply mock it with a PyTorch nn.Module. All you need to know is that the ranker is responsible for scoring; the internal implementation varies from team to team.
Initialize the ranker by passing in the offline-trained Wide & Deep model.
```python
ranker = WideDeepOnlineRanker(wide_deep)
```
## Recommendation Process

With the three components (feature provider, embedding provider, ranker) in place, we can score and rank the retrieved items.
Simulate user_id=100 and 30 retrieved items:
```python
In:
user_id = 100
match_itemIds = np.random.choice(item_df['movieId'], 30)
print(match_itemIds)

Out:
array([1866, 2421, 2869, 1217, 3535, 2535, 1542, 3569, 1884,  644, 1784,
       3506, 3907, 1522,  803, 3148, 1715, 2055, 3442,  296, 3588, 3871,
       1785,  628, 1527, 3401, 1921, 3449,  438,  842])
```
Retrieve the features the model needs for user_id and for match_itemIds:
```python
query_user_features_cols = ['userId', 'gender', 'age', 'occupation', 'zipCode']
query_item_features_cols = ['movieId']
user_primitive_feature = user_feature_provider.query_features([user_id], query_user_features_cols)
item_primitive_feature = item_feature_provider.query_features(list(match_itemIds), query_item_features_cols)
```
Print a few rows of item_primitive_feature:
```python
      movieId
293       296
434       438
623       628
639       644
793       803
831       842
1199     1217
1486     1522
1491     1527
1503     1542
```
The first column is the dataframe's own index…

As you can see, the only item-side feature used is movieId; in fact MovieLens also has title and genre available.
The title feature could first go through word2vec to produce a title embedding.
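A rough sketch of that idea (the `word_vectors` dict below is a stand-in for a real trained word2vec model, not something produced in this post):

```python
import numpy as np

# Pretend these are pretrained word2vec vectors.
word_vectors = {'toy': np.random.rand(8), 'story': np.random.rand(8)}

def title_embedding(title: str) -> np.ndarray:
    # average the word vectors of the tokens that exist in the vocabulary
    tokens = [t for t in title.lower().split() if t in word_vectors]
    return np.mean([word_vectors[t] for t in tokens], axis=0)

print(title_embedding("Toy Story (1995)").shape)   # (8,)
```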
genre is a multi-label feature, essentially multi-hot encoding, which involves variable-length tensor inputs and is a bit more troublesome to handle. Multi-label features are quite common in recommender systems, but since this is just a demonstration we'll let it go.
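For completeness, one way a multi-hot genre feature could be handled is with `nn.EmbeddingBag`, which pools a variable number of genre embeddings into one fixed-size vector (the genre vocabulary here is illustrative and this is not part of the post's pipeline):

```python
import torch
import torch.nn as nn

genre_vocab = {'Action': 0, 'Comedy': 1, 'Drama': 2, 'Romance': 3, 'Thriller': 4}
movies_genres = [['Action', 'Thriller'], ['Comedy', 'Romance', 'Drama']]

# Flatten the variable-length genre lists and record where each movie's genres
# start, which is the input format nn.EmbeddingBag expects.
flat_ids = torch.tensor([genre_vocab[g] for gs in movies_genres for g in gs])
offsets = torch.tensor([0, len(movies_genres[0])])

genre_embedding = nn.EmbeddingBag(num_embeddings=len(genre_vocab), embedding_dim=4, mode='mean')
print(genre_embedding(flat_ids, offsets).shape)   # torch.Size([2, 4])
```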
Next, the raw features are converted into Wide & Deep inputs; online and offline feature processing must be kept consistent. For demonstration purposes we assume the wide feature generator and deep feature generator are already implemented.
```python
user_primitive_feature['join'] = 1
item_primitive_feature['join'] = 1
primitive_features = user_primitive_feature.merge(item_primitive_feature, on=['join']).drop('join', axis=1)

processed_wide_features = wideGenerator.transform(primitive_features)
processed_deep_features = deep_generator.transform(primitive_features)
processed_wide_features = torch.from_numpy(processed_wide_features).long()
```
Print the dimensions of processed_wide_features and processed_deep_features:
```python
In: processed_wide_features.shape, processed_deep_features.shape

Out: ((30, 7), (30, 6))
```
The wide side uses 7 features and the deep side uses 6.

All 6 features in processed_deep_features are embedding features; we retrieve each of their embeddings through the embedding provider.
```python
embedding_label_order = ['userId', 'movieId', 'gender', 'age', 'occupation', 'zipCode']
embedding_features = embedding_provider.query_embedding(processed_deep_features, embedding_label_order)
```
Print the dimension of embedding_features:
```python
In: print(embedding_features.shape)

Out: torch.Size([30, 129])
```
The embedding provider even concatenates them for us. P.S. The concatenation should really be split out, since it is not the embedding provider's responsibility.

After all this feature processing, we can finally send everything to the ranker for scoring.
```python
scores = ranker.predict_probs(processed_wide_features, embedding_features)
```
Sort the scores from high to low and print the corresponding movieId and movie title:
```python
In:
sorted_index = np.argsort(scores.detach().numpy(), axis=0)[::-1]
for idx in sorted_index:
    movieId = match_itemIds[idx]
    title = item_feature_provider.query_features([movieId], ['title'])['title'].item()
    print("movieId:{} \t '{}' \t score:{}".format(movieId, title, round(scores[idx].item(), 3)))

Out:
movieId:1866 	 'Big Hit, The (1998)' 	 score:0.77
movieId:1542 	 'Brassed Off (1996)' 	 score:0.702
movieId:2869 	 'Separation, The (La Séparation) (1994)' 	 score:0.553
movieId:3506 	 'North Dallas Forty (1979)' 	 score:0.546
movieId:3588 	 'King of Marvin Gardens, The (1972)' 	 score:0.473
movieId:438 	 'Cowboy Way, The (1994)' 	 score:0.456
movieId:1921 	 'Pi (1998)' 	 score:0.453
movieId:3148 	 'Cider House Rules, The (1999)' 	 score:0.385
movieId:1527 	 'Fifth Element, The (1997)' 	 score:0.367
movieId:803 	 'Walking and Talking (1996)' 	 score:0.34
movieId:1884 	 'Fear and Loathing in Las Vegas (1998)' 	 score:0.315
movieId:3535 	 'American Psycho (2000)' 	 score:0.315
movieId:296 	 'Pulp Fiction (1994)' 	 score:0.313
movieId:842 	 'Tales from the Crypt Presents: Bordello of Blood (1996)' 	 score:0.289
movieId:3401 	 'Baby... Secret of the Lost Legend (1985)' 	 score:0.286
movieId:644 	 'Happy Weekend (1996)' 	 score:0.268
movieId:3907 	 'Prince of Central Park, The (1999)' 	 score:0.259
movieId:3449 	 'Good Mother, The (1988)' 	 score:0.226
movieId:1784 	 'As Good As It Gets (1997)' 	 score:0.162
movieId:1522 	 'Ripe (1996)' 	 score:0.115
movieId:3569 	 'Idiots, The (Idioterne) (1998)' 	 score:0.09
movieId:628 	 'Primal Fear (1996)' 	 score:0.09
movieId:2535 	 'Earthquake (1974)' 	 score:0.072
movieId:1217 	 'Ran (1985)' 	 score:0.07
movieId:3442 	 'Band of the Hand (1986)' 	 score:0.063
movieId:2421 	 'Karate Kid, Part II, The (1986)' 	 score:0.063
movieId:2055 	 'Hot Lead and Cold Feet (1978)' 	 score:0.062
movieId:1715 	 'Office Killer (1997)' 	 score:0.055
movieId:1785 	 'King of New York (1990)' 	 score:0.034
movieId:3871 	 'Shane (1953)' 	 score:0.02
```
With that, the online ranking flow is complete.
## Last but not Least

In Google's paper, to make the wide-side feature weights very sparse, the optimizer uses FTRL with L1 regularization. The features in this implementation expand to only a bit over ten thousand dimensions, so FTRL would not bring much benefit here.
In industrial ranking models, the wide-side cross features are usually id cross features at the scale of millions or even tens of millions. Wide-side weights that large put heavy pressure on online storage and computation, which is exactly why Google chose FTRL with L1 regularization to make the weights sparse: sparse weights amount to feature selection (keeping only the features with non-zero weights), which reduces the online lookup and computation load.
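PyTorch does not ship an FTRL optimizer; as a hedged stand-in for the same sparsity idea, one could add an L1 penalty on the wide-side weights to the loss (an approximation of the effect, not the FTRL-Proximal update used in the paper; the batch variables below are hypothetical):

```python
import torch
import torch.nn as nn

criterion = nn.BCEWithLogitsLoss()
optimizer = torch.optim.Adam(wide_deep.parameters(), lr=1e-3)
l1_lambda = 1e-5   # hypothetical regularization strength

# inside the training loop:
logits = wide_deep(x_wide_batch, x_deep_batch)
# L1 on the wide-side embedding weights pushes many of them toward zero,
# which acts like feature selection for a huge cross-feature space.
l1_penalty = wide_deep.wide_deep['wide'].linear.weight.abs().sum()
loss = criterion(logits, y_batch.float()) + l1_lambda * l1_penalty

optimizer.zero_grad()
loss.backward()
optimizer.step()
```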
## Reference