Thompson Sampling 推薦系統中簡單實用的 Exploring Strategy

2021-02-022021-02-03recommendation system14 minutes read (About 2027 words)0 visits

Exploring and Exploiting

Exploring and Exploiting (EE) 是推薦系統中歷久不衰的議題，如何幫助用戶發現更多感興趣的 entity 以及基於已有對用戶的認知推薦他感興趣的 entity，在推薦系統的實務上都得考慮。

具象化這個問題：在推薦系統中有$\text{}$ $\text{category A, category B, category C, category D, category E}$ 等五大類的 entity 集合，今天有個新用戶 $U$來了，我們要如何

知道他對哪個種類的 entity 比較感興趣？
人的興趣可以分成長期興趣跟短期興趣，在電商場景中，用戶短期興趣指他有立即需求的商品，我們如何快速抓到他的意圖，調整推薦系統的響應？
推薦哪些類目能帶給他意料之外的驚喜 ? 那些他沒預期，但我們推薦給他，能讓他感到滿意的 category。

Multi-armed bandit problem, K-armed bandit problem (MAP) 中的 Thompson Sampling，簡單又實用

推薦系統相關文章

Thompson Sampling

Thompson Sampling 利用了 beta distribution 是 bernoulli distribution 的 conjugacy prior，來更新 entity 被選中的 posterior probability distribution

從 Beta distribution 說起

$Beta(p|\alpha, \beta) \triangleq \cfrac{1}{B(\alpha, \beta)} \ p^{\alpha -1} \ (1-p)^{\beta - 1}$

beta function $B(\alpha, \beta)$ is a normalization term ，其作用是使 $\int^1_0 \cfrac{1}{B(\alpha, \beta)} \ p^{\alpha -1} \ (1-p)^{\beta - 1} dp = 1$
- $B(\alpha, \beta) = \int^1_0 p^{\alpha -1} (1-p)^{\beta -1} dp$

Beta distribution $beta(\alpha, \beta)$ 的期望值很簡潔

$E[p] = \cfrac{\alpha}{\alpha + \beta}$

我們知道期望值本身是結果的加權平均，如果把 $\alpha$ 視為成功次數， $\beta$ 視為失敗次數，那不就是平均成功率了嗎？

更神奇的是，平均成功率還可以隨著試驗的失敗跟成功次數變動，依然還是 beta distribution

$\cfrac{\alpha + n^{(1)}}{\alpha + \beta + n^{(1)} + n^{(0)}}$

$n^{(1)}$：表新增成功次數
$n^{(0)}$: 表新增失敗次數

也因為這個直覺的特性，Beta distribution 非常適合用在估計打擊率, 點擊率, 命中率 …等等 binary problem

以推薦系統 click through rate (CTR) 當例子

在推薦系統中，category $A$ 在用戶的點擊率 (ctr) 統計中，所有用戶對 category $A$ :

$\text{average ctr} = 0.33$
$\text{ctr variacne} = 0.00147$

以 $\text{mean=0.33 variacne=0.00147}$ 算出 $\alpha \approx50$ $\beta \approx100$， $\alpha =50$ $\beta=100$ 在本推薦系統中的意義是，$\text{category A}$ 平均每 150 次 impression ($\alpha + \beta$) 能產生 50 次 click ($\alpha$)，100 次看了不點 ($\beta$)。

畫出 PDF

圖中 PDF curve 的意義是，有個人叫做 “平均用戶”，”平均用戶” 對 $\text{category A}$ 最有可能的點擊率是 $0.33$，但不一定是 0.33, 可能比 0.33 高，可能比 0.33 低，但產生 0.33 這個點擊率的 likelihood $L(\theta| X=0.33)$ 最高
- 下圖是對 $beta(50, 100)$ sample 500 次，可以看出 $X=0.33$ 附近被 sample 到的次數的確較高

今天來了個新用戶 $U$，我們不知道他對 $\text{category A}$ 的喜好程度怎麼樣，但我們可以利用前面的 “平均用戶” 做為先驗： 150 impression 產生 50 次 click ($\alpha=50 \ , \beta=100$ )，再透過他後續跟 $\text{category A}$ 的互動修正出 for $\text{user U}$ 的 $\alpha_U \ \beta_U$。

假設我們給 $U$ 展示 $\text{category A}$ 100 次後，他 click了 60 次，看了不點 40 次，那他的 beta distribution 變成

$beta(50 + 60, 100 + 40 ) = beta(110, 140)$

可以發現橘線變得更尖，且往右移，此時 $mean =0.44$，表示 $user \ U$ 比＂平均用戶＂更加偏好 $\text{category A}$。

總結以上，一開始我們對於新用戶 $U$ 一無所知，不知道他對 $\text{category A}$ 的偏好，但我們透過已有的先驗，結合他跟推薦系統的互動，慢慢修正對他的認知：

$\cfrac{\alpha + n^{(1)}}{\alpha + \beta + n^{(1)} + n^{(0)}} = \cfrac{50 + 60}{50 + 100 + 60 + 40} = 0.44$

$n^{(1)}$：對 $\text{category A}$ 新的點擊行為
$n^{(0)}$: 對 $\text{category A}$ 新的＂看了未點＂的行為

於是，ctr 從原本 “最有可能” 0.33 修正到 “最有可能” 0.44 。

“最有可能”: 因爲一切都是 distribution 阿

這個神奇又簡潔的現象背後的數學原理，正是 beta distribution 的 conjugacy 特性。

Conjugate prior & Bayesian inference

prior $p(\theta)$ is conjugate to the likelihood function $p(X|\theta)$ when the posterior $p(\theta|X)$ has the same function form as the prior

$p(\theta|X) = \cfrac{p(X|\theta) p(\theta)}{p(X)} \Leftrightarrow \text{posterior} = \cfrac{\text{likelihood} \cdot \text{prior}}{\text{evidence}}$

$p(X)$ is the normalization term

$p(X) = \int_{\theta\in \Theta}p(X|\theta)p(\theta)d\theta$

即是

prior $p(\theta)$ 為 beta distribution $Beta(\theta|\alpha, \beta) = \cfrac{1}{B(\alpha, \beta)} \ \theta^{\alpha -1} \ (1-\theta)^{\beta - 1}$
likelihood function $p(X|\theta)$ 為 bernoulli distribution $Bern(c|\theta) = \theta^c(1-\theta)^{1-c}$

beta distribution 與 bernoulli distribution 都有類似的 form: $\theta^m(1-\theta)^n$ ，同時 posterior distribution $p(\theta|X)$ 也是 beta distribution

posterior $p(\theta|X)$ 也是 beta distribution 證明如下

Proof

假設

推薦系統中，對 $category \ A$ 曝光 $N$ 次，用戶 $U$ 點擊次數 $n^{(1)}$，未點擊次數 $n^{(0)}$，本質上是個 $N \ bernoulli \ trail$ ，所以其 likelihood function：

$p(C|p) =\prod_{i=1}^n p(C=c_i|p)= p^{n^{(1)}}(1-p)^{n^{(0)}}$ (忽略係數)
- $C$ 是 outcome, $c=1$ for positive ; $c=0$ for negative
$prior$ $p(p)$ 為 beta distribution :
$p(p|\alpha, \beta) = Beta(p|\alpha, \beta) = \cfrac{1}{B(\alpha, \beta)} \ p^{\alpha -1} \ (1-p)^{\beta - 1}$

則 $\text{posterior}$ $p(p|C,\alpha, \beta) = \cfrac{ p(C|p) \ p(p|\alpha, \beta)}{\int^1_0 p(C|p) \ p(p|\alpha, \beta) \ dp}$

分母項 $\int^1_0 p(C|p) \ p(p|\alpha, \beta) \ dp$ 作用為 normalize the distribution，通常用 $Z$ 代表：

$\begin{aligned} p(p|C,\alpha, \beta) &= \cfrac{ p(C|p) p(p|\alpha, \beta)}{\int^1_0 p(C|p) p(p|\alpha, \beta) dp} \\ &= \cfrac{p^{n^{(1)}} (1-p)^{n^{(0)}} \cfrac{1}{B(\alpha,\beta)} p^{\alpha -1} (1-p)^{\beta -1}}{Z} \\ &= \cfrac{p^{[n^{(1)} + \alpha] -1} (1 - p) ^{[n^{(0)} + \beta]-1}}{B(\alpha, \beta) Z} \end{aligned}$

$Z =\cfrac{1}{B(\alpha,\beta)} \int^1_0 p^{n^{(1)}} (1-p)^{n^{(0)}} p^{\alpha -1} (1-p)^{\beta -1} dp$

分母要 normalize 整個 probability distribution 使 $\int p(p|C,\alpha, \beta) dp= 1$

而新的 normalization 項為

$B(\alpha,\beta)Z = \int^1_0 p^{[n^{(1)} + \alpha] -1} (1 - p) ^{[n^{(0)} + \beta]-1}dp$

這不正是另一個 Beta function: $B(n^{(1)} + \alpha , n^{(0) } +\beta)$ ？？

所以 $p(p|C,\alpha, \beta)$ 最終化簡成

$\begin{aligned} p(p|C,\alpha, \beta) &= \cfrac{p^{[n^{(1)} + \alpha] -1} (1 - p) ^{[n^{(0)} + \beta]-1}}{B(n^{(1)} + \alpha , n^{(0) } +\beta)} \\ &= Beta (p|n^{(1)} +\alpha, n^{(0)} + \beta)\end{aligned}$

故得證 $\text{posterior}$ $p(p|C,\alpha, \beta)$ 也是 $\text{Beta distribution}$

Implement

一個簡單的實作方式是

先在線下計算好每個 category 的 ctr mean 跟 variance。
在實時推薦時，拿回某用戶近期對每個 category 交互數據 impression 與 click ，計算出新的 $\alpha \ \beta$。
有了每個類目的 $\alpha \ \beta$ 後，對每個類目的 $beta(\alpha, \beta)$ sampling，接著取出 sample 後 top K 的類目即可。
C2I 召回 …..

當然，你也可以不基於 category 維度計算 beta distribution，而是基於每一個 entity。不過如果 entity 數量上百萬，這顯然不切實際。

線下

統計每個 category CTR 的 variance and mean

Spark snippets

def calAvgAndVar(input: Dataset[Row], categoryCol: String): Dataset[Row] =
    input.select(categoryCol, ctrCol)
      .groupBy(categoryCol).agg(
	      fn.avg(fn.col(ctrCol)).as("ctr_avg"),
	      fn.variance(ctrCol).alias("ctr_var"))
      .na.drop
      .withColumnRenamed(categoryCol, "categoryId")

線上

計算每個 category 的初始 $\alpha_0 \ \beta_0$

ImmutablePair<Double, Double> calAlphaAndBeta(double ctrMean, double ctrVar) {
            double alpha = (((1 - ctrMean) / (ctrMean)) - 1 / ctrMean) * Math.pow(ctrMean, 2);
            double beta = alpha * ((1 / ctrMean) - 1);
            return ImmutablePair.of(alpha, beta);
  }

$\begin{aligned} \alpha &=\left(\frac{1-\mu}{\sigma^2}-\frac{1}{\mu}\right)\mu^2 \\ \beta &= \alpha\left(\frac{1}{\mu}-1\right) \end{aligned}$

取回用戶的近期 category 交互行為 impression and click，並計算新的 $\alpha_t,\ \beta_t$

/* left: alpha, right: beta */
ImmutablePair<Double, Double> prior = calAlphaAndBeta(double ctrMean, double ctrVar);

/* left: impression, right: click */
ImmutablePair<Integer, Integer> posteriorPair = posteriorData.getOrDefault(cateId, ImmutablePair.of(0, 0));

int clickCount = posteriorPair.getRight();
int impressionCount = posteriorPair.getLeft();
int impressionWithoutClick = (impressionCount - clickCount) > 0 ? (impressionCount - clickCount) : impressionCount;

double newAlpha = prior.getLeft() + clickCount;
double newBeta =  prior.getRight() + impressionWithoutClick;

對每個 category 的 beta distribution $beta(\alpha, \beta)$ sampling

import org.apache.commons.math3.distribution.BetaDistribution;
double calBetaProbability(double alpha, double beta) {
	  BetaDistribution betaDistribution = new BetaDistribution(alpha, beta);
    double rand = Math.random();
    return betaDistribution.inverseCumulativeProbability(rand);
}

sampling 利用 beta distribution 的 inverse cumulative distribution function (inverse CDF) sampling 出 random variable
- 參考 https://en.wikipedia.org/wiki/Inverse_transform_sampling

Reference

Multi-Armed Bandit With Thompson Sampling https://predictivehacks.com/multi-armed-bandit-with-thompson-sampling/
Conjugacy in Bayesian Inference http://gregorygundersen.com/blog/2019/03/16/conjugacy/
Understanding the beta distribution (using baseball statistics) http://varianceexplained.org/statistics/beta_distribution_and_baseball/
- 中文翻譯 : 如何通俗理解 beta 分布？ - 小杰的回答 - 知乎 https://www.zhihu.com/question/30269898/answer/123261564
https://en.wikipedia.org/wiki/Beta_distribution
Heinrich, G. (2005). Parameter estimation for text analysis
- 雖然是講 LDA，但前面從 ML MAP 一路推導到 Bayesian inference ，很詳細

Thompson Sampling 推薦系統中簡單實用的 Exploring Strategy

https://seed9d.github.io/Implement-Thompson-Sampling-in-Recommendation-System/

Author

seed9D

Posted on

2021-02-02

Updated on

2021-02-03

Licensed under

recommendation system

Thompson Sampling 推薦系統中簡單實用的 Exploring Strategy

Exploring and Exploiting

Thompson Sampling

從 Beta distribution 說起

以推薦系統 click through rate (CTR) 當例子

Conjugate prior & Bayesian inference

Proof

Implement

線下

統計每個 category CTR 的 variance and mean

線上

Reference

Author

Posted on

Updated on

Licensed under

Comments

Catalogue

Recents