Chapter 1 Introduction
Frequentist vs. Bayesian
- The frequentist school holds that a variable's probability distribution is fixed. The Bayesian school, by contrast, treats the distribution as uncertain: it is revised as new information and new observations arrive. One first posits a prior distribution based on existing experience, then combines it with the observed data to obtain the posterior distribution.
Proponents of the frequentist approach consider the source of
uncertainty to be the randomness inherent in realizations of a random
variable. The probability distributions of variables are not subject
to uncertainty. In contrast, Bayesian statistics treats probability
distributions as uncertain and subject to modification as new
information becomes available. Uncertainty is implicitly incorporated
by probability updating. The probability beliefs based on the existing
knowledge base take the form of the prior probability. The posterior
probability represents the updated beliefs.
Chapter 2 The Bayesian Framework - The Likelihood Function
Poisson Distribution
- The Poisson probability mass function:
$$p(X=k)=\frac{\theta^{k}}{k!} e^{-\theta}, \quad k=0,1,2,\ldots$$
- Given 20 observations $x_{1}, x_{2}, \ldots, x_{20}$, the joint probability (the likelihood) is:
$$\begin{aligned} L\left(\theta | x_{1}, x_{2}, \ldots, x_{20}\right) &=\prod_{i=1}^{20} p\left(X=x_{i} | \theta\right)=\prod_{i=1}^{20} \frac{\theta^{x_{i}}}{x_{i}!} e^{-\theta} \\ &=\frac{\theta^{\sum_{i=1}^{20} x_{i}}}{\prod_{i=1}^{20} x_{i}!} e^{-20\theta} \end{aligned}$$
- Dropping the constant factor, this is proportional to:
$$L\left(\theta | x_{1}, x_{2}, \ldots, x_{20}\right) \propto \theta^{\sum_{i=1}^{20} x_{i}} e^{-20\theta}$$
- Maximum likelihood estimation yields the estimator
$$\widehat{\theta}=\bar{x}=\frac{\sum_{i=1}^{20} x_{i}}{20},$$
i.e., the mean of the 20 observations.
- An implicit assumption in this derivation is that the 20 observations are mutually independent.
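To sanity-check the result above, the sketch below (with a hypothetical sample of 20 counts) evaluates the Poisson log-likelihood and confirms that the sample mean scores at least as high as nearby candidate values of $\theta$:

```python
import math

# Hypothetical sample of 20 Poisson counts.
data = [2, 3, 1, 4, 2, 2, 5, 3, 0, 2,
        3, 1, 2, 4, 3, 2, 1, 3, 2, 3]

def log_likelihood(theta, xs):
    # log L(theta | x) = sum_i [ x_i*log(theta) - theta - log(x_i!) ]
    return sum(x * math.log(theta) - theta - math.lgamma(x + 1) for x in xs)

# The MLE derived above is the sample mean.
mle = sum(data) / len(data)

# The sample mean should beat nearby candidate values of theta.
best = max([mle - 0.5, mle, mle + 0.5], key=lambda t: log_likelihood(t, data))
print(mle, best)  # -> 2.4 2.4
```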
Normal Distribution
- Probability density function:
$$f(y)=\frac{1}{\sqrt{2\pi}\, \sigma} e^{-\frac{(y-\mu)^{2}}{2\sigma^{2}}}$$
Bayes' Formula
- $P(E | D)=\frac{P(D | E) \times P(E)}{P(D)}$
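A minimal numeric sketch of the formula, with all probabilities hypothetical: take a prior $P(E)=0.01$, likelihood $P(D|E)=0.9$, and $P(D|\neg E)=0.05$; the denominator $P(D)$ then expands by the law of total probability:

```python
# All numbers are hypothetical: prior, likelihood, false-positive rate.
p_e = 0.01            # prior P(E)
p_d_given_e = 0.90    # likelihood P(D | E)
p_d_given_not_e = 0.05

# Law of total probability: P(D) = P(D|E)P(E) + P(D|~E)P(~E).
p_d = p_d_given_e * p_e + p_d_given_not_e * (1 - p_e)

# Bayes' formula: P(E|D) = P(D|E) * P(E) / P(D).
p_e_given_d = p_d_given_e * p_e / p_d
print(round(p_e_given_d, 4))  # -> 0.1538
```

Note how a rare event stays fairly unlikely even after positive evidence: the small prior keeps the posterior well below the likelihood.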
Bayesian Inference and the Binomial Distribution
- The Beta distribution is the conjugate prior for the binomial distribution.
The beta distribution is the conjugate prior distribution for the
binomial parameter θ. This means that the posterior distribution of θ
is also a beta distribution (of course, with updated parameters).
- Sometimes, when the sample is very large, the choice among different prior distributions makes little practical difference to the posterior, because a large sample drowns out the prior information.
The two posterior estimates and the maximum-likelihood estimate are
the same for all practical purposes. The reason is that the sample
size is so large that the information contained in the data sample
"swamps out" the prior information. In Chapter 3, we further
illustrate and comment on the role sample size plays in posterior
inference.
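The conjugate update and the swamping effect can both be sketched in a few lines; the hyperparameter choices below are purely illustrative:

```python
def beta_binomial_posterior(a, b, successes, failures):
    """Conjugate update: a Beta(a, b) prior plus binomial data
    gives a Beta(a + successes, b + failures) posterior."""
    return a + successes, b + failures

def beta_mean(a, b):
    # Mean of a Beta(a, b) distribution.
    return a / (a + b)

# Two deliberately different priors (hyperparameters are illustrative).
priors = {"flat": (1, 1), "skeptical": (2, 18)}

# Small sample (7 successes, 3 failures): the priors disagree visibly.
for name, (a, b) in priors.items():
    pa, pb = beta_binomial_posterior(a, b, 7, 3)
    print(name, round(beta_mean(pa, pb), 3))

# Large sample (7000 successes, 3000 failures): both posteriors land
# next to the maximum-likelihood estimate 0.7 -- the data swamp the prior.
for name, (a, b) in priors.items():
    pa, pb = beta_binomial_posterior(a, b, 7000, 3000)
    print(name, round(beta_mean(pa, pb), 4))
```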
Chapter 3 Prior and Posterior Information, and Predictive Inference
The posterior is proportional to the likelihood times the prior:
- $p(\theta | \boldsymbol{y}) \propto L(\theta | \boldsymbol{y}) \pi(\theta)$
where:
- θ = unknown parameter whose inference we are interested in.
- y = a vector (or a matrix) of recorded observations.
- π(θ) = prior distribution of θ, depending on one or more parameters, called hyperparameters.
- L(θ|y) = likelihood function for θ.
- p(θ|y) = posterior (updated) distribution of θ.
From the formula above we can see:
- Two factors shape the posterior distribution: the prior information and the observed data.
- The influence of the prior typically weakens as the number of observations grows.
- If the observed data set is small, the prior largely determines the posterior.
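These points can be illustrated with a simple grid approximation of $p(\theta|y) \propto L(\theta|y)\pi(\theta)$ for Poisson data; the data and the Exponential(1) prior below are assumptions chosen purely for illustration:

```python
import math

# Hypothetical Poisson counts.
data = [3, 1, 4, 2, 3, 5, 2, 3]
n = len(data)

def poisson_log_lik(theta, xs):
    # log-likelihood of Poisson data at rate theta.
    return sum(x * math.log(theta) - theta - math.lgamma(x + 1) for x in xs)

def prior(theta):
    # Assumed Exponential(1) prior density, purely for illustration.
    return math.exp(-theta)

# Discretize theta and form the unnormalized posterior: likelihood x prior.
grid = [0.01 * i for i in range(1, 1001)]  # theta in (0, 10]
unnorm = [math.exp(poisson_log_lik(t, data)) * prior(t) for t in grid]
z = sum(unnorm)
posterior = [u / z for u in unnorm]

# Posterior mean vs. the MLE (the sample mean): the prior pulls the
# estimate toward smaller theta; with more data the pull would shrink.
post_mean = sum(t * p for t, p in zip(grid, posterior))
mle = sum(data) / n
print(round(post_mean, 3), round(mle, 3))
```

Since Exponential(1) is Gamma(1, 1), the exact posterior here is Gamma(24, 9) with mean 24/9 ≈ 2.667, which the grid approximation recovers; the MLE is 2.875, so the prior visibly pulls the estimate down with only 8 observations.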