# [Notes] Bayesian Probit Regression

Consider the Bayesian probit regression model

$Y_i\mid\beta_0,\ldots,\beta_k \sim \text{Bernoulli}(\Phi(\beta_0+\beta_1x_{1i}+\cdots+\beta_kx_{ki})), \quad 1\leq i\leq n,$

where $\pmb{\beta}\sim N(\pmb{\mu_{\beta}},\pmb{\Sigma_{\beta}})$ and $\Phi$ denotes the standard normal cumulative distribution function.

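Let $\pmb{a}=(a_1,\ldots,a_n)$ be auxiliary variables where $a_i\mid\pmb{\beta} \sim N((\pmb{X\beta})_i,1)$ independently, and $\pmb{X}$ is the $n\times(k+1)$ design matrix whose $i$th row is $(1,x_{1i},\ldots,x_{ki})$.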

We have

$p(y_i \mid a_i)= I(a_i\geq 0)^{y_i}I(a_i\leq 0)^{1-y_i}, 1 \leq i \leq n.$
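The auxiliary-variable construction reproduces the probit likelihood because $P(a_i \geq 0 \mid \pmb{\beta}) = \Phi((\pmb{X\beta})_i)$. A minimal simulation sketch of that identity, using only the Python standard library; the value of `eta` (standing in for $(\pmb{X\beta})_i$), the number of draws, and the seed are arbitrary choices made for illustration:

```python
import random
from statistics import NormalDist


def simulate_probit_via_auxiliary(eta, n_draws=200_000, seed=0):
    """Draw a ~ N(eta, 1) repeatedly and return the fraction with a >= 0."""
    rng = random.Random(seed)
    hits = sum(rng.gauss(eta, 1.0) >= 0.0 for _ in range(n_draws))
    return hits / n_draws


eta = 0.7  # hypothetical value of the linear predictor (X beta)_i
freq = simulate_probit_via_auxiliary(eta)
exact = NormalDist().cdf(eta)  # Phi(eta), the probit success probability
print(f"simulated P(y=1) = {freq:.3f}, Phi(eta) = {exact:.3f}")
```

The two printed numbers should agree up to Monte Carlo error, which shrinks as `n_draws` grows.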

Approximate the joint posterior $p(\pmb{a},\pmb{\beta}\mid\pmb{y})$ by a density $q$ satisfying the product restriction $q(\pmb{a},\pmb{\beta})=q_{\pmb{a}}(\pmb{a})q_{\pmb{\beta}}(\pmb{\beta})$.

Applying the product density transform approach gives the optimal densities

$q_{\pmb{a}}(\pmb{a})=\left[\prod_{i=1}^n\left(\frac{I(a_i\geq 0)}{\Phi((\pmb{X}\pmb{\mu}_{q(\pmb{\beta})})_i)}\right)^{y_i}\left(\frac{I(a_i\leq 0)}{1-\Phi((\pmb{X}\pmb{\mu}_{q(\pmb{\beta})})_i)}\right)^{1-y_i}\right]\times (2\pi)^{-n/2}\exp\left(-\frac{1}{2}\|\pmb{a}-\pmb{X}\pmb{\mu}_{q(\pmb{\beta})}\|^2\right)$

and $q_{\pmb{\beta}}(\pmb{\beta})$ is the $N(\pmb{\mu}_{q(\pmb{\beta})},(\pmb{X}^T\pmb{X}+\pmb{\Sigma}_{\pmb{\beta}}^{-1})^{-1})$ density function.
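Each factor of $q_{\pmb{a}}$ is a unit-variance normal density centred at $m_i=(\pmb{X}\pmb{\mu}_{q(\pmb{\beta})})_i$ and truncated to $a_i\geq 0$ when $y_i=1$ or to $a_i\leq 0$ when $y_i=0$. Its mean therefore has the closed form $m_i+\phi(m_i)/\Phi(m_i)$ for $y_i=1$ and $m_i-\phi(m_i)/(1-\Phi(m_i))$ for $y_i=0$, where $\phi$ is the standard normal density. The sketch below checks this against a plain midpoint-rule integration; the function names, the integration range, and the step count are illustrative assumptions, not part of the original notes.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # pdf is phi, cdf is Phi


def truncated_mean_closed_form(m, y):
    """Mean of N(m, 1) truncated to a >= 0 (y = 1) or a <= 0 (y = 0)."""
    phi, Phi = STD_NORMAL.pdf(m), STD_NORMAL.cdf(m)
    return m + phi / Phi if y == 1 else m - phi / (1.0 - Phi)


def truncated_mean_numeric(m, y, steps=20_000):
    """Same mean by midpoint-rule integration over the truncated support."""
    if y == 1:
        lo, hi = 0.0, max(m, 0.0) + 12.0   # upper tail beyond this is negligible
    else:
        lo, hi = min(m, 0.0) - 12.0, 0.0   # lower tail beyond this is negligible
    h = (hi - lo) / steps
    mass = first_moment = 0.0
    for j in range(steps):
        a = lo + (j + 0.5) * h
        w = STD_NORMAL.pdf(a - m)          # unnormalised N(m, 1) density at a
        mass += w
        first_moment += a * w
    return first_moment / mass


for m, y in [(0.4, 1), (0.4, 0), (-1.3, 1)]:
    print(m, y, truncated_mean_closed_form(m, y), truncated_mean_numeric(m, y))
```

The closed form is what makes the mean of $q_{\pmb{a}}$ cheap to evaluate: no sampling or quadrature is needed.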

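An algorithm for obtaining the optimal parameters of $q_{\pmb{a}}$ and $q_{\pmb{\beta}}$ is as follows.

Initialize:

(1) $\pmb{\mu}_{q(\pmb{a})}$, an $n \times 1$ vector, using draws from a truncated normal distribution.

(2) $\Delta = \infty$ (any value larger than the tolerance $tol$).

While $\Delta > tol$:

Do:

$\pmb{\mu}_{q(\pmb{\beta})} = (\pmb{X}^T\pmb{X}+\pmb{\Sigma}_{\pmb{\beta}}^{-1})^{-1}(\pmb{X}^T\pmb{\mu}_{q(\pmb{a})} + \pmb{\Sigma}_{\pmb{\beta}}^{-1}\pmb{\mu}_{\pmb{\beta}})$

$\pmb{\mu}_{q(\pmb{a})} = \pmb{X}\pmb{\mu}_{q(\pmb{\beta})} + \frac{\phi(\pmb{X}\pmb{\mu}_{q(\pmb{\beta})})}{\Phi(\pmb{X}\pmb{\mu}_{q(\pmb{\beta})})^{\pmb{y}}\,(\Phi(\pmb{X}\pmb{\mu}_{q(\pmb{\beta})})-\pmb{1}_n)^{\pmb{1}_n-\pmb{y}}}$

$\Delta = \mid \log p^{(t)}-\log p^{(t-1)}\mid$

Here $\phi$ is the standard normal density, $\pmb{1}_n$ is the $n\times 1$ vector of ones, and $\phi$, $\Phi$, the powers, and the division are all applied elementwise. $\log p^{(t)}$ denotes the variational lower bound on the log marginal likelihood $\log p(\pmb{y})$ after iteration $t$.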
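The updates above can be sketched in NumPy as follows. This is an illustrative implementation under stated assumptions, not the author's code: the function name `probit_vb`, the half-normal initialisation (a standard normal truncated to the side matching $y_i$), the seed, and the simulated data are all hypothetical choices. The notes do not state the lower bound, so the sketch uses the expression that follows from the two optimal densities for this model, $\pmb{y}^T\log\Phi(\pmb{X}\pmb{\mu}_{q(\pmb{\beta})})+(\pmb{1}_n-\pmb{y})^T\log(\pmb{1}_n-\Phi(\pmb{X}\pmb{\mu}_{q(\pmb{\beta})}))-\frac{1}{2}(\pmb{\mu}_{q(\pmb{\beta})}-\pmb{\mu}_{\pmb{\beta}})^T\pmb{\Sigma}_{\pmb{\beta}}^{-1}(\pmb{\mu}_{q(\pmb{\beta})}-\pmb{\mu}_{\pmb{\beta}})-\frac{1}{2}\log|\pmb{\Sigma}_{\pmb{\beta}}\pmb{X}^T\pmb{X}+\pmb{I}|$, which should be checked against Ormerod and Wand (2010) before being relied on.

```python
import math
import numpy as np

_erf = np.vectorize(math.erf)


def norm_cdf(x):
    """Standard normal CDF, elementwise."""
    return 0.5 * (1.0 + _erf(x / math.sqrt(2.0)))


def norm_pdf(x):
    """Standard normal density, elementwise."""
    return np.exp(-0.5 * x ** 2) / math.sqrt(2.0 * math.pi)


def probit_vb(X, y, mu_beta, Sigma_beta, tol=1e-8, max_iter=1000, seed=0):
    """Cycle the mu_q(beta) and mu_q(a) updates until the lower bound stalls.

    Returns (mu_q_beta, Sigma_q_beta, bounds), where bounds holds the lower
    bound after each iteration.
    """
    n, p = X.shape
    Sigma_beta_inv = np.linalg.inv(Sigma_beta)
    # Covariance of q(beta); it does not change between iterations.
    Sigma_q_beta = np.linalg.inv(X.T @ X + Sigma_beta_inv)
    _, logdet = np.linalg.slogdet(Sigma_beta @ X.T @ X + np.eye(p))

    # Assumed initialisation: half-normal draws, positive where y = 1 and
    # negative where y = 0.
    rng = np.random.default_rng(seed)
    mu_q_a = np.abs(rng.standard_normal(n)) * np.where(y == 1, 1.0, -1.0)

    bounds = []
    for _ in range(max_iter):
        mu_q_beta = Sigma_q_beta @ (X.T @ mu_q_a + Sigma_beta_inv @ mu_beta)
        m = X @ mu_q_beta
        Phi = norm_cdf(m)
        # Elementwise denominator: Phi where y = 1, Phi - 1 where y = 0.
        mu_q_a = m + norm_pdf(m) / np.where(y == 1, Phi, Phi - 1.0)

        diff = mu_q_beta - mu_beta
        bound = (np.sum(np.where(y == 1, np.log(Phi), np.log1p(-Phi)))
                 - 0.5 * diff @ Sigma_beta_inv @ diff - 0.5 * logdet)
        bounds.append(bound)
        if len(bounds) > 1 and abs(bounds[-1] - bounds[-2]) < tol:
            break
    return mu_q_beta, Sigma_q_beta, bounds


# Usage on simulated data (all values below are made up for illustration).
rng = np.random.default_rng(1)
n = 2000
X = np.column_stack([np.ones(n), rng.standard_normal(n)])
beta_true = np.array([-0.5, 1.0])
y = (X @ beta_true + rng.standard_normal(n) >= 0).astype(int)
mu_q_beta, Sigma_q_beta, bounds = probit_vb(X, y, np.zeros(2), 10.0 * np.eye(2))
print(mu_q_beta, bounds[-1], len(bounds))
```

Because the covariance of $q_{\pmb{\beta}}$ does not depend on the iterates, the sketch inverts it once outside the loop; only the two mean vectors change per iteration. The recorded lower bounds should be non-decreasing, which is a convenient sanity check on any implementation of these updates.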

References

Ormerod, John T., and Matt P. Wand. “Explaining variational approximations.” The American Statistician 64.2 (2010): 140-153.