# [Notes] Bayesian Probit Regression

** Published:**

Consider the Bayesian probit regression model

\[Y_i\mid\beta_0,..,\beta_k \sim Bernoulli(\Phi(\beta_0+\beta_1x_{1i}+...+\beta_kx_{ki})), 1\leq i\leq n,\]where \(\pmb{\beta}\sim N(\pmb{\mu_{\beta}},\pmb{\Sigma_{\beta}}).\) and \(\Phi\) denotes a standard normal distribution.

Let \(\pmb{a}=(a_1,..a_n)\) be auxiliary variables where \(a_i\mid\pmb{\beta} \sim N((\pmb{X\beta})_i,1)\).

We have

\[p(y_i \mid a_i)= I(a_i\geq 0)^{y_i}I(a_i\leq 0)^{1-y_i}, 1 \leq i \leq n.\]Consider a product restriction \(p(\pmb{a,\beta})=q_{\pmb{a}}(\pmb{a})q_{\pmb{\beta}}(\pmb{\beta})\).

Using product density transforms gives

\[q_a(a)=[\prod_{i=1}^n(\frac{I(a_i\geq 0)}{\Phi((\pmb{X\mu}_{q(\beta)_i}))})^{y_i}(\frac{I(a_i\leq0)}{1-\Phi((\pmb{X\mu}_{q(\beta)_i}))})^{1-y_i}]\times (2\pi)^{-n/2}exp(-\frac{1}{2}||\pmb{a-X\mu_{q(\beta)}}||^2)\]and $q_{\pmb{\beta}}(\pmb{\beta})$ is the $N(\mu_{q_{(\pmb{\beta})}},(\pmb{X^TX+\Sigma_{\beta}^{-1}})^{-1})$ density function.

An algorithm for obtaining the optimal parameters for $q_{\pmb{a}}$ and $q_{\pmb{\beta}}$ is as follows

Initialize:

(1) \(\mu_{q(\pmb{a})}\) using truncated normal in \((n \times 1)\) shape.

(2) \(log_\delta = sys.maxint\)

While \(log_\delta > tol\):

Do:

\[\mu_q({\pmb{\beta}})= \pmb{(X^TX+\Sigma_{\beta}^{-1})^{-1}(X^T_{\mu_{q(a)}} + \Sigma_{\beta}^{-1}\mu_{\beta})}\] \[\mu_q({\pmb{a}}) = \pmb{X\mu_{q_(\beta)} +\frac{\phi(X_{\mu_{q(\beta)}})}{\Phi(X\mu_{q(\beta)})^y(\Phi(X\mu_{q(\beta)})-1_n)^{1_n-y}}}\] \[\log_\delta = \mid log(p^{(i)})-log(p^{(i-1)})\mid\]**References**

Ormerod, John T., and Matt P. Wand. “Explaining variational approximations.” *The American Statistician* 64.2 (2010): \(140-153\).