Reposted from https://blog.csdn.net/jackxu8/article/details/71308390#commentBox
Original source: https://docs.pymc.io/notebooks/bayesian_neural_network_advi.html

Bayesian Deep Networks in PyMC3

Generating data

Generate simulated data for a simple binary classification problem that is not linearly separable.
```python
%matplotlib inline
import pymc3 as pm
import theano.tensor as T
import theano
import sklearn
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
sns.set_style('white')
from sklearn import datasets
from sklearn.preprocessing import scale
from sklearn.model_selection import train_test_split
from sklearn.datasets import make_moons
```
```python
X, Y = make_moons(noise=0.2, random_state=0, n_samples=1000)
X = scale(X)

X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.5)

fig, ax = plt.subplots()
ax.scatter(X[Y==0, 0], X[Y==0, 1], color='b', edgecolors='k', alpha=0.6, label='Class 0')
ax.scatter(X[Y==1, 0], X[Y==1, 1], color='r', edgecolors='k', alpha=0.6, label='Class 1')
sns.despine(); ax.legend()
ax.set(xlabel='X', ylabel='Y', title='Toy binary classification data set');
```

Model definition
The basic building block of a neural network is the perceptron, which closely resembles logistic regression. Perceptrons are arranged in parallel, and the resulting layers are stacked to form hidden layers. Here we use 2 hidden layers with 5 neurons each, and place Normal priors on the weights to regularize them.
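Written out, the forward pass that the code below builds is (a restatement of the code, with $X$ the inputs, $W$ the weight matrices, and $\sigma$ the logistic sigmoid):

$$a_1 = \tanh(X W_{\mathrm{in},1}), \qquad a_2 = \tanh(a_1 W_{1,2}), \qquad p = \sigma(a_2 w_{2,\mathrm{out}}), \qquad y \sim \mathrm{Bernoulli}(p)$$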
```python
ann_input = theano.shared(X_train)
ann_output = theano.shared(Y_train)

n_hidden = 5

# Initialize random weights between each layer
init_1 = np.random.randn(X.shape[1], n_hidden)
init_2 = np.random.randn(n_hidden, n_hidden)
init_out = np.random.randn(n_hidden)

with pm.Model() as neural_network:
    # Weights from input to hidden layer
    weights_in_1 = pm.Normal('w_in_1', 0, sd=1,
                             shape=(X.shape[1], n_hidden),
                             testval=init_1)

    # Weights from 1st to 2nd layer
    weights_1_2 = pm.Normal('w_1_2', 0, sd=1,
                            shape=(n_hidden, n_hidden),
                            testval=init_2)

    # Weights from hidden layer to output
    weights_2_out = pm.Normal('w_2_out', 0, sd=1,
                              shape=(n_hidden,),
                              testval=init_out)

    # Build neural network using tanh activation function
    act_1 = T.tanh(T.dot(ann_input, weights_in_1))
    act_2 = T.tanh(T.dot(act_1, weights_1_2))
    act_out = T.nnet.sigmoid(T.dot(act_2, weights_2_out))

    # Binary classification -> Bernoulli likelihood
    out = pm.Bernoulli('out', act_out, observed=ann_output)
```
Variational inference: scaling model complexity
We could now use an MCMC sampler such as NUTS, which would give good results but is very slow. Instead we use the new ADVI variational inference algorithm, which is much faster and scales better with model complexity. It is, however, a mean-field approximation, so it ignores correlations in the posterior.
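Concretely, mean-field ADVI approximates the posterior over the weights $w$ with a fully factorized Gaussian and maximizes the evidence lower bound (ELBO). This is the standard formulation, restated here for reference:

$$q(w) = \prod_i \mathcal{N}(w_i \mid \mu_i, \sigma_i^2), \qquad \mathrm{ELBO} = \mathbb{E}_{q}[\log p(\mathcal{D} \mid w)] - \mathrm{KL}\left(q(w) \,\|\, p(w)\right)$$

The factorized form of $q$ is precisely why posterior correlations between weights are ignored.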
```python
%%time

with neural_network:
    # Run ADVI, which returns posterior means, standard deviations,
    # and the evidence lower bound (ELBO)
    v_params = pm.variational.advi(n=50000)
```

```
Average ELBO = -151.63: 100%|██████████| 50000/50000 [00:13<00:00, 3790.27it/s]
Finished [100%]: Average ELBO = -136.89
CPU times: user 15.5 s, sys: 695 ms, total: 16.1 s
Wall time: 18 s
```

Plotting the objective function (ELBO) shows that the fit improves gradually as the optimization proceeds.
```python
plt.plot(v_params.elbo_vals);
plt.ylabel('ELBO');
plt.xlabel('iteration');
```

For ease of inspection, we draw posterior samples with sample_vp(). Unlike MCMC, this function merely samples from the fitted normal approximation.
```python
with neural_network:
    trace = pm.variational.sample_vp(v_params, draws=5000)
```

```
100%|██████████| 5000/5000 [00:00<00:00, 10687.44it/s]
```

The model is now trained. Next we run a posterior predictive check (PPC) on the held-out data points, using sample_ppc() to generate new predictions from the posterior (sampled from the variational estimate).
```python
# Replace shared variables with the testing set
ann_input.set_value(X_test)
ann_output.set_value(Y_test)

# Create posterior predictive samples
ppc = pm.sample_ppc(trace, model=neural_network, samples=500)

# Use a probability above 0.5 to predict class 1
pred = ppc['out'].mean(axis=0) > 0.5
```

```
100%|██████████| 500/500 [00:03<00:00, 139.67it/s]
```

```python
fig, ax = plt.subplots()
ax.scatter(X_test[pred==0, 0], X_test[pred==0, 1], color='b', edgecolors='k', alpha=0.6)
ax.scatter(X_test[pred==1, 0], X_test[pred==1, 1], color='r', edgecolors='k', alpha=0.6)
sns.despine()
ax.set(title='Predicted labels in testing set', xlabel='X', ylabel='Y');

print('Accuracy = {}%'.format((Y_test == pred).mean() * 100))
```
```
Accuracy = 95.6%
```

Analyzing the classifier
To analyze what this classifier has learned, we evaluate the network's output over the entire input space.
```python
grid = np.mgrid[-3:3:100j, -3:3:100j]
grid_2d = grid.reshape(2, -1).T
dummy_out = np.ones(grid.shape[1], dtype=np.int8)

ann_input.set_value(grid_2d)
ann_output.set_value(dummy_out)

# Sample from the posterior predictive distribution on the grid
ppc = pm.sample_ppc(trace, model=neural_network, samples=500)
```

```
100%|██████████| 500/500 [00:04<00:00, 101.43it/s]
```

Plot the predicted surface:
```python
cmap = sns.diverging_palette(250, 12, s=85, l=25, as_cmap=True)
fig, ax = plt.subplots(figsize=(10, 6))
contour = ax.contourf(*grid, ppc['out'].mean(axis=0).reshape(100, 100), cmap=cmap)
ax.scatter(X_test[pred==0, 0], X_test[pred==0, 1], color='b', edgecolors='k', alpha=0.6)
ax.scatter(X_test[pred==1, 0], X_test[pred==1, 1], color='r', edgecolors='k', alpha=0.6)
cbar = plt.colorbar(contour, ax=ax)
_ = ax.set(xlim=(-3, 3), ylim=(-3, 3), xlabel='X', ylabel='Y')
cbar.ax.set_ylabel('Posterior predictive mean probability of class label = 0');
```

Uncertainty in the predictions
Everything so far could also have been done with a non-Bayesian neural network: the posterior predictive mean for each class is essentially what a maximum likelihood estimate would provide. The Bayesian neural network, however, additionally gives the standard deviation of the posterior predictive, which quantifies the uncertainty in the predictions.
```python
cmap = sns.cubehelix_palette(light=1, as_cmap=True)
fig, ax = plt.subplots(figsize=(10, 6))
contour = ax.contourf(*grid, ppc['out'].std(axis=0).reshape(100, 100), cmap=cmap)
ax.scatter(X_test[pred==0, 0], X_test[pred==0, 1], color='b', edgecolors='k', alpha=0.6)
ax.scatter(X_test[pred==1, 0], X_test[pred==1, 1], color='r', edgecolors='k', alpha=0.6)
cbar = plt.colorbar(contour, ax=ax)
_ = ax.set(xlim=(-3, 3), ylim=(-3, 3), xlabel='X', ylabel='Y')
cbar.ax.set_ylabel('Uncertainty (posterior predictive standard deviation)');
```

Notice that prediction uncertainty is highest near the decision boundary. Coupling predictions with their uncertainty matters in many application domains, such as medical diagnostics. To improve accuracy, one could have the model gather more data in the regions where it is most uncertain.
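For instance, one could flag points whose posterior predictive standard deviation exceeds a cutoff and treat them separately. This is a minimal sketch, not part of the original post: it assumes `ppc` has been re-sampled with `ann_input` set to `X_test` (as in the PPC step earlier), and the 0.3 cutoff is an arbitrary illustration.

```python
# Hypothetical triage step: the 0.3 cutoff is illustrative, not from the original post.
# Assumes ppc was generated with ann_input set to X_test, as in the PPC step above.
uncert = ppc['out'].std(axis=0)   # per-point predictive standard deviation
needs_review = uncert > 0.3       # flag the most uncertain predictions
print('{} of {} test points flagged for review'.format(needs_review.sum(), uncert.size))
```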
Mini-batch ADVI: scaling to large data
So far we have trained the model on all of the data at once, which does not scale with dataset size. Instead, we can train on mini-batches of data; besides scaling better, the resulting stochastic gradient descent tends to avoid local minima and to converge faster.
```python
from six.moves import zip

# Set back to the original data to retrain
ann_input.set_value(X_train)
ann_output.set_value(Y_train)

# Tensors and RVs that will use mini-batches
minibatch_tensors = [ann_input, ann_output]
minibatch_RVs = [out]

# Generator that returns a mini-batch in each iteration
def create_minibatch(data):
    while True:
        # Return random data samples of size 50 on each iteration
        ixs = np.random.randint(len(data), size=50)
        yield data[ixs]

minibatches = zip(
    create_minibatch(X_train),
    create_minibatch(Y_train),
)

total_size = len(Y_train)
```
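As a quick sanity check (not in the original post), one call to the zipped generators should yield one random mini-batch of each array; with the data above the expected shapes are (50, 2) and (50,):

```python
# The generators are infinite, so peeking at one batch is harmless.
xb, yb = next(minibatches)
print(xb.shape, yb.shape)  # expected: (50, 2) (50,)
```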
```python
%%time

with neural_network:
    # Run mini-batch ADVI
    v_params = pm.variational.advi_minibatch(
        n=50000, minibatch_tensors=minibatch_tensors,
        minibatch_RVs=minibatch_RVs, minibatches=minibatches,
        total_size=total_size, learning_rate=1e-2, epsilon=1.0)
```

```
Average ELBO = -123.12: 100%|██████████| 50000/50000 [00:16<00:00, 3043.69it/s]
Finished minibatch ADVI: ELBO = -121.76
CPU times: user 20.5 s, sys: 371 ms, total: 20.9 s
Wall time: 23.9 s
```

```python
with neural_network:
    trace = pm.variational.sample_vp(v_params, draws=5000)
```

```
100%|██████████| 5000/5000 [00:00<00:00, 9914.64it/s]
```

The mini-batch run takes somewhat longer in wall-clock time, but it converges faster.
```python
pm.traceplot(trace);
```
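To see the faster convergence directly, the ELBO trace of the mini-batch run can be plotted just as before; a minimal sketch reusing `v_params` from the `advi_minibatch` call above:

```python
# ELBO trace of the mini-batch ADVI run, for comparison with the full-data run
plt.plot(v_params.elbo_vals)
plt.ylabel('ELBO')
plt.xlabel('iteration')
```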