Using GPUs for Computation with TensorFlow
=========
### 1. Check Whether the Graphics Card Supports CUDA
---------------------------------
### 2. References
----------------
TensorFlow: CNNs and GPUs
http://suanfazu.com/t/tensorflow-cnn-gpu/13216
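With a CNN, the GPU is more than 10x faster than the CPU: the CPU takes about 100 s to process roughly 55,000 images, while the GPU needs just over 9 s. At the time of writing, TensorFlow supports only NVIDIA cards.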
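
The "more than 10x" figure follows directly from the two timings quoted above. A minimal sketch of the arithmetic, treating "just over 9 s" as exactly 9 s (an assumption, so the result is a slight overestimate):

```python
def speedup(cpu_seconds, gpu_seconds):
    """Return how many times faster the GPU run is than the CPU run."""
    return cpu_seconds / gpu_seconds


# Timings quoted above: about 100 s on the CPU, just over 9 s on the GPU.
# 9.0 is an assumed round figure, so the true ratio is slightly lower.
ratio = speedup(100.0, 9.0)
print("GPU speedup: about {:.1f}x".format(ratio))
```
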
Installing TensorFlow with GPU acceleration on Windows
http://www.jianshu.com/p/c245d46d43f0
Installing TensorFlow (GPU-accelerated) on Ubuntu 16.04
http://blog.csdn.net/zhaoyu106/article/details/52793183
Linux tutorial, part 2:
http://blog.csdn.net/zhaoyu106/article/details/52861268
AWS images with the driver preinstalled:
Windows Server 2012 with NVIDIA GRID GPU Driver
https://aws.amazon.com/marketplace/pp/B00FYCCNJ0/ref=portal_asin_url
Windows Server 2008 R2 with NVIDIA GRID GPU Driver
https://aws.amazon.com/marketplace/pp/B00FYCBRE2/ref=portal_asin_url
Amazon Linux AMI with NVIDIA GRID and TESLA GPU Driver
https://aws.amazon.com/marketplace/pp/B00FYCDDTE/ref=portal_asin_url
Official AWS GPU installation guide for images: http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/accelerated-computing-instances.html
### 3. Example Environment
-------------
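Python 3.5 (64-bit) + the GPU build of TensorFlow
(At the time of writing, prebuilt GPU packages are available only for Python 3.5.)
If an earlier TensorFlow package is already installed, uninstall it first so the two packages do not conflict:
~~~
pip uninstall tensorflow
~~~
Then install the GPU build:
~~~
pip install tensorflow-gpu
~~~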
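
After running the pip commands above, it can help to confirm that Python can actually locate the package. The sketch below uses only the standard library; the helper name `is_importable` is made up for illustration. Both the `tensorflow` and `tensorflow-gpu` packages provide a module named `tensorflow`, so this check shows that one of them is present but not which one.

```python
import importlib.util


def is_importable(module_name):
    """Return True if Python can locate the named top-level module."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    # Both pip packages install a top-level module called `tensorflow`.
    print("tensorflow importable:", is_importable("tensorflow"))
```
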
Complete installation steps on Windows:

### 4. Verification Checks
-----------------

### 5. GPU Cloud Computing Resources
-------------------------------
NVIDIA GPU cloud computing providers: http://www.nvidia.cn/object/gpu-cloud-computing-services-cn.html
AWS instance types:


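Example of cutting costs with AWS Spot Instances (bidding):


In practice, the Virginia and Oregon regions are usually the cheaper ones.

Oregon region prices (all prices are per hour):

Virginia region prices:

Spot price history, useful for gauging how long an instance is likely to stay available:

Recommended Windows image (Windows with Desktop):

Setting the pricing strategy:

Confirmation that the request succeeded:

Once the request succeeds, open the Instances page to connect remotely.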

### 6. Running the Official MNIST Image Recognition Example
-------------------------------------------
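If a GPU's CUDA compute capability is below the minimum TensorFlow requires, the framework ignores it. In the run below, the GeForce 710M (compute capability 2.1, minimum required 3.0) is skipped, so training falls back to the CPU. The log also shows that cudnn64_5.dll could not be loaded.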
~~~
C:\Users\XUN-2\Desktop\TF-DEV\TF_models\tutorials\image\mnist>python convolutional.py
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\stream_executor\dso_loader.cc:135] successfully opened CUDA library cublas64_80.dll locally
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\stream_executor\dso_loader.cc:126] Couldn't open CUDA library cudnn64_5.dll
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\stream_executor\cuda\cuda_dnn.cc:3517] Unable to load cuDNN DSO
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\stream_executor\dso_loader.cc:135] successfully opened CUDA library cufft64_80.dll locally
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\stream_executor\dso_loader.cc:135] successfully opened CUDA library nvcuda.dll locally
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\stream_executor\dso_loader.cc:135] successfully opened CUDA library curand64_80.dll locally
Successfully downloaded train-images-idx3-ubyte.gz 9912422 bytes.
Successfully downloaded train-labels-idx1-ubyte.gz 28881 bytes.
Successfully downloaded t10k-images-idx3-ubyte.gz 1648877 bytes.
Successfully downloaded t10k-labels-idx1-ubyte.gz 4542 bytes.
Extracting data\train-images-idx3-ubyte.gz
Extracting data\train-labels-idx1-ubyte.gz
Extracting data\t10k-images-idx3-ubyte.gz
Extracting data\t10k-labels-idx1-ubyte.gz
E c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\framework\op_kernel.cc:943] OpKernel ('op: "BestSplits" device_type: "CPU"') for unknown op: BestSplits
E c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\framework\op_kernel.cc:943] OpKernel ('op: "CountExtremelyRandomStats" device_type: "CPU"') for unknown op: CountExtremelyRandomStats
E c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\framework\op_kernel.cc:943] OpKernel ('op: "FinishedNodes" device_type: "CPU"') for unknown op: FinishedNodes
E c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\framework\op_kernel.cc:943] OpKernel ('op: "GrowTree" device_type: "CPU"') for unknown op: GrowTree
E c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\framework\op_kernel.cc:943] OpKernel ('op: "ReinterpretStringToFloat" device_type: "CPU"') for unknown op: ReinterpretStringToFloat
E c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\framework\op_kernel.cc:943] OpKernel ('op: "SampleInputs" device_type: "CPU"') for unknown op: SampleInputs
E c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\framework\op_kernel.cc:943] OpKernel ('op: "ScatterAddNdim" device_type: "CPU"') for unknown op: ScatterAddNdim
E c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\framework\op_kernel.cc:943] OpKernel ('op: "TopNInsert" device_type: "CPU"') for unknown op: TopNInsert
E c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\framework\op_kernel.cc:943] OpKernel ('op: "TopNRemove" device_type: "CPU"') for unknown op: TopNRemove
E c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\framework\op_kernel.cc:943] OpKernel ('op: "TreePredictions" device_type: "CPU"') for unknown op: TreePredictions
E c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\framework\op_kernel.cc:943] OpKernel ('op: "UpdateFertileSlots" device_type: "CPU"') for unknown op: UpdateFertileSlots
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\common_runtime\gpu\gpu_device.cc:885] Found device 0 with properties:
name: GeForce 710M
major: 2 minor: 1 memoryClockRate (GHz) 1.55
pciBusID 0000:01:00.0
Total memory: 1.00GiB
Free memory: 969.91MiB
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\common_runtime\gpu\gpu_device.cc:906] DMA: 0
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\common_runtime\gpu\gpu_device.cc:916] 0: Y
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\common_runtime\gpu\gpu_device.cc:948] Ignoring visible gpu device (device: 0, name: GeForce 710M, pci bus id: 0000:01:00.0) with Cuda compute capability 2.1. The minimum required Cuda capability is 3.0.
Initialized!
Step 0 (epoch 0.00), 21.5 ms
Minibatch loss: 8.334, learning rate: 0.010000
Minibatch error: 85.9%
Validation error: 84.6%
Step 100 (epoch 0.12), 303.1 ms
Minibatch loss: 3.254, learning rate: 0.010000
Minibatch error: 6.2%
Validation error: 7.8%
Step 200 (epoch 0.23), 303.9 ms
Minibatch loss: 3.373, learning rate: 0.010000
Minibatch error: 9.4%
Validation error: 4.3%
Step 300 (epoch 0.35), 304.5 ms
Minibatch loss: 3.150, learning rate: 0.010000
Minibatch error: 3.1%
Validation error: 3.1%
Step 400 (epoch 0.47), 304.7 ms
Minibatch loss: 3.192, learning rate: 0.010000
Minibatch error: 4.7%
Validation error: 2.9%
Step 500 (epoch 0.58), 314.1 ms
Minibatch loss: 3.176, learning rate: 0.010000
Minibatch error: 4.7%
Validation error: 2.4%
Step 600 (epoch 0.70), 307.6 ms
Minibatch loss: 3.115, learning rate: 0.010000
Minibatch error: 3.1%
Validation error: 2.0%
Step 700 (epoch 0.81), 319.4 ms
Minibatch loss: 2.969, learning rate: 0.010000
Minibatch error: 3.1%
Validation error: 2.1%
Step 800 (epoch 0.93), 327.6 ms
Minibatch loss: 3.072, learning rate: 0.010000
Minibatch error: 6.2%
Validation error: 2.1%
Step 900 (epoch 1.05), 323.4 ms
Minibatch loss: 2.898, learning rate: 0.009500
Minibatch error: 0.0%
Validation error: 1.5%
Step 1000 (epoch 1.16), 307.3 ms
Minibatch loss: 2.855, learning rate: 0.009500
Minibatch error: 0.0%
Validation error: 1.9%
Step 1100 (epoch 1.28), 291.2 ms
Minibatch loss: 2.824, learning rate: 0.009500
Minibatch error: 0.0%
Validation error: 1.5%
~~~
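
The "Ignoring visible gpu device" message is easy to miss in a log this long. As a rough sketch, the script below scans saved log text for that message and reports which GPU was skipped and why. The regular expression is modeled only on the wording in the log above, so other TensorFlow versions may phrase the message differently; the helper name `find_ignored_gpus` is made up for illustration, and the message is assumed to sit on a single line.

```python
import re

# Pattern modeled on the "Ignoring visible gpu device" line in the log above.
# Assumption: the message is on one line and worded exactly as in that log.
IGNORED_GPU = re.compile(
    r"Ignoring visible gpu device \(device: (?P<index>\d+), "
    r"name: (?P<name>[^,]+), pci bus id: [^)]+\) "
    r"with Cuda compute capability (?P<found>\d+\.\d+)\. "
    r"The minimum required Cuda capability is (?P<required>\d+\.\d+)"
)


def find_ignored_gpus(log_text):
    """Return one dict per GPU that the log reports as ignored."""
    return [
        {
            "index": int(m.group("index")),
            "name": m.group("name"),
            "found": m.group("found"),
            "required": m.group("required"),
        }
        for m in IGNORED_GPU.finditer(log_text)
    ]


if __name__ == "__main__":
    # Message text copied from the log above.
    sample = (
        "Ignoring visible gpu device (device: 0, name: GeForce 710M, "
        "pci bus id: 0000:01:00.0) with Cuda compute capability 2.1. "
        "The minimum required Cuda capability is 3.0."
    )
    for gpu in find_ignored_gpus(sample):
        print("{name}: capability {found} < required {required}".format(**gpu))
```

An empty result only means that this particular message was not found; it does not by itself show that training ran on the GPU.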