Installing, Configuring, and Testing CUDA 7.5 on Windows 7 (VS2010)

WIN7 + CUDA7.5 + VS2010
Configuration:
OS: Windows 7, 64-bit
IDE: VS2010
GPU: NVIDIA GeForce GT 750M
CUDA version: 7.5
Step 1
Download the latest CUDA release (7.5 at the time of writing) from the official site:
https://developer.nvidia.com/cuda-downloads
Choose the local installer type.
 



Step 2
Run the installer. A dialog appears asking where to extract the temporary installer files. The default path is fine; the extracted files are deleted automatically once installation completes.
 



Step 3
The installer checks system compatibility. If no problems are found, the actual installation can begin.
 



Step 4

Accept the license agreement and continue; installation begins.


 
Step 5
Choose the custom installation and check all of its options.
 


Step 6

Set the three installation paths. The defaults can be used directly (custom paths should also work, though the defaults were used here). Then simply wait for the installation to finish.


 
Step 7
After installation, configure the environment variables. The installer automatically creates two variables, CUDA_PATH and CUDA_PATH_V7_5; the following still need to be added manually:
(1) CUDA_SDK_PATH = C:\ProgramData\NVIDIA Corporation\CUDA Samples\v7.5
(2) CUDA_LIB_PATH = %CUDA_PATH%\lib\x64
(3) CUDA_BIN_PATH = %CUDA_PATH%\bin
(4) CUDA_SDK_BIN_PATH = %CUDA_SDK_PATH%\bin\x64
(5) CUDA_SDK_LIB_PATH = %CUDA_SDK_PATH%\common\lib\x64
Finally, append the following to the end of the system PATH variable:
;%CUDA_LIB_PATH%;%CUDA_BIN_PATH%;%CUDA_SDK_LIB_PATH%;%CUDA_SDK_BIN_PATH%;
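
The same variables can also be set from an elevated command prompt instead of through the System Properties dialog. This is just a sketch, assuming the default install locations; note that `setx` stores the value expanded at the moment the command runs, which is why the SDK-relative entries below are written out as literal paths (a freshly created variable is not visible to `setx` in the same session):

```bat
:: Run from an elevated command prompt; setx /M writes system-wide variables.
setx /M CUDA_SDK_PATH "C:\ProgramData\NVIDIA Corporation\CUDA Samples\v7.5"
setx /M CUDA_LIB_PATH "%CUDA_PATH%\lib\x64"
setx /M CUDA_BIN_PATH "%CUDA_PATH%\bin"
setx /M CUDA_SDK_BIN_PATH "C:\ProgramData\NVIDIA Corporation\CUDA Samples\v7.5\bin\x64"
setx /M CUDA_SDK_LIB_PATH "C:\ProgramData\NVIDIA Corporation\CUDA Samples\v7.5\common\lib\x64"
```

Either way, a new command prompt (or a reboot, as in the next step) is needed before the values become visible.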


Step 8
Restart the computer so that the environment variables take effect.


Step 9
Open VS2010 and create a Win32 Console Application project.
In the additional options, check "Empty project".


Step 10

Right-click Source Files -> Add -> New Item.

Choose CUDA C/C++ File and click Add; the new source file, sample.cu (any name will do), appears under Source Files.
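
Before pasting in the full deviceQuery listing shown further below, a minimal sample.cu is handy for checking that nvcc is actually wired into the build. This is only an illustrative sketch; the kernel name and launch configuration are arbitrary:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Trivial kernel: each thread writes its global index into the output array.
__global__ void fillIndices(int *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
    {
        out[i] = i;
    }
}

int main()
{
    const int n = 8;
    int host[n] = {0};
    int *dev = NULL;

    cudaMalloc(&dev, n * sizeof(int));
    fillIndices<<<1, n>>>(dev, n);
    cudaMemcpy(host, dev, n * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(dev);

    for (int i = 0; i < n; ++i)
    {
        printf("%d ", host[i]);   // expected: 0 1 2 ... 7 on a working setup
    }
    printf("\n");
    return 0;
}
```

If this builds and prints the index sequence, the toolchain is working; if it fails, continue with the project settings in the following steps.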

Step 11
Right-click the project -> Build Customizations.

In the dialog that appears, check CUDA 7.5(.targets, .props).

Step 12
Right-click the project -> Properties -> Configuration Properties -> VC++ Directories.
Add the following two Include Directories:
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v7.5\include
C:\ProgramData\NVIDIA Corporation\CUDA Samples\v7.5\common\inc
Then add the following two Library Directories:
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v7.5\lib\x64
C:\ProgramData\NVIDIA Corporation\CUDA Samples\v7.5\common\lib\x64


Step 13
Right-click the project -> Properties -> Configuration Properties -> Linker -> General -> Additional Library Directories.
Add the following directory:
$(CUDA_PATH_V7_5)\lib\$(Platform)


Step 14
Right-click the project -> Properties -> Configuration Properties -> Linker -> Input -> Additional Dependencies.
Add the following libraries:
cublas.lib
cublas_device.lib
cuda.lib
cudadevrt.lib
cudart.lib
cudart_static.lib
cufft.lib
cufftw.lib
curand.lib
cusparse.lib
nppc.lib
nppi.lib
npps.lib
nvblas.lib (do not add this library on 32-bit systems!)
nvcuvenc.lib (this library can be problematic; if the build fails because of it, simply remove it)
nvcuvid.lib
OpenCL.lib

Step 15
Right-click sample.cu -> Properties.
Set Item Type to CUDA C/C++.


Step 16
Open the Configuration Manager.
Under Platform, choose New.
Select the x64 platform.
With that, the environment is fully set up and the code can be run for testing!




The test code (NVIDIA's deviceQuery sample) is as follows:
    /*
     * Copyright 1993-2015 NVIDIA Corporation.  All rights reserved.
     *
     * Please refer to the NVIDIA end user license agreement (EULA) associated
     * with this source code for terms and conditions that govern your use of
     * this software. Any use, reproduction, disclosure, or distribution of
     * this software and related documentation outside the terms of the EULA
     * is strictly prohibited.
     *
     */
    /* This sample queries the properties of the CUDA devices present in the system via CUDA Runtime API. */


    // Shared Utilities (QA Testing)


    // std::system includes
    #include <cstdio>    // printf
    #include <string>    // std::string (used for the CSV summary below)
    #include <memory>
    #include <iostream>


    #include <cuda_runtime.h>
    #include <helper_cuda.h>  // from CUDA Samples common/inc (checkCudaErrors, _ConvertSMVer2Cores)






    int *pArgc = NULL;
    char **pArgv = NULL;


    #if CUDART_VERSION < 5000


    // CUDA-C includes
    #include <cuda.h>


    // This function wraps the CUDA Driver API into a template function
    template <class T>
    inline void getCudaAttribute(T *attribute, CUdevice_attribute device_attribute, int device)
    {
        CUresult error =    cuDeviceGetAttribute(attribute, device_attribute, device);


        if (CUDA_SUCCESS != error)
        {
            fprintf(stderr, "cuSafeCallNoSync() Driver API error = %04d from file <%s>, line %i.\n",
                    error, __FILE__, __LINE__);


            // cudaDeviceReset causes the driver to clean up all state. While
            // not mandatory in normal operation, it is good practice.  It is also
            // needed to ensure correct operation when the application is being
            // profiled. Calling cudaDeviceReset causes all profile data to be
            // flushed before the application exits
            cudaDeviceReset();
            exit(EXIT_FAILURE);
        }
    }


    #endif /* CUDART_VERSION < 5000 */


    ////////////////////////////////////////////////////////////////////////////////
    // Program main
    ////////////////////////////////////////////////////////////////////////////////
    int
    main(int argc, char **argv)
    {
        pArgc = &argc;
        pArgv = argv;


        printf("%s Starting...\n\n", argv[0]);
        printf(" CUDA Device Query (Runtime API) version (CUDART static linking)\n\n");


        int deviceCount = 0;
        cudaError_t error_id = cudaGetDeviceCount(&deviceCount);


        if (error_id != cudaSuccess)
        {
            printf("cudaGetDeviceCount returned %d\n-> %s\n", (int)error_id, cudaGetErrorString(error_id));
            printf("Result = FAIL\n");
            exit(EXIT_FAILURE);
        }


        // This function call returns 0 if there are no CUDA capable devices.
        if (deviceCount == 0)
        {
            printf("There are no available device(s) that support CUDA\n");
        }
        else
        {
            printf("Detected %d CUDA Capable device(s)\n", deviceCount);
        }


        int dev, driverVersion = 0, runtimeVersion = 0;


        for (dev = 0; dev < deviceCount; ++dev)
        {
            cudaSetDevice(dev);
            cudaDeviceProp deviceProp;
            cudaGetDeviceProperties(&deviceProp, dev);


            printf("\nDevice %d: \"%s\"\n", dev, deviceProp.name);


            // Console log
            cudaDriverGetVersion(&driverVersion);
            cudaRuntimeGetVersion(&runtimeVersion);
            printf("  CUDA Driver Version / Runtime Version          %d.%d / %d.%d\n", driverVersion/1000, (driverVersion%100)/10, runtimeVersion/1000, (runtimeVersion%100)/10);
            printf("  CUDA Capability Major/Minor version number:    %d.%d\n", deviceProp.major, deviceProp.minor);


            char msg[256];
            // SPRINTF is a portability macro defined in helper_string.h
            // (pulled in via helper_cuda.h); it maps to sprintf_s on Windows.
            SPRINTF(msg, "  Total amount of global memory:                 %.0f MBytes (%llu bytes)\n",
                    (float)deviceProp.totalGlobalMem/1048576.0f, (unsigned long long) deviceProp.totalGlobalMem);
            printf("%s", msg);


            printf("  (%2d) Multiprocessors, (%3d) CUDA Cores/MP:     %d CUDA Cores\n",
                   deviceProp.multiProcessorCount,
                   _ConvertSMVer2Cores(deviceProp.major, deviceProp.minor),
                   _ConvertSMVer2Cores(deviceProp.major, deviceProp.minor) * deviceProp.multiProcessorCount);
            printf("  GPU Max Clock rate:                            %.0f MHz (%0.2f GHz)\n", deviceProp.clockRate * 1e-3f, deviceProp.clockRate * 1e-6f);




    #if CUDART_VERSION >= 5000
            // This is supported in CUDA 5.0 (runtime API device properties)
            printf("  Memory Clock rate:                             %.0f Mhz\n", deviceProp.memoryClockRate * 1e-3f);
            printf("  Memory Bus Width:                              %d-bit\n",   deviceProp.memoryBusWidth);


            if (deviceProp.l2CacheSize)
            {
                printf("  L2 Cache Size:                                 %d bytes\n", deviceProp.l2CacheSize);
            }


    #else
            // This only available in CUDA 4.0-4.2 (but these were only exposed in the CUDA Driver API)
            int memoryClock;
            getCudaAttribute<int>(&memoryClock, CU_DEVICE_ATTRIBUTE_MEMORY_CLOCK_RATE, dev);
            printf("  Memory Clock rate:                             %.0f Mhz\n", memoryClock * 1e-3f);
            int memBusWidth;
            getCudaAttribute<int>(&memBusWidth, CU_DEVICE_ATTRIBUTE_GLOBAL_MEMORY_BUS_WIDTH, dev);
            printf("  Memory Bus Width:                              %d-bit\n", memBusWidth);
            int L2CacheSize;
            getCudaAttribute<int>(&L2CacheSize, CU_DEVICE_ATTRIBUTE_L2_CACHE_SIZE, dev);


            if (L2CacheSize)
            {
                printf("  L2 Cache Size:                                 %d bytes\n", L2CacheSize);
            }


    #endif


            printf("  Maximum Texture Dimension Size (x,y,z)         1D=(%d), 2D=(%d, %d), 3D=(%d, %d, %d)\n",
                   deviceProp.maxTexture1D   , deviceProp.maxTexture2D[0], deviceProp.maxTexture2D[1],
                   deviceProp.maxTexture3D[0], deviceProp.maxTexture3D[1], deviceProp.maxTexture3D[2]);
            printf("  Maximum Layered 1D Texture Size, (num) layers  1D=(%d), %d layers\n",
                   deviceProp.maxTexture1DLayered[0], deviceProp.maxTexture1DLayered[1]);
            printf("  Maximum Layered 2D Texture Size, (num) layers  2D=(%d, %d), %d layers\n",
                   deviceProp.maxTexture2DLayered[0], deviceProp.maxTexture2DLayered[1], deviceProp.maxTexture2DLayered[2]);




            printf("  Total amount of constant memory:               %lu bytes\n", deviceProp.totalConstMem);
            printf("  Total amount of shared memory per block:       %lu bytes\n", deviceProp.sharedMemPerBlock);
            printf("  Total number of registers available per block: %d\n", deviceProp.regsPerBlock);
            printf("  Warp size:                                     %d\n", deviceProp.warpSize);
            printf("  Maximum number of threads per multiprocessor:  %d\n", deviceProp.maxThreadsPerMultiProcessor);
            printf("  Maximum number of threads per block:           %d\n", deviceProp.maxThreadsPerBlock);
            printf("  Max dimension size of a thread block (x,y,z): (%d, %d, %d)\n",
                   deviceProp.maxThreadsDim[0],
                   deviceProp.maxThreadsDim[1],
                   deviceProp.maxThreadsDim[2]);
            printf("  Max dimension size of a grid size    (x,y,z): (%d, %d, %d)\n",
                   deviceProp.maxGridSize[0],
                   deviceProp.maxGridSize[1],
                   deviceProp.maxGridSize[2]);
            printf("  Maximum memory pitch:                          %lu bytes\n", deviceProp.memPitch);
            printf("  Texture alignment:                             %lu bytes\n", deviceProp.textureAlignment);
            printf("  Concurrent copy and kernel execution:          %s with %d copy engine(s)\n", (deviceProp.deviceOverlap ? "Yes" : "No"), deviceProp.asyncEngineCount);
            printf("  Run time limit on kernels:                     %s\n", deviceProp.kernelExecTimeoutEnabled ? "Yes" : "No");
            printf("  Integrated GPU sharing Host Memory:            %s\n", deviceProp.integrated ? "Yes" : "No");
            printf("  Support host page-locked memory mapping:       %s\n", deviceProp.canMapHostMemory ? "Yes" : "No");
            printf("  Alignment requirement for Surfaces:            %s\n", deviceProp.surfaceAlignment ? "Yes" : "No");
            printf("  Device has ECC support:                        %s\n", deviceProp.ECCEnabled ? "Enabled" : "Disabled");
    #if defined(WIN32) || defined(_WIN32) || defined(WIN64) || defined(_WIN64)
            printf("  CUDA Device Driver Mode (TCC or WDDM):         %s\n", deviceProp.tccDriver ? "TCC (Tesla Compute Cluster Driver)" : "WDDM (Windows Display Driver Model)");
    #endif
            printf("  Device supports Unified Addressing (UVA):      %s\n", deviceProp.unifiedAddressing ? "Yes" : "No");
            printf("  Device PCI Domain ID / Bus ID / location ID:   %d / %d / %d\n", deviceProp.pciDomainID, deviceProp.pciBusID, deviceProp.pciDeviceID);


            const char *sComputeMode[] =
            {
                "Default (multiple host threads can use ::cudaSetDevice() with device simultaneously)",
                "Exclusive (only one host thread in one process is able to use ::cudaSetDevice() with this device)",
                "Prohibited (no host thread can use ::cudaSetDevice() with this device)",
                "Exclusive Process (many threads in one process is able to use ::cudaSetDevice() with this device)",
                "Unknown",
                NULL
            };
            printf("  Compute Mode:\n");
            printf("     < %s >\n", sComputeMode[deviceProp.computeMode]);
        }


        // If there are 2 or more GPUs, query to determine whether RDMA is supported
        if (deviceCount >= 2)
        {
            cudaDeviceProp prop[64];
            int gpuid[64]; // we want to find the first two GPUs that can support P2P
            int gpu_p2p_count = 0;


            for (int i=0; i < deviceCount; i++)
            {
                checkCudaErrors(cudaGetDeviceProperties(&prop[i], i));


                // Only boards based on Fermi or later can support P2P
                if ((prop[i].major >= 2)
    #if defined(WIN32) || defined(_WIN32) || defined(WIN64) || defined(_WIN64)
                    // on Windows (64-bit), the Tesla Compute Cluster driver for windows must be enabled to support this
                    && prop[i].tccDriver
    #endif
                   )
                {
                    // This is an array of P2P capable GPUs
                    gpuid[gpu_p2p_count++] = i;
                }
            }


            // Show all the combinations of support P2P GPUs
            int can_access_peer;


            if (gpu_p2p_count >= 2)
            {
                for (int i = 0; i < gpu_p2p_count; i++)
                {
                    for (int j = 0; j < gpu_p2p_count; j++)
                    {
                        if (gpuid[i] == gpuid[j])
                        {
                            continue;
                        }
                        checkCudaErrors(cudaDeviceCanAccessPeer(&can_access_peer, gpuid[i], gpuid[j]));
                        printf("> Peer access from %s (GPU%d) -> %s (GPU%d) : %s\n", prop[gpuid[i]].name, gpuid[i],
                               prop[gpuid[j]].name, gpuid[j],
                               can_access_peer ? "Yes" : "No");
                    }
                }
            }
        }


        // csv masterlog info
        // *****************************
        // exe and CUDA driver name
        printf("\n");
        std::string sProfileString = "deviceQuery, CUDA Driver = CUDART";
        char cTemp[16];


        // driver version
        sProfileString += ", CUDA Driver Version = ";
    #if defined(WIN32) || defined(_WIN32) || defined(WIN64) || defined(_WIN64)
        sprintf_s(cTemp, 10, "%d.%d", driverVersion/1000, (driverVersion%100)/10);
    #else
        sprintf(cTemp, "%d.%d", driverVersion/1000, (driverVersion%100)/10);
    #endif
        sProfileString +=  cTemp;


        // Runtime version
        sProfileString += ", CUDA Runtime Version = ";
    #if defined(WIN32) || defined(_WIN32) || defined(WIN64) || defined(_WIN64)
        sprintf_s(cTemp, 10, "%d.%d", runtimeVersion/1000, (runtimeVersion%100)/10);
    #else
        sprintf(cTemp, "%d.%d", runtimeVersion/1000, (runtimeVersion%100)/10);
    #endif
        sProfileString +=  cTemp;


        // Device count
        sProfileString += ", NumDevs = ";
    #if defined(WIN32) || defined(_WIN32) || defined(WIN64) || defined(_WIN64)
        sprintf_s(cTemp, 10, "%d", deviceCount);
    #else
        sprintf(cTemp, "%d", deviceCount);
    #endif
        sProfileString += cTemp;


        // Print Out all device Names
        for (dev = 0; dev < deviceCount; ++dev)
        {
    #if defined(WIN32) || defined(_WIN32) || defined(WIN64) || defined(_WIN64)
            sprintf_s(cTemp, 13, ", Device%d = ", dev);
    #else
            sprintf(cTemp, ", Device%d = ", dev);
    #endif
            cudaDeviceProp deviceProp;
            cudaGetDeviceProperties(&deviceProp, dev);
            sProfileString += cTemp;
            sProfileString += deviceProp.name;
        }


        sProfileString += "\n";
        printf("%s", sProfileString.c_str());


        printf("Result = PASS\n");


        // finish
        // cudaDeviceReset causes the driver to clean up all state. While
        // not mandatory in normal operation, it is good practice.  It is also
        // needed to ensure correct operation when the application is being
        // profiled. Calling cudaDeviceReset causes all profile data to be
        // flushed before the application exits
        cudaDeviceReset();
        exit(EXIT_SUCCESS);
    }
The build may fail with the error:
LNK1123: failure during conversion to COFF: file invalid or corrupt
Solution:
Project Properties -> Manifest Tool -> Input and Output, and set Embed Manifest to No. (This error typically occurs when .NET Framework 4.5 has replaced VS2010's cvtres.exe; installing Visual Studio 2010 SP1 is another common fix.)


Build succeeded!



Reference

http://blog.csdn.net/listening5/article/details/50240147