Installing, Configuring, and Testing CUDA 7.5 on Windows 7 (VS2010)

WIN7 + CUDA7.5 + VS2010
Configuration:
OS: Windows 7, 64-bit
IDE: VS2010
GPU: NVIDIA GeForce GT 750M
CUDA version: 7.5
Step 1
Download the latest CUDA release (7.5 at the time of writing) from the official site:
https://developer.nvidia.com/cuda-downloads
Choose the local installer type.
 



Step 2
Run the installer. A dialog appears asking where to extract the temporary installer files. The default path is fine; the extracted files are deleted automatically once installation completes.
 



Step 3
The installer checks system compatibility. If no problems are found, the actual installation can begin.
 



Step 4

Accept the license agreement and continue; installation begins.


 
Step 5
Choose the custom installation and check all of its options.
 


Step 6

Set the three installation paths. The defaults can be used directly (custom paths should also work, though the defaults were used here). Then simply wait for the installation to finish.


 
Step 7
After installation, configure the environment variables. The installer automatically creates two variables, CUDA_PATH and CUDA_PATH_V7_5; the following still need to be added manually:
(1) CUDA_SDK_PATH = C:\ProgramData\NVIDIA Corporation\CUDA Samples\v7.5
(2) CUDA_LIB_PATH = %CUDA_PATH%\lib\x64
(3) CUDA_BIN_PATH = %CUDA_PATH%\bin
(4) CUDA_SDK_BIN_PATH = %CUDA_SDK_PATH%\bin\x64
(5) CUDA_SDK_LIB_PATH = %CUDA_SDK_PATH%\common\lib\x64
Finally, append the following to the end of the system PATH variable:
;%CUDA_LIB_PATH%;%CUDA_BIN_PATH%;%CUDA_SDK_LIB_PATH%;%CUDA_SDK_BIN_PATH%;
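
The same variables can also be set from an elevated command prompt instead of through the System Properties dialog. This is just a sketch, assuming the default install locations; note that `setx` stores the value expanded at the moment the command runs, which is why the SDK-relative entries below are written out as literal paths (a freshly created variable is not visible to `setx` in the same session):

```bat
:: Run from an elevated command prompt; setx /M writes system-wide variables.
setx /M CUDA_SDK_PATH "C:\ProgramData\NVIDIA Corporation\CUDA Samples\v7.5"
setx /M CUDA_LIB_PATH "%CUDA_PATH%\lib\x64"
setx /M CUDA_BIN_PATH "%CUDA_PATH%\bin"
setx /M CUDA_SDK_BIN_PATH "C:\ProgramData\NVIDIA Corporation\CUDA Samples\v7.5\bin\x64"
setx /M CUDA_SDK_LIB_PATH "C:\ProgramData\NVIDIA Corporation\CUDA Samples\v7.5\common\lib\x64"
```

Either way, a new command prompt (or a reboot, as in the next step) is needed before the values become visible.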


Step 8
Restart the computer so that the environment variables take effect.


Step 9
Open VS2010 and create a Win32 Console Application project.
In the additional options, check "Empty project".


Step 10

Right-click Source Files -> Add -> New Item.

Choose CUDA C/C++ File and click Add; the new source file, sample.cu (any name will do), appears under Source Files.
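
Before pasting in the full deviceQuery listing shown further below, a minimal sample.cu is handy for checking that nvcc is actually wired into the build. This is only an illustrative sketch; the kernel name and launch configuration are arbitrary:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Trivial kernel: each thread writes its global index into the output array.
__global__ void fillIndices(int *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
    {
        out[i] = i;
    }
}

int main()
{
    const int n = 8;
    int host[n] = {0};
    int *dev = NULL;

    cudaMalloc(&dev, n * sizeof(int));
    fillIndices<<<1, n>>>(dev, n);
    cudaMemcpy(host, dev, n * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(dev);

    for (int i = 0; i < n; ++i)
    {
        printf("%d ", host[i]);   // expected: 0 1 2 ... 7 on a working setup
    }
    printf("\n");
    return 0;
}
```

If this builds and prints the index sequence, the toolchain is working; if it fails, continue with the project settings in the following steps.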

Step 11
Right-click the project -> Build Customizations.

In the dialog that appears, check CUDA 7.5(.targets, .props).

Step 12
Right-click the project -> Properties -> Configuration Properties -> VC++ Directories.
Add the following two Include Directories:
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v7.5\include
C:\ProgramData\NVIDIA Corporation\CUDA Samples\v7.5\common\inc
Then add the following two Library Directories:
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v7.5\lib\x64
C:\ProgramData\NVIDIA Corporation\CUDA Samples\v7.5\common\lib\x64


Step 13
Right-click the project -> Properties -> Configuration Properties -> Linker -> General -> Additional Library Directories.
Add the following directory:
$(CUDA_PATH_V7_5)\lib\$(Platform)


Step 14
Right-click the project -> Properties -> Configuration Properties -> Linker -> Input -> Additional Dependencies.
Add the following libraries:
cublas.lib
cublas_device.lib
cuda.lib
cudadevrt.lib
cudart.lib
cudart_static.lib
cufft.lib
cufftw.lib
curand.lib
cusparse.lib
nppc.lib
nppi.lib
npps.lib
nvblas.lib (do not add this library on 32-bit systems!)
nvcuvenc.lib (this library can be problematic; if the build fails because of it, simply remove it)
nvcuvid.lib
OpenCL.lib

Step 15
Right-click sample.cu -> Properties.
Set Item Type to CUDA C/C++.


Step 16
Open the Configuration Manager.
Under Platform, choose New.
Select the x64 platform.
With that, the environment is fully set up and the code can be run for testing!




The test code (NVIDIA's deviceQuery sample) is as follows:
    /*
     * Copyright 1993-2015 NVIDIA Corporation.  All rights reserved.
     *
     * Please refer to the NVIDIA end user license agreement (EULA) associated
     * with this source code for terms and conditions that govern your use of
     * this software. Any use, reproduction, disclosure, or distribution of
     * this software and related documentation outside the terms of the EULA
     * is strictly prohibited.
     *
     */
    /* This sample queries the properties of the CUDA devices present in the system via CUDA Runtime API. */


    // Shared Utilities (QA Testing)


    // std::system includes
    #include <cstdio>    // printf
    #include <string>    // std::string (used for the CSV summary below)
    #include <memory>
    #include <iostream>


    #include <cuda_runtime.h>
    #include <helper_cuda.h>  // from CUDA Samples common/inc (checkCudaErrors, _ConvertSMVer2Cores)






    int *pArgc = NULL;
    char **pArgv = NULL;


    #if CUDART_VERSION < 5000


    // CUDA-C includes
    #include <cuda.h>


    // This function wraps the CUDA Driver API into a template function
    template <class T>
    inline void getCudaAttribute(T *attribute, CUdevice_attribute device_attribute, int device)
    {
        CUresult error =    cuDeviceGetAttribute(attribute, device_attribute, device);


        if (CUDA_SUCCESS != error)
        {
            fprintf(stderr, "cuSafeCallNoSync() Driver API error = %04d from file <%s>, line %i.\n",
                    error, __FILE__, __LINE__);


            // cudaDeviceReset causes the driver to clean up all state. While
            // not mandatory in normal operation, it is good practice.  It is also
            // needed to ensure correct operation when the application is being
            // profiled. Calling cudaDeviceReset causes all profile data to be
            // flushed before the application exits
            cudaDeviceReset();
            exit(EXIT_FAILURE);
        }
    }


    #endif /* CUDART_VERSION < 5000 */


    ////////////////////////////////////////////////////////////////////////////////
    // Program main
    ////////////////////////////////////////////////////////////////////////////////
    int
    main(int argc, char **argv)
    {
        pArgc = &argc;
        pArgv = argv;


        printf("%s Starting...\n\n", argv[0]);
        printf(" CUDA Device Query (Runtime API) version (CUDART static linking)\n\n");


        int deviceCount = 0;
        cudaError_t error_id = cudaGetDeviceCount(&deviceCount);


        if (error_id != cudaSuccess)
        {
            printf("cudaGetDeviceCount returned %d\n-> %s\n", (int)error_id, cudaGetErrorString(error_id));
            printf("Result = FAIL\n");
            exit(EXIT_FAILURE);
        }


        // This function call returns 0 if there are no CUDA capable devices.
        if (deviceCount == 0)
        {
            printf("There are no available device(s) that support CUDA\n");
        }
        else
        {
            printf("Detected %d CUDA Capable device(s)\n", deviceCount);
        }


        int dev, driverVersion = 0, runtimeVersion = 0;


        for (dev = 0; dev < deviceCount; ++dev)
        {
            cudaSetDevice(dev);
            cudaDeviceProp deviceProp;
            cudaGetDeviceProperties(&deviceProp, dev);


            printf("\nDevice %d: \"%s\"\n", dev, deviceProp.name);


            // Console log
            cudaDriverGetVersion(&driverVersion);
            cudaRuntimeGetVersion(&runtimeVersion);
            printf("  CUDA Driver Version / Runtime Version          %d.%d / %d.%d\n", driverVersion/1000, (driverVersion%100)/10, runtimeVersion/1000, (runtimeVersion%100)/10);
            printf("  CUDA Capability Major/Minor version number:    %d.%d\n", deviceProp.major, deviceProp.minor);


            char msg[256];
            // SPRINTF is a portability macro defined in helper_string.h
            // (pulled in via helper_cuda.h); it maps to sprintf_s on Windows.
            SPRINTF(msg, "  Total amount of global memory:                 %.0f MBytes (%llu bytes)\n",
                    (float)deviceProp.totalGlobalMem/1048576.0f, (unsigned long long) deviceProp.totalGlobalMem);
            printf("%s", msg);


            printf("  (%2d) Multiprocessors, (%3d) CUDA Cores/MP:     %d CUDA Cores\n",
                   deviceProp.multiProcessorCount,
                   _ConvertSMVer2Cores(deviceProp.major, deviceProp.minor),
                   _ConvertSMVer2Cores(deviceProp.major, deviceProp.minor) * deviceProp.multiProcessorCount);
            printf("  GPU Max Clock rate:                            %.0f MHz (%0.2f GHz)\n", deviceProp.clockRate * 1e-3f, deviceProp.clockRate * 1e-6f);




    #if CUDART_VERSION >= 5000
            // This is supported in CUDA 5.0 (runtime API device properties)
            printf("  Memory Clock rate:                             %.0f Mhz\n", deviceProp.memoryClockRate * 1e-3f);
            printf("  Memory Bus Width:                              %d-bit\n",   deviceProp.memoryBusWidth);


            if (deviceProp.l2CacheSize)
            {
                printf("  L2 Cache Size:                                 %d bytes\n", deviceProp.l2CacheSize);
            }


    #else
            // This only available in CUDA 4.0-4.2 (but these were only exposed in the CUDA Driver API)
            int memoryClock;
            getCudaAttribute<int>(&memoryClock, CU_DEVICE_ATTRIBUTE_MEMORY_CLOCK_RATE, dev);
            printf("  Memory Clock rate:                             %.0f Mhz\n", memoryClock * 1e-3f);
            int memBusWidth;
            getCudaAttribute<int>(&memBusWidth, CU_DEVICE_ATTRIBUTE_GLOBAL_MEMORY_BUS_WIDTH, dev);
            printf("  Memory Bus Width:                              %d-bit\n", memBusWidth);
            int L2CacheSize;
            getCudaAttribute<int>(&L2CacheSize, CU_DEVICE_ATTRIBUTE_L2_CACHE_SIZE, dev);


            if (L2CacheSize)
            {
                printf("  L2 Cache Size:                                 %d bytes\n", L2CacheSize);
            }


    #endif


            printf("  Maximum Texture Dimension Size (x,y,z)         1D=(%d), 2D=(%d, %d), 3D=(%d, %d, %d)\n",
                   deviceProp.maxTexture1D   , deviceProp.maxTexture2D[0], deviceProp.maxTexture2D[1],
                   deviceProp.maxTexture3D[0], deviceProp.maxTexture3D[1], deviceProp.maxTexture3D[2]);
            printf("  Maximum Layered 1D Texture Size, (num) layers  1D=(%d), %d layers\n",
                   deviceProp.maxTexture1DLayered[0], deviceProp.maxTexture1DLayered[1]);
            printf("  Maximum Layered 2D Texture Size, (num) layers  2D=(%d, %d), %d layers\n",
                   deviceProp.maxTexture2DLayered[0], deviceProp.maxTexture2DLayered[1], deviceProp.maxTexture2DLayered[2]);




            printf("  Total amount of constant memory:               %lu bytes\n", deviceProp.totalConstMem);
            printf("  Total amount of shared memory per block:       %lu bytes\n", deviceProp.sharedMemPerBlock);
            printf("  Total number of registers available per block: %d\n", deviceProp.regsPerBlock);
            printf("  Warp size:                                     %d\n", deviceProp.warpSize);
            printf("  Maximum number of threads per multiprocessor:  %d\n", deviceProp.maxThreadsPerMultiProcessor);
            printf("  Maximum number of threads per block:           %d\n", deviceProp.maxThreadsPerBlock);
            printf("  Max dimension size of a thread block (x,y,z): (%d, %d, %d)\n",
                   deviceProp.maxThreadsDim[0],
                   deviceProp.maxThreadsDim[1],
                   deviceProp.maxThreadsDim[2]);
            printf("  Max dimension size of a grid size    (x,y,z): (%d, %d, %d)\n",
                   deviceProp.maxGridSize[0],
                   deviceProp.maxGridSize[1],
                   deviceProp.maxGridSize[2]);
            printf("  Maximum memory pitch:                          %lu bytes\n", deviceProp.memPitch);
            printf("  Texture alignment:                             %lu bytes\n", deviceProp.textureAlignment);
            printf("  Concurrent copy and kernel execution:          %s with %d copy engine(s)\n", (deviceProp.deviceOverlap ? "Yes" : "No"), deviceProp.asyncEngineCount);
            printf("  Run time limit on kernels:                     %s\n", deviceProp.kernelExecTimeoutEnabled ? "Yes" : "No");
            printf("  Integrated GPU sharing Host Memory:            %s\n", deviceProp.integrated ? "Yes" : "No");
            printf("  Support host page-locked memory mapping:       %s\n", deviceProp.canMapHostMemory ? "Yes" : "No");
            printf("  Alignment requirement for Surfaces:            %s\n", deviceProp.surfaceAlignment ? "Yes" : "No");
            printf("  Device has ECC support:                        %s\n", deviceProp.ECCEnabled ? "Enabled" : "Disabled");
    #if defined(WIN32) || defined(_WIN32) || defined(WIN64) || defined(_WIN64)
            printf("  CUDA Device Driver Mode (TCC or WDDM):         %s\n", deviceProp.tccDriver ? "TCC (Tesla Compute Cluster Driver)" : "WDDM (Windows Display Driver Model)");
    #endif
            printf("  Device supports Unified Addressing (UVA):      %s\n", deviceProp.unifiedAddressing ? "Yes" : "No");
            printf("  Device PCI Domain ID / Bus ID / location ID:   %d / %d / %d\n", deviceProp.pciDomainID, deviceProp.pciBusID, deviceProp.pciDeviceID);


            const char *sComputeMode[] =
            {
                "Default (multiple host threads can use ::cudaSetDevice() with device simultaneously)",
                "Exclusive (only one host thread in one process is able to use ::cudaSetDevice() with this device)",
                "Prohibited (no host thread can use ::cudaSetDevice() with this device)",
                "Exclusive Process (many threads in one process is able to use ::cudaSetDevice() with this device)",
                "Unknown",
                NULL
            };
            printf("  Compute Mode:\n");
            printf("     < %s >\n", sComputeMode[deviceProp.computeMode]);
        }


        // If there are 2 or more GPUs, query to determine whether RDMA is supported
        if (deviceCount >= 2)
        {
            cudaDeviceProp prop[64];
            int gpuid[64]; // we want to find the first two GPUs that can support P2P
            int gpu_p2p_count = 0;


            for (int i=0; i < deviceCount; i++)
            {
                checkCudaErrors(cudaGetDeviceProperties(&prop[i], i));


                // Only boards based on Fermi or later can support P2P
                if ((prop[i].major >= 2)
    #if defined(WIN32) || defined(_WIN32) || defined(WIN64) || defined(_WIN64)
                    // on Windows (64-bit), the Tesla Compute Cluster driver for windows must be enabled to support this
                    && prop[i].tccDriver
    #endif
                   )
                {
                    // This is an array of P2P capable GPUs
                    gpuid[gpu_p2p_count++] = i;
                }
            }


            // Show all the combinations of support P2P GPUs
            int can_access_peer;


            if (gpu_p2p_count >= 2)
            {
                for (int i = 0; i < gpu_p2p_count; i++)
                {
                    for (int j = 0; j < gpu_p2p_count; j++)
                    {
                        if (gpuid[i] == gpuid[j])
                        {
                            continue;
                        }
                        checkCudaErrors(cudaDeviceCanAccessPeer(&can_access_peer, gpuid[i], gpuid[j]));
                        printf("> Peer access from %s (GPU%d) -> %s (GPU%d) : %s\n", prop[gpuid[i]].name, gpuid[i],
                               prop[gpuid[j]].name, gpuid[j],
                               can_access_peer ? "Yes" : "No");
                    }
                }
            }
        }


        // csv masterlog info
        // *****************************
        // exe and CUDA driver name
        printf("\n");
        std::string sProfileString = "deviceQuery, CUDA Driver = CUDART";
        char cTemp[16];


        // driver version
        sProfileString += ", CUDA Driver Version = ";
    #if defined(WIN32) || defined(_WIN32) || defined(WIN64) || defined(_WIN64)
        sprintf_s(cTemp, 10, "%d.%d", driverVersion/1000, (driverVersion%100)/10);
    #else
        sprintf(cTemp, "%d.%d", driverVersion/1000, (driverVersion%100)/10);
    #endif
        sProfileString +=  cTemp;


        // Runtime version
        sProfileString += ", CUDA Runtime Version = ";
    #if defined(WIN32) || defined(_WIN32) || defined(WIN64) || defined(_WIN64)
        sprintf_s(cTemp, 10, "%d.%d", runtimeVersion/1000, (runtimeVersion%100)/10);
    #else
        sprintf(cTemp, "%d.%d", runtimeVersion/1000, (runtimeVersion%100)/10);
    #endif
        sProfileString +=  cTemp;


        // Device count
        sProfileString += ", NumDevs = ";
    #if defined(WIN32) || defined(_WIN32) || defined(WIN64) || defined(_WIN64)
        sprintf_s(cTemp, 10, "%d", deviceCount);
    #else
        sprintf(cTemp, "%d", deviceCount);
    #endif
        sProfileString += cTemp;


        // Print Out all device Names
        for (dev = 0; dev < deviceCount; ++dev)
        {
    #if defined(WIN32) || defined(_WIN32) || defined(WIN64) || defined(_WIN64)
            sprintf_s(cTemp, 13, ", Device%d = ", dev);
    #else
            sprintf(cTemp, ", Device%d = ", dev);
    #endif
            cudaDeviceProp deviceProp;
            cudaGetDeviceProperties(&deviceProp, dev);
            sProfileString += cTemp;
            sProfileString += deviceProp.name;
        }


        sProfileString += "\n";
        printf("%s", sProfileString.c_str());


        printf("Result = PASS\n");


        // finish
        // cudaDeviceReset causes the driver to clean up all state. While
        // not mandatory in normal operation, it is good practice.  It is also
        // needed to ensure correct operation when the application is being
        // profiled. Calling cudaDeviceReset causes all profile data to be
        // flushed before the application exits
        cudaDeviceReset();
        exit(EXIT_SUCCESS);
    }
The build may fail with the error:
LNK1123: failure during conversion to COFF: file invalid or corrupt
Solution:
Project Properties -> Manifest Tool -> Input and Output, and set Embed Manifest to No. (This error typically occurs when .NET Framework 4.5 has replaced VS2010's cvtres.exe; installing Visual Studio 2010 SP1 is another common fix.)


Build succeeded!



Reference

http://blog.csdn.net/listening5/article/details/50240147