I struggled a lot to enable the GPU on my 32 GB Windows 10 machine with a 4 GB Nvidia P100 GPU for Python programming. My LLMs did not use the machine's GPU during inference. After spending a few days on this, I thought I would summarize the step-by-step approach that worked for me:
- Install the C++ build tools. I did this via the Visual Studio 2022 Installer by selecting the "Desktop development with C++" workload, checking the option "Windows 10 SDK (10.0.20348.0)" as shown in this image (https://i.stack.imgur.com/vLDy7.png), and then installing the selected packages.
- Download and install the Nvidia CUDA Toolkit (https://developer.nvidia.com/cuda-downloads)
- Ensure that the CUDA_PATH variable is set in your environment variables
- In the Visual Studio Code terminal (PowerShell), set the following environment variables:
$env:CMAKE_ARGS="-DLLAMA_CUBLAS=on"
$env:CUDACXX="C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.2\bin\nvcc.exe"
- Finally, in the same terminal, run:
pip install llama-cpp-python --no-cache-dir --force-reinstall --upgrade
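If you prefer to script these steps instead of typing them into the terminal, here is a minimal Python sketch of the same idea. It looks for nvcc under CUDA_PATH (falling back to PATH), then assembles the pip command and the build environment. The function names `find_nvcc` and `build_install` are my own, not part of any library, and the `-DLLAMA_CUBLAS=on` value is simply the one that worked for me; newer llama-cpp-python releases may expect a different CMake option name, so check the project's README if the build seems to ignore it.

```python
import os
import shutil
import sys
from pathlib import Path


def find_nvcc(env=None):
    """Return the path to nvcc, or None if it cannot be located."""
    env = os.environ if env is None else env
    cuda_path = env.get("CUDA_PATH")
    if cuda_path:
        exe = "nvcc.exe" if os.name == "nt" else "nvcc"
        candidate = Path(cuda_path) / "bin" / exe
        if candidate.is_file():
            return str(candidate)
    # Fall back to searching the PATH from the given environment.
    return shutil.which("nvcc", path=env.get("PATH"))


def build_install(env=None):
    """Return (command, environment) for a CUDA-enabled rebuild (hypothetical helper)."""
    build_env = dict(os.environ if env is None else env)
    # Same setting as $env:CMAKE_ARGS above (assumed to match your version).
    build_env["CMAKE_ARGS"] = "-DLLAMA_CUBLAS=on"
    nvcc = find_nvcc(build_env)
    if nvcc:
        # Same role as $env:CUDACXX above.
        build_env["CUDACXX"] = nvcc
    command = [
        sys.executable, "-m", "pip", "install", "llama-cpp-python",
        "--no-cache-dir", "--force-reinstall", "--upgrade",
    ]
    return command, build_env
```

To actually run the install, pass the result to `subprocess.run(command, env=build_env, check=True)`. If `find_nvcc()` returns None, CUDA_PATH is probably not set or the toolkit is not installed where expected.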
Then, when you run your Python program, the system info printed while the model loads should show BLAS = 1:
(https://i.stack.imgur.com/iKIkV.png)
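To check this without reading the log by eye, here is a small sketch. `blas_enabled` is a hypothetical helper of mine that parses a system info line pasted from the log, and `load_on_gpu` loads a model through llama-cpp-python's `Llama` class with `verbose=True` so the line gets printed. The `n_gpu_layers=-1` argument (offload all layers) is my assumption about what you want; it is not something covered in the steps above, and the exact log format may differ between versions.

```python
def blas_enabled(system_info: str) -> bool:
    """Return True if a llama.cpp system info line reports BLAS = 1."""
    for field in system_info.split("|"):
        name, _, value = field.partition("=")
        if name.strip() == "BLAS":
            return value.strip() == "1"
    return False


def load_on_gpu(model_path: str, n_gpu_layers: int = -1):
    """Load a model with layers offloaded to the GPU (needs llama-cpp-python)."""
    # Imported lazily so blas_enabled() works even before the package is installed.
    from llama_cpp import Llama

    # verbose=True makes llama.cpp print its system info during loading.
    return Llama(model_path=model_path, n_gpu_layers=n_gpu_layers, verbose=True)
```

For example, call `load_on_gpu(...)` with the path to your own model file, copy the system info line from the output, and pass it to `blas_enabled(...)`.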
Hope it helps the community too!!!