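Preface#

The environment used in this article is as follows:

GPU: RTX 5060 Laptop
CPU: AMD Ryzen 9 8945HX
System: Windows 11

Setting up this environment is complicated, documentation is scarce, and the usual approaches don't work well with the new RTX 50-series cards. I therefore worked out the general-purpose approach in this article, which fully resolved the problem in my testing, and offer it here for reference.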

Cause of the Error and How to Reproduce It#

We train with the YOLO models from Ultralytics. The test project is laid out as follows:

yolo/
│
├─ .venv/                # virtual environment
├─ dataset/              # dataset
│   ├─ images/
│   │   ├─ train/
│   │   └─ val/
│   └─ labels/
│       ├─ train/
│       └─ val/
│
├─ models/               # trained models are saved here
│
├─ configs/
│   └─ data.yaml         # dataset configuration
│
├─ train.py              # training script
├─ detect.py             # inference script
├─ .gitignore
└─ README.md
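
Before training, it can help to confirm that the dataset folders exist where this layout expects them. The following is a minimal sketch and not part of the original project: the helper name `missing_dataset_dirs` is hypothetical, and it assumes exactly the `dataset/images/{train,val}` and `dataset/labels/{train,val}` structure shown in the tree.

```python
from pathlib import Path

# Sub-directories expected under dataset/, taken from the project tree.
EXPECTED_DIRS = (
    "images/train",
    "images/val",
    "labels/train",
    "labels/val",
)


def missing_dataset_dirs(dataset_root):
    """Return the expected sub-directories that do not exist under dataset_root."""
    root = Path(dataset_root)
    return [rel for rel in EXPECTED_DIRS if not (root / rel).is_dir()]


if __name__ == "__main__":
    missing = missing_dataset_dirs("dataset")
    if missing:
        print("Missing directories:", ", ".join(missing))
    else:
        print("Dataset layout looks complete.")
```

Run it from the project root; an empty result means all four folders are present.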

Training script:

# ==============================================================
# File: train
# Author: Frees Ling
# Created: 2026/3/8
# Description:
# Version: 1.0
# ==============================================================
# from ultralytics import YOLO
# import torch
#
# def main():
#     # Check the GPU
#     print("CUDA available:", torch.cuda.is_available())
#     if torch.cuda.is_available():
#         print("GPU:", torch.cuda.get_device_name(0))
#
#     # Load a pretrained model (starting with a small one is recommended)
#     model = YOLO("yolov8n.pt")
#
#     # Start training
#     results = model.train(
#         data="data.yaml",      # dataset configuration file
#         epochs=50,             # number of training epochs
#         imgsz=640,             # image size
#         batch=16,              # batch size
#         device=0,              # use the GPU (0 is the first GPU)
#         workers=8,             # data-loading workers
#         project="runs/train",  # output directory
#         name="yolo_custom",    # name of this training run
#         cache=True,            # cache the dataset
#         amp=True               # mixed-precision training (faster on the GPU)
#     )
#
# if __name__ == "__main__":
#     main()
from ultralytics import YOLO

model = YOLO("yolov8n.pt")

results = model.train(
    data="data.yaml",
    epochs=10,
    imgsz=640,
    batch=16,
    device=0,
    workers=12,
    project="runs/train",
    name="test",
    cache=True,
    amp=True
    # batch = 24
    # workers = 12
    # imgsz = 640
    # cache = True
    # amp = True
)

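Most tutorials install the environment like this:

pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu124

and verify it like this:

import torch

print(torch.__version__)
print(torch.cuda.is_available())
print(torch.cuda.get_device_name(0))

If the output looks like this, the setup is supposedly complete:

True
NVIDIA GeForce RTX 5060 Laptop GPU

Our output really did look like this. But this check is only sufficient for older cards, not the new 50-series: torch.cuda.is_available() can return True even when the installed build has no kernels for the GPU, so training fails with the following error: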

C:\Users\lenovo\Desktop\YOLO\.venv\Scripts\python.exe C:\Users\lenovo\Desktop\YOLO\train.py
WARNING torchvision==0.20 is incompatible with torch==2.6.
Run 'pip install torchvision==0.21' to fix torchvision or 'pip install -U torch torchvision' to update both.
For a full compatibility table see https://github.com/pytorch/vision#installation
C:\Users\lenovo\Desktop\YOLO\.venv\lib\site-packages\torch\cuda\__init__.py:235: UserWarning:
NVIDIA GeForce RTX 5060 Laptop GPU with CUDA capability sm_120 is not compatible with the current PyTorch installation.
The current PyTorch install supports CUDA capabilities sm_50 sm_60 sm_61 sm_70 sm_75 sm_80 sm_86 sm_90.
If you want to use the NVIDIA GeForce RTX 5060 Laptop GPU GPU with PyTorch, please check the instructions at https://pytorch.org/get-started/locally/
  warnings.warn(
Ultralytics 8.4.21 Python-3.10.11 torch-2.6.0.dev20241112+cu121 CUDA:0 (NVIDIA GeForce RTX 5060 Laptop GPU, 8151MiB)
engine\trainer: agnostic_nms=False, amp=True, angle=1.0, augment=False, auto_augment=randaugment, batch=16, bgr=0.0, box=7.5, cache=True, cfg=None, classes=None, close_mosaic=10, cls=0.5, compile=False, conf=None, copy_paste=0.0, copy_paste_mode=flip, cos_lr=False, cutmix=0.0, data=data.yaml, degrees=0.0, deterministic=True, device=0, dfl=1.5, dnn=False, dropout=0.0, dynamic=False, embed=None, end2end=None, epochs=10, erasing=0.4, exist_ok=False, fliplr=0.5, flipud=0.0, format=torchscript, fraction=1.0, freeze=None, half=False, hsv_h=0.015, hsv_s=0.7, hsv_v=0.4, imgsz=640, int8=False, iou=0.7, keras=False, kobj=1.0, line_width=None, lr0=0.01, lrf=0.01, mask_ratio=4, max_det=300, mixup=0.0, mode=train, model=yolov8n.pt, momentum=0.937, mosaic=1.0, multi_scale=0.0, name=test, nbs=64, nms=False, opset=None, optimize=False, optimizer=auto, overlap_mask=True, patience=100, perspective=0.0, plots=True, pose=12.0, pretrained=True, profile=False, project=runs/train, rect=False, resume=False, retina_masks=False, rle=1.0, save=True, save_conf=False, save_crop=False, save_dir=C:\Users\lenovo\Desktop\YOLO\runs\detect\runs\train\test, save_frames=False, save_json=False, save_period=-1, save_txt=False, scale=0.5, seed=0, shear=0.0, show=False, show_boxes=True, show_conf=True, show_labels=True, simplify=True, single_cls=False, source=None, split=val, stream_buffer=False, task=detect, time=None, tracker=botsort.yaml, translate=0.1, val=True, verbose=True, vid_stride=1, visualize=False, warmup_bias_lr=0.1, warmup_epochs=3.0, warmup_momentum=0.8, weight_decay=0.0005, workers=12, workspace=None
Overriding model.yaml nc=80 with nc=1

                   from  n    params  module                                       arguments
  0                  -1  1       464  ultralytics.nn.modules.conv.Conv             [3, 16, 3, 2]
  1                  -1  1      4672  ultralytics.nn.modules.conv.Conv             [16, 32, 3, 2]
  2                  -1  1      7360  ultralytics.nn.modules.block.C2f             [32, 32, 1, True]
  3                  -1  1     18560  ultralytics.nn.modules.conv.Conv             [32, 64, 3, 2]
  4                  -1  2     49664  ultralytics.nn.modules.block.C2f             [64, 64, 2, True]
  5                  -1  1     73984  ultralytics.nn.modules.conv.Conv             [64, 128, 3, 2]
  6                  -1  2    197632  ultralytics.nn.modules.block.C2f             [128, 128, 2, True]
  7                  -1  1    295424  ultralytics.nn.modules.conv.Conv             [128, 256, 3, 2]
  8                  -1  1    460288  ultralytics.nn.modules.block.C2f             [256, 256, 1, True]
  9                  -1  1    164608  ultralytics.nn.modules.block.SPPF            [256, 256, 5]
 10                  -1  1         0  torch.nn.modules.upsampling.Upsample         [None, 2, 'nearest']
 11             [-1, 6]  1         0  ultralytics.nn.modules.conv.Concat           [1]
 12                  -1  1    148224  ultralytics.nn.modules.block.C2f             [384, 128, 1]
 13                  -1  1         0  torch.nn.modules.upsampling.Upsample         [None, 2, 'nearest']
 14             [-1, 4]  1         0  ultralytics.nn.modules.conv.Concat           [1]
 15                  -1  1     37248  ultralytics.nn.modules.block.C2f             [192, 64, 1]
 16                  -1  1     36992  ultralytics.nn.modules.conv.Conv             [64, 64, 3, 2]
 17            [-1, 12]  1         0  ultralytics.nn.modules.conv.Concat           [1]
 18                  -1  1    123648  ultralytics.nn.modules.block.C2f             [192, 128, 1]
 19                  -1  1    147712  ultralytics.nn.modules.conv.Conv             [128, 128, 3, 2]
 20             [-1, 9]  1         0  ultralytics.nn.modules.conv.Concat           [1]
 21                  -1  1    493056  ultralytics.nn.modules.block.C2f             [384, 256, 1]
 22        [15, 18, 21]  1    751507  ultralytics.nn.modules.head.Detect           [1, 16, None, [64, 128, 256]]
Model summary: 130 layers, 3,011,043 parameters, 3,011,027 gradients, 8.2 GFLOPs

Transferred 319/355 items from pretrained weights
Traceback (most recent call last):
  File "C:\Users\lenovo\Desktop\YOLO\train.py", line 40, in <module>
    results = model.train(
  File "C:\Users\lenovo\Desktop\YOLO\.venv\lib\site-packages\ultralytics\engine\model.py", line 777, in train
    self.trainer.train()
  File "C:\Users\lenovo\Desktop\YOLO\.venv\lib\site-packages\ultralytics\engine\trainer.py", line 244, in train
    self._do_train()
  File "C:\Users\lenovo\Desktop\YOLO\.venv\lib\site-packages\ultralytics\engine\trainer.py", line 366, in _do_train
    self._setup_train()
  File "C:\Users\lenovo\Desktop\YOLO\.venv\lib\site-packages\ultralytics\engine\trainer.py", line 295, in _setup_train
    self.model = self.model.to(self.device)
  File "C:\Users\lenovo\Desktop\YOLO\.venv\lib\site-packages\torch\nn\modules\module.py", line 1344, in to
    return self._apply(convert)
  File "C:\Users\lenovo\Desktop\YOLO\.venv\lib\site-packages\ultralytics\nn\tasks.py", line 288, in _apply
    self = super()._apply(fn)
  File "C:\Users\lenovo\Desktop\YOLO\.venv\lib\site-packages\torch\nn\modules\module.py", line 904, in _apply
    module._apply(fn)
  File "C:\Users\lenovo\Desktop\YOLO\.venv\lib\site-packages\torch\nn\modules\module.py", line 904, in _apply
    module._apply(fn)
  File "C:\Users\lenovo\Desktop\YOLO\.venv\lib\site-packages\torch\nn\modules\module.py", line 904, in _apply
    module._apply(fn)
  File "C:\Users\lenovo\Desktop\YOLO\.venv\lib\site-packages\torch\nn\modules\module.py", line 931, in _apply
    param_applied = fn(param)
  File "C:\Users\lenovo\Desktop\YOLO\.venv\lib\site-packages\torch\nn\modules\module.py", line 1330, in convert
    return t.to(
RuntimeError: CUDA error: no kernel image is available for execution on the device
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.

Process finished with exit code 1

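The error is explicit: the GPU is too new for the installed PyTorch build. That build was compiled only for architectures up to sm_90, so it contains no kernels for this card's sm_120, and the CUDA and PyTorch versions do not match the hardware.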
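
The mismatch can be detected before training by comparing the GPU's compute capability with the architectures the installed build was compiled for. The following is a minimal sketch under stated assumptions: the helper names `capability_to_arch` and `build_supports_gpu` are hypothetical, and the check only looks for an exact `sm_XY` match, ignoring any PTX forward compatibility. `torch.cuda.get_device_capability` and `torch.cuda.get_arch_list` are existing PyTorch calls.

```python
def capability_to_arch(capability):
    """Convert a (major, minor) compute capability into an 'sm_XY' arch name."""
    major, minor = capability
    return f"sm_{major}{minor}"


def build_supports_gpu(capability, arch_list):
    """Return True if the GPU's exact architecture is in the build's arch list.

    Simplification: this ignores PTX forward compatibility and only looks
    for an exact 'sm_XY' entry.
    """
    return capability_to_arch(capability) in arch_list


def main():
    try:
        import torch
    except ImportError:
        print("PyTorch is not installed in this environment.")
        return
    if not torch.cuda.is_available():
        print("CUDA is not available to this PyTorch build.")
        return

    capability = torch.cuda.get_device_capability(0)  # (12, 0) on an sm_120 card
    arch_list = torch.cuda.get_arch_list()            # architectures compiled into the build
    print("GPU architecture:", capability_to_arch(capability))
    print("Build was compiled for:", arch_list)
    print("Compatible:", build_supports_gpu(capability, arch_list))


if __name__ == "__main__":
    main()
```

With the arch list from the warning above (sm_50 through sm_90), the check returns False for a capability of (12, 0), which is the situation the traceback reports.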

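Interestingly, I could not find a solution to this problem anywhere.

After two days of struggling, I started to wonder why the problem seemed unsolvable, to the point that no AI assistant could give me a working GPU training setup. Nearly all of them told me to train on the CPU instead.

On the CPU, one epoch over a dataset of roughly 100,000 images takes at least five hours, sometimes longer, so just 50 epochs would take more than ten days (50 × 5 h = 250 h). Finding a way to train on the GPU was therefore urgent.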
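
To put the two options side by side, the arithmetic can be sketched as below. This is a rough estimate under stated assumptions: the CPU figure is the "at least five hours per epoch" quoted above, the GPU figure is the 25:10 first-epoch time from the successful run shown at the end of this article (which was captured on an RTX 5070 Ti), and validation and startup time are ignored. The function names are hypothetical.

```python
def total_hours(hours_per_epoch, epochs):
    """Total wall-clock hours for a run, ignoring validation and startup time."""
    return hours_per_epoch * epochs


def mmss_to_hours(mmss):
    """Convert a 'MM:SS' duration string (as shown in the training log) to hours."""
    minutes, seconds = (int(part) for part in mmss.split(":"))
    return (minutes * 60 + seconds) / 3600


if __name__ == "__main__":
    epochs = 50
    cpu_hours = total_hours(5.0, epochs)                     # CPU: at least 5 h per epoch
    gpu_hours = total_hours(mmss_to_hours("25:10"), epochs)  # GPU: 25:10 per epoch

    print(f"CPU: {cpu_hours:.0f} h = {cpu_hours / 24:.1f} days")
    print(f"GPU: {gpu_hours:.1f} h")
    print(f"Speed-up: about {cpu_hours / gpu_hours:.0f}x")
```

Under these assumptions the 50-epoch run drops from about 250 hours to roughly 21 hours.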

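An aside: the problem was perfectly clear, yet no effective solution was offered. That points to a weakness of today's mainstream AI assistants: they do not think independently. Then again, it is a little frightening to consider. If they really could think for themselves, what would the future look like?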

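Solution#

Important: this solution currently works for RTX 50-series cards. Follow the steps exactly; otherwise you risk serious problems, which can include damaging the GPU (see the driver caution later in this section).

First, use the error report to understand why the failure happens. The CUDA version installed for your GPU and the PyTorch build installed in Python have to correspond to each other. That is both the root of the problem and the fix: the PyTorch build you download must match your CUDA version, and it must include kernels for your GPU's architecture (sm_120 here). The relevant links are below.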

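PyTorch official site

PyTorch
Install the PyTorch build that matches your machine.

If you can't find a matching version, you can also try an older release:

Previous PyTorch versions

For example:
Because the CUDA version I installed is 13.0, I installed the PyTorch build for CUDA 13.0 (the cu130 wheels; the successful run later in this article reports torch 2.10.0+cu130). If this is unclear, you can ask an AI assistant.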
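
The pairing between a CUDA version and a PyTorch build is visible in the version string itself: the suffix after `+` names the CUDA toolkit the wheel was built against. The following is a minimal sketch of that mapping. The helper names are hypothetical, and the generated install command simply follows the URL pattern of the cu124 command shown earlier; whether an index exists for a given tag is an assumption, so confirm the exact command on the PyTorch site.

```python
import re

# Base of the wheel index URL, following the pip command shown earlier in this article.
WHEEL_INDEX_BASE = "https://download.pytorch.org/whl"


def cuda_tag(torch_version):
    """Extract the CUDA tag ('cu130') from a version string like '2.10.0+cu130'.

    Returns None for builds without a CUDA tag (for example '2.6.0+cpu').
    """
    match = re.search(r"\+(cu\d+)$", torch_version)
    return match.group(1) if match else None


def cuda_version_to_tag(cuda_version):
    """Turn a CUDA toolkit version such as '13.0' into a wheel tag such as 'cu130'."""
    major, minor = cuda_version.split(".")[:2]
    return f"cu{major}{minor}"


def install_command(tag):
    """Build a pip command for the given tag (assumes an index for that tag exists)."""
    return f"pip install torch torchvision torchaudio --index-url {WHEEL_INDEX_BASE}/{tag}"


if __name__ == "__main__":
    print(cuda_tag("2.10.0+cu130"))             # tag of the working build in this article
    print(cuda_tag("2.6.0.dev20241112+cu121"))  # tag of the failing build in this article
    print(install_command(cuda_version_to_tag("13.0")))
```

Comparing `cuda_tag(torch.__version__)` with the tag for your installed toolkit is a quick way to spot a build that does not correspond to it.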

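CUDA official site

CUDA Toolkit Archive
Install the CUDA version that matches your machine.

For example:
My machine runs Windows 11 with an RTX 5060 Laptop GPU, so the CUDA version I need to download is CUDA 13.0.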

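For 50-series cards you may also need to install or update the driver. For example, the hit 2026 game Resident Evil Requiem needs a driver update to be supported.

NVIDIA driver downloads
If English is difficult for you, there is of course a Chinese-language site as well:
NVIDIA driver downloads (Chinese)

GeForce Drivers
Fill in the form for your machine to select the driver you need to install.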
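
To see which driver is currently installed before choosing a new one, `nvidia-smi` can be queried from Python. This is a minimal sketch, assuming `nvidia-smi` is on the PATH (it is installed with the NVIDIA driver); the helper names `parse_gpu_lines` and `query_gpus` are hypothetical.

```python
import shutil
import subprocess


def parse_gpu_lines(csv_text):
    """Parse 'name, driver_version' CSV lines into a list of (name, driver) tuples."""
    gpus = []
    for line in csv_text.splitlines():
        if not line.strip():
            continue
        name, driver = (field.strip() for field in line.rsplit(",", 1))
        gpus.append((name, driver))
    return gpus


def query_gpus():
    """Ask nvidia-smi for each GPU's name and driver version; [] if unavailable."""
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=name,driver_version", "--format=csv,noheader"],
        capture_output=True,
        text=True,
        check=False,
    )
    if result.returncode != 0:
        return []
    return parse_gpu_lines(result.stdout)


if __name__ == "__main__":
    gpus = query_gpus()
    if not gpus:
        print("nvidia-smi not found or returned no GPUs.")
    for name, driver in gpus:
        print(f"{name}: driver {driver}")
```

Recording the version it prints also gives you something to roll back to if a newer driver misbehaves.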

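Note that in early 2026 there were reports of GPUs being damaged after updating to a new driver version, so choose your driver carefully. The video below covers what each driver update contains and its known bugs; consult it before downloading a driver update.

NVIDIA driver update video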

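Subsequent Training#

You can use the following script to check whether training now works. (I pasted it as-is without tidying it up, but it is already set to a test configuration.)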

# ==============================================================
# File: train
# Author: Frees Ling
# Created: 2026/3/8
# Description: Robust training script that falls back to CPU if GPU CUDA
#              kernels are incompatible with the installed PyTorch.
# Version: 1.1
# ==============================================================
from ultralytics import YOLO
import torch
import traceback
import sys


def train():
    # Report CUDA availability and device details
    print("torch.__version__:", torch.__version__)
    cuda_available = torch.cuda.is_available()
    print("CUDA available:", cuda_available)
    if cuda_available:
        try:
            name = torch.cuda.get_device_name(0)
        except Exception:
            name = "<unknown>"
        try:
            cap = torch.cuda.get_device_capability(0)
        except Exception:
            cap = None
        print(f"GPU: {name} compute_capability={cap}")

    model = YOLO("yolov8n.pt")

    # Default training args (attempt GPU first if available)
    train_args = dict(
        data="data.yaml",
        epochs=1,  # test run
        imgsz=640,
        batch=16,
        device=0 if cuda_available else 'cpu',
        workers=12,
        project="runs/train",
        name="test",
        cache='disk',
        amp=cuda_available,
    )

    print("Training args:", train_args)

    try:
        print("Starting training...")
        results = model.train(**train_args)
        print("Training finished successfully.")
        return results
    except Exception as e:
        # Inspect exception to decide whether to retry on CPU
        err_str = str(e)
        print("Training failed with exception:", err_str)

        # Heuristics to detect CUDA / kernel compatibility errors
        cuda_error_indicators = [
            'no kernel image',
            'not compatible',
            'cuda capability',
            'CUDA error',
            'cudaErrorNoKernelImageForDevice',
            'AcceleratorError',
        ]

        if any(ind.lower() in err_str.lower() for ind in cuda_error_indicators):
            print("Detected a CUDA compatibility/kernel error. Retrying on CPU with amp disabled...")
            train_args['device'] = 'cpu'
            train_args['amp'] = False
            # Lower workers on CPU to avoid too many threads (optional)
            if train_args.get('workers', 0) > 4:
                train_args['workers'] = 4
            print("Retry Training args:", train_args)
            try:
                results = model.train(**train_args)
                print("CPU training finished successfully.")
                return results
            except Exception as e2:
                print("Retry on CPU also failed:", e2)
                traceback.print_exc()
                sys.exit(1)
        else:
            # Not a recognized CUDA issue - print the traceback and exit
            traceback.print_exc()
            sys.exit(1)


if __name__ == "__main__":
    train()
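
Rather than waiting for `model.train` to fail, a tiny CUDA operation can be run first to find out whether the installed build can launch kernels on the GPU at all. The following is a minimal sketch and not part of the author's script; the function name `gpu_smoke_test` is hypothetical, and it uses only standard PyTorch calls.

```python
def gpu_smoke_test():
    """Run one tiny CUDA operation and report (ok, message).

    A build without kernels for the GPU typically fails here with
    'no kernel image is available', before any training state is set up.
    """
    try:
        import torch
    except ImportError:
        return False, "PyTorch is not installed"

    if not torch.cuda.is_available():
        return False, "CUDA is not available"

    try:
        x = torch.ones(8, device="cuda")
        total = (x * 2).sum().item()  # launches kernels and copies the result back
    except RuntimeError as exc:
        return False, f"CUDA kernel launch failed: {exc}"

    if total != 16.0:
        return False, f"unexpected result {total}"
    return True, f"OK on {torch.cuda.get_device_name(0)}"


if __name__ == "__main__":
    ok, message = gpu_smoke_test()
    print("GPU usable:", ok, "-", message)
```

If this reports a kernel launch failure, fix the PyTorch build first; falling back to the CPU only hides the mismatch.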

If it succeeds, you should see output similar to the following (this sample run used epochs=50 on an RTX 5070 Ti):

C:\Users\Lenovo\Desktop\Code\YOLO\.venv\Scripts\python.exe C:\Users\Lenovo\Desktop\Code\YOLO\train.py
torch.__version__: 2.10.0+cu130
CUDA available: True
GPU: NVIDIA GeForce RTX 5070 Ti compute_capability=(12, 0)
Training args: {'data': 'data.yaml', 'epochs': 50, 'imgsz': 640, 'batch': 16, 'device': 0, 'workers': 12, 'project': 'runs/train', 'name': 'test', 'cache': 'disk', 'amp': True}
Starting training...
Ultralytics 8.4.21  Python-3.10.11 torch-2.10.0+cu130 CUDA:0 (NVIDIA GeForce RTX 5070 Ti, 16303MiB)
engine\trainer: agnostic_nms=False, amp=True, angle=1.0, augment=False, auto_augment=randaugment, batch=16, bgr=0.0, box=7.5, cache=disk, cfg=None, classes=None, close_mosaic=10, cls=0.5, compile=False, conf=None, copy_paste=0.0, copy_paste_mode=flip, cos_lr=False, cutmix=0.0, data=data.yaml, degrees=0.0, deterministic=True, device=0, dfl=1.5, dnn=False, dropout=0.0, dynamic=False, embed=None, end2end=None, epochs=50, erasing=0.4, exist_ok=False, fliplr=0.5, flipud=0.0, format=torchscript, fraction=1.0, freeze=None, half=False, hsv_h=0.015, hsv_s=0.7, hsv_v=0.4, imgsz=640, int8=False, iou=0.7, keras=False, kobj=1.0, line_width=None, lr0=0.01, lrf=0.01, mask_ratio=4, max_det=300, mixup=0.0, mode=train, model=yolov8n.pt, momentum=0.937, mosaic=1.0, multi_scale=0.0, name=test, nbs=64, nms=False, opset=None, optimize=False, optimizer=auto, overlap_mask=True, patience=100, perspective=0.0, plots=True, pose=12.0, pretrained=True, profile=False, project=runs/train, rect=False, resume=False, retina_masks=False, rle=1.0, save=True, save_conf=False, save_crop=False, save_dir=C:\Users\Lenovo\Desktop\Code\YOLO\runs\detect\runs\train\test, save_frames=False, save_json=False, save_period=-1, save_txt=False, scale=0.5, seed=0, shear=0.0, show=False, show_boxes=True, show_conf=True, show_labels=True, simplify=True, single_cls=False, source=None, split=val, stream_buffer=False, task=detect, time=None, tracker=botsort.yaml, translate=0.1, val=True, verbose=True, vid_stride=1, visualize=False, warmup_bias_lr=0.1, warmup_epochs=3.0, warmup_momentum=0.8, weight_decay=0.0005, workers=12, workspace=None
Overriding model.yaml nc=80 with nc=1

                   from  n    params  module                                       arguments
  0                  -1  1       464  ultralytics.nn.modules.conv.Conv             [3, 16, 3, 2]
  1                  -1  1      4672  ultralytics.nn.modules.conv.Conv             [16, 32, 3, 2]
  2                  -1  1      7360  ultralytics.nn.modules.block.C2f             [32, 32, 1, True]
  3                  -1  1     18560  ultralytics.nn.modules.conv.Conv             [32, 64, 3, 2]
  4                  -1  2     49664  ultralytics.nn.modules.block.C2f             [64, 64, 2, True]
  5                  -1  1     73984  ultralytics.nn.modules.conv.Conv             [64, 128, 3, 2]
  6                  -1  2    197632  ultralytics.nn.modules.block.C2f             [128, 128, 2, True]
  7                  -1  1    295424  ultralytics.nn.modules.conv.Conv             [128, 256, 3, 2]
  8                  -1  1    460288  ultralytics.nn.modules.block.C2f             [256, 256, 1, True]
  9                  -1  1    164608  ultralytics.nn.modules.block.SPPF            [256, 256, 5]
 10                  -1  1         0  torch.nn.modules.upsampling.Upsample         [None, 2, 'nearest']
 11             [-1, 6]  1         0  ultralytics.nn.modules.conv.Concat           [1]
 12                  -1  1    148224  ultralytics.nn.modules.block.C2f             [384, 128, 1]
 13                  -1  1         0  torch.nn.modules.upsampling.Upsample         [None, 2, 'nearest']
 14             [-1, 4]  1         0  ultralytics.nn.modules.conv.Concat           [1]
 15                  -1  1     37248  ultralytics.nn.modules.block.C2f             [192, 64, 1]
 16                  -1  1     36992  ultralytics.nn.modules.conv.Conv             [64, 64, 3, 2]
 17            [-1, 12]  1         0  ultralytics.nn.modules.conv.Concat           [1]
 18                  -1  1    123648  ultralytics.nn.modules.block.C2f             [192, 128, 1]
 19                  -1  1    147712  ultralytics.nn.modules.conv.Conv             [128, 128, 3, 2]
 20             [-1, 9]  1         0  ultralytics.nn.modules.conv.Concat           [1]
 21                  -1  1    493056  ultralytics.nn.modules.block.C2f             [384, 256, 1]
 22        [15, 18, 21]  1    751507  ultralytics.nn.modules.head.Detect           [1, 16, None, [64, 128, 256]]
Model summary: 130 layers, 3,011,043 parameters, 3,011,027 gradients, 8.2 GFLOPs

Transferred 319/355 items from pretrained weights
Freezing layer 'model.22.dfl.conv.weight'
AMP: running Automatic Mixed Precision (AMP) checks...
AMP: checks passed
train: Fast image access  (ping: 0.1±0.1 ms, read: 831.1±733.7 MB/s, size: 72.1 KB)
train: Scanning C:\Users\Lenovo\Desktop\Code\YOLO\train\labels.cache... 98798 images, 140 backgrounds, 0 corrupt: 100% ━━━━━━━━━━━━ 98798/98798  0.0s
train: Caching images (69.5GB Disk): 100% ━━━━━━━━━━━━ 98798/98798 15.2Kit/s 6.5s
val: Fast image access  (ping: 0.0±0.0 ms, read: 331.4±105.1 MB/s, size: 18.5 KB)
val: Scanning C:\Users\Lenovo\Desktop\Code\YOLO\valid\labels.cache... 2048 images, 3 backgrounds, 0 corrupt: 100% ━━━━━━━━━━━━ 2048/2048  0.0s
val: Caching images (1.5GB Disk): 100% ━━━━━━━━━━━━ 2048/2048 11.3Kit/s 0.2s
optimizer: 'optimizer=auto' found, ignoring 'lr0=0.01' and 'momentum=0.937' and determining best 'optimizer', 'lr0' and 'momentum' automatically...
optimizer: MuSGD(lr=0.01, momentum=0.9) with parameter groups 57 weight(decay=0.0), 64 weight(decay=0.0005), 63 bias(decay=0.0)
Plotting labels to C:\Users\Lenovo\Desktop\Code\YOLO\runs\detect\runs\train\test\labels.jpg...
Image sizes 640 train, 640 val
Using 12 dataloader workers
Logging results to C:\Users\Lenovo\Desktop\Code\YOLO\runs\detect\runs\train\test
Starting training for 50 epochs...

      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size
       1/50      2.07G      1.149     0.9735      1.119         32        640: 100% ━━━━━━━━━━━━ 6175/6175 4.1it/s 25:10
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100% ━━━━━━━━━━━━ 64/64 6.0it/s 10.6s
                   all       2048       2195      0.971      0.925      0.962      0.656

......

That is the fix for the problem new GPUs run into when training YOLO.