YOLOv5 error: RuntimeError: Numpy is not available

### Fixing the YOLOv5 error `RuntimeError: Numpy is not available`

If YOLOv5 raises `RuntimeError: Numpy is not available`, the cause is almost always an incompatible NumPy version or a broken NumPy installation. Work through the following steps:

1. **Verify the NumPy installation**

   First confirm that NumPy is installed correctly:

   ```bash
   pip show numpy
   ```

   If NumPy is not installed, install it:

   ```bash
   pip install numpy
   ```

2. **Downgrade NumPy**

   A NumPy release that is too new can be incompatible with YOLOv5. In particular, PyTorch wheels compiled against NumPy 1.x cannot run under NumPy 2.x, so if `pip show numpy` reports a 2.x version, downgrading (for example with `pip install "numpy<2"`) usually clears the error. The YOLOv5 project recommends NumPy 1.18.5 or a 1.20.x release; to downgrade:

   ```bash
   pip install numpy==1.20.3
   ```

   A diagnostic sketch at the end of this article exercises exactly the call that fails.

3. **Check the environment configuration**

   Make sure the other libraries YOLOv5 depends on (PyTorch in particular) are compatible with the NumPy version in the environment. Install YOLOv5's pinned dependencies with:

   ```bash
   pip install -r requirements.txt
   ```

4. **Fix deprecated NumPy references in the code**

   In some cases the code may still use NumPy attributes that have been deprecated and removed (such as `np.int`, removed in NumPy 1.24). Change these references to `np.int_` (or the built-in `int`). For example:

   ```python
   # Wrong: np.int was removed in NumPy 1.24
   import numpy as np
   data = np.int(10)

   # Correct
   import numpy as np
   data = np.int_(10)
   ```

   This change may be needed in several files; make sure every occurrence of `np.int` is updated (the scan sketch at the end of this article can help locate them).

5. **Clean and rebuild the environment**

   If none of the above resolves the problem, rebuild the Python environment and reinstall the dependencies. A virtual environment (`conda` or `venv`) isolates dependencies and avoids conflicts. With `conda`:

   ```bash
   conda create -n yolov5_env python=3.8
   conda activate yolov5_env
   pip install -r requirements.txt
   ```

6. **Check hardware and driver support**

   If the error is GPU-related (for example a `CUDA` or `cuDNN` problem), verify that the NVIDIA driver and the deep-learning stack are installed correctly. Check the CUDA and cuDNN versions with:

   ```bash
   nvcc --version
   cat /usr/local/cuda/include/cudnn.h | grep CUDNN_MAJOR -A 2
   ```

These steps should clear the runtime error caused by NumPy being unavailable in YOLOv5. If the problem persists, check the YOLOv5 documentation or community discussions for further help.
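Tracebacks for this error bottom out in PyTorch's NumPy bridge (`torch.from_numpy` / `Tensor.numpy()`). The following is a minimal diagnostic sketch, assuming only that `numpy` and `torch` are installed, which exercises that bridge directly and points at the `numpy<2` fix from step 2 when it fails:

```python
import numpy as np
import torch

print(f"numpy : {np.__version__}")
print(f"torch : {torch.__version__}")

try:
    # This round trip is the exact bridge that raises
    # "RuntimeError: Numpy is not available" when the installed NumPy
    # is incompatible with the PyTorch build.
    tensor = torch.from_numpy(np.zeros((2, 2), dtype=np.float32))
    tensor.numpy()  # and back again
    print("torch <-> numpy bridge OK")
except RuntimeError as err:
    print(f"torch <-> numpy bridge broken: {err}")
    if int(np.__version__.split(".")[0]) >= 2:
        # Most PyTorch builds that hit this error predate NumPy 2.
        print("NumPy 2.x detected; try: pip install 'numpy<2'")
```

Run it in the same environment YOLOv5 uses; if the round trip fails, fix the environment (steps 2, 3, or 5) before changing any code.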
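For step 4, a short scan saves hunting through files by hand. A sketch, assuming it is run from the YOLOv5 repository root (`ROOT` is a placeholder path), that lists every use of the aliases NumPy 1.24 removed (`np.int`, `np.float`, `np.bool`, `np.object`, `np.str`):

```python
import re
from pathlib import Path

ROOT = Path(".")  # assumption: the YOLOv5 checkout; adjust as needed
# The trailing \b keeps valid names such as np.int_ and np.int32
# out of the results.
PATTERN = re.compile(r"\bnp\.(int|float|bool|object|str)\b")

for path in ROOT.rglob("*.py"):
    text = path.read_text(encoding="utf-8", errors="ignore")
    for lineno, line in enumerate(text.splitlines(), start=1):
        if PATTERN.search(line):
            print(f"{path}:{lineno}: {line.strip()}")
```

Each reported occurrence can then be replaced with `np.int_` (or the built-in `int`) as shown in step 4.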

相关推荐

Traceback (most recent call last): File "D:\Download\codeseg\code\demo_test_image.py", line 100, in <module> model.load_model("./weights/yolov8s-seg.pt") File "D:\Download\codeseg\code\model.py", line 57, in load_model self.model(torch.zeros(1, 3, *[self.imgsz] * 2).to(self.device). File "D:\Download\codeseg\code\ultralytics\engine\model.py", line 102, in __call__ return self.predict(source, stream, **kwargs) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "D:\Download\codeseg\code\ultralytics\engine\model.py", line 243, in predict return self.predictor.predict_cli(source=source) if is_cli else self.predictor(source=source, stream=stream) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "D:\Download\codeseg\code\ultralytics\engine\predictor.py", line 196, in __call__ return list(self.stream_inference(source, model, *args, **kwargs)) # merge list of Result into one ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "C:\Users\20126\.conda\envs\pytorch\Lib\site-packages\torch\utils\_contextlib.py", line 35, in generator_context response = gen.send(None) ^^^^^^^^^^^^^^ File "D:\Download\codeseg\code\ultralytics\engine\predictor.py", line 263, in stream_inference self.results = self.postprocess(preds, im, im0s) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "D:\Download\codeseg\code\ultralytics\models\yolo\segment\predict.py", line 42, in postprocess orig_imgs = ops.convert_torch2numpy_batch(orig_imgs) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "D:\Download\codeseg\code\ultralytics\utils\ops.py", line 785, in convert_torch2numpy_batch return (batch.permute(0, 2, 3, 1).contiguous() * 255).clamp(0, 255).to(torch.uint8).cpu().numpy() ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ RuntimeError: Numpy is not available

(yolov11) dell@dell-Precision-7920-Tower:~/ultralytics$ yolo predict model=yolo11s-seg.pt source='bus.jpg' A module that was compiled using NumPy 1.x cannot be run in NumPy 2.2.6 as it may crash. To support both 1.x and 2.x versions of NumPy, modules must be compiled with NumPy 2.0. Some module may need to rebuild instead e.g. with 'pybind11>=2.12'. If you are a user of the module, the easiest solution will be to downgrade to 'numpy<2' or try to upgrade the affected module. We expect that some modules will need time to support NumPy 2. Traceback (most recent call last): File "/home/dell/anaconda3/envs/yolov11/bin/yolo", line 5, in <module> from ultralytics.cfg import entrypoint File "/home/dell/anaconda3/envs/yolov11/lib/python3.10/site-packages/ultralytics/__init__.py", line 11, in <module> from ultralytics.models import NAS, RTDETR, SAM, YOLO, YOLOE, FastSAM, YOLOWorld File "/home/dell/anaconda3/envs/yolov11/lib/python3.10/site-packages/ultralytics/models/__init__.py", line 3, in <module> from .fastsam import FastSAM File "/home/dell/anaconda3/envs/yolov11/lib/python3.10/site-packages/ultralytics/models/fastsam/__init__.py", line 3, in <module> from .model import FastSAM File "/home/dell/anaconda3/envs/yolov11/lib/python3.10/site-packages/ultralytics/models/fastsam/model.py", line 6, in <module> from ultralytics.engine.model import Model File "/home/dell/anaconda3/envs/yolov11/lib/python3.10/site-packages/ultralytics/engine/model.py", line 12, in <module> from ultralytics.engine.results import Results File "/home/dell/anaconda3/envs/yolov11/lib/python3.10/site-packages/ultralytics/engine/results.py", line 16, in <module> from ultralytics.data.augment import LetterBox File "/home/dell/anaconda3/envs/yolov11/lib/python3.10/site-packages/ultralytics/data/__init__.py", line 3, in <module> from .base import BaseDataset File "/home/dell/anaconda3/envs/yolov11/lib/python3.10/site-packages/ultralytics/data/base.py", line 16, in <module> from ultralytics.data.utils import FORMATS_HELP_MSG, HELP_URL, IMG_FORMATS, check_file_speeds File "/home/dell/anaconda3/envs/yolov11/lib/python3.10/site-packages/ultralytics/data/utils.py", line 18, in <module> from ultralytics.nn.autobackend import check_class_names File "/home/dell/anaconda3/envs/yolov11/lib/python3.10/site-packages/ultralytics/nn/__init__.py", line 3, in <module> from .tasks import ( File "/home/dell/anaconda3/envs/yolov11/lib/python3.10/site-packages/ultralytics/nn/tasks.py", line 13, in <module> from ultralytics.nn.autobackend import check_class_names File "/home/dell/anaconda3/envs/yolov11/lib/python3.10/site-packages/ultralytics/nn/autobackend.py", line 70, in <module> class AutoBackend(nn.Module): File "/home/dell/anaconda3/envs/yolov11/lib/python3.10/site-packages/ultralytics/nn/autobackend.py", line 138, in AutoBackend device: torch.device = torch.device("cpu"), /home/dell/anaconda3/envs/yolov11/lib/python3.10/site-packages/ultralytics/nn/autobackend.py:138: UserWarning: Failed to initialize NumPy: _ARRAY_API not found (Triggered internally at ../torch/csrc/utils/tensor_numpy.cpp:84.) 
device: torch.device = torch.device("cpu"), Ultralytics 8.3.174 🚀 Python-3.10.18 torch-2.0.0+cu118 CUDA:0 (NVIDIA GeForce RTX 4070, 12001MiB) YOLO11s-seg summary (fused): 113 layers, 10,097,776 parameters, 0 gradients, 35.5 GFLOPs Traceback (most recent call last): File "/home/dell/anaconda3/envs/yolov11/bin/yolo", line 8, in <module> sys.exit(entrypoint()) File "/home/dell/anaconda3/envs/yolov11/lib/python3.10/site-packages/ultralytics/cfg/__init__.py", line 983, in entrypoint getattr(model, mode)(**overrides) # default args from model File "/home/dell/anaconda3/envs/yolov11/lib/python3.10/site-packages/ultralytics/engine/model.py", line 555, in predict return self.predictor.predict_cli(source=source) if is_cli else self.predictor(source=source, stream=stream) File "/home/dell/anaconda3/envs/yolov11/lib/python3.10/site-packages/ultralytics/engine/predictor.py", line 247, in predict_cli for _ in gen: # sourcery skip: remove-empty-nested-block, noqa File "/home/dell/anaconda3/envs/yolov11/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 35, in generator_context response = gen.send(None) File "/home/dell/anaconda3/envs/yolov11/lib/python3.10/site-packages/ultralytics/engine/predictor.py", line 326, in stream_inference im = self.preprocess(im0s) File "/home/dell/anaconda3/envs/yolov11/lib/python3.10/site-packages/ultralytics/engine/predictor.py", line 167, in preprocess im = torch.from_numpy(im) RuntimeError: Numpy is not available 什么问题,清解决

Successfully uninstalled torchvision-0.7.0+cu101 ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts. wf-pytorch-yolo-v4 0.2.2 requires googledrivedownloader>=0.4, which is not installed. wf-pytorch-yolo-v4 0.2.2 requires scikit-image>=0.16.2, which is not installed. wf-pytorch-yolo-v4 0.2.2 requires wf-pycocotools>=2.0.1.1, which is not installed. timm 1.0.7 requires huggingface_hub, which is not installed. timm 1.0.7 requires safetensors, which is not installed. visdom 0.2.4 requires jsonpatch, which is not installed. tensorboardx 2.6.2.2 requires protobuf>=3.20, but you have protobuf 3.18.0 which is incompatible. torchaudio 0.7.2 requires torch==1.7.1, but you have torch 2.4.1 which is incompatible. Successfully installed filelock-3.16.1 fsspec-2025.3.0 mpmath-1.3.0 networkx-3.1 numpy-1.24.4 nvidia-cublas-cu12-12.1.3.1 nvidia-cuda-cupti-cu12-12.1.105 nvidia-cuda-nvrtc-cu12-12.1.105 nvidia-cuda-runtime-cu12-12.1.105 nvidia-cudnn-cu12-9.1.0.70 nvidia-cufft-cu12-11.0.2.54 nvidia-curand-cu12-10.3.2.106 nvidia-cusolver-cu12-11.4.5.107 nvidia-cusparse-cu12-12.1.0.106 nvidia-nccl-cu12-2.20.5 nvidia-nvjitlink-cu12-12.9.86 nvidia-nvtx-cu12-12.1.105 pillow-10.4.0 py-cpuinfo-9.0.0 sympy-1.13.3 torch-1.7.1+cu101 torchvision-0.19.1 tqdm-4.67.1 triton-3.0.0 ultralytics-8.3.161 ultralytics-thop-2.0.14 WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager, possibly rendering your system unusable.It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv. Use the --root-user-action option if you know what you are doing and want to suppress this warning. [notice] A new release of pip is available: 24.1 -> 25.0.1 [notice] To update, run: pip install --upgrade pip

import numpy as np import cv2 from ultralytics import YOLO import torch import time import queue from threading import Thread import psutil import multiprocessing import multiprocessing from plate_recognition.plate_rec import init_model, get_plate_result import GPUtil import pynvml import argparse import statistics from collections import defaultdict import os import threading import platform import subprocess import signal from concurrent.futures import ThreadPoolExecutor # 全局配置 CLASSES = ['danger', 'car_danger', 'headstock', 'light', 'number', '1number', 'double_number'] DETECTOR_MODEL_PATH = './weights/best.engine' # DETECTOR_MODEL_PATH = './weights/best_fp32.engine' TEXT_MODEL_PATH = './weights/plate_rec.pth' def show_frame(frame_data, stream_id, latency_queue): frame, capture_time = frame_data window_name = f"Stream {stream_id}" if frame is not None and frame.size > 0: cv2.imshow(window_name, frame) if cv2.waitKey(1) & 0xFF == ord('q'): return latency = time.time() - capture_time latency_queue.put((stream_id, latency)) def display_process(display_queue, stream_id, latency_queue): window_name = f"Stream {stream_id}" cv2.namedWindow(window_name, cv2.WINDOW_NORMAL) cv2.resizeWindow(window_name, 800, 600) try: while True: frame_data = display_queue.get() if frame_data is None: break frame, capture_time = frame_data if frame is not None and frame.size > 0: cv2.imshow(window_name, frame) latency = time.time() - capture_time latency_queue.put((stream_id, latency)) if cv2.waitKey(1) & 0xFF == ord('q'): break finally: cv2.destroyWindow(window_name) print(f"显示进程{stream_id}退出") class EnhancedResourceMonitor: def __init__(self, gpu_id, process_mgr, interval=0.5): self.gpu_id = gpu_id self.interval = interval self.running = False self.lock = threading.Lock() self.data = defaultdict(list) self.process_mgr = process_mgr # 进程管理器引用 # GPU硬件信息 self.gpu_arch = "Ada Lovelace" self.sm_count = 56 self.peak_tflops = 35.6 self.cores_per_sm = 128 def start(self): pynvml.nvmlInit() self.handle = pynvml.nvmlDeviceGetHandleByIndex(self.gpu_id) self.running = True self.thread = Thread(target=self._monitor, daemon=True) self.thread.start() def _monitor(self): while self.running: try: # 监控所有工作进程 worker_stats = [] for p in self.process_mgr.processes: try: proc = psutil.Process(p.pid) with proc.oneshot(): worker_stats.append({ 'cpu': proc.cpu_percent(), 'mem': proc.memory_info().rss / (1024 ** 2), 'threads': proc.num_threads() }) except (psutil.NoSuchProcess, psutil.AccessDenied): continue # GPU监控 util = pynvml.nvmlDeviceGetUtilizationRates(self.handle) mem_info = pynvml.nvmlDeviceGetMemoryInfo(self.handle) clock_mhz = pynvml.nvmlDeviceGetClockInfo(self.handle, pynvml.NVML_CLOCK_SM) # 计算实际算力 current_tflops = (self.sm_count * (clock_mhz / 1000) * self.cores_per_sm * 2) / 1000 util_percent = (current_tflops / self.peak_tflops) * 100 # 记录数据 with self.lock: if worker_stats: self.data['worker_cpu'].append(sum(s['cpu'] for s in worker_stats)) self.data['worker_mem'].append(sum(s['mem'] for s in worker_stats)) self.data['worker_threads'].append(sum(s['threads'] for s in worker_stats)) self.data['gpu_util'].append(util.gpu) self.data['gpu_mem'].append(mem_info.used / (1024 ** 2)) self.data['gpu_tflops'].append(current_tflops) except Exception as e: print(f"监控错误: {str(e)}") time.sleep(self.interval) def stop(self): self.running = False if hasattr(self, 'thread'): self.thread.join() pynvml.nvmlShutdown() return self._generate_report() def _generate_report(self): report = "\n[程序资源报告]\n" # 进程统计 if self.data.get('worker_threads'): 
report += f"- 工作进程数: {len(self.process_mgr.processes)}\n" report += f"- 总线程数: {max(self.data['worker_threads'])}\n" report += f"- 峰值CPU使用: {max(self.data['worker_cpu']):.1f}%\n" report += f"- 峰值内存占用: {max(self.data['worker_mem']):.1f}MB\n" # GPU统计 if self.data.get('gpu_tflops'): avg_tflops = statistics.mean(self.data['gpu_tflops']) report += "\n[GPU资源]\n" report += f"- 平均利用率: {statistics.mean(self.data['gpu_util']):.1f}%\n" report += f"- 峰值显存: {max(self.data['gpu_mem']):.1f}MB\n" report += f"- 平均算力: {avg_tflops:.1f} TFLOPS\n" report += f"- 算力利用率: {avg_tflops/self.peak_tflops*100:.1f}%\n" return report class ResourceMonitor: def __init__(self, gpu_id, interval=0.5): self.gpu_id = gpu_id self.interval = interval self.running = False self.data = defaultdict(list) self.gpu_arch = "Ada Lovelace" self.sm_count = 56 # RTX 4070 SUPER有56个SM self.peak_tflops = 35.6 # 理论算力35.6 TFLOPS self.cores_per_sm = 128 # Ada架构每个SM有128个CUDA核心 self.lock = threading.Lock() # 添加锁 self.main_pid = os.getpid() # 记录主进程PID def start(self): pynvml.nvmlInit() self.handle = pynvml.nvmlDeviceGetHandleByIndex(self.gpu_id) self.running = True self.thread = Thread(target=self._monitor, daemon=True) self.thread.start() def _monitor(self): while self.running: try: # 改进的进程监控 main_process = psutil.Process(self.main_pid) with main_process.oneshot(): # 原子化读取 process_cpu = main_process.cpu_percent(interval=0.1) # 更短间隔 process_mem = main_process.memory_info().rss / (1024 ** 2) process_threads = main_process.num_threads() # 确保不会记录到0值 if process_cpu == 0 and len(self.data['process_cpu']) > 0: process_cpu = self.data['process_cpu'][-1] * 0.9 # 使用上次值的90% # 记录数据 with self.lock: self.data['process_cpu'].append(process_cpu) self.data['process_memory'].append(process_mem) self.data['process_threads'].append(process_threads) # 系统进程统计 process_count = len(list(psutil.process_iter())) cpu_percent = psutil.cpu_percent() mem = psutil.virtual_memory() # 获取当前Python进程的资源使用 current_process = psutil.Process() process_cpu = current_process.cpu_percent() process_mem = current_process.memory_info().rss / (1024 ** 2) # MB process_threads = current_process.num_threads() # GPU监控 util = pynvml.nvmlDeviceGetUtilizationRates(self.handle) mem_info = pynvml.nvmlDeviceGetMemoryInfo(self.handle) graphics_clock = pynvml.nvmlDeviceGetClockInfo(self.handle, pynvml.NVML_CLOCK_GRAPHICS) sm_clock = pynvml.nvmlDeviceGetClockInfo(self.handle, pynvml.NVML_CLOCK_SM) power_usage = pynvml.nvmlDeviceGetPowerUsage(self.handle) / 1000 # 瓦特 total_mem = sum(p.memory_info().rss for p in psutil.process_iter(['pid', 'name']) if 'python' in p.info['name'].lower()) / (1024**2) # MB # 获取当前GPU时钟频率 clock_mhz = pynvml.nvmlDeviceGetClockInfo( self.handle, pynvml.NVML_CLOCK_SM ) # 收集数据 self.data['system_processes'].append(process_count) self.data['system_cpu'].append(cpu_percent) self.data['system_memory'].append(mem.used / (1024**3)) # GB self.data['process_cpu'].append(process_cpu) self.data['process_memory'].append(process_mem) self.data['process_threads'].append(process_threads) self.data['gpu_util'].append(util.gpu) self.data['gpu_mem'].append(mem_info.used / (1024**2)) # MB self.data['gpu_power'].append(power_usage) self.data['gpu_clock_graphics'].append(graphics_clock) self.data['gpu_clock_sm'].append(sm_clock) # 实时算力计算 (TFLOPS = SM数 * 时钟频率(GHz) * 每SM核心数 * 2 / 1000) current_tflops = (self.sm_count * (clock_mhz / 1000) * self.cores_per_sm * 2) / 1000 util_percent = (current_tflops / self.peak_tflops) * 100 self.data['gpu_tflops'].append(current_tflops) 
self.data['gpu_sm_clock'].append(clock_mhz) self.data['gpu_util_actual'].append(util_percent) except Exception as e: print(f"算力监控错误: {e}") except (psutil.NoSuchProcess, pynvml.NVMLError) as e: print(f"监控错误(忽略): {str(e)}") except Exception as e: print(f"意外的监控错误: {str(e)}") time.sleep(self.interval) def stop(self): self.running = False self.thread.join() pynvml.nvmlShutdown() return self._generate_report() def _generate_report(self): if not any(len(v) > 0 for v in self.data.values()): return "无监控数据" report = "\n[资源使用报告]\n" report += "\n[算力分析 - RTX 4070 SUPER]\n" report += f"- GPU架构: {self.gpu_arch}\n" report += f"- 流式多处理器(SM): {self.sm_count}\n" report += f"- CUDA核心: {self.sm_count * self.cores_per_sm}\n" report += f"- 理论峰值算力: {self.peak_tflops} TFLOPS\n" if self.data.get('gpu_tflops'): avg_tflops = statistics.mean(self.data['gpu_tflops']) max_tflops = max(self.data['gpu_tflops']) avg_clock = statistics.mean(self.data['gpu_sm_clock']) report += "\n[实际运行数据]\n" report += f"- 平均SM时钟: {avg_clock} MHz\n" report += f"- 平均算力: {avg_tflops:.1f} TFLOPS\n" report += f"- 峰值算力: {max_tflops:.1f} TFLOPS\n" report += f"- 算力利用率: {avg_tflops/self.peak_tflops*100:.1f}%\n" # 瓶颈分析 avg_util = statistics.mean(self.data['gpu_util']) if avg_util > 90 and util_percent < 70: report += "\n[警告] 高GPU利用率但低算力利用率,可能存在内存带宽瓶颈\n" elif avg_tflops < 0.7 * self.peak_tflops: report += "\n[提示] 算力未充分利用,建议检查:\n" report += " • 批次大小(batch size)是否过小\n" report += " • 模型是否存在大量分支操作\n" # 系统级统计 report += "[系统资源]\n" system_metrics = { 'system_processes': ('系统进程数', '{:.0f}'), 'system_cpu': ('系统CPU使用率(%)', '{:.1f}'), 'system_memory': ('系统内存使用(GB)', '{:.2f}') } for key, (name, fmt) in system_metrics.items(): values = self.data.get(key, []) if values: report += ( f"{name}:\n" f" 平均值: {fmt.format(statistics.mean(values))}\n" f" 最大值: {fmt.format(max(values))}\n" f" 最小值: {fmt.format(min(values))}\n" f" 采样数: {len(values)}\n\n" ) # 进程级统计 report += "[主进程资源]\n" process_metrics = { 'process_cpu': ('进程CPU使用率(%)', '{:.1f}'), 'process_memory': ('进程内存使用(MB)', '{:.1f}'), 'process_threads': ('程内部的线程数量', '{:.0f}') } for key, (name, fmt) in process_metrics.items(): values = self.data.get(key, []) if values: report += ( f"{name}:\n" f" 平均值: {fmt.format(statistics.mean(values))}\n" f" 最大值: {fmt.format(max(values))}\n" f" 最小值: {fmt.format(min(values))}\n" f" 采样数: {len(values)}\n\n" ) # GPU统计 report += "[GPU资源]\n" gpu_metrics = { 'gpu_util': ('GPU利用率(%)', '{:.1f}'), 'gpu_mem': ('显存使用(MB)', '{:.1f}'), 'gpu_power': ('GPU功耗(W)', '{:.1f}'), 'gpu_clock_graphics': ('图形时钟(MHz)', '{:.0f}'), 'gpu_clock_sm': ('SM时钟(MHz)', '{:.0f}') } for key, (name, fmt) in gpu_metrics.items(): values = self.data.get(key, []) if values: report += ( f"{name}:\n" f" 平均值: {fmt.format(statistics.mean(values))}\n" f" 最大值: {fmt.format(max(values))}\n" f" 最小值: {fmt.format(min(values))}\n" f" 采样数: {len(values)}\n\n" ) return report class VideoProcessor: def __init__(self, device): # 添加CUDA初始化 torch.cuda.empty_cache() # 加载模型前设置优化选项 torch.backends.cudnn.benchmark = True torch.backends.cuda.matmul.allow_tf32 = True torch.backends.cudnn.allow_tf32 = True # 加载模型 self.model = YOLO(DETECTOR_MODEL_PATH, task='detect') self.plate_rec_model = init_model(device, TEXT_MODEL_PATH) self.rtsp_url = "rtsp://admin:guoxinzhike901@192.168.1.108:554/cam/realmonitor?channel=1&subtype=0" self.max_retries = 3 # 预热GPU # with torch.no_grad(): # dummy_input = torch.randn(1, 3, 640, 640).to(device) # _ = self.model(dummy_input) self.device = device self.frame_count = 0 self.plate_text_cache = {} def _reconnect(self): cap = 
cv2.VideoCapture(self.rtsp_url, cv2.CAP_FFMPEG) cap.set(cv2.CAP_PROP_RTSP_TRANSPORT, cv2.CAP_RTSP_TRANSPORT_TCP) return cap # 在VideoProcessor类中添加中文显示支持 def process_frame(self, frame): # 增强版中文显示函数(带错误处理和字体回退) def put_chinese_text(img, text, position, font_scale, color, thickness): """支持中文显示的增强函数""" try: from PIL import Image, ImageDraw, ImageFont img_pil = Image.fromarray(cv2.cvtColor(img, cv2.COLOR_BGR2RGB)) draw = ImageDraw.Draw(img_pil) # 尝试加载字体(优先使用自定义字体,失败则回退系统字体) try: font_path = os.path.join("fonts", "platech.ttf") font = ImageFont.truetype(font_path, int(font_scale * 30)) except: font = ImageFont.load_default() print("警告:使用默认字体,中文显示可能不正常") draw.text(position, text, font=font, fill=color) return cv2.cvtColor(np.array(img_pil), cv2.COLOR_RGB2BGR) except Exception as e: print(f"文本渲染失败,使用OpenCV默认显示: {str(e)}") cv2.putText(img, text, position, cv2.FONT_HERSHEY_SIMPLEX, font_scale, color, thickness) return img self.frame_count += 1 # 1. 输入帧验证 if frame is None or frame.size == 0: print("错误:接收到空帧") return frame # 2. 模型推理(添加详细日志) try: results = self.model.track( frame, persist=True, imgsz=640, verbose=False # 关闭YOLO内置输出 ) # 调试输出 print(f"帧 {self.frame_count}: 检测到 {len(results)} 个结果") except Exception as e: print(f"模型推理错误: {str(e)}") return frame # 3. 结果解析与渲染 class_colors = { 'danger': (0, 0, 255), 'car_danger': (0, 165, 255), 'headstock': (255, 0, 0), 'light': (255, 255, 0), 'number': (0, 255, 0), '1number': (0, 255, 255), 'double_number': (128, 0, 128) } for result in results: # 验证检测结果有效性 if not hasattr(result, 'boxes') or result.boxes is None: print("警告:结果中未包含有效检测框") continue for box in result.boxes: try: # 解析检测框数据 cls_id = int(box.cls[0].item()) class_name = CLASSES[cls_id] x1, y1, x2, y2 = map(int, box.xyxy[0].tolist()) conf = box.conf[0].item() track_id = int(box.id[0].item()) if box.id is not None else None # 车牌特殊处理 if class_name == 'number' and (track_id not in self.plate_text_cache or self.frame_count % 5 == 0): plate_img = frame[y1:y2, x1:x2] if plate_img.size > 0: plate_text = get_plate_result(plate_img, self.device, self.plate_rec_model) or "识别失败" self.plate_text_cache[track_id] = plate_text try: if track_id not in self.plate_text_cache or self.frame_count % 5 == 0: plate_img = frame[y1:y2, x1:x2] if plate_img.size > 0: plate_text = get_plate_result(plate_img, self.device, self.plate_rec_model) or "识别失败" self.plate_text_cache[track_id] = plate_text else: plate_text = "无效区域" display_text = f"{self.plate_text_cache.get(track_id, '加载中...')} ID:{track_id} {conf:.2f}" except Exception as e: print(f"车牌处理异常: {str(e)}") display_text = f"车牌识别错误 ID:{track_id}" else: display_text = f"{class_name} {conf:.2f}" + (f" ID:{track_id}" if track_id else "") # 渲染检测框和文本 color = class_colors.get(class_name, (255, 255, 255)) cv2.rectangle(frame, (x1, y1), (x2, y2), color, 2) # 文本位置修正(确保不超出画面) y_text = max(y1 - 10, 10) frame = put_chinese_text(frame, display_text, (x1, y_text), 0.7, color, 2) except Exception as e: print(f"单检测框处理错误: {str(e)}") continue return frame def display_thread(display_queue, stream_id, latency_queue): window_name = f"Stream {stream_id}" cv2.namedWindow(window_name, cv2.WINDOW_NORMAL) cv2.resizeWindow(window_name, 800, 600) try: while True: frame_data = display_queue.get() if frame_data is None: break frame, capture_time = frame_data if frame is not None and frame.size > 0: cv2.imshow(window_name, frame) latency = time.time() - capture_time latency_queue.put((stream_id, latency)) if cv2.waitKey(1) & 0xFF == ord('q'): break finally: cv2.destroyWindow(window_name) 
print(f"显示线程{stream_id}退出") class StreamSimulator: def __init__(self, source_url, num_streams, shared_frame_queue): self.source_url = source_url self.num_streams = num_streams self.shared_frame_queue = shared_frame_queue self.display_queues = [multiprocessing.Queue(maxsize=2000) for _ in range(num_streams)] # 使用 multiprocessing.Queue self.stop_flag = multiprocessing.Event() self.capture_process = None def start(self): self.capture_process = multiprocessing.Process(target=self._capture_and_distribute) self.capture_process.start() def stop(self): self.stop_flag.set() if self.capture_process: self.capture_process.join(timeout=5) if self.capture_process.is_alive(): self.capture_process.terminate() print("强制终止捕获进程") def _capture_and_distribute(self): rtsp_url = self.source_url cap = cv2.VideoCapture(rtsp_url, cv2.CAP_FFMPEG) cap.set(cv2.CAP_PROP_BUFFERSIZE, 1) cap.set(cv2.CAP_PROP_FPS, 15) skip_frames = 2 # 每 2 帧处理 1 帧 frame_count = 0 try: while not self.stop_flag.is_set(): ret, frame = cap.read() if not ret: print("帧读取失败,重连中...") cap.release() time.sleep(2) cap = cv2.VideoCapture(rtsp_url, cv2.CAP_FFMPEG) continue frame_count += 1 if frame_count % skip_frames == 0: for i in range(self.num_streams): try: if not self.shared_frame_queue.full(): self.shared_frame_queue.put((frame.copy(), i, time.time()), block=False) else: print(f"共享帧队列已满,丢弃旧帧") self.shared_frame_queue.get_nowait() self.shared_frame_queue.put((frame.copy(), i, time.time()), block=False) except Exception as e: print(f"帧队列操作警告: {type(e).__name__}") finally: cap.release() self.shared_frame_queue.put(None) for q in self.display_queues: q.put(None) def dispatch_process(result_queue, display_queues): frame_buffer = [] while True: data = result_queue.get() if data is None: break # 检查是否为 'stats' 二元组 if isinstance(data, tuple) and len(data) == 2 and data[0] == 'stats': continue # 跳过中间统计数据 # 检查是否为帧数据(三元组) elif isinstance(data, tuple) and len(data) == 3: processed_frame, stream_id, capture_time = data frame_buffer.append((processed_frame, stream_id, capture_time)) frame_buffer.sort(key=lambda x: x[2]) # 按时间戳排序 for frame, sid, _ in frame_buffer: if not display_queues[sid].full(): display_queues[sid].put((frame, capture_time), block=False) else: print(f"显示队列 {sid} 已满,丢帧") frame_buffer = [] # 清空缓冲区 # 检查是否为字典(最终统计数据) elif isinstance(data, dict): continue # 跳过最终统计数据 else: print(f"警告:未知数据类型: {type(data)}") def display_process(display_queue, stream_id, latency_queue): window_name = f"Stream {stream_id}" cv2.namedWindow(window_name, cv2.WINDOW_NORMAL) cv2.resizeWindow(window_name, 800, 600) frame_count = 0 try: while True: frame_data = display_queue.get() if frame_data is None: break frame, capture_time = frame_data if frame is not None and frame.size > 0: cv2.imshow(window_name, frame) frame_count += 1 if frame_count % 10 == 0: latency = time.time() - capture_time latency_queue.put((stream_id, latency)) if cv2.getWindowProperty(window_name, cv2.WND_PROP_VISIBLE) < 1: break if cv2.waitKey(1) & 0xFF == ord('q'): break finally: cv2.destroyWindow(window_name) print(f"显示进程{stream_id}退出") def worker_process(input_queue, gpu_id, result_queue, stats_queue, monitor_interval=5): import numpy as np # 显式导入 numpy print(f"In worker process {os.getpid()}, np is {np}, type(np.empty((1,))) = {type(np.empty((1,)))}") from collections import defaultdict torch.set_num_threads(1) cv2.setNumThreads(1) torch.cuda.set_device(gpu_id) device = torch.device(f"cuda:{gpu_id}" if torch.cuda.is_available() else "cpu") processor = VideoProcessor(device) start_time = time.time() 
stats = { 'frame_count': 0, 'avg_fps': 0, 'max_gpu_mem': 0, 'process_time': 0, 'stream_id': None } frame_counts_per_stream = defaultdict(int) try: while True: frame_data = input_queue.get() if frame_data is None: break frame, stream_id, capture_time = frame_data stats['stream_id'] = stream_id # 单帧处理 start_process = time.time() results = processor.model.track(frame, imgsz=640, verbose=False) # 处理单帧 stats['process_time'] += time.time() - start_process processed_frame = processor.process_frame(frame) # 移除 cProfile if processed_frame is not None and processed_frame.size > 0: stats['frame_count'] += 1 frame_counts_per_stream[stream_id] += 1 result_queue.put((processed_frame, stream_id, capture_time)) # 定期更新统计信息 if stats['frame_count'] % monitor_interval == 0: duration = time.time() - start_time stats['avg_fps'] = stats['frame_count'] / duration if torch.cuda.is_available(): mem = torch.cuda.max_memory_allocated() / (1024 ** 2) stats['max_gpu_mem'] = max(stats['max_gpu_mem'], mem) stats['worker_pid'] = os.getpid() stats['frame_counts_per_stream'] = dict(frame_counts_per_stream) stats_queue.put(('stats', stats.copy())) except Exception as e: print(f"工作进程错误: {e}") finally: stats['worker_pid'] = os.getpid() stats['frame_counts_per_stream'] = dict(frame_counts_per_stream) stats_queue.put(stats) def get_gpu_info(): """获取GPU信息""" pynvml.nvmlInit() gpu_count = pynvml.nvmlDeviceGetCount() gpus = [] for i in range(gpu_count): handle = pynvml.nvmlDeviceGetHandleByIndex(i) name = pynvml.nvmlDeviceGetName(handle) mem = pynvml.nvmlDeviceGetMemoryInfo(handle) gpus.append({ 'id': i, 'name': name.decode('utf-8') if isinstance(name, bytes) else name, 'total_mem': mem.total / (1024 ** 2) }) pynvml.nvmlShutdown() return gpus import os import argparse def monitor_resources(gpu_id, interval=5): """资源监控线程""" pynvml.nvmlInit() handle = pynvml.nvmlDeviceGetHandleByIndex(gpu_id) while True: # GPU监控 util = pynvml.nvmlDeviceGetUtilizationRates(handle) mem_info = pynvml.nvmlDeviceGetMemoryInfo(handle) # CPU监控 cpu_percent = psutil.cpu_percent() mem = psutil.virtual_memory() print(f"\n[资源监控] GPU: {util.gpu}% 显存: {mem_info.used/1024**2:.1f}/{mem_info.total/1024**2:.1f}MB | " f"CPU: {cpu_percent}% 内存: {mem.used/1024**3:.1f}/{mem.total/1024**3:.1f}GB") time.sleep(interval) class DynamicProcessManager: def __init__(self, num_workers): self.num_workers = num_workers self.processes = [] self.result_queues = [] def start_workers(self, input_queue, gpu_id, result_queue, stats_queue): for i in range(self.num_workers): p = multiprocessing.Process( target=worker_process, args=(input_queue, gpu_id, result_queue, stats_queue) ) self.processes.append(p) p.start() def stop_workers(self): for p in self.processes: if p.is_alive(): p.terminate() try: p.join(timeout=1) except: pass if p.is_alive(): if platform.system() == "Windows": subprocess.run(['taskkill', '/F', '/PID', str(p.pid)], check=False) else: os.kill(p.pid, signal.SIGKILL) print(f"强制终止进程 {p.pid}") self.processes = [] def get_gpu_info(): pynvml.nvmlInit() gpu_count = pynvml.nvmlDeviceGetCount() gpus = [] for i in range(gpu_count): handle = pynvml.nvmlDeviceGetHandleByIndex(i) name = pynvml.nvmlDeviceGetName(handle) mem = pynvml.nvmlDeviceGetMemoryInfo(handle) gpus.append({ 'id': i, 'name': name.decode('utf-8') if isinstance(name, bytes) else name, 'total_mem': mem.total / (1024 ** 2) }) pynvml.nvmlShutdown() return gpus class ProgramMonitor: def __init__(self, gpu_id, process_manager, result_queue, stats_queue, args): self.gpu_id = gpu_id self.result_queue = result_queue 
self.stats_queue = stats_queue self.process_manager = process_manager self.args = args self.running = False self.stop_flag = threading.Event() self.data = { 'process': defaultdict(list), 'workers': defaultdict(list), 'gpu': defaultdict(list), 'fps_per_stream': defaultdict(list), 'total_fps': [], 'worker_stats': [], 'cpu_per_core': [], 'mem_bandwidth': [] } self.lock = threading.Lock() self.gpu_info = { 'arch': "Ada Lovelace", 'sm_count': 56, 'cores_per_sm': 128, 'peak_tflops': 35.6 } self.total_frame_counts = defaultdict(int) self.last_frame_counts = defaultdict(lambda: defaultdict(int)) self.start_time = None self.stop_time = None self.last_mem_time = time.time() self.last_mem_bytes = psutil.virtual_memory().used def start(self): pynvml.nvmlInit() self.handle = pynvml.nvmlDeviceGetHandleByIndex(self.gpu_id) self.running = True self.start_time = time.time() self.thread = Thread(target=self._monitor, daemon=True) self.thread.start() def _monitor(self): last_cpu_times = {} while not self.stop_flag.is_set(): try: # Process stats from stats_queue try: data = self.stats_queue.get_nowait() if isinstance(data, tuple) and data[0] == 'stats': stats = data[1] worker_pid = stats['worker_pid'] frame_counts_per_stream = stats['frame_counts_per_stream'] with self.lock: for stream_id, count in frame_counts_per_stream.items(): delta = count - self.last_frame_counts[worker_pid][stream_id] self.total_frame_counts[stream_id] += delta self.last_frame_counts[worker_pid][stream_id] = count except queue.Empty: pass # Main process monitoring main_process = psutil.Process(os.getpid()) with main_process.oneshot(): current_cpu_time = main_process.cpu_times() pid = main_process.pid if pid in last_cpu_times: cpu_usage = self._calculate_cpu_usage(last_cpu_times[pid], current_cpu_time) self.data['process']['cpu'].append(cpu_usage) last_cpu_times[pid] = current_cpu_time self.data['process']['mem'].append(main_process.memory_info().rss / (1024 ** 2)) self.data['process']['threads'].append(main_process.num_threads()) # Worker processes monitoring for p in self.process_manager.processes: try: proc = psutil.Process(p.pid) with proc.oneshot(): current_cpu_time = proc.cpu_times() pid = p.pid if pid in last_cpu_times: cpu_usage = self._calculate_cpu_usage(last_cpu_times[pid], current_cpu_time) self.data['workers']['cpu'].append(cpu_usage) last_cpu_times[pid] = current_cpu_time self.data['workers']['mem'].append(proc.memory_info().rss / (1024 ** 2)) self.data['workers']['threads'].append(proc.num_threads()) except (psutil.NoSuchProcess, psutil.AccessDenied): continue # Memory bandwidth monitoring current_time = time.time() current_mem_bytes = psutil.virtual_memory().used time_delta = current_time - self.last_mem_time if time_delta > 0: mem_bandwidth = (current_mem_bytes - self.last_mem_bytes) / time_delta / (1024 ** 2) with self.lock: self.data['mem_bandwidth'].append(mem_bandwidth) self.last_mem_time = current_time self.last_mem_bytes = current_mem_bytes # CPU per core monitoring cpu_per_core = psutil.cpu_percent(percpu=True) with self.lock: self.data['cpu_per_core'].append(cpu_per_core) self._monitor_gpu() except Exception as e: print(f"监控错误: {str(e)}") time.sleep(0.5) def _monitor_gpu(self): try: util = pynvml.nvmlDeviceGetUtilizationRates(self.handle) mem_info = pynvml.nvmlDeviceGetMemoryInfo(self.handle) clock_mhz = pynvml.nvmlDeviceGetClockInfo(self.handle, pynvml.NVML_CLOCK_SM) current_tflops = (self.gpu_info['sm_count'] * (clock_mhz / 1000) * self.gpu_info['cores_per_sm'] * 2) / 1000 with self.lock: 
self.data['gpu']['util'].append(util.gpu) self.data['gpu']['mem'].append(mem_info.used / (1024 ** 2)) self.data['gpu']['tflops'].append(current_tflops) except pynvml.NVMLError as e: print(f"GPU监控错误: {str(e)}") def stop(self): self.stop_time = time.time() self.running = False self.stop_flag.set() if self.thread.is_alive(): self.thread.join(timeout=2) report = self.generate_report() pynvml.nvmlShutdown() return report def generate_report(self): report = "\n=== 程序资源使用报告 ===\n" # System information (unchanged) report += "\n[系统信息]\n" report += f"- CPU核心数: {psutil.cpu_count(logical=False)}物理/{psutil.cpu_count()}逻辑\n" report += f"- 系统内存: {psutil.virtual_memory().total / (1024**3):.1f}GB\n" report += f"- 系统CPU使用率: {psutil.cpu_percent(interval=1):.1f}%\n" report += f"- 系统内存使用: {psutil.virtual_memory().used / (1024**3):.1f}GB / {psutil.virtual_memory().total / (1024**3):.1f}GB\n" gpu_name_raw = pynvml.nvmlDeviceGetName(self.handle) gpu_name = gpu_name_raw.decode('utf-8') if isinstance(gpu_name_raw, bytes) else gpu_name_raw total_gpu_mem = pynvml.nvmlDeviceGetMemoryInfo(self.handle).total / (1024 ** 2) report += f"- GPU型号: {gpu_name}\n" report += f"- GPU总显存: {total_gpu_mem:.1f}MB\n" # Main process stats (unchanged) if self.data['process']['cpu']: report += "\n[主进程资源]\n" report += f"- 平均CPU使用率: {statistics.mean(self.data['process']['cpu']):.1f}%\n" report += f"- 峰值CPU使用率: {max(self.data['process']['cpu']):.1f}%\n" report += f"- 平均内存占用: {statistics.mean(self.data['process']['mem']):.1f}MB\n" report += f"- 峰值内存占用: {max(self.data['process']['mem']):.1f}MB\n" report += f"- 线程数: {max(self.data['process']['threads'])}\n" # Worker processes stats (unchanged except for FPS section) if self.data['workers']['cpu']: num_workers = min(self.args.streams * 4, psutil.cpu_count(logical=True) * 2) num_samples = len(self.data['workers']['cpu']) // num_workers if num_samples > 0: worker_cpu_per_sample = [self.data['workers']['cpu'][i*num_workers:(i+1)*num_workers] for i in range(num_samples)] worker_mem_per_sample = [self.data['workers']['mem'][i*num_workers:(i+1)*num_workers] for i in range(num_samples)] worker_threads_per_sample = [self.data['workers']['threads'][i*num_workers:(i+1)*num_workers] for i in range(num_samples)] avg_worker_cpu = statistics.mean([statistics.mean(sample) for sample in worker_cpu_per_sample]) total_worker_cpu = statistics.mean([sum(sample) for sample in worker_cpu_per_sample]) avg_worker_mem = statistics.mean([statistics.mean(sample) for sample in worker_mem_per_sample]) total_worker_mem = statistics.mean([sum(sample) for sample in worker_mem_per_sample]) max_total_worker_threads = max([sum(sample) for sample in worker_threads_per_sample]) report += f"\n[工作进程资源 ({num_workers}个)]\n" report += f"- 平均CPU使用率(每个进程): {avg_worker_cpu:.1f}%\n" report += f"- 总CPU使用率: {total_worker_cpu:.1f}%\n" report += f"- 平均内存占用(每个进程): {avg_worker_mem:.1f}MB\n" report += f"- 总内存占用: {total_worker_mem:.1f}MB\n" report += f"- 总线程数(峰值): {max_total_worker_threads}\n" # Video stream performance with accurate FPS if self.total_frame_counts: elapsed_time = self.stop_time - self.start_time report += "\n[视频流性能]\n" for stream_id in range(self.args.streams): if stream_id in self.total_frame_counts: avg_fps = self.total_frame_counts[stream_id] / elapsed_time report += f"- 视频流 {stream_id}: 平均 FPS {avg_fps:.1f}\n" total_frames = sum(self.total_frame_counts.values()) total_fps = total_frames / elapsed_time report += f"- 总吞吐量: {total_fps:.1f} FPS\n" # CPU per core (unchanged) if self.data.get('cpu_per_core'): avg_cpu_per_core = 
[statistics.mean([sample[i] for sample in self.data['cpu_per_core']]) for i in range(len(self.data['cpu_per_core'][0]))] overall_avg_cpu = statistics.mean(avg_cpu_per_core) report += "\n[CPU 硬件线程利用率]\n" for i, avg in enumerate(avg_cpu_per_core): report += f"- 逻辑处理器 {i}: {avg:.1f}%\n" report += f"- 16 个硬件线程平均利用率: {overall_avg_cpu:.1f}%\n" # Total process stats (unchanged) if self.data['process']['cpu'] and self.data['workers']['cpu']: num_display_processes = self.args.streams total_cpu = statistics.mean(self.data['process']['cpu']) + total_worker_cpu total_mem = statistics.mean(self.data['process']['mem']) + total_worker_mem total_threads = max(self.data['process']['threads']) + max_total_worker_threads total_processes = 1 + num_workers + num_display_processes + 1 report += "\n[所有进程总计]\n" report += f"- 总CPU使用率: {total_cpu:.1f}%\n" report += f"- 总内存占用: {total_mem:.1f}MB\n" report += f"- 总线程数: {total_threads}\n" report += f"- 总进程数: {total_processes}(1个主进程 + {num_workers}个工作进程 + {num_display_processes}个显示进程 + 1个分发进程)\n" # GPU stats (unchanged) if self.data['gpu']['tflops']: avg_tflops = statistics.mean(self.data['gpu']['tflops']) util_percent = min((avg_tflops / self.gpu_info['peak_tflops']) * 100, 100.0) report += "\n[GPU资源]\n" report += f"- 平均利用率: {statistics.mean(self.data['gpu']['util']):.1f}%\n" report += f"- 峰值显存: {max(self.data['gpu']['mem']):.1f}MB\n" report += f"- 平均算力: {avg_tflops:.1f}/{self.gpu_info['peak_tflops']} TFLOPS\n" report += f"- 算力利用率: {util_percent:.1f}%\n" # Memory bandwidth (unchanged) if self.data.get('mem_bandwidth'): avg_mem_bandwidth = statistics.mean(self.data['mem_bandwidth']) max_mem_bandwidth = max(self.data['mem_bandwidth']) report += "\n[存储器带宽]\n" report += f"- 平均内存带宽: {avg_mem_bandwidth:.1f} MB/s\n" report += f"- 峰值内存带宽: {max_mem_bandwidth:.1f} MB/s\n" return report def _calculate_cpu_usage(self, prev_times, curr_times): """ 计算基于前后的 CPU 时间的使用率百分比。 参数: prev_times: 上一次的 CPU 时间(psutil.cpu_times 对象) curr_times: 当前的 CPU 时间(psutil.cpu_times 对象) 返回: CPU 使用率(百分比) """ delta_user = curr_times.user - prev_times.user delta_system = curr_times.system - prev_times.system delta_total = (curr_times.user + curr_times.system) - (prev_times.user + prev_times.system) if delta_total > 0: cpu_usage = ((delta_user + delta_system) / delta_total) * 100 else: cpu_usage = 0.0 return cpu_usage # _monitor_gpu and _calculate_cpu_usage remain unchanged def main(): parser = argparse.ArgumentParser() parser.add_argument('--streams', type=int, default=1) parser.add_argument('--source', type=str, default="") parser.add_argument('--gpu_id', type=int, default=0) args = parser.parse_args() camera_config = { 'username': 'admin', 'password': 'guoxinzhike901' } source_url = args.source if args.source else \ f"rtsp://{camera_config['username']}:{camera_config['password']}@192.168.1.108/" gpus = get_gpu_info() print("\n[硬件配置]") print(f"- CPU核心: {psutil.cpu_count(logical=False)}物理/{psutil.cpu_count()}逻辑") print(f"- 内存: {psutil.virtual_memory().total / (1024**3):.1f}GB") print(f"- 使用GPU {args.gpu_id}: {gpus[args.gpu_id]['name']}") print(f" 显存: {gpus[args.gpu_id]['total_mem']:.1f}MB") os.environ['OMP_NUM_THREADS'] = '1' os.environ['MKL_NUM_THREADS'] = '1' print(f"\n[测试配置]") print(f"- 模拟视频流数: {args.streams}") print(f"- 视频源: {source_url}") # 创建共享队列 frame_queue_size = max(2000, 200 * args.streams) shared_frame_queue = multiprocessing.Queue(maxsize=frame_queue_size) display_queue_size = max(50, 20 * args.streams) shared_result_queue = multiprocessing.Queue(maxsize=2000) stats_queue = multiprocessing.Queue() 
# New queue for stats # 固定工作进程数为 16 num_workers = 1 #min(args.streams * 8, psutil.cpu_count(logical=True) * 2) process_mgr = DynamicProcessManager(num_workers) simulator = StreamSimulator(source_url, args.streams, shared_frame_queue) monitor = ProgramMonitor(args.gpu_id, process_mgr, shared_result_queue, stats_queue, args) monitor.args = args # 传递 args latency_queue = multiprocessing.Queue() # 启动工作进程 process_mgr.start_workers(shared_frame_queue, args.gpu_id, shared_result_queue, stats_queue) # 启动分发进程 dispatch_p = multiprocessing.Process( target=dispatch_process, args=(shared_result_queue, simulator.display_queues), daemon=True ) dispatch_p.start() simulator.start() monitor.start() # 启动显示进程 display_threads = [] for i in range(args.streams): t = Thread(target=display_thread, args=(simulator.display_queues[i], i+1, latency_queue)) display_threads.append(t) t.start() time.sleep(0.5) print("\n[测试开始] 程序将运行30秒...") start_time = time.time() end_time = start_time + 60*5 try: while time.time() < end_time: time.sleep(1) remaining = int(end_time - time.time()) if remaining % 10 == 0 or remaining <= 5: print(f"剩余时间: {remaining}秒") finally: runtime = time.time() - start_time print(f"\n[测试完成] 实际运行时间: {runtime:.1f}秒") print("停止模拟器...") simulator.stop() print("生成报告并停止监控...") report = monitor.stop() print("停止工作进程...") process_mgr.stop_workers() # 停止显示线程 for q in simulator.display_queues: q.put(None) for t in display_threads: t.join() # 停止分发进程 shared_result_queue.put(None) dispatch_p.join(timeout=5) if dispatch_p.is_alive(): dispatch_p.terminate() # 收集延迟测量 latencies = [] while not latency_queue.empty(): try: stream_id, latency = latency_queue.get_nowait() latencies.append(latency) except queue.Empty: break if latencies: min_latency = min(latencies) max_latency = max(latencies) avg_latency = sum(latencies) / len(latencies) report += f"\n[延迟统计]\n" report += f"- 测量次数: {len(latencies)}\n" report += f"- 最低延迟: {min_latency:.3f}秒\n" report += f"- 最高延迟: {max_latency:.3f}秒\n" report += f"- 平均延迟: {avg_latency:.3f}秒\n" else: report += "\n[延迟统计]\n- 无延迟数据\n" if torch.cuda.is_available(): torch.cuda.empty_cache() print(report) if __name__ == '__main__': multiprocessing.set_start_method('spawn') # multiprocessing.set_start_method('fork') # Linux 默认方法 main() # 测试4路视频流 # python det_ocr_shipinliu_pre.py --streams 1 --gpu_id 0 """ === 程序资源使用报告 === === 程序资源使用报告 === [系统信息] - CPU核心数: 10物理/16逻辑 - 系统内存: 63.8GB - 系统CPU使用率: 14.1% - 系统内存使用: 26.3GB / 63.8GB - GPU型号: NVIDIA GeForce RTX 4070 SUPER - GPU总显存: 12282.0MB [主进程资源] - 平均CPU使用率: 16.3% - 峰值CPU使用率: 28.1% - 平均内存占用: 385.1MB - 峰值内存占用: 385.7MB - 线程数: 9 [工作进程资源 (16个)] - 平均CPU使用率(每个进程): 22.1% - 总CPU使用率: 354.2% - 平均内存占用(每个进程): 801.5MB - 总内存占用: 12823.3MB - 总线程数(峰值): 304 [所有进程总计] - 总CPU使用率: 370.5% - 总内存占用: 13208.4MB - 总线程数: 313 - 总进程数: 19(1个主进程 + 16个工作进程 + 1个显示进程 + 1个分发进程) [GPU资源] - 平均利用率: 31.3% - 峰值显存: 8226.7MB - 平均算力: 22.7/35.6 TFLOPS - 算力利用率: 63.8% [延迟统计] - 测量次数: 67 - 最低延迟: 0.024秒 - 最高延迟: 2.499秒 - 平均延迟: 0.287秒 """ # python det_ocr_shipinliu_pre.py --streams 2 --gpu_id 0 """ === 程序资源使用报告 === [系统信息] - CPU核心数: 10物理/16逻辑 - 系统内存: 63.8GB - 系统CPU使用率: 9.8% - 系统内存使用: 26.3GB / 63.8GB - GPU型号: NVIDIA GeForce RTX 4070 SUPER - GPU总显存: 12282.0MB [主进程资源] - 平均CPU使用率: 15.3% - 峰值CPU使用率: 40.6% - 平均内存占用: 386.4MB - 峰值内存占用: 387.1MB - 线程数: 9 [工作进程资源 (16个)] - 平均CPU使用率(每个进程): 20.8% - 总CPU使用率: 333.1% - 平均内存占用(每个进程): 960.3MB - 总内存占用: 15364.2MB - 总线程数(峰值): 328 [所有进程总计] - 总CPU使用率: 348.4% - 总内存占用: 15750.6MB - 总线程数: 337 - 总进程数: 20(1个主进程 + 16个工作进程 + 2个显示进程 + 1个分发进程) [GPU资源] - 平均利用率: 50.5% - 峰值显存: 8328.6MB - 平均算力: 
12.6/35.6 TFLOPS - 算力利用率: 35.4% [延迟统计] - 测量次数: 327 - 最低延迟: 0.027秒 - 最高延迟: 0.757秒 - 平均延迟: 0.080秒 """ # python det_ocr_shipinliu_pre.py --streams 3 --gpu_id 0 """ [系统信息] - CPU核心数: 10物理/16逻辑 - 系统内存: 63.8GB - 系统CPU使用率: 9.5% - 系统内存使用: 26.2GB / 63.8GB - GPU型号: NVIDIA GeForce RTX 4070 SUPER - GPU总显存: 12282.0MB [主进程资源] - 平均CPU使用率: 26.2% - 峰值CPU使用率: 53.1% - 平均内存占用: 386.1MB - 峰值内存占用: 386.6MB - 线程数: 9 [工作进程资源 (16个)] - 平均CPU使用率(每个进程): 43.9% - 总CPU使用率: 702.5% - 平均内存占用(每个进程): 1018.8MB - 总内存占用: 16301.3MB - 总线程数(峰值): 322 [所有进程总计] - 总CPU使用率: 728.7% - 总内存占用: 16687.5MB - 总线程数: 331 - 总进程数: 21(1个主进程 + 16个工作进程 + 3个显示进程 + 1个分发进程) [GPU资源] - 平均利用率: 52.2% - 峰值显存: 7861.9MB - 平均算力: 18.9/35.6 TFLOPS - 算力利用率: 53.1% [延迟统计] - 测量次数: 327 - 最低延迟: 0.030秒 - 最高延迟: 3.756秒 - 平均延迟: 1.077秒 """ # python det_ocr_shipinliu_pre.py --streams 4 --gpu_id 0 cpu100 """ === 程序资源使用报告 === [系统信息] - CPU核心数: 10物理/16逻辑 - 系统内存: 63.8GB - 系统CPU使用率: 58.6% - 系统内存使用: 36.3GB / 63.8GB - GPU型号: NVIDIA GeForce RTX 4070 SUPER - GPU总显存: 12282.0MB [主进程资源] - 平均CPU使用率: 28.0% - 峰值CPU使用率: 53.1% - 平均内存占用: 386.4MB - 峰值内存占用: 386.8MB - 线程数: 9 [工作进程资源 (16个)] - 平均CPU使用率(每个进程): 48.0% - 总CPU使用率: 768.7% - 平均内存占用(每个进程): 1585.2MB - 总内存占用: 25363.6MB - 总线程数(峰值): 320 [所有进程总计] - 总CPU使用率: 796.7% - 总内存占用: 25750.1MB - 总线程数: 329 - 总进程数: 22(1个主进程 + 16个工作进程 + 4个显示进程 + 1个分发进程) [GPU资源] - 平均利用率: 52.9% - 峰值显存: 7991.3MB - 平均算力: 20.2/35.6 TFLOPS - 算力利用率: 56.8% [延迟统计] - 测量次数: 327 - 最低延迟: 1.480秒 - 最高延迟: 14.222秒 - 平均延迟: 8.113秒 """ # python det_ocr_shipinliu_pre.py --streams 5 --gpu_id 0 """ """ # python det_ocr_shipinliu_pre.py --streams 16 --gpu_id 0 """ """ # python det_ocr_shipinliu_pre.py --streams 20 --gpu_id 0 """ """ (yolov8_bt) (base) zhang@zhang:~/danger/yolov7_crnn_ocr_detection$ python det_ocr_shipinliu_pre.py --streams 1 --gpu_id 0 [硬件配置] - CPU核心: 10物理/16逻辑 - 内存: 62.6GB - 使用GPU 0: NVIDIA GeForce RTX 4070 SUPER 显存: 12282.0MB [测试配置] - 模拟视频流数: 1 - 视频源: rtsp://admin:guoxinzhike901@192.168.1.108/ [测试开始] 程序将运行30秒... In worker process 35804, np is <module 'numpy' from '/home/zhang/miniconda3/envs/yolov8_bt/lib/python3.9/site-packages/numpy/__init__.py'>, type(np.empty((1,))) = <class 'numpy.ndarray'> Loading weights/best.engine for TensorRT inference... [06/07/2025-18:54:11] [TRT] [I] Loaded engine size: 39 MiB [06/07/2025-18:54:11] [TRT] [W] Using an engine plan file across different models of devices is not recommended and is likely to affect performance or even cause errors. [06/07/2025-18:54:13] [TRT] [I] [MemUsageChange] TensorRT-managed allocation in engine deserialization: CPU +0, GPU +33, now: CPU 0, GPU 33 (MiB) [06/07/2025-18:54:13] [TRT] [I] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +39, now: CPU 0, GPU 72 (MiB) 剩余时间: 290秒 剩余时间: 280秒 剩余时间: 270秒 (yolov8_bt) (base) zhang@zhang:~/danger/yolov7_crnn_ocr_detection$ python -c "import numpy as np; print(np.__version__)" 1.23.0 代码运行后报错

因为可视化之后的图片中红框很大,绿框完全没有包含病害,所以修改了锚框的大小 import numpy as np import torch import torchvision from torch.optim import lr_scheduler from torchvision.models.detection import fasterrcnn_resnet50_fpn from torchvision.models.detection.rpn import AnchorGenerator from torchvision.transforms import Compose, ToTensor, Normalize from torch.utils.data import DataLoader import torch.optim as optim from pycocotools.coco import COCO from pycocotools.cocoeval import COCOeval import os import matplotlib.pyplot as plt import cv2 from torch.utils.data import Dataset os.environ["KMP_DUPLICATE_LIB_OK"] = "TRUE" # ===================== 1. 修正后的CocoDataset ===================== class CocoDataset(Dataset): def __init__(self, img_dir, ann_file, transform=None): self.img_dir = img_dir self.coco = COCO(ann_file) self.img_ids = self.coco.getImgIds() self.transform = transform self.resize_size = (640, 640) def resize_image(self, img, target): orig_h, orig_w = img.shape[:2] new_w, new_h = self.resize_size # 等比例缩放+补边,避免小目标变形 scale = min(new_w / orig_w, new_h / orig_h) resize_w = int(orig_w * scale) resize_h = int(orig_h * scale) img = cv2.resize(img, (resize_w, resize_h)) pad_w = (new_w - resize_w) // 2 pad_h = (new_h - resize_h) // 2 img = cv2.copyMakeBorder(img, pad_h, pad_h, pad_w, pad_w, cv2.BORDER_CONSTANT, value=0) # 修正标注框 if len(target['boxes']) > 0: boxes = target['boxes'].numpy() boxes[:, [0, 2]] *= scale boxes[:, [1, 3]] *= scale boxes[:, [0, 2]] += pad_w boxes[:, [1, 3]] += pad_h valid = (boxes[:, 2] - boxes[:, 0] > 1) & (boxes[:, 3] - boxes[:, 1] > 1) valid &= (boxes[:, 0] >= 0) & (boxes[:, 1] >= 0) & (boxes[:, 2] <= new_w) & (boxes[:, 3] <= new_h) boxes = boxes[valid] target['boxes'] = torch.as_tensor(boxes, dtype=torch.float32) target['labels'] = target['labels'][valid] target['height'] = torch.tensor(new_h) target['width'] = torch.tensor(new_w) return img, target def __getitem__(self, idx): img_id = self.img_ids[idx] img_info = self.coco.loadImgs(img_id)[0] img_path = f"{self.img_dir}/{img_info['file_name']}" img = cv2.imread(img_path) if img is None: raise FileNotFoundError(f"图像不存在:{img_path}") img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB) ann_ids = self.coco.getAnnIds(imgIds=img_id) anns = self.coco.loadAnns(ann_ids) boxes, labels = [], [] for ann in anns: if ann['category_id'] != 1: continue x1, y1, w, h = ann['bbox'] if w <= 0 or h <= 0: continue x2 = x1 + w y2 = y1 + h boxes.append([x1, y1, x2, y2]) labels.append(1) boxes = torch.as_tensor(boxes, dtype=torch.float32) if boxes else torch.empty((0, 4)) labels = torch.as_tensor(labels, dtype=torch.int64) if labels else torch.empty(0) target = {'boxes': boxes, 'labels': labels, 'image_id': torch.tensor([img_id])} img, target = self.resize_image(img, target) if self.transform: img = self.transform(img) return img, target def __len__(self): return len(self.img_ids) # ===================== 2. 
可视化函数(修正反归一化) ===================== def visualize_predictions(image, targets, outputs, idx=0): img = image[idx].permute(1, 2, 0).cpu().numpy() img = (img * np.array([0.229, 0.224, 0.225])) + np.array([0.485, 0.456, 0.406]) img = np.clip(img, 0, 1) fig, ax = plt.subplots(1, 1, figsize=(12, 12)) ax.imshow(img) # 绘制真实框 gt_boxes = targets[idx]['boxes'].cpu() for box in gt_boxes: x1, y1, x2, y2 = box rect = plt.Rectangle((x1, y1), x2 - x1, y2 - y1, fill=False, color='green', linewidth=2) ax.add_patch(rect) # 绘制预测框 if len(outputs) > idx: pred = outputs[idx] keep = pred['scores'] > 0.1 # 降低阈值,显示更多小目标 pred_boxes = pred['boxes'][keep].cpu() pred_labels = pred['labels'][keep].cpu() pred_scores = pred['scores'][keep].cpu() for box, label, score in zip(pred_boxes, pred_labels, pred_scores): x1, y1, x2, y2 = box rect = plt.Rectangle((x1, y1), x2 - x1, y2 - y1, fill=False, color='red', linewidth=2) ax.text(x1, y1, f'{label}:{score:.2f}', color='red', fontsize=12) ax.add_patch(rect) plt.title("Green=GT, Red=Pred") plt.axis('off') plt.show() # ===================== 3. 主训练逻辑(无权重加载) ===================== os.makedirs('checkpoints', exist_ok=True) # 数据加载 transform = Compose([ToTensor(), Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])]) train_dataset = CocoDataset( img_dir=r'D:\Yolov8\coco_dataset1\images\train', ann_file=r'D:\Yolov8\coco_dataset1\annotations\train.json', transform=transform ) val_dataset = CocoDataset( img_dir=r'D:\Yolov8\coco_dataset1\images\val', ann_file=r'D:\Yolov8\coco_dataset1\annotations\val.json', transform=transform ) def collate_fn(batch): return tuple(zip(*batch)) train_loader = DataLoader(train_dataset, batch_size=2, shuffle=True, collate_fn=collate_fn) val_loader = DataLoader(val_dataset, batch_size=1, shuffle=False, collate_fn=collate_fn) # 模型初始化(自定义小目标锚框,仅主干网络预训练) num_classes = 2 anchor_sizes = ((20, 40, 60),) # 适配小目标尺寸 aspect_ratios = ((0.5, 1.0, 2.0),) # 适配小目标宽高比 anchor_generator = AnchorGenerator(sizes=anchor_sizes, aspect_ratios=aspect_ratios) # 关键:仅主干网络加载预训练,RPN层随机初始化(无参数冲突) model = fasterrcnn_resnet50_fpn( pretrained=True, # 仅backbone预训练 rpn_anchor_generator=anchor_generator, min_size=640, max_size=640 ) in_features = model.roi_heads.box_predictor.cls_score.in_features model.roi_heads.box_predictor = torchvision.models.detection.faster_rcnn.FastRCNNPredictor(in_features, num_classes) device = torch.device('cuda') if torch.cuda.is_available() else torch.device('cpu') model = model.to(device) print(f"使用设备:{device}") # 优化器(统一学习率,适配小目标) optimizer = optim.SGD( model.parameters(), lr=1e-3, momentum=0.9, weight_decay=5e-4, nesterov=True ) scheduler = lr_scheduler.StepLR(optimizer, step_size=10, gamma=0.5) # 更频繁的学习率衰减 # 训练循环(20轮,确保小目标特征学习充分) num_epochs = 20 for epoch in range(num_epochs): model.train() total_loss = 0.0 total_cls_loss = 0.0 total_reg_loss = 0.0 for batch_idx, (images, targets) in enumerate(train_loader): images = [img.to(device) for img in images] targets = [{k: v.to(device) for k, v in t.items()} for t in targets] loss_dict = model(images, targets) losses = sum(loss for loss in loss_dict.values()) optimizer.zero_grad() losses.backward() optimizer.step() total_loss += losses.item() total_cls_loss += loss_dict['loss_classifier'].item() total_reg_loss += loss_dict['loss_box_reg'].item() if (batch_idx + 1) % 10 == 0: print( f"Epoch [{epoch + 1}/{num_epochs}], " f"Batch [{batch_idx + 1}/{len(train_loader)}], " f"Loss: {losses.item():.4f}, " f"Clf Loss: {loss_dict['loss_classifier'].item():.4f}, " f"Reg Loss: {loss_dict['loss_box_reg'].item():.4f}" ) 
avg_loss = total_loss / len(train_loader) avg_cls = total_cls_loss / len(train_loader) avg_reg = total_reg_loss / len(train_loader) print(f"Epoch {epoch + 1} - Avg Loss: {avg_loss:.4f}, Cls: {avg_cls:.4f}, Reg: {avg_reg:.4f}") scheduler.step() # 保存模型(仅保存当前训练的权重,无冲突) torch.save(model.state_dict(), f'checkpoints/faster_rcnn_epoch_{epoch + 1}.pth') # ===================== 4. 评估+可视化 ===================== print("\n开始评估模型...") model.eval() results = [] # 可视化验证集第一个样本 print("可视化验证集预测结果...") images, targets = next(iter(val_loader)) images = [img.to(device) for img in images] with torch.no_grad(): outputs = model(images) for output in outputs: print("预测标签:", output['labels'].cpu().numpy()) print("预测置信度:", output['scores'].cpu().numpy()) print("预测框坐标:", output['boxes'].cpu().numpy()) visualize_predictions(images, targets, outputs, idx=0) # 生成评估结果 with torch.no_grad(): for images, targets in val_loader: images = [img.to(device) for img in images] outputs = model(images) for output, target in zip(outputs, targets): img_id = target['image_id'].item() boxes = output['boxes'].cpu().numpy() scores = output['scores'].cpu().numpy() labels = output['labels'].cpu().numpy() for box, score, label in zip(boxes, scores, labels): if score < 0.01: # 极低阈值,保留所有可能框 continue x1, y1, x2, y2 = box w = x2 - x1 h = y2 - y1 if w <= 0 or h <= 0: continue results.append({ 'image_id': img_id, 'category_id': int(label), 'bbox': [float(x1), float(y1), float(w), float(h)], 'score': float(score) }) if len(results) == 0: print("⚠️ 警告:无有效预测结果!") else: cocoGt = val_dataset.coco if 'info' not in cocoGt.dataset: cocoGt.dataset['info'] = {'version': '1.0'} try: cocoDt = cocoGt.loadRes(results) cocoEval = COCOeval(cocoGt, cocoDt, iouType='bbox') cocoEval.evaluate() cocoEval.accumulate() cocoEval.summarize() print("✅ 评估完成") print(f"mAP@0.5:0.95 = {cocoEval.stats[0]:.4f}") print(f"mAP@0.5 = {cocoEval.stats[1]:.4f}") print(f"Recall@0.5 = {cocoEval.stats[6]:.4f}") except Exception as e: print("❌ 评估失败:", str(e)) 报错C:\Users\YangGuang\.conda\envs\pytorch\python.exe D:\Yolov8\.github\Faster-Rcnn\faster-rcnn.py loading annotations into memory... Done (t=0.00s) creating index... index created! loading annotations into memory... Done (t=0.00s) creating index... index created! C:\Users\YangGuang\.conda\envs\pytorch\Lib\site-packages\torchvision\models\_utils.py:208: UserWarning: The parameter 'pretrained' is deprecated since 0.13 and may be removed in the future, please use 'weights' instead. warnings.warn( C:\Users\YangGuang\.conda\envs\pytorch\Lib\site-packages\torchvision\models\_utils.py:223: UserWarning: Arguments other than a weight enum or None for 'weights' are deprecated since 0.13 and may be removed in the future. The current behavior is equivalent to passing weights=FasterRCNN_ResNet50_FPN_Weights.COCO_V1. You can also use weights=FasterRCNN_ResNet50_FPN_Weights.DEFAULT to get the most up-to-date weights. 
  warnings.warn(msg)
Traceback (most recent call last):
  File "D:\Yolov8\.github\Faster-Rcnn\faster-rcnn.py", line 163, in <module>
    model = fasterrcnn_resnet50_fpn(
            ^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\YangGuang\.conda\envs\pytorch\Lib\site-packages\torchvision\models\_utils.py", line 142, in wrapper
    return fn(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^
  File "C:\Users\YangGuang\.conda\envs\pytorch\Lib\site-packages\torchvision\models\_utils.py", line 228, in inner_wrapper
    return builder(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\YangGuang\.conda\envs\pytorch\Lib\site-packages\torchvision\models\detection\faster_rcnn.py", line 577, in fasterrcnn_resnet50_fpn
    model.load_state_dict(weights.get_state_dict(progress=progress, check_hash=True))
  File "C:\Users\YangGuang\.conda\envs\pytorch\Lib\site-packages\torch\nn\modules\module.py", line 2584, in load_state_dict
    raise RuntimeError(
RuntimeError: Error(s) in loading state_dict for FasterRCNN:
    size mismatch for rpn.head.cls_logits.weight: copying a param with shape torch.Size([3, 256, 1, 1]) from checkpoint, the shape in current model is torch.Size([9, 256, 1, 1]).
    size mismatch for rpn.head.cls_logits.bias: copying a param with shape torch.Size([3]) from checkpoint, the shape in current model is torch.Size([9]).
    size mismatch for rpn.head.bbox_pred.weight: copying a param with shape torch.Size([12, 256, 1, 1]) from checkpoint, the shape in current model is torch.Size([36, 256, 1, 1]).
    size mismatch for rpn.head.bbox_pred.bias: copying a param with shape torch.Size([12]) from checkpoint, the shape in current model is torch.Size([36]).

进程已结束,退出代码为 1
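上面这个报错的直接原因是:自定义的 `AnchorGenerator` 用了 `((20, 40, 60),)` × `((0.5, 1.0, 2.0),)`,每个位置会生成 3×3=9 个 anchor,而 COCO 预训练权重的 RPN 头是按每个位置 3 个 anchor 保存的,所以 `rpn.head.cls_logits`(3→9)和 `rpn.head.bbox_pred`(12→36)的形状对不上。代码注释里写的"仅主干网络加载预训练"并不成立——`pretrained=True` 加载的是整套 COCO 检测权重。正确做法是用 `weights_backbone` 只加载 backbone 预训练(torchvision ≥ 0.13 的写法),让 RPN/ROI 头随机初始化;另外 FPN 输出 5 个特征层,`sizes` 的元组个数需要与特征层数一致。下面是一个参考改法(anchor 尺寸为示例值,需按自己的小目标数据调整):

```python
from torchvision.models import ResNet50_Weights
from torchvision.models.detection import fasterrcnn_resnet50_fpn
from torchvision.models.detection.anchor_utils import AnchorGenerator
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor

# FPN 有 5 个特征层,sizes/aspect_ratios 的长度必须与之对应
anchor_sizes = ((16,), (24,), (32,), (48,), (64,))          # 示例值,按数据集目标尺寸调整
aspect_ratios = ((0.5, 1.0, 2.0),) * len(anchor_sizes)
anchor_generator = AnchorGenerator(sizes=anchor_sizes, aspect_ratios=aspect_ratios)

model = fasterrcnn_resnet50_fpn(
    weights=None,                                     # 不加载整套 COCO 检测权重,避免 RPN 头形状冲突
    weights_backbone=ResNet50_Weights.IMAGENET1K_V1,  # 仅 backbone 使用 ImageNet 预训练
    rpn_anchor_generator=anchor_generator,
    min_size=640, max_size=640,
)
in_features = model.roi_heads.box_predictor.cls_score.in_features
model.roi_heads.box_predictor = FastRCNNPredictor(in_features, num_classes=2)
```

如果确实想沿用 COCO 的 RPN 预训练权重,也可以保持"每层 1 个尺寸 × 3 个宽高比"的结构(每个位置仍是 3 个 anchor),只把 `sizes` 的数值改小,这样权重形状就能对上。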

D:\create\conda1\envs\python37\python.exe D:\create\programm\yolov5-master\train.py
github: skipping check (not a git repository), for updates see https://github.com/ultralytics/yolov5
train: weights=yolov5s.pt, cfg=models/yolov5s.yaml, data=data\sign.yaml, hyp=data\hyps\hyp.scratch-low.yaml, epochs=300, batch_size=16, imgsz=640, rect=False, resume=False, nosave=False, noval=False, noautoanchor=False, noplots=False, evolve=None, evolve_population=data\hyps, resume_evolve=None, bucket=, cache=None, image_weights=False, device=0, multi_scale=False, single_cls=False, optimizer=SGD, sync_bn=False, workers=8, project=runs\train, name=exp, exist_ok=False, quad=False, cos_lr=False, label_smoothing=0.0, patience=100, freeze=[0], save_period=-1, seed=0, local_rank=-1, entity=None, upload_dataset=False, bbox_interval=-1, artifact_alias=latest, ndjson_console=False, ndjson_file=False
YOLOv5 2025-8-14 Python-3.13.5 torch-2.6.0+cu126 CUDA:0 (NVIDIA GeForce RTX 4060 Laptop GPU, 8188MiB)
hyperparameters: lr0=0.01, lrf=0.01, momentum=0.937, weight_decay=0.0005, warmup_epochs=3.0, warmup_momentum=0.8, warmup_bias_lr=0.1, box=0.05, cls=0.5, cls_pw=1.0, obj=1.0, obj_pw=1.0, iou_t=0.2, anchor_t=4.0, fl_gamma=0.0, hsv_h=0.015, hsv_s=0.7, hsv_v=0.4, degrees=0.0, translate=0.1, scale=0.5, shear=0.0, perspective=0.0, flipud=0.0, fliplr=0.5, mosaic=1.0, mixup=0.0, copy_paste=0.0
Comet: run 'pip install comet_ml' to automatically track and visualize YOLOv5 runs in Comet
TensorBoard: Start with 'tensorboard --logdir runs\train', view at http://localhost:6006/

                 from  n    params  module                                  arguments
  0                -1  1      3520  models.common.Conv                      [3, 32, 6, 2, 2]
  1                -1  1     18560  models.common.Conv                      [32, 64, 3, 2]
  2                -1  1     18816  models.common.C3                        [64, 64, 1]
  3                -1  1     73984  models.common.Conv                      [64, 128, 3, 2]
  4                -1  2    115712  models.common.C3                        [128, 128, 2]
  5                -1  1    295424  models.common.Conv                      [128, 256, 3, 2]
  6                -1  3    625152  models.common.C3                        [256, 256, 3]
  7                -1  1   1180672  models.common.Conv                      [256, 512, 3, 2]
  8                -1  1   1182720  models.common.C3                        [512, 512, 1]
  9                -1  1    656896  models.common.SPPF                      [512, 512, 5]
 10                -1  1    131584  models.common.Conv                      [512, 256, 1, 1]
 11                -1  1         0  torch.nn.modules.upsampling.Upsample    [None, 2, 'nearest']
 12           [-1, 6]  1         0  models.common.Concat                    [1]
 13                -1  1    361984  models.common.C3                        [512, 256, 1, False]
 14                -1  1     33024  models.common.Conv                      [256, 128, 1, 1]
 15                -1  1         0  torch.nn.modules.upsampling.Upsample    [None, 2, 'nearest']
 16           [-1, 4]  1         0  models.common.Concat                    [1]
 17                -1  1     90880  models.common.C3                        [256, 128, 1, False]
 18                -1  1    147712  models.common.Conv                      [128, 128, 3, 2]
 19          [-1, 14]  1         0  models.common.Concat                    [1]
 20                -1  1    296448  models.common.C3                        [256, 256, 1, False]
 21                -1  1    590336  models.common.Conv                      [256, 256, 3, 2]
 22          [-1, 10]  1         0  models.common.Concat                    [1]
 23                -1  1   1182720  models.common.C3                        [512, 512, 1, False]
 24      [17, 20, 23]  1     64728  models.yolo.Detect                      [19, [[10, 13, 16, 30, 33, 23], [30, 61, 62, 45, 59, 119], [116, 90, 156, 198, 373, 326]], [128, 256, 512]]
YOLOv5s summary: 214 layers, 7070872 parameters, 7070872 gradients, 16.1 GFLOPs

Transferred 342/349 items from yolov5s.pt
D:\create\programm\yolov5-master\models\common.py:906: FutureWarning: torch.cuda.amp.autocast(args...) is deprecated. Please use torch.amp.autocast('cuda', args...) instead.
  with amp.autocast(autocast):
D:\create\programm\yolov5-master\models\common.py:906: FutureWarning: torch.cuda.amp.autocast(args...) is deprecated. Please use torch.amp.autocast('cuda', args...) instead.
  with amp.autocast(autocast):
AMP: checks passed
optimizer: SGD(lr=0.01) with parameter groups 57 weight(decay=0.0), 60 weight(decay=0.0005), 60 bias
train: Scanning D:\create\programm\datasets\labels\train.cache... 493 images, 7 backgrounds, 0 corrupt: 100%|██████████| 500/500 [00:00<?, ?it/s]
val: Scanning D:\create\programm\datasets\labels\val.cache... 7 images, 5 backgrounds, 0 corrupt: 100%|██████████| 12/12 [00:00<?, ?it/s]
OMP: Error #15: Initializing libomp.dll, but found libiomp5md.dll already initialized.
OMP: Hint This means that multiple copies of the OpenMP runtime have been linked into the program. That is dangerous, since it can degrade performance or cause incorrect results. The best thing to do is to ensure that only a single OpenMP runtime is linked into the process, e.g. by avoiding static linking of the OpenMP runtime in any library. As an unsafe, unsupported, undocumented workaround you can set the environment variable KMP_DUPLICATE_LIB_OK=TRUE to allow the program to continue to execute, but that may cause crashes or silently produce incorrect results. For more information, please see http://openmp.llvm.org/
(上面这组 OMP: Error #15 与 OMP: Hint 在日志中连续重复打印了 8 次,内容完全相同,此处只保留一份)
AutoAnchor: 6.00 anchors/target, 1.000 Best Possible Recall (BPR). Current anchors are a good fit to dataset
Plotting labels to runs\train\exp16\labels.jpg...
OMP: Error #15: Initializing libomp.dll, but found libiomp5md.dll already initialized.
OMP: Hint This means that multiple copies of the OpenMP runtime have been linked into the program. (后续提示与上文相同,略)
进程已结束,退出代码为 3
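上面的 `OMP: Error #15` 与 YOLOv5 代码本身无关:进程里同时加载了两份 OpenMP 运行时(PyTorch 自带的 `libiomp5md.dll` 与 conda 环境中另一个库带的副本),在 Windows + conda 环境下很常见,退出码 3 就是这个冲突导致的异常退出。最快的规避办法是按日志提示设置 `KMP_DUPLICATE_LIB_OK=TRUE`(日志也说明了这是不受支持的临时手段,可能带来数值风险);更稳妥的做法是只保留一份运行时,比如在环境的 `Library\bin` 目录下排查多余的 `libiomp5md.dll`,或统一用同一渠道(pip 或 conda)重装 numpy/torch。临时规避写法如下(必须放在 `train.py` 最顶部、任何 `import torch` / `import numpy` 之前):

```python
import os

# 临时规避 OMP Error #15:允许重复加载 OpenMP 运行时(官方提示这是不受支持的 workaround)
os.environ["KMP_DUPLICATE_LIB_OK"] = "TRUE"

import torch  # 之后再导入依赖 OpenMP 的库
```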

import torch import torch.nn as nn from utils.trainer import model_init_ from utils.build import check_cfg, build_from_cfg import os import glob from torchvision import transforms, datasets from PIL import Image, ImageDraw, ImageFont import time from graphic.RawDataProcessor import generate_images import imageio import sys import cv2 import numpy as np from torch.utils.data import DataLoader try: from DetModels import YOLOV5S from DetModels.yolo.basic import LoadImages, Profile, Path, non_max_suppression, Annotator, scale_boxes, colorstr, \ Colors, letterbox except ImportError: pass # Current directory and metric directory current_dir = os.path.dirname(os.path.abspath(__file__)) METRIC = os.path.join(current_dir, './metrics') sys.path.append(METRIC) sys.path.append(current_dir) sys.path.append('utils/DetModels/yolo') try: from .metrics.base_metric import EVAMetric except ImportError: pass from logger import colorful_logger # Supported image and raw data extensions image_ext = ['.jpg', '.jpeg', '.png', '.gif', '.bmp', '.tiff'] raw_data_ext = ['.iq', '.dat'] class Classify_Model(nn.Module): """ A class representing a classification model for performing inference and benchmarking using a pre-trained model. Attributes: - logger (colorful_logger): Logger for logging messages with color. - cfg (str): Path to configuration dictionary. - device (str): Device to use for inference (CPU or GPU). - model (torch.nn.Module): Pre-trained model. - save_path (str): Path to save the results. - save (bool): Flag to indicate whether to save the results. """ def __init__(self, cfg: str = '../configs/exp1_test.yaml', weight_path: str = '../default.path', save: bool = True, ): """ Initializes the Classify_Model. Parameters: - cfg (str): Path to configuration dictionary. - weight_path (str): Path to the pre-trained model weights. - save (bool): Flag to indicate whether to save the results. """ super().__init__() self.logger = self.set_logger if check_cfg(cfg): self.logger.log_with_color(f"Using config file: {cfg}") self.cfg = build_from_cfg(cfg) if self.cfg['device'] == 'cuda': if torch.cuda.is_available(): self.logger.log_with_color("Using GPU for inference") self.device = self.cfg['device'] else: self.logger.log_with_color("Using CPU for inference") self.device = "cpu" if os.path.exists(weight_path): self.logger.log_with_color(f"Using weight file: {weight_path}") self.weight_path = weight_path else: raise FileNotFoundError(f"weight path: {weight_path} does not exist") self.model = self.load_model self.model.to(self.device) self.model.eval() self.save_path = None self.save = save self.confidence_threshold = self.cfg.get('confidence_threshold', 0.49) self.logger.log_with_color(f"Using confidence threshold: {self.confidence_threshold * 100}%") def inference(self, source='../example/', save_path: str = '../result'): """ Performs inference on the given source data. Parameters: - source (str): Path to the source data. - save_path (str): Path to save the results. 
""" torch.no_grad() if self.save: if not os.path.exists(save_path): os.mkdir(save_path) self.save_path = save_path self.logger.log_with_color(f"Saving results to: {save_path}") if not os.path.exists(source): self.logger.log_with_color(f"Source {source} dose not exit") # dir detect if os.path.isdir(source): data_list = glob.glob(os.path.join(source, '*')) for data in data_list: # detect images in dir if is_valid_file(data, image_ext): self.ImgProcessor(data) # detect raw datas in dir elif is_valid_file(data, raw_data_ext): self.RawdataProcess(data) else: continue # detect single image elif is_valid_file(source, image_ext): self.ImgProcessor(source) # detect single pack of raw data elif is_valid_file(source, raw_data_ext): self.RawdataProcess(source) def forward(self, img): """ Forward pass through the model. Parameters: - img (torch.Tensor): Input image tensor. Returns: - probability (float): Confidence probability of the predicted class. - predicted_class_name (str): Name of the predicted class. """ self.model.eval() temp = self.model(img) probabilities = torch.softmax(temp, dim=1) predicted_class_index = torch.argmax(probabilities, dim=1).item() predicted_class_name = get_key_from_value(self.cfg['class_names'], predicted_class_index) probability = probabilities[0][predicted_class_index].item() * 100 return probability, predicted_class_name @property def load_model(self): """ Loads the pre-trained model. Returns: - model (torch.nn.Module): Loaded model. """ self.logger.log_with_color(f"Using device: {self.device}") # model = model_init_(self.cfg['model'], self.cfg['num_classes'], pretrained=True) model = model_init_(self.cfg['model'], self.cfg['num_classes'], pretrained_path=None) if os.path.exists(self.weight_path): self.logger.log_with_color(f"Loading init weights from: {self.weight_path}") # state_dict = torch.load(self.weight_path, map_location=self.device) state_dict = torch.load(self.weight_path, map_location=self.device, weights_only=True) model.load_state_dict(state_dict) self.logger.log_with_color(f"Successfully loaded pretrained weights from: {self.weight_path}") else: self.logger.log_with_color(f"init weights file not found at: {self.weight_path}. Skipping weight loading.") return model def ImgProcessor(self, source): """ Performs inference on spectromgram data. Parameters: - source (str): Path to the image. 
""" start_time = time.time() name = os.path.basename(source)[:-4] origin_image = Image.open(source).convert('RGB') preprocessed_image = self.preprocess(source) # 提取文件名(仅保留文件名,不含路径) filename = os.path.basename(source) temp = self.model(preprocessed_image) probabilities = torch.softmax(temp, dim=1) # # 新增:获取最大概率和对应类别索引 max_prob, predicted_class_index = torch.max(probabilities, dim=1) max_prob_val = max_prob.item() # 转换为浮点数' # 核心:计算unknown置信度为1 - 最高置信度(转换为百分比) unknown_prob = (1 - max_prob_val) * 100 # 已知类别置信度为模型输出值(转换为百分比) known_prob = max_prob_val * 100 # predicted_class_index = torch.argmax(probabilities, dim=1).item() # predicted_class_name = get_key_from_value(self.cfg['class_names'], predicted_class_index) if max_prob_val < self.confidence_threshold: predicted_class_name = 'unknown' current_prob = unknown_prob # 使用1-置信度 else: predicted_class_name = get_key_from_value(self.cfg['class_names'], predicted_class_index.item()) current_prob = known_prob # 使用模型原始置信度 end_time = time.time() self.logger.log_with_color(f"Inference time: {(end_time - start_time) / 100 :.8f} sec") # self.logger.log_with_color(f"{source} contains Drone: {predicted_class_name}, " # f"confidence1: {probabilities[0][predicted_class_index].item() * 100 :.2f} %," # f" start saving result") #这个版本是对未知机型置信度做了处理 # self.logger.log_with_color(f"{source} contains Drone: {predicted_class_name}, confidence: {current_prob:.2f}%") # 仅输出:文件名、机型、置信度(简化格式) self.logger.log_with_color(f"{filename}, contains Drone: {predicted_class_name}, {current_prob:.2f}%, 推理时间: {(end_time - start_time):.6f} sec") if self.save: # res = self.add_result(res=predicted_class_name, # probability=probabilities[0][predicted_class_index].item() * 100, # image=origin_image) res = self.add_result(res=predicted_class_name, probability=current_prob, image=origin_image) res.save(os.path.join(self.save_path, name + '.jpg')) def RawdataProcess(self, source): """ Transforming raw data into a video and performing inference on video. Parameters: - source (str): Path to the raw data. """ res = [] images = generate_images(source) name = os.path.splitext(os.path.basename(source)) for image in images: temp = self.model(self.preprocess(image)) probabilities = torch.softmax(temp, dim=1) predicted_class_index = torch.argmax(probabilities, dim=1).item() predicted_class_name = get_key_from_value(self.cfg['class_names'], predicted_class_index) _ = self.add_result(res=predicted_class_name, probability=probabilities[0][predicted_class_index].item() * 100, image=image) res.append(_) imageio.mimsave(os.path.join(self.save_path, name + '.mp4'), res, fps=5) def add_result(self, res, image, position=(40, 40), font="arial.ttf", font_size=45, text_color=(255, 0, 0), probability=0.0 ): """ Adds the inference result to the image. Parameters: - res (str): Inference result. - image (PIL.Image): Input image. - position (tuple): Position to add the text. - font (str): Font file path. - font_size (int): Font size. - text_color (tuple): Text color. - probability (float): Confidence probability. Returns: - image (PIL.Image): Image with added result. """ draw = ImageDraw.Draw(image) font = ImageFont.truetype("C:/Windows/Fonts/simhei.ttf", font_size) draw.text(position, res + f" {probability:.2f}%", fill=text_color, font=font) return image @property def set_logger(self): """ Sets up the logger. Returns: - logger (colorful_logger): Logger instance. 
""" logger = colorful_logger('Inference') return logger def preprocess(self, img): transform = transforms.Compose([ transforms.Resize((self.cfg['image_size'], self.cfg['image_size'])), transforms.ToTensor(), ]) image = Image.open(img).convert('RGB') preprocessed_image = transform(image) preprocessed_image = preprocessed_image.to(self.device) preprocessed_image = preprocessed_image.unsqueeze(0) return preprocessed_image def benchmark(self, data_path, save_path=None): """ Performs benchmarking on the given data and calculates evaluation metrics. Parameters: - data_path (str): Path to the benchmark data. Returns: - metrics (dict): Dictionary containing evaluation metrics. """ snrs = os.listdir(data_path) if not save_path: save_path = os.path.join(data_path, 'benchmark result') if not os.path.exists(save_path): os.mkdir(save_path) if not os.path.exists(save_path): os.mkdir(save_path) #根据得到映射关系写下面的,我得到的是★ 最佳映射 pred → gt: {0: 2, 1: 1, 2: 3, 3: 4, 4: 0} #MAP_P2G=torch.tensor([2,1,3,4,0],device=self.cfg['device']) #INV_MAP=torch.argsort(MAP_P2G) with torch.no_grad(): for snr in snrs: CMS = os.listdir(os.path.join(data_path, snr)) for CM in CMS: stat_time = time.time() self.model.eval() _dataset = datasets.ImageFolder( root=os.path.join(data_path, snr, CM), transform=transforms.Compose([ transforms.Resize((self.cfg['image_size'], self.cfg['image_size'])), transforms.ToTensor(),]) ) dataset = DataLoader(_dataset, batch_size=self.cfg['batch_size'], shuffle=self.cfg['shuffle']) print("Starting Benchmark...") correct = 0 total = 0 probabilities = [] total_labels = [] classes_name = tuple(self.cfg['class_names'].keys()) cm_raw = np.zeros((5, 5), dtype=int) for images, labels in dataset: images, labels = images.to(self.cfg['device']), labels.to(self.cfg['device']) outputs = self.model(images) #outputs=outputs[:,INV_MAP] #probs =torch.softmax(outputs,dim=1) for output in outputs: probabilities.append(list(torch.softmax(output, dim=0))) _, predicted = outputs.max(1) for p, t in zip(predicted.cpu(), labels.cpu()): cm_raw[p,t]+=1 cm_raw[p, t] += 1 # 行 = pred, 列 = gt total += labels.size(0) correct += predicted.eq(labels).sum().item() total_labels.append(labels) _total_labels = torch.concat(total_labels, dim=0) _probabilities = torch.tensor(probabilities) metrics = EVAMetric(preds=_probabilities.to(self.cfg['device']), labels=_total_labels, num_classes=self.cfg['num_classes'], tasks=('f1', 'precision', 'CM'), topk=(1, 3, 5), save_path=save_path, classes_name=classes_name, pic_name=f'{snr}_{CM}') metrics['acc'] = 100 * correct / total s = (f'{snr} ' + f'CM: {CM} eva result:' + ' acc: ' + f'{metrics["acc"]}' + ' top-1: ' + f'{metrics["Top-k"]["top1"]}' + ' top-1: ' + f'{metrics["Top-k"]["top1"]}' + ' top-2 ' + f'{metrics["Top-k"]["top2"]}' + ' top-3 ' + f'{metrics["Top-k"]["top3"]}' + ' mAP: ' + f'{metrics["mAP"]["mAP"]}' + ' macro_f1: ' + f'{metrics["f1"]["macro_f1"]}' + ' micro_f1 : ' + f' {metrics["f1"]["micro_f1"]}\n') txt_path = os.path.join(save_path, 'benchmark_result.txt') colorful_logger(f'cost {(time.time()-stat_time)/60} mins') with open(txt_path, 'a') as file: file.write(s) print(f'{CM} Done!') print(f'{snr} Done!') row_ind, col_ind = linear_sum_assignment(-cm_raw) # 取负→最大化对角线 mapping_pred2gt = {int(r): int(c) for r, c in zip(row_ind, col_ind)} print("\n★ 最佳映射 pred → gt:", mapping_pred2gt) # 若要保存下来以后用: import json json.dump(mapping_pred2gt, open('class_to_idx_pred2gt.json', 'w')) print("映射已保存到 class_to_idx_pred2gt.json") class Detection_Model: """ A common interface for initializing and running 
different detection models. This class provides methods to initialize and run object detection models such as YOLOv5 and Faster R-CNN. It allows for easy switching between different models by providing a unified interface. Attributes: - S1model: The initialized detection model (e.g., YOLOv5S). - model_name: The name of the detection model to be used. - weight_path: The path to the pre-trained model weights. Methods: - __init__(self, cfg=None, model_name=None, weight_path=None): Initializes the detection model based on the provided configuration or parameters. If a configuration dictionary cfg is provided, it will be used to set the model name and weight path. Otherwise, the model_name and weight_path parameters can be specified directly. - yolov5_detect(self, source='../example/source/', save_dir='../res', imgsz=(640, 640), conf_thres=0.6, iou_thres=0.45, max_det=1000, line_thickness=3, hide_labels=True, hide_conf=False): Runs YOLOv5 object detection on the specified source. - source: Path to the input image or directory containing images. - save_dir: Directory to save the detection results. - imgsz: Image size for inference (height, width). - conf_thres: Confidence threshold for filtering detections. - iou_thres: IoU threshold for non-maximum suppression. - max_det: Maximum number of detections per image. - line_thickness: Thickness of the bounding box lines. - hide_labels: Whether to hide class labels in the output. - hide_conf: Whether to hide confidence scores in the output. - faster_rcnn_detect(self, source='../example/source/', save_dir='../res', weight_path='../example/detect/', imgsz=(640, 640), conf_thres=0.25, iou_thres=0.45, max_det=1000, line_thickness=3, hide_labels=False, hide_conf=False): Placeholder method for running Faster R-CNN object detection. This method is currently not implemented and should be replaced with the actual implementation. 
""" def __init__(self, cfg=None, model_name=None, weight_path=None): if cfg: model_name = cfg['model_name'] weight_path = cfg['weight_path'] if model_name == 'yolov5': self.S1model = YOLOV5S(weights=weight_path) self.S1model.inference = self.yolov5_detect # ToDo elif model_name == 'faster_rcnn': self.S1model = YOLOV5S(weights=weight_path) self.S1model.inference = self.yolov5_detect else: if model_name == 'yolov5': self.S1model = YOLOV5S(weights=weight_path) self.S1model.inference = self.yolov5_detect # ToDo elif model_name == 'faster_rcnn': self.S1model = YOLOV5S(weights=weight_path) self.S1model.inference = self.yolov5_detect def yolov5_detect(self, source='../example/source/', save_dir='../res', imgsz=(640, 640), conf_thres=0.6, iou_thres=0.45, max_det=1000, line_thickness=3, hide_labels=True, hide_conf=False, ): color = Colors() detmodel = self.S1model stride, names = detmodel.stride, detmodel.names torch.no_grad() # Run inference if isinstance(source, np.ndarray): detmodel.eval() im = letterbox(source, imgsz, stride=stride, auto=True)[0] # padded resize im = im.transpose((2, 0, 1))[::-1] # HWC to CHW, BGR to RGB im = np.ascontiguousarray(im) # contiguous im = torch.from_numpy(im).to(detmodel.device) im = im.float() # uint8 to fp16/32 im /= 255 # 0 - 255 to 0.0 - 1.0 if len(im.shape) == 3: im = im[None] # expand for batch dim # Inference pred = detmodel(im) # NMS pred = non_max_suppression(pred, conf_thres, iou_thres, agnostic=False, max_det=max_det) # Process predictions for i, det in enumerate(pred): # per image annotator = Annotator(source, line_width=line_thickness, example=str(names)) if len(det): # Rescale boxes from img_size to im0 size det[:, :4] = scale_boxes(im.shape[2:], det[:, :4], source.shape).round() # Print results for c in det[:, 5].unique(): n = (det[:, 5] == c).sum() # detections per class for *xyxy, conf, cls in reversed(det): c = int(cls) # integer class label = None if hide_labels else (names[c] if hide_conf else f'{names[c]} {conf:.2f}') annotator.box_label(xyxy, label, color=color(c + 2, True)) # Stream results im0 = annotator.result() # Save results (image with detections) return im0 else: # Ensure the save directory exists os.makedirs(save_dir, exist_ok=True) dataset = LoadImages(source, img_size=imgsz, stride=stride) seen, windows, dt = 0, [], (Profile(), Profile(), Profile()) for path, im, im0s, s in dataset: im = torch.from_numpy(im).to(detmodel.device) im = im.float() # uint8 to fp16/32 im /= 255 # 0 - 255 to 0.0 - 1.0 if len(im.shape) == 3: im = im[None] # expand for batch dim # Inference pred = detmodel(im) # NMS pred = non_max_suppression(pred, conf_thres, iou_thres, agnostic=False, max_det=max_det) # Process predictions for i, det in enumerate(pred): # per image seen += 1 p, im0, frame = path, im0s.copy(), getattr(dataset, 'frame', 0) p = Path(p) # to Path save_path = str(save_dir + p.name) # im.jpg s += '%gx%g ' % im.shape[2:] # print string annotator = Annotator(im0, line_width=line_thickness, example=str(names)) if len(det): # Rescale boxes from img_size to im0 size det[:, :4] = scale_boxes(im.shape[2:], det[:, :4], im0.shape).round() # Print results for c in det[:, 5].unique(): n = (det[:, 5] == c).sum() # detections per class s += f"{n} {names[int(c)]}{'s' * (n > 1)}, " # add to string for *xyxy, conf, cls in reversed(det): c = int(cls) # integer class label = None if hide_labels else (names[c] if hide_conf else f'{names[c]} {conf:.2f}') annotator.box_label(xyxy, label, color=color(c + 2, True)) # Stream results im0 = annotator.result() # Save 
results (image with detections) if save_dir == 'buffer': return im0 else: cv2.imwrite(save_path, im0) del im0 # Release memory after saving # Print results print(f"Results saved to {colorstr('bold', save_dir)}") #ToDo def faster_rcnn_detect(self, source='../example/source/', save_dir='../res', weight_path='../example/detect/', imgsz=(640, 640), conf_thres=0.25, iou_thres=0.45, max_det=1000, line_thickness=3, hide_labels=False, hide_conf=False, ): pass def is_valid_file(path, total_ext): """ Checks if the file has a valid extension. Parameters: - path (str): Path to the file. - total_ext (list): List of valid extensions. Returns: - bool: True if the file has a valid extension, False otherwise. """ last_element = os.path.basename(path) if any(last_element.lower().endswith(ext) for ext in total_ext): return True else: return False def get_key_from_value(d, value): """ Gets the key from a dictionary based on the value. Parameters: - d (dict): Dictionary. - value: Value to find the key for. Returns: - key: Key corresponding to the value, or None if not found. """ for key, val in d.items(): if val == value: return key return None def preprocess_image_yolo(im0, imgsz, stride, detmodel): im = letterbox(im0, imgsz, stride=stride, auto=True)[0] # padded resize im = im.transpose((2, 0, 1))[::-1] # HWC to CHW, BGR to RGB im = np.ascontiguousarray(im) # contiguous im = torch.from_numpy(im).to(detmodel.device) im = im.float() # uint8 to fp16/32 im /= 255 # 0 - 255 to 0.0 - 1.0 if len(im.shape) == 3: im = im[None] # expand for batch dim return im def process_predictions_yolo(det, im, im0, names, line_thickness, hide_labels, hide_conf, color): annotator = Annotator(im0, line_width=line_thickness, example=str(names)) if len(det): # Rescale boxes from img_size to im0 size det[:, :4] = scale_boxes(im.shape[2:], det[:, :4], im0.shape).round() # Print results for c in det[:, 5].unique(): n = (det[:, 5] == c).sum() # detections per class for *xyxy, conf, cls in reversed(det): c = int(cls) # integer class label = None if hide_labels else (names[c] if hide_conf else f'{names[c]} {conf:.2f}') annotator.box_label(xyxy, label, color=color(c + 2, True)) # Stream results im0 = annotator.result() return im0 # Usage----------------------------------------------------------------------------------------------------------------- def main(): """ cfg = '' weight_path = '' source = '' save_path = '' test = Classify_Model(cfg=cfg, weight_path=weight_path) test.inference(source=source, save_path=save_path) # test.benchmark() """ """ source = '' weight_path = '' save_dir = '' test = Detection_Model(model_name='yolov5', weight_path=weight_path) test.yolov5_detect(source=source, save_dir=save_dir,) """ if __name__ == '__main__': main() 报错部分代码是这样的,我该怎么做,才能让我的推理正常跑通呢
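从贴出的代码推断,这份推理脚本里至少有以下几处明显问题,逐一修掉后推理基本就能跑通(未必覆盖全部问题):

1. `inference()` 和 `yolov5_detect()` 里的 `torch.no_grad()` 是裸调用,不会生效,必须写成 `with torch.no_grad():` 上下文(或 `@torch.no_grad()` 装饰器)包住前向计算;
2. `RawdataProcess()` 中 `name = os.path.splitext(os.path.basename(source))` 返回的是 `(文件名, 扩展名)` 元组,后面 `name + '.mp4'` 会抛 `TypeError`,需要取 `[0]`;
3. `benchmark()` 里 `cm_raw[p, t] += 1` 写了两遍,混淆矩阵会被重复计数,应删掉一行;同时 `linear_sum_assignment` 没有导入,需要 `from scipy.optimize import linear_sum_assignment`;
4. `inference()` 中 `source` 不存在时只打了日志("dose not exit" 应为 "does not exist")却没有 `return`,后面仍会继续执行;
5. `Detection_Model.__init__` 里 `if cfg:` 的两个分支内容完全相同,可以合并;`save_path = str(save_dir + p.name)` 应改为 `os.path.join(save_dir, p.name)`,否则会丢失目录分隔符;
6. `benchmark()` 拼结果字符串时 `top-1` 打印了两遍,而 `topk=(1, 3, 5)` 与取的 `top2`/`top3` 键对不上,需要统一。

其中前两点的最小可运行示例如下(示例模型仅用于演示,核心是 `no_grad` 的用法和 `splitext` 的取值):

```python
import os
import torch
import torch.nn as nn

model = nn.Linear(4, 2).eval()  # 示例模型,代替真实分类网络

# 1) torch.no_grad() 必须作为上下文管理器(或装饰器)使用,
#    原代码里单独一行 "torch.no_grad()" 不会关闭梯度追踪
with torch.no_grad():
    out = model(torch.randn(1, 4))
print(out.requires_grad)  # False

# 2) os.path.splitext 返回 (主名, 扩展名) 元组,拼接前要取 [0],
#    否则 name + '.mp4' 会抛出 TypeError
source = "raw_data/sample.iq"
name = os.path.splitext(os.path.basename(source))[0]
print(name + ".mp4")  # sample.mp4
```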

cyt@TT:~/ultralytics-main$ python3 mytrain.py
Ultralytics 8.3.205 🚀 Python-3.10.12 torch-2.8.0+cu128 CUDA:0 (NVIDIA GeForce RTX 4060 Laptop GPU, 7915MiB)
engine/trainer: agnostic_nms=False, amp=True, augment=False, auto_augment=randaugment, batch=-1, bgr=0.0, box=7.5, cache=ram, cfg=None, classes=None, close_mosaic=10, cls=0.5, compile=False, conf=None, copy_paste=0.0, copy_paste_mode=flip, cos_lr=False, cutmix=0.0, data=/home/cyt/ultralytics-main/ultralytics/cfg/datasets/elderly.yaml, degrees=0.0, deterministic=True, device=None, dfl=1.5, dnn=False, dropout=0.0, dynamic=False, embed=None, epochs=100, erasing=0.4, exist_ok=False, fliplr=0.5, flipud=0.0, format=torchscript, fraction=1.0, freeze=None, half=False, hsv_h=0.015, hsv_s=0.7, hsv_v=0.4, imgsz=640, int8=False, iou=0.7, keras=False, kobj=1.0, line_width=None, lr0=0.01, lrf=0.01, mask_ratio=4, max_det=300, mixup=0.0, mode=train, model=/home/cyt/ultralytics-main/yolo11n.pt, momentum=0.937, mosaic=1.0, multi_scale=False, name=myresult2, nbs=64, nms=False, opset=None, optimize=False, optimizer=auto, overlap_mask=True, patience=100, perspective=0.0, plots=True, pose=12.0, pretrained=True, profile=False, project=results, rect=False, resume=False, retina_masks=False, save=True, save_conf=False, save_crop=False, save_dir=/home/cyt/ultralytics-main/results/myresult2, save_frames=False, save_json=False, save_period=-1, save_txt=False, scale=0.5, seed=0, shear=0.0, show=False, show_boxes=True, show_conf=True, show_labels=True, simplify=True, single_cls=False, source=None, split=val, stream_buffer=False, task=detect, time=None, tracker=botsort.yaml, translate=0.1, val=True, verbose=True, vid_stride=1, visualize=False, warmup_bias_lr=0.1, warmup_epochs=3.0, warmup_momentum=0.8, weight_decay=0.0005, workers=1, workspace=None

A module that was compiled using NumPy 1.x cannot be run in
NumPy 2.2.6 as it may crash. To support both 1.x and 2.x
versions of NumPy, modules must be compiled with NumPy 2.0.
Some module may need to rebuild instead e.g. with 'pybind11>=2.12'.

If you are a user of the module, the easiest solution will be to
downgrade to 'numpy<2' or try to upgrade the affected module.
We expect that some modules will need time to support NumPy 2.

Traceback (most recent call last):
  File "/home/cyt/ultralytics-main/mytrain.py", line 5, in <module>
    model.train(
  File "/home/cyt/ultralytics-main/ultralytics/engine/model.py", line 795, in train
    self.trainer = (trainer or self._smart_load("trainer"))(overrides=args, _callbacks=self.callbacks)
  File "/home/cyt/ultralytics-main/ultralytics/models/yolo/detect/train.py", line 65, in __init__
    super().__init__(cfg, overrides, _callbacks)
  File "/home/cyt/ultralytics-main/ultralytics/engine/trainer.py", line 158, in __init__
    self.data = self.get_dataset()
  File "/home/cyt/ultralytics-main/ultralytics/engine/trainer.py", line 634, in get_dataset
    data = check_det_dataset(self.args.data)
  File "/home/cyt/ultralytics-main/ultralytics/data/utils.py", line 480, in check_det_dataset
    check_font("Arial.ttf" if is_ascii(data["names"]) else "Arial.Unicode.ttf")  # download fonts
  File "/home/cyt/ultralytics-main/ultralytics/utils/__init__.py", line 500, in decorated
    return f(*args, **kwargs)
  File "/home/cyt/ultralytics-main/ultralytics/utils/checks.py", line 325, in check_font
    from matplotlib import font_manager  # scope for faster 'import ultralytics'
  File "/usr/lib/python3/dist-packages/matplotlib/__init__.py", line 109, in <module>
    from . import _api, _version, cbook, docstring, rcsetup
  File "/usr/lib/python3/dist-packages/matplotlib/rcsetup.py", line 27, in <module>
    from matplotlib.colors import Colormap, is_color_like
  File "/usr/lib/python3/dist-packages/matplotlib/colors.py", line 56, in <module>
    from matplotlib import _api, cbook, scale
  File "/usr/lib/python3/dist-packages/matplotlib/scale.py", line 23, in <module>
    from matplotlib.ticker import (
  File "/usr/lib/python3/dist-packages/matplotlib/ticker.py", line 136, in <module>
    from matplotlib import transforms as mtransforms
  File "/usr/lib/python3/dist-packages/matplotlib/transforms.py", line 46, in <module>
    from matplotlib._path import (
AttributeError: _ARRAY_API not found

Traceback (most recent call last):
  File "/home/cyt/ultralytics-main/ultralytics/engine/trainer.py", line 634, in get_dataset
    data = check_det_dataset(self.args.data)
  File "/home/cyt/ultralytics-main/ultralytics/data/utils.py", line 480, in check_det_dataset
    check_font("Arial.ttf" if is_ascii(data["names"]) else "Arial.Unicode.ttf")  # download fonts
  File "/home/cyt/ultralytics-main/ultralytics/utils/__init__.py", line 500, in decorated
    return f(*args, **kwargs)
  File "/home/cyt/ultralytics-main/ultralytics/utils/checks.py", line 325, in check_font
    from matplotlib import font_manager  # scope for faster 'import ultralytics'
  File "/usr/lib/python3/dist-packages/matplotlib/__init__.py", line 109, in <module>
    from . import _api, _version, cbook, docstring, rcsetup
  File "/usr/lib/python3/dist-packages/matplotlib/rcsetup.py", line 27, in <module>
    from matplotlib.colors import Colormap, is_color_like
  File "/usr/lib/python3/dist-packages/matplotlib/colors.py", line 56, in <module>
    from matplotlib import _api, cbook, scale
  File "/usr/lib/python3/dist-packages/matplotlib/scale.py", line 23, in <module>
    from matplotlib.ticker import (
  File "/usr/lib/python3/dist-packages/matplotlib/ticker.py", line 136, in <module>
    from matplotlib import transforms as mtransforms
  File "/usr/lib/python3/dist-packages/matplotlib/transforms.py", line 46, in <module>
    from matplotlib._path import (
ImportError: numpy.core.multiarray failed to import

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/cyt/ultralytics-main/mytrain.py", line 5, in <module>
    model.train(
  File "/home/cyt/ultralytics-main/ultralytics/engine/model.py", line 795, in train
    self.trainer = (trainer or self._smart_load("trainer"))(overrides=args, _callbacks=self.callbacks)
  File "/home/cyt/ultralytics-main/ultralytics/models/yolo/detect/train.py", line 65, in __init__
    super().__init__(cfg, overrides, _callbacks)
  File "/home/cyt/ultralytics-main/ultralytics/engine/trainer.py", line 158, in __init__
    self.data = self.get_dataset()
  File "/home/cyt/ultralytics-main/ultralytics/engine/trainer.py", line 638, in get_dataset
    raise RuntimeError(emojis(f"Dataset '{clean_url(self.args.data)}' error ❌ {e}")) from e
RuntimeError: Dataset '/home/cyt/ultralytics-main/ultralytics/cfg/datasets/elderly.yaml' error ❌ numpy.core.multiarray failed to import
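这个报错的根因与数据集 yaml 无关:ultralytics 在检查数据集时会导入 matplotlib,而系统目录 `/usr/lib/python3/dist-packages` 下的 matplotlib 是在 NumPy 1.x 下编译的,无法在 NumPy 2.2.6 中导入(`_ARRAY_API not found`),这个 ImportError 随后被包装成了 "Dataset ... error"。按日志提示二选一即可:把 NumPy 降回 1.x,或在当前虚拟环境中安装支持 NumPy 2 的新版 matplotlib 以覆盖系统旧版:

```bash
# 方案一:降级 numpy(最稳妥)
pip install "numpy<2"

# 方案二:在虚拟环境内安装新版 matplotlib,不再使用系统自带的旧编译版本
pip install -U matplotlib
```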

import torch import torch.nn as nn import torchvision from torchvision.datasets import VOCDetection from torchvision import transforms from torch.utils.data import Dataset import albumentations as A from albumentations.pytorch import ToTensorV2 import numpy as np import matplotlib.pyplot as plt from datasets import load_dataset import os import torch.nn.functional as F #----------------------------------------------------------------------------------------------------------------------# def collate_fn(batch): """ 安全地将一批样本组合成一个 batch。 图像堆叠,目标保留为列表。 """ images = [] targets = [] for img, target in batch: assert img.is_contiguous(), "Image tensor is not contiguous!" # 可选调试 images.append(img) targets.append(target) # 堆叠图像 → [B,C,H,W] images = torch.stack(images, dim=0) return images, targets #----------------------------------------------------------------------------------------------------------------------# #模型构建 #基础组件 class Conv(nn.Module): def __init__(self, in_channels, out_channels, kernel_size=3, stride=1, padding=None, groups=1): super().__init__() if padding is None: padding = (kernel_size - 1) // 2 self.conv = nn.Conv2d(in_channels, out_channels, kernel_size, stride, padding, groups=groups, bias=False) self.bn = nn.BatchNorm2d(out_channels) self.act = nn.SiLU() # YOLOv8 使用 SiLU 激活函数 def forward(self, x): return self.act(self.bn(self.conv(x))) #基础组件 class C2f(nn.Module): def __init__(self, in_channels, out_channels, num_blocks=2, shortcut=False): super().__init__() self.out_channels = out_channels hidden_channels = int(out_channels * 0.5) self.conv1 = Conv(in_channels, hidden_channels * 2, 1) self.conv2 = Conv((hidden_channels * 2 + hidden_channels * num_blocks), out_channels, 1) self.blocks = nn.ModuleList() for _ in range(num_blocks): self.blocks.append(Conv(hidden_channels, hidden_channels, 3)) self.shortcut = shortcut def forward(self, x): y = list(self.conv1(x).chunk(2, 1)) # Split into two halves for block in self.blocks: y.append(block(y[-1])) return self.conv2(torch.cat(y, dim=1)) #基础组件 class SPPF(nn.Module): def __init__(self, in_channels, out_channels, k=5): super().__init__() hidden_channels = in_channels // 2 self.conv1 = Conv(in_channels, hidden_channels, 1) self.pool = nn.MaxPool2d(kernel_size=k, stride=1, padding=k // 2) self.conv2 = Conv(hidden_channels * 4, out_channels, 1) def forward(self, x): x = self.conv1(x) pool1 = self.pool(x) pool2 = self.pool(pool1) pool3 = self.pool(pool2) return self.conv2(torch.cat([x, pool1, pool2, pool3], dim=1)) #主干网络 class Backbone(nn.Module): def __init__(self): super().__init__() self.stage1 = nn.Sequential( Conv(3, 64, 3, 2), Conv(64, 128, 3, 2), C2f(128, 128, 3) ) self.stage2 = nn.Sequential( Conv(128, 256, 3, 2), C2f(256, 256, 6) ) self.stage3 = C2f(256, 512, 6) # 不再下采样,保持 80x80 self.stage4 = nn.Sequential( Conv(512, 1024, 3, 2), C2f(1024, 1024, 3), SPPF(1024, 1024) ) def forward(self, x): x = self.stage1(x) x = self.stage2(x) # [B,256,80,80] c3 = self.stage3(x) # [B,512,80,80] c4 = self.stage4(c3) # [B,1024,40,40] c5 = F.max_pool2d(c4, 2) # [B,1024,20,20] return c3, c4, c5 def forward(self, x): x = self.stage1(x) x = self.stage2(x) c3 = self.stage3(x) # [B, 512, 80, 80] c4 = self.stage4(c3) # [B,1024, 40, 40] c5 = F.max_pool2d(c4, 2) # [B,1024, 20, 20] → 模拟 stage5 下采样 return c3, c4, c5 #特征融合 class Neck(nn.Module): def __init__(self): super().__init__() # Top-down path self.conv1 = Conv(1024, 512, 1) # c4 → p4 self.upsample = nn.Upsample(scale_factor=2, mode='nearest') self.c2f1 = C2f(512 + 512, 512, 3) # p4_up + c3 → p3 
self.conv3 = Conv(512, 512, 3, 2) # p3 → down to p4 level self.c2f3 = C2f(512 + 512, 1024, 3) # p3_down + p4 → p4_out self.conv4 = Conv(1024, 1024, 3, 2) # p4_out → down to p5 level self.c2f4 = C2f(1024 + 1024, 1024, 3) # p4_out_down + c5 → p5_out def forward(self, c3, c4, c5): # Top-down p4 = self.conv1(c4) # [B,512,40,40] p4_up = self.upsample(p4) # [B,512,80,80] p4_cat = torch.cat([p4_up, c3], dim=1) # [B,1024,80,80] p3 = self.c2f1(p4_cat) # [B,512,80,80] # Bottom-up: from p3 to p4 p3_down = self.conv3(p3) # [B,512,40,40] p3_down_cat = torch.cat([p3_down, p4], dim=1) # [B,1024,40,40] p4_out = self.c2f3(p3_down_cat) # [B,1024,40,40] # Bottom-up: from p4_out to p5 p4_down = self.conv4(p4_out) # [B,1024,20,20] p4_down_cat = torch.cat([p4_down, c5], dim=1) # [B,2048,20,20] p5_out = self.c2f4(p4_down_cat) # [B,1024,20,20] return p3, p4_out, p5_out #解耦检测头 class DecoupledHead(nn.Module): def __init__(self, in_channels, num_classes=80): super().__init__() # 分离的 3×3 卷积分支 self.cls_conv = nn.Conv2d(in_channels, in_channels, 3, padding=1) self.reg_conv = nn.Conv2d(in_channels, in_channels, 3, padding=1) # 预测层 self.cls_pred = nn.Conv2d(in_channels, num_classes, 1) self.reg_pred = nn.Conv2d(in_channels, 4, 1) # tx, ty, tw, th self.obj_pred = nn.Conv2d(in_channels, 1, 1) # objectness self.act = nn.SiLU() def forward(self, x): c = self.act(self.cls_conv(x)) r = self.act(self.reg_conv(x)) cls = self.cls_pred(c) reg = self.reg_pred(r) obj = self.obj_pred(r) return torch.cat([reg, obj, cls], dim=1) #多尺度检测头 class Detect(nn.Module): def __init__(self, num_classes=80): super().__init__() self.head_small = DecoupledHead(512, num_classes) # p3 self.head_medium = DecoupledHead(1024, num_classes) # p4_out self.head_large = DecoupledHead(1024, num_classes) # p5_out def forward(self, x): p3, p4, p5 = x pred_small = self.head_small(p3) # shape: (B, 4 + 1 + num_classes, H/8, W/8) pred_medium = self.head_medium(p4) # (B, ..., H/16, W/16) pred_large = self.head_large(p5) # (B, ..., H/32, W/32) return [pred_small, pred_medium, pred_large] #YOLO-V8整体模型 class YOLOv8(nn.Module): def __init__(self, num_classes=80): super().__init__() self.backbone = Backbone() self.neck = Neck() self.detect = Detect(num_classes) def forward(self, x): c3, c4, c5 = self.backbone(x) # 修改为三输出 features = self.neck(c3, c4, c5) # 传入三参数 predictions = self.detect(features) return predictions #----------------------------------------------------------------------------------------------------------------------# #模型训练测试 def check_voc_dataset(path): voc_root = os.path.join(path, 'VOC2012') # 检查必要目录 required_dirs = [ ('Annotations', os.path.join(voc_root, 'Annotations')), ('JPEGImages', os.path.join(voc_root, 'JPEGImages')), ('ImageSets/Main', os.path.join(voc_root, 'ImageSets', 'Main')) ] for name, d in required_dirs: if not os.path.exists(d): raise FileNotFoundError(f"❌ 缺失目录: {name} -> {d}") print(f"✅ 找到目录: {name}") # 检查必要文件 required_files = { 'train.txt': os.path.join(voc_root, 'ImageSets', 'Main', 'train.txt'), 'val.txt': os.path.join(voc_root, 'ImageSets', 'Main', 'val.txt') } for name, f in required_files.items(): if not os.path.exists(f): raise FileNotFoundError(f"❌ 缺失文件: {name} -> {f}") if os.path.getsize(f) == 0: raise ValueError(f"❌ 文件为空: {name}") print(f"✅ 找到文件: {name}") # 验证 train.txt 内容合法性 train_txt = required_files['train.txt'] with open(train_txt, 'r') as f: lines = [line.strip() for line in f.readlines() if line.strip()] if len(lines) == 0: raise ValueError("❌ train.txt 无有效内容!") # 检查前几个样本是否存在对应 jpg/xml print(f"🔍 正在验证前 3 个样本的 jpg/xml 
是否存在...") jpeg_dir = os.path.join(voc_root, 'JPEGImages') anno_dir = os.path.join(voc_root, 'Annotations') for img_id in lines[:3]: jpg_path = os.path.join(jpeg_dir, f"{img_id}.jpg") xml_path = os.path.join(anno_dir, f"{img_id}.xml") if not os.path.exists(jpg_path): raise FileNotFoundError(f"❌ 图像缺失: {jpg_path}") if not os.path.exists(xml_path): raise FileNotFoundError(f"❌ 标注缺失: {xml_path}") print(f"✅ 前 {len(lines)} 个训练样本均通过验证") return True # 设置路径(建议用绝对路径) # 获取当前脚本所在目录(YOLO/ 文件夹) script_dir = os.path.dirname(os.path.abspath(__file__)) print(f"📁 数据集根目录(存放 VOCdevkit 的位置): {script_dir}") # 检查是否存在 VOCdevkit vocdevkit_path = os.path.join(script_dir, 'VOCdevkit') if not os.path.exists(vocdevkit_path): raise FileNotFoundError(f"❌ 缺失 VOCdevkit 文件夹: {vocdevkit_path}") # 验证 VOC2012 子目录 voc2012_path = os.path.join(vocdevkit_path, 'VOC2012') if not os.path.exists(voc2012_path): raise FileNotFoundError(f"❌ 缺失 VOC2012: {voc2012_path}") # ✅ 正确加载方式:root = script_dir(包含 VOCdevkit 的目录) voc_train = VOCDetection( root=script_dir, # ⚠️ 不是 'VOCdevkit',而是它的父目录! year='2012', image_set='train', download=False, transform=None ) voc_val = VOCDetection( root=script_dir, year='2012', image_set='val', download=False, transform=None ) print(f"✅ 训练集大小: {len(voc_train)}") print(f"✅ 验证集大小: {len(voc_val)}") # 测试读取第一个样本 try: img, target = voc_train[0] print("📌 第一个样本加载成功") obj = target['annotation']['object'][0] print(f"示例物体: {obj['name']} @ {obj['bndbox']}") except Exception as e: print(f"❌ 加载失败: {e}") raise # 图像变换 transform = transforms.Compose([ transforms.Resize((640, 640)), # 调整为网络输入大小 transforms.ToTensor(), # 转为 [0,1] 归一化张量 transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]) # ImageNet 标准化 ]) #定义Dataset类 class VOCDataset(Dataset): def __init__(self, data_list, img_size=640): self.data_list = data_list self.img_size = img_size self.class_to_idx = { 'aeroplane': 0, 'bicycle': 1, 'bird': 2, 'boat': 3, 'bottle': 4, 'bus': 5, 'car': 6, 'cat': 7, 'chair': 8, 'cow': 9, 'diningtable': 10, 'dog': 11, 'horse': 12, 'motorbike': 13, 'person': 14, 'pottedplant': 15, 'sheep': 16, 'sofa': 17, 'train': 18, 'tvmonitor': 19 } # 定义增强 pipeline(包含 resize, flip, color jitter, normalize, to_tensor) self.transform = A.Compose([ A.Resize(height=img_size, width=img_size), A.HorizontalFlip(p=0.5), A.ColorJitter(brightness=0.3, contrast=0.3, saturation=0.3, hue=0.1, p=0.5), A.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]), ToTensorV2() # 转为 [C,H,W] tensor ], bbox_params=A.BboxParams( format='pascal_voc', label_fields=['class_labels'], min_visibility=0.1 )) def __len__(self): return len(self.data_list) def __getitem__(self, idx): image, ann = self.data_list[idx] img = np.array(image.convert("RGB")) boxes = [] labels = [] for obj in ann['annotation']['object']: cls_name = obj['name'] if cls_name not in self.class_to_idx: continue label = self.class_to_idx[cls_name] bbox = obj['bndbox'] xmin = float(bbox['xmin']) ymin = float(bbox['ymin']) xmax = float(bbox['xmax']) ymax = float(bbox['ymax']) if xmax > xmin and ymax > ymin: boxes.append([xmin, ymin, xmax, ymax]) labels.append(label) if len(boxes) == 0: boxes = [[0, 0, 10, 10]] labels = [0] try: transformed = self.transform(image=img, bboxes=boxes, class_labels=labels) except Exception as e: print(f"Augmentation error at index {idx}: {e}") # 回退到 torchvision 流程 img_pil = transforms.ToPILImage()(img) img_tensor = transforms.Compose([ transforms.Resize((640, 640)), transforms.ToTensor(), transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]) 
])(img_pil) boxes_tensor = torch.tensor(boxes, dtype=torch.float32).clone() labels_tensor = torch.tensor(labels, dtype=torch.long).clone() else: # ✅ 最终修复:确保图像张量是克隆且连续的 img_tensor = transformed["image"].clone().contiguous() # ← 关键修改! boxes_tensor = torch.tensor(transformed["bboxes"], dtype=torch.float32).clone() labels_tensor = torch.tensor(transformed["class_labels"], dtype=torch.long).clone() target = { "boxes": boxes_tensor, "labels": labels_tensor, "image_id": torch.tensor([idx], dtype=torch.long) } return img_tensor, target # 创建数据集实例 train_dataset = VOCDataset(voc_train, img_size=640) val_dataset = VOCDataset(voc_val, img_size=640) # 使用 DataLoader 加载批次 train_loader = torch.utils.data.DataLoader( train_dataset, batch_size=8, shuffle=True, num_workers=4, pin_memory=True, collate_fn=collate_fn # ⚠️ 使用自定义函数! ) val_loader = torch.utils.data.DataLoader( val_dataset, batch_size=8, shuffle=False, num_workers=4, pin_memory=True, collate_fn=collate_fn ) #损失函数 def compute_loss(outputs, targets, strides=[8, 16, 32], num_classes=20): device = outputs[0].device criterion_cls = nn.BCEWithLogitsLoss(reduction='none') criterion_obj = nn.BCEWithLogitsLoss(reduction='none') total_loss_cls = torch.zeros(1, device=device) total_loss_obj = torch.zeros(1, device=device) total_loss_reg = torch.zeros(1, device=device) num_positive = 0 img_size = 640 feature_sizes = [img_size // s for s in strides] # [80, 40, 20] for i, pred in enumerate(outputs): H, W = feature_sizes[i], feature_sizes[i] stride = strides[i] bs = pred.shape[0] # Reshape: [B, C, H, W] -> [B, H*W, 4+1+num_classes] pred = pred.permute(0, 2, 3, 1).reshape(bs, -1, 4 + 1 + num_classes) reg_pred = pred[..., :4] # [B, H*W, 4] obj_pred = pred[..., 4] # [B, H*W] cls_pred = pred[..., 5:] # [B, H*W, nc] # Generate anchor points for this scale yv, xv = torch.meshgrid( torch.arange(H), torch.arange(W), indexing='ij' # 显式指定避免警告 ) grid_xy = torch.stack((xv, yv), dim=2).float().to(device) # (H, W, 2) grid_xy = grid_xy.reshape(-1, 2).unsqueeze(0).repeat(bs, 1, 1) # (B, H*W, 2) anchor_points = (grid_xy + 0.5) * stride # grid center in original image # Decode box center pred_xy = anchor_points + (reg_pred[..., :2].sigmoid() * stride - 0.5 * stride) pred_wh = torch.exp(reg_pred[..., 2:]) * stride pred_boxes = torch.cat([pred_xy, pred_wh], dim=-1) # [B, H*W, 4] # Prepare targets obj_target = torch.zeros_like(obj_pred) cls_target = torch.zeros_like(cls_pred) reg_target = torch.zeros_like(reg_pred) fg_mask = torch.zeros_like(obj_pred, dtype=torch.bool) for b in range(bs): tbox = targets[b]['boxes'] # (N, 4), xyxy format tlabel = targets[b]['labels'] # (N,) if len(tbox) == 0: continue # Convert tbox from xyxy to xywh tbox_xyxy = tbox tbox_xywh = torch.cat([ (tbox_xyxy[:, :2] + tbox_xyxy[:, 2:]) / 2, tbox_xyxy[:, 2:] - tbox_xyxy[:, :2] ], dim=1) # (N, 4) # Match: find best overlap between gt centers and anchor points gt_centers = tbox_xywh[:, :2] # (N, 2) distances = (anchor_points[b].unsqueeze(1) - gt_centers.unsqueeze(0)).pow(2).sum(dim=-1) # (H*W, N) _, closest_grid_idx = distances.min(dim=0) # each gt → nearest grid _, closest_gt_idx = distances.min(dim=1) # each grid → nearest gt # Positive samples: grids whose closest gt is itself pos_mask = torch.zeros(H * W, dtype=torch.bool, device=device) for gt_i in range(len(tbox)): grid_i = closest_grid_idx[gt_i] if closest_gt_idx[grid_i] == gt_i: # mutual match pos_mask[grid_i] = True fg_mask[b][pos_mask] = True obj_target[b][pos_mask] = 1.0 cls_target[b][pos_mask] = nn.functional.one_hot(tlabel.long(), 