Model Export with Ultralytics YOLO

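Introduction

The ultimate goal of training a model is to deploy it in real-world applications. Export mode in Ultralytics YOLO11 offers a range of options for exporting your trained model to different formats, so it can be deployed across a variety of platforms and devices. This guide walks you through the details of model export and shows how to achieve maximum compatibility and performance.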

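Why Choose YOLO11's Export Mode?

Versatility: Export to multiple formats, including ONNX, TensorRT, CoreML, and more.
Performance: Gain up to 5x GPU speedup with TensorRT and up to 3x CPU speedup with ONNX or OpenVINO.
Compatibility: Make your model deployable across numerous hardware and software environments.
Ease of use: A simple CLI and Python API for quick, straightforward model export.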

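Key Features of Export Mode

Some of the standout features:

One-click export: Simple commands for exporting to different formats.
Batch export: Export models capable of batched inference.
Optimized inference: Exported models are optimized for faster inference times.
Tutorial videos: In-depth guides and tutorials for a smooth export experience.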

Tip

Export to ONNX or OpenVINO for up to 3x CPU speedup.
Export to TensorRT for up to 5x GPU speedup.

Usage Examples

Export a YOLO11n model to a different format such as ONNX or TensorRT. See the Arguments section below for the full list of export arguments.

Example

Python:

from ultralytics import YOLO

# Load a model (pick one)
model = YOLO("yolo11n.pt")  # load an official model
# model = YOLO("path/to/best.pt")  # or load a custom-trained model

# Export the model
model.export(format="onnx")

CLI:

yolo export model=yolo11n.pt format=onnx      # export an official model
yolo export model=path/to/best.pt format=onnx # export a custom-trained model
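
The Python and CLI forms take the same arguments: each keyword passed to `model.export()` corresponds to a `key=value` token on the `yolo export` command line. As a minimal sketch of that mapping, the helper below assembles the CLI string from Python keyword arguments. The name `build_export_command` is hypothetical and not part of Ultralytics, and the sketch assumes simple scalar values (tuple-valued arguments such as a non-square `imgsz` are not handled).

```python
import shlex


def build_export_command(model, **export_args):
    """Build a `yolo export` command line from Python-style keyword arguments."""
    parts = ["yolo", "export", f"model={model}"]
    # Each keyword becomes a key=value token, in the order it was passed.
    parts += [f"{key}={value}" for key, value in export_args.items()]
    # shlex.join quotes any token that contains shell-special characters.
    return shlex.join(parts)


print(build_export_command("yolo11n.pt", format="onnx"))
print(build_export_command("path/to/best.pt", format="onnx", dynamic=True))
```

The resulting string can be pasted into a shell or handed to a job scheduler, which may be convenient when export settings are kept in Python configuration.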

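Arguments

This table details the configurations and options available for exporting YOLO models to different formats. These settings are critical for optimizing the exported model's performance, size, and compatibility across platforms and environments. Proper configuration ensures the model is ready for efficient deployment in its intended application.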

Argument	Type	Default	Description
`format`	`str`	`'torchscript'`	Target format for the exported model, such as `'onnx'`, `'torchscript'`, `'engine'` (TensorRT), or others. Each format enables compatibility with different deployment environments.
`imgsz`	`int` or `tuple`	`640`	Desired image size for the model input. Can be an integer for square images (e.g., `640` for 640×640) or a tuple `(height, width)` for specific dimensions.
`keras`	`bool`	`False`	Enables export to Keras format for TensorFlow SavedModel, providing compatibility with TensorFlow serving and APIs.
`optimize`	`bool`	`False`	Applies optimization for mobile devices when exporting to TorchScript, potentially reducing model size and improving inference performance. Not compatible with NCNN format or CUDA devices.
`half`	`bool`	`False`	Enables FP16 (half-precision) quantization, reducing model size and potentially speeding up inference on supported hardware. Not compatible with INT8 quantization or CPU-only exports for ONNX.
`int8`	`bool`	`False`	Activates INT8 quantization, further compressing the model and speeding up inference with minimal accuracy loss, primarily for edge devices. When used with TensorRT, performs post-training quantization (PTQ).
`dynamic`	`bool`	`False`	Allows dynamic input sizes for ONNX, TensorRT and OpenVINO exports, enhancing flexibility in handling varying image dimensions. Automatically set to `True` when using TensorRT with INT8.
`simplify`	`bool`	`True`	Simplifies the model graph for ONNX exports with `onnxslim`, potentially improving performance and compatibility with inference engines.
`opset`	`int`	`None`	Specifies the ONNX opset version for compatibility with different ONNX parsers and runtimes. If not set, uses the latest supported version.
`workspace`	`float` or `None`	`None`	Sets the maximum workspace size in GiB for TensorRT optimizations, balancing memory usage and performance. Use `None` for auto-allocation by TensorRT up to device maximum.
`nms`	`bool`	`False`	Adds Non-Maximum Suppression (NMS) to the exported model when supported (see Export Formats), improving detection post-processing efficiency. Not available for end2end models.
`batch`	`int`	`1`	Specifies export model batch inference size or the maximum number of images the exported model will process concurrently in `predict` mode. For Edge TPU exports, this is automatically set to 1.
`device`	`str`	`None`	Specifies the device for exporting: GPU (`device=0`), CPU (`device=cpu`), MPS for Apple silicon (`device=mps`) or DLA for NVIDIA Jetson (`device=dla:0` or `device=dla:1`). TensorRT exports automatically use GPU.
`data`	`str`	`'coco8.yaml'`	Path to the dataset configuration file (default: `coco8.yaml`), essential for INT8 quantization calibration. If not specified with INT8 enabled, a default dataset will be assigned.
`fraction`	`float`	`1.0`	Specifies the fraction of the dataset to use for INT8 quantization calibration. Allows for calibrating on a subset of the full dataset, useful for experiments or when resources are limited. If not specified with INT8 enabled, the full dataset will be used.

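Adjusting these arguments lets you tailor the export process to specific requirements such as the deployment environment, hardware constraints, and performance targets. Choosing the right format and settings is essential for achieving the best balance between model size, speed, and accuracy.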
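
Several rows in the table note combinations that do not work together. The sketch below encodes a few of those notes as a pre-flight check that can run before a long export. It is only an illustration of the table, not Ultralytics' own validation logic: `find_conflicts` is a hypothetical helper, and it assumes that a numeric index or a string starting with `cuda` denotes a CUDA device and that `device="cpu"` denotes a CPU-only export.

```python
def find_conflicts(fmt="torchscript", half=False, int8=False, optimize=False, device=None):
    """Return human-readable conflicts among a few export arguments.

    The rules are taken from the Description column of the arguments table.
    """
    problems = []
    # Assumption: a numeric index (0, "0") or a "cuda..." string means a CUDA device.
    on_cuda = str(device).isdigit() or str(device).startswith("cuda")

    if half and int8:
        problems.append("half=True cannot be combined with int8=True")
    if half and fmt == "onnx" and device == "cpu":
        problems.append("half=True is not supported for CPU-only ONNX exports")
    if optimize and fmt == "ncnn":
        problems.append("optimize=True is not compatible with the NCNN format")
    if optimize and on_cuda:
        problems.append("optimize=True is not compatible with CUDA devices")
    return problems


for problem in find_conflicts(fmt="onnx", half=True, device="cpu"):
    print("conflict:", problem)
```

An empty list means none of the encoded rules were violated; it does not guarantee that the export itself will succeed.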

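Export Formats

The table below lists the available YOLO11 export formats. You can export to any format using the `format` argument, for example `format='onnx'` or `format='engine'`. You can predict or validate directly on exported models, for example `yolo predict model=yolo11n.onnx`. Usage examples for your model are shown after the export completes.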

Format	`format` Argument	Model	Metadata	Arguments
PyTorch	-	`yolo11n.pt`	✅	-
TorchScript	`torchscript`	`yolo11n.torchscript`	✅	`imgsz`, `optimize`, `nms`, `batch`, `device`
ONNX	`onnx`	`yolo11n.onnx`	✅	`imgsz`, `half`, `dynamic`, `simplify`, `opset`, `nms`, `batch`, `device`
OpenVINO	`openvino`	`yolo11n_openvino_model/`	✅	`imgsz`, `half`, `dynamic`, `int8`, `nms`, `batch`, `data`, `fraction`, `device`
TensorRT	`engine`	`yolo11n.engine`	✅	`imgsz`, `half`, `dynamic`, `simplify`, `workspace`, `int8`, `nms`, `batch`, `data`, `fraction`, `device`
CoreML	`coreml`	`yolo11n.mlpackage`	✅	`imgsz`, `half`, `int8`, `nms`, `batch`, `device`
TF SavedModel	`saved_model`	`yolo11n_saved_model/`	✅	`imgsz`, `keras`, `int8`, `nms`, `batch`, `device`
TF GraphDef	`pb`	`yolo11n.pb`	❌	`imgsz`, `batch`, `device`
TF Lite	`tflite`	`yolo11n.tflite`	✅	`imgsz`, `half`, `int8`, `nms`, `batch`, `data`, `fraction`, `device`
TF Edge TPU	`edgetpu`	`yolo11n_edgetpu.tflite`	✅	`imgsz`, `device`
TF.js	`tfjs`	`yolo11n_web_model/`	✅	`imgsz`, `half`, `int8`, `nms`, `batch`, `device`
PaddlePaddle	`paddle`	`yolo11n_paddle_model/`	✅	`imgsz`, `batch`, `device`
MNN	`mnn`	`yolo11n.mnn`	✅	`imgsz`, `batch`, `int8`, `half`, `device`
NCNN	`ncnn`	`yolo11n_ncnn_model/`	✅	`imgsz`, `half`, `batch`, `device`
IMX500	`imx`	`yolov8n_imx_model/`	✅	`imgsz`, `int8`, `data`, `fraction`, `device`
RKNN	`rknn`	`yolo11n_rknn_model/`	✅	`imgsz`, `batch`, `name`, `device`
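
Because each format accepts a different subset of arguments, it can help to filter a shared settings dictionary before exporting. The following sketch copies the Arguments column for five of the formats above into a lookup table and splits a set of keyword arguments into those the format lists and those it does not. The names `SUPPORTED_ARGS` and `split_supported` are hypothetical, and the table is a hand-copied subset that would need updating if the listing changes.

```python
# Arguments listed per format in the export formats table (subset, copied by hand).
SUPPORTED_ARGS = {
    "torchscript": {"imgsz", "optimize", "nms", "batch", "device"},
    "onnx": {"imgsz", "half", "dynamic", "simplify", "opset", "nms", "batch", "device"},
    "openvino": {"imgsz", "half", "dynamic", "int8", "nms", "batch", "data", "fraction", "device"},
    "engine": {"imgsz", "half", "dynamic", "simplify", "workspace", "int8", "nms",
               "batch", "data", "fraction", "device"},
    "coreml": {"imgsz", "half", "int8", "nms", "batch", "device"},
}


def split_supported(fmt, **export_args):
    """Split export_args into (accepted dict, sorted list of rejected names) for fmt."""
    allowed = SUPPORTED_ARGS[fmt]  # raises KeyError for formats not in the subset
    accepted = {k: v for k, v in export_args.items() if k in allowed}
    rejected = sorted(k for k in export_args if k not in allowed)
    return accepted, rejected


accepted, rejected = split_supported("onnx", half=True, int8=True, imgsz=640)
print("pass to export():", accepted)
print("not listed for this format:", rejected)
```

The accepted dictionary can then be passed on as `model.export(format="onnx", **accepted)`.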

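FAQ

How do I export a YOLO11 model to ONNX format?

Exporting a YOLO11 model to ONNX format is straightforward with Ultralytics, which provides both Python and CLI methods for exporting models.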

Example

Python:

from ultralytics import YOLO

# Load a model (pick one)
model = YOLO("yolo11n.pt")  # load an official model
# model = YOLO("path/to/best.pt")  # or load a custom-trained model

# Export the model
model.export(format="onnx")

CLI:

yolo export model=yolo11n.pt format=onnx      # export an official model
yolo export model=path/to/best.pt format=onnx # export a custom-trained model

For more details on the process, including advanced options such as handling different input sizes, refer to the ONNX integration guide.
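
As noted under Export Formats, an exported model can be used for prediction directly. A minimal sketch of the full round trip follows: export to ONNX, load the exported file through the same `YOLO` class, and run inference. It assumes that `export()` returns the path of the exported file; `export_and_predict` is a hypothetical wrapper, and the import sits inside the function only so that the module can be loaded without the `ultralytics` package installed.

```python
def export_and_predict(weights, source):
    """Export `weights` to ONNX, reload the exported file, and predict on `source`."""
    from ultralytics import YOLO  # requires the ultralytics package

    model = YOLO(weights)
    onnx_path = model.export(format="onnx")  # assumed to return the exported file path
    onnx_model = YOLO(onnx_path)  # exported models load through the same class
    return onnx_model(source)  # run inference with the exported model
```

Calling `export_and_predict("yolo11n.pt", "path/to/image.jpg")` with a real image path should return the prediction results for that image, which is a quick way to confirm that the exported file loads and runs.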

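What are the benefits of using TensorRT for model export?

Using TensorRT for model export can significantly improve performance. YOLO11 models exported to TensorRT can achieve up to 5x GPU speedup, making them well suited to real-time inference applications.

Versatility: Optimize models for a specific hardware setup.
Speed: Achieve faster inference through advanced optimizations.
Compatibility: Integrate smoothly with NVIDIA hardware.

To learn more about integrating TensorRT, see the TensorRT integration guide.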
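
The "up to 5x" figure is an upper bound, and the actual gain depends on the model, input size, and GPU. One way to check it on your own hardware is to time the original and the exported model on the same input. The sketch below is a generic wall-clock timer, not an Ultralytics utility: `mean_latency_ms` and `speedup` are hypothetical helpers, and `predict` can be any callable, for example a loaded `YOLO` model.

```python
import time


def mean_latency_ms(predict, source, warmup=3, runs=20):
    """Average wall-clock time of `predict(source)` in milliseconds."""
    for _ in range(warmup):  # warm-up calls are excluded from the timing
        predict(source)
    start = time.perf_counter()
    for _ in range(runs):
        predict(source)
    return (time.perf_counter() - start) * 1000 / runs


def speedup(baseline_ms, candidate_ms):
    """How many times faster the candidate is than the baseline."""
    return baseline_ms / candidate_ms


# Stand-in workload so the sketch runs without a model or GPU.
latency = mean_latency_ms(lambda src: sum(range(10_000)), "dummy input")
print(f"{latency:.3f} ms per call")
```

In practice you would pass `YOLO("yolo11n.pt")` and `YOLO("yolo11n.engine")` as `predict` with the same image, then compare the two latencies with `speedup`. Wall-clock timings of this kind are approximate and vary between runs.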

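How do I enable INT8 quantization when exporting my YOLO11 model?

INT8 quantization is an effective way to compress the model and speed up inference, especially on edge devices. Here is how to enable it: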

Example

Python:

from ultralytics import YOLO

model = YOLO("yolo11n.pt")  # load a model
model.export(format="engine", int8=True)  # export a TensorRT model with INT8 quantization

CLI:

yolo export model=yolo11n.pt format=engine int8=True # export a TensorRT model with INT8 quantization

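INT8 quantization can be applied to various formats, such as TensorRT, OpenVINO, and CoreML. For the best quantization results, provide a representative dataset using the `data` argument.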
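
To make the role of `data` and `fraction` concrete, here is a small sketch that assembles INT8 export arguments. The two format sets are copied from the export formats table: formats that list `int8`, and the subset of those that also list `data` and `fraction` for calibration. `int8_export_args` is a hypothetical helper, and the sets would need updating if the table changes.

```python
# Formats whose Arguments column lists `int8` in the export formats table.
INT8_FORMATS = {"openvino", "engine", "coreml", "saved_model", "tflite", "tfjs", "mnn", "imx"}
# Of those, the formats that also list `data` and `fraction` for calibration.
CALIBRATED_FORMATS = {"openvino", "engine", "tflite", "imx"}


def int8_export_args(fmt, data="coco8.yaml", fraction=1.0):
    """Build keyword arguments for an INT8 export to `fmt`."""
    if fmt not in INT8_FORMATS:
        raise ValueError(f"format {fmt!r} does not list int8 as a supported argument")
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in the range (0, 1]")
    args = {"format": fmt, "int8": True}
    if fmt in CALIBRATED_FORMATS:
        # Calibration dataset and the share of it to use.
        args.update(data=data, fraction=fraction)
    return args


print(int8_export_args("engine", data="coco8.yaml", fraction=0.5))
```

The returned dictionary can be unpacked into the export call, as in `YOLO("yolo11n.pt").export(**int8_export_args("engine", fraction=0.5))`. For real deployments, replace `coco8.yaml` with a dataset configuration that reflects the images the model will see.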

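Why is dynamic input size important when exporting models?

Dynamic input size allows the exported model to handle varying image dimensions, providing flexibility and optimizing processing efficiency for different use cases. When exporting to formats such as ONNX or TensorRT, enabling dynamic input size ensures that the model can adapt to different input shapes.

To enable this feature, use the `dynamic=True` flag during export: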

Example

Python:

from ultralytics import YOLO

model = YOLO("yolo11n.pt")  # load a model
model.export(format="onnx", dynamic=True)  # export with dynamic input size

CLI:

yolo export model=yolo11n.pt format=onnx dynamic=True

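Dynamic input sizing is particularly useful for applications where input dimensions may vary, such as video processing or handling images from different sources.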
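
One way to confirm that an ONNX export really has dynamic axes is to inspect its input signature with ONNX Runtime, which reports symbolic dimensions as strings (or `None`) rather than integers. The sketch below assumes the `onnxruntime` package is installed and that the model's first input is the image tensor. Both function names are hypothetical, and the dimension names in the example list are illustrative rather than taken from an actual export.

```python
def has_dynamic_axes(shape):
    """True if any dimension is symbolic, i.e. not a fixed integer."""
    return any(not isinstance(dim, int) for dim in shape)


def onnx_input_shape(onnx_path):
    """Return the declared shape of the first input of an ONNX model."""
    import onnxruntime as ort  # requires the onnxruntime package

    session = ort.InferenceSession(onnx_path, providers=["CPUExecutionProvider"])
    return session.get_inputs()[0].shape


# Illustrative shapes: a fixed-size export versus one with symbolic axes.
print(has_dynamic_axes([1, 3, 640, 640]))
print(has_dynamic_axes(["batch", 3, "height", "width"]))
```

With a real file, `has_dynamic_axes(onnx_input_shape("yolo11n.onnx"))` tells you whether the model was exported with `dynamic=True` or with a fixed input size.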

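What are the key export arguments to consider for optimizing model performance?

Understanding and configuring export arguments is crucial for optimizing model performance:

format: Target format for the exported model (for example, `onnx`, `torchscript`, or `saved_model`).
imgsz: Desired image size for the model input (for example, `640` or `(height, width)`).
half: Enables FP16 quantization, reducing model size and potentially speeding up inference.
optimize: Applies optimizations for mobile devices when exporting to TorchScript.
int8: Enables INT8 quantization, which is highly beneficial for edge AI deployments.

For deployment on specific hardware platforms, consider using specialized export formats such as TensorRT for NVIDIA GPUs, CoreML for Apple devices, or Edge TPU for Google Coral devices.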
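
The hardware recommendations above can be captured as a small lookup, which is handy when one script exports for several deployment targets. This is a sketch only: the target labels (`"nvidia_gpu"`, `"apple"`, `"coral"`) and the `pick_format` helper are hypothetical, and falling back to ONNX for unlisted targets is an assumption based on the CPU speedup noted earlier in this guide.

```python
# Target label -> value for the `format` argument, following the guidance above.
HARDWARE_FORMATS = {
    "nvidia_gpu": "engine",  # TensorRT
    "apple": "coreml",       # CoreML
    "coral": "edgetpu",      # TF Edge TPU
}


def pick_format(target, default="onnx"):
    """Return the export format suggested for a deployment target."""
    return HARDWARE_FORMATS.get(target, default)


for target in ("nvidia_gpu", "apple", "coral", "generic_cpu"):
    print(f"{target}: format={pick_format(target)}")
```

The result can be passed straight to the export call, for example `model.export(format=pick_format("apple"))`.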
