# owlite

## method `benchmark`
```python
benchmark(
    dynamic_input_options: DynamicInputOptions | dict[str, dict[str, int]] | None = None,
    download_engine: bool = True
) → None
```
Execute the benchmark for the converted model on a connected device.
`owl.benchmark` uses the ONNX created by `owl.export`. The ONNX is sent to the connected device and converted to an engine compatible with the device's runtime, which is benchmarked behind the scenes. If the benchmark finishes successfully, the benchmark summary is displayed on the terminal, and the converted engine file is downloaded into the workspace. You can find more information about the benchmark results on the project page in the OwLite Web UI.
{% hint style="warning" %}
In general, any model generated by `owl.export` can be benchmarked with `owl.benchmark`, regardless of whether it is trained or not. Additionally, the model to be benchmarked is already determined when `owl.export` is executed.

To ensure accurate latency measurements, especially for quantized models, we strongly recommend using a pre-trained or calibrated model before calling `owl.export`.

For details on model preparation, please refer to the PYTHON API/OwLite/Export page.
{% endhint %}
Args:

- `dynamic_input_options` (`DynamicInputOptions | dict[str, dict[str, int]] | None`, optional): By default, the exported model will have the shapes of all input tensors set to exactly match those given when calling `owl.convert`. To specify axes of tensors as dynamic (i.e. known only at run-time), set `dynamic_input_options` to a dictionary with the following schema (see the sketch after this list):
  - KEY (`str`): the name of an input tensor.
  - VALUE (`dict[str, int]`): the dynamic range setting dictionary containing the `"min"`, `"opt"`, `"max"`, and `"test"` dimension size settings.
- `download_engine` (`bool`, optional): Whether to wait until the benchmarking is finished to download the engine. Defaults to `True`.
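For instance, the following sketch (mirroring the dynamic-batch example later on this page; the input tensor name `"x"` is illustrative) benchmarks an engine whose dynamic axis is profiled between 1 and 8 and measured at size 5. The `"min"`/`"opt"`/`"max"`/`"test"` semantics follow TensorRT-style dynamic ranges; see the Export page for the authoritative description:

```python
# Illustrative dynamic range settings for an input tensor named "x" whose
# axis was marked dynamic at export time (see the Examples below).
options = {
    "x": {
        "min": 1,   # smallest dimension size the engine must support
        "opt": 4,   # dimension size the engine is optimized for
        "max": 8,   # largest dimension size the engine must support
        "test": 5,  # dimension size used for the latency measurement
    }
}
owl.benchmark(dynamic_input_options=options)
```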
Raises:

- `TypeError`: When the `model` is an instance of `torch.nn.DataParallel` or `torch.nn.DistributedDataParallel` (see the sketch after this list for one way to unwrap such a model).
- `RuntimeError`: When `dynamic_input_options` is set for a baseline benchmark.
- `ValueError`: When an invalid `dynamic_input_options` is given.
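Since models wrapped for multi-GPU training are rejected, one way to avoid the `TypeError` (a minimal sketch, not an official OwLite recipe) is to unwrap the wrapper before running the OwLite pipeline:

```python
import torch

# Minimal sketch: recover the plain nn.Module from a (Distributed)DataParallel
# wrapper so that owl.convert / owl.export / owl.benchmark receive an
# unwrapped model.
if isinstance(
    model, (torch.nn.DataParallel, torch.nn.parallel.DistributedDataParallel)
):
    model = model.module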
Workflow:

`owl.benchmark` goes through the following steps:

1. Uploading the Model's Weights: It uploads the ONNX weight file to the device manager (paid plan users only).
2. Creating a Runtime Engine: It converts the model into the runtime engine format (e.g. a TensorRT engine) compatible with the device's runtime, if necessary.
3. Benchmarking on Device: It benchmarks the runtime engine on the device associated with the current baseline or experiment. When finished, it returns the benchmarking results, including latency, which are displayed on the terminal.
4. Downloading the Runtime Engine: The runtime engine file is downloaded to the user's workspace (paid plan only).
Notes:
Benchmarking Considerations for Free Plan Users
Benchmarking a model typically involves uploading its weight files for the most accurate results. However, if you're on the OwLite free plan, uploading weight files isn't currently supported. To address this, OwLite automatically generates random weights for your model's ONNX graph, allowing you to benchmark without needing your own weights. It's important to keep in mind that benchmarks using randomly generated weights might be less accurate compared to those using your actual model weights.
Interrupting Benchmarking
The benchmarking process can be interrupted at any time by pressing Ctrl+C. This will gracefully terminate the current experiment on your machine and display an exit message.
- Early Interruption: If the interruption occurs before the model weights are uploaded, the benchmarking process on the device will also be aborted.
- Late Interruption: If the interruption occurs after the model weights are uploaded, the benchmarking process will continue on the connected device.

In either case, you'll be provided with a URL linking to the OwLite website for further project configuration.

> Important Notes: The benchmark will still be accessible on the connected device after interruption, allowing you to resume the process later at your convenience. However, please be aware that manual engine retrieval will not be possible after interrupting the process.
Examples:
Baseline Mode (or Experiment Mode with Static Batch Size)
```python
import owlite

# Initialize a baseline or experiment
owl = owlite.init(...)

# Initialize your model
model = ...

# Convert the model
model = owl.convert(model, ...)

# Export the model into ONNX
owl.export(model)

# Benchmark the model
owl.benchmark()
```
Experiment Mode with Dynamic Batch Size
```python
import owlite

# Initialize a baseline or experiment
owl = owlite.init(...)

# Initialize your model
model = ...

# Convert the model
model = owl.convert(model, ...)

# Export the model into ONNX with dynamic axis options
owl.export(model, dynamic_axis_options={"x": 0})

# Benchmark the model with dynamic input options
owl.benchmark(dynamic_input_options={"x": {"min": 1, "opt": 4, "max": 8, "test": 5}})
```
Running the example above produces terminal output similar to the following:

```
OwLite [INFO] Benchmark initiated for the experiment 'dynamic' for the baseline 'sampleModel' in the project 'testProject'
OwLite [INFO] Benchmark requested on NVIDIA RTX A6000 [TensorRT]
OwLite [INFO] Polling for benchmark result. You are free to Ctrl+C away. When it is done, you can find the results at https://owlite.ai/project/detail/94af0e4c784fb1f
Your position in the queue: 0
OwLite [INFO] Uploading ONNX model weight to optimize the engine
OwLite [INFO] Uploading /home/sqzb/workspace/owlite/testProject/sampleModel/dynamic/testProject_sampleModel_dynamic.bin
100%|█████████████████████████████████████████| 541k/541k [00:00<00:00, 2.26MiB/s]
OwLite [INFO] Uploading done
[.........🦉..........]
Benchmarking done
OwLite [INFO] Experiment: dynamic
              Latency: 0.0245361 (ms) on NVIDIA RTX A6000 [TensorRT]
              For more details, visit https://owlite.ai/project/detail/94af0e4c784fb1f
OwLite [INFO] Downloading file at /home/sqzb/workspace/owlite/testProject/sampleModel/dynamic/testProject_sampleModel_dynamic.engine
100%|█████████████████████████████████████████| 554k/554k [00:00<00:00, 9.51MiB/s]
OwLite [INFO] Downloading done
```
OwLite creates the engine file within the hierarchical structure below:

```
- owlite
  - testProject
    - sampleModel
      - dynamic
        - testProject_sampleModel_dynamic.onnx    # created by owlite.export()
        - testProject_sampleModel_dynamic.bin     # created by owlite.export()
        - testProject_sampleModel_dynamic.engine  # created by owl.benchmark()
```
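If the target runtime is TensorRT (as in the log above), the downloaded `.engine` file should be an ordinary serialized TensorRT engine. Here is a minimal sketch of loading it, assuming the TensorRT Python bindings are installed and the directory layout shown above:

```python
import tensorrt as trt

# Minimal sketch: deserialize the engine that owl.benchmark downloaded.
# The path assumes the directory layout shown above.
logger = trt.Logger(trt.Logger.WARNING)
runtime = trt.Runtime(logger)
with open(
    "owlite/testProject/sampleModel/dynamic/testProject_sampleModel_dynamic.engine",
    "rb",
) as f:
    engine = runtime.deserialize_cuda_engine(f.read())
```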
Free Plan Users

Please note that the Free plan does not allow you to export TensorRT engine files with the model's weights. Instead, an engine with random weights will be created, and you can only query its latency; you will not be able to download the generated engine.
```
OwLite [INFO] Benchmark initiated for the experiment 'dynamic' for the baseline 'sampleModel' in the project 'testProject'
OwLite [INFO] Benchmark requested
OwLite [INFO] Polling for benchmark result. You are free to Ctrl+C away. When it is done, you can find the results at https://owlite.ai/project/detail/94af0e4c784fb1f
[.........🦉..........]
Benchmarking done
OwLite [INFO] Experiment: dynamic
              Latency: 0.0327148 (ms) on NVIDIA RTX A6000
              For more details, visit https://owlite.ai/project/detail/94af0e4c784fb1f
OwLite [INFO] The free plan doesn't support the engine download. Upgrade to a higher plan to download the engine through OwLite with a seamless experience. Even so, OwLite still provides the ONNX so that you can generate an engine independently
```
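As a sketch of what "generating an engine independently" could look like for a TensorRT target, the ONNX that OwLite provides can be compiled locally. This assumes the TensorRT Python bindings; the build settings are illustrative and not necessarily those OwLite uses on its devices:

```python
import tensorrt as trt

# Minimal sketch: build a TensorRT engine from the ONNX exported by OwLite.
# File names assume the directory layout shown earlier on this page.
logger = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(logger)
network = builder.create_network(
    1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH)
)
parser = trt.OnnxParser(network, logger)
with open("testProject_sampleModel_dynamic.onnx", "rb") as f:
    if not parser.parse(f.read()):
        raise RuntimeError(parser.get_error(0))
config = builder.create_builder_config()
# Note: an ONNX exported with dynamic axes additionally requires an
# optimization profile (builder.create_optimization_profile()) before building.
serialized_engine = builder.build_serialized_network(network, config)
with open("testProject_sampleModel_dynamic.engine", "wb") as f:
    f.write(serialized_engine)
```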
Updated: 2024-06-13T23:42:42