# owlite

## method `benchmark`
```python
benchmark(
    dynamic_input_options: DynamicInputOptions | dict[str, dict[str, int]] | None = None,
    download_engine: bool = True
) → None
```
Execute the benchmark for the converted model on a connected device.
`owl.benchmark` uses the ONNX created by `owl.export`. The ONNX is sent to the connected device and converted to an engine compatible with the device's runtime, which is benchmarked behind the scenes. If the benchmark finishes successfully, the benchmark summary is displayed on the terminal, and the converted engine file is downloaded into the workspace. You can find more information about the benchmark results on the project page in the OwLite Web UI.
{% hint style="warning" %}
In general, any model generated by `owl.export` can be benchmarked with `owl.benchmark`, regardless of whether it is trained or not. Additionally, the model to be benchmarked is already determined when `owl.export` is executed.

To ensure accurate latency measurements, especially for quantized models, we strongly recommend using a pre-trained or calibrated model before calling `owl.export`.

For details on model preparation, please refer to the PYTHON API/OwLite/Export page.
{% endhint %}
Args:

- `dynamic_input_options` (`DynamicInputOptions | dict[str, dict[str, int]] | None`, optional): By default, the exported model will have the shapes of all input tensors set to exactly match those given when calling `owl.convert`. To specify axes of tensors as dynamic (i.e. known only at run-time), set `dynamic_input_options` to a dictionary with the following schema (see the sketch after this list):
  - KEY (`str`): the name of an input tensor.
  - VALUE (`dict[str, int]`): the dynamic range setting dictionary containing the `"min"`, `"opt"`, `"max"`, and `"test"` dimension size settings.
- `download_engine` (`bool`, optional): Whether to wait until the benchmarking is finished to download the engine. Defaults to `True`.
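For instance, the following sketch (mirroring the dynamic-batch example later on this page; the input tensor name `"x"` is illustrative) benchmarks an engine whose dynamic axis is profiled between 1 and 8 and measured at size 5. The `"min"`/`"opt"`/`"max"`/`"test"` semantics follow TensorRT-style dynamic ranges; see the Export page for the authoritative description:

```python
# Illustrative dynamic range settings for an input tensor named "x" whose
# axis was marked dynamic at export time (see the Examples below).
options = {
    "x": {
        "min": 1,   # smallest dimension size the engine must support
        "opt": 4,   # dimension size the engine is optimized for
        "max": 8,   # largest dimension size the engine must support
        "test": 5,  # dimension size used for the latency measurement
    }
}
owl.benchmark(dynamic_input_options=options)
```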
Raises:

- `TypeError`: When the `model` is an instance of `torch.nn.DataParallel` or `torch.nn.DistributedDataParallel` (see the sketch after this list for one way to unwrap such a model).
- `RuntimeError`: When `dynamic_input_options` is set for a baseline benchmark.
- `ValueError`: When an invalid `dynamic_input_options` is given.
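Since models wrapped for multi-GPU training are rejected, one way to avoid the `TypeError` (a minimal sketch, not an official OwLite recipe) is to unwrap the wrapper before running the OwLite pipeline:

```python
import torch

# Minimal sketch: recover the plain nn.Module from a (Distributed)DataParallel
# wrapper so that owl.convert / owl.export / owl.benchmark receive an
# unwrapped model.
if isinstance(
    model, (torch.nn.DataParallel, torch.nn.parallel.DistributedDataParallel)
):
    model = model.module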
Workflow:

`owl.benchmark` goes through the following steps:

1. Uploading the Model's Weights: It uploads the ONNX weight file to the device manager (paid plan users only).
2. Creating a Runtime Engine: It converts the model into the runtime engine format (e.g. a TensorRT engine) compatible with the device's runtime, if necessary.
3. Benchmarking on Device: It benchmarks the runtime engine on the device associated with the current baseline or experiment. When finished, it returns the benchmarking results, including latency, which are displayed on the terminal.
4. Downloading the Runtime Engine: The runtime engine file is downloaded to the user's workspace (paid plan only).
Notes:
Benchmarking Considerations for Free Plan Users
Benchmarking a model typically involves uploading its weight files for the most accurate results. However, if you're on the OwLite free plan, uploading weight files isn't currently supported. To address this, OwLite automatically generates random weights for your model's ONNX graph, allowing you to benchmark without needing your own weights. It's important to keep in mind that benchmarks using randomly generated weights might be less accurate compared to those using your actual model weights.
Interrupting Benchmarking
The benchmarking process can be interrupted at any time by pressing Ctrl+C. This will gracefully terminate the current experiment on your machine and display an exit message.
- Early Interruption: If the interruption occurs before the model weights are uploaded, the benchmarking process on the device will also be aborted.
- Late Interruption: If the interruption occurs after the model weights are uploaded, the benchmarking process will continue on the connected device.

In either case, you'll be provided with a URL linking to the OwLite website for further project configuration.

> Important Notes: The benchmark will still be accessible on the connected device after interruption, allowing you to resume the process later at your convenience. However, please be aware that manual engine retrieval will not be possible after interrupting the process.
Examples:
Baseline Mode (or Experiment Mode with Static Batch Size)
```python
import owlite

# Initialize a baseline or experiment
owl = owlite.init(...)

# Initialize your model
model = ...

# Convert the model
model = owl.convert(model, ...)

# Export the model into ONNX
owl.export(model)

# Benchmark the model
owl.benchmark()
```
Experiment Mode with Dynamic Batch Size
```python
import owlite

# Initialize a baseline or experiment
owl = owlite.init(...)

# Initialize your model
model = ...

# Convert the model
model = owl.convert(model, ...)

# Export the model into ONNX with dynamic axis options
owl.export(model, dynamic_axis_options={"x": 0})

# Benchmark the model with dynamic input options
owl.benchmark(dynamic_input_options={"x": {"min": 1, "opt": 4, "max": 8, "test": 5}})
```
Running the example above produces terminal output similar to the following:

```
OwLite [INFO] Benchmark initiated for the experiment 'dynamic' for the baseline 'sampleModel' in the project 'testProject'
OwLite [INFO] Benchmark requested on NVIDIA RTX A6000 [TensorRT]
OwLite [INFO] Polling for benchmark result. You are free to Ctrl+C away. When it is done, you can find the results at https://owlite.ai/project/detail/94af0e4c784fb1f
Your position in the queue: 0
OwLite [INFO] Uploading ONNX model weight to optimize the engine
OwLite [INFO] Uploading /home/sqzb/workspace/owlite/testProject/sampleModel/dynamic/testProject_sampleModel_dynamic.bin
100%|█████████████████████████████████████████| 541k/541k [00:00<00:00, 2.26MiB/s]
OwLite [INFO] Uploading done
[.........🦉..........]
Benchmarking done
OwLite [INFO] Experiment: dynamic
              Latency: 0.0245361 (ms) on NVIDIA RTX A6000 [TensorRT]
              For more details, visit https://owlite.ai/project/detail/94af0e4c784fb1f
OwLite [INFO] Downloading file at /home/sqzb/workspace/owlite/testProject/sampleModel/dynamic/testProject_sampleModel_dynamic.engine
100%|█████████████████████████████████████████| 554k/554k [00:00<00:00, 9.51MiB/s]
OwLite [INFO] Downloading done
```
OwLite creates the engine file within the hierarchical structure below:

```
- owlite
  - testProject
    - sampleModel
      - dynamic
        - testProject_sampleModel_dynamic.onnx    # created by owlite.export()
        - testProject_sampleModel_dynamic.bin     # created by owlite.export()
        - testProject_sampleModel_dynamic.engine  # created by owl.benchmark()
```
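If the target runtime is TensorRT (as in the log above), the downloaded `.engine` file should be an ordinary serialized TensorRT engine. Here is a minimal sketch of loading it, assuming the TensorRT Python bindings are installed and the directory layout shown above:

```python
import tensorrt as trt

# Minimal sketch: deserialize the engine that owl.benchmark downloaded.
# The path assumes the directory layout shown above.
logger = trt.Logger(trt.Logger.WARNING)
runtime = trt.Runtime(logger)
with open(
    "owlite/testProject/sampleModel/dynamic/testProject_sampleModel_dynamic.engine",
    "rb",
) as f:
    engine = runtime.deserialize_cuda_engine(f.read())
```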
Free Plan Users

Please note that the Free plan does not allow you to export TensorRT engine files with the model's weights. Instead, an engine with random weights will be created, and you can only query its latency; you will not be able to download the generated engine.
```
OwLite [INFO] Benchmark initiated for the experiment 'dynamic' for the baseline 'sampleModel' in the project 'testProject'
OwLite [INFO] Benchmark requested
OwLite [INFO] Polling for benchmark result. You are free to Ctrl+C away. When it is done, you can find the results at https://owlite.ai/project/detail/94af0e4c784fb1f
[.........🦉..........]
Benchmarking done
OwLite [INFO] Experiment: dynamic
              Latency: 0.0327148 (ms) on NVIDIA RTX A6000
              For more details, visit https://owlite.ai/project/detail/94af0e4c784fb1f
OwLite [INFO] The free plan doesn't support the engine download. Upgrade to a higher plan to download the engine through OwLite with a seamless experience. Even so, OwLite still provides the ONNX so that you can generate an engine independently
```
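As a sketch of what "generating an engine independently" could look like for a TensorRT target, the ONNX that OwLite provides can be compiled locally. This assumes the TensorRT Python bindings; the build settings are illustrative and not necessarily those OwLite uses on its devices:

```python
import tensorrt as trt

# Minimal sketch: build a TensorRT engine from the ONNX exported by OwLite.
# File names assume the directory layout shown earlier on this page.
logger = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(logger)
network = builder.create_network(
    1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH)
)
parser = trt.OnnxParser(network, logger)
with open("testProject_sampleModel_dynamic.onnx", "rb") as f:
    if not parser.parse(f.read()):
        raise RuntimeError(parser.get_error(0))
config = builder.create_builder_config()
# Note: an ONNX exported with dynamic axes additionally requires an
# optimization profile (builder.create_optimization_profile()) before building.
serialized_engine = builder.build_serialized_network(network, config)
with open("testProject_sampleModel_dynamic.engine", "wb") as f:
    f.write(serialized_engine)
```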
Updated: 2024-06-13T23:42:42