Overview
We will walk you through the process of using OwLite to quantize a PyTorch model. As our example, we will quantize the ResNet18 model from torchvision.
This tutorial aims to familiarize you with the OwLite package and its features. We will not focus on accuracy in this tutorial, as our primary goal is to demonstrate how to use OwLite.
Prerequisites
To follow this tutorial, you should be familiar with the following:
- Python programming language
- PyTorch framework
- Basic concepts of quantization (a short illustrative snippet follows this list)
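If the last item is new to you, the toy snippet below shows the core idea behind 8-bit affine quantization: real values are mapped onto a small integer range and back, trading a little precision for a much more compact representation. It is purely illustrative and not part of the OwLite workflow.

import torch

x = torch.randn(4)                                # original float values
qmin, qmax = 0, 255                               # unsigned 8-bit range
scale = (x.max() - x.min()) / (qmax - qmin)       # step size between integer levels
zero_point = qmin - torch.round(x.min() / scale)  # integer that represents 0.0

x_int = torch.clamp(torch.round(x / scale) + zero_point, qmin, qmax)  # quantize
x_hat = (x_int - zero_point) * scale                                  # dequantize (approximate reconstruction)

print(x, x_int, x_hat, sep="\n")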
Let's get started!
1. Install OwLite
1-1. Create an account and install OwLite.
Create an account on the OwLite Web UI and install the OwLite Python library from PyPI.
pip install owlite --extra-index-url https://pypi.squeezebits.com/
1-2. Login with CLI.
owlite login
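If you want to confirm that the package was installed correctly before logging in or afterwards, a quick check with pip is enough:

pip show owlite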
2. Model Setup
import torch
import torchvision
batch_size = 64
device = "cuda" if torch.cuda.is_available() else "cpu"

# 256 fake ImageNet-sized images are enough for this walkthrough
train_dataset = torchvision.datasets.FakeData(256, (3, 224, 224), 1000, torchvision.transforms.ToTensor())
train_loader = torch.utils.data.DataLoader(train_dataset, batch_size=batch_size)

model = torchvision.models.resnet18(pretrained=True)
model.to(device)
This step sets up the essential environment for model compression. It imports the necessary Python packages, prepares a (fake) dataset and data loader, and creates a pretrained ResNet18 model on the chosen device.
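Before involving OwLite, it can be worth an optional sanity check that the model and data loader work together on the chosen device:

model.eval()
with torch.no_grad():
    images, _ = next(iter(train_loader))   # one batch of fake data
    logits = model(images.to(device))
print(logits.shape)                         # expected: torch.Size([64, 1000])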
3. Upload baseline
import owlite
owl = owlite.init(project="Tutorial", baseline="resnet18")
# Convert the baseline model
example_input = torch.randn(batch_size, 3, 224, 224).to(device)
model = owl.convert(model, example_input)
# Export the baseline model to ONNX
owl.export(model)
# Benchmark the baseline model
owl.benchmark()
After the benchmark completes, view the baseline results on the OwLite website, where you will create a compression configuration in the next step.
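Before moving to the web, you may also want a baseline accuracy figure to set against later experiments; owl.benchmark() reports runtime performance, not accuracy. The helper below is an optional sketch: with the FakeData loader from step 2 the number is meaningless, so substitute a real validation loader for a useful reference score.

def evaluate(model, loader):
    # Top-1 accuracy over a data loader; purely a convenience for this tutorial.
    model.eval()
    correct, total = 0, 0
    with torch.no_grad():
        for images, labels in loader:
            preds = model(images.to(device)).argmax(dim=1)
            correct += (preds == labels.to(device)).sum().item()
            total += labels.size(0)
    return correct / total

print(f"baseline accuracy: {evaluate(model, train_loader):.4f}")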
4. Quantize your model on the web
Now, go to the project and baseline you just uploaded. Click the plus icon to create a new experiment in which you will set the compression configuration.
Set the compression configuration for the experiment. Try the OpType setting in the side tab, or the Layer setting by clicking on a layer. If you are on a plan above the free tier, try the recommended setting to create a compression configuration.
Once you have applied the recommended setting or any other changes, click the save icon to save the compression configuration.
5. Quantize and Upload Experiment
Next, pass the name of the experiment you created on the web to owlite.init in your code.
import owlite
owl = owlite.init(project="Tutorial", baseline="resnet18", experiment="quantized")
# Convert the model for quantization
model = owl.convert(model, example_input)
# Apply quantization through calibration
with owlite.calibrate(model):
    for i, (img, _) in enumerate(train_loader):
        model(img.to(device))
        if i >= (256 // batch_size) - 1:  # Use only 256 images for calibration
            break
# Export the quantized model to ONNX
owl.export(model)
# Benchmark the quantized model
owl.benchmark()
This step quantizes the model using the compression configuration created in the previous step. Calibration runs a small subset of the training data through the model to collect activation statistics, and OwLite then benchmarks the quantized model to measure its performance.
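The converted model runs with simulated (fake) quantization in PyTorch, so reusing the optional evaluate helper sketched in step 3 should give an accuracy figure you can compare with the baseline (again, it is only meaningful with a real validation loader):

print(f"quantized accuracy: {evaluate(model, train_loader):.4f}")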
6. Review your results
The OwLite website provides various tools for visualizing and analyzing quantization results. You can use these tools to compare the performance of the baseline and quantized models, identify areas where quantization may have negatively impacted performance, and adjust the quantization settings to improve performance.
Also, you can obtain the ONNX and TensorRT files for every baseline and experiment from their respective directories.
Free plan users can review the benchmark results and explore the visualized models, but cannot download the actual ONNX and TensorRT files.
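If you download an ONNX file, you can run a quick local sanity check with onnxruntime (installed separately). The file name below is a placeholder for whatever you downloaded, and ONNX files containing quantization nodes may require a reasonably recent onnxruntime build.

import numpy as np
import onnxruntime as ort

# "resnet18_baseline.onnx" is a placeholder; use the path of the file you downloaded.
session = ort.InferenceSession("resnet18_baseline.onnx")
input_name = session.get_inputs()[0].name
dummy_input = np.random.randn(batch_size, 3, 224, 224).astype(np.float32)
outputs = session.run(None, {input_name: dummy_input})
print(outputs[0].shape)  # expect (64, 1000) if the model was exported with batch size 64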
Additional information
For more information on OwLite's features and options, please refer to the OwLite website and documentation.
Note:
- Quantization may degrade model accuracy, so it is essential to check the benchmark results and adjust the quantization settings to find the best trade-off between accuracy, latency, and model size.
- OwLite supports a variety of models and quantization techniques. You should adjust the settings appropriately depending on the model and technique used.
We hope this tutorial has given you a basic understanding of PyTorch model quantization and of how to use OwLite to improve model performance.