Overview
We will walk you through the process of using OwLite to quantize a PyTorch model. As our example, we will quantize the ResNet18 model from torchvision.
This tutorial aims to familiarize you with the OwLite package and its features. We will not focus on accuracy in this tutorial, as our primary goal is to demonstrate how to use OwLite.
Prerequisites
To follow this tutorial, you should be familiar with the following:
- Python programming language
- PyTorch framework
- Basic concepts of quantization (a short illustrative snippet follows this list)
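If the last item is new to you, the toy snippet below shows the core idea behind 8-bit affine quantization: real values are mapped onto a small integer range and back, trading a little precision for a much more compact representation. It is purely illustrative and not part of the OwLite workflow.

import torch

x = torch.randn(4)                                # original float values
qmin, qmax = 0, 255                               # unsigned 8-bit range
scale = (x.max() - x.min()) / (qmax - qmin)       # step size between integer levels
zero_point = qmin - torch.round(x.min() / scale)  # integer that represents 0.0

x_int = torch.clamp(torch.round(x / scale) + zero_point, qmin, qmax)  # quantize
x_hat = (x_int - zero_point) * scale                                  # dequantize (approximate reconstruction)

print(x, x_int, x_hat, sep="\n")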
Let's get started!
1. Install OwLite
1-1. Create an account and install OwLite.
Create an account on the OwLite Web UI and install the OwLite Python library from PyPI.
pip install owlite --extra-index-url https://pypi.squeezebits.com/
1-2. Login with CLI.
owlite login
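If you want to confirm that the package was installed correctly before logging in or afterwards, a quick check with pip is enough:

pip show owlite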
2. Model Setup
import torch
import torchvision
batch_size = 64
device = "cuda" if torch.cuda.is_available() else "cpu"

# 256 fake ImageNet-sized images are enough for this walkthrough
train_dataset = torchvision.datasets.FakeData(256, (3, 224, 224), 1000, torchvision.transforms.ToTensor())
train_loader = torch.utils.data.DataLoader(train_dataset, batch_size=batch_size)

model = torchvision.models.resnet18(pretrained=True)
model.to(device)
This step sets up the essential environment for model compression. It imports the necessary Python packages, prepares a (fake) dataset and data loader, and creates a pretrained ResNet18 model on the chosen device.
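Before involving OwLite, it can be worth an optional sanity check that the model and data loader work together on the chosen device:

model.eval()
with torch.no_grad():
    images, _ = next(iter(train_loader))   # one batch of fake data
    logits = model(images.to(device))
print(logits.shape)                         # expected: torch.Size([64, 1000])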
3. Upload baseline
import owlite
owl = owlite.init(project="Tutorial", baseline="resnet18")
# Convert the baseline model
example_input = torch.randn(batch_size, 3, 224, 224).to(device)
model = owl.convert(model, example_input)
# Export the baseline model to ONNX
owl.export(model)
# Benchmark the baseline model
owl.benchmark()
After the benchmark completes, view the baseline results on the OwLite website, where you will create a compression configuration in the next step.
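Before moving to the web, you may also want a baseline accuracy figure to set against later experiments; owl.benchmark() reports runtime performance, not accuracy. The helper below is an optional sketch: with the FakeData loader from step 2 the number is meaningless, so substitute a real validation loader for a useful reference score.

def evaluate(model, loader):
    # Top-1 accuracy over a data loader; purely a convenience for this tutorial.
    model.eval()
    correct, total = 0, 0
    with torch.no_grad():
        for images, labels in loader:
            preds = model(images.to(device)).argmax(dim=1)
            correct += (preds == labels.to(device)).sum().item()
            total += labels.size(0)
    return correct / total

print(f"baseline accuracy: {evaluate(model, train_loader):.4f}")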
4. Quantize your model on the web
Now, go to the project and baseline you just uploaded. Click the plus icon to create a new experiment in which you will set the compression configuration.
Set the compression configuration for the experiment. Try the OpType setting in the side tab, or the Layer setting by clicking on a layer. If you are on a plan above the free tier, try the recommended setting to create a compression configuration.
Once you have applied the recommended setting or any other changes, click the save icon to save the compression configuration.
5. Quantize and Upload Experiment
Next, pass the name of the experiment you created on the web to owlite.init in your code.
import owlite
owl = owlite.init(project="Tutorial", baseline="resnet18", experiment="quantized")
# Convert the model for quantization
model = owl.convert(model, example_input)
# Apply quantization through calibration
with owlite.calibrate(model):
    for i, (img, _) in enumerate(train_loader):
        model(img.to(device))
        if i >= (256 // batch_size) - 1:  # Use only 256 images for calibration
            break
# Export the quantized model to ONNX
owl.export(model)
# Benchmark the quantized model
owl.benchmark()
This step quantizes the model using the compression configuration created in the previous step. Calibration runs a small subset of the training data through the model to collect activation statistics, and OwLite then benchmarks the quantized model to measure its performance.
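The converted model runs with simulated (fake) quantization in PyTorch, so reusing the optional evaluate helper sketched in step 3 should give an accuracy figure you can compare with the baseline (again, it is only meaningful with a real validation loader):

print(f"quantized accuracy: {evaluate(model, train_loader):.4f}")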
6. Review your results
The OwLite website provides various tools for visualizing and analyzing quantization results. You can use these tools to compare the performance of the baseline and quantized models, identify areas where quantization may have negatively impacted performance, and adjust the quantization settings to improve performance.
Also, you can obtain the ONNX and TensorRT files for every baseline and experiment from their respective directories.
Free plan users can review the benchmark results and explore the visualized models, but cannot download the actual ONNX and TensorRT files.
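If you download an ONNX file, you can run a quick local sanity check with onnxruntime (installed separately). The file name below is a placeholder for whatever you downloaded, and ONNX files containing quantization nodes may require a reasonably recent onnxruntime build.

import numpy as np
import onnxruntime as ort

# "resnet18_baseline.onnx" is a placeholder; use the path of the file you downloaded.
session = ort.InferenceSession("resnet18_baseline.onnx")
input_name = session.get_inputs()[0].name
dummy_input = np.random.randn(batch_size, 3, 224, 224).astype(np.float32)
outputs = session.run(None, {input_name: dummy_input})
print(outputs[0].shape)  # expect (64, 1000) if the model was exported with batch size 64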
Additional information
For more information on OwLite's features and options, please refer to the OwLite website and documentation.
Note:
- Quantization may degrade model accuracy, so it is essential to check the benchmark results and adjust the quantization settings to find the best trade-off between accuracy, latency, and model size.
- OwLite supports a variety of models and quantization techniques. You should adjust the settings appropriately depending on the model and technique used.
We hope this tutorial has given you a basic understanding of PyTorch model quantization and of how to use OwLite to improve model performance.