# owlite

## method `convert`

```python
convert(model: Module, *args: Any, **kwargs: Any) → GraphModule
```

Convert the model into a `torch.fx.GraphModule` object using the example input(s) provided.
{% hint style="warning" %}
The example input(s) provided to `owl.convert` will also be used by `owl.export` for the subsequent ONNX and engine conversion. It is therefore crucial to provide appropriate example input(s) to ensure the correct behavior of your model.
{% endhint %}
Args:

* `model` (`torch.nn.Module`): The model to be compressed. Note that it must be an instance of `torch.nn.Module`, but not `torch.nn.DataParallel` or `torch.nn.DistributedDataParallel`. See Troubleshooting - Models wrapped with `torch.nn.DataParallel` or `torch.nn.parallel.DistributedDataParallel` for more details.
* `*args`: the example input(s) that would be passed to the model's `forward` method.
* `**kwargs`: the example input(s) that would be passed to the model's `forward` method.

> These example inputs are required to convert the model into a `torch.fx.GraphModule` instance. Each input must be one of the following:
>
> * A `torch.Tensor` object
> * A tuple of `torch.Tensor` objects
> * A dictionary whose keys are strings and values are `torch.Tensor` objects
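To make the accepted forms concrete, here is a plain PyTorch sketch. The `is_valid_example_input` helper is hypothetical (not part of the OwLite API); it only mirrors the three rules listed above.

```python
import torch

# The three example-input forms accepted as example input(s):
single = torch.randn(4, 3, 64, 64)                             # a torch.Tensor
as_tuple = (torch.randn(4, 3), torch.randn(4, 5))              # a tuple of torch.Tensor objects
as_dict = {"x": torch.randn(4, 3), "mask": torch.ones(4, 3)}   # str keys -> torch.Tensor values

def is_valid_example_input(value) -> bool:
    """Hypothetical helper mirroring the rules above; not part of the OwLite API."""
    if isinstance(value, torch.Tensor):
        return True
    if isinstance(value, tuple):
        return all(isinstance(v, torch.Tensor) for v in value)
    if isinstance(value, dict):
        return all(
            isinstance(k, str) and isinstance(v, torch.Tensor)
            for k, v in value.items()
        )
    return False

# Such inputs would then be passed positionally or as keywords, e.g.:
#   owl.convert(model, single)
#   owl.convert(model, *as_tuple)
#   owl.convert(model, **as_dict)
```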
Returns:

`GraphModule`: The `torch.fx.GraphModule` object converted from the `model`.
Raises:

`HTTPError`: When the request for the compression configuration was not successful.
### Behavior in each mode

`owl.convert` behaves differently depending on the mode triggered by `owlite.init`.

* **Baseline mode**: `owl.convert` traces the input model with the example input(s).
* **Experiment mode**: the converted `torch.fx.GraphModule` object is further modified according to the compression configuration from the experiment. This configuration could have been created by the user on the OwLite website, or copied from another experiment (in 'duplicate from' mode). If there is no compression configuration, `owl.convert` returns the same model as in baseline mode. For a dynamic-batch-size baseline model without compression, create an experiment.
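Conceptually, baseline-mode conversion behaves like plain `torch.fx` symbolic tracing: the result is a `torch.fx.GraphModule` that computes the same outputs as the original model. A minimal sketch of that idea, using `torch.fx.symbolic_trace` directly as a stand-in for OwLite's own tracer:

```python
import torch
import torch.fx

class TinyModel(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = torch.nn.Linear(8, 2)

    def forward(self, x):
        return torch.nn.functional.relu(self.fc(x))

model = TinyModel()
# Stand-in for baseline-mode conversion: produce a GraphModule from the model
traced = torch.fx.symbolic_trace(model)

x = torch.randn(4, 8)
# With no compression configuration applied, the traced module is
# numerically identical to the original model
assert isinstance(traced, torch.fx.GraphModule)
assert torch.allclose(model(x), traced(x))
```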
### Workflow

`owl.convert` goes through the following steps:

1. **Conversion**: it converts the input model into the format configurable by OwLite, namely a `torch.fx.GraphModule` instance, using the example input(s) provided via `*args` and `**kwargs`. This procedure might fail depending on your model's implementation and the coverage of `torch.compile` in the PyTorch version you are using. If so, you may need to find and fix the causes of the failure reported in the error message.
2. **Compression**: in experiment mode, it further compresses the converted model if the experiment's compression configuration exists. Keep in mind that you must set up the compression configuration via the OwLite Web UI before running `owl.convert` in order to compress your model.
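The conversion step typically fails on models whose `forward` contains data-dependent Python control flow, which symbolic tracing cannot capture. A minimal reproduction of such a failure, again using `torch.fx.symbolic_trace` as a stand-in for OwLite's tracer:

```python
import torch
import torch.fx

class BranchyModel(torch.nn.Module):
    def forward(self, x):
        # Data-dependent branch: bool() on a traced proxy raises TraceError
        if x.sum() > 0:
            return x + 1
        return x - 1

try:
    torch.fx.symbolic_trace(BranchyModel())
    failed = False
except torch.fx.proxy.TraceError as e:
    failed = True
    print(f"Tracing failed: {e}")

# One common fix is to express the branch with tensor ops instead,
# e.g. torch.where(x.sum() > 0, x + 1, x - 1), which is traceable.
```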
### Examples

#### Baseline Mode

```python
import owlite
import torch

owl = owlite.init(project="testProject", baseline="sampleModel")

# Create a sample model
class SampleModel(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = torch.nn.Conv2d(3, 64, 3)
        self.pool1 = torch.nn.MaxPool2d(2, 2)
        self.conv2 = torch.nn.Conv2d(64, 128, 3)
        self.pool2 = torch.nn.MaxPool2d(2, 2)
        self.fc1 = torch.nn.Linear(128 * 7 * 7, 10)

    def forward(self, x):
        x = self.conv1(x)
        x = torch.nn.functional.relu(x)
        x = self.pool1(x)
        x = self.conv2(x)
        x = torch.nn.functional.relu(x)
        x = self.pool2(x)
        x = x.view(-1, 128 * 7 * 7)
        x = self.fc1(x)
        return x

# Create a model instance
model = SampleModel()

# Convert the model
model = owl.convert(model, torch.randn(4, 3, 64, 64))

# Print the model
print(model)
```
This code will create a sample model, convert it to a GraphModule in baseline mode, and print the converted module. The output of the code is as follows:
```
OwLite [INFO] Connected device: NVIDIA RTX A6000
OwLite [WARNING] Existing local directory found at /home/sqzb/workspace/owlite/testProject/sampleModel/sampleModel. Continuing this code will overwrite the data
OwLite [INFO] Created new project 'testProject'
OwLite [INFO] Created new baseline 'sampleModel' at project 'testProject'
OwLite [INFO] Converted the model
GraphModule(
  (self_conv1): Conv2d(3, 64, kernel_size=(3, 3), stride=(1, 1))
  (self_pool1): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  (self_conv2): Conv2d(64, 128, kernel_size=(3, 3), stride=(1, 1))
  (self_pool2): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  (self_fc1): Linear(in_features=6272, out_features=10, bias=True)
)

def forward(self, x : torch.Tensor):
    sqzb_module_device_canary = self.sqzb_module_device_canary
    getattr_1 = sqzb_module_device_canary.device;  sqzb_module_device_canary = None
    self_conv1 = self.self_conv1(x);  x = None
    relu = torch.nn.functional.relu(self_conv1);  self_conv1 = None
    self_pool1 = self.self_pool1(relu);  relu = None
    self_conv2 = self.self_conv2(self_pool1);  self_pool1 = None
    relu_1 = torch.nn.functional.relu(self_conv2);  self_conv2 = None
    self_pool2 = self.self_pool2(relu_1);  relu_1 = None
    view = self_pool2.view(-1, 6272);  self_pool2 = None
    self_fc1 = self.self_fc1(view);  view = None
    output_adapter = owlite_backend_fx_trace_output_adapter((self_fc1,));  self_fc1 = None
    return output_adapter
```
#### Experiment Mode

```python
import owlite
import torch

owl = owlite.init(project="testProject", baseline="sampleModel", experiment="conv")

# Create a sample model
class SampleModel(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = torch.nn.Conv2d(3, 64, 3)
        self.pool1 = torch.nn.MaxPool2d(2, 2)
        self.conv2 = torch.nn.Conv2d(64, 128, 3)
        self.pool2 = torch.nn.MaxPool2d(2, 2)
        self.fc1 = torch.nn.Linear(128 * 7 * 7, 10)

    def forward(self, x):
        x = self.conv1(x)
        x = torch.nn.functional.relu(x)
        x = self.pool1(x)
        x = self.conv2(x)
        x = torch.nn.functional.relu(x)
        x = self.pool2(x)
        x = x.view(-1, 128 * 7 * 7)
        x = self.fc1(x)
        return x

# Create a model instance
model = SampleModel()

# Convert the model
model = owl.convert(model, torch.randn(4, 3, 64, 64))

# Print the model
print(model)
```
This code will create a sample model, convert it to a GraphModule in experiment mode, and apply the compression configuration of the experiment specified in `owlite.init`. The output of the code is as follows:
```
OwLite [INFO] Connected device: NVIDIA RTX A6000
OwLite [INFO] Experiment data will be saved in /home/sqzb/workspace/owlite/testProject/sampleModel/conv
OwLite [INFO] Loaded existing project 'testProject'
OwLite [INFO] Existing compression configuration for 'conv' found
OwLite [INFO] Model conversion initiated
OwLite [INFO] Compression configuration found for 'conv'
OwLite [INFO] Applying compression configuration
OwLite [INFO] Converted the model
GraphModule(
  (self_conv1): QConv2d(
    3, 64, kernel_size=(3, 3), stride=(1, 1)
    (weight_quantizer): FakeQuantizer(ste(precision: 8, per_channel, quant_min: -127, quant_max: 127, is_enabled: True, calib: AbsmaxCalibrator))
    (input_quantizer): FakeQuantizer(ste(precision: 8, per_tensor, quant_min: -128, quant_max: 127, zero_point: 0.0, is_zero_point_folded: False, is_enabled: True, calib: AbsmaxCalibrator))
  )
  (self_pool1): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  (self_conv2): QConv2d(
    64, 128, kernel_size=(3, 3), stride=(1, 1)
    (weight_quantizer): FakeQuantizer(ste(precision: 8, per_channel, quant_min: -127, quant_max: 127, is_enabled: True, calib: AbsmaxCalibrator))
    (input_quantizer): FakeQuantizer(ste(precision: 8, per_tensor, quant_min: -128, quant_max: 127, zero_point: 0.0, is_zero_point_folded: False, is_enabled: True, calib: AbsmaxCalibrator))
  )
  (self_pool2): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  (self_fc1): QLinear(
    in_features=6272, out_features=10, bias=True
    (weight_quantizer): FakeQuantizer(ste(precision: 8, per_channel, quant_min: -127, quant_max: 127, is_enabled: True, calib: AbsmaxCalibrator))
    (input_quantizer): FakeQuantizer(ste(precision: 8, per_tensor, quant_min: -128, quant_max: 127, zero_point: 0.0, is_zero_point_folded: False, is_enabled: True, calib: AbsmaxCalibrator))
  )
  (self_conv1_0_quantizer): FakeQuantizer(ste(precision: 8, per_tensor, quant_min: -128, quant_max: 127, zero_point: 0.0, is_zero_point_folded: False, is_enabled: True, calib: AbsmaxCalibrator))
  (self_pool1_0_quantizer): FakeQuantizer(ste(precision: 8, per_tensor, quant_min: -128, quant_max: 127, zero_point: 0.0, is_zero_point_folded: False, is_enabled: True, calib: AbsmaxCalibrator))
  (self_conv2_0_quantizer): FakeQuantizer(ste(precision: 8, per_tensor, quant_min: -128, quant_max: 127, zero_point: 0.0, is_zero_point_folded: False, is_enabled: True, calib: AbsmaxCalibrator))
  (self_pool2_0_quantizer): FakeQuantizer(ste(precision: 8, per_tensor, quant_min: -128, quant_max: 127, zero_point: 0.0, is_zero_point_folded: False, is_enabled: True, calib: AbsmaxCalibrator))
  (self_fc1_0_quantizer): FakeQuantizer(ste(precision: 8, per_tensor, quant_min: -128, quant_max: 127, zero_point: 0.0, is_zero_point_folded: False, is_enabled: True, calib: AbsmaxCalibrator))
)

def forward(self, x : torch.Tensor):
    self_conv1_0_quantizer = self.self_conv1_0_quantizer(x);  x = None
    self_conv1 = self.self_conv1(self_conv1_0_quantizer);  self_conv1_0_quantizer = None
    relu = torch.nn.functional.relu(self_conv1);  self_conv1 = None
    self_pool1_0_quantizer = self.self_pool1_0_quantizer(relu);  relu = None
    self_pool1 = self.self_pool1(self_pool1_0_quantizer);  self_pool1_0_quantizer = None
    self_conv2_0_quantizer = self.self_conv2_0_quantizer(self_pool1);  self_pool1 = None
    self_conv2 = self.self_conv2(self_conv2_0_quantizer);  self_conv2_0_quantizer = None
    relu_1 = torch.nn.functional.relu(self_conv2);  self_conv2 = None
    self_pool2_0_quantizer = self.self_pool2_0_quantizer(relu_1);  relu_1 = None
    self_pool2 = self.self_pool2(self_pool2_0_quantizer);  self_pool2_0_quantizer = None
    view = self_pool2.view(-1, 6272);  self_pool2 = None
    self_fc1_0_quantizer = self.self_fc1_0_quantizer(view);  view = None
    self_fc1 = self.self_fc1(self_fc1_0_quantizer);  self_fc1_0_quantizer = None
    output_adapter = owlite_backend_fx_trace_output_adapter((self_fc1,));  self_fc1 = None
    return output_adapter
```
Updated: 2024-06-13T23:42:42