Troubleshooting
This documentation covers common issues that we have seen with the OwLite package.
1. Known issues on OwLite conversion
Models wrapped with `torch.nn.DataParallel` or `torch.nn.parallel.DistributedDataParallel`
- When using the `owlite.convert` or `owlite.export` API with a model wrapped with `torch.nn.DataParallel` or `torch.nn.parallel.DistributedDataParallel`, you might get an error as follows:
`TypeError: torch.nn.DataParallel is not supported by symbolic trace`
- To avoid this error, call `owlite.convert` or `owlite.export` with the inner module, `model.module`, instead of the wrapper.
- After conversion, you can wrap the converted model with `torch.nn.DataParallel` or `torch.nn.parallel.DistributedDataParallel` again.

```
model = owl.convert(model.module, example_inputs)
model = torch.nn.DataParallel(model, device_ids=[0, 1])
# PTQ or QAT
owl.export(model.module)
```
Feeding data with dynamic shape after OwLite conversion
- You must feed inputs of the same shapes as the ones fed to `owl.convert` when you run training / inference on the model returned by `owl.convert`. Otherwise, you might see a runtime error as follows:
`RuntimeError: shape '[8, 128, 12, 64]' is invalid for input of size 393216`
Whether this error occurs depends on the model; you may not encounter it even when feeding data with different batch sizes or shapes.
For example, the option `drop_last=True` is recommended when feeding data with `torch.utils.data.DataLoader`, because the last batch it generates may have a different batch size than `batch_size`; see the sketch below.
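A minimal sketch, using a placeholder dataset and a batch size of 32 (substitute your own values):

```
import torch
from torch.utils.data import DataLoader, TensorDataset

# Placeholder dataset: with 100 samples and batch_size=32, the last batch
# would contain only 4 samples if drop_last were left at its default.
dataset = TensorDataset(torch.randn(100, 3, 224, 224))

# drop_last=True discards that final, smaller batch so every batch matches
# the shape of the example inputs passed to owl.convert.
loader = DataLoader(dataset, batch_size=32, drop_last=True)
```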
Models in training mode
- If you convert a model with the `owlite.convert` API in training mode (i.e. `model.train()`), you may get an unexpected model.
- For example, nodes related to `nn.Dropout` or stochastic depth, which should be active only during training, might be included in the ONNX model and TensorRT engine, resulting in unexpected behavior at inference.
- To avoid this, we suggest converting the model after switching it to evaluation mode (i.e. `model.eval()`), as in the sketch below.
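A minimal sketch, assuming `owl` is the OwLite handle created earlier and using a placeholder model and input:

```
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(64, 64), nn.Dropout(0.1))  # placeholder model

model.eval()  # disable Dropout / stochastic depth before conversion
model = owl.convert(model, torch.randn(1, 64))  # placeholder example input
```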
2. Known issues on TensorRT benchmark
Models with a loss function in the forward method
- If a model includes a loss function (e.g. `nn.MSELoss`, `nn.CrossEntropyLoss`, or `torch.nn.functional.mse_loss`) in its forward method, the TensorRT engine build will fail while running `owlite.benchmark`.
- To avoid this failure, loss functions should not be included in the forward method; see the sketch after this list.
- Similarly, if the TensorRT build fails due to ONNX nodes that TensorRT does not support, we suggest excluding those nodes from the model.
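For instance, a minimal sketch (the module and shapes are placeholders) that keeps the loss out of the forward method:

```
import torch
import torch.nn as nn

class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(64, 10)

    def forward(self, x):
        # Return raw predictions only; computing the loss here would pull
        # loss nodes into the exported ONNX graph and the TensorRT build.
        return self.linear(x)

model = Net()
criterion = nn.CrossEntropyLoss()  # keep the loss outside the forward method

logits = model(torch.randn(8, 64))
loss = criterion(logits, torch.randint(0, 10, (8,)))
```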
Restrictions on INT8 MHA modules
- If a model contains multi-head attention modules, the built TensorRT engine may use the `mha` kernel.
- However, if the sequence length (`seq_len`) of any query-key multiplication (i.e. the row or column dimension of an attention matrix) in the model is larger than 512, and INT8 quantization is applied to any `matmul` node, the resulting TensorRT engine will have higher latency than the baseline FP16 engine. To the best of our knowledge, this is because the INT8 `mha` kernel is only supported when `seq_len <= 512`.
- To avoid this issue, we recommend not applying quantization to any `matmul` node in the model if it includes a multi-head attention module with `seq_len > 512`. The sketch below shows where `seq_len` appears.
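To make `seq_len` concrete, a small sketch of generic attention math (not OwLite-specific; all shapes are placeholders):

```
import torch

B, H, seq_len, head_dim = 1, 8, 1024, 64  # seq_len > 512 here
q = torch.randn(B, H, seq_len, head_dim)
k = torch.randn(B, H, seq_len, head_dim)

# The query-key matmul yields the attention matrix, whose row and column
# dimensions are both seq_len; this is the matmul node that should stay
# unquantized when seq_len exceeds 512.
attn = torch.matmul(q, k.transpose(-2, -1))
print(attn.shape)  # torch.Size([1, 8, 1024, 1024])
```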