Troubleshooting
This documentation covers common issues that we have seen with the OwLite package.
1. Known issues on OwLite conversion
Models wrapped with `torch.nn.DataParallel` or `torch.nn.parallel.DistributedDataParallel`
- When using the `owlite.convert` or `owlite.export` API with a model wrapped with `torch.nn.DataParallel` or `torch.nn.parallel.DistributedDataParallel`, you might get an error as follows:
`TypeError: torch.nn.DataParallel is not supported by symbolic trace`
- To avoid this error, call `owlite.convert` or `owlite.export` with the inner module, `model.module`, instead of the wrapper.
- After conversion, you can wrap the converted model with `torch.nn.DataParallel` or `torch.nn.parallel.DistributedDataParallel` again.

```
model = owl.convert(model.module, example_inputs)
model = torch.nn.DataParallel(model, device_ids=[0, 1])
# PTQ or QAT
owl.export(model.module)
```
Feeding data with dynamic shape after OwLite conversion
- You must feed inputs of the same shapes as the ones fed to `owl.convert` when you run training / inference on the model returned by `owl.convert`. Otherwise, you might see a runtime error as follows:
`RuntimeError: shape '[8, 128, 12, 64]' is invalid for input of size 393216`
Whether this error occurs depends on the model; you may not encounter it even when feeding data with different batch sizes or shapes.
For example, the option `drop_last=True` is recommended when feeding data with `torch.utils.data.DataLoader`, because the last batch it generates may have a different batch size than `batch_size`; see the sketch below.
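A minimal sketch, using a placeholder dataset and a batch size of 32 (substitute your own values):

```
import torch
from torch.utils.data import DataLoader, TensorDataset

# Placeholder dataset: with 100 samples and batch_size=32, the last batch
# would contain only 4 samples if drop_last were left at its default.
dataset = TensorDataset(torch.randn(100, 3, 224, 224))

# drop_last=True discards that final, smaller batch so every batch matches
# the shape of the example inputs passed to owl.convert.
loader = DataLoader(dataset, batch_size=32, drop_last=True)
```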
Models in training mode
- If you convert a model with the `owlite.convert` API in training mode (i.e. `model.train()`), you may get an unexpected model.
- For example, nodes related to `nn.Dropout` or stochastic depth, which should be active only during training, might be included in the ONNX model and TensorRT engine, resulting in unexpected behavior at inference.
- To avoid this, we suggest converting the model after switching it to evaluation mode (i.e. `model.eval()`), as in the sketch below.
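A minimal sketch, assuming `owl` is the OwLite handle created earlier and using a placeholder model and input:

```
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(64, 64), nn.Dropout(0.1))  # placeholder model

model.eval()  # disable Dropout / stochastic depth before conversion
model = owl.convert(model, torch.randn(1, 64))  # placeholder example input
```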
2. Known issues on TensorRT benchmark
Models with a loss function in the forward method
- If a model includes a loss function (e.g. `nn.MSELoss`, `nn.CrossEntropyLoss`, or `torch.nn.functional.mse_loss`) in its forward method, the TensorRT engine build will fail while running `owlite.benchmark`.
- To avoid this failure, loss functions should not be included in the forward method; see the sketch after this list.
- Similarly, if the TensorRT build fails due to ONNX nodes that TensorRT does not support, we suggest excluding those nodes from the model.
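For instance, a minimal sketch (the module and shapes are placeholders) that keeps the loss out of the forward method:

```
import torch
import torch.nn as nn

class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(64, 10)

    def forward(self, x):
        # Return raw predictions only; computing the loss here would pull
        # loss nodes into the exported ONNX graph and the TensorRT build.
        return self.linear(x)

model = Net()
criterion = nn.CrossEntropyLoss()  # keep the loss outside the forward method

logits = model(torch.randn(8, 64))
loss = criterion(logits, torch.randint(0, 10, (8,)))
```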
Restrictions on INT8 MHA modules
- If a model contains multi-head attention modules, the built TensorRT engine may use the `mha` kernel.
- However, if the sequence length (`seq_len`) of any query-key multiplication (i.e. the row or column dimension of an attention matrix) in the model is larger than 512, and INT8 quantization is applied to any `matmul` node, the resulting TensorRT engine will have higher latency than the baseline FP16 engine. To the best of our knowledge, this is because the INT8 `mha` kernel is only supported when `seq_len <= 512`.
- To avoid this issue, we recommend not applying quantization to any `matmul` node in the model if it includes a multi-head attention module with `seq_len > 512`. The sketch below shows where `seq_len` appears.
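To make `seq_len` concrete, a small sketch of generic attention math (not OwLite-specific; all shapes are placeholders):

```
import torch

B, H, seq_len, head_dim = 1, 8, 1024, 64  # seq_len > 512 here
q = torch.randn(B, H, seq_len, head_dim)
k = torch.randn(B, H, seq_len, head_dim)

# The query-key matmul yields the attention matrix, whose row and column
# dimensions are both seq_len; this is the matmul node that should stay
# unquantized when seq_len exceeds 512.
attn = torch.matmul(q, k.transpose(-2, -1))
print(attn.shape)  # torch.Size([1, 8, 1024, 1024])
```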