Deploying a PyTorch Model for Inference with TensorRT --- HRNet
1. Converting the PyTorch model to ONNX format
The PyTorch model can be converted to ONNX format with PyTorch's built-in torch.onnx.export function.
import yaml
import torch
from torch.autograd import Variable
from HRNet import HighResolutionNet  # HighResolutionNet is defined in HRNet.py (see section 4)

# images: a sample input batch with the network's expected input size, used to trace the model
images = Variable(images).float().cuda()
config = yaml.load(open('brick_hrnet.yaml'))
net = HighResolutionNet(config)
net.float().cuda()
net.eval()
net.load_state_dict(torch.load(config['checkout']))
# Export the model to an ONNX file
with torch.no_grad():
    output = torch.onnx.export(net,
                               images,
                               '',            # path of the output .onnx file
                               verbose=False)
print("Export complete!")
2. Installing TensorRT 6.0
3. Generating a TensorRT engine from ONNX
Run the following commands:
git clone --recurse-submodules -b 6.0 https://github.com/onnx/onnx-tensorrt.git
cd onnx-tensorrt
# Update submodules
git submodule update --init --recursive
# Build
mkdir build && cd build
cmake .. -DCUDA_INCLUDE_DIRS=/usr/local/cuda/include/ -DTENSORRT_ROOT=/usr/lib/x86_64-linux-gnu
make -j8
sudo make install
# Update system config
sudo ldconfig
4. Converting the ONNX model to TRT format
Since the model being converted is HRNet, there is something to watch out for back in step 1 when exporting to ONNX: .size and .shape must not appear in the code (because TensorRT builds a static graph!).
In the original HRNet.py, in the fuse-layer step of HighResolutionModule's forward pass, the author bluntly upsamples the layers to be fused with bilinear interpolation to the size of the current layer, reading that size via .shape. There are two possible fixes:
1. When building the fuse layers (make_fuse_layer), specify the upsampling scale for the layers that need upsampling, instead of leaving them without upsampling (a plain stride-1 convolution); however, this change requires retraining the network.
2. Specify the feature-map size of each branch when each stage is constructed, and use those sizes directly in HighResolutionModule's forward pass instead of fetching them via .shape.
The code changes for the second approach are shown below. Just focus on the parts between # ---------- change start ---------- and # ---------- change end ----------!
Modify _make_stage in HighResolutionNet as follows:
# This actually builds a single stage
def _make_stage(self, layer_config, num_inchannels,
                multi_scale_output=True):
    num_modules = layer_config['NUM_MODULES']
    num_branches = layer_config['NUM_BRANCHES']
    num_blocks = layer_config['NUM_BLOCKS']
    num_channels = layer_config['NUM_CHANNELS']
    block = blocks_dict[layer_config['BLOCK']]
    fuse_method = layer_config['FUSE_METHOD']

    modules = []
    for i in range(num_modules):
        # multi_scale_output is only used for the last module
        if not multi_scale_output and i == num_modules - 1:
            reset_multi_scale_output = False
        else:
            reset_multi_scale_output = True

        # Build a single high-resolution branch here
        # Assume for now that the resolution of x is tied to the number of branches
        # ---------- change start ----------
        x_shape = []
        if num_branches == 2:
            x_shape = [[512, 624], [256, 312]]
        elif num_branches == 3:
            x_shape = [[512, 624], [256, 312], [128, 156]]
        elif num_branches == 4:
            x_shape = [[512, 624], [256, 312], [128, 156], [64, 78]]
        else:
            print("error mzy")

        modules.append(
            HighResolutionModule(num_branches,
                                 block,
                                 num_blocks,
                                 num_inchannels,
                                 num_channels,
                                 fuse_method,
                                 x_shape,
                                 reset_multi_scale_output)
        )
        # ---------- change end ----------
        num_inchannels = modules[-1].get_num_inchannels()

    return nn.Sequential(*modules), num_inchannels
Modify forward in HighResolutionModule as follows:
def forward(self, x):
    # If this stage has only one branch, no fusion is needed
    if self.num_branches == 1:
        return [self.branches[0](x[0])]

    # If this stage has multiple branches, forward each branch on its own input
    for i in range(self.num_branches):
        x[i] = self.branches[i](x[i])

    # The fusion layers follow
    x_fuse = []
    for i in range(len(self.fuse_layers)):
        y = x[0] if i == 0 else self.fuse_layers[i][0](x[0])
        for j in range(1, self.num_branches):
            if i == j:
                y = y + x[j]
            elif j > i:
                # ---------- change start ----------
                # modified by mzy
                y = y + F.interpolate(
                    self.fuse_layers[i][j](x[j]),
                    # size=[x[i].shape[2], x[i].shape[3]],
                    size=self.x_shape[i],
                    mode='bilinear')
                # ---------- change end ----------
            else:
                y = y + self.fuse_layers[i][j](x[j])
        x_fuse.append(self.relu(y))

    return x_fuse
Then add the following at the end of HighResolutionModule's __init__:
self.x_shape = x_shape
After modifying HRNet.py, export the model to ONNX format using the method from step 1;
Then run the following in the environment where onnx2trt was built (i.e., in the build directory):
onnx2trt hrnet_add_320_and_ -o hrnet_add_320_and_ -b 1
The value after -b is the max batch size and can be adjusted as needed.
After running the command above, we get the following output:
mzy@mzy-Precision-3630-Tower:~/TensorRT/onnx-tensorrt/build$ onnx2trt hrnet_add_320_and_ -o hrnet_add_320_and_ -b 1
----------------------------------------------------------------
Input filename: hrnet_add_320_and_
ONNX IR version: 0.0.4
Opset version: 9
Producer name: pytorch
Producer version: 1.3
Domain:
Model version: 0
Doc string:
----------------------------------------------------------------
WARNING: ONNX model has a newer ir_version (0.0.4) than this parser was built against (0.0.3).
Parsing model
Building TensorRT engine, FP16 available:0
Max batch size: 1
Max workspace size: 1024 MiB
[2021-03-31 01:41:13 WARNING] TensorRT was linked against cuBLAS 10.2.0 but loaded cuBLAS 10.1.0
[2021-03-31 01:44:01 WARNING] TensorRT was linked against cuBLAS 10.2.0 but loaded cuBLAS 10.1.0
Writing TensorRT engine to hrnet_add_320_and_
All done
The corresponding trt file is then generated in the build directory;
Note the following:
onnx2trt accepts the arguments below; the ones worth specifying are max_batch_size, model_data_type_bit_depth, and max_workspace_size_bytes.
If the output shows FP16 available: 0, it means float16 is not used (or not supported) and float32 is used instead. This precision affects how we later load data into the pointer buffers, so keep it in mind (see the short sketch after the usage listing below).
Usage: onnx2trt onnx_model.pb
[-o ](output TensorRT engine)
[-t onnx_model.pbtxt](output ONNX text file without weights)
[-T onnx_model.pbtxt](output ONNX text file with weights)
[-b max_batch_size (default 32)]
[-w max_workspace_size_bytes (default 1 GiB)]
[-d model_data_type_bit_depth](32 => float32, 16 => float16)
[-l](list layers and their shapes)
[-g](debug mode)
[-v](increase verbosity)
[-q](decrease verbosity)
[-V](show version information)
[-h](show help)
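Since the element type of the engine bindings follows that precision, it is worth checking the binding data type at runtime before sizing the host and device buffers. A minimal sketch, assuming the TensorRT C++ API is available; the helper name getElementSize is mine, not from the original post:

// Query the data type of an engine binding so host/device buffers are sized correctly.
// With "FP16 available: 0" above, the bindings will be kFLOAT (4 bytes per element).
#include <NvInfer.h>

static size_t getElementSize(nvinfer1::DataType t) {
    switch (t) {
        case nvinfer1::DataType::kFLOAT: return 4;   // float32
        case nvinfer1::DataType::kHALF:  return 2;   // float16
        case nvinfer1::DataType::kINT8:  return 1;
        case nvinfer1::DataType::kINT32: return 4;
        default:                         return 0;
    }
}

// Usage: size_t elem = getElementSize(engine->getBindingDataType(binding_index));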
5. Loading and running the model with TensorRT + CLion
Loading the TensorRT library files
Unlike the usual approach of adding include and library paths in the CMakeLists, here find_library is used to locate the library files directly.
cmake_minimum_required(VERSION 3.16)
project(TensorRT_HRNet)
set(CMAKE_CXX_STANDARD 11)
set(OpenCV_DIR /home/mzy/workspace/opencv-3.4/build)
find_package( OpenCV 3 REQUIRED )
INCLUDE_DIRECTORIES(${OpenCV_INCLUDE_DIRS})
find_package(CUDA REQUIRED)
INCLUDE_DIRECTORIES(${CUDA_INCLUDE_DIRS})
find_library(NVINFER NAMES libnvinfer.so)
find_library(NVPARSERS NAMES nvparsers)
find_library(NVONNXPARSERS NAMES nvonnxparser)
if(NVINFER)
message("TensorRT is available!")
message("NVINFER: ${NVINFER}")
message("NVPARSERS: ${NVPARSERS}")
message("NVONNXPARSERS: ${NVONNXPARSERS}")
set(TRT_AVAIL ON)
else()
message("TensorRT is NOT Available")
set(TRT_AVAIL OFF)
endif()
add_executable(TensorRT_HRNet BrickDetect.h BrickDetect.cpp main3.cpp)
target_link_libraries( TensorRT_HRNet
${OpenCV_LIBS}
${CUDA_LIBRARIES}
${NVINFER}
${NVPARSERS}
${NVONNXPARSERS}
${TensorRT_LIBRARIES}
)
Using TensorRT to load the converted trt engine and run inference breaks down into the following steps:
1. Load the trt engine and deserialize it;
2. Load the data, preprocess it as needed, then copy it into pointer buffers to pass to the GPU;
3. Run inference;
4. Fetch the inference results, copying them from the GPU pointers back to CPU memory for the next step of post-processing;
5. Run post-processing.
Below, these steps are walked through using HRNet as the example.
Load the trt engine and deserialize it
// const std::string engine_name --- path where the trt model is stored
// TRTUniquePtr<nvinfer1::ICudaEngine>& engine ------ TRTUniquePtr<nvinfer1::ICudaEngine> engine{nullptr}; an empty engine, used to return the created engine
// TRTUniquePtr<nvinfer1::IExecutionContext>& context ---- TRTUniquePtr<nvinfer1::IExecutionContext> context{nullptr}; an empty execution context, likewise used to return the created context
void BrickDetect::deserializeEngineModel(const std::string engine_name,
                                         TRTUniquePtr<nvinfer1::ICudaEngine>& engine,
                                         TRTUniquePtr<nvinfer1::IExecutionContext>& context) {
    std::ifstream in_file(engine_name.c_str(), std::ios::in | std::ios::binary);
    if (!in_file.is_open()) {
        std::cerr << "ERROR: fail to open file: " << engine_name.c_str() << std::endl;
        exit(1);
    }
    // Measure the engine file size
    std::streampos begin, end;
    begin = in_file.tellg();
    in_file.seekg(0, std::ios::end);
    end = in_file.tellg();
    size_t size = end - begin;
    std::cout << "engine file size: " << size << " bytes" << std::endl;
    in_file.seekg(0, std::ios::beg);
    // Read the serialized engine into memory
    std::unique_ptr<unsigned char[]> engine_data(new unsigned char[size]);
    in_file.read((char*)engine_data.get(), size);
    in_file.close();
    // deserialize the engine
    nvinfer1::IRuntime* runtime = nvinfer1::createInferRuntime(gLogger);
    engine.reset(runtime->deserializeCudaEngine((const void*)engine_data.get(), size, nullptr));
    context.reset(engine->createExecutionContext());
}
Load the data, preprocess it as needed, then copy it into the pointer buffers passed to the GPU
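The original post stops here, so what follows is only a minimal sketch of steps 2 through 4 (preprocessing, copying data to the GPU, running inference, and copying the result back), not the author's implementation. It assumes FP32 bindings (matching "FP16 available: 0" above), a single input and a single output binding at indices 0 and 1, a 3-channel image input, and a simple 1/255 scaling; the helper doInference and its details are assumptions:

#include <NvInfer.h>
#include <cuda_runtime_api.h>
#include <opencv2/opencv.hpp>
#include <cstring>
#include <vector>

// Number of elements described by a binding's dimensions
static size_t volume(const nvinfer1::Dims& d) {
    size_t v = 1;
    for (int i = 0; i < d.nbDims; ++i) v *= d.d[i];
    return v;
}

// Preprocess an image, copy it to the GPU, run inference, and copy the result back
std::vector<float> doInference(nvinfer1::ICudaEngine& engine,
                               nvinfer1::IExecutionContext& context,
                               const cv::Mat& input_img) {
    const int in_idx = 0;    // or engine.getBindingIndex("<input name>") if the name is known
    const int out_idx = 1;
    nvinfer1::Dims in_dims = engine.getBindingDimensions(in_idx);
    nvinfer1::Dims out_dims = engine.getBindingDimensions(out_idx);
    const size_t in_count = volume(in_dims);
    const size_t out_count = volume(out_dims);

    // Preprocess: resize to the network input size, scale to [0,1], and pack HWC uint8 -> CHW float
    const int in_h = in_dims.d[in_dims.nbDims - 2];
    const int in_w = in_dims.d[in_dims.nbDims - 1];
    cv::Mat resized;
    cv::resize(input_img, resized, cv::Size(in_w, in_h));
    resized.convertTo(resized, CV_32FC3, 1.0 / 255.0);
    std::vector<cv::Mat> channels(3);
    cv::split(resized, channels);
    std::vector<float> host_input(in_count);
    for (int c = 0; c < 3; ++c)
        std::memcpy(host_input.data() + c * in_h * in_w,
                    channels[c].data, in_h * in_w * sizeof(float));

    // Allocate device buffers (float32, since FP16 available: 0) and copy the input to the GPU
    void* buffers[2];
    cudaMalloc(&buffers[in_idx], in_count * sizeof(float));
    cudaMalloc(&buffers[out_idx], out_count * sizeof(float));
    cudaMemcpy(buffers[in_idx], host_input.data(),
               in_count * sizeof(float), cudaMemcpyHostToDevice);

    // Run inference (batch size 1; use executeV2 instead if the engine was built with explicit batch)
    context.execute(1, buffers);

    // Copy the result from the GPU back to CPU memory for post-processing
    std::vector<float> host_output(out_count);
    cudaMemcpy(host_output.data(), buffers[out_idx],
               out_count * sizeof(float), cudaMemcpyDeviceToHost);

    cudaFree(buffers[in_idx]);
    cudaFree(buffers[out_idx]);
    return host_output;
}

Post-processing (for example, decoding the HRNet heatmaps into keypoints) then operates on host_output.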