Triton Server Deployment Study Notes
Pull the image: docker pull nvcr.io/nvidia/tritonserver:22.07-py3
docker run --gpus all -itd -p8000:8000 -p8001:8001 -p8002:8002 -v /home/ai-developer/server/docs/examples/model_repository/:/models nvcr.io/nvidia/tritonserver:22.07-py3
docker exec -it a5bc bash    # a5bc is the ID prefix of the container started above; substitute your own
tritonserver --model-repository=/models --strict-model-config=false
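Once the server is running, it helps to confirm that it is actually ready before sending requests. The sketch below polls Triton's HTTP readiness endpoint (/v2/health/ready). It assumes port 8000 is published on localhost as in the docker run command above; TRITON_HTTP, triton_url and wait_for_triton are names made up for this example.

```shell
# Assumption: the HTTP port 8000 is published on localhost (see docker run above).
TRITON_HTTP="${TRITON_HTTP:-localhost:8000}"

# Build the URL of a v2 HTTP endpoint, e.g. "health/ready".
triton_url() {
  printf 'http://%s/v2/%s\n' "$TRITON_HTTP" "$1"
}

# Poll the readiness endpoint once per second.
# Returns 0 as soon as the server answers with success, 1 if the retries run out.
wait_for_triton() {
  tries="${1:-30}"
  while [ "$tries" -gt 0 ]; do
    if curl -sf -o /dev/null "$(triton_url health/ready)"; then
      return 0
    fi
    tries=$((tries - 1))
    sleep 1
  done
  return 1
}

# Usage: wait_for_triton 60 && echo "Triton is ready"
```

The functions only define behavior; nothing contacts the server until wait_for_triton is called.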
The config file is optional
TensorRT, TensorFlow SavedModel, and ONNX models do not require config.pbtxt when --strict-model-config=false; Triton derives the required settings from the model file.
In config.pbtxt, platform can be tensorrt_plan, onnxruntime_onnx, or pytorch_libtorch; the corresponding backend values are tensorrt, onnxruntime, and pytorch.
dims: [ 3, -1, -1 ]    (-1 marks a variable-size dimension)
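To make the platform and dims notes concrete, here is a minimal sketch that creates a model directory and writes a config.pbtxt into it. The model name example_onnx, the tensor names INPUT__0 and OUTPUT__0, the output size and max_batch_size are all placeholders chosen for illustration, not values from a real model; MODEL_REPO defaults to a local directory and would be /models inside the container.

```shell
# Assumption: every name and size below is a placeholder; adjust to the real model.
MODEL_REPO="${MODEL_REPO:-./model_repository}"
MODEL_DIR="$MODEL_REPO/example_onnx"

# Layout expected by Triton: <repository>/<model name>/<version>/model.onnx
mkdir -p "$MODEL_DIR/1"

# With max_batch_size > 0 the batch dimension is implicit,
# so dims [ 3, -1, -1 ] describes a full input shape of [ batch, 3, H, W ].
cat > "$MODEL_DIR/config.pbtxt" <<'EOF'
name: "example_onnx"
platform: "onnxruntime_onnx"
max_batch_size: 8
input [
  {
    name: "INPUT__0"
    data_type: TYPE_FP32
    dims: [ 3, -1, -1 ]
  }
]
output [
  {
    name: "OUTPUT__0"
    data_type: TYPE_FP32
    dims: [ 1000 ]
  }
]
EOF
```

The model file itself (model.onnx) still has to be copied into the version directory "1" by hand.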
--model-control-mode=explicit    (models are loaded and unloaded on request through the model control API)
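In explicit mode, loading and unloading goes through the HTTP model repository endpoints. The helpers below are a small sketch of those calls with curl, again assuming the HTTP port is published on localhost:8000; the function names are invented for this example and the model name is whatever directory name exists under /models.

```shell
# Assumption: the HTTP port 8000 is published on localhost.
TRITON_HTTP="${TRITON_HTTP:-localhost:8000}"

# Build the model control URL for a model name and an action (load or unload).
model_control_url() {
  printf 'http://%s/v2/repository/models/%s/%s\n' "$TRITON_HTTP" "$1" "$2"
}

# Ask the server to load a model from the repository.
load_model() {
  curl -sf -X POST "$(model_control_url "$1" load)"
}

# Ask the server to unload a model.
unload_model() {
  curl -sf -X POST "$(model_control_url "$1" unload)"
}

# List the models in the repository together with their state.
list_models() {
  curl -sf -X POST "http://$TRITON_HTTP/v2/repository/index"
}

# Usage: load_model <model_name> && list_models
```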
git clone https://github.com/NVIDIA/DeepLearningExamples.git
cd DeepLearningExamples/PyTorch/LanguageModeling/BERT/data/squad/
Download the dataset: sh squad_download.sh
Directory where the model repository is mounted in the container
cd /models
Download page for the demo model
https://catalog.ngc.nvidia.com/orgs/nvidia/models/bert_pyt_ckpt_large_qa_squad11_amp
Paste the wget command from that page:
wget --content-disposition https://api.ngc.nvidia.com/v2/models/nvidia/bert_pyt_ckpt_large_qa_squad11_amp/versions/19.09.0/zip -O bert_pyt_ckpt_large_qa_squad11_amp_19.09.0.zip
Convert the model format (PyTorch checkpoint to ONNX)
python3 triton/export_model.py \
    --input-path triton/model.py \
    --input-type pyt \
    --output-path /models/exported_model.onnx \
    --output-type onnx \
    --dataloader triton/dataloader.py \
    --ignore-unknown-parameters \
    --onnx-opset 13 \
    ${FLAG} \
    --config-file bert_configs/large.json \
    --checkpoint /models/bert_large_qa.pt \
    --precision fp16 \
    --vocab-file /models/vocab.txt \
    --max-seq-length 34 \
    --predict-file /opt/tritonserver/DeepLearningExamples/PyTorch/LanguageModeling/BERT/data/squad/v1.1/dev-v1.1.json \
    --batch-size 16
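The export command reads several files, and a missing one only shows up after the script has already started. A small pre-flight check like the one below can catch that earlier. require_files is a helper invented for this note; the paths in the usage line are the ones passed to the export command.

```shell
# Print every path that is not a regular file and
# return non-zero if at least one is missing.
require_files() {
  missing=0
  for f in "$@"; do
    if [ ! -f "$f" ]; then
      echo "missing: $f" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Usage (run from the BERT example directory before exporting):
# require_files bert_configs/large.json /models/bert_large_qa.pt /models/vocab.txt
```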