Incremental training with otcws customized-learn

Download the v3.4 source

https://ltp.readthedocs.io/zh_CN/v3.4.0/install.html

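Build

Install Visual Studio 2017 and CMake. In CMake, select the Visual Studio 2017 compiler as the generator, then open the generated solution in Visual Studio 2017 and run the build with the Release configuration.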

Incremental training

The Release build places otcws.exe in:

D:\xx\program\ltp-3.4.0\tools\train\Release

Open cmd in that directory and run:

D:\xx\program\ltp-3.4.0\tools\train\Release>otcws.exe customized-learn --baseline-model D:\xx\program\ltp_data_v3.4.0\cws.model --model D:\xx\program\ltp_data_v3.4.0\mycws.model --reference D:\xx\program\ltp-3.4.0\tools\train\sample\seg\example-train.seg --development D:\xx\program\ltp-3.4.0\tools\train\sample\seg\example-holdout.seg
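If the training step has to be repeated, the command above can be assembled from Python rather than typed by hand. The following is a minimal sketch, not part of LTP: the function names build_customized_learn_cmd and run_customized_learn are hypothetical, and the only options used are the four that appear in the command above.

```python
import subprocess


def build_customized_learn_cmd(otcws_exe, baseline_model, output_model,
                               train_seg, holdout_seg):
    """Return the otcws customized-learn command as an argument list.

    Hypothetical helper: it only mirrors the options used in the
    command shown above and does not validate the paths.
    """
    return [
        str(otcws_exe), "customized-learn",
        "--baseline-model", str(baseline_model),  # existing cws.model
        "--model", str(output_model),             # incremental model to write
        "--reference", str(train_seg),            # training data
        "--development", str(holdout_seg),        # held-out data
    ]


def run_customized_learn(*args):
    """Run the training command and raise if otcws exits with an error."""
    cmd = build_customized_learn_cmd(*args)
    subprocess.run(cmd, check=True)
    return cmd
```

Passing the arguments as a list avoids quoting problems when a path contains spaces. A call would look like run_customized_learn(r"C:\path\to\otcws.exe", "cws.model", "mycws.model", "example-train.seg", "example-holdout.seg"), where every path is a placeholder to be replaced with your own.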

Reference: https://github.com/HIT-SCIR/ltp-cws

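Code

# Customized segmentation
from pyltp import CustomizedSegmentor

# Path to the baseline segmentation model (placeholder; point it at your own cws.model)
cws_model_path = '/path/to/your/ltp_data/cws.model'

customized_segmentor = CustomizedSegmentor()
# Load the models; the second argument is the path to your incrementally trained model
customized_segmentor.load(cws_model_path, '/path/to/your/customized_model')
words_2 = customized_segmentor.segment('你会机器学习中的随机森林吗')

print('\t'.join(words_2))
customized_segmentor.release()

# Customized model with an external lexicon
customized_segmentor_1 = CustomizedSegmentor()  # create the instance
customized_segmentor_1.load_with_lexicon(cws_model_path, '/path/to/your/customized_model', '/path/to/your/lexicon')  # load the models and the lexicon
words_3 = customized_segmentor_1.segment('你会机器学习中的随机森林吗')

print('\t'.join(words_3))
customized_segmentor_1.release()  # release the second instance, not the first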
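
The snippets above call release() by hand, which is easy to forget or to call on the wrong instance. One possible way to make the cleanup automatic is a small context manager. This is only a sketch: auto_release is a hypothetical helper, not part of pyltp, and it assumes nothing about the object beyond a release() method.

```python
from contextlib import contextmanager


@contextmanager
def auto_release(segmentor):
    """Yield the segmentor and always call its release() on exit.

    Hypothetical helper: works with any object exposing release(),
    such as a loaded CustomizedSegmentor.
    """
    try:
        yield segmentor
    finally:
        # Runs even if segmentation raises an exception.
        segmentor.release()
```

With pyltp this would be used as "with auto_release(customized_segmentor_1) as seg:" followed by seg.segment(...) inside the block, after the model has been loaded as shown above.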