Extracting all figures and tables from a PDF with Python

2023-03-31

1. Convert each page of the PDF into an image:
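The original post does not show the code for this step. A minimal sketch of one way to do it, assuming the PyMuPDF package (imported as `fitz`) is installed; the function names `zoom_for_dpi` and `pdf_to_images`, the output file naming, and the 144 dpi default are all illustrative choices, not the author's:

```python
from pathlib import Path


def zoom_for_dpi(dpi: float) -> float:
    """PDF user space has 72 units per inch, so the zoom factor is dpi / 72."""
    return dpi / 72.0


def pdf_to_images(pdf_path: str, out_dir: str, dpi: int = 144) -> list:
    """Render every page of pdf_path to a PNG in out_dir; return the file paths."""
    import fitz  # PyMuPDF (assumption: the post does not name its renderer)

    zoom = zoom_for_dpi(dpi)
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths = []
    doc = fitz.open(pdf_path)
    try:
        for index, page in enumerate(doc):
            # Scale the page uniformly so the bitmap has the requested dpi
            pix = page.get_pixmap(matrix=fitz.Matrix(zoom, zoom))
            target = out / f"page_{index + 1}.png"
            pix.save(str(target))
            paths.append(str(target))
    finally:
        doc.close()
    return paths
```

Keeping the zoom factor in one place matters later: the same factor is needed to map PDF coordinates onto the rendered image when cropping.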

2. Use a regular expression to locate the figures and tables:

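import re

# loc_top, loc_bottom, panduan, value and the compiled pattern zhongwenjiance
# are assumed to be initialised before this loop (the fragment does not show them).
for i in layout:
    if hasattr(i, "get_text"):
        text = i.get_text().strip()
        # Skip text blocks that the zhongwenjiance pattern does not match
        if zhongwenjiance.search(text) is None:
            continue
        # Match the caption keyword, e.g. "Fig.1"
        if re.search(r"Fig\.\d", text):
            # split("\n")[1] takes the second line of the caption block,
            # so the block must contain at least two lines
            if panduan:
                # No long paragraph since the previous caption: several figures
                # in a row, so this one starts below the previous caption
                shifouduogetu = True
                loc_top.append((value_bottom_bbox, text.split("\n")[1]))
            else:
                # Otherwise the figure starts below the last long paragraph
                loc_top.append((value, text.split("\n")[1]))
            loc_bottom.append((i.bbox, text))
            value_bottom_bbox = i.bbox
            value_bottom_text = text.split("\n")[1]
            panduan = True
        elif len(re.sub(" +", "", text)) > 100:
            # A block longer than 100 characters (spaces removed) is body text
            panduan = False
            value = i.bbox
            value_text = text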
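The fragment above does not show where `layout` comes from. Its use of `.bbox` and `.get_text()` is consistent with a pdfminer.six page layout, but that is an assumption. The sketch below shows one possible way to obtain the text blocks with `pdfminer.high_level.extract_pages`, together with a simplified, self-contained version of the pairing idea. The names `iter_text_blocks`, `pair_captions`, and `min_body_len` are hypothetical, and the sketch omits the `zhongwenjiance` filter and the second-line split:

```python
import re

CAPTION = re.compile(r"Fig\.\d")  # same caption keyword as the loop above


def iter_text_blocks(pdf_path):
    """Yield (page_number, bbox, text) for each text element (assumes pdfminer.six)."""
    from pdfminer.high_level import extract_pages

    for page_number, page_layout in enumerate(extract_pages(pdf_path), start=1):
        for element in page_layout:
            if hasattr(element, "get_text"):
                yield page_number, element.bbox, element.get_text().strip()


def pair_captions(blocks, min_body_len=100):
    """Pair each caption with the block that bounds its figure from above.

    blocks is an iterable of (bbox, text) in reading order for one page.
    Returns a list of (top_bbox, caption_bbox, caption_text); top_bbox is
    None when no body paragraph or caption precedes the caption.
    """
    pairs = []
    anchor = None  # bbox of the nearest preceding body paragraph or caption
    for bbox, text in blocks:
        text = text.strip()
        if CAPTION.search(text):
            pairs.append((anchor, bbox, text))
            anchor = bbox  # a directly following figure starts below this caption
        elif len(re.sub(r" +", "", text)) > min_body_len:
            anchor = bbox  # long block: treat as body text above the next figure
    return pairs


blocks = [
    ((50, 700, 550, 780), "x" * 120),        # body paragraph
    ((50, 400, 550, 420), "Fig.1 Overview"),  # first caption
    ((50, 380, 550, 395), "short note"),      # ignored: too short
    ((50, 100, 550, 120), "Fig.2 Results"),   # second caption
]
pairs = pair_captions(blocks)
```

Here the first figure is bounded above by the body paragraph and the second by the first caption, which mirrors what the `panduan` flag distinguishes in the original loop.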

3. Crop the matched regions out of the page images:
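The post gives no code for the cropping step either. The main subtlety is the coordinate change: PDF bounding boxes are `(x0, y0, x1, y1)` in points with the origin at the bottom-left, while image pixels have the origin at the top-left. A hedged sketch, assuming the page was rendered at `scale = dpi / 72` and that Pillow is available for the actual crop; `figure_box` and `crop_figure` are illustrative names, and including the caption in the crop is a design choice made here:

```python
def figure_box(top_bbox, caption_bbox, page_height, scale):
    """Convert the region between two PDF bboxes into a pixel crop box.

    top_bbox bounds the figure from above (its bottom edge is the figure top);
    caption_bbox bounds it from below (its bottom edge ends the crop, so the
    caption is included). Returns (left, upper, right, lower) in pixels.
    """
    left = min(top_bbox[0], caption_bbox[0]) * scale
    right = max(top_bbox[2], caption_bbox[2]) * scale
    # Flip the y axis: image y grows downward from the top of the page
    upper = (page_height - top_bbox[1]) * scale
    lower = (page_height - caption_bbox[1]) * scale
    return tuple(int(round(v)) for v in (left, upper, right, lower))


def crop_figure(image_path, box, out_path):
    """Crop box out of the rendered page image and save it (assumes Pillow)."""
    from PIL import Image

    with Image.open(image_path) as page_image:
        page_image.crop(box).save(out_path)


# An 800-point-high page rendered at 144 dpi, i.e. scale 2.0
box = figure_box((50, 700, 550, 780), (50, 400, 550, 420), page_height=800, scale=2.0)
```

If the layout does come from pdfminer.six, the page height in points should be available as the `height` attribute of the page layout object.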

4. Result:

Source: IT技术分享网
Link: http://www.5ityx.cn/cate100/238418.html
