Python image table recognition: OCRTable, recognizing tables and text in scanned images containing tables

Updated: 2023-05-19 11:10:34

OCR Table
Introduction
For scanned copies containing tables or forms, many OCR programs recognize the text of an entire page as a whole and discard all tables, which is sometimes inconvenient for users. This project retains the table structures as well and saves the recognition result as a Microsoft Word document.
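One way to keep a table's structure in the output is to place each recognized word into the cell whose boundaries contain it. The sketch below is a hypothetical illustration, not this project's code: it assumes the y positions of the horizontal ruling lines and the x positions of the vertical ruling lines are already known, and that the OCR engine reports each word as a `(text, x, y, w, h)` box (an assumed format). Merged cells are not handled.

```python
from bisect import bisect_right


def assign_to_cells(words, rows, cols):
    """Place OCR words into a grid of table cells.

    words: list of (text, x, y, w, h) boxes in pixels (hypothetical format).
    rows:  sorted y positions of horizontal ruling lines.
    cols:  sorted x positions of vertical ruling lines.
    Returns a list of table rows, each a list of cell strings.
    """
    n_rows, n_cols = len(rows) - 1, len(cols) - 1
    grid = [["" for _ in range(n_cols)] for _ in range(n_rows)]
    # Visit words top-to-bottom, then left-to-right, so multi-word cells read in order
    # (assumes words on the same text line share the same top y).
    for text, x, y, w, h in sorted(words, key=lambda box: (box[2], box[1])):
        cx, cy = x + w / 2, y + h / 2  # the box center decides the cell
        r = bisect_right(rows, cy) - 1
        c = bisect_right(cols, cx) - 1
        if 0 <= r < n_rows and 0 <= c < n_cols:  # ignore words outside the table
            grid[r][c] = (grid[r][c] + " " + text).strip()
    return grid


words = [
    ("Name", 10, 10, 40, 20), ("Age", 110, 10, 30, 20),
    ("Ada", 10, 60, 30, 20), ("Lovelace", 45, 60, 50, 20), ("36", 110, 60, 20, 20),
]
print(assign_to_cells(words, rows=[0, 50, 100], cols=[0, 100, 200]))
# [['Name', 'Age'], ['Ada Lovelace', '36']]
```

The resulting list of rows could then be written cell by cell into a Word table; the EXE lists the DocX library (Xceed.Words.dll) among its dependencies for working with Word documents.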
This project consists of a DLL and an EXE, both of which are 64-bit. The DLL is in the tableocr subdirectory and is developed in Visual C++; it implements the core functions, including table structure recognition and text recognition. The EXE is in the ocrtable subdirectory and is developed in C#; it provides the user interface. The pictures directory contains sample scanned copies.
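The README does not describe how the DLL finds the table structure. As a rough illustration of one common approach (not necessarily what tableocr does), ruling lines in a clean, deskewed, binarized scan can be located with projection profiles: a pixel row that is mostly ink is treated as a horizontal line, and likewise for pixel columns. The sketch below works on a plain list-of-lists image where 1 marks an ink pixel; the 0.8 threshold is an arbitrary assumption, and skewed scans, broken lines, or merged cells would need more robust handling (for example, OpenCV morphological operations).

```python
def _line_centers(profile, length, min_ratio):
    """Return the center index of each run where profile[i] >= min_ratio * length."""
    threshold = min_ratio * length
    centers, start = [], None
    for i, count in enumerate(profile):
        if count >= threshold:
            if start is None:
                start = i  # a new run of line pixels begins
        elif start is not None:
            centers.append((start + i - 1) // 2)  # the run ended at i - 1
            start = None
    if start is not None:  # the run reaches the last index
        centers.append((start + len(profile) - 1) // 2)
    return centers


def detect_grid(image, min_ratio=0.8):
    """Find horizontal and vertical ruling lines in a binary image.

    image: list of pixel rows, each a list of 0/1 values (1 = ink).
    Returns (row_positions, col_positions).
    """
    height, width = len(image), len(image[0])
    row_profile = [sum(row) for row in image]
    col_profile = [sum(image[y][x] for y in range(height)) for x in range(width)]
    rows = _line_centers(row_profile, width, min_ratio)
    cols = _line_centers(col_profile, height, min_ratio)
    return rows, cols


# A synthetic 9x9 image with a 2 by 2 grid of cells (lines at 0, 4 and 8).
image = [[1 if y in (0, 4, 8) or x in (0, 4, 8) else 0 for x in range(9)] for y in range(9)]
print(detect_grid(image))  # ([0, 4, 8], [0, 4, 8])
```

Neighbouring positions then bound the cells: rows [0, 4, 8] and columns [0, 4, 8] describe a table with two rows and two columns. Thick lines collapse to a single position because consecutive line pixels are merged into one run.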
Suggestions are welcome. In addition to submitting an issue, you can email me as well. My email address is
Recognition Performance
English Character Recognition Example
Please do not select "Recognize simplified Chinese characters".
Scanned copy:
Result (note that the tables are not displayed in order):
Chinese Character Recognition Example
You need to select "Recognize simplified Chinese characters".
Scanned copy:
Result:
Chinese character recognition relies on Tesseract's official pre-trained language data, which supports only a few fonts. Users may consider training Tesseract manually or using other OCR technologies instead.
Development Environment
DLL Development environment
Windows 7 SP1 x64
Visual Studio Community 2017
OpenCV 3.4.3
Tesseract 4.0.0-beta.4 (compiled from the Git source; please search online resources to learn how to set up Chinese character recognition)
For convenience in debugging, the DLL project includes a Debug EXE configuration, which builds an EXE. This program displays the table structures and outputs the recognized text through the OutputDebugString Windows API. Note that the recognition process may take a long time, and the popup window must be closed with the keyboard rather than the mouse.
EXE Development environment
Windows 7 SP1 x64
Visual Studio Community 2017
DocX (Xceed.Words.dll, downloaded via NuGet)
Revision History
2018-09-30
Complete the first edition.
2019-09-14
Fix bugs in the DLL.
Add internationalization support to the EXE.
Update this document.
