A Thesis Submitted in Partial Fulfillment of the Requirements
for the Degree of Master of Engineering
Study on Key Techniques in Archive Digitalization
Candidate : Xiao Fei
Major : System Engineering
Supervisor : Assoc. Prof. Zhu Mingfu
Huazhong University of Science and Technology
Wuhan, Hubei 430074, P. R. China
June, 2008
Declaration of Originality
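I hereby declare that this thesis presents research work carried out by me under the guidance of my supervisor, together with the results of that work. To the best of my knowledge, except where references are explicitly indicated in the text, this thesis contains no research results previously published or written by any other individual or group. All individuals and groups who contributed to the research in this thesis are clearly acknowledged in the text. I am fully aware that I bear the legal consequences of this declaration.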
Signature of author:
Date:      Year      Month      Day
Authorization for Use of Thesis Copyright
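The author of this thesis fully understands the university's regulations on retaining and using theses. Under these regulations, the university may retain the thesis, submit paper and electronic copies to the relevant state departments or agencies, and allow the thesis to be consulted and borrowed. I authorize Huazhong University of Science and Technology to include all or part of this thesis in relevant databases for retrieval, and to preserve and compile it by photocopying, reduced-size copying, scanning, or other means of reproduction.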
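This thesis is:
Confidential □, in which case this authorization applies after declassification in the year ______.
Not confidential □.
(Please mark "√" in the appropriate box above.)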
Signature of author:                    Signature of supervisor:
Date:      Year      Month      Day          Date:      Year      Month      Day
Master's Thesis, Huazhong University of Science and Technology
Abstract
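In recent years, archive processing technology has developed rapidly toward digitization, informatization, and networking. Traditional paper-based archive processing methods limit, to some extent, the sharing and querying of archival information, and the sheer volume of archives poses new challenges to these "three transformations". Focusing on two key techniques in archive informatization, symbol recognition and search mapping, this thesis presents a comprehensive and in-depth discussion of a system for the digital processing and application of archives, proceeding in three steps: theoretical foundation, application methods, and simulation analysis.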
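Symbol recognition is the foundation and core of the entire workflow. It takes pattern classification and neural networks as its core techniques and scanned archive image processing as its basic working principle. Printed symbols serve as separators that delimit individual archive files, and manual preprocessing, which matches each archive to its designated file in advance, guarantees correct correspondence. On this basis the technique successfully replaces the original barcode information, reduces the high redundancy of the archive database, improves query efficiency, and lays the groundwork for the subsequent search mapping stage and other archive applications.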
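To make the role of the separator concrete, the following is a minimal sketch of how recognized separator sheets could split one continuous scan batch into files. It assumes a hypothetical per-page predicate `is_separator` standing in for the recognition step (the thesis's neural-network classifier is not reproduced here) and a hypothetical list `file_ids` giving the manually preprocessed order of files in the batch.

```python
from typing import Callable, Dict, List, Sequence


def split_by_separators(
    pages: Sequence[str],
    is_separator: Callable[[str], bool],
    file_ids: Sequence[str],
) -> Dict[str, List[str]]:
    """Group scanned pages into files, using separator pages as boundaries.

    `is_separator` is a hypothetical stand-in for the recognition step;
    `file_ids` is the (assumed) manually prepared order of files in the batch.
    """
    groups: List[List[str]] = []
    current: List[str] = []
    for page in pages:
        if is_separator(page):
            # A separator closes the file collected so far (if it has pages).
            if current:
                groups.append(current)
                current = []
        else:
            current.append(page)
    if current:
        groups.append(current)
    if len(groups) != len(file_ids):
        # A count mismatch signals a missed or misrecognized separator.
        raise ValueError(f"found {len(groups)} files, expected {len(file_ids)}")
    return dict(zip(file_ids, groups))
```

For example, the batch `["SEP", "p1", "p2", "SEP", "p3"]` with `file_ids=["A-001", "A-002"]` yields `{"A-001": ["p1", "p2"], "A-002": ["p3"]}`. In this sketch the count check is where the manual correspondence would catch a recognition error before the files are stored.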
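Search mapping is the goal and end point of the entire workflow. This work applies full-text retrieval to archive information retrieval by way of image recognition. Building on text data mining, it introduces the concept of a correlation degree between archives. This makes automatic clustering of different archives possible and lets search results be presented to the user at different levels according to priority. In addition, a semantics-based intelligent word segmentation technique is proposed to handle automatic segmentation when interpreting fuzzy Chinese queries. Together these make it easier to achieve the retrieval goals of completeness, precision, and speed, and they represent a successful application of web search technology to archive information retrieval.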
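The abstract does not describe how the semantics-based segmenter works. For reference only, the sketch below shows forward maximum matching, a common dictionary-based baseline for Chinese word segmentation. It is a swapped-in illustration, not the author's method, and it assumes a hypothetical word dictionary and a maximum word length of 4.

```python
from typing import List, Set


def forward_max_match(text: str, dictionary: Set[str], max_len: int = 4) -> List[str]:
    """Segment text greedily, taking the longest dictionary word at each position."""
    words: List[str] = []
    i = 0
    while i < len(text):
        # Try the longest candidate first and shrink it until it is found in the
        # dictionary; a single character is always accepted as a fallback.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in dictionary:
                words.append(candidate)
                i += size
                break
    return words
```

With the dictionary `{"档案", "信息", "检索", "档案信息"}`, the query "档案信息检索" is segmented as `["档案信息", "检索"]`. Purely greedy matching like this cannot resolve ambiguous splits, and that is the kind of case a semantics-based approach is meant to improve on.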
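To address the problems of the existing archive information system, and using the key techniques above as its theoretical basis, this thesis proposes a design based on the MVC pattern and a three-tier architecture on the platform, which simplifies future maintenance and functional extension. It also proposes two implementation modes for remote clients and a configurable data access technique, laying the foundation for evolving the system into a large-scale distributed archive information platform.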
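The abstract does not specify how the configurable data access technique is implemented. One common way to achieve configurability in a three-tier design is a factory that selects the data access implementation from a configuration value, so the business tier never names a concrete backend. The sketch below illustrates that idea under stated assumptions: all class names, the `"data_access"` configuration key, and the in-memory backends are hypothetical stand-ins for real database access.

```python
from abc import ABC, abstractmethod
from typing import Dict, List, Type


class ArchiveRepository(ABC):
    """Data access tier interface seen by the business tier (hypothetical)."""

    @abstractmethod
    def find_by_keyword(self, keyword: str) -> List[str]: ...


class InMemoryRepository(ArchiveRepository):
    """Hypothetical backend holding a few titles in place of a real database."""

    def __init__(self) -> None:
        self._titles = ["Personnel file 2007", "Finance file 2008"]

    def find_by_keyword(self, keyword: str) -> List[str]:
        # Case-insensitive substring match over the stored titles.
        return [t for t in self._titles if keyword.lower() in t.lower()]


class EmptyRepository(ArchiveRepository):
    """Second hypothetical backend, used to show switching by configuration."""

    def find_by_keyword(self, keyword: str) -> List[str]:
        return []


# Registry mapping configuration values to data access implementations.
BACKENDS: Dict[str, Type[ArchiveRepository]] = {
    "memory": InMemoryRepository,
    "empty": EmptyRepository,
}


def create_repository(config: Dict[str, str]) -> ArchiveRepository:
    """Instantiate the data access implementation named in the configuration."""
    return BACKENDS[config["data_access"]]()


class ArchiveService:
    """Business tier: depends only on the ArchiveRepository interface."""

    def __init__(self, repo: ArchiveRepository) -> None:
        self._repo = repo

    def search(self, keyword: str) -> List[str]:
        return sorted(self._repo.find_by_keyword(keyword))
```

Changing `{"data_access": "memory"}` to `{"data_access": "empty"}` swaps the backend without touching `ArchiveService`. In a real deployment the configuration would come from a file, and a further backend (for example, one that talks to a remote server) could be registered without changing the business tier.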
Keywords: symbol recognition; search mapping; full-text search; Chinese word segmentation
Abstract
In recent years, archive-processing technology has developed rapidly toward digitization, informatization, and networking. Traditional paper-based archival processing methods limit, to some extent, the sharing of files and information queries, and the large quantities of archives bring new challenges to this trend. Focusing on two key techniques in archive information processing, symbol recognition and search mapping, this thesis presents a comprehensive and in-depth discussion from three aspects: theoretical foundation, application methods, and simulation analysis.
Symbol recognition is the basis and core of the entire process. In traditional barcode-based information recognition, massive numbers of archive files burden archive workers, and attaching barcodes is complex and error-prone. Barcodes also alter the original appearance of the files. The symbol recognition technique uses pattern classification and neural networks as its core techniques, scanned file image processing as its basic principle, symbols as separators between files, and manual preprocessing to guarantee correspondence between archives and files. It successfully replaces the original barcodes, lowers redundancy in the archive database, improves query efficiency, and greatly facilitates the following step, search mapping.
Search mapping is the goal and end result of the whole process. Traditional paper-based archival retrieval is undoubtedly inefficient. Like Internet web retrieval, archive information retrieval faces growing challenges when confronted with large amounts of unrelated data. This thesis applies modern Internet retrieval techniques to archive information retrieval and, building on text mining, introduces the concept of a correlation degree between archives, which makes automatic clustering of archives possible. This is also a successful application of web search techniques to archive information retrieval.
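The abstract does not give a formula for the correlation degree. A minimal sketch, assuming cosine similarity over term-frequency vectors of already segmented text, shows how such a score could rank archives against a query and group them into priority tiers. The tier thresholds are hypothetical, and the same `cosine` function could equally score pairs of archives as input to clustering.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence, Tuple


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity of two term-frequency vectors (0.0 if either is empty)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def rank_by_priority(
    query_terms: Sequence[str],
    archives: Dict[str, Sequence[str]],
    thresholds: Tuple[float, float] = (0.5, 0.2),
) -> Dict[int, List[Tuple[str, float]]]:
    """Score each archive against the query and bucket it into priority tiers.

    Tier 1: score >= thresholds[0]; tier 2: score >= thresholds[1].
    Archives below the lower threshold are dropped. Thresholds are assumptions.
    """
    q = Counter(query_terms)
    tiers: Dict[int, List[Tuple[str, float]]] = {1: [], 2: []}
    for archive_id, terms in archives.items():
        score = cosine(q, Counter(terms))
        if score >= thresholds[0]:
            tiers[1].append((archive_id, score))
        elif score >= thresholds[1]:
            tiers[2].append((archive_id, score))
    # Within each tier, show the most closely related archives first.
    for tier in tiers.values():
        tier.sort(key=lambda item: item[1], reverse=True)
    return tiers
```

For the query `["archive", "search"]`, an archive containing exactly those terms lands in tier 1 with a score of about 1.0. An archive sharing only "archive" among three terms scores 1/√6 ≈ 0.41 and lands in tier 2.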
Through modeling and simulation of the application techniques, real archive data were used as training samples, test results were exported, and integrated evaluation indicators for the system were established, which helped optimize the system. Finally, the key techniques are summarized, areas for improvement are identified, and a solid foundation is laid for the next step: building a distributed, shared archive information platform.
Keywords: symbol recognition; search mapping; full-text search; word segmentation