Chinese Character Recognition: History, Status, and
Prospects
DAI Ruwei1, LIU Chenglin2, and XIAO Baihua1
1 Laboratory of Complex Systems and Intelligence Science, Institute of Automation, Chinese
Academy of Sciences, Beijing 10080, China
{ruwei.dai,baihua.xiao}@ia.ac
2 National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of
Sciences, Beijing 10080, China
liucl@nlpr.ia.ac
Abstract. Chinese character recognition (CCR) is an important branch of pattern
recognition. It was considered an extremely difficult problem due to the
very large number of categories, complicated structures, similarity between
characters, and the variability of fonts or writing styles. Because of its unique
technical challenges and great social needs, the last four decades have witnessed
intensive research in this field and a rapid increase of successful applications.
However, higher recognition performance is continuously needed to improve
the existing applications and to exploit new applications. This paper first provides
an overview of Chinese character recognition and the properties of Chinese
characters. Some important methods and successful results in the history
of Chinese character recognition are then summarized. As for classification
methods, this article pays special attention to the syntactic-semantic approach
for online Chinese character recognition, as well as the meta-synthesis approach
for discipline crossing. Finally, the remaining problems and the possible
solutions to them are discussed.
1 Introduction
Chinese character recognition is an important branch of pattern recognition [1-5]. Solving this problem relies on techniques from many fields: image processing, machine learning, cognitive science (noetic science), linguistics, etc. Since the start of pattern recognition research in the 1950s, character recognition has been a major test case and a stimulator of pattern recognition methodology. At the first workshop on pattern recognition, held in Puerto Rico, US, in 1966, about one third of the papers dealt with character recognition [6]. Approaches such as blurring [7], directional pattern matching [8,9], hierarchical classification [10,11] and multiple classifier combination [12,13] were first proposed by the character recognition community and later evolved into attractive research fields.
Character recognition systems contribute tremendously to the advance of automation and can significantly benefit man-machine communication in many applications, such as postal mail sorting, business card reading, bank check and transaction form processing, and, recently, digital libraries and mobile phones.
Chinese characters are used by over 1.3 billion people in China and some other countries or areas, but typing Chinese characters into computers is not a trivial task. In China, many people cannot even use phonetic codes for character entry because they habitually speak a dialect and cannot pronounce Mandarin correctly. The automatic recognition of Chinese characters would therefore have widespread benefits.
Chinese character recognition was long considered an extremely difficult problem due to the very large number of categories, complicated structures, similarity between characters, and the variability of fonts or writing styles. Because of its unique technical challenges and great social needs, the last four decades have witnessed intensive research in this field and a rapid increase of successful applications. This paper provides a brief review of this field, outlines the important methods and advances, and discusses potential future research directions.
Character recognition approaches are dichotomized into online and offline, depending on the hardware and application mode. Recognition is called “online” if the temporal sequence of the pen trajectory (captured by, e.g., a digitizing tablet) is available. The pen trajectory is recognized immediately after it is written, and the user can respond to the recognition result (by correcting the result or rewriting). Recognition is “offline” if the task is to recognize previously written text that has been converted to images using a scanner or a camera. This paper covers both online and offline Chinese character recognition.
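The difference between the two input modes can be illustrated with a minimal sketch. The type names and the toy rasterizer below are assumptions made for illustration only, and points are assumed to lie inside the image. An online sample keeps the ordered strokes, whereas the offline image derived from it keeps only the ink pixels:

```python
from typing import List, Tuple

# Hypothetical data types for illustration; not taken from any real system.
Point = Tuple[int, int]       # (x, y) pen position
Stroke = List[Point]          # points sampled between pen-down and pen-up
OnlineSample = List[Stroke]   # strokes in the order they were written


def rasterize(sample: OnlineSample, width: int, height: int) -> List[List[int]]:
    """Turn an online sample into an offline-style binary image.

    Stroke order and writing direction are lost; only ink pixels remain.
    All points are assumed to lie inside the width x height grid.
    """
    image = [[0] * width for _ in range(height)]
    for stroke in sample:
        # Mark every sampled pen position.
        for x, y in stroke:
            image[y][x] = 1
        # Fill the pixels between consecutive samples by linear interpolation.
        for (x0, y0), (x1, y1) in zip(stroke, stroke[1:]):
            steps = max(abs(x1 - x0), abs(y1 - y0))
            for i in range(1, steps):
                x = x0 + round((x1 - x0) * i / steps)
                y = y0 + round((y1 - y0) * i / steps)
                image[y][x] = 1
    return image


# The character 十 ("ten"): a horizontal stroke followed by a vertical stroke.
ten: OnlineSample = [[(1, 3), (5, 3)], [(3, 1), (3, 5)]]
# The same shape written in the opposite stroke order.
ten_reversed: OnlineSample = list(reversed(ten))

image = rasterize(ten, 7, 7)
same_image = image == rasterize(ten_reversed, 7, 7)
```

In this sketch the two samples differ as online data but produce identical images, which is why an online recognizer has temporal information that an offline recognizer lacks.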
The rest of this paper is organized as follows. Section 2 describes the properties of Chinese characters. Section 3 briefly reviews the history of Chinese character recognition and the state of the art. Section 4 traces the development from the syntactic to the syntactic-semantic approach and its applications to online Chinese character recognition. Section 5 discusses the discipline crossing between pattern recognition and systems science, as well as the resulting meta-synthesis approaches. Finally, Section 6 discusses the remaining problems and the possibilities for solving them.
2 Properties of Chinese Characters
Chinese characters have unique structures compared to western characters, and this uniqueness poses technical challenges to recognition. This section summarizes the properties of Chinese characters as follows.
2.1 Evolution of Chinese Characters
Fig. 1 demonstrates the evolution of Chinese characters. The origin of Chinese characters can be traced back to oracle script and script on bronze before 1000 BC. Official script was invented in the Qin Dynasty (about 220 BC) and became popular in the Han Dynasty. Its shape is very similar to that of contemporary characters. Regular script, cursive script and fluent script were invented in the late Han Dynasty (about 180 AD). Since then, while spoken Chinese varies across regions, written Chinese characters have remained relatively stable. The regular, cursive and fluent scripts are still commonly used today. However, as can be seen in Fig. 2, traditional Chinese characters have too many strokes. To ease writing, the Chinese government carried out a reform of Chinese characters and published 2,235 simplified characters during 1956-1964. The average number of strokes for these 2,235 characters was reduced from 16.03 to 10.3. The simplified characters, together with the characters that were not simplified, became the standard for official communication across China.
Fig. 1. Examples of the evolution of Chinese characters (from left to right: ‘sun’, ‘moon’, ‘vehicle’, and ‘horse’).
Fig. 2. Examples of traditional and simplified Chinese characters (upper for traditional, lower for simplified).
2.2 Chinese Character Set
Chinese characters are used in daily communication by over one quarter of the world’s population, mainly in Asia, in places such as China, Korea, Japan, and Singapore. There are three main character sets: traditional Chinese characters, simplified Chinese characters, and Japanese Kanji [5].
In Japan, 2,965 Kanji characters are included in the JIS level-1 standard and 3,390 Kanji characters are in the level-2 standard. Most Japanese Kanji characters are identical in shape to the corresponding traditional or simplified Chinese characters.
In Taiwan of China, 5,401 traditional characters are included in a standard set. In the mainland of China, three character sets, containing 6,763, 20,902 and 27,533 Chinese characters, respectively, were announced as National Standards (see Table 1). The 6,763 characters in GB2312-80 cover 99.99% of usage but still do not suffice. In particular, many characters used in personal names and place names are not included in this set. A general-purpose recognizer needs to cover about 9,000 simplified characters, about 3,000 of which have different traditional shapes. In addition, about 1,000 symbols and special characters should be included. In academic research experiments, usually 3,755 characters are considered. The very large number of categories poses a technical challenge for efficient and accurate classification of Chinese characters.
Table 1. National standards of Chinese character sets.
National standard    Number of characters                               Description
GB2312-80            Level-1: 3,755; Level-2: 3,008 (6,763 in total)    Simplified
GBK                  20,902                                             Simplified and Traditional
GB18030-2000         27,533                                             Plus characters of minority nationalities
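The GB2312-80 counts in Table 1 can be cross-checked with a short sketch that relies on Python's built-in `gb2312` codec. The row ranges used here (rows 16-55 for level 1, rows 56-87 for level 2, 94 cells per row) come from the layout of the GB2312 code table rather than from this paper, and the helper name `count_hanzi` is hypothetical:

```python
def count_hanzi(first_row: int, last_row: int) -> int:
    """Count the assigned characters in a range of GB2312 rows.

    Each row has 94 cells; a cell is encoded as two bytes,
    (0xA0 + row, 0xA0 + cell). Unassigned cells fail to decode.
    """
    count = 0
    for row in range(first_row, last_row + 1):
        for cell in range(1, 95):
            code = bytes([0xA0 + row, 0xA0 + cell])
            try:
                code.decode("gb2312")
            except UnicodeDecodeError:
                continue  # unassigned cell, skip it
            count += 1
    return count


level1 = count_hanzi(16, 55)   # frequently used characters
level2 = count_hanzi(56, 87)   # less frequently used characters
total = level1 + level2
print(level1, level2, total)
```

If the codec follows the standard, the three printed values should agree with the Level-1, Level-2 and total figures in Table 1. The level-1 set is also the 3,755-character set commonly used in academic experiments.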
2.3 Character Structures
Chinese characters are ideographs with complicated structures. Many Chinese characters contain relatively independent substructures, called radicals, and some common radicals are shared by different characters. That is to say, a Chinese character is composed of radicals, which are in turn composed of straight-line or poly-line strokes (see Fig. 3).
As far as we know, the most complicated Chinese character has 36 strokes (see the bottom left of Fig. 3). The total number of radicals and single-component characters in Chinese characters is about 500.
Fig. 3. Examples of Chinese character structures (the right panel shows a complicated Chinese character with five radicals, some of which can be further decomposed).
The patterns of Chinese character structures can be roughly categorized into 10 types (single-radical, left-right, up-down, up-left, up-right, left-down, up-left-down, left-up-right, left-down-right, and enclosure), see Fig. 4. Some of the patterns can be further divided into sub-categories.
The structural complexity of Chinese characters is a merit for recognition: it carries rich information for discriminating between characters. The hierarchical character-radical-stroke structure can be exploited in recognition to greatly reduce the size of the reference model database and to speed up recognition. However, the complexity of the structures makes structural description difficult.
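The saving from the character-radical-stroke hierarchy can be sketched as follows. The class names, the tiny five-character lexicon, and the use of stroke counts as a stand-in for model size are all assumptions made for illustration; a real reference model database would store far richer stroke descriptions:

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Radical:
    name: str
    strokes: int  # number of strokes in this radical


@dataclass(frozen=True)
class Character:
    name: str
    layout: str                # structure type, e.g. "left-right" or "up-down"
    radicals: Tuple[str, ...]  # names of the component radicals


# Hypothetical toy lexicon: six radicals shared by five characters.
RADICALS: Dict[str, Radical] = {
    r.name: r
    for r in [
        Radical("日", 4), Radical("月", 4), Radical("木", 4),
        Radical("女", 3), Radical("子", 3), Radical("口", 3),
    ]
}
CHARACTERS: List[Character] = [
    Character("明", "left-right", ("日", "月")),
    Character("林", "left-right", ("木", "木")),
    Character("好", "left-right", ("女", "子")),
    Character("杏", "up-down", ("木", "口")),
    Character("李", "up-down", ("木", "子")),
]


def stroke_count(char: Character) -> int:
    """Total strokes of a character, summed over its radicals."""
    return sum(RADICALS[name].strokes for name in char.radicals)


# Flat storage: every character stores all of its strokes separately.
flat_size = sum(stroke_count(c) for c in CHARACTERS)

# Hierarchical storage: each distinct radical is stored once and characters
# only reference radicals by name.
used = {name for c in CHARACTERS for name in c.radicals}
shared_size = sum(RADICALS[name].strokes for name in used)

print(flat_size, shared_size)
```

Even in this toy lexicon the shared representation stores fewer stroke models than the flat one, and the gap grows as more characters reuse the same radicals.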
Fig. 4. 10 types of Chinese character structures (single-radical, left-right, up-down, up-left, up-right, left-down, up-left-down, left-up-right, left-down-right and enclosure).
Besides the large number of categories and the complexity of structures, there are many similar Chinese characters that differ only slightly (see Fig. 5). Such similar characters are hard for computer recognizers to discriminate.
Fig. 5. Examples of similar Chinese character pairs.
2.4 Writing Styles
The enormous variety of writing styles of different persons can be roughly divided into three categories: regular script (also called handprint), fluent script, and cursive script. The intermediate style between regular and fluent is called fluent-regular, and the intermediate style between fluent and cursive is called fluent-cursive [5]. Some examples of the three typical styles are shown in Fig. 6. We can see that the strokes of regular script are mostly straight-line segments. The fluent script has many curved strokes and, frequently, successive strokes are connected. In cursive script, some character shapes differ drastically from the standard shape.