香港城市大學 語言資訊科學研究中心
Language Information Sciences Research Centre
City University of Hong Kong
共時語料庫
Synchronous Corpus
切詞規則 Segmentation Guidelines
註 Notes:
1. LIVAC文本照錄各報章電子版原文,原則上不作改動,由於各地文書處理習慣不同,因此文本中的非漢字字元及標點符號未有統一以單位元字元 (single-byte character) 或雙位元字元 (double-byte character) 表示。
In principle, LIVAC reproduces the electronic versions of newspaper texts as originally published, without modification. Because word-processing conventions differ from place to place, non-Chinese characters and punctuation marks have not been unified and may appear as either single-byte or double-byte characters in the corpus texts.
2. 有關切詞規則第十項(非漢字部份),LIVAC原有特殊符號標示切詞,現為符合是次切詞比賽要求的格式,已將該等符號刪除,因此原來視為一詞的非漢字短句會被切分,唯此改變應不會影響一般切詞系統的運作及結果評估。
To comply with the formatting requirements of the current bakeoff, the original word delimiters in LIVAC have been removed. As a result, some non-Chinese word strings that the original guidelines below treat as one word now appear as segmented words in the corpus. For example, in 10.9 below, <Supreme Governor of the Church of England> now appears as seven separate words instead of one unit. This change, however, should not affect the operation of segmentation systems in general or the assessment of segmentation results.
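The two notes above can be illustrated with a small sketch. It is not part of the LIVAC tooling: the function names `normalize_width` and `split_non_chinese` are hypothetical, and NFKC folding is only one possible way for a user to unify single-byte and double-byte forms before comparing segmentation output. Note that NFKC also rewrites other compatibility characters (for example ligatures and half-width katakana), so it may change more than width alone.

```python
import unicodedata


def normalize_width(text: str) -> str:
    """Fold full-width (double-byte) Latin letters, digits, spaces and
    punctuation to their half-width (single-byte) forms using NFKC.
    Assumption: this is a user-side preprocessing choice, not something
    the corpus itself applies."""
    return unicodedata.normalize("NFKC", text)


def split_non_chinese(phrase: str) -> list[str]:
    """Split a non-Chinese phrase on whitespace, mimicking how a phrase
    appears once the original word delimiters have been removed."""
    return phrase.split()


# Note 2: the phrase from guideline 10.9 is no longer a single unit.
title = "Supreme Governor of the Church of England"
tokens = split_non_chinese(title)
print(len(tokens), tokens)

# Note 1: the same string may occur in either width in the corpus texts.
print(normalize_width("ＬＩＶＡＣ　２００６，"))
```

With the delimiters gone, the title yields seven tokens, matching the count given in note 2. Width folding of this kind would only matter when comparing strings across sources that follow different local conventions.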