香港城市大學 語言資訊科學研究中心
Language Information Sciences Research Centre
City University of Hong Kong
共時語料庫
Synchronous Corpus
切詞規則 Segmentation Guidelines
註 Notes:
1. LIVAC文本照錄各報章電子版原文,原則上不作改動,由於各地文書處理習慣不同,因此文本中的非漢字字元及標點符號未有統一以單位元字元 (single-byte character) 或雙位元字元 (double-byte character) 表示。
In principle, LIVAC reproduces the electronic versions of newspaper texts as published, without modification. Because text-processing conventions differ from place to place, non-Chinese characters and punctuation marks have not been standardized and may appear as either single-byte or double-byte characters in the corpus texts.
2. 有關切詞規則第十項(非漢字部份),LIVAC原有特殊符號標示切詞,現為符合是次切詞比賽要求的格式,已將該等符號刪除,因此原來視為一詞的非漢字短句會被切分,唯此改變應不會影響一般切詞系統的運作及結果評估。
Regarding guideline 10 (non-Chinese strings): to comply with the formatting requirements of the current bakeoff, the original word delimiters in LIVAC have been removed. As a result, some non-Chinese word strings that the original guidelines below treat as one word now appear as segmented words in the corpus. For example, in 10.9 below, <Supreme Governor of the Church of England> now appears as seven separate words instead of one unit. This change, however, should not affect the operation of segmentation systems in general or the assessment of segmentation results.
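The two notes above can be illustrated with a minimal Python sketch. It is not part of the LIVAC tooling, and it rests on two assumptions: that once the original delimiters are removed, a non-Chinese phrase is divided at whitespace, and that a user may want to fold double-byte (full-width) forms into single-byte (half-width) ones before comparing tokens. The function names `split_non_chinese` and `fold_width` are hypothetical, and NFKC normalization is one possible folding choice, not a procedure LIVAC applies.

```python
import unicodedata
from typing import List


def split_non_chinese(phrase: str) -> List[str]:
    """Split a non-Chinese phrase at whitespace.

    Assumption: with the original word delimiters removed, a phrase that the
    guidelines treat as one word is divided at its spaces.
    """
    return phrase.split()


def fold_width(text: str) -> str:
    """Fold full-width (double-byte) forms to half-width equivalents.

    Assumption: NFKC is used here only for illustration. It also rewrites
    other compatibility characters, so it is broader than a pure width fold.
    """
    return unicodedata.normalize("NFKC", text)


# Note 2: the phrase from guideline 10.9 is no longer one unit.
tokens = split_non_chinese("Supreme Governor of the Church of England")
print(len(tokens), tokens)

# Note 1: the same string may be stored in full-width or half-width form.
print(fold_width("ＬＩＶＡＣ，２００５") == "LIVAC,2005")
```

A folding step of this kind would only matter when comparing output against texts whose character widths differ; the corpus itself keeps each source's original forms.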