Simple Query Language: a corpus query language (especially suited to the BNC)

Updated: 2023-05-21 12:16:51

This quick reference gives a concise overview of the most commonly needed features of Simple Query Syntax; see Chapter 6 of Hoffmann et al. (2008) for a comprehensive reference and tutorial. Query expressions that you can enter in BNCweb's search box are followed by an arrow and the matching words or word sequences (e.g. st?ng → sting, stung).
Basic word form searches
•To search for word forms, simply type them into the query field and click [Start query]: glitterati → glitterati
•Use wildcards for unspecified letters, and prefix or suffix searches:
?  for a single arbitrary character
s?ng → sing, sang, song, …
*  for zero or more characters
*able → able, table, capable, suitable, available, …
+  for one or more characters
+able → table, capable, suitable, … but not able
??+  for three or more characters, etc.
??+able → capable, … but not able, table, unable, stable
•Combine multiple wildcards: *oo+oo* → Voodoo, schoolroom, …
•Protect wildcards and other metacharacters with a backslash \ to match the literal character (called "escaping" the metacharacter):
\? → ?
? → a, b, c, …, A, B, C, …, 1, 2, 3, …, ., !, ?, …
Simple Query Syntax uses the following metacharacters:
? * + , : @ / ( ) [ ] { } _ - < >
•List comma-separated alternatives (optionally including wildcards) in square brackets:
??+[able,ability] → capable, capability, availability, …
neighbo[u,]r → neighbour, neighbor
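The wildcard, escaping and alternative rules above map closely onto ordinary regular expressions. The following Python sketch shows one possible translation. The function names `simple_to_regex` and `matches` are invented for illustration, and this is not a description of how BNCweb itself processes queries. It covers single word-form patterns only and assumes the default case-insensitive query mode.

```python
import re

def simple_to_regex(query):
    """Translate one Simple Query word-form pattern into a compiled regex."""
    out = []
    depth = 0                                    # nesting depth of [...] lists
    i = 0
    while i < len(query):
        ch = query[i]
        if ch == "\\" and i + 1 < len(query):
            out.append(re.escape(query[i + 1]))  # escaped metacharacter
            i += 1
        elif ch == "?":
            out.append(".")                      # exactly one character
        elif ch == "*":
            out.append(".*")                     # zero or more characters
        elif ch == "+":
            out.append(".+")                     # one or more characters
        elif ch == "[":
            out.append("(?:")                    # start of alternatives
            depth += 1
        elif ch == "]" and depth > 0:
            out.append(")")
            depth -= 1
        elif ch == "," and depth > 0:
            out.append("|")                      # separator between alternatives
        else:
            out.append(re.escape(ch))            # ordinary literal character
        i += 1
    return re.compile("".join(out), re.IGNORECASE)

def matches(query, word):
    """True if the whole word form is matched by the query pattern."""
    return simple_to_regex(query).fullmatch(word) is not None
```

With this sketch, `matches("??+able", "capable")` is true while `matches("??+able", "stable")` is false, in line with the examples listed above.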
•Searches are case-insensitive by default: the queries bath, Bath and BATH find the same matches (viz. the three word forms bath, Bath and BATH). Set the "Query mode" drop-down menu to "Simple query (case-sensitive)" to distinguish between AIDS and aids, for example.
•Use the :d modifier to ignore accents: fiancee:d → fiancée, fiancee (for details, see Hoffmann et al. 2008, Section 6.10 and Appendix 4).
Matching parts-of-speech (POS)
•Search for a word form with a specific POS tag by linking them with an underscore _. Wildcards can be used both for the word form and for the POS tag:
lights_NN2 → plural noun lights, but not the verb form lights
*ly_AJ0 → adjectives ending in -ly (e.g. daily)
super+_V* → verb forms starting with super-
•You can also search by POS tag only: _PNX → any reflexive pronoun
•A complete listing of the POS tags used in the BNC is given at the end of this reference.
•Use simplified POS tags enclosed in curly braces: super+_{VERB} for verb forms starting with super- (no wildcards allowed in simplified tags).
•List of simplified POS tags (Table 3.8 of Hoffmann et al. (2008) shows a comparison with the full tagset):
A, ADJ  adjective
N, SUBST  noun
V, VERB  verb
ADV  adverb
ART  article
CONJ  conjunction
INT, INTERJ  interjection
PREP  preposition
PRON  pronoun
$, STOP  punctuation
UNC  other / uncertain
•Keep in mind that part-of-speech tags have been assigned by an automatic software tool and are not always correct (e.g. beer_{N} can_{N}).
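To make the word_POS notation concrete, here is a minimal Python sketch of how a single query item could be tested against one annotated token. It assumes that every token is available as a (word form, CLAWS-5 tag, simplified tag) tuple; that layout, and the names `wildcard` and `item_matches`, are hypothetical. The alias table is read off the simplified-tag list above, where paired entries such as "A, ADJ" are treated as two names for the same tag.

```python
import re

# Short and long names from the simplified-tag list (A = ADJ, N = SUBST, ...).
ALIASES = {"A": "ADJ", "N": "SUBST", "V": "VERB", "INT": "INTERJ", "$": "STOP"}

def wildcard(pattern):
    """Compile a ?, *, + wildcard pattern (no escapes or [...] lists here)."""
    table = {"?": ".", "*": ".*", "+": ".+"}
    body = "".join(table.get(ch, re.escape(ch)) for ch in pattern)
    return re.compile(body, re.IGNORECASE)

def item_matches(item, token):
    """Test an item such as 'lights_NN2' against a (word, c5, simple) token."""
    word, c5, simple = token
    form, sep, tag = item.partition("_")
    if form and not wildcard(form).fullmatch(word):
        return False                     # word-form constraint fails
    if not sep:
        return True                      # no POS constraint was given
    if tag.startswith("{") and tag.endswith("}"):
        name = tag[1:-1]                 # simplified tag: literal, no wildcards
        return ALIASES.get(name, name) == simple
    return wildcard(tag).fullmatch(c5) is not None
```

For instance, `item_matches("lights_NN2", ("lights", "VVZ", "VERB"))` is false because the tag constraint fails, whereas `item_matches("_PNX", ("themselves", "PNX", "PRON"))` is true because a tag-only item places no constraint on the word form.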
Headword and lemma queries
•Search by headword, enclosed in curly braces: {light} finds the forms light, lights, lit, lighted, lighting, lighter and lightest (but not the nouns lighting and lighter).
•In BNCweb, the lemma is a combination of headword and simplified POS tag, separated by a slash /. A lemma query distinguishes between the noun, verb and adjective readings of LIGHT:
{light/V} → light, lights, lit, lighted, lighting (tagged as verb)
{light/N} → light, lights (tagged as noun)
{light/A} → light, lighter, lightest (tagged as adjective)
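The difference between a headword query and a lemma query can be sketched in a few lines of Python. The token list below is invented for illustration, as are the names `parse_lemma_query` and `find_forms`; the sketch assumes each token carries its headword and simplified tag, and it is not meant to reflect BNCweb's internals.

```python
ALIASES = {"A": "ADJ", "N": "SUBST", "V": "VERB", "INT": "INTERJ", "$": "STOP"}

def parse_lemma_query(query):
    """Split '{light/V}' into ('light', 'VERB'); '{light}' gives ('light', None)."""
    if not (query.startswith("{") and query.endswith("}")):
        raise ValueError("headword and lemma queries are enclosed in curly braces")
    headword, sep, tag = query[1:-1].partition("/")
    return headword, (ALIASES.get(tag, tag) if sep else None)

def find_forms(query, tokens):
    """Return the word forms whose annotation satisfies the query."""
    headword, simple = parse_lemma_query(query)
    return [word for word, hw, cls in tokens
            if hw.lower() == headword.lower() and simple in (None, cls)]

# Toy annotated tokens: (word form, headword, simplified tag), made up here.
TOKENS = [
    ("lit", "light", "VERB"),
    ("lights", "light", "SUBST"),
    ("lighter", "light", "ADJ"),
    ("lighter", "lighter", "SUBST"),   # the noun has its own headword
    ("lighting", "light", "VERB"),
]
```

Here `find_forms("{light}", TOKENS)` returns every form with the headword light regardless of word class, while `find_forms("{light/V}", TOKENS)` keeps only the forms tagged as verbs.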
Word sequences
•Queries can consist of multiple words, e.g. talk of the town
•All words and punctuation symbols ("tokens") are separated by blanks; possessives (Peter's) and contracted forms (they've, gonna) must be split: he will \, wo n't he \? → he will, won't he?
•Each query item in a sequence can make full use of wildcards, part-of-speech constraints, and headword or lemma searches:
{number/N} of _{A} _NN2 → numbers of younger men, …
•Use + to skip an arbitrary token, or * for an optional token. Combine + and * for larger gaps, e.g. +++** to skip between 3 and 5 tokens.
{eat} * up → eat up, ate up, eat it up, eaten all up, …
{eat} + up → eat it up, eaten all up, … but not eat up, ate up
{eat} ++* up → up at a distance of 3 or 4 tokens after eat
Advanced lexico-grammatical patterns
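The gap notation used in word sequences (+ skips exactly one token, * skips at most one) comes down to simple counting. A minimal Python sketch follows, with the hypothetical helper names `gap_range` and `find_pairs`, and with plain word forms standing in for headword queries such as {eat}:

```python
def gap_range(gap):
    """Return (min, max) number of tokens skipped by a run of + and * symbols."""
    if not gap or set(gap) - {"+", "*"}:
        raise ValueError("a gap consists only of + and * symbols")
    required = gap.count("+")      # each + skips exactly one token
    optional = gap.count("*")      # each * skips at most one token
    return required, required + optional

def find_pairs(tokens, first, gap, last):
    """Yield (i, j) where tokens[i] == first, tokens[j] == last and the gap fits."""
    low, high = gap_range(gap)
    for i, tok in enumerate(tokens):
        if tok != first:
            continue
        for skipped in range(low, high + 1):
            j = i + skipped + 1    # position of the token after the gap
            if j < len(tokens) and tokens[j] == last:
                yield i, j
```

`gap_range("+++**")` gives `(3, 5)`, the "between 3 and 5 tokens" case, and `gap_range("++*")` gives `(2, 3)`, which places the second word 3 or 4 positions after the first.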
•Use regular expression notation (Hoffmann et al. 2008, Sections 6.8 and 12.4) for alternatives, optional elements and repetition within a sequence:
(_{A})?  optional adjective
(_{A})*  zero or more adjectives (optional)
(_{A})+  one or more adjectives (non-optional)
(_{A}){2,4}  between two and four adjectives
(…|…|…)  matches one of the alternatives indicated by …
(…|…|…)*  alternatives with repetition (optional)
(…|…|…)+  alternatives with repetition (non-optional)
(…|…|…){2,4}  between two and four repetitions of the given alternatives (may be mixed in any order)
•Regular expression notation can be nested to match complex patterns:
the (most _AJ0 | _AJS) {man}
→ the biggest men, the most attractive man, …
the (most (_AV0)? _AJ0 | (_AV0)? _AJS) {man}
→ plus: the very richest men, the most supremely stupid men, …
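The quantifiers and nesting described above behave like ordinary regular-expression operators applied to whole tokens rather than to characters. One way to see this is to write a tagged sentence as a string of simplified tags and rewrite the query as a character-level regex. This encoding is only an illustrative device, not BNCweb's actual evaluation strategy. The names `pattern_to_regex` and `find_spans` are hypothetical, and the sketch handles only items of the form _{TAG}.

```python
import re

def pattern_to_regex(pattern):
    """Rewrite each _{TAG} item as a group matching 'TAG ' in a tag string."""
    compact = pattern.replace(" ", "")               # blanks only separate items
    body = re.sub(r"_\{(\w+)\}", r"(?:\1 )", compact)
    return re.compile(r"(?<!\S)" + body)             # start at a tag boundary

def find_spans(pattern, tags):
    """Return (start, end) token index pairs (end exclusive) of all matches."""
    text = "".join(tag + " " for tag in tags)        # e.g. "ART A N "
    spans = []
    for m in pattern_to_regex(pattern).finditer(text):
        start = text[:m.start()].count(" ")          # tokens before the match
        end = text[:m.end()].count(" ")              # tokens up to its end
        spans.append((start, end))
    return spans
```

For the tag sequence PREP ART ADV A A N VERB, the pattern `(_{ART})? ((_{ADV})? _{A})+ _{N}` yields the span covering tokens 1 to 5, and `(_{A}){2,4} _{N}` yields the shorter span starting at the first adjective.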
•Complex syntactic patterns can be formed, e.g. for a prepositional phrase: _{PREP} (_{ART})? ((_{ADV})? _{A})* _{N}
"a preposition; followed by an optional article; followed by any number of adjectives (zero or more), each of which may optionally be preceded by an adverb; followed by a noun"
XML tags
•XML start and end tags can be inserted in a query expression to match the boundaries of a region, e.g. the start (<s>) or end (</s>) of an s-unit:
<s> but → s-unit beginning with but (or But)
_{ART} </s> → article at end of s-unit (mostly errors)
•To match a complete region, skip all tokens between the start and end tag:
<quote> (+)+ </quote> → list of all quotations in the BNC
<mw> (+)+ </mw> → list of all multiword units
•Some useful XML tags in the BNC:
<s> … </s>  s-unit
<p> … </p>  paragraph
<u> … </u>  speaker turn
<head> … </head>  heading or caption
<quote> … </quote>  quotation
<item> … </item>  list item
<hi> … </hi>  highlighted text
<mw> … </mw>  multiword unit
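Matching a complete region, as in <quote> (+)+ </quote>, amounts to collecting every token between a start tag and the corresponding end tag. The Python sketch below works on a flat list in which tags and tokens are mixed. That representation, the choice to ignore other tags inside a region, and the names `is_tag` and `regions` are all assumptions made for illustration.

```python
def is_tag(item):
    """True for stream items that look like XML tags, e.g. '<s>' or '</s>'."""
    return len(item) > 2 and item.startswith("<") and item.endswith(">")

def regions(stream, name):
    """Return the token lists enclosed by <name> ... </name> in a flat stream."""
    start_tag, end_tag = f"<{name}>", f"</{name}>"
    found, current = [], None
    for item in stream:
        if item == start_tag:
            current = []                 # open a new region
        elif item == end_tag:
            if current:                  # (+)+ needs at least one token
                found.append(current)
            current = None               # close the region
        elif current is not None and not is_tag(item):
            current.append(item)         # ordinary token inside the region
    return found
```

Applied to the stream `<s> He said <quote> never again </quote> . </s>`, `regions(stream, "quote")` returns the single region `["never", "again"]`.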
Proximity queries
•Special syntax for searching one item within a specified range of another:
kick <<s>> bucket → kick and bucket in the same sentence
{kick/V} <<s>> bucket_NN1 (can use POS/lemma constraints)
day <<3>> night → day and night within a range of 3 tokens
day <<5<< night → night … day (within 5 tokens)
day >>5>> night → day … night (within 5 tokens)
•Only the left element ("target") will be highlighted on the result page. The right element is considered a "constraint" that must be satisfied.
•Multiple constraints can be chained:
{day} <<5>> {month} <<5>> {year}
In this case, day must co-occur with month as well as year in a 5-token window; only day will be highlighted on the Query result page.
•Proximity queries can be nested with parentheses:
{waste/V} <<s>> (time <<3>> money)
Here, the verb waste must co-occur with time as well as money in the same sentence; but time and money must be closer together (within a 3-token window). Again, only instances of waste will be highlighted.
•Proximity queries cannot be combined with lexico-grammatical patterns!
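The numeric proximity operators can be approximated with a short Python sketch. It assumes that "within n tokens" means a positional distance of at most n, uses plain word forms instead of headword or POS constraints, and leaves out the sentence window <<s>>. The function name `proximity` and its `direction` parameter are invented for this example.

```python
def proximity(tokens, target, constraint, window, direction="both"):
    """Return indices of `target` that have `constraint` within `window` tokens.

    direction: "both" for <<n>>, "left" for <<n<< (constraint precedes the
    target), "right" for >>n>> (constraint follows the target).
    """
    hits = []
    for i, tok in enumerate(tokens):
        if tok != target:
            continue
        first = i - window if direction in ("both", "left") else i + 1
        last = i + window if direction in ("both", "right") else i - 1
        for j in range(max(first, 0), min(last, len(tokens) - 1) + 1):
            if j != i and tokens[j] == constraint:
                hits.append(i)           # only the target position is reported
                break
    return hits
```

Only target positions are returned, mirroring the rule that just the left element is highlighted. A chained query such as day <<5>> month <<5>> year could be modelled by intersecting the results of two calls, one per constraint.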
List of part-of-speech tags in the BNC (CLAWS-5 tagset)
Tag  Description
AJ0  Adjective (general or positive) (e.g. good, old, beautiful)
AJC  Comparative adjective (e.g. better, older)
AJS  Superlative adjective (e.g. best, oldest)
AT0  Article (e.g. the, a, an, no)
AV0  General adverb: an adverb not subclassified as AVP or AVQ (see below) (e.g. often, well, longer (adv.), furthest)
AVP  Adverb particle (e.g. up, off, out)
AVQ  Wh-adverb (e.g. when, where, how, why, wherever)
CJC  Coordinating conjunction (e.g. and, or, but)
CJS  Subordinating conjunction (e.g. although, when)
CJT  The subordinating conjunction that
CRD  Cardinal number (e.g. one, 3, fifty-five, 3609)
DPS  Possessive determiner-pronoun (e.g. your, their, his)
DT0  General determiner-pronoun: i.e. a determiner-pronoun which is not a DTQ or an AT0.
DTQ  Wh-determiner-pronoun (e.g. which, what, who, whichever)
EX0  Existential there, i.e. there occurring in the there is … or there are … construction
ITJ  Interjection or other isolate (e.g. oh, yes, mhm, wow)
NN0  Common noun, neutral for number (e.g. aircraft, data, committee)
NN1  Singular common noun (e.g. pencil, goose, time, revelation)
NN2  Plural common noun (e.g. pencils, geese, times, revelations)
NP0  Proper noun (e.g. London, Michael, Mars, IBM)
ORD  Ordinal numeral (e.g. first, sixth, 77th, last)
PNI  Indefinite pronoun (e.g. none, everything, one (as pronoun), nobody)
PNP  Personal pronoun (e.g. I, you, them, ours)
PNQ  Wh-pronoun (e.g. who, whoever, whom)
PNX  Reflexive pronoun (e.g. myself, yourself, itself, ourselves)
POS  The possessive or genitive marker 's or '
PRF  The preposition of
PRP  Preposition (except of) (e.g. about, at, in, on, with)
PUL  Punctuation: left bracket, i.e. ( or [
PUN  Punctuation: general separating mark (. , ! : ; – and ?)
PUQ  Punctuation: quotation mark (' and ")
PUR  Punctuation: right bracket, i.e. ) or ]
TO0  Infinitive marker to
UNC  Unclassified items which are not appropriately considered as items of the English lexicon.
VBB  The present tense forms of the verb BE, except for is, 's: i.e. am, are, 'm, 're and be (subjunctive or imperative)
VBD  The past tense forms of the verb BE: was and were
VBG  The -ing form of the verb BE: being
VBI  The infinitive form of the verb BE: be
VBN  The past participle form of the verb BE: been
VBZ  The -s form of the verb BE: is, 's
VDB  The finite base form of the verb DO: do
VDD  The past tense form of the verb DO: did
VDG  The -ing form of the verb DO: doing
VDI  The infinitive form of the verb DO: do
VDN  The past participle form of the verb DO: done
VDZ  The -s form of the verb DO: does, 's
VHB  The finite base form of the verb HAVE: have, 've
VHD  The past tense form of the verb HAVE: had, 'd
VHG  The -ing form of the verb HAVE: having
VHI  The infinitive form of the verb HAVE: have
VHN  The past participle form of the verb HAVE: had
VHZ  The -s form of the verb HAVE: has, 's
VM0  Modal auxiliary verb (e.g. will, would, can, could, 'll, 'd)
VVB  The finite base form of lexical verbs, comprising the indicative, imperative and present subjunctive (e.g. forget, send, live, return)
VVD  The past tense form of lexical verbs (e.g. forgot, sent, lived, returned)
VVG  The -ing form of lexical verbs (e.g. forgetting, sending, living, returning)
VVI  The infinitive form of lexical verbs (e.g. forget, send, live, return)
VVN  The past participle form of lexical verbs (e.g. forgotten, sent, lived, returned)
VVZ  The -s form of lexical verbs (e.g. forgets, sends, lives, returns)
XX0  The negative particle not or n't
ZZ0  Alphabetical symbols (e.g. A, a, B, b, c, d)
References
Hoffmann, Sebastian; Evert, Stefan; Smith, Nicholas; Lee, David; Berglund Prytz, Ylva (2008). Corpus Linguistics with BNCweb – a Practical Guide. Volume 6 of English Corpus Linguistics. Peter Lang, Frankfurt am Main.
