Unicode Nearly Plain-Text Encoding of Mathematics
Version 3
Murray Sargent III
Publisher Text Services, Microsoft Corporation
10-Mar-10
1. Introduction
2. Encoding Simple Math Expressions
2.1 Fractions
2.2 Subscripts and Superscripts
2.3 Use of the Blank (Space) Character
3. Encoding Other Math Expressions
3.1 Delimiters
3.2 Literal Operators
3.3 Prescripts and Above/Below Scripts
3.4 n-ary Operators
3.5 Mathematical Functions
3.6 Square Roots and Radicals
3.7 Enclosures
3.8 Stretchy Characters
3.9 Matrices
3.10 Accent Operators
3.11 Differential, Exponential, and Imaginary Symbols
3.12 Unicode Subscripts and Superscripts
3.13 Concatenation Operators
3.14 Comma, Period, and Colon
3.15 Ordinary Text Inside Math Zones
3.16 Space Characters
3.17 Phantoms and Smashes
3.18 Arbitrary Groupings
3.19 Equation Arrays
3.20 Math Zones
3.21 Equation Numbers
3.22 Linear Format Characters and Operands
3.23 Equation Breaking and Alignment
3.24 Size Overrides
4. Input Methods
4.1 Character Translations
4.2 Math Keyboards
4.3 Hexadecimal Input
4.4 Pull-Down Menus, Toolbars, Context Menus
4.5 Macros
4.6 Linear Format Math Autocorrect List
4.7 Handwritten Input
5. Recognizing Mathematical Expressions
6. Using the Linear Format in Programming Languages
6.1 Advantages of Linear Format in Programs
6.2 Comparison of Programming Notations
6.3 Export to TeX
7. Conclusions
Acknowledgements
Appendix A. Linear Format Grammar
Appendix B. Character Keywords and Properties
Version Differences
References
1. Introduction
Getting computers to understand human languages is important in increasing the utility of computers. Natural-language translation, speech recognition and generation, and programming are typical ways in which such machine comprehension plays a role. The better this comprehension, the more useful the computer, and hence there has been considerable effort devoted to these areas since the early 1960s. Ironically, one truly international human language that tends to be neglected in this connection is mathematics itself.
With a few conventions, Unicode [1] can encode many mathematical expressions in readable nearly plain text. Technically this format is a “lightly marked up format”; hence the use of “nearly”. The format is linear, but it can be displayed in built-up presentation form. To distinguish the two kinds of formats in this paper, we refer to the nearly plain-text format as the linear format and to the built-up presentation format as the built-up format. This linear format can be used with heuristics based on the Unicode math properties to recognize mathematical expressions without the aid of explicit math-on/off commands. The recognition is facilitated by Unicode’s strong support for mathematical symbols [2]. Alternatively, the linear format can be used in “math zones” explicitly controlled by the user either with on-off characters as used in TeX or with a character format attribute in a rich-text environment. Use of math zones is desirable, since the recognition heuristics are not infallible.
The linear format is more compact and easier to read than [La]TeX [3,4] or MathML [5]. However, unlike those formats, it doesn’t attempt to include all typographical embellishments. Instead we feel it’s useful to handle some embellishments in the higher-level layer that handles rich text properties like text and background colors, font size, footnotes, comments, hyperlinks, etc. In principle one can extend the notation to include the properties of the higher-level layer, but at the cost of reduced readability. Hence embedded in a rich-text environment, the linear format can faithfully represent rich mathematical text, whereas embedded in a plain-text environment it lacks most rich-text properties and some mathematical typographical properties. The linear format is primarily concerned with presentation, but it has some semantic features that might seem to be only content oriented, e.g., n-aryands and function-apply arguments (see Secs. 3.4 and 3.5). These have been included to aid in displaying built-up functions with proper typography, but they also help to interoperate with math-oriented programs.
Most mathematical expressions can be represented unambiguously in the linear format, from which they can be exported to [La]TeX, MathML, C++, and symbolic manipulation programs. The linear format borrows notation from TeX for mathematical objects that don’t lend themselves well to a mathematical linear notation, e.g., for matrices.
A variety of syntax choices can be used for a linear format. The choices made in this paper favor a number of criteria: efficient input of mathematical formulae, sufficient generality to support high-quality mathematical typography, the ability to round trip elegant mathematical text at least in a rich-text environment, and a format that resembles a real mathematical notation. Obviously compromises between these goals had to be made.
The linear format is useful for 1) inputting mathematical expressions [6], 2) displaying mathematics by text engines that cannot display a built-up format, and 3) computer programs. For more general storage and interchange of math expressions between math-aware programs, MathML and other higher-level languages are preferred.
Section 2 motivates and illustrates the linear format for math using the fraction, subscripts, and superscripts along with a discussion of how the ASCII space U+0020 is used to build up one construct at a time. Section 3 summarizes the usage of the other constructs along with their relative precedences, which are used to simplify the notation. Section 4 discusses input methods. Section 5 gives ways to recognize mathematical expressions embedded in ordinary text. Section 6 explains how Unicode plain text can be helpful in programming languages. Section 7 gives conclusions. The appendices present a simplified linear-format grammar and a partial list of operators.
2. Encoding Simple Math Expressions
Given Unicode’s strong support for mathematics [2] relative to ASCII, how much better can a plain-text encoding of mathematical expressions look using Unicode? The most well-known ASCII encoding of such expressions is that of TeX, so we use it for comparison. MathML is more verbose than TeX and some of the comparisons apply to it as well. Notwithstanding TeX’s phenomenal success in the science and engineering communities, a casual glance at its representations of mathematical expressions reveals that they do not look very much like the expressions they represent. It’s not easy to make algebraic calculations by hand directly using TeX’s notation. With Unicode, one can represent mathematical expressions more readably, and the resulting nearly plain text can often be used with few or no modifications for such calculations. This capability is considerably enhanced by using the linear format in a system that can also display and edit the mathematics in built-up form.
The present section introduces the linear format with fractions, subscripts, and superscripts. It concludes with a subsection on how the ASCII space character U+0020 is used to build up one construct at a time. This is a key idea that makes the linear format ideal for inputting mathematical formulae. In general where syntax and semantic choices were made, input convenience was given high priority.
2.1 Fractions
One way to specify a fraction linearly is LaTeX’s \frac{numerator}{denominator}. The { } are not printed when the fraction is built up. These simple rules immediately give a “plain text” that is unambiguous, but looks quite different from the corresponding mathematical notation, thereby making it harder to read.
Instead we define a simple operand to consist of all consecutive letters and decimal digits, i.e., a span of alphanumeric characters, those belonging to the Lx and Nd General Categories (see The Unicode Standard 5.0 [1], Table 4-2. General Category). As such, a simple numerator or denominator is terminated by most nonalphanumeric characters, including, for example, arithmetic operators, the blank (U+0020), and Unicode characters in the ranges U+2200..U+23FF, U+2500..U+27FF, and U+2900..U+2AFF. The fraction operator is given by the usual solidus / (U+002F). So the simple built-up fraction
\[ \frac{abc}{d}. \]
appears in linear format as abc/d. To force a display of a normal-size linear fraction, one can use \/ (backslash followed by slash).
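The simple-operand rule above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `is_operand_char`, `scan_operand`, and `split_simple_fraction` are hypothetical and not part of the linear format, and the sketch relies on the general categories reported by Python's `unicodedata` module, whose Unicode version may differ from 5.0.

```python
import unicodedata


def is_operand_char(ch: str) -> bool:
    """True for letters (general categories starting with L) and decimal digits (Nd)."""
    category = unicodedata.category(ch)
    return category.startswith("L") or category == "Nd"


def scan_operand(text: str, start: int = 0) -> str:
    """Return the run of operand characters beginning at index `start`."""
    end = start
    while end < len(text) and is_operand_char(text[end]):
        end += 1
    return text[start:end]


def split_simple_fraction(text: str):
    """Return (numerator, denominator) when text is operand/operand, else None."""
    numerator, slash, denominator = text.partition("/")
    if not slash or not numerator or not denominator:
        return None
    if all(map(is_operand_char, numerator)) and all(map(is_operand_char, denominator)):
        return numerator, denominator
    return None
```

With these helpers, `split_simple_fraction("abc/d")` yields the pair of operands, while an input such as `(a + c)/d` is rejected because the parentheses, blanks, and plus sign all terminate a simple operand.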
For more complicated operands (such as those that include operators), parentheses (), brackets [], or braces {} can be used to enclose the desired character combinations. If parentheses are used and the outermost parentheses are preceded and followed by operators, those parentheses are not displayed in built-up form, since usually one does not want to see such parentheses. So the plain text (a + c)/d displays as
\[ \frac{a+c}{d}. \]
In practice, this approach leads to plain text that is easier to read than LaTeX’s, e.g., \frac{a + c}{d}, since in many cases, parentheses are not needed, while TeX requires {}’s. To force the display of the outermost parentheses, one encloses them, in turn, within parentheses, which then become the outermost parentheses. For example, ((a + c))/d displays as
\[ \frac{(a+c)}{d}. \]
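The parenthesis rule can be illustrated with a small Python sketch that turns a single linear-format fraction into LaTeX. It is a sketch under narrow assumptions, not a linear-format parser: it assumes the whole input is one fraction with a single top-level solidus, it ignores operator precedence, and the names `strip_outer_parens` and `fraction_to_latex` are hypothetical.

```python
def strip_outer_parens(operand: str) -> str:
    """Remove one pair of parentheses if that pair encloses the whole operand."""
    s = operand.strip()
    if len(s) < 2 or s[0] != "(" or s[-1] != ")":
        return s
    depth = 0
    for i, ch in enumerate(s):
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth == 0 and i != len(s) - 1:
                return s  # the first "(" closes early, as in "(a)(b)"
    return s[1:-1].strip()


def fraction_to_latex(linear: str) -> str:
    """Convert 'numerator/denominator' to \\frac{...}{...}, else return the input."""
    depth = 0
    for i, ch in enumerate(linear):
        if ch in "([{":
            depth += 1
        elif ch in ")]}":
            depth -= 1
        elif ch == "/" and depth == 0:
            numerator = strip_outer_parens(linear[:i])
            denominator = strip_outer_parens(linear[i + 1:])
            return "\\frac{" + numerator + "}{" + denominator + "}"
    return linear
```

Only one layer of parentheses is stripped per operand, so the doubled form ((a + c))/d keeps its inner pair, matching the behavior described above.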
A really neat feature of this notation is that the plain text is, in fact, often a legitimate mathematical notation in its own right, so it is relatively easy to read. Contrast this with the MathML version, which (with no parentheses) reads as
<mfrac>
  <mrow>
    <mi>a</mi>
    <mo>+</mo>
    <mi>c</mi>
  </mrow>
  <mi>d</mi>
</mfrac>
Three built-up fraction variations are available: the “fraction slash” U+2044 (which one might input by typing \sdiv) builds up to a skewed fraction, the “division slash” U+2215 (\ldiv) builds up to a potentially large linear fraction, and the circled slash ⊘ (U+2298, \ndiv) builds up a small numeric fraction (although characters other than digits can be used as well). These three kinds of built-up fractions are illustrated by
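\[ \left.\frac{a}{b+c}\middle/\frac{d}{e+f}\right., \qquad \tfrac{\frac{a}{b+c}}{\frac{d}{e+f}}, \qquad \left(\frac{a}{b+c}\right)/\left(\frac{d}{e+f}\right) \]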
When building up the large linear fraction, the outermost parentheses should not be removed.
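The fraction operators described so far can be collected in a small lookup table. The Python sketch below is illustrative only: the code points and control words come from the text, while the descriptive labels and the names `FRACTION_OPERATORS`, `CONTROL_WORDS`, `expand_control_words`, and `fraction_kind` are assumptions made for this example.

```python
# Fraction operators named in the text, keyed by character.
# The descriptive labels are this sketch's own wording.
FRACTION_OPERATORS = {
    "\u002F": "stacked fraction",        # solidus /
    "\u2044": "skewed fraction",         # fraction slash
    "\u2215": "large linear fraction",   # division slash
    "\u2298": "small numeric fraction",  # circled slash
}

# Control words the text gives for typing the three variant operators.
CONTROL_WORDS = {"\\sdiv": "\u2044", "\\ldiv": "\u2215", "\\ndiv": "\u2298"}


def expand_control_words(text: str) -> str:
    """Replace each control word with the operator character it stands for."""
    for word, char in CONTROL_WORDS.items():
        text = text.replace(word, char)
    return text


def fraction_kind(linear: str):
    """Return the label of the first fraction operator found, or None."""
    for ch in linear:
        if ch in FRACTION_OPERATORS:
            return FRACTION_OPERATORS[ch]
    return None
```

For example, `fraction_kind(expand_control_words("a\\ndiv b"))` reports the small numeric fraction, while plain `abc/d` maps to the ordinary stacked form.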
The same notational syntax is used for a “stack”, which is like a fraction with no fraction bar. The stack is used to create binomial coefficients and the stack operator is ‘¦’ (\atop). For example, the binomial theorem
\[ (a+b)^n = \sum_{k=0}^{n} \binom{n}{k} a^k b^{n-k} \]
in linear format reads as (see Sec. 3.4 for a discussion of the n-aryand “glue” operator ▒)
(a + b)^n = ∑_(k=0)^n▒ (n ¦ k) a^k b^(n-k),
where (n ¦ k) is the binomial coefficient for the combinations of n items grouped k at a time. The summation limits use the subscript/superscript notation discussed in the next subsection.
Since binomial coefficients are quite common, TeX has the \choose control word for them. In the linear format Version 3, this uses the \choose operator ⒞ instead of the \atop operator ¦. Accordingly the binomial coefficient in the binomial theorem above can be written as “n\choose k”, assuming that you type a space after the k. This shortcut is included primarily for compatibility with TeX, since (n¦k) is pretty easy to type.
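As a rough illustration of how the parenthesized stack notation relates to TeX, the following Python sketch rewrites stacks such as (n ¦ k) as \binom{n}{k}. It is a minimal sketch with assumptions: it only handles stacks whose two operands are simple alphanumeric runs, it targets the \binom command of the amsmath package rather than \choose, and the names `STACK` and `stacks_to_binom` are hypothetical.

```python
import re

# A parenthesized stack with two simple operands, e.g. "(n ¦ k)".
# "¦" is U+00A6, the stack (\atop) operator described in the text.
STACK = re.compile(r"\(\s*(\w+)\s*¦\s*(\w+)\s*\)")


def stacks_to_binom(linear: str) -> str:
    """Rewrite each simple parenthesized stack as a LaTeX \\binom{..}{..}."""
    return STACK.sub(
        lambda m: "\\binom{" + m.group(1) + "}{" + m.group(2) + "}", linear
    )
```

Text outside a stack is left untouched, so the rest of an expression still needs its own conversion to TeX.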
When / is followed by an operator, it’s highly unlikely that a fraction is intended. This fact leads to a simple way to enter negated operators like ≠, namely, just
