Electronic marking and identification techniques to discourage document copying


Jack T. Brassil, Senior Member, IEEE, Steven Low, Member, IEEE, Nicholas F. Maxemchuk, Fellow, IEEE, and Lawrence O’Gorman, Senior Member, IEEE

Abstract: Modern computer networks make it possible to distribute documents quickly and economically by electronic means rather than by conventional paper means. However, the widespread adoption of electronic distribution of copyrighted material is currently impeded by the ease of unauthorized copying and dissemination. In this paper we propose techniques that discourage unauthorized distribution by embedding each document with a unique codeword. Our encoding techniques are indiscernible by readers, yet enable us to identify the sanctioned recipient of a document by examination of a recovered document. We propose three coding methods, describe one in detail, and present experimental results showing that our identification techniques are highly reliable, even after documents have been photocopied.

I. INTRODUCTION

Electronic distribution of publications is increasingly available through on-line text databases, CD-ROMs, computer network based retrieval services, and electronic libraries [1]–[6]. One electronic library, the RightPages¹ Service [7]–[9], has been in place within Bell Laboratories since 1991, and has recently been installed at the University of California in San Francisco. Electronic publishing is being driven by the decreasing cost of computer processing and high quality printers and displays. Furthermore, the increased availability of low cost, high speed data communications makes it possible to distribute electronic documents to large groups quickly and inexpensively [10].

While photocopy infringements of copyright have always concerned publishers, the need for document security is much greater for electronic document distribution [11], [12]. The same advances that make electronic publishing and distribution of documents feasible also increase the threat of “bootlegged” copies. With far less effort than it takes to copy a paper document and mail it to a single person, an electronic document can be sent to a large group by electronic mail. In addition, while originals and photocopies of a paper document can look and feel different, copies of electronic documents are identical.

In order for electronic publishing to become accepted, publishers must be assured that revenues will not be lost due to theft of copyrighted materials. Widespread unauthorized document dissemination should ideally be at least as costly or difficult as obtaining the documents legitimately. Here we define “unauthorized dissemination” as distribution of documents without the knowledge of, and payment to, the publisher; this contrasts with legitimate document distribution by the publisher or the publisher’s electronic document distributor.

This paper describes a means of discouraging unauthorized copying and dissemination. A document is marked in an indiscernible way by a codeword identifying the registered owner to whom the document is sent [13]. If a document copy is found that is suspected to have been disseminated without authorization, that copy can be decoded and the registered owner identified.

Manuscript received August 8, 1994; revised March 1, 1995. A preliminary version of this paper was presented at IEEE INFOCOM ’94.

The authors are with AT&T Bell Laboratories, Murray Hill, NJ 07974 USA.

IEEE Log Number 9413489.

¹ RightPages is a trademark of AT&T.

The techniques we describe here are complementary to the security practices that can be applied to the legitimate distribution of documents. For example, a document can be encrypted prior to transmission across a computer network [14], [15]. Then even if the document file is intercepted or stolen from a database, it remains unreadable to those not possessing the decrypting key. The techniques we describe in this paper provide security after a document has been decrypted, and is thus readable to all.

In addition to discouraging unauthorized dissemination of documents distributed by computer network, our proposed encoding techniques can also make paper copies of documents traceable. In particular, the codeword embedded in each document survives plain paper copying. Hence, our techniques can also be applied to “closely held” documents, such as confidential, limited distribution correspondence. We describe this both as a potential application of the methods and an illustration of their robustness in noise.

II. DOCUMENT CODING METHODS

Document marking can be achieved by altering the text formatting, or by altering certain characteristics of textual elements (e.g., characters). The goal in the design of coding methods is to develop alterations that are reliably decodable (even in the presence of noise) yet largely indiscernible to the reader. These criteria, reliable decoding and minimum visible change, are somewhat conflicting; herein lies the challenge in designing document marking techniques.

The marking techniques we describe can be applied to either an image representation of the document or to a document format file. The document format file is a computer file describing the document content and page layout (or formatting), using standard format description languages such as PostScript,² TeX, troff, etc. It is from this format file that the image (what the reader sees) is generated. The image representation describes each page (or subpage) of a document as an array of pixels. The image may be bitmap (also called binary or black-and-white), gray-scale, or color. For this work, we describe both document format file and image coding techniques; however, we restrict the latter to bitmaps encoded within the binary-valued text regions.

Fig. 1. Example of line-shift coding. The second line has been shifted up by 1/300 inch.

Fig. 2. Example of word-shift coding. In (a), the top text line has added spacing before the “for,” the bottom text line has the same spacing after the “for.” In (b), the same text lines are shown again without the vertical lines to demonstrate that either spacing appears natural.

Common to each technique is that a codeword is embedded in the document by altering particular textual features. For instance, consider the codeword 1101 (binary). Reading this code right to left from the least significant bit, the first document feature is altered for bit 1, the second feature is not altered for bit 0, and the next two features are altered for the two 1 bits. It is the type of feature that distinguishes each particular encoding method. We describe the features for each method below and give a simple comparison of the relative advantages and disadvantages of each technique. The three coding techniques that we propose illustrate different approaches rather than form an exhaustive list of document marking techniques. The techniques can be used either separately or jointly. Each technique enjoys certain advantages or applicability as we discuss below.
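The bit-to-feature mapping in the 1101 example can be sketched in a few lines of Python. This is only an illustration: the function name `alteration_flags` and the choice to represent a codeword as an integer are assumptions, not part of the methods described here.

```python
def alteration_flags(codeword: int, n_features: int) -> list[bool]:
    """Return one flag per document feature, least significant bit first.

    flags[k] is True when feature k should be altered (bit k of the
    codeword is 1) and False when it should be left unchanged.
    """
    if codeword < 0 or codeword >= 1 << n_features:
        raise ValueError("codeword does not fit in n_features bits")
    return [bool((codeword >> k) & 1) for k in range(n_features)]


# Codeword 1101 (binary): alter feature 1, skip feature 2, alter features 3 and 4.
flags = alteration_flags(0b1101, 4)
print(flags)  # [True, False, True, True]
```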

² PostScript is a trademark of Adobe Systems, Inc.

A. Line-Shift Coding

This is a method of altering a document by vertically shifting the locations of text lines to encode the document uniquely. This encoding may be applied either to the format file or to the bitmap of a page image. The embedded codeword may be extracted from the format file or bitmap. In certain cases this decoding can be accomplished without need of the original image, since the original is known to have uniform line spacing (i.e., “leading”) between adjacent lines within a paragraph.

B. Word-Shift Coding

This is a method of altering a document by horizontally shifting the locations of words within text lines to encode the document uniquely. This encoding can be applied to either the format file or to the bitmap of a page image. Decoding may be performed from the format file or bitmap. The method is least visible when applied to documents with variable spacing between adjacent words. Variable spacing in text documents is commonly used to distribute white space when justifying text. Because of this variable spacing, decoding requires the original image or, more specifically, the spacing between words in the unencoded document. See Fig. 2 for an example of word-shift coding.

Fig. 3. Example shows feature coding performed on a portion of text from a journal table of contents. In (a), no coding has been applied. In (b), feature coding has been applied to select characters. In (c), the feature coding has been exaggerated to show feature alterations.

Consider the following example of how a document might be encoded with word-shifting. For each text line, the largest and smallest spacings between words are found. To code a line, the largest spacing is decremented by some amount and the smallest is augmented by the same amount. This maintains the text line length, and produces little qualitative change to the text image.
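The word-shift example can be written as a short Python sketch. The function name `word_shift_line`, the list-of-gaps representation, and the handling of lines whose gaps are all equal are assumptions made for illustration; how a codeword bit selects which lines to code is not specified here and is left out.

```python
def word_shift_line(spacings: list[float], delta: float) -> list[float]:
    """Code one text line: shrink the largest inter-word gap by delta
    and widen the smallest gap by the same amount.

    spacings holds the gaps between adjacent words, left to right.
    The sum of the gaps, and hence the line length, is unchanged.
    """
    if len(spacings) < 2:
        return list(spacings)
    largest = max(range(len(spacings)), key=spacings.__getitem__)
    smallest = min(range(len(spacings)), key=spacings.__getitem__)
    if largest == smallest:
        # All gaps are equal, so there is no distinct pair to adjust.
        return list(spacings)
    coded = list(spacings)
    coded[largest] -= delta
    coded[smallest] += delta
    return coded


original = [4.0, 6.0, 5.0]
coded = word_shift_line(original, 0.5)
print(coded)  # [4.5, 5.5, 5.0]
```

Since the two adjustments cancel, a justified line stays justified, which is the property the text relies on.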

C. Feature Coding

This is a coding method that is applied either to a format file or to a bitmap image of a document. The image is examined for chosen text features, and those features are altered, or not altered, depending on the codeword. Decoding requires the original image, or more specifically, a specification of the change in pixels at a feature. There are many possible choices of text features; here, we choose to alter upward, vertical endlines, that is, the tops of letters b, d, h, etc. The endlines are altered by extending or shortening their lengths by one (or more) pixels, but otherwise not changing the endline feature. See Fig. 3 for an example of feature coding.

Among the proposed encoding techniques, line-shifting is likely to be the most easily discernible by readers. However, we also expect line-shifting to be the most robust type of encoding in the presence of noise. This is because the long lengths of text lines provide a relatively easily detectable feature. For this reason, line shifting is particularly well suited to marking documents to be distributed in paper form, where noise can be introduced in printing and photocopying. As we will show in Section III, our experiments indicate that we can easily encode documents with line shifts that are sufficiently small that they are not noticed by the casual reader, while still retaining the ability to decode reliably.

We expect that word-shifting will be less discernible to the reader than line-shifting, since the spacing between adjacent words on a line is often varied to support text justification. Feature encoding can accommodate a particularly large number of sanctioned document recipients, since there are frequently two or more features available for encoding in each word. Feature alterations are also largely indiscernible to readers. Feature encoding also has the additional advantage that it can be applied simply to image files, which allows encoding to be introduced in the absence of a format file.

Implementing any of the three document marking techniques described above incurs certain “costs” for the electronic document distributor. While the exact nature of the costs is implementation dependent, we can nonetheless make several general remarks based on our experience [16]. Distributors must incur a small penalty in maintaining a library of “codebooks” which contain a mapping of embedded codewords and recipients for each original (unmarked) document they mark and distribute. A larger penalty is paid in distributing images rather than higher level page descriptions: roughly 3–5 times the number of bits must be transmitted to the subscriber.³

A technically sophisticated “attacker” can detect that a document has been encoded by any of the three techniques we have introduced. Such an attacker can also attempt to remove the encoding (e.g., produce an unencoded document copy). Our goal in the design of encoding techniques is to make successful attacks extremely difficult or costly. We will return to a discussion of the difficulty of defeating each of our encoding techniques in Section IV.

III. IMPLEMENTATION AND EXPERIMENTAL RESULTS FOR LINE-SHIFT CODING METHOD

In this section we describe in detail the methods for coding and decoding we used for testing the line-shift coding method. Each intended document recipient was preassigned a unique codeword. Each codeword specified a set of text lines to be moved in the document specifically for that recipient. The length of each codeword equaled the maximum number of lines that were displaced in the area to be encoded. In our line-shift encoder, each codeword element belonged to the alphabet {[...]

IEEE JOURNAL ON SELECTED AREAS IN COMMUNICATIONS, VOL. 13, NO. 8, OCTOBER 1995

Fig. 4. Profile of a recovered document page. Decoding a page with line shifting requires measuring the distances between adjacent text line centroids or baselines (the latter marked with +) and deciding whether white space has been added or subtracted.

[...] performance was greatly improved by constraining the set of lines moved. In the results presented in this paper, we used a differential (or difference) encoding technique. With this coding we kept every other line of text in each paragraph unmoved, starting with the first line of each paragraph. Each line between two unmoved lines was always moved either up or down. That is, for each paragraph, the 1st, 3rd, 5th, etc. lines were unmoved, while the 2nd, 4th, etc. lines were moved. This encoding was partially motivated by image defects we will discuss later in this section. Note that the consequence of using differential encoding is that the length of each codeword is cut approximately in half. While this reduces the potential number of recipients for an encoded document, the number can still be extremely large. In each of our experiments we displaced at least 19 lines, which corresponds to a potential of at least

[...] Then the text line centroid is given [...] and is either shifted up or down. In the unaltered document, the distances between adjacent baselines, or baseline spacings, are the same. Let [...] and [...] be the distances between baselines [...], and between baselines [...], respectively, in the altered document. Then the baseline detection decision rule is:

[...] (3.2)
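A minimal Python sketch of differential line-shift coding with baseline detection follows. The function names, the bit convention (1 for a downward shift, 0 for an upward shift), and the exact form of the comparison are assumptions made for illustration. The sketch relies only on the stated facts that unmoved lines alternate with moved lines and that baseline spacings are uniform before coding.

```python
def encode_baselines(baselines: list[float], bits: list[int], shift: float) -> list[float]:
    """Shift every second line (index 1, 3, ...) down for bit 1, up for bit 0.

    Baselines are y-coordinates measured downward from the top of the page.
    Lines at even indices stay unmoved, so each moved line sits between
    two unmoved neighbours.
    """
    moved = list(baselines)
    for k, bit in enumerate(bits):
        i = 2 * k + 1
        if i + 1 >= len(baselines):
            raise ValueError("each moved line needs an unmoved line below it")
        moved[i] += shift if bit else -shift
    return moved


def decode_baselines(baselines: list[float]) -> list[int]:
    """Assumed rule: compare the spacing above and below each candidate line.

    A larger spacing above than below is read as a downward shift (bit 1);
    anything else is read as an upward shift (bit 0).
    """
    bits = []
    for i in range(1, len(baselines) - 1, 2):
        above = baselines[i] - baselines[i - 1]
        below = baselines[i + 1] - baselines[i]
        bits.append(1 if above > below else 0)
    return bits


uniform = [12.0 * i for i in range(5)]          # five lines, uniform spacing
coded = encode_baselines(uniform, [1, 0], 1.0)  # [0.0, 13.0, 24.0, 35.0, 48.0]
print(decode_baselines(coded))                  # [1, 0]
```

Because this decoder only compares the two spacings around each moved line, multiplying every coordinate by the same positive factor leaves the decoded bits unchanged.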

Unlike baseline spacings, centroid spacings between adjacent text lines in the original unaltered document are not necessarily uniform. In centroid-based detection, the decision is based on the difference of centroid spacings in the altered and unaltered documents. More specifically, let [...] and [...] be the centroid spacings between lines [...], and between lines [...], respectively, in the altered document; let [...] and [...] be the corresponding centroid spacings in the unaltered document. Then the centroid detection decision rule is:

[...]

[...] every other line is moved and this information is known to the decoder, false alarms do not occur.
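Centroid-based detection can be sketched in Python as follows. The profile is taken here to be the count of black pixels in each scan row, and the centroid to be the profile-weighted mean row of a text line; both are assumptions, as are the function names and argument names. The decision compares the change in centroid spacing above the moved line with the change below it, which is one natural reading of "the difference of centroid spacings in the altered and unaltered documents".

```python
def line_centroid(profile: list[int], top: int, bottom: int) -> float:
    """Profile-weighted mean row of the text line spanning rows top..bottom."""
    rows = range(top, bottom + 1)
    mass = sum(profile[y] for y in rows)
    if mass == 0:
        raise ValueError("no black pixels in the given rows")
    return sum(y * profile[y] for y in rows) / mass


def centroid_decision(s_above: float, s_below: float,
                      t_above: float, t_below: float) -> str:
    """Decide the shift direction of a line known to have been moved.

    s_* are centroid spacings measured on the recovered (altered) page,
    t_* the corresponding spacings of the unaltered page. The line is
    always moved, so a two-way decision is enough.
    """
    return "down" if (s_above - t_above) > (s_below - t_below) else "up"


print(line_centroid([0, 2, 4, 2, 0], 1, 3))       # 2.0
print(centroid_decision(13.0, 11.0, 12.5, 11.5))  # down
```

Subtracting the unaltered spacings is what lets this rule cope with centroid spacings that were never uniform to begin with.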

A. Experimental Results for Line-Shift Coding

We conducted two sets of experiments. The first set tested how well line-shift coding works with different font sizes and different line spacing shifts in the presence of limited, but typical, image noise. The second set tested how well a fixed line spacing shift could be detected as document degradation became increasingly severe. In this section, we first describe the experiments and then present our results.

The equipment we used in both experiments was as follows: a Ricoh FS1S 400 dpi Flat Bed Electronic Scanner, Apple LaserWriter IIntx 300 dpi laser printer, and a Xerox 5052 plain paper copier.⁴ The printer and copier were selected in part because they are typical of equipment found in wide use in office environments. The particular machines we used could be characterized as being heavily used but well maintained.

Writing the software routine to implement a rudimentary line-shift encoder for a PostScript input file was simple. We chose the PostScript format because: 1) it is the most common Page Description Language in use today, 2) it enables us to have sufficiently fine control of text placement, and 3) it permits us to encode documents produced by a wide variety of word processing applications. PostScript describes the document content a page at a time. Roughly speaking, it specifies the content of a text line (or text line fragment such as a phrase, word, or character) and identifies the location for the text to be displayed. Text location is specified by an x-y coordinate representing a position on a virtual page; this position can typically be altered by arbitrarily small displacements. However, most personal laser printers in common use today have a 300 dpi “resolution,” so they are unable to distinctly render text subject to a displacement of less than 1/300 inch.
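To relate printer pixels to PostScript coordinates, the short Python sketch below converts a shift in device pixels to PostScript's default user-space units, where one unit is 1/72 inch. The helper name `pixels_to_points` is hypothetical, and the sketch assumes the default user space has not been rescaled by the document.

```python
from fractions import Fraction

POINTS_PER_INCH = 72  # PostScript default user space: 1 unit = 1/72 inch


def pixels_to_points(pixels: int, dpi: int = 300) -> Fraction:
    """Exact size, in default PostScript units, of a shift given in device pixels."""
    return Fraction(pixels * POINTS_PER_INCH, dpi)


for px in (1, 2, 3):
    print(px, "pixel(s) at 300 dpi =", float(pixels_to_points(px)), "points")
```

At 300 dpi a one-pixel shift is 6/25 of a point, so a line-shift encoder working on a PostScript file would move a text line's y-coordinate in multiples of that amount if the shift is to be rendered distinctly.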

1) Variable Font Size Experiment: The first set of experiments each used a single-spaced page of text in the Times-Roman font. The page was coded using the differential encoding scheme. We performed nine experiments using font sizes of 8, 10, or 12 points and shifting alternate lines (within each paragraph) up or down by 1, 2, or 3 pixels. Each page of 8, 10, and 12 point size text extended for 23, 21, and 19 lines, respectively. Different numbers of encoded lines per page arise naturally, since as the font size decreases, more lines can be placed on the page, permitting more information to be encoded. Since our printer has a 300 dpi resolution, each pixel corresponds to 1/300 inch [...]

[...]th copy; the [...], is produced by copying the [...]


[...] somewhat from copy to copy. This suggests that line spacing “information” is still present in the text baselines, and can perhaps be made available with some additional processing.

We have reported the uncoded error performance of our marking scheme. But the 21 line shifts used in the experiment were not chosen arbitrarily. The 21 line shifts comprised 3 concatenated codewords selected from a Hamming [...]
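The parameters of the Hamming code are cut off in the text above. Since 21 line shifts split into 3 codewords gives 7 shifts per codeword, the Python sketch below assumes the standard (7,4) Hamming code, which corrects any single error in a 7-bit codeword. The function names, bit ordering, and mapping of bits to up or down shifts are illustrative assumptions and may differ from what was used in the experiments.

```python
def hamming74_encode(data: list[int]) -> list[int]:
    """Encode 4 data bits into a 7-bit codeword (positions 1..7 = p1 p2 d1 p3 d2 d3 d4)."""
    d1, d2, d3, d4 = data
    p1 = d1 ^ d2 ^ d4  # parity over positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # parity over positions 2, 3, 6, 7
    p3 = d2 ^ d3 ^ d4  # parity over positions 4, 5, 6, 7
    return [p1, p2, d1, p3, d2, d3, d4]


def hamming74_decode(received: list[int]) -> list[int]:
    """Correct at most one flipped bit, then return the 4 data bits."""
    c = list(received)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]
    position = s1 + 2 * s2 + 4 * s3  # 1-based index of the error, 0 if none
    if position:
        c[position - 1] ^= 1
    return [c[2], c[4], c[5], c[6]]


# Three concatenated codewords give 21 bits, one per shifted line.
blocks = [[1, 0, 1, 1], [0, 1, 1, 0], [1, 1, 0, 0]]
page_bits = [bit for block in blocks for bit in hamming74_encode(block)]
print(len(page_bits))  # 21
```

Under this assumption, one misread line shift within each group of seven lines would still decode to the intended data bits.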

Text line skew was largely removed by image rotation, at the expense of the introduction of some distortion due to bilinear interpolation of sampled data.

Blurring (i.e., edge raggedness) also increased with the number of copies produced. However, blurring seemed to have surprisingly minor implications in detection performance. It is possible that blurring introduces noise in a symmetrical fashion on text lines, so it does not contribute significantly to displacing centroid locations. Plain paper copies were produced at the copier’s nominal “copy darkness” setting; blurring typically increases with copy darkness. As the number of copies increased, copy darkness generally varied over a page; regions of severe fading were sometimes observed. It is unclear whether blurring or fading is more detrimental to decoding performance.

Expansion or shrinking of copy size is another potential problem. It is not unusual to discover a 4% page length or width change after 10 copies. Further, expansion along the length and width of a page can be markedly different. Copy size changes forced us to use differential encoding, that is, encoding information in the relative rather than absolute shifts between adjacent text lines.

C. A Noise Model

In this subsection we present a simple model of the noise affecting text line centroids. We distinguish two types of noise. The first type of noise models the distortion in printing and scanning the document; the second type models the distortion in copying. This second type of noise increases with the number of copies while the first type does not.

An unaltered page of text with [...] vertical coordinates [...] text lines is effectively described [...]. The [...]th line spacing shift is positive if extra space has been added, negative if space has been subtracted, and zero otherwise. This line spacing shift changes the original [...]

Let [...]th centroid spacing in the [...]th centroid spacing [...] of distortion introduced by printing, scanning, and image processing. We assume that the printer noise [...] is strictly additive and logically distorts the centroid spacings of the original paper copy [...] are independent and identically distributed Gaussian random variables. This assumption is supported by our measurements [22], which yield a mean [...] and variance [...].

[...] be the random noise that summarizes the cumulative effect of skewing, scaling, and other photographic distortions introduced on the [...] by making the [...]th copy [...] are [...] (4.3)

Hence, the centroid spacing [...]
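As a rough illustration of the first noise type only, the Python sketch below estimates how often a centroid comparison would pick the wrong direction when each measured spacing change carries independent additive Gaussian noise. Everything numeric here is an assumption: the noise is taken to be zero-mean, the shift and standard deviation are arbitrary placeholder values rather than the measured ones, and copier noise is ignored.

```python
import math
import random


def analytic_error(shift: float, sigma: float) -> float:
    """Probability that noise reverses the comparison.

    The spacing change above the line is shift + n1 and the change below
    is -shift + n2, with n1 and n2 independent N(0, sigma^2). An error
    occurs when n1 - n2 < -2 * shift, which has probability
    0.5 * erfc(shift / sigma).
    """
    return 0.5 * math.erfc(shift / sigma)


def simulated_error(shift: float, sigma: float,
                    trials: int = 100_000, seed: int = 0) -> float:
    """Monte Carlo estimate of the same probability for a line moved down."""
    rng = random.Random(seed)
    errors = 0
    for _ in range(trials):
        change_above = shift + rng.gauss(0.0, sigma)
        change_below = -shift + rng.gauss(0.0, sigma)
        if change_above <= change_below:
            errors += 1
    return errors / trials


print(analytic_error(1.0, 1.0))
print(simulated_error(1.0, 1.0))
```

The two printed values should agree closely. The closed form also shows that, under these assumptions, the error rate depends only on the ratio of the shift to the noise standard deviation.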
