%!PS-Adobe-3.0
%%BoundingBox: 54 72 558 720
%%Creator: Mozilla (NetScape) HTML->PS
%%DocumentData: Clean7Bit
%%Orientation: Portrait
%%Pages: 6
%%PageOrder: Ascend
%%Title: "Character Set" Considered Harmful
%%EndComments
%%BeginProlog
[ /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef
/.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef
/.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef
/.notdef /.notdef /space /exclam /quotedbl /numbersign /dollar /percent /ampersand /quoteright
/parenleft /parenright /asterisk /plus /comma /hyphen /period /slash /zero /one
/two /three /four /five /six /seven /eight /nine /colon /semicolon
/less /equal /greater /question /at /A /B /C /D /E
/F /G /H /I /J /K /L /M /N /O
/P /Q /R /S /T /U /V /W /X /Y
/Z /bracketleft /backslash /bracketright /asciicircum /underscore /quoteleft /a /b /c
/d /e /f /g /h /i /j /k /l /m
/n /o /p /q /r /s /t /u /v /w
/x /y /z /braceleft /bar /braceright /c0126 /.notdef /.notdef /.notdef
/.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef
/.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef
/.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef
/space /exclamdown /cent /sterling /currency /yen /brokenbar /section /dieresis /copyright
/ordfeminine /guillemotleft /logicalnot /hyphen /registered /macron /degree /plusminus /twosuperior /threesuperior
/acute /mu /paragraph /periodcentered /cedilla /onesuperior /ordmasculine /guillemotright /onequarter /onehalf
/threequarters /questiondown /Agrave /Aacute /Acircumflex /Atilde /Adieresis /Aring /AE /Ccedilla
/Egrave /Eacute /Ecircumflex /Edieresis /Igrave /Iacute /Icircumflex /Idieresis /Eth /Ntilde
/Ograve /Oacute /Ocircumflex /Otilde /Odieresis /multiply /Oslash /Ugrave /Uacute /Ucircumflex
/Udieresis /Yacute /Thorn /germandbls /agrave /aacute /acircumflex /atilde /adieresis /aring
/ae /ccedilla /egrave /eacute /ecircumflex /edieresis /igrave /iacute /icircumflex /idieresis
/eth /ntilde /ograve /oacute /ocircumflex /otilde /odieresis /divide /oslash /ugrave
/uacute /ucircumflex /udieresis /yacute /thorn /ydieresis] /isolatin1encoding exch def
/c { matrix currentmatrix currentpoint translate
3 1 roll scale newpath 0 0 1 0 360 arc setmatrix } bind def
/F0
/Times-Roman findfont
dup length dict begin
{1 index /FID ne {def} {pop pop} ifelse} forall
/Encoding isolatin1encoding def
currentdict end
definefont pop
/f0 { /F0 findfont exch scalefont setfont } bind def
/F1
/Times-Bold findfont
dup length dict begin
{1 index /FID ne {def} {pop pop} ifelse} forall
/Encoding isolatin1encoding def
currentdict end
definefont pop
/f1 { /F1 findfont exch scalefont setfont } bind def
/F2
/Times-Italic findfont
dup length dict begin
{1 index /FID ne {def} {pop pop} ifelse} forall
/Encoding isolatin1encoding def
currentdict end
definefont pop
/f2 { /F2 findfont exch scalefont setfont } bind def
/F3
/Times-BoldItalic findfont
dup length dict begin
{1 index /FID ne {def} {pop pop} ifelse} forall
/Encoding isolatin1encoding def
currentdict end
definefont pop
/f3 { /F3 findfont exch scalefont setfont } bind def
/F4
/Courier findfont
dup length dict begin
{1 index /FID ne {def} {pop pop} ifelse} forall
/Encoding isolatin1encoding def
currentdict end
definefont pop
/f4 { /F4 findfont exch scalefont setfont } bind def
/F5
/Courier-Bold findfont
dup length dict begin
{1 index /FID ne {def} {pop pop} ifelse} forall
/Encoding isolatin1encoding def
currentdict end
definefont pop
/f5 { /F5 findfont exch scalefont setfont } bind def
/F6
/Courier-Oblique findfont
dup length dict begin
{1 index /FID ne {def} {pop pop} ifelse} forall
/Encoding isolatin1encoding def
currentdict end
definefont pop
/f6 { /F6 findfont exch scalefont setfont } bind def
/F7
/Courier-BoldOblique findfont
dup length dict begin
{1 index /FID ne {def} {pop pop} ifelse} forall
/Encoding isolatin1encoding def
currentdict end
definefont pop
/f7 { /F7 findfont exch scalefont setfont } bind def
/rhc {
{
currentfile read {
dup 97 ge
{ 87 sub true exit }
{ dup 48 ge { 48 sub true exit } { pop } ifelse }
ifelse
} {
false
exit
} ifelse
} loop
} bind def
/cvgray { % xtra_char npix cvgray - (string npix long)
dup string
0
{
rhc { cvr 4.784 mul } { exit } ifelse
rhc { cvr 9.392 mul } { exit } ifelse
rhc { cvr 1.824 mul } { exit } ifelse
add add cvi 3 copy put pop
1 add
dup 3 index ge { exit } if
} loop
pop
3 -1 roll 0 ne { rhc { pop } if } if
exch pop
} bind def
/smartimage12rgb { % w h b [matrix] smartimage12rgb -
/colorimage where {
pop
{ currentfile rowdata readhexstring pop }
false 3
colorimage
} {
exch pop 8 exch
3 index 12 mul 8 mod 0 ne { 1 } { 0 } ifelse
4 index
6 2 roll
{ 2 copy cvgray }
image
pop pop
} ifelse
} def
/cshow { dup stringwidth pop 2 div neg 0 rmoveto show } bind def
/rshow { dup stringwidth pop neg 0 rmoveto show } bind def
%%EndProlog
%%Page: 1 1
%%BeginPageSetup
/pagelevel save def
54 0 translate
%%EndPageSetup
gsave 0.97241 1 scale
12 f4 0 710.33 moveto
(HTML Working Group ) show
grestore
gsave 0.97241 1 scale
12 f4 439.2 710.33 moveto
(D. Connolly) show
grestore
gsave 0.97241 1 scale
12 f4 0 697.67 moveto
(INTERNET-DRAFT ) show
grestore
gsave 0.97241 1 scale
12 f4 439.2 697.67 moveto
(MIT/W3C) show
grestore
gsave 0.97241 1 scale
12 f4 0 685.01 moveto
(draft-ietf-html-charset-harmful-00.txt May 2, 1995) show
grestore
gsave 0.97241 1 scale
12 f4 0 672.35 moveto
(Expires November, 1995) show
grestore
24 f3 0 619.25 moveto
(Character Set) show
24 f1 136.75 619.25 moveto
( Considered Harmful) show
18 f1 0 582.2 moveto
(Status of this Document) show
12 f0 0 552.51 moveto
(This document is an Internet-Draft. Internet-Drafts are working documents of the Internet Engineering) show
12 f0 0 539.13 moveto
(Task Force \(IETF\), its areas, and its working groups. Note that other groups may also distribute working) show
12 f0 0 525.75 moveto
(documents as Internet-Drafts. ) show
12 f0 0 497.37 moveto
(Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or) show
12 f0 0 483.99 moveto
(obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material) show
12 f0 0 470.61 moveto
(or to cite them other than as "work in progress." ) show
12 f0 0 442.23 moveto
(To learn the current status of any Internet-Draft, please check the "1id-abstracts.txt" listing contained in) show
12 f0 0 428.85 moveto
(the Internet-Drafts Shadow Directories on ftp.is.co.za \(Africa\), nic.nordu.net \(Europe\), munnari.oz.au) show
12 f0 0 415.47 moveto
(\(Pacific Rim\), ds.internic.net \(US East Coast\), or ftp.isi.edu \(US West Coast\). ) show
12 f0 0 387.09 moveto
(Distribution of this document is unlimited. Please send comments to the HTML working group) show
12 f0 0 373.71 moveto
(\(HTML-WG\) of the Internet Engineering Task Force \(IETF\) at ) show
12 f4 306.25 373.71 moveto
(<html-wg@oclc.org>) show
12 f0 435.85 373.71 moveto
(. Discussions) show
12 f0 0 359.94 moveto
(of the group are archived at ) show
12 f4 135.28 359.94 moveto
(http://www.acl.lanl.gov/HTML_WG/archives.html) show
12 f0 459.28 359.94 moveto
(. ) show
18 f1 0 325.12 moveto
(Abstract) show
12 f0 0 295.43 moveto
(The term ) show
12 f2 46.65 295.43 moveto
(character set) show
12 f0 109.85 295.43 moveto
( is often used to describe a digital representation of text. ASCII is perhaps the) show
12 f0 0 282.05 moveto
(most widely deployed representation of text, and in the interest of interoperability, information systems) show
12 f0 0 268.67 moveto
(on the Internet traditionally rely on it exclusively. ) show
12 f0 0 240.29 moveto
(The Multipurpose Internet Mail Extensions \(MIME\) introduces Internet Media Types, including text) show
12 f0 0 226.91 moveto
(representations besides ASCII. The Hypertext Markup Language \(HTML\) used in the World-Wide Web) show
12 f0 0 213.53 moveto
(is a proposed Internet Media Type. But HTML is also an application of Standard Generalized Markup) show
12 f0 0 200.15 moveto
(Language \(SGML\). ) show
12 f0 0 171.77 moveto
(In the MIME and SGML specifications, the discussion of character representation is notoriously) show
12 f0 0 158.39 moveto
(complex, and apparently subtly inconsistent or incompatible. This document presents a collection of) show
12 f0 0 145.01 moveto
(terms intended to reconcile the two specifications and serve as a basis for rigorous discussion of) show
12 f0 0 131.63 moveto
(characters and their digital representations. ) show
18 f1 0 97.2 moveto
(Introduction) show
pagelevel restore
showpage
%%Page: 2 2
%%BeginPageSetup
/pagelevel save def
54 0 translate
%%EndPageSetup
12 f0 0 709.22 moveto
(The term ) show
12 f2 46.65 709.22 moveto
(character set) show
12 f0 109.85 709.22 moveto
( is often used to describe a digital representation of text. The specification of such) show
12 f0 0 695.84 moveto
(a representation typically involves identifying a sufficiently expressive collection of characters, and) show
12 f0 0 682.46 moveto
(giving each of them a number. ) show
12 f0 0 654.08 moveto
(In conventional mathematics terminology then, a "character set" is not just a set of characters, but a) show
12 f2 1.76 640.7 moveto
(function) show
12 f0 41.09 640.7 moveto
( whose domain is a set of integers, and whose range is a set of characters. ) show
12 f0 0 612.32 moveto
(Some standards documents, including the SGML standard, make little or no use of such conventional) show
12 f0 0 598.94 moveto
(mathematical terms as function, domain and range. Perhaps the authors of those documents intend the) show
12 f0 0 585.56 moveto
(documents to be comprehensible without a prior understanding of mathematics. But the specification of) show
12 f0 0 572.18 moveto
(notions such as the conformance of an SGML document or SGML system is much more complex than) show
12 f0 0 558.8 moveto
(the basics of logic and mathematics. ) show
12 f0 0 530.42 moveto
(In his text on Calculus) show
12 f0 108 530.42 moveto
([Spivak]) show
12 f0 149.32 530.42 moveto
(, Michael Spivak writes: ) show
12 f0 30 502.04 moveto
(Every aspect of this book was influenced by the desire to present calculus not merely as a) show
12 f0 30 488.66 moveto
(prelude to but as the first real encounter with mathematics. Since the foundation of analysis) show
12 f0 30 475.28 moveto
(provided the arena in which modern modes of mathematical thinking developed, calculus) show
12 f0 30 461.9 moveto
(ought to be the place in which to expect, rather than avoid, the strengthening of insight with) show
12 f0 30 448.52 moveto
(logic. In addition to developing the students' intuition about the beautiful concepts of) show
12 f0 30 435.14 moveto
(analysis, it is surely equally important to persuade them that precision and rigor are neither) show
12 f0 30 421.76 moveto
(deterrents to intuition, nor ends in themselves, but the natural medium in which to formulate) show
12 f0 30 408.38 moveto
(and think about mathematical questions. ) show
12 f0 0 380 moveto
(This document is not intended as the first real encounter with mathematics. But neither will we make) show
12 f0 0 366.62 moveto
(any effort to avoid or apologize for mathematical terminology. The reader is referred to the large body) show
12 f0 0 353.24 moveto
(of literature on logic and set theory, including a history of writings on math and logic [SET] and Douglas) show
12 f0 0 339.86 moveto
(Hofstadter's fascinating book) show
12 f0 142.03 339.86 moveto
([GEB]) show
12 f0 174.02 339.86 moveto
(. ) show
18 f1 0 305.43 moveto
(Coded Character Sets) show
12 f0 0 275.74 moveto
(Using "character set" rather than something such as ) show
12 f2 250.4 275.74 moveto
(character table) show
12 f0 324.05 275.74 moveto
( or even ) show
12 f2 365.7 275.74 moveto
(character sequence) show
12 f0 459.33 275.74 moveto
( to) show
12 f0 0 262.36 moveto
(denote the function that maps integers to characters is unfortunate, but it is water under the bridge, and) show
12 f0 0 248.98 moveto
(a lot of it by now. Rather than attempting to divert all that water at this point, we introduce the primitive) show
12 f0 0 235.6 moveto
(notion of character and use it to define the term ) show
12 f2 230.61 235.6 moveto
(coded character set) show
12 f0 325.47 235.6 moveto
( from [ISO10646] and other) show
12 f0 0 222.22 moveto
(standards: ) show
12 f0 0 193.84 moveto
(character ) show
12 f0 30 180.46 moveto
(An atom of information ) show
12 f0 0 167.08 moveto
(coded character set ) show
12 f0 30 153.7 moveto
(A function whose domain is a subset of the integers, and whose range is a set of characters. ) show
12 f0 0 125.32 moveto
(Note that by the term character, we do not mean a glyph, a name, a phoneme, nor a bit combination. A) show
12 f0 0 111.94 moveto
(character is simply an atomic unit of communication. It is typically a symbol whose various) show
12 f0 0 98.56 moveto
(representations are understood to mean the same thing by a community of people. ) show
pagelevel restore
showpage
%%Page: 3 3
%%BeginPageSetup
/pagelevel save def
54 0 translate
%%EndPageSetup
12 f0 0 709.22 moveto
(It might seem more intuitive to map from characters to integers, rather than the way it is defined here.) show
12 f0 0 695.84 moveto
(But in practice there are some coded character sets that assign two different numbers to the same) show
12 f0 0 682.46 moveto
(character) show
12 f0 43.99 682.46 moveto
([Lee]) show
12 f0 69.97 682.46 moveto
(, and so the inverse is not a function in the general case. ) show
12 f0 0 654.08 moveto
(There are two other terms used in standards such as [ISO10646] that we define in relation to the first) show
12 f0 0 640.7 moveto
(two: ) show
12 f0 0 612.32 moveto
(code position ) show
12 f0 30 598.94 moveto
(An integer. A coded character set and a code position from its domain determine a character. ) show
12 f0 0 585.56 moveto
(character repertoire ) show
12 f0 30 572.18 moveto
(A set of characters; that is, the range of a coded character set. ) show
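% Illustrative sketch, not part of the original draft.
% The four terms defined above can be modelled directly in PostScript: a
% coded character set becomes a dictionary whose keys are code positions
% (integers) and whose values stand in for characters. The names
% /ccs-sketch, /ccs-lookup and /repertoire-sketch, and the table entries
% themselves, are hypothetical and chosen only for this example.
/ccs-sketch 4 dict def
ccs-sketch begin
  65  /LATIN-CAPITAL-A def
  66  /LATIN-CAPITAL-B def
  97  /LATIN-SMALL-A   def
  193 /LATIN-CAPITAL-A def   % a second code position for the same character
end
% code position -> character true   when the position is in the domain
% code position -> false            otherwise
/ccs-lookup {
  ccs-sketch exch 2 copy known { get true } { pop pop false } ifelse
} bind def
% - -> array holding the range of the function; a character reached from
% two code positions appears twice, so this is a list rather than a set
/repertoire-sketch { [ ccs-sketch { exch pop } forall ] } bind def
% Usage: 65 ccs-lookup leaves /LATIN-CAPITAL-A true on the stack, and
% 200 ccs-lookup leaves false. Because 65 and 193 both yield the same
% character, the inverse mapping from characters to integers would not
% be a function, which is why the definition runs from integers to
% characters.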
18 f1 0 537.75 moveto
(Character Encoding Schemes) show
12 f0 0 508.06 moveto
(The only practical means for exchanging information on the Internet is to represent it as a sequence of) show
12 f0 0 494.68 moveto
(octets \(bytes\). ) show
12 f0 0 466.3 moveto
(One way to transmit a sequence of characters is to agree on a coded character set and transmit the) show
12 f0 0 452.92 moveto
(character numbers of each of the characters. ) show
12 f0 0 424.54 moveto
(But in practice, characters are encoded using a variety of optimizations of this brute-force approach:) show
12 f0 0 411.16 moveto
(code switching techniques, escape sequences, etc. The encoding of a sequence of characters is not, in) show
12 f0 0 397.78 moveto
(general, the result of encoding each character independently and then concatenating them. But it is) show
12 f0 0 384.4 moveto
(sufficiently general to note that sequences of characters are encoded as a sequence of bytes. So we) show
12 f0 0 371.02 moveto
(define: ) show
12 f0 0 342.64 moveto
(octet ) show
12 f0 30 329.26 moveto
(an element of the set {0, 1, 2, ..., 255} ) show
12 f0 0 315.88 moveto
(character encoding scheme ) show
12 f0 30 302.5 moveto
(a function whose domain is the set of sequences of octets, and whose range is the set of sequences) show
12 f0 30 289.12 moveto
(of characters over some character repertoire. ) show
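% Illustrative sketch, not part of the original draft.
% A character encoding scheme takes a sequence of octets and yields a
% sequence of characters. The procedure below is one very simple scheme:
% every pair of octets is read as a single code position, high octet
% first. The big-endian byte order and the name /ucs2-decode-sketch are
% assumptions made for this example. The result is an array of code
% positions; applying a coded character set to each element would give
% the characters themselves. A trailing unpaired octet is ignored.
/ucs2-decode-sketch {                  % string -> array of code positions
  dup length 2 idiv array              % str arr
  dup length 1 sub 0 1 3 -1 roll {     % str arr i
    2 index 1 index 2 mul get 256 mul          % str arr i hi*256
    3 index 2 index 2 mul 1 add get add        % str arr i code
    2 index 3 1 roll put                       % str arr
  } for
  exch pop                             % arr
} bind def
% Usage: <00410042> ucs2-decode-sketch leaves [65 66] on the stack.
% Schemes that use code switching or escape sequences cannot be written
% as a per-unit loop like this one, which is why the definition above
% speaks of whole sequences rather than single characters.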
18 f1 0 254.69 moveto
(Representation of SGML Text Entities) show
12 f0 0 225 moveto
(An SGML document is made up of entities: a text entity called the document entity, and possibly some) show
12 f0 0 211.62 moveto
(other text entities and data entities. ) show
12 f0 0 183.24 moveto
(A text entity is a sequence of characters. The representation of a text entity is not specified by the SGML) show
12 f0 0 169.86 moveto
(standard. For the purpose of MIME-based interchange of SGML text entities, we define the following: ) show
12 f0 0 141.48 moveto
(text entity ) show
12 f0 30 128.1 moveto
(a sequence of characters ) show
12 f0 0 114.72 moveto
(message entity ) show
12 f0 30 101.34 moveto
(a pair \(T, OS\) where T is an Internet Media Type and OS is a sequence of octets. ) show
pagelevel restore
showpage
%%Page: 4 4
%%BeginPageSetup
/pagelevel save def
54 0 translate
%%EndPageSetup
12 f0 0 709.22 moveto
(Note that each ) show
12 f4 72.31 709.22 moveto
(text/*) show
12 f0 115.51 709.22 moveto
( media type has an associated ) show
12 f4 260.14 709.22 moveto
(charset) show
12 f0 310.54 709.22 moveto
( parameter, which designates a) show
12 f0 0 695.45 moveto
(character encoding scheme. The character encoding scheme maps the body -- a sequence of octets -- to a) show
12 f0 0 682.07 moveto
(text entity -- a sequence of characters. Hence any message entity of type ) show
12 f4 349.89 682.07 moveto
(text/*) show
12 f0 393.09 682.07 moveto
( is equivalent to a text) show
12 f0 0 668.3 moveto
(entity. ) show
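% Illustrative sketch, not part of the original draft.
% The step from a message entity to a text entity can be pictured as a
% lookup: the charset parameter selects a decoding procedure, and that
% procedure is applied to the body. The table /charset-table-sketch, its
% single entry and the procedure /message-to-text-sketch are hypothetical
% names for this example. The us-ascii entry simply treats each octet as
% a code position and does not reject octets above 127.
/charset-table-sketch 2 dict def
charset-table-sketch begin
  /us-ascii { [ exch { } forall ] } bind def   % string -> array of codes
end
/message-to-text-sketch {              % charset-name octets -> array of codes
  exch charset-table-sketch exch get exec
} bind def
% Usage: /us-ascii (Hi) message-to-text-sketch leaves [72 105] on the
% stack. An unknown charset name raises the undefined error from get.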
18 f1 0 633.87 moveto
(Numeric Character References) show
12 f0 0 604.18 moveto
(Numeric character references are a great source of confusion. The key insights are that: ) show
19.96 579.89 moveto
3.345 3.345 c fill
12 f0 30 575.8 moveto
( Every SGML document has exactly one document character set, which is a coded character set ) show
19.96 566.51 moveto
3.345 3.345 c fill
12 f0 30 562.42 moveto
( Numeric character references give code positions in the document character set ) show
14 f1 0 531.73 moveto
(Example: ISO2022 Encoding with ISO10646 Coded Character Set) show
12 f0 0 502.91 moveto
(Consider the following message entity: ) show
12 f4 0 475.64 moveto
(Date: Saturday, 29-Apr-95 03:53:33 GMT) show
12 f4 0 462.98 moveto
(MIME-version: 1.0) show
12 f4 0 450.32 moveto
(Content-Type: text/html; charset=iso-2022-jp) show
12 f4 0 425 moveto
(<TITLE>...</TITLE>) show
12 f4 0 412.34 moveto
(<BODY>) show
12 f4 0 399.68 moveto
(Here is some normal text.) show
12 f4 0 387.02 moveto
(Here is a 10646 numeric character reference \200.) show
12 f4 0 374.36 moveto
(Here is some ISO-2022-JP text: ...) show
12 f4 0 361.7 moveto
(</BODY>) show
12 f0 0 320.27 moveto
(To interpret the message entity, we notice that the ) show
12 f4 242.62 320.27 moveto
(Content-Type) show
12 f0 329.02 320.27 moveto
( is ) show
12 f4 343.02 320.27 moveto
(text/html) show
12 f0 407.82 320.27 moveto
(, so this represents) show
12 f0 0 306.5 moveto
(a text entity. The ) show
12 f4 84.32 306.5 moveto
(charset) show
12 f0 134.72 306.5 moveto
( parameter ) show
12 f4 188.69 306.5 moveto
(iso-2022-jp) show
12 f0 267.89 306.5 moveto
(, along with the octet sequence of the body,) show
12 f0 0 292.73 moveto
(determines a sequence of characters. The octets denoted above by '...' represent characters, as per) show
12 f4 0 279.35 moveto
(iso-2022-jp) show
12 f0 79.2 279.35 moveto
(. ) show
12 f0 0 250.97 moveto
(To parse the resulting text entity as per SGML, the sender and receiver must agree on an SGML) show
12 f0 0 237.59 moveto
(declaration, since none is present in the document entity. For this example, we assume that SGML) show
12 f0 0 224.21 moveto
(declaration specifies ISO10646 as the document character set. So the numeric character reference ) show
12 f4 471.18 224.21 moveto
(\200) show
12 f0 471.18 224.21 moveto
( is) show
12 f0 0 210.44 moveto
(resolved with respect to ISO10646. ) show
12 f0 0 182.06 moveto
(It may seem contradictory that the ) show
12 f4 167.3 182.06 moveto
(ISO-2022-JP) show
12 f0 246.5 182.06 moveto
( character encoding scheme is defined in terms of a) show
12 f0 0 168.29 moveto
(collection of coded character sets, none of which is ISO10646. But there is no contradiction. Each) show
12 f0 0 154.91 moveto
(character encoded by ) show
12 f4 104.95 154.91 moveto
(ISO-2022-JP) show
12 f0 184.15 154.91 moveto
( is in the repertoire of one of those coded character sets, each of) show
12 f0 0 141.14 moveto
(which is a subset of the repertoire of ISO10646. ) show
12 f0 0 112.76 moveto
(So while ) show
12 f4 45.33 112.76 moveto
(ISO-2022-JP) show
12 f0 124.53 112.76 moveto
( is not sufficient for every ISO10646 document, it is the case that ISO10646 is) show
12 f0 0 98.99 moveto
(a sufficient document character set for any entity encoded with ) show
12 f4 305.25 98.99 moveto
(ISO-2022-JP) show
12 f0 384.45 98.99 moveto
(. ) show
pagelevel restore
showpage
%%Page: 5 5
%%BeginPageSetup
/pagelevel save def
54 0 translate
%%EndPageSetup
14 f1 0 706.91 moveto
(Example: Reducing the Repertoire of an Entity) show
12 f0 0 678.09 moveto
(Suppose we have an SGML document D whose document character set is the coded character set) show
12 f0 0 664.71 moveto
(ISO10646. We find the document entity DE in the form of a sequence of octets OS in a disk file, encoded) show
12 f0 0 651.33 moveto
(using the Unicode-UCS-2 character encoding scheme. ) show
12 f4 0 624.06 moveto
( Unicode-UCS-2\(OS\) = DE) show
12 f0 0 595.29 moveto
(We can reduce the character repertoire necessary to represent the document entity by replacing) show
12 f0 0 581.91 moveto
(characters outside the ISO-646-IRV character repertoire with numeric character references: ) show
gsave 0.902627 1 scale
12 f4 0 554.64 moveto
( DE' = reduce\(DE, ISO10646, ISO-646-IRV\)) show
grestore
gsave 0.902627 1 scale
12 f4 0.04 529.32 moveto
(where) show
grestore
gsave 0.902627 1 scale
12 f4 0 504 moveto
( reduce : SEQ\(char\) X Coded Character Set X Character Repertoire -> SEQ\(char\)) show
grestore
gsave 0.902627 1 scale
12 f4 0 478.68 moveto
(and) show
grestore
gsave 0.902627 1 scale
12 f4 0 453.36 moveto
( reduce\(c . rest, CCS, R\) = if c in R, c . reduce\(rest, CCS, R\)) show
grestore
gsave 0.902627 1 scale
12 f4 0 440.7 moveto
( else &#N; . reduce\(rest, CCS, R\)) show
grestore
gsave 0.902627 1 scale
12 f4 0 428.04 moveto
( where CCS\(N\) = c) show
grestore
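% Illustrative sketch, not part of the original draft.
% The reduce function shown on this page can be written out as a short
% procedure. Assumptions made for this example: a character sequence is
% given as an array of ISO10646 code positions, the target repertoire R
% is taken to be the code positions 0 to 127, and the names /ncr-sketch
% and /reduce-sketch are hypothetical. Unlike the definition on the page,
% CCS and R are fixed here rather than passed as arguments. An empty
% input yields an empty output, a case the definition leaves implicit.
/ncr-sketch {                 % N -> 38 35 d1 ... dk 59   (codes of "&#N;")
  38 35 3 -1 roll             % 38 35 N
  12 string cvs               % 38 35 (decimal digits of N)
  { } forall                  % 38 35 d1 ... dk
  59
} bind def
/reduce-sketch {              % array of codes -> array of codes
  [ exch
  { dup 128 lt { } { ncr-sketch } ifelse } forall
  ]
} bind def
% Usage: [72 233 33] reduce-sketch leaves
% [72 38 35 50 51 51 59 33] on the stack, the code positions of the
% eight characters H & # 2 3 3 ; ! in that order.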
12 f0 0 399.27 moveto
(The resulting entity, DE', can then be encoded using US-ASCII: ) show
12 f4 0 372 moveto
( US-ASCII\(OS'\) = DE' = reduce\(DE, ISO10646, ISO-646-IRV\)) show
12 f0 0 343.23 moveto
(Hence, we can represent the document D as a message entity whose content type is "text/plain;) show
12 f0 0 329.85 moveto
(charset=US-ASCII" and whose body is OS'. ) show
18 f1 0 295.42 moveto
(Conclusion) show
12 f0 0 265.73 moveto
(It is critical to keep the notion of a simple table of characters and their numbers, i.e. a coded) show
12 f0 0 252.35 moveto
(character set, separate from the various algorithms to encode sequences of characters, i.e. character) show
12 f0 0 238.97 moveto
(encoding schemes. This separation allows a representation of a text entity which is consistent with both) show
12 f0 0 225.59 moveto
(the MIME and SGML specifications. ) show
18 f1 0 191.16 moveto
(Acknowledgements) show
12 f0 0 161.47 moveto
(The idea for the title of this document actually came from John Klensin. The notion of character) show
12 f0 0 148.09 moveto
(encoding scheme was inspired by the MIME specification by Ned Freed. James Clark, Ed Levinson, and) show
12 f0 0 134.71 moveto
(several other members of the MIMESGML working group collaborated in discussions leading up to this) show
12 f0 0 121.33 moveto
(draft. Liam Quin from SoftQuad and Gavin Nicol from EBT have provided guidance on these issues in) show
12 f0 0 107.95 moveto
(the past. Erik Naggum has provided invaluable aid in understanding the SGML standard. ) show
pagelevel restore
showpage
%%Page: 6 6
%%BeginPageSetup
/pagelevel save def
54 0 translate
%%EndPageSetup
18 f1 0 703.17 moveto
(References) show
12 f0 0 673.48 moveto
([MIME]) show
12 f0 40.65 673.48 moveto
( ) show
12 f0 30 660.1 moveto
(N. Borenstein and N. Freed. "MIME \(Multipurpose Internet Mail Extensions\) Part One:) show
12 f0 30 646.72 moveto
(Mechanisms for Specifying and Describing the Format of Internet Message Bodies." RFC 1521,) show
12 f0 30 633.34 moveto
(Bellcore, Innosoft, September 1993. ) show
12 f0 0 619.96 moveto
([ASCII]) show
12 f0 39.32 619.96 moveto
( ) show
12 f0 30 606.58 moveto
(US-ASCII. Coded Character Set - 7-Bit American Standard Code for Information Interchange.) show
12 f0 30 593.2 moveto
(Standard ANSI X3.4-1986, ANSI, 1986. ) show
12 f0 0 579.82 moveto
([ISO-8859]) show
12 f0 55.32 579.82 moveto
( ) show
12 f0 30 566.44 moveto
(ISO 8859. International Standard -- Information Processing -- 8-bit Single-Byte Coded Graphic) show
12 f0 30 553.06 moveto
(Character Sets -- Part 1: Latin Alphabet No. 1, ISO 8859-1:1987. Part 2: Latin alphabet No. 2,) show
12 f0 30 539.68 moveto
(ISO 8859-2, 1987. Part 3: Latin alphabet No. 3, ISO 8859-3, 1988. Part 4: Latin alphabet No. 4,) show
12 f0 30 526.3 moveto
(ISO 8859-4, 1988. Part 5: Latin/Cyrillic alphabet, ISO 8859-5, 1988. Part 6: Latin/Arabic) show
12 f0 30 512.92 moveto
(alphabet, ISO 8859-6, 1987. Part 7: Latin/Greek alphabet, ISO 8859-7, 1987. Part 8:) show
12 f0 30 499.54 moveto
(Latin/Hebrew alphabet, ISO 8859-8, 1988. Part 9: Latin alphabet No. 5, ISO 8859-9, 1990. ) show
12 f0 0 486.16 moveto
([SGML]) show
12 f0 41.32 486.16 moveto
( ) show
12 f0 30 472.78 moveto
(ISO 8879. Information Processing -- Text and Office Systems -- Standard Generalized Markup) show
12 f0 30 459.4 moveto
(Language \(SGML\), 1986. ) show
12 f0 0 446.02 moveto
([Nicol]) show
12 f0 34.65 446.02 moveto
( ) show
12 f0 30 432.64 moveto
(The Multilingual World Wide Web) show
12 f0 199.98 432.64 moveto
(, Gavin T. Nicol, Electronic Book Technologies, Japan) show
12 f4 30 419.26 moveto
(gtn@ebt.com) show
12 f0 109.26 419.26 moveto
( ) show
12 f0 0 405.88 moveto
([Lee]) show
12 f0 25.98 405.88 moveto
( ) show
12 f0 30 392.5 moveto
(Private communication with Liam Quin, from SoftQuad. ) show
12 f0 0 379.12 moveto
([Spivak]) show
12 f0 41.32 379.12 moveto
( ) show
12 f0 30 365.74 moveto
(Spivak, Michael. Calculus. 2nd Ed. 1967 ISBN 0-914098-77-2 ) show
12 f0 0 352.36 moveto
([GEB]) show
12 f0 31.99 352.36 moveto
( ) show
12 f0 30 338.98 moveto
(Hofstadter, Douglas R. G\366del, Escher, Bach: An Eternal Golden Braid, 1979 ISBN) show
12 f0 30 325.6 moveto
(0-394-75682-7 ) show
12 f0 0 312.22 moveto
([SET] ) show
12 f0 30 298.84 moveto
("Investigations in the foundations of set theory I", in Jean van Heijenoort \(ed.\) _From Frege to) show
12 f0 30 285.46 moveto
(G\366del: A Source Book in Mathematical Logic, 1879-1931_ \(Harvard U.P., 1967\) ) show
pagelevel restore
showpage
%%EOF