RFC373 Arbitrary Character Sets

0373 Arbitrary Character Sets. J. McCarthy. July 1972. (Format: TXT=7783 bytes) (Status: UNKNOWN)

日本語訳
RFC一覧

参照

NWG/RFC #373                                       14 July 1972
NIC 11058                                          SU-AI


                        ARBITRARY CHARACTER SETS

                            by John McCarthy

It would be nice to be able to have documents stored in computers that
could include arbitrary characters and to be able to display them on
any CRT screen, edit them using any keyboard, and print them on any
printer.  The object of this memorandum is to suggest how to get there
from here with special reference to the ARPA network.

Where are we now?

   (1) At present, there is 96 character ASCII, and everyone agrees that
   it should be included in any larger set.

   (2) Many installations are dependent on 64 character sets which do not
   even include the lower case latin alphabet.

   (3) At the Stanford Artificial Intelligence Laboratory, we have a 114
   character set that includes 96 character ASCII and which is
   implemented in our keyboards, displays, and line printer

   (4) Printers are becoming available that get their character designs
   out of memory, for example, the Xerox XGP printer, one of which we are
   getting.

   (5) The IMLAC type display has the character designs in main memory so
   that changing the displayed set is just a matter of reloading the
   memory.

   (6) Many display systems share the character generator among many
   display units.  In some of these, e.g. the Datadisc, arbitrary sets
   are probably feasible (using kludgery to be described later), but in
   other systems, e.g. our III's arbitrary sets are not feasible.

One possible approach to communication in expanded character sets is
to produce an expanded standard set of characters, perhaps using 8 or
9 bits and expect new equipment to implement this set.  This approach
has the disadvantage that it will be very hard to get agreement on
what the next step should be, and even if formal agreement is
realized, many groups will find it in their interest to ignore the
standard.





                                                                [Page 1]

NWG/RFC# 373                        JMC 14-JUL-72 12:41  11058
ARBITRARY CHARACTER SETS  by John McCarthy

Therefore, I would like to suggest that the next step be to arbitrary
character sets.  I suggest implementing this in the following way:

   (1) There be established a registry of characters.  Anyone can
   register a new character.  Each character has a unique number, 17 bits
   should be enough even to include Chinese.  Besides this, each
   character has a name in ASCII usually mnemonic.  Finally, the
   character has a design which is a picture on a 50 by 50 dot matrix.

   (2) Besides the registry of characters, there is a registry of
   characters sets, which different groups are using for different
   classes of documents.  A registered character set has a registry
   number and a table giving the correspondence between the character
   codes as bit sequences and the registered character numbers.

   (3) Associated with a document is a statement of the character code
   used therein.  This may be one of the registered codes or it may
   contain in addition modifications described by an auxiliary table
   giving the code correspondence with registered character numbers.  A
   character code may have an escape character that says that the next
   character is described by its registry number.  The statement of the
   character code may be a header on the document or the receiver may
   have to learn it by some other means, e.g.  because its library
   catalog entry contains this information.

   (4) Devices such as printers and displays draw characters in different
   ways and standardization doesn't seem feasible at present. Therefore,
   it is necessary to provide a way of going from the standard
   description of a character using a 50 by 50 dot matrix to whatever
   method the device uses.  This is up to the programmers who are
   supporting the device.  Some may choose to manually create files
   describing how registered characters are implemented.  They may find
   it too much work to provide for all the characters and to update their
   files when new characters are registered.  Others will provide
   programs for going from the registered descriptions to descriptions
   compatible with their implementations.  Perhaps most will hand tailor
   the characters most used and provide a program for the others.











                                                                [Page 2]

NWG/RFC# 373                        JMC 14-JUL-72 12:41  11058
ARBITRARY CHARACTER SETS  by John McCarthy

   (5) The easiest device to handle is the line printer because it is
   slow.  At the beginning of the print job, the SPOOL program will look
   up the character set and load the printer's memory with the character
   designs used in the particular document.  Sometimes, it may have to go
   through the network to one of the computers that stores the registry
   in order to find out what to do.

   (6) Display systems that have a character memory for each display unit
   can be handled in about the same way.  Users will occasionally
   experience delays when the display programs are surprised by
   unfamiliar characters.

   (7) Display systems that share character memories require more
   complicated treatment.  The object is to keep the memory large enough
   to keep all the characters that the current set of users is using and
   to handle the required table lookups from the different character
   codes in a nice way.  There will be limitations on the diversity of
   character sets that can be in use simultaneously. Systems like the
   Datadisc that only look up the character when it is first written can
   be extended to work with large sets.  Systems that have to look up
   each character code 30 times per second in order to maintain the
   display won't work so well.

I have no special ideas about how to make keyboards adaptable to
arbitrary sets.  Each user may have to fend for himself.

In this memorandum so far, I have ignored typography, i.e. the fact
that in printed documents the same letter may be printed in many
fonts.  Perhaps, each character in each font will require a separate
registered description, but with a constant difference between the
numbers of the same character in different fonts.  Installations will
again have to decide what font distinctions they will implement.

Some other issues that might be considered are whether means can be
provided to adapt texts automatically to the line and page lengths of
the different devices.

It seems to me most likely that the typographical problems cannot be
solved at this time, and it would be best to adopt conventions for
registering character designs at this time, and leave typography for
later.







                                                                [Page 3]

NWG/RFC# 373                        JMC 14-JUL-72 12:41  11058
ARBITRARY CHARACTER SETS  by John McCarthy

In my opinion, there is no real obstacle to establishing the registry
in the ARPA network now, getting the standards organization to work,
and being able to exchange documents in extended character sets as
soon as the various installations can acquire the printers and display
devices.

It is the present policy of the Stanford Artificial Intelligence
Laboratory to acquire no more devices that are wedded to fixed
character sets.





       [ This RFC was put into machine readable form for entry ]
       [ into the online RFC archives by BBN Corp. under the   ]
       [ direction of Alex McKenzie.                      1/97 ]































                                                                [Page 4]

一覧

 RFC 1〜100  RFC 1401〜1500  RFC 2801〜2900  RFC 4201〜4300 
 RFC 101〜200  RFC 1501〜1600  RFC 2901〜3000  RFC 4301〜4400 
 RFC 201〜300  RFC 1601〜1700  RFC 3001〜3100  RFC 4401〜4500 
 RFC 301〜400  RFC 1701〜1800  RFC 3101〜3200  RFC 4501〜4600 
 RFC 401〜500  RFC 1801〜1900  RFC 3201〜3300  RFC 4601〜4700 
 RFC 501〜600  RFC 1901〜2000  RFC 3301〜3400  RFC 4701〜4800 
 RFC 601〜700  RFC 2001〜2100  RFC 3401〜3500  RFC 4801〜4900 
 RFC 701〜800  RFC 2101〜2200  RFC 3501〜3600  RFC 4901〜5000 
 RFC 801〜900  RFC 2201〜2300  RFC 3601〜3700  RFC 5001〜5100 
 RFC 901〜1000  RFC 2301〜2400  RFC 3701〜3800  RFC 5101〜5200 
 RFC 1001〜1100  RFC 2401〜2500  RFC 3801〜3900  RFC 5201〜5300 
 RFC 1101〜1200  RFC 2501〜2600  RFC 3901〜4000  RFC 5301〜5400 
 RFC 1201〜1300  RFC 2601〜2700  RFC 4001〜4100  RFC 5401〜5500 
 RFC 1301〜1400  RFC 2701〜2800  RFC 4101〜4200 

スポンサーリンク

Eclipseの.projectファイルやThumbs.dbをコミットしないようにする設定

ホームページ製作・web系アプリ系の製作案件募集中です。

上に戻る