RFC 2482 Japanese Translation

2482 Language Tagging in Unicode Plain Text. K. Whistler, G. Adams. January 1999. (Format: TXT=27800 bytes) (Status: INFORMATIONAL)

Network Working Group                                       K. Whistler
Request for Comments: 2482                                       Sybase
Category: Informational                                        G. Adams
                                                               Spyglass
                                                           January 1999

                 Language Tagging in Unicode Plain Text

Status of this Memo

   This memo provides information for the Internet community.  It does
   not specify an Internet standard of any kind.  Distribution of this
   memo is unlimited.

Copyright Notice

   Copyright (C) The Internet Society (1999).  All Rights Reserved.

IESG Note:

   This document has been accepted by ISO/IEC JTC1/SC2/WG2 in meeting
   #34 to be submitted as a recommendation from WG2 for inclusion in
   Plane 14 in part 2 of ISO/IEC 10646.

1.  Abstract

   This document proposes a mechanism for language tagging in [UNICODE]
   plain text. A set of special-use tag characters on Plane 14 of
   [ISO10646] (accessible through UTF-8, UTF-16, and UCS-4 encoding
   forms) is proposed for encoding to enable the spelling out of
   ASCII-based string tags using characters which can be strictly
   separated from ordinary text content characters in ISO10646 (or
   UNICODE).

   One tag identification character and one cancel tag character are
   also proposed. In particular, a language tag identification character
   is proposed to identify a language tag string specifically; the
   language tag itself makes use of [RFC1766] language tag strings
   spelled out using the Plane 14 tag characters. Provision of a
   specific, low-overhead mechanism for embedding language tags in plain
   text is aimed at meeting the need of Internet Protocols such as ACAP,
   which require a standard mechanism for marking language in UTF-8
   strings.

   The tagging mechanism as well as the characters proposed in this
   document have been approved by the Unicode Consortium for inclusion
   in The Unicode Standard.  However, implementation of this decision

Whistler & Adams             Informational                      [Page 1]

RFC 2482         Language Tagging in Unicode Plain Text     January 1999

   awaits formal acceptance by ISO JTC1/SC2/WG2, the working group
   responsible for ISO10646. Potential implementers should be aware that
   until this formal acceptance occurs, any usage of the characters
   proposed herein is strictly experimental and not sanctioned for
   standardized character data interchange.

2.  Definitions and Notation

   No attempt is made to define all terms used in this document. In
   particular, the terminology pertaining to the subject of coded
   character systems is not explicitly specified. See [UNICODE],
   [ISO10646], and [RFC2130] for additional definitions in this area.

2.1 Requirements Notation

   This document occasionally uses terms that appear in capital letters.
   When the terms "MUST", "SHOULD", "MUST NOT", "SHOULD NOT", and "MAY"
   appear capitalized, they are being used to indicate particular
   requirements of this specification. A discussion of the meanings of
   these terms appears in [RFC2119].

2.2 Definitions

   The terms defined below are used in special senses and thus warrant
   some clarification.

2.2.1 Tagging

   The association of attributes of text with a point or range of the
   primary text. (The value of a particular tag is not generally
   considered to be a part of the "content" of the text. Typical
   examples of tagging are marking the language or font of a portion
   of text.)

2.2.2 Annotation

   The association of secondary textual content with a point or range of
   the primary text. (The value of a particular annotation *is*
   considered to be a part of the "content" of the text. Typical
   examples include glossing, citations, exemplification, Japanese yomi,
   etc.)

2.2.3 Out-of-band

   An out-of-band channel conveys a tag in such a way that the textual
   content, as encoded, is completely untouched and unmodified. This is
   typically done by metadata or hyperstructure of some sort.

2.2.4 In-band

   An in-band channel conveys a tag along with the textual content,
   using the same basic encoding mechanism as the text itself. This is
   done by various means, but an obvious example is SGML markup, where
   the tags are encoded in the same character set as the text and are
   interspersed with and carried along with the text data.

3.0 Background

   There has been much discussion over the last 8 years of language
   tagging and of other kinds of tagging of Unicode plain text. It is
   fair to say that there is more-or-less universal agreement that
   language tagging of Unicode plain text is required for certain
   textual processes. For example, language "hinting" of multilingual
   text is necessary for multilingual spell-checking based on multiple
   dictionaries to work well.  Language tagging provides a minimum level
   of required information for text-to-speech processes to work
   correctly.  Language tagging is regularly done on web pages, to
   enable selection of alternate content, for example.

   However, there has been a great deal of controversy regarding the
   appropriate placement of language tags. Some have held that the only
   appropriate placement of language tags (or other kinds of tags) is
   out-of-band, making use of attributed text structures or metadata.
   Others have argued that there are requirements for lower-complexity
   in-band mechanisms for language tags (or other tags) in plain text.

   The controversy has been muddied by the existence and widespread use
   of a number of in-band text markup mechanisms (HTML, text/enriched,
   etc.) which enable language tagging, but which imply the use of
   general parsing mechanisms which are deemed too "heavyweight" for
   protocol developers and a number of other applications. The
   difficulty of using general in-band text markup for simple protocols
   derives from the fact that some characters are used both for textual
   content and for the text markup; this makes it more difficult to
   write simple, fast algorithms to find only the textual content and
   ignore the tags, or vice versa. (Think of this as the algorithmic
   equivalent of the difficulty the human reader has attempting to read
   just the content of raw HTML source text without a browser
   interpreting all the markup tags.)

   The Plane 14 proposal addresses the recurrent and persistent call for
   a lighter-weight mechanism for text tagging than typical text markup
   mechanisms in Unicode. It proposes a special set of characters used
   *only* for tagging. These tag characters can be embedded into plain
   text and can be identified and/or ignored with trivial algorithms,
   since there is no overloading of usage for these tag characters--they
   can only express tag values and never textual content itself.

   The Plane 14 proposal is not intended for general annotation of text,
   such as textual citations, phonetic readings (e.g.  Japanese Yomi),
   etc. In its present form, its use is intended to be restricted solely
   to specifying in-line language tags.  Future extensions may widen
   this scope of intended usage.

4.0 Proposal

   This proposal suggests the use of 97 dedicated tag characters encoded
   at the start of Plane 14 of ISO/IEC 10646 consisting of a clone of
   the 94 printable 7-bit ASCII graphic characters and ASCII SPACE, as
   well as a tag identification character and a tag cancel character.

   These tag characters are to be used to spell out any ASCII-based
   tagging scheme which needs to be embedded in Unicode plain text. In
   particular, they can be used to spell out language tags in order to
   meet the expressed requirements of the ACAP protocol and the likely
   requirements of other new protocols following the guidelines of the
   IAB character workshop (RFC 2130).

   The suggested range in Plane 14 for the block reserved for tag
   characters is as follows, expressed in each of the three most
   generally used encoding schemes for ISO/IEC 10646:

   UCS-4

   U-000E0000 .. U-000E007F

   UTF-16

   U+DB40 U+DC00 .. U+DB40 U+DC7F

   UTF-8

   0xF3 0xA0 0x80 0x80 .. 0xF3 0xA0 0x81 0xBF

   Of this range, U-000E0020 .. U-000E007E is the suggested range for
   the ASCII clone tag characters themselves.
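
   As an illustration of how the three forms relate, the sketch below
   derives the UTF-16 and UTF-8 forms from a UCS-4 value using the
   standard encoding rules for code points above U+FFFF. The function
   names to_utf16_pair and to_utf8_4 are hypothetical, chosen here for
   illustration only; they are not part of this proposal.

```c
#include <stdint.h>

/* Encode a code point in 0x10000..0x10FFFF as a UTF-16 surrogate pair.
 * Hypothetical helper; the caller must pass a value in that range. */
void to_utf16_pair(uint32_t cp, uint16_t out[2])
{
    uint32_t v = cp - 0x10000u;                 /* 20 significant bits */
    out[0] = (uint16_t)(0xD800u + (v >> 10));   /* high surrogate */
    out[1] = (uint16_t)(0xDC00u + (v & 0x3FFu)); /* low surrogate */
}

/* Encode a code point in 0x10000..0x10FFFF as four UTF-8 bytes.
 * Hypothetical helper; the caller must pass a value in that range. */
void to_utf8_4(uint32_t cp, uint8_t out[4])
{
    out[0] = (uint8_t)(0xF0u | (cp >> 18));
    out[1] = (uint8_t)(0x80u | ((cp >> 12) & 0x3Fu));
    out[2] = (uint8_t)(0x80u | ((cp >> 6) & 0x3Fu));
    out[3] = (uint8_t)(0x80u | (cp & 0x3Fu));
}
```

   Applied to U-000E0000 and U-000E007F, these rules yield the
   surrogate pairs and byte sequences listed above.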

4.1 Names for the Tag Characters

   The names for the ASCII clone tag characters should be exactly the
   ISO 10646 names for 7-bit ASCII, prefixed with the word "TAG".

   In addition, there is one tag identification character and a CANCEL
   TAG character. The use and syntax of these characters are described
   in detail below.

   The entire encoding for the proposed Plane 14 tag characters and
   names of those characters can be derived from the following list.
   (The encoded values here and throughout this proposal are listed in
   UCS-4 form, which is easiest to interpret. It is assumed that most
   Unicode applications will, however, be making use either of UTF-16 or
   UTF-8 encoding forms for actual implementation.)

   U-000E0000  <reserved>
   U-000E0001  LANGUAGE TAG
   U-000E0002  <reserved>
   ...
   U-000E001F  <reserved>
   U-000E0020  TAG SPACE
   U-000E0021  TAG EXCLAMATION MARK
   ...
   U-000E0041  TAG LATIN CAPITAL LETTER A
   ...
   U-000E007A  TAG LATIN SMALL LETTER Z
   ...
   U-000E007E  TAG TILDE
   U-000E007F  CANCEL TAG

4.2 Range Checking for Tag Characters

   The range checks required for code testing for tag characters would
   be as follows. The same range check is expressed here in C for each
   of the three significant encoding forms for 10646.

Range check expressed in UCS-4:

if ( ( *s >= 0xE0000 ) && ( *s <= 0xE007F ) )

Range check expressed in UTF-16 (Unicode):

if ( ( *s == 0xDB40 ) && ( *(s+1) >= 0xDC00 ) && ( *(s+1) <= 0xDC7F ) )

Expressed in UTF-8:

if ( ( *s == 0xF3 ) && ( *(s+1) == 0xA0 ) && ( ( *(s+2) & 0xFE ) == 0x80 ) )

   Because of the choice of the range for the tag characters, it would
   also be possible to express the range check for UCS-4 or UTF-16 in
   terms of bitmask operations, as well.
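
   A minimal sketch of such bitmask checks follows. It relies on the
   block starting at a multiple of 128 and spanning exactly 128 code
   points, so that every value in the block shares the same bits above
   the low seven. The function names are hypothetical, and the UTF-16
   variant assumes the caller guarantees that s[1] is readable
   whenever s[0] is.

```c
#include <stdint.h>

/* UCS-4: clear the low seven bits and compare with the block base. */
int is_tag_char_ucs4(uint32_t c)
{
    return (c & 0xFFFFFF80u) == 0x000E0000u;
}

/* UTF-16: the high surrogate is fixed at 0xDB40; the low surrogate
 * must lie in 0xDC00..0xDC7F, tested with the same seven-bit mask.
 * s[1] is only read when s[0] already matches. */
int is_tag_char_utf16(const uint16_t *s)
{
    return s[0] == 0xDB40u && (s[1] & 0xFF80u) == 0xDC00u;
}
```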

4.3 Syntax for Embedding Tags

   The use of the Plane 14 tag characters is very simple. In order to
   embed any ASCII-derived tag in Unicode plain text, the tag is simply
   spelled out with the tag characters instead, prefixed with the
   relevant tag identification character. The resultant string is
   embedded directly in the text.
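
   The spelling-out step can be sketched as follows for a language
   tag held in UCS-4 form. The function name spell_language_tag and
   its buffer-based interface are assumptions made for illustration;
   the mapping itself (ASCII value plus 0xE0000, prefixed by
   U-000E0001 LANGUAGE TAG) follows from the code chart above.

```c
#include <stddef.h>
#include <stdint.h>

#define LANGUAGE_TAG 0x000E0001u  /* tag identification character */
#define TAG_BASE     0x000E0000u  /* tag clone = TAG_BASE + ASCII value */

/* Write LANGUAGE TAG followed by the tag-character spelling of the
 * ASCII string `ascii` into out[0..cap-1]. Returns the number of code
 * points written, or 0 if the input is empty, contains a character
 * outside 0x20..0x7E, or does not fit. Hypothetical helper. */
size_t spell_language_tag(const char *ascii, uint32_t *out, size_t cap)
{
    size_t n = 0;

    if (cap == 0 || *ascii == '\0')
        return 0;
    out[n++] = LANGUAGE_TAG;
    for (const char *p = ascii; *p != '\0'; ++p) {
        unsigned char ch = (unsigned char)*p;
        if (ch < 0x20 || ch > 0x7E || n == cap)
            return 0;
        out[n++] = TAG_BASE + ch;
    }
    return n;
}
```

   For the ASCII tag "ja", this produces the three code points
   U-000E0001, U-000E006A, U-000E0061.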

   The tag identification character is used as a mechanism for
   identifying tags of different types. This enables multiple types of
   tags to coexist amicably embedded in plain text and solves the
   problem of delimitation if a tag is concatenated directly onto
   another tag. Although only one type of tag is currently specified,
   namely the language tag, the encoding of other tag identification
   characters in the future would allow for distinct tag types to be
   used.

   No termination character is required for a tag. A tag terminates
   either when the first non Plane 14 Tag Character (i.e. any other
   normal Unicode value) is encountered, or when the next tag
   identification character is encountered.
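
   The termination rule can be sketched as a scan over UCS-4 values.
   The function name tag_length is hypothetical. It assumes s[0] is
   the LANGUAGE TAG character and returns how many code points the
   tag occupies, stopping at the first ordinary character or at the
   next tag identification character.

```c
#include <stddef.h>
#include <stdint.h>

#define LANGUAGE_TAG 0x000E0001u  /* tag identification character */
#define TAG_FIRST    0x000E0020u  /* TAG SPACE */
#define TAG_LAST     0x000E007Eu  /* TAG TILDE */
#define CANCEL_TAG   0x000E007Fu

/* Length in code points of the language tag starting at s[0], or 0 if
 * s[0] is not LANGUAGE TAG. A result of 1 means the identification
 * character had no argument, which is malformed. Hypothetical helper. */
size_t tag_length(const uint32_t *s, size_t len)
{
    size_t i = 1;

    if (len == 0 || s[0] != LANGUAGE_TAG)
        return 0;
    if (len > 1 && s[1] == CANCEL_TAG)
        return 2;                       /* LANGUAGE TAG + CANCEL TAG */
    while (i < len && s[i] >= TAG_FIRST && s[i] <= TAG_LAST)
        ++i;                            /* consume the tag argument */
    return i;
}
```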

   All tag arguments must be encoded only with the tag characters
   U-000E0020 .. U-000E007E. No other characters are valid for
   expressing the tag argument.
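
   Because tag arguments, the tag identification character, and
   CANCEL TAG all lie in the tag block, and no content character
   does, text can be reduced to its content by filtering on that
   range alone. The following is an illustrative sketch; the name
   strip_tag_chars is hypothetical, and the choice to drop the
   reserved positions of the block as well is an assumption.

```c
#include <stddef.h>
#include <stdint.h>

/* Remove every code point in U-000E0000..U-000E007F from s in place
 * and return the new length. Hypothetical helper. */
size_t strip_tag_chars(uint32_t *s, size_t len)
{
    size_t w = 0;                       /* write position */

    for (size_t r = 0; r < len; ++r) {
        if (s[r] < 0x000E0000u || s[r] > 0x000E007Fu)
            s[w++] = s[r];              /* keep ordinary content */
    }
    return w;
}
```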

   A detailed BNF syntax for tags is listed below.

4.4 Tag Scope and Nesting

   The value of an established tag continues from the point the tag is
   embedded in text until either:

      A. The text itself goes out of scope, as defined by the
         application. (E.g. for line-oriented protocols, when reaching
         the end-of-line or end-of-string; for text streams, when
         reaching the end-of-stream; etc.)

or

      B. The tag is explicitly cancelled by the CANCEL TAG character.

   Tags of the same type cannot be nested in any way. The appearance of
   a new embedded language tag, for example, after text which was
   already language tagged, simply changes the tagged value for
   subsequent text to that specified in the new tag.

   Tags of different type can have interdigitating scope, but not
   hierarchical scope. In effect, tags of different type completely
   ignore each other, so that the use of language tags can be completely
   asynchronous with the use of character set source tags (or any other
   tag type) in the same text in the future.
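
   The scope rules for language tags amount to a single piece of
   state that each new tag overwrites and each CANCEL TAG clears. The
   sketch below is one way to express this over UCS-4 text; the name
   language_in_effect and its interface are hypothetical, and the
   end-of-scope condition (A) is assumed to be handled by the caller
   choosing where the text begins and ends.

```c
#include <stddef.h>
#include <stdint.h>

#define LANGUAGE_TAG 0x000E0001u  /* tag identification character */
#define TAG_FIRST    0x000E0020u  /* TAG SPACE */
#define TAG_LAST     0x000E007Eu  /* TAG TILDE */
#define CANCEL_TAG   0x000E007Fu

/* Copy into lang (NUL-terminated, truncated to cap - 1 characters) the
 * ASCII form of the language tag in effect at s[pos]. An empty string
 * means the text at that point is untagged. Hypothetical helper. */
void language_in_effect(const uint32_t *s, size_t pos, char *lang, size_t cap)
{
    size_t n = 0;                       /* length of current tag value */
    size_t i = 0;

    while (i < pos) {
        if (s[i] == LANGUAGE_TAG) {
            n = 0;                      /* new tag replaces the old one */
            ++i;
            while (i < pos && s[i] >= TAG_FIRST && s[i] <= TAG_LAST) {
                if (n + 1 < cap)
                    lang[n++] = (char)(s[i] - 0x000E0000u);
                ++i;
            }
        } else {
            if (s[i] == CANCEL_TAG)
                n = 0;                  /* back to the untagged default */
            ++i;
        }
    }
    if (cap > 0)
        lang[n] = '\0';
}
```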

4.5 Cancelling Tag Values

   U-000E007F CANCEL TAG is provided to allow the specific cancelling of
   a tag value. The use of CANCEL TAG has the following syntax.  To
   cancel a tag value of a particular type, prefix the CANCEL TAG
   character with the tag identification character of the appropriate
   type. For example, the complete string to cancel a language tag is:

   U-000E0001 U-000E007F

   The value of the relevant tag type returns to the default state for
   that tag type, namely: no tag value specified, the same as untagged
   text.

   The use of CANCEL TAG without a prefixed tag identification character
   cancels *any* Plane 14 tag values which may be defined. Since only
   language tags are currently provided with an explicit tag
   identification character, only language tags are currently affected.

   The main function of CANCEL TAG is to make possible such operations
   as blind concatenation of strings in a tagged context without the
   propagation of inappropriate tag values across the string boundaries.
   For example, a string tagged with a Japanese language tag can have
   its tag value "sealed off" with a terminating CANCEL TAG before
   another string of unknown language value is concatenated to it. This
   would prevent the string of unknown language from being erroneously
   marked as being Japanese simply because of a concatenation to a
   Japanese string.
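
   A sketch of this sealing-off step over UCS-4 buffers is given
   below. The name concat_sealed and its interface are hypothetical.
   It uses the two-character language-specific cancel sequence
   (U-000E0001 U-000E007F) described above; a bare CANCEL TAG could
   be used instead where all tag types should be cancelled.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define LANGUAGE_TAG 0x000E0001u  /* tag identification character */
#define CANCEL_TAG   0x000E007Fu

/* Write a, then LANGUAGE TAG + CANCEL TAG, then b into out[0..cap-1].
 * Returns the resulting length in code points, or 0 if the result
 * does not fit. Hypothetical helper. */
size_t concat_sealed(const uint32_t *a, size_t alen,
                     const uint32_t *b, size_t blen,
                     uint32_t *out, size_t cap)
{
    if (cap < 2 || alen > cap - 2 || blen > cap - 2 - alen)
        return 0;
    memcpy(out, a, alen * sizeof *out);
    out[alen]     = LANGUAGE_TAG;       /* seal off a's language value */
    out[alen + 1] = CANCEL_TAG;
    memcpy(out + alen + 2, b, blen * sizeof *out);
    return alen + 2 + blen;
}
```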

4.6 Tag Syntax Description

   An extended BNF (Backus-Naur Form) description of the tags specified
   in this proposal is found below.  Note the following BNF extensions
   used in this formalism:

   1. Semantic constraints are specified by rules in the form of an
      assertion specified between double braces; the variable $$ denotes
      the string consisting of all terminal symbols matched by this
      non-terminal.

      Example:   {{ Assert ( $$[0] == '?' ); }}

      Meaning:   The first character of the string matched by this
                 non-terminal must be '?'

   2. A number of predicate functions are employed in semantic
      constraint rules which are not otherwise defined; their name is
      sufficient for determining their predication.

      Example:   IsRFC1766LanguageIdentifier ( tag-argument )

      Meaning:   tag-argument is a valid RFC1766 language identifier

   3. A lexical expander function, TAG, is employed to denote the tag
      form of an ASCII character; the argument to this function is
      either a character or a character set specified by a range or
      enumeration expression.

      Example:   TAG('-')

      Meaning:   TAG HYPHEN-MINUS

      Example:   TAG([A-Z])

      Meaning:   TAG LATIN CAPITAL LETTER A ...
                 TAG LATIN CAPITAL LETTER Z

   4. A macro is employed to denote terminal symbols that are character
      literals which can't be directly represented in ASCII. The
      argument to the macro is the UNICODE (ISO/IEC 10646) character
      name.

      Example:   '${TAG CANCEL}'

      Meaning:   character literal whose code value is U-000E007F

   5. Occurrence indicators used are '+' (one or more) and '*' (zero or
      more); optional occurrence is indicated by enclosure in '[' and
      ']'.

4.6.1 Formal Tag Syntax

tag                     :   language-tag
                        |   cancel-all-tag
                        ;

language-tag            :   language-tag-introducer language-tag-argument
                        ;

language-tag-argument   :   tag-argument
              {{ Assert ( IsRFC1766LanguageIdentifier ( $$ ) ); }}
                        |   tag-cancel
                        ;

cancel-all-tag          :   tag-cancel
                        ;

tag-argument            :   tag-character+
                        ;

tag-character           :   { c : c in
              TAG( { a : a in printable ASCII characters or SPACE } ) }
                        ;

language-tag-introducer :   '${TAG LANGUAGE}'
                        ;

tag-cancel              :   '${TAG CANCEL}'
                        ;
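
   The grammar above can be exercised with a small scanner. The sketch
   below is one possible reading, not a normative parser: it recognizes
   a language tag (introducer plus one or more tag characters), a
   cancellation of the language tag (introducer plus TAG CANCEL), and a
   cancel-all tag (TAG CANCEL alone). The function names and the
   RFC1766_SHAPE pattern are assumptions made for illustration; the
   pattern checks only the syntactic shape of an RFC 1766 tag and does
   not verify that the value is registered.

```python
import re

LANGUAGE_TAG = 0xE0001   # '${TAG LANGUAGE}'
CANCEL_TAG = 0xE007F     # '${TAG CANCEL}'

# Assumed syntactic check only: 1 to 8 letters per subtag, hyphen separated.
RFC1766_SHAPE = re.compile(r"[A-Za-z]{1,8}(-[A-Za-z]{1,8})*")


def is_tag_character(cp: int) -> bool:
    """True for TAG(SPACE) through TAG('~'), i.e. the tag-character rule."""
    return 0xE0020 <= cp <= 0xE007E


def parse_tags(text: str) -> list:
    """Return (kind, value) pairs for each tag found in text, in order."""
    found = []
    cps = [ord(c) for c in text]
    i = 0
    while i < len(cps):
        cp = cps[i]
        if cp == CANCEL_TAG:                       # cancel-all-tag
            found.append(("cancel-all", None))
            i += 1
        elif cp == LANGUAGE_TAG:                   # language-tag
            i += 1
            if i < len(cps) and cps[i] == CANCEL_TAG:
                found.append(("cancel-language", None))
                i += 1
                continue
            start = i
            while i < len(cps) and is_tag_character(cps[i]):
                i += 1
            value = "".join(chr(c - 0xE0000) for c in cps[start:i])
            if not RFC1766_SHAPE.fullmatch(value):
                raise ValueError(f"malformed language tag argument: {value!r}")
            found.append(("language", value))
        else:
            i += 1   # ordinary text, or a tag character with no introducer
    return found
```

   An empty argument is rejected because tag-argument requires at least
   one tag-character. Tag characters that appear without an introducer
   are skipped here; a stricter implementation might report them.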

5.0 Tag Types

5.1 Language Tags

   Language tags are of general interest and should have a high degree
   of interoperability for protocol usage. To this end, a specific
   LANGUAGE TAG tag identification character is provided.  A Plane 14
   tag string prefixed by U-000E0001 LANGUAGE TAG is specified to
   constitute a language tag. Furthermore, the tag values for the
   language tag are to be spelled out as specified in RFC 1766, making
   use only of registered tag values or of user-defined language tags
   starting with the characters "x-".

   For example, to embed a language tag for Japanese, the Plane 14
   characters would be used as follows. The Japanese tag from RFC 1766
   is "ja" (composed of ISO 639 language id) or, alternatively, "ja-JP"
   (composed of ISO 639 language id plus ISO 3166 country id).  Since
   RFC 1766 specifies that language tags are not case significant, it is
   recommended that for language tags, the entire tag be lowercased
   before conversion to Plane 14 tag characters. (This would not be
   required for Unicode conformance, but should be followed as general
   practice by protocols making use of RFC 1766 language tags, to
   simplify and speed up the processing for operations which need to
   identify or ignore language tags embedded in text.) Lowercasing,
   rather than uppercasing, is recommended because it follows the
   majority practice of expressing language tag values in lowercase
   letters.

   Thus the entire language tag (in its longer form) would be converted
   to Plane 14 tag characters as follows:

   U-000E0001 U-000E006A U-000E0061 U-000E002D U-000E006A U-000E0070

   The language tag (in its shorter, "ja" form) could be expressed as
   follows:

   U-000E0001 U-000E006A U-000E0061

   The value of this string is then expressed in whichever encoding form
   (UCS-4, UTF-16, UTF-8) is required and embedded in text at the
   relevant point.
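
   The conversion described in this section can be sketched as follows.
   The function name language_tag is hypothetical. The sketch lowercases
   the RFC 1766 tag as recommended above, prefixes U-000E0001 LANGUAGE
   TAG, and then serializes the result in each encoding form. It assumes
   that Python's "utf-32-be" codec yields the same octets as UCS-4 for
   these code values.

```python
LANGUAGE_TAG = "\U000E0001"   # U-000E0001 LANGUAGE TAG


def language_tag(rfc1766_tag: str) -> str:
    """Spell out an RFC 1766 tag as a Plane 14 language tag string."""
    lowered = rfc1766_tag.lower()          # recommended, not required
    if not all(0x20 <= ord(c) <= 0x7E for c in lowered):
        raise ValueError("tag must be printable ASCII or SPACE")
    return LANGUAGE_TAG + "".join(chr(0xE0000 + ord(c)) for c in lowered)


tagged = language_tag("ja-JP")

# prints U-000E0001 U-000E006A U-000E0061 U-000E002D U-000E006A U-000E0070
print(" ".join(f"U-{ord(c):08X}" for c in tagged))

# The same string in each encoding form, shown as hexadecimal octets.
print(tagged.encode("utf-8").hex(" "))
print(tagged.encode("utf-16-be").hex(" "))
print(tagged.encode("utf-32-be").hex(" "))
```

   In UTF-16 each tag character occupies a surrogate pair, and in UTF-8
   each occupies four octets, so the six-character tag above takes 24
   octets in all three forms.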

5.2 Additional Tags

   Additional tag identification characters might be defined in the
   future. An example would be a CHARACTER SET SOURCE TAG, or a GENERIC
   TAG for private definition of tags.

   In each case, when a specific tag identification character is
   encoded, a corresponding reference standard for the values of the
   tags associated with the identifier should be designated, so that
   interoperating parties which make use of the tags will know how to
   interpret the values the tags may take.

6.0 Display Issues

   All characters in the tag character block are considered to have no
   visible rendering in normal text. A process which interprets tags may
   choose to modify the rendering of text based on the tag values (for
   example, changing the font to the preferred style for rendering
   Chinese versus Japanese). The tag characters themselves have no
   display; they may be considered similar to a U+200B ZERO WIDTH SPACE
   in that regard. The tag characters also do not affect breaking,
   joining, or any other format or layout properties, except insofar as
   the process interpreting the tag chooses to impose such behavior
   based on the tag value.

   For debugging or other operations which must render the tags
   themselves visible, it is advisable that the tag characters be
   rendered using the corresponding ASCII character glyphs (perhaps
   modified systematically to differentiate them from normal ASCII
   characters). But, as noted below, the tag character values are chosen
   so that even without display support, the tag characters will be
   interpretable in most debuggers.
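
   A debugging aid along these lines might look like the sketch below.
   The brace notation is an arbitrary choice made here to set the
   revealed tag characters apart from ordinary ASCII; this document
   does not prescribe any particular visible form, and the function
   name reveal_tags is hypothetical.

```python
def reveal_tags(text: str) -> str:
    """Replace invisible Plane 14 tag characters with a visible ASCII form."""
    out = []
    for ch in text:
        cp = ord(ch)
        if cp == 0xE0001:
            out.append("{LANGUAGE TAG}")
        elif cp == 0xE007F:
            out.append("{TAG CANCEL}")
        elif 0xE0020 <= cp <= 0xE007E:
            # Low-order bits of the code value give the ASCII counterpart.
            out.append("{" + chr(cp - 0xE0000) + "}")
        else:
            out.append(ch)   # ordinary text is passed through unchanged
    return "".join(out)


print(reveal_tags("\U000E0001\U000E006A\U000E0061abc\U000E007F"))
# prints {LANGUAGE TAG}{j}{a}abc{TAG CANCEL}
```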

7.0 Unicode Conformance Issues

   The basic rules for Unicode conformance for the tag characters are
   exactly the same as for any other Unicode characters. A conformant
   process is not required to interpret the tag characters. If it does
   not interpret tag characters, it should leave their values
   undisturbed and do whatever it does with any other uninterpreted
   characters. If it does interpret them, it should interpret them
   according to the standard, i.e. as spelled-out tags.

   So for a non-TagAware Unicode application, any language tag
   characters (or any other kind of tag expressed with Plane 14 tag
   characters) encountered would be handled exactly as for uninterpreted
   Tibetan from the BMP, uninterpreted Linear B from Plane 1, or
   uninterpreted Egyptian hieroglyphics from private use space in Plane
   15.

   A TagAware but TagPhobic Unicode application can recognize the tag
   character range in Plane 14 and choose to deliberately strip those
   characters out completely to produce plain text with no tags.
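
   A TagPhobic filter of this kind can be very small. The sketch below
   removes U-000E0001 LANGUAGE TAG and the characters U-000E0020 through
   U-000E007F, which covers the tag characters and TAG CANCEL used in
   this document. The names TAG_CHARACTERS and strip_tags are
   illustrative.

```python
import re

# LANGUAGE TAG, TAG SPACE through TAG TILDE, and TAG CANCEL.
TAG_CHARACTERS = re.compile("[\U000E0001\U000E0020-\U000E007F]")


def strip_tags(text: str) -> str:
    """Return text with all Plane 14 tag characters removed."""
    return TAG_CHARACTERS.sub("", text)


tagged = "\U000E0001\U000E006A\U000E0061日本語\U000E007F"
print(strip_tags(tagged))   # prints 日本語
```

   Because the tag characters occupy their own range, this filter never
   needs to inspect or understand the tag values themselves.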

   The presence of a correctly formed tag cannot be taken as a guarantee
   that the data so tagged is correctly tagged. For example, nothing
   prevents an application from erroneously labelling French data as
   Spanish, or from labelling JIS-derived data as Japanese, even if it
   contains Greek or Cyrillic characters.

7.1 Note on Encoding Language Tags

   The fact that this proposal for encoding tag characters in Unicode
   includes a mechanism for specifying language tag values does not mean
   that Unicode is departing from one of its basic encoding principles:

       Unicode encodes scripts, not languages.

   This is still true of the Unicode encoding (and ISO/IEC 10646), even
   in the presence of a mechanism for specifying language tags in plain
   text.  There is nothing obligatory about the use of Plane 14 tags,
   whether for language tags or any other kind of tags.

   Language tagging in no way impacts current encoded characters or the
   encoding of future scripts.

   It is fully anticipated that implementations of Unicode which already
   make use of out-of-band mechanisms for language tagging or "heavy-
   weight" in-band mechanisms such as HTML will continue to do exactly
   what they are doing and will ignore Plane 14 tag characters
   completely.

8.0 Security Considerations

   There are no known security issues raised by this document.

References

   [ISO10646] ISO/IEC 10646-1:1993 International Organization for
              Standardization.  "Information Technology -- Universal
              Multiple-Octet Coded Character Set (UCS) -- Part 1:
              Architecture and Basic Multilingual Plane", Geneva, 1993.

   [RFC1766]  Alvestrand, H., "Tags for the Identification of
              Languages", RFC 1766, March 1995.

   [RFC2070]  Yergeau, F., Nicol, G., Adams, G. and M. Duerst,
              "Internationalization of the Hypertext Markup Language",
              RFC 2070, January 1997.

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119, March 1997.

   [RFC2130]  Weider, C., Preston, C., Simonsen, K., Alvestrand, H.,
              Atkinson, R., Crispin, M. and P. Svanberg, "The Report of
              the IAB Character Set Workshop held 29 February - 1 March,
              1996", RFC 2130, April 1997.

   [UNICODE]  The Unicode Standard, Version 2.0, The Unicode Consortium,
              Addison-Wesley, July 1996.

Acknowledgements

   The following people also contributed to this document, directly or
   indirectly: Chris Newman, Mark Crispin, Rick McGowan, Joe Becker,
   John Jenkins, and Asmus Freytag. This document was also reviewed by
   the Unicode Technical Committee, and the authors wish to thank all of
   the UTC representatives for their input. The authors are, of course,
   responsible for any errors or omissions which may remain in the text.

Authors' Addresses

   Ken Whistler
   Sybase, Inc.
   6475 Christie Ave.
   Emeryville, CA 94608-1050

   Phone: +1 510 922 3611
   EMail: kenw@sybase.com

   Glenn Adams
   Spyglass, Inc.
   One Cambridge Center
   Cambridge, MA 02142

   Phone: +1 617 679 4652
   EMail: glenn@spyglass.com

Full Copyright Statement

   Copyright (C) The Internet Society (1999).  All Rights Reserved.

   This document and translations of it may be copied and furnished to
   others, and derivative works that comment on or otherwise explain it
   or assist in its implementation may be prepared, copied, published
   and distributed, in whole or in part, without restriction of any
   kind, provided that the above copyright notice and this paragraph are
   included on all such copies and derivative works.  However, this
   document itself may not be modified in any way, such as by removing
   the copyright notice or references to the Internet Society or other
   Internet organizations, except as needed for the purpose of
   developing Internet standards in which case the procedures for
   copyrights defined in the Internet Standards process must be
   followed, or as required to translate it into languages other than
   English.

   The limited permissions granted above are perpetual and will not be
   revoked by the Internet Society or its successors or assigns.

   This document and the information contained herein is provided on an
   "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING
   TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING
   BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION
   HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF
   MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.
