RFC 2781

2781 UTF-16, an encoding of ISO 10646. P. Hoffman, F. Yergeau. February 2000. (Format: TXT=29870 bytes) (Status: INFORMATIONAL)

Network Working Group                                        P. Hoffman
Request for Comments: 2781                     Internet Mail Consortium
Category: Informational                                      F. Yergeau
                                                      Alis Technologies
                                                          February 2000

                    UTF-16, an encoding of ISO 10646

Status of this Memo

   This memo provides information for the Internet community.  It does
   not specify an Internet standard of any kind.  Distribution of this
   memo is unlimited.

Copyright Notice

   Copyright (C) The Internet Society (2000).  All Rights Reserved.

1. Introduction

   This document describes the UTF-16 encoding of Unicode/ISO-10646,
   addresses the issues of serializing UTF-16 as an octet stream for
   transmission over the Internet, discusses MIME charset naming as
   described in [CHARSET-REG], and contains the registration for three
   MIME charset parameter values: UTF-16BE (big-endian), UTF-16LE
   (little-endian), and UTF-16.

1.1 Background and motivation

   The Unicode Standard [UNICODE] and ISO/IEC 10646 [ISO-10646] jointly
   define a coded character set (CCS), hereafter referred to as Unicode,
   which encompasses most of the world's writing systems [WORKSHOP].
   UTF-16, the object of this specification, is one of the standard ways
   of encoding Unicode character data; it has the characteristics of
   encoding all currently defined characters (in plane 0, the BMP) in
   exactly two octets and of being able to encode all other characters
   likely to be defined (the next 16 planes) in exactly four octets.

   The Unicode Standard further defines additional character properties
   and other application details of great interest to implementors. Up
   to the present time, changes in Unicode and amendments to ISO/IEC
   10646 have tracked each other, so that the character repertoires and
   code point assignments have remained in sync. The relevant
   standardization committees have committed to maintain this very
   useful synchronism, as well as not to assign characters outside of
   the 17 planes accessible to UTF-16.

Hoffman & Yergeau            Informational                      [Page 1]

RFC 2781            UTF-16, an encoding of ISO 10646       February 2000

   The IETF policy on character sets and languages [CHARPOLICY] says
   that IETF protocols MUST be able to use the UTF-8 character encoding
   scheme [UTF-8]. Some products and network standards already specify
   UTF-16, making it an important encoding for the Internet. This
   document is not an update to the [CHARPOLICY] document, only a
   description of the UTF-16 encoding.

1.2 Terminology

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED",  "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119 [MUSTSHOULD].

   Throughout this document, character values are shown in hexadecimal
   notation. For example, "0x013C" is the character assigned the
   integer value 316 (decimal) in the CCS.

2. UTF-16 definition

   UTF-16 is described in the Unicode Standard, version 3.0 [UNICODE].
   The definitive reference is Annex Q of ISO/IEC 10646-1 [ISO-10646].
   The rest of this section summarizes the definition in simple terms.

   In ISO 10646, each character is assigned a number, which Unicode
   calls the Unicode scalar value. This number is the same as the UCS-4
   value of the character, and this document will refer to it as the
   "character value" for brevity. In the UTF-16 encoding, characters are
   represented using either one or two unsigned 16-bit integers,
   depending on the character value. Serialization of these integers for
   transmission as a byte stream is discussed in Section 3.

   The rules for how characters are encoded in UTF-16 are:

   -  Characters with values less than 0x10000 are represented as a
      single 16-bit integer with a value equal to that of the character
      number.

   -  Characters with values between 0x10000 and 0x10FFFF are
      represented by a 16-bit integer with a value between 0xD800 and
      0xDBFF (within the so-called high-half zone or high surrogate
      area) followed by a 16-bit integer with a value between 0xDC00 and
      0xDFFF (within the so-called low-half zone or low surrogate area).

   -  Characters with values greater than 0x10FFFF cannot be encoded in
      UTF-16.

   Note: Values between 0xD800 and 0xDFFF are specifically reserved for
   use with UTF-16, and don't have any characters assigned to them.
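
   The three rules above can be sketched as a small classifier. This is
   an illustration only: the function name utf16_units_needed is
   hypothetical, and rejecting values in the reserved range 0xD800 to
   0xDFFF is an assumption based on the note that no characters are
   assigned there.

```c
#include <stdint.h>

/* How many 16-bit integers UTF-16 needs for character value u.
 * Returns 1 or 2, or 0 when u cannot be represented.
 * Rejecting 0xD800..0xDFFF is an assumption drawn from the note
 * that no characters are assigned to those values. */
int utf16_units_needed(uint32_t u)
{
    if (u >= 0xD800 && u <= 0xDFFF)
        return 0;          /* reserved for surrogates */
    if (u < 0x10000)
        return 1;          /* plane 0: one 16-bit integer */
    if (u <= 0x10FFFF)
        return 2;          /* planes 1..16: surrogate pair */
    return 0;              /* beyond what UTF-16 can reach */
}
```

   A caller would typically use the result to size an output buffer
   before encoding.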

2.1 Encoding UTF-16

   Encoding of a single character from an ISO 10646 character value to
   UTF-16 proceeds as follows. Let U be the character number, no greater
   than 0x10FFFF.

   1) If U < 0x10000, encode U as a 16-bit unsigned integer and
      terminate.

   2) Let U' = U - 0x10000. Because U is less than or equal to 0x10FFFF,
      U' must be less than or equal to 0xFFFFF. That is, U' can be
      represented in 20 bits.

   3) Initialize two 16-bit unsigned integers, W1 and W2, to 0xD800 and
      0xDC00, respectively. These integers each have 10 bits free to
      encode the character value, for a total of 20 bits.

   4) Assign the 10 high-order bits of the 20-bit U' to the 10 low-order
      bits of W1 and the 10 low-order bits of U' to the 10 low-order
      bits of W2. Terminate.

   Graphically, steps 2 through 4 look like:
   U' = yyyyyyyyyyxxxxxxxxxx
   W1 = 110110yyyyyyyyyy
   W2 = 110111xxxxxxxxxx
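
   Steps 1 through 4 can be sketched in C as follows. The function name
   utf16_encode and its return convention (number of 16-bit integers
   written, 0 for a value above 0x10FFFF) are assumptions made for this
   illustration; the text itself does not define an API.

```c
#include <stdint.h>

/* Encode character value u into w[0..1].
 * Returns the number of 16-bit integers written (1 or 2),
 * or 0 if u is greater than 0x10FFFF. */
int utf16_encode(uint32_t u, uint16_t w[2])
{
    if (u > 0x10FFFF)
        return 0;                       /* cannot be encoded */
    if (u < 0x10000) {                  /* step 1 */
        w[0] = (uint16_t)u;
        return 1;
    }
    uint32_t up = u - 0x10000;          /* step 2: U' fits in 20 bits */
    /* steps 3 and 4: start from 0xD800 / 0xDC00, fill in 10 bits each */
    w[0] = (uint16_t)(0xD800 | (up >> 10));
    w[1] = (uint16_t)(0xDC00 | (up & 0x3FF));
    return 2;
}
```

   For the value 0x12345, U' is 0x2345, so W1 is 0xD808 and W2 is
   0xDF45.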

2.2 Decoding UTF-16

   Decoding of a single character from UTF-16 to an ISO 10646 character
   value proceeds as follows. Let W1 be the next 16-bit integer in the
   sequence of integers representing the text. Let W2 be the (eventual)
   next integer following W1.

   1) If W1 < 0xD800 or W1 > 0xDFFF, the character value U is the value
      of W1. Terminate.

   2) Determine if W1 is between 0xD800 and 0xDBFF. If not, the sequence
      is in error and no valid character can be obtained using W1.
      Terminate.

   3) If there is no W2 (that is, the sequence ends with W1), or if W2
      is not between 0xDC00 and 0xDFFF, the sequence is in error.
      Terminate.

   4) Construct a 20-bit unsigned integer U', taking the 10 low-order
      bits of W1 as its 10 high-order bits and the 10 low-order bits of
      W2 as its 10 low-order bits.

   5) Add 0x10000 to U' to obtain the character value U. Terminate.

   Note that steps 2 and 3 indicate errors. Error recovery is not
   specified by this document. When terminating with an error in steps 2
   and 3, it may be wise to set U to the value of W1 to help the caller
   diagnose the error and not lose information. Also note that a string
   decoding algorithm, as opposed to the single-character decoding
   described above, need not terminate upon detection of an error, if
   proper error reporting and/or recovery is provided.
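
   The single-character decoding steps can be sketched as below. The
   name utf16_decode and the convention of returning 0 on error are
   assumptions for this example. Following the suggestion above, the
   sketch leaves U set to the value of W1 when it reports an error.

```c
#include <stddef.h>
#include <stdint.h>

/* Decode one character from w[0..n-1].
 * On success, stores the character value in *u and returns the
 * number of 16-bit integers consumed (1 or 2).
 * On error, returns 0 and leaves *u set to W1 for diagnosis. */
size_t utf16_decode(const uint16_t *w, size_t n, uint32_t *u)
{
    if (n == 0) {
        *u = 0;
        return 0;                       /* nothing to decode */
    }
    uint16_t w1 = w[0];
    *u = w1;                            /* step 1 result, or diagnostic */
    if (w1 < 0xD800 || w1 > 0xDFFF)
        return 1;                       /* step 1 */
    if (w1 > 0xDBFF)
        return 0;                       /* step 2: W1 is a low surrogate */
    if (n < 2 || w[1] < 0xDC00 || w[1] > 0xDFFF)
        return 0;                       /* step 3: W2 missing or invalid */
    uint32_t up = ((uint32_t)(w1 & 0x3FF) << 10)
                | (uint32_t)(w[1] & 0x3FF);   /* step 4 */
    *u = up + 0x10000;                  /* step 5 */
    return 2;
}
```

   A string decoder built on this sketch could, on a 0 return, record
   the error, skip one 16-bit integer, and continue.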

3. Labelling UTF-16 text

   Appendix A of this specification contains registrations for three
   MIME charsets: "UTF-16BE", "UTF-16LE", and "UTF-16". MIME charsets
   represent the combination of a CCS (a coded character set) and a CES
   (a character encoding scheme). Here the CCS is Unicode/ISO 10646 and
   the CES is the same in all three cases, except for the serialization
   order of the octets in each character, and the external determination
   of which serialization is used.

   This section describes which of the three labels to apply to a stream
   of text. Section 4 describes how to interpret the labels on a stream
   of text.

3.1 Definition of big-endian and little-endian

   Historically, computer hardware has processed two-octet entities such
   as 16-bit integers in one of two ways. So-called "big-endian"
   hardware handles two-octet entities with the higher-order octet
   first, that is, at the lower address in memory; when written out to
   disk or to a network interface (serializing), the high-order octet
   thus appears first in the data stream. On the other hand, "little-
   endian" hardware handles two-octet entities with the lower-order
   octet first. Hardware of both kinds is common today.

   For example, the unsigned 16-bit integer that represents the decimal
   number 258 is 0x0102. The big-endian serialization of that number is
   the octet 0x01 followed by the octet 0x02. The little-endian
   serialization of that number is the octet 0x02 followed by the octet
   0x01. The following C code fragment demonstrates a way to write 16-
   bit quantities to a file in big-endian order, irrespective of the
   hardware's native byte order.

  #include <stdio.h>

  void write_be(unsigned short u, FILE *f) /* assume short is 16 bits */
  {
    putc(u >> 8,   f);                     /* output high-order byte */
    putc(u & 0xFF, f);                     /* then low-order */
  }
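
   The fragment above covers big-endian output only. As a companion
   sketch, the same shift-and-mask approach can store a value in
   little-endian order and read a value back in either order. The
   helper names put_le16, get_be16 and get_le16 are hypothetical, and
   the helpers work on octet buffers rather than on a FILE so that they
   stay independent of any I/O.

```c
#include <stdint.h>

/* Store a 16-bit value as two octets in little-endian order. */
void put_le16(uint16_t u, unsigned char out[2])
{
    out[0] = (unsigned char)(u & 0xFF);   /* low-order octet first */
    out[1] = (unsigned char)(u >> 8);     /* then high-order */
}

/* Rebuild a 16-bit value from two octets in big-endian order. */
uint16_t get_be16(const unsigned char in[2])
{
    return (uint16_t)((in[0] << 8) | in[1]);
}

/* Rebuild a 16-bit value from two octets in little-endian order. */
uint16_t get_le16(const unsigned char in[2])
{
    return (uint16_t)((in[1] << 8) | in[0]);
}
```

   None of these helpers depends on the native byte order of the
   hardware, because they never reinterpret memory as a 16-bit integer.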

   The term "network byte order" has been used in many RFCs to indicate
   big-endian serialization, although that term has yet to be formally
   defined in a standards-track document. Although ISO 10646 prefers
   big-endian serialization (section 6.3 of [ISO-10646]), little-endian
   order is also sometimes used on the Internet.

3.2 Byte order mark (BOM)

   The Unicode Standard and ISO 10646 define the character "ZERO WIDTH
   NON-BREAKING SPACE" (0xFEFF), which is also known informally as "BYTE
   ORDER MARK" (abbreviated "BOM"). The latter name hints at a second
   possible usage of the character, in addition to its normal use as a
   genuine "ZERO WIDTH NON-BREAKING SPACE" within text. This usage,
   suggested by Unicode section 2.4 and ISO 10646 Annex F (informative),
   is to prepend a 0xFEFF character to a stream of Unicode characters as
   a "signature"; a receiver of such a serialized stream may then use
   the initial character both as a hint that the stream consists of
   Unicode characters and as a way to recognize the serialization order.
   In serialized UTF-16 prepended with such a signature, the order is
   big-endian if the first two octets are 0xFE followed by 0xFF; if they
   are 0xFF followed by 0xFE, the order is little-endian. Note that
   0xFFFE is not a Unicode character, precisely to preserve the
   usefulness of 0xFEFF as a byte-order mark.

   It is important to understand that the character 0xFEFF appearing at
   any position other than the beginning of a stream MUST be interpreted
   with the semantics for the zero-width non-breaking space, and MUST
   NOT be interpreted as a byte-order mark. The converse of that
   statement is not always true: the character 0xFEFF in the first
   position of a stream MAY be interpreted as a zero-width non-breaking
   space, and is not always a byte-order mark. For example, if a process
   splits a UTF-16 string into many parts, a part might begin with
   0xFEFF because there was a zero-width non-breaking space at the
   beginning of that substring.

   The Unicode standard further suggests that an initial 0xFEFF
   character may be stripped before processing the text, the rationale
   being that such a character in initial position may be an artifact of
   the encoding (an encoding signature), not a genuine intended "ZERO
   WIDTH NON-BREAKING SPACE". Note that such stripping might affect an
   external process at a different layer (such as a digital signature or
   a count of the characters) that is relying on the presence of all
   characters in the stream.

   In particular, in UTF-16 plain text it is likely, but not certain,
   that an initial 0xFEFF is a signature. When concatenating two
   strings, it is important to strip out those signatures, because
   otherwise the resulting string may contain an unintended "ZERO WIDTH
   NON-BREAKING SPACE" at the connection point. Also, some
   specifications mandate an initial 0xFEFF character in objects
   labelled as UTF-16 and specify that this signature is not part of the
   object.
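
   The concatenation case can be sketched as below, working on 16-bit
   integers that have already been de-serialized. The function name
   utf16_concat and its error convention are hypothetical. The sketch
   assumes that a leading 0xFEFF in the appended string is a signature,
   which the text describes as likely but not certain, so a caller that
   knows otherwise should not use it blindly.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Append src (n_src integers) to dst, which holds n_dst integers and
 * has room for cap. A leading 0xFEFF in src is assumed to be a
 * signature and is dropped. Returns the new length of dst, or
 * (size_t)-1 if the result would not fit. */
size_t utf16_concat(uint16_t *dst, size_t n_dst, size_t cap,
                    const uint16_t *src, size_t n_src)
{
    if (n_src > 0 && src[0] == 0xFEFF) {
        src++;                          /* skip the assumed signature */
        n_src--;
    }
    if (n_dst > cap || n_src > cap - n_dst)
        return (size_t)-1;              /* not enough room */
    memcpy(dst + n_dst, src, n_src * sizeof *src);
    return n_dst + n_src;
}
```

   Any signature at the start of dst is left alone, so the result still
   begins with at most one 0xFEFF.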

3.3 Choosing a label for UTF-16 text

   Any labelling application that uses UTF-16 character encoding, and
   explicitly labels the text, and knows the serialization order of the
   characters in text, SHOULD label the text as either "UTF-16BE" or
   "UTF-16LE", whichever is appropriate based on the endianness of the
   text. This allows applications processing the text, but unable to
   look inside the text, to know the serialization definitively.

   Text in the "UTF-16BE" charset MUST be serialized with the octets
   which make up a single 16-bit UTF-16 value in big-endian order.
   Systems labelling UTF-16BE text MUST NOT prepend a BOM to the text.

   Text in the "UTF-16LE" charset MUST be serialized with the octets
   which make up a single 16-bit UTF-16 value in little-endian order.
   Systems labelling UTF-16LE text MUST NOT prepend a BOM to the text.
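
   The labelling rules of this section, including the unknown-order
   case covered in the next paragraph, can be sketched as a single
   lookup. The enum utf16_order and the function utf16_label_for are
   hypothetical names. The sketch does not model document formats that
   mandate a BOM.

```c
enum utf16_order { ORDER_UNKNOWN, ORDER_BIG, ORDER_LITTLE };

/* Choose the charset label for text whose serialization order is
 * `order`. *prepend_bom is set to 0 where a BOM MUST NOT be
 * prepended, and to 1 where the text SHOULD start with 0xFEFF. */
const char *utf16_label_for(enum utf16_order order, int *prepend_bom)
{
    switch (order) {
    case ORDER_BIG:
        *prepend_bom = 0;
        return "UTF-16BE";
    case ORDER_LITTLE:
        *prepend_bom = 0;
        return "UTF-16LE";
    default:
        *prepend_bom = 1;               /* order not known to labeller */
        return "UTF-16";
    }
}
```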

   Any labelling application that uses UTF-16 character encoding, and
   puts an explicit charset label on the text, and does not know the
   serialization order of the characters in text, MUST label the text as
   "UTF-16", and SHOULD make sure the text starts with 0xFEFF.

   An exception to the "SHOULD" rule of using "UTF-16BE" or "UTF-16LE"
   arises with document formats that mandate a BOM in UTF-16 text; such
   formats require the use of the "UTF-16" label only.

4. Interpreting text labels

   When a program sees text labelled as "UTF-16BE", "UTF-16LE", or
   "UTF-16", it can make some assumptions, based on the labelling rules
   given in the previous section. These assumptions allow the program to
   then process the text.

4.1 Interpreting text labelled as UTF-16BE

   Text labelled "UTF-16BE" can always be interpreted as being big-
   endian.  The detection of an initial BOM does not affect de-
   serialization of text labelled as UTF-16BE. Finding 0xFF followed by
   0xFE is an error since there is no Unicode character 0xFFFE.

4.2 Interpreting text labelled as UTF-16LE

   Text labelled "UTF-16LE" can always be interpreted as being little-
   endian. The detection of an initial BOM does not affect de-
   serialization of text labelled as UTF-16LE. Finding 0xFE followed by
   0xFF is an error since there is no Unicode character 0xFFFE, which
   would be the interpretation of those octets under little-endian
   order.
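
   The error condition described in Sections 4.1 and 4.2 can be checked
   with a few lines of C. The function name starts_with_fffe is
   hypothetical; it de-serializes the first two octets in the order
   implied by the label and reports whether they yield 0xFFFE, which is
   not a Unicode character.

```c
#include <stddef.h>

/* p points to n octets of text labelled UTF-16BE (big_endian != 0)
 * or UTF-16LE (big_endian == 0). Returns 1 if the first two octets
 * de-serialize to 0xFFFE, which is an error, and 0 otherwise. */
int starts_with_fffe(const unsigned char *p, size_t n, int big_endian)
{
    unsigned v;

    if (n < 2)
        return 0;                       /* too short to tell */
    v = big_endian ? ((unsigned)p[0] << 8) | p[1]
                   : ((unsigned)p[1] << 8) | p[0];
    return v == 0xFFFE;
}
```

   An initial 0xFEFF in the labelled order is not an error under these
   two labels and does not change how the text is de-serialized.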

4.3 Interpreting text labelled as UTF-16

   Text labelled with the "UTF-16" charset might be serialized in either
   big-endian or little-endian order. If the first two octets of the
   text are 0xFE followed by 0xFF, then the text can be interpreted as
   being big-endian. If the first two octets of the text are 0xFF
   followed by 0xFE, then the text can be interpreted as being little-
   endian. If the first two octets of the text are not 0xFE followed by
   0xFF, and are not 0xFF followed by 0xFE, then the text SHOULD be
   interpreted as being big-endian.

   All applications that process text with the "UTF-16" charset label
   MUST be able to read at least the first two octets of the text and be
   able to process those octets in order to determine the serialization
   order of the text. Applications that process text with the "UTF-16"
   charset label MUST NOT assume the serialization without first
   checking the first two octets to see if they are a big-endian BOM, a
   little-endian BOM, or not a BOM. All applications that process text
   with the "UTF-16" charset label MUST be able to interpret both big-
   endian and little-endian text.
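
   The interpretation rule for text labelled "UTF-16" can be sketched
   as follows. The enum and the function utf16_detect_order are
   hypothetical names; bom_len reports how many octets the BOM
   occupied so that the caller can decide whether to skip them.

```c
#include <stddef.h>

enum utf16_order { ORDER_BIG, ORDER_LITTLE };

/* Inspect the first two octets of text labelled "UTF-16".
 * Sets *bom_len to 2 if a BOM was found, otherwise to 0. */
enum utf16_order utf16_detect_order(const unsigned char *p, size_t n,
                                    size_t *bom_len)
{
    *bom_len = 0;
    if (n >= 2) {
        if (p[0] == 0xFE && p[1] == 0xFF) {
            *bom_len = 2;
            return ORDER_BIG;           /* big-endian BOM */
        }
        if (p[0] == 0xFF && p[1] == 0xFE) {
            *bom_len = 2;
            return ORDER_LITTLE;        /* little-endian BOM */
        }
    }
    return ORDER_BIG;                   /* no BOM: SHOULD be big-endian */
}
```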

5. Examples

   For the sake of example, let's suppose that there is a hieroglyphic
   character representing the Egyptian god Ra with character value
   0x12345 (this character does not exist at present in Unicode).

   The examples here all evaluate to the phrase:

   *=Ra

   where the "*" represents the Ra hieroglyph (0x12345).

   Text labelled with UTF-16BE, without a BOM:
   D8 08 DF 45 00 3D 00 52 00 61

   Text labelled with UTF-16LE, without a BOM:
   08 D8 45 DF 3D 00 52 00 61 00

   Big-endian text labelled with UTF-16, with a BOM:
   FE FF D8 08 DF 45 00 3D 00 52 00 61

   Little-endian text labelled with UTF-16, with a BOM:
   FF FE 08 D8 45 DF 3D 00 52 00 61 00
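
   The four octet sequences above can be reproduced with a short
   serializer. This is a sketch under stated assumptions: the name
   utf16_serialize is hypothetical, every input value is assumed to be
   a valid character value no greater than 0x10FFFF, and the output
   buffer is assumed to hold at least 4 * n + 2 octets.

```c
#include <stddef.h>
#include <stdint.h>

/* Write one 16-bit integer as two octets in the requested order. */
static size_t put16(uint16_t v, int big_endian, unsigned char *out)
{
    out[0] = (unsigned char)(big_endian ? v >> 8 : v & 0xFF);
    out[1] = (unsigned char)(big_endian ? v & 0xFF : v >> 8);
    return 2;
}

/* Serialize n character values as UTF-16 octets, optionally
 * preceded by a BOM. Returns the number of octets written. */
size_t utf16_serialize(const uint32_t *chars, size_t n,
                       int big_endian, int with_bom,
                       unsigned char *out)
{
    size_t len = 0;

    if (with_bom)
        len += put16(0xFEFF, big_endian, out + len);
    for (size_t i = 0; i < n; i++) {
        uint32_t u = chars[i];
        if (u < 0x10000) {
            len += put16((uint16_t)u, big_endian, out + len);
        } else {
            uint32_t up = u - 0x10000;  /* 20-bit U' */
            len += put16((uint16_t)(0xD800 | (up >> 10)),
                         big_endian, out + len);
            len += put16((uint16_t)(0xDC00 | (up & 0x3FF)),
                         big_endian, out + len);
        }
    }
    return len;
}
```

   Calling it with the values 0x12345, 0x3D, 0x52 and 0x61 in
   big-endian order without a BOM yields the ten octets of the
   UTF-16BE example.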

6. Versions of the standards

   ISO/IEC 10646 is updated from time to time by published amendments;
   similarly, different versions of the Unicode standard exist: 1.0,
   1.1, 2.0, 2.1, and 3.0 as of this writing. Each new version replaces
   the previous one, but implementations, and more significantly data,
   are not updated instantly.

   In general, the changes amount to adding new characters, which does
   not pose particular problems with old data. Amendment 5 to ISO/IEC
   10646, however, has moved and expanded the Korean Hangul block,
   thereby making any previous data containing Hangul characters invalid
   under the new version. Unicode 2.0 has the same difference from
   Unicode 1.1. The official justification for allowing such an
   incompatible change was that no significant implementations and data
   containing Hangul existed, a statement that is likely to be true but
   remains unprovable. The incident has been dubbed the "Korean mess",
   and the relevant committees have pledged to never, ever again make
   such an incompatible change.

   New versions, and in particular any incompatible changes, have
   consequences regarding MIME character encoding labels, to be
   discussed in Appendix A.

7. IANA Considerations

   IANA is to register the character sets found in Appendixes A.1, A.2,
   and A.3 according to RFC 2278, using registration templates found in
   those appendixes.

8. Security Considerations

   UTF-16 is based on the ISO 10646 character set, which is frequently
   being added to, as described in Section 6 and Appendix A of this
   document. Processors must be able to handle characters that are not
   defined at the time that the processor was created in such a way as
   to not allow an attacker to harm a recipient by including unknown
   characters.

   Processors that handle any type of text, including text encoded as
   UTF-16, must be vigilant in checking for control characters that
   might reprogram a display terminal or keyboard. Similarly, processors
   that interpret text entities (such as looking for embedded
   programming code) must be careful not to execute the code without
   first alerting the recipient.

   Text in UTF-16 may contain special characters, such as the OBJECT
   REPLACEMENT CHARACTER (0xFFFC), that might cause external processing,
   depending on the interpretation of the processing program and the
   availability of an external data stream that would be executed. This
   external processing may have side-effects that allow the sender of a
   message to attack the receiving system.

   Implementors of UTF-16 need to consider the security aspects of how
   they handle illegal UTF-16 sequences (that is, sequences involving
   surrogate pairs that have illegal values or unpaired surrogates). It
   is conceivable that in some circumstances an attacker would be able
   to exploit an incautious UTF-16 parser by sending it an octet
   sequence that is not permitted by the UTF-16 syntax, causing it to
   behave in some anomalous fashion.
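
   One cautious approach is to validate a whole sequence before acting
   on any of it. The sketch below, with the hypothetical name
   utf16_first_error, scans a sequence of 16-bit integers and reports
   the position of the first unpaired or misordered surrogate. It is an
   illustration of the check, not a complete defence.

```c
#include <stddef.h>
#include <stdint.h>

/* Returns the index of the first 16-bit integer that starts an
 * illegal sequence, or n if the whole sequence is well formed. */
size_t utf16_first_error(const uint16_t *w, size_t n)
{
    size_t i = 0;

    while (i < n) {
        if (w[i] < 0xD800 || w[i] > 0xDFFF) {
            i += 1;                     /* ordinary 16-bit character */
        } else if (w[i] <= 0xDBFF && i + 1 < n &&
                   w[i + 1] >= 0xDC00 && w[i + 1] <= 0xDFFF) {
            i += 2;                     /* well-formed surrogate pair */
        } else {
            return i;                   /* unpaired surrogate */
        }
    }
    return n;
}
```

   A parser can refuse input for which the result is less than n, or
   apply whatever recovery policy the application has defined.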

9. References

   [CHARPOLICY]  Alvestrand, H., "IETF Policy on Character Sets and
                 Languages", BCP 18, RFC 2277, January 1998.

   [CHARSET-REG] Freed, N. and J. Postel, "IANA Charset Registration
                 Procedures", BCP 19, RFC 2278, January 1998.

   [HTTP-1.1]    Fielding, R., Gettys, J., Mogul, J., Frystyk, H.,
                 Masinter, L., Leach, P. and T. Berners-Lee, "Hypertext
                 Transfer Protocol -- HTTP/1.1", RFC 2616, June 1999.

   [ISO-10646]   ISO/IEC 10646-1:1993. International Standard --
                 Information technology -- Universal Multiple-Octet
                 Coded Character Set (UCS) -- Part 1: Architecture and
                 Basic Multilingual Plane. 22 amendments and two
                 technical corrigenda have been published up to now.
                 UTF-16 is described in Annex Q, published as Amendment
                 1. Many other amendments are currently at various
                 stages of standardization. A second edition is in
                 preparation, probably to be published in 2000; in this
                 new edition, UTF-16 will probably be described in Annex
                 C.

   [MUSTSHOULD]  Bradner, S., "Key words for use in RFCs to Indicate
                 Requirement Levels", BCP 14, RFC 2119, March 1997.

   [UNICODE]     The Unicode Consortium, "The Unicode Standard --
                 Version 3.0", ISBN 0-201-61633-5. Described at
                 <http://www.unicode.org/unicode/standard/versions/Unicode3.0.html>.

   [UTF-8]       Yergeau, F., "UTF-8, a transformation format of ISO
                 10646", RFC 2279, January 1998.

   [WORKSHOP]    Weider, C., Preston, C., Simonsen, K., Alvestrand, H.,
                 Atkinson, R., Crispin, M. and P. Svanberg, "Report of
                 the IAB Character Set Workshop", RFC 2130, April 1997.

10. Acknowledgments

   Deborah Goldsmith wrote a great deal of the initial wording for this
   specification. Martin Duerst proposed numerous significant changes.
   Other significant contributors include:

   Mati Allouche
   Walt Daniels
   Mark Davis
   Ned Freed
   Asmus Freytag
   Lloyd Honomichl
   Dan Kegel
   Murata Makoto
   Larry Masinter
   Markus Scherer
   Keld Simonsen
   Ken Whistler

   Some of the text in this specification was copied from [UTF-8], and
   that document was worked on by many people. Please see the
   acknowledgments section in that document for more people who may have
   contributed indirectly to this document.

A. Charset registrations

   This memo is meant to serve as the basis for registration of three
   MIME charsets [CHARSET-REG]. The proposed charsets are "UTF-16BE",
   "UTF-16LE", and "UTF-16". These strings label objects containing text
   consisting of characters from the repertoire of ISO/IEC 10646
   including all amendments at least up to amendment 5 (Korean block),
   encoded to a sequence of octets using the encoding and serialization
   schemes outlined above.
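
   To make the role of the three labels concrete, here is a hedged
   Python sketch of how a receiver might map each label to a decoding
   step. It follows the serialization rules given earlier in this
   document: text labelled "UTF-16" is checked for a byte order mark
   and treated as big-endian when none is present, while text labelled
   "UTF-16BE" or "UTF-16LE" is decoded in the stated order with no BOM
   processing. The function name decode_labelled is hypothetical; the
   codec names "utf-16-be" and "utf-16-le" and the BOM constants are
   those of the Python standard library.

```python
import codecs


def decode_labelled(label: str, octets: bytes) -> str:
    """Decode octets according to one of the three charset labels."""
    name = label.upper()
    if name == "UTF-16BE":
        # Byte order is fixed by the label; an initial FE FF is kept
        # as the character U+FEFF rather than stripped as a BOM.
        return octets.decode("utf-16-be")
    if name == "UTF-16LE":
        return octets.decode("utf-16-le")
    if name == "UTF-16":
        if octets.startswith(codecs.BOM_UTF16_BE):
            return octets[2:].decode("utf-16-be")
        if octets.startswith(codecs.BOM_UTF16_LE):
            return octets[2:].decode("utf-16-le")
        # No BOM: interpret as big-endian.
        return octets.decode("utf-16-be")
    raise ValueError("unsupported charset label: " + label)
```

   The dispatch is written out by hand because Python's generic
   "utf-16" codec assumes the platform's native byte order when no BOM
   is present, which is not the behaviour described in this document.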

   Note that "UTF-16BE", "UTF-16LE", and "UTF-16" are NOT suitable for
   use in media types under the "text" top-level type, because they do
   not encode line endings in the way required for MIME "text" media
   types. An exception to this is HTTP, which uses a MIME-like
   mechanism, but is exempt from the restrictions on the text top-level
   type (see section 19.4.2 of HTTP 1.1 [HTTP-1.1]).
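
   The line-ending problem can be seen directly in the octets. The
   short Python sketch below is illustrative only, and the helper name
   crlf_octets_present is hypothetical. It checks whether the adjacent
   octet pair 0D 0A that MIME "text" types expect for a line break
   appears in an encoded string.

```python
def crlf_octets_present(text: str, codec: str) -> bool:
    """Report whether the octet pair 0D 0A occurs in the encoded text."""
    return b"\r\n" in text.encode(codec)


sample = "first line\r\nsecond line"

# In UTF-16 each of CR and LF becomes two octets (00 0D 00 0A in
# big-endian order), so the adjacent pair 0D 0A never appears.
utf16be_has_crlf = crlf_octets_present(sample, "utf-16-be")
utf16le_has_crlf = crlf_octets_present(sample, "utf-16-le")

# An ASCII-compatible encoding such as UTF-8 keeps the pair intact.
utf8_has_crlf = crlf_octets_present(sample, "utf-8")

# Conversely, the single character U+0D0A serializes in big-endian
# order as exactly the octets 0D 0A without being a line break.
false_positive = crlf_octets_present("\u0d0a", "utf-16-be")
```

   Here utf16be_has_crlf and utf16le_has_crlf are False, while
   utf8_has_crlf and false_positive are True, so software that scans
   octets for CRLF would both miss real line breaks and find spurious
   ones.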

   It is noteworthy that the labels described here do not contain a
   version identification, referring generically to ISO/IEC 10646. This
   is intentional, the rationale being as follows:

   A MIME charset is designed to give just the information needed to
   interpret a sequence of bytes received on the wire into a sequence of
   characters, nothing more (see RFC 2045, section 2.2, in [MIME]). As
   long as a character set standard does not change incompatibly,
   version numbers serve no purpose, because one gains nothing by
   learning from the tag that newly assigned characters may be received
   that one doesn't know about. The tag itself doesn't teach anything
   about the new characters, which are going to be received anyway.

   Hence, as long as the standards evolve compatibly, the apparent
   advantage of having labels that identify the versions is only that,
   apparent. But there is a disadvantage to such version-dependent
   labels: when an older application receives data accompanied by a
   newer, unknown label, it may fail to recognize the label and be
   completely unable to deal with the data, whereas a generic, known
   label would have triggered mostly correct processing of the data,
   which may well not contain any new characters.

   The "Korean mess" (ISO/IEC 10646 amendment 5) is an incompatible
   change, in principle contradicting the appropriateness of a version
   independent MIME charset as described above. But the compatibility
   problem can only appear with data containing Korean Hangul characters
   encoded according to Unicode 1.1 (or equivalently ISO/IEC 10646
   before amendment 5), and there is arguably no such data to worry
   about, this being the very reason the incompatible change was deemed
   acceptable.

   In practice, then, a version-independent label is warranted, provided
   the label is understood to refer to all versions after Amendment 5,
   and provided no incompatible change actually occurs. Should
   incompatible changes occur in a later version of ISO/IEC 10646, the
   MIME charsets defined here will stay aligned with the previous
   version until and unless the IETF specifically decides otherwise.

A.1 Registration for UTF-16BE

   To: ietf-charsets@iana.org
   Subject: Registration of new charset

   Charset name(s): UTF-16BE

   Published specification(s): This specification

   Suitable for use in MIME content types under the
   "text" top-level type: No

   Person & email address to contact for further information:
   Paul Hoffman <phoffman@imc.org>
   Francois Yergeau <fyergeau@alis.com>

A.2 Registration for UTF-16LE

   To: ietf-charsets@iana.org
   Subject: Registration of new charset

   Charset name(s): UTF-16LE

   Published specification(s): This specification

   Suitable for use in MIME content types under the
   "text" top-level type: No

   Person & email address to contact for further information:
   Paul Hoffman <phoffman@imc.org>
   Francois Yergeau <fyergeau@alis.com>

A.3 Registration for UTF-16

   To: ietf-charsets@iana.org
   Subject: Registration of new charset

   Charset name(s): UTF-16

   Published specification(s): This specification

   Suitable for use in MIME content types under the
   "text" top-level type: No

   Person & email address to contact for further information:
   Paul Hoffman <phoffman@imc.org>
   Francois Yergeau <fyergeau@alis.com>

Authors' Addresses

   Paul Hoffman
   Internet Mail Consortium
   127 Segre Place
   Santa Cruz, CA  95060 USA

   EMail: phoffman@imc.org

   Francois Yergeau
   Alis Technologies
   100, boul. Alexis-Nihon, Suite 600
   Montreal  QC  H4M 2P2 Canada

   EMail: fyergeau@alis.com

Full Copyright Statement

   Copyright (C) The Internet Society (2000).  All Rights Reserved.

   This document and translations of it may be copied and furnished to
   others, and derivative works that comment on or otherwise explain it
   or assist in its implementation may be prepared, copied, published
   and distributed, in whole or in part, without restriction of any
   kind, provided that the above copyright notice and this paragraph are
   included on all such copies and derivative works.  However, this
   document itself may not be modified in any way, such as by removing
   the copyright notice or references to the Internet Society or other
   Internet organizations, except as needed for the purpose of
   developing Internet standards in which case the procedures for
   copyrights defined in the Internet Standards process must be
   followed, or as required to translate it into languages other than
   English.

   The limited permissions granted above are perpetual and will not be
   revoked by the Internet Society or its successors or assigns.

   This document and the information contained herein is provided on an
   "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING
   TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING
   BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION
   HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF
   MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

Acknowledgement

   Funding for the RFC Editor function is currently provided by the
   Internet Society.
