4042 UTF-9 and UTF-18 Efficient Transformation Formats of Unicode. M.Crispin. April 1 2005. (Format: TXT=19123 bytes) (Status: INFORMATIONAL)

Network Working Group                                         M. Crispin
Request for Comments: 4042                             Panda Programming
Category: Informational                                     1 April 2005

                           UTF-9 and UTF-18
              Efficient Transformation Formats of Unicode

Status of This Memo

   This memo provides information for the Internet community.  It does
   not specify an Internet standard of any kind.  Distribution of this
   memo is unlimited.

Copyright Notice

   Copyright (C) The Internet Society (2005).

Abstract

   ISO-10646 defines a large character set called the Universal
   Character Set (UCS), which encompasses most of the world's writing
   systems.  The same set of codepoints is defined by Unicode, which
   further defines additional character properties and other
   implementation details.  By policy of the relevant standardization
   committees, changes to Unicode and amendments and additions to
   ISO/IEC 10646 track each other, so that the character repertoires and
   code point assignments remain in synchronization.

   The current representation formats for Unicode (UTF-7, UTF-8, UTF-16)
   are not storage and computation efficient on platforms that utilize
   the 9 bit nonet as a natural storage unit instead of the 8 bit octet.

   This document describes a transformation format of Unicode that takes
   advantage of the nonet so that the format will be storage and
   computation efficient.

1.  Introduction

   A number of Internet sites utilize platforms that are not based upon
   the traditional 8-bit byte or octet.  One such platform is the
   PDP-10, which is based upon a 36-bit word.  On these platforms, it is
   wasteful to represent data in octets, since 4 bits are left unused in
   each word.  The 9-bit nonet is a much more sensible representation.
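
   The following C sketch illustrates this arithmetic: 36 = 4 x 9, but
   4 x 8 = 32.  It packs one 36-bit word first with four nonets and then
   with four octets.  It assumes that the word is held in the low 36
   bits of a uint64_t, that nonets are held in uint16_t values, and that
   bytes fill the word from the high-order end, as is conventional for
   PDP-10 byte pointers.  The function names are hypothetical.

```c
#include <stdint.h>

/* Pack four 9-bit nonets into the low 36 bits of a uint64_t, first
   nonet in the high-order position.  All 36 bits carry data. */
static uint64_t pack_word_nonets(const uint16_t n[4])
{
  uint64_t w = 0;
  for (int i = 0; i < 4; i++)
    w = (w << 9) | (n[i] & 0x1FF);
  return w;
}

/* Pack four octets the same way: they occupy the high-order 32 bits
   of the 36-bit word, and the low-order 36 - 32 = 4 bits stay unused. */
static uint64_t pack_word_octets(const uint8_t o[4])
{
  uint64_t w = 0;
  for (int i = 0; i < 4; i++)
    w = (w << 8) | o[i];
  return w << 4;
}
```

   For example, the nonets 403 221 101 000 (octal) pack into the word
   403221101000 (octal).  Each nonet occupies exactly three octal
   digits of the word.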

   Although these platforms support IETF standards, many of these
   platforms still utilize a text representation based upon the septet,

Crispin                      Informational                      [Page 1]

RFC 4042                    UTF-9 and UTF-18                1 April 2005

   which is only suitable for [US-ASCII] (although it has been used for
   various ISO 10646 national variants).

   To maximize international and multi-lingual interoperability, the IAB
   has recommended ([IAB-CHARACTER]) that [ISO-10646] be the default
   coded character set.

   Although other transformation formats of [UNICODE] exist, and
   conceivably can be used on nonet-oriented machines (most notably
   [UTF-8]), they suffer significant disadvantages:

      [UTF-8]
         requires one to three octets to represent codepoints in the
         Basic Multilingual Plane (BMP), four octets to represent
         [UNICODE] codepoints outside the BMP, and six octets to
         represent non-[UNICODE] codepoints.  When stored in nonets,
         this results in as many as four wasted bits per [UNICODE]
         character.

      [UTF-16]
         requires a hexadecet to represent codepoints in the BMP, and
         two hexadecets to represent [UNICODE] codepoints outside the
         BMP.  When stored in nonet pairs, this results in as many as
         four wasted bits per [UNICODE] character.  This transformation
         format requires complex surrogates to represent codepoints
         outside the BMP, and can not represent non-[UNICODE] codepoints
         at all.

      [UTF-7]
         requires one to five septets to represent codepoints in the
         BMP, and as many as eight septets to represent codepoints
         outside the BMP.  When stored in nonets, this results in as
         many as sixteen wasted bits per character.  This transformation
         format requires very complex and computationally expensive
         shifting and "modified BASE64" processing, and can not
         represent non-[UNICODE] codepoints at all.

   By comparison, UTF-9 uses one to two nonets to represent codepoints
   in the BMP, three nonets to represent [UNICODE] codepoints outside
   the BMP, and three or four nonets to represent non-[UNICODE]
   codepoints.  There are no wasted bits, and as the examples in this
   document demonstrate, the computational processing is minimal.
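
   The unit counts in this comparison can be checked with a short C
   sketch.  The helpers below, whose names are hypothetical, return:

   - how many octets [UTF-8] uses for a codepoint, assumed to lie in
     the U+0000 - U+10FFFF range;
   - how many nonets UTF-9 uses, for any 31-bit value;
   - how many bits [UTF-8] leaves unused when each octet is stored in
     its own nonet.

```c
#include <stdint.h>

/* Octets UTF-8 uses for a [UNICODE] codepoint (U+0000 - U+10FFFF). */
static int utf8_octets(uint32_t cp)
{
  if (cp < 0x80) return 1;
  if (cp < 0x800) return 2;
  if (cp < 0x10000) return 3;
  return 4;
}

/* Nonets UTF-9 uses: one per significant octet of the value. */
static int utf9_nonets(uint32_t cp)
{
  if (cp < 0x100) return 1;
  if (cp < 0x10000) return 2;
  if (cp < 0x1000000) return 3;
  return 4;
}

/* Bits left unused when each UTF-8 octet sits in its own nonet:
   9 - 8 = 1 per octet.  UTF-9 uses all 9 bits of every nonet. */
static int utf8_wasted_bits(uint32_t cp)
{
  return utf8_octets(cp);
}
```

   Take GOTHIC LETTER AHSA (U+10330) as an example.  [UTF-8] needs four
   nonets for it: 36 bits, 4 of them unused.  UTF-9 needs three nonets:
   27 bits, none unused.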

   Transformation between [UTF-8] and UTF-9 is straightforward, with
   most of the complexity in the handling of [UTF-8].  It is hoped that
   future extensions to protocols such as SMTP will permit the use of
   UTF-9 in these protocols between nonet platforms without the use of
   [UTF-8] as an "on the wire" format.
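
   A minimal C sketch of such a transformation follows.  It decodes
   each [UTF-8] sequence to a codepoint and re-emits the codepoint as
   UTF-9 nonets.  The sketch assumes that nonets are held in uint16_t
   values on an octet-based host and accepts sequences of at most four
   octets.  It performs no validation, so malformed, overlong, or
   truncated [UTF-8] input is not detected.  All names are
   hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Decode one UTF-8 sequence (at most four octets) at *pp and advance
   *pp past it.  No validation is done. */
static uint32_t utf8_decode(const unsigned char **pp)
{
  const unsigned char *p = *pp;
  uint32_t cp = *p++;
  int extra = 0;                  /* continuation octets to follow */

  if (cp >= 0xF0)      { cp &= 0x07; extra = 3; }
  else if (cp >= 0xE0) { cp &= 0x0F; extra = 2; }
  else if (cp >= 0xC0) { cp &= 0x1F; extra = 1; }
  while (extra-- > 0)
    cp = (cp << 6) | (*p++ & 0x3F);
  *pp = p;
  return cp;
}

/* Append cp as UTF-9: one nonet per octet, starting with the most
   significant non-zero octet; all but the last nonet carry 0x100. */
static uint16_t *utf9_encode(uint16_t *out, uint32_t cp)
{
  int shift = 24;

  while (shift > 0 && (cp >> shift) == 0)
    shift -= 8;
  for (; shift > 0; shift -= 8)
    *out++ = (uint16_t)(0x100 | ((cp >> shift) & 0xFF));
  *out++ = (uint16_t)(cp & 0xFF);
  return out;
}

/* Transcode a NUL-terminated UTF-8 string into nonets; returns the
   number of nonets written.  out must have room for the result. */
static size_t utf8_to_utf9(const char *in, uint16_t *out)
{
  const unsigned char *p = (const unsigned char *)in;
  uint16_t *start = out;

  while (*p)
    out = utf9_encode(out, utf8_decode(&p));
  return (size_t)(out - start);
}
```

   For example, utf8_to_utf9("A\xCE\x91", buf) writes three nonets:
   101 403 221 (octal).  These match the UTF-9 examples given for
   U+0041 and U+0391.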

   Similarly, transformation between [UNICODE] codepoints and UTF-18 is
   also quite simple.  Although (like UCS-2) UTF-18 only represents a
   subset of the available [UNICODE] codepoints, it encompasses the
   non-private codepoints that are currently assigned in [UNICODE].

1.1.  Conventions Used in This Document

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in BCP 14, RFC 2119
   [KEYWORDS].

2.  Overview

   UTF-9 encodes [UNICODE] codepoints in the low order 8 bits of a
   nonet, using the high order bit to indicate continuation.  Surrogates
   are not used.

   [UNICODE] codepoints in the range U+0000 - U+00FF ([US-ASCII] and
   Latin 1) are represented by a single nonet; codepoints in the range
   U+0100 - U+FFFF (the remainder of the BMP) are represented by two
   nonets; and codepoints in the range U+10000 - U+10FFFF (remainder of
   [UNICODE]) are represented by three nonets.

   Non-[UNICODE] codepoints in [ISO-10646] (that is, codepoints in the
   range 0x110000 - 0x7fffffff) can also be represented in UTF-9 by
   obvious extension, but this is not discussed further as these
   codepoints have been removed from [ISO-10646] by ISO.

   UTF-18 encodes [UNICODE] codepoints in the Basic Multilingual Plane
   (BMP, plane 0), Supplementary Multilingual Plane (SMP, plane 1),
   Supplementary Ideographic Plane (SIP, plane 2), and Supplementary
   Special-purpose Plane (SSP, plane 14) in a single 18-bit value.  It
   does not encode planes 3 through 13, which are currently unused; nor
   planes 15 or 16, which are private spaces.

   Normally, UTF-9 and UTF-18 should only be used in the context of 9
   bit storage and transport.  Although some protocols, e.g., [FTP],
   support transport of nonets, the current IETF protocol suite is quite
   deficient in this area.  The IETF is urged to take action to improve
   IETF protocol support for nonets.

3.  UTF-9 Definition

   A UTF-9 stream represents [ISO-10646] codepoints using 9 bit nonets.
   The low-order 8 bits of a nonet form an octet, and the high-order bit
   indicates continuation.

   UTF-9 does not use surrogates; consequently a UTF-16 value must be
   transformed into the UCS-4 equivalent, and U+D800 - U+DFFF are never
   transmitted in UTF-9.
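
   For reference, this C sketch combines a UTF-16 surrogate pair into
   its UCS-4 value using the standard arithmetic.  The function name is
   hypothetical.  The pair is assumed to be well formed: a high
   surrogate (0xD800 - 0xDBFF) followed by a low surrogate
   (0xDC00 - 0xDFFF).  The resulting value, not the two surrogates, is
   what gets encoded.

```c
#include <stdint.h>

/* Combine a UTF-16 surrogate pair into one UCS-4 value.  Assumes hi is
   in 0xD800 - 0xDBFF and lo is in 0xDC00 - 0xDFFF (not checked). */
static uint32_t utf16_pair_to_ucs4(uint16_t hi, uint16_t lo)
{
  uint32_t high10 = (uint32_t)(hi - 0xD800);   /* top 10 bits of cp - 0x10000 */
  uint32_t low10  = (uint32_t)(lo - 0xDC00);   /* bottom 10 bits */
  return 0x10000 + ((high10 << 10) | low10);
}
```

   For example, utf16_pair_to_ucs4(0xD800, 0xDF30) yields 0x10330
   (GOTHIC LETTER AHSA).  UTF-9 encodes that value as 401 403 60.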

   Octets of the [UNICODE] codepoint value are then copied into
   successive UTF-9 nonets, starting with the most-significant non-zero
   octet.  All but the least significant octet have the continuation bit
   set in the associated nonet.

   Examples:

   Character  Name                                UTF-9 (in octal)
   ---------  ----                                ----------------
    U+0041    LATIN CAPITAL LETTER A              101
    U+00C0    LATIN CAPITAL LETTER A WITH GRAVE   300
    U+0391    GREEK CAPITAL LETTER ALPHA          403 221
    U+611B    <CJK ideograph meaning "love">      541 33
    U+10330   GOTHIC LETTER AHSA                  401 403 60
    U+E0041   TAG LATIN CAPITAL LETTER A          416 400 101
    U+10FFFD  <Plane 16 Private Use, Last>        420 777 375
   0x345ecf1b (UCS-4 value not in [UNICODE])      464 536 717 33

4.  UTF-18 Definition

   A UTF-18 stream represents [ISO-10646] codepoints using a pair of 9
   bit nonets to form an 18-bit value.

   UTF-18 does not use surrogates; consequently a UTF-16 value must be
   transformed into the UCS-4 equivalent, and U+D800 - U+DFFF are never
   transmitted in UTF-18.

   [UNICODE] codepoint values in the range U+0000 - U+2FFFF are copied
   as the same value into a UTF-18 value.  [UNICODE] codepoint values in
   the range U+E0000 - U+EFFFF are copied as values 0x30000 - 0x3ffff;
   that is, these values are shifted down by 0xB0000.  Other codepoint values
   can not be represented in UTF-18.
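
   The two UTF-18 mappings can be sketched in C as follows.  Plane 14
   moves from U+E0000 - U+EFFFF down to 0x30000 - 0x3FFFF, an offset of
   0xE0000 - 0x30000 = 0xB0000.  The function names and the
   UTF18_INVALID error value are hypothetical, and UTF-18 values are
   assumed to be held in uint32_t.

```c
#include <stdint.h>

#define UTF18_INVALID 0xFFFFFFFFu   /* hypothetical "not representable" marker */

/* Map a codepoint to its 18-bit UTF-18 value. */
static uint32_t ucs4_to_utf18(uint32_t cp)
{
  if (cp <= 0x2FFFF)                     /* planes 0-2: copied unchanged */
    return cp;
  if (cp >= 0xE0000 && cp <= 0xEFFFF)    /* plane 14 -> 0x30000-0x3FFFF */
    return cp - 0xB0000;
  return UTF18_INVALID;                  /* planes 3-13, 15, 16 and beyond */
}

/* Inverse mapping for an 18-bit value (0 - 0x3FFFF). */
static uint32_t utf18_to_ucs4(uint32_t v)
{
  return v >= 0x30000 ? v + 0xB0000 : v;
}
```

   ucs4_to_utf18(0xE0041) returns 0x30041, which is 600101 in octal,
   matching the UTF-18 examples.  ucs4_to_utf18(0x10FFFD) returns
   UTF18_INVALID because plane 16 is not representable.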

   Examples:

   Character  Name                                UTF-18 (in octal)
   ---------  ----                                ----------------
    U+0041    LATIN CAPITAL LETTER A              000101
    U+00C0    LATIN CAPITAL LETTER A WITH GRAVE   000300
    U+0391    GREEK CAPITAL LETTER ALPHA          001621
    U+611B    <CJK ideograph meaning "love">      060433
    U+10330   GOTHIC LETTER AHSA                  201460
    U+E0041   TAG LATIN CAPITAL LETTER A          600101

5.  Sample Routines

5.1.  UTF-9 to UCS-4 Conversion

   The following routines demonstrate conversion from UTF-9 to UCS-4.
   For simplicity, these routines do not do any validity checking.
   Routines used in applications SHOULD reject invalid UTF-9 sequences.
   These are:

   - sequences whose first nonet has the value 400 octal (0x100),
     which would encode a leading zero octet;
   - sequences that result in an overflow (exceeding 0x10ffff for
     [UNICODE]);
   - sequences that yield codepoints used for UTF-16 surrogates.

   ; Return UCS-4 value from UTF-9 string (PDP-10 assembly version)
   ; Accepts: P1/ 9-bit byte pointer to UTF-9 string
   ; Returns +1: Always, T1/ UCS-4 value, P1/ updated byte pointer
   ; Clobbers T2

   UT92U4: TDZA T1,T1              ; start with zero
   U92U41:  XOR T1,T2              ; insert octet into UCS-4 value
           LSH T1,^D8              ; shift UCS-4 value
           ILDB T2,P1              ; get next nonet
           TRZE T2,400             ; extract octet, any continuation?
            JRST U92U41            ; yes, continue
           XOR T1,T2               ; insert final octet
           POPJ P,

   /* Return UCS-4 value from UTF-9 string (C version)
    * Accepts: pointer to pointer to UTF-9 string
    * Returns: UCS-4 character, nonet pointer updated
    */

   UINT31 UTF9_to_UCS4 (UINT9 **utf9PP)
   {
     UINT9 nonet;
     UINT31 ucs4;
     for (ucs4 = (nonet = *(*utf9PP)++) & 0xff;
          nonet & 0x100;
          ucs4 |= (nonet = *(*utf9PP)++) & 0xff)
       ucs4 <<= 8;
     return ucs4;
   }
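
   A validating variant of the C decoder is sketched below.  It applies
   the checks recommended above: a leading 400 octal nonet, overflow
   beyond 0x10ffff, and surrogate codepoints.  It also rejects a
   sequence that is cut off by the end of the buffer.  Unlike the
   routine above, it takes an explicit length and uses uint16_t for
   nonets.  The function name and the convention of returning 0 on
   error are assumptions of this sketch.

```c
#include <stddef.h>
#include <stdint.h>

/* Decode one UTF-9 sequence from buf[0..len-1].  Returns the number of
   nonets consumed and stores the codepoint in *cp, or returns 0 if the
   sequence is invalid. */
static size_t utf9_decode_checked(const uint16_t *buf, size_t len, uint32_t *cp)
{
  uint32_t v = 0;
  size_t i = 0;

  if (len == 0 || buf[0] == 0x100)  /* empty, or leading zero octet */
    return 0;
  for (;;) {
    if (i == len)                   /* continuation but no next nonet */
      return 0;
    uint16_t nonet = buf[i++];
    if (nonet > 0x1FF)              /* not a 9-bit value */
      return 0;
    v = (v << 8) | (nonet & 0xFF);
    if (v > 0x10FFFF)               /* overflow beyond [UNICODE] */
      return 0;
    if (!(nonet & 0x100))           /* no continuation: last nonet */
      break;
  }
  if (v >= 0xD800 && v <= 0xDFFF)   /* UTF-16 surrogate codepoint */
    return 0;
  *cp = v;
  return i;
}
```

   Consider the nonets 464 536 717 33, which encode the non-[UNICODE]
   value 0x345ecf1b from the UTF-9 examples.  The function returns 0
   for this input because the value already exceeds 0x10ffff after the
   third nonet.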

5.2.  UCS-4 to UTF-9 Conversion

   The following routines demonstrate conversion from UCS-4 to UTF-9.
   For simplicity, these routines do not do any validity checking.
   Routines used in applications SHOULD reject invalid UCS-4 codepoints,
   namely:

   - codepoints used for UTF-16 surrogates;
   - codepoints with values exceeding 0x10ffff for [UNICODE].

   ; Write UCS-4 character to UTF-9 string (PDP-10 assembly version)
   ; Accepts: P1/ 9-bit byte pointer to UTF-9 string
   ;          T1/ UCS-4 character to write
   ; Returns +1: Always, P1/ updated byte pointer
   ; Clobbers T1, T2; (T1, T2) must be an accumulator pair

   U42UT9: SETO T2,            ; we'll need some of these 1-bits later
           ASHC T1,-^D8        ; low octet becomes nonet with high 0-bit
   U42U91: JUMPE T1,U42U9X     ; done if no more octets
           LSHC T1,-^D8        ; shift next octet into T2
           ROT T2,-1           ; turn it into nonet with high 1 bit
           PUSHJ P,U42U91      ; recurse for remainder
   U42U9X: LSHC T1,^D9         ; get next nonet back from T2
           IDPB T1,P1          ; write nonet
           POPJ P,

   /* Write UCS-4 character to UTF-9 string (C version)
    * Accepts: pointer to nonet string
    *          UCS-4 character to write
    * Returns: updated pointer
    */

   UINT9 *UCS4_to_UTF9 (UINT9 *utf9P,UINT31 ucs4)
   {
     if (ucs4 > 0xff) {              /* two or more significant octets */
       if (ucs4 > 0xffff) {          /* three or more */
         if (ucs4 > 0xffffff)        /* four */
           *utf9P++ = 0x100 | ((ucs4 >> 24) & 0xff);
         *utf9P++ = 0x100 | ((ucs4 >> 16) & 0xff);
       }
       *utf9P++ = 0x100 | ((ucs4 >> 8) & 0xff);
     }
     *utf9P++ = ucs4 & 0xff;
     return utf9P;
   }
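
   Similarly, the following C sketch is a validating encoder.  It
   refuses surrogate codepoints and values above 0x10ffff, so it never
   writes more than three nonets.  Its comparisons are against 0xff and
   0xffff, so U+0100 and U+10000 are the first values that take two and
   three nonets respectively.  The name, the uint16_t nonet type, and
   the convention of returning 0 on error are assumptions.

```c
#include <stddef.h>
#include <stdint.h>

/* Encode cp as UTF-9 into out (room for at least 3 nonets).
   Returns the number of nonets written, or 0 if cp is a surrogate
   or exceeds 0x10FFFF. */
static size_t ucs4_to_utf9_checked(uint32_t cp, uint16_t *out)
{
  size_t n = 0;

  if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
    return 0;
  if (cp > 0xFFFF)                                   /* octet 0x01 - 0x10 */
    out[n++] = (uint16_t)(0x100 | (cp >> 16));
  if (cp > 0xFF)
    out[n++] = (uint16_t)(0x100 | ((cp >> 8) & 0xFF));
  out[n++] = (uint16_t)(cp & 0xFF);                  /* final octet, no continuation */
  return n;
}
```

   ucs4_to_utf9_checked(0x10000, out) writes 401 400 0 (octal) and
   returns 3.  ucs4_to_utf9_checked(0xD800, out) returns 0.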

6.  Implementation Experience

   As the sample routines demonstrate, it is quite simple to implement
   UTF-9 and UTF-18 on a nonet-based architecture.  More sophisticated
   routines can be found in ftp://panda.com/tops-20/utools.mac.txt or
   from lingling.panda.com via the file <UTF9>UTOOLS.MAC via ANONYMOUS
   [FTP].

   We are now in the process of implementing support for nonet-based
   text files and automated transformation between septet, octet, and
   nonet textual data.

7.  References

7.1.  Normative References

   [FTP]           Postel, J. and J. Reynolds, "File Transfer Protocol",
                   STD 9, RFC 959, October 1985.

   [IAB-CHARACTER] Weider, C., Preston, C., Simonsen, K., Alvestrand,
                   H., Atkinson, R., Crispin, M., and P. Svanberg, "The
                   Report of the IAB Character Set Workshop held 29
                   February - 1 March, 1996", RFC 2130, April 1997.

   [ISO-10646]     International Organization for Standardization,
                   "Information Technology - Universal Multiple-octet
                   coded Character Set (UCS)", ISO/IEC Standard 10646,
                   comprised of ISO/IEC 10646-1:2000, "Information
                   technology - Universal Multiple-Octet Coded Character
                   Set (UCS) - Part 1: Architecture and Basic
                   Multilingual Plane", ISO/IEC 10646-2:2001,
                   "Information technology - Universal Multiple-Octet
                   Coded Character Set (UCS) - Part 2:  Supplementary
                   Planes" and ISO/IEC 10646-1:2000/Amd 1:2002,
                   "Mathematical symbols and other characters".

   [KEYWORDS]      Bradner, S., "Key words for use in RFCs to Indicate
                   Requirement Levels", BCP 14, RFC 2119, March 1997.

   [UNICODE]       The Unicode Consortium, "The Unicode Standard -
                   Version 3.2", defined by The Unicode Standard,
                   Version 3.0 (Reading, MA, Addison-Wesley, 2000.  ISBN
                   0-201-61633-5), as amended by the Unicode Standard
                   Annex #27: Unicode 3.1 and by the Unicode Standard
                   Annex #28: Unicode 3.2, March 2002.

7.2.  Informative References

   [US-ASCII]      American National Standards Institute, "Coded
                   Character Set - 7-bit American Standard Code for
                   Information Interchange", ANSI X3.4, 1986.

   [UTF-16]        Hoffman, P. and F. Yergeau, "UTF-16, an encoding of
                   ISO 10646", RFC 2781, February 2000.

   [UTF-7]         Goldsmith, D. and M. Davis, "UTF-7 A Mail-Safe
                   Transformation Format of Unicode", RFC 2152, May
                   1997.

   [UTF-8]         Yergeau, F., "UTF-8, a transformation format of ISO
                   10646", STD 63, RFC 3629, November 2003.

8.  Security Considerations

   As with UTF-8, UTF-9 can represent codepoints that are not in
   [UNICODE].  Applications should validate UTF-9 strings to ensure that
   no codepoint exceeds the [UNICODE] maximum of U+10FFFF.

   The sample routines in this document are for example purposes, and
   make no attempt to validate their arguments, e.g., test for overflow
   ([UNICODE] values greater than 0x10ffff) or codepoints used for
   surrogates.  Besides resulting in invalid data, this can also create
   covert channels.

9.  IANA Considerations

   The IANA shall reserve the charset names "UTF-9" and "UTF-18" for
   future assignment.

Author's Address

   Mark R. Crispin
   Panda Programming
   6158 NE Lariat Loop
   Bainbridge Island, WA 98110-2098

   Phone: (206) 842-2385
   EMail: UTF9@Lingling.Panda.COM

Full Copyright Statement

   Copyright (C) The Internet Society (2005).

   This document is subject to the rights, licenses and restrictions
   contained in BCP 78 and at www.rfc-editor.org/copyright.html, and
   except as set forth therein, the authors retain all their rights.

   This document and the information contained herein are provided on an
   "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS
   OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY AND THE INTERNET
   ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED,
   INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE
   INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
   WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

Intellectual Property

   The IETF takes no position regarding the validity or scope of any
   Intellectual Property Rights or other rights that might be claimed to
   pertain to the implementation or use of the technology described in
   this document or the extent to which any license under such rights
   might or might not be available; nor does it represent that it has
   made any independent effort to identify any such rights.  Information
   on the procedures with respect to rights in RFC documents can be
   found in BCP 78 and BCP 79.

   Copies of IPR disclosures made to the IETF Secretariat and any
   assurances of licenses to be made available, or the result of an
   attempt made to obtain a general license or permission for the use of
   such proprietary rights by implementers or users of this
   specification can be obtained from the IETF on-line IPR repository at
   http://www.ietf.org/ipr.

   The IETF invites any interested party to bring to its attention any
   copyrights, patents or patent applications, or other proprietary
   rights that may cover technology that may be required to implement
   this standard.  Please address the information to the IETF at
   ietf-ipr@ietf.org.

Acknowledgement

   Funding for the RFC Editor function is currently provided by the
   Internet Society.
