Encodings

#1 20.10.2016, 13:11

Evgeny Mikheev wrote to All on Oct 16 11:53:08 local time:

Hello, All!

Could someone remind me about the CHRS and CHARSET kludges? For example, how does CHRS: IBMPC differ from CHRS: IBMPC 2, with the 2 at the end?

Best regards, Evgeny.

--- -Write, old man, write! We won't abandon you.

#2 20.10.2016, 13:31

Eugene Subbotin wrote to Evgeny Mikheev on Oct 16 13:24:46 local time:

Hello Evgeny!

20 Oct 16 11:53, you wrote to All:

EM> Could someone remind me about the CHRS and CHARSET kludges? For example,
EM> how does CHRS: IBMPC differ from CHRS: IBMPC 2, with the 2 at the end?

2. Character set identification line
------------------------------------

The character encoding of a message is specified in the "CHRS"
control line.

The CHRS control line is formatted as follows:

^ACHRS: <identifier> <level>

Where <identifier> is a character string of no more than eight (8)
ASCII characters identifying the character set or character encoding
scheme used, and level is a positive integer value describing what
level of CHRS the message is written in.

For backwards compatibility, "CHARSET" may be treated as a synonym
for "CHRS".

Some implementations do not add the <level> field and some
implementations erroneously present "UTF-8 2" instead of "UTF-8 4".
Well mannered implementations should gracefully handle this situation
when reading messages. The recommended way of doing this is to
ignore the level parameter and only use the name of the identifier.
In future the level parameter may become obsolete.
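
As an illustration of the recommendation above (not part of the quoted text), here is a minimal Python sketch of a reader that parses the control line and treats the level as optional. The function name parse_chrs and the regular expression are assumptions made for this example; "^A" is taken to be the SOH byte (0x01).

```python
import re

# "^A" is the SOH byte (0x01); it is optional here so that a line whose
# SOH was already stripped still parses. CHARSET is accepted as a synonym.
_KLUDGE = re.compile(r"^\x01?(?:CHRS|CHARSET):\s*(\S{1,8})(?:\s+(\d+))?\s*$")


def parse_chrs(line):
    """Return (identifier, level or None), or None if this is not a CHRS line."""
    m = _KLUDGE.match(line)
    if m is None:
        return None
    identifier = m.group(1).upper()
    # The level is parsed but callers are expected to ignore it,
    # following the "only use the name of the identifier" advice.
    level = int(m.group(2)) if m.group(2) else None
    return identifier, level
```

With a parser like this, "CHRS: IBMPC" and "CHRS: IBMPC 2" yield the same identifier, so a reader that ignores the level handles both the same way.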

Incoming messages without "CHRS" control lines should be considered
as being written in pure ASCII, but may be treated as being written
in some default character set or character encoding scheme. Such as
IBM codepage 437, IBM codepage 866 or UTF-8. It is recommended that
message readers offer the user the option of manually selecting a
different character set or encoding scheme for these messages on a
per-area, per-message or other basis.
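
A rough Python sketch of that fallback, added here for illustration only: the CODECS table and the decode_body function are hypothetical names, and the identifier spellings on the left are assumptions. The codec names on the right are the ones Python's standard library ships.

```python
# Hypothetical mapping from CHRS identifiers to Python codec names.
CODECS = {
    "ASCII": "ascii",
    "CP437": "cp437",
    "CP866": "cp866",
    "UTF-8": "utf-8",
}


def decode_body(raw, identifier=None, default="cp437"):
    """Decode message bytes using the CHRS identifier, or a default codec.

    `default` plays the role of the user-selectable charset for messages
    that carry no CHRS line (or an identifier this table does not know).
    """
    codec = CODECS.get((identifier or "").upper(), default)
    # errors="replace" keeps the reader from failing on stray bytes.
    return raw.decode(codec, errors="replace")
```

A reader could pass a different `default` per area or per message, which is one way to offer the manual selection the text recommends.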

3. Supported levels
-------------------

These levels are the ones that are implemented in current software:

Level 0
-------

This level is for messages containing pure seven bit ASCII only.
Outgoing messages in pure ASCII need not be identified by a "CHRS"
control line, but if they are, they should be indicated as
"ASCII 1" (not "ASCII 0").

Level 1
-------

First level of internationalisation, using seven bit character sets.
Most of these are based on US ASCII, with minor internationalisation
variations.

Level 2
-------

Second level of internationalisation, using eight bit character
sets.

This level adds support for character sets that use "extended
ASCII", i.e. codes with the most significant bit set. The character
sets in level two are all based on ASCII (the codes 0-127 coincide
with ASCII).

Level 3
-------

Level 3 is included just for completeness as it was mentioned in the
proposals (FSC-0054 and FSP-1013, now FRL-1020) that this standard is
based on.

It seems level 3 was originally meant for 16 bit character sets but
there never was an implementation and there may never be. This may
have to do with the NULL byte being reserved in the Fidonet
specifications as a termination character.

Level 3 is "reserved".
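
To make the NULL byte remark concrete, here is a small Python check, added as an illustration and not taken from the quoted text. The helper name contains_nul is an assumption; it only shows that a 16-bit encoding of plain Latin text contains zero bytes while UTF-8 does not.

```python
def contains_nul(text, codec):
    """Report whether encoding `text` with `codec` produces any 0x00 byte."""
    return b"\x00" in text.encode(codec)


# "Hello" in UTF-16 (little endian) is 48 00 65 00 6C 00 6C 00 6F 00,
# so a NUL-terminated field would be cut after the first character.
utf16_has_nul = contains_nul("Hello", "utf-16-le")

# UTF-8 never emits 0x00 unless the text itself contains U+0000.
utf8_has_nul = contains_nul("Hello", "utf-8")
```

This is consistent with the guess above about why no 16 bit implementation appeared, though the quoted text itself only says it "may" be the reason.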

Level 4
-------

Level 4 is for multi byte character encodings. The only presently
known implementation is UTF-8.

Eugene

--- GoldED+/LNX 1.1.5-b20160827

#3 20.10.2016, 14:40

Alexey Vissarionov wrote to Evgeny Mikheev on Oct 16 13:21:34 local time:

Good day, Evgeny!
20 Oct 2016 11:53:08, you -> All:

EM> Could someone remind me about the CHRS and CHARSET kludges?

The first one must be used; the second must not be used.

EM> For example, how does CHRS: IBMPC differ from CHRS: IBMPC 2, with the
EM> 2 at the end?

Not at all: either way you'll be thrown out of the nodelist under 10.3.6

--
Alexey V. Vissarionov aka Gremlin from Kremlin
gremlin AT gremlin DOT ru; +vii-cmiii-ccxxix-lxxix-xlii

... People used to walk out; now they go off-topic
--- /bin/vi

#4 20.10.2016, 15:40

Alexey Vissarionov wrote to Eugene Subbotin on Oct 16 13:21:34 local time:

Good day, Eugene!
20 Oct 2016 13:24:46, you -> Evgeny Mikheev:

EM>> Could someone remind me about the CHRS and CHARSET kludges? For example,
EM>> how does CHRS: IBMPC differ from CHRS: IBMPC 2, with the 2 at the end?
ES> 2. Character set identification line
ES> The character encoding of a message is specified in the "CHRS"
ES> control line.

That is enough.

ES> For backwards compatibility, "CHARSET" may be treated as a synonym
ES> for "CHRS".

By the way, not a single example of such software has been found, so this line should be thrown out to where it belongs.

ES> Some implementations do not add the <level> field

That, too, should have been demoted to optional long ago...

--
Alexey V. Vissarionov aka Gremlin from Kremlin
gremlin AT gremlin DOT ru; +vii-cmiii-ccxxix-lxxix-xlii

... Why are clinical idiots treated as outpatients?!
--- /bin/vi

#5 07.11.2016, 13:11

Eugene Palenock wrote to Eugene Subbotin on Oct 16 14:45:08 local time:

Hello, Eugene!

20 Oct 16 13:24, Eugene Subbotin -> Evgeny Mikheev:

ES> Level 4
ES> -------

ES> Level 4 is for multi byte character encodings. The only presently
ES> known implementation is UTF-8.

Is there an example of echotag settings for this?
How do I attach UTF-8 to it?
How should the recoding tables be built so that this works in both directions?
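
The thread does not show echotag's configuration format, so the following is only a generic Python sketch of what a two-way recoding table between an 8-bit codepage and UTF-8 might look like. CP866 is chosen as an assumed example; the table and function names are hypothetical and do not correspond to any echotag setting.

```python
# Forward table: CP866 byte value -> Unicode character.
# Python's cp866 codec defines all 256 byte values, so this is complete.
TO_UNICODE = {b: bytes([b]).decode("cp866") for b in range(256)}

# Reverse table: Unicode character -> CP866 byte value.
FROM_UNICODE = {ch: b for b, ch in TO_UNICODE.items()}


def cp866_to_utf8(raw):
    """Recode CP866 bytes to UTF-8 bytes using the forward table."""
    return "".join(TO_UNICODE[b] for b in raw).encode("utf-8")


def utf8_to_cp866(raw, fallback=ord("?")):
    """Recode UTF-8 bytes to CP866; unmappable characters become `fallback`."""
    text = raw.decode("utf-8")
    return bytes(FROM_UNICODE.get(ch, fallback) for ch in text)
```

The 8-bit to UTF-8 direction is lossless, while the reverse direction needs a fallback because most Unicode characters have no CP866 equivalent.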

--
Regards, Eugene.

---