Working with localized strings

First, you need to be able to write wide localized string constants and to distinguish them from conventional char[] strings. To do this, the string literal is prefixed with the qualifier L:

The result will be:

Note that the string length (the number of characters) in this case is clearly less than the number of bytes allocated for the string (on your operating system the ratio may be different; I am showing it on Linux, but that does not affect the programming technique).
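A minimal sketch of such a declaration and of the two measurements, written in C++ with the C library headers. It is not the original listing: the name greeting and the two helper functions are chosen only for illustration, and the source file is assumed to be saved as UTF-8.

```cpp
#include <cstddef>
#include <cwchar>

// Wide string literal: the L prefix makes it an array of wchar_t.
// (greeting is a name chosen for this sketch.)
const wchar_t greeting[] = L"Привет";

// Number of characters, not counting the terminating L'\0'.
std::size_t greeting_length() {
    return std::wcslen(greeting);
}

// Number of bytes the array occupies, including the terminator.
std::size_t greeting_bytes() {
    return sizeof(greeting);
}
```

Here greeting_length() gives 6, while greeting_bytes() gives 7 * sizeof(wchar_t), so the exact byte count depends on how wide wchar_t is on the platform.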

Such a string may contain, side by side, symbols of very different kinds: letters of different languages, special mathematical symbols, the commonly used Greek-letter designations (α, ε, ι, π, λ, φ, ω ...), musical notes, and so on. As you obviously know, wide character strings can just as well contain characters of the Latin alphabet (the main ASCII table), and each such character will also occupy 2 or 4 bytes (depending on the conventions adopted in the operating system), in contrast to the usual 1 byte.
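The point about Latin letters can be checked with a small sketch. The names mixed and plain and the helper functions are hypothetical, introduced only for this example, and the source file is assumed to be UTF-8.

```cpp
#include <cstddef>
#include <cwchar>

// Latin, Greek and Cyrillic letters side by side in one wide string.
// (mixed and plain are names chosen for this sketch.)
const wchar_t mixed[] = L"abc αβγ где";
const char    plain[] = "abc";

// Bytes occupied by one element of each kind of string.
// sizeof(wchar_t) is typically 4 on Linux and 2 on Windows.
std::size_t wide_unit_bytes()   { return sizeof(mixed[0]); }
std::size_t narrow_unit_bytes() { return sizeof(plain[0]); }  // always 1

// Every character, the Latin ones included, takes one wchar_t element.
std::size_t mixed_length() { return std::wcslen(mixed); }
```

On systems where wchar_t is 2 bytes wide, symbols outside the Basic Multilingual Plane (some musical notes, for example) would need two elements each, so the one-element-per-character picture holds fully only with a 4-byte wchar_t.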

Let us perform a number of operations on Russian strings, but writing them (for now) in the traditional form of char arrays:


It would seem that (nearly) everything works exactly as in the textbook, so why do we need wide localized strings at all? But this is a deceptive illusion! The point is that some of the traditional string functions (strcat(), strcpy(), strdup(), strstr() and so on) will return correct results. This is because they operate on bytes, byte by byte, without looking into the internal structure of the characters they copy.
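A sketch of this behavior, assuming the source file and the execution character set are UTF-8 (the default for GCC and Clang on Linux), so that each Cyrillic letter is stored as 2 bytes. The helper names byte_length, join and contains are invented for the example.

```cpp
#include <cstddef>
#include <cstring>

// Assumes UTF-8: each Cyrillic letter below is stored as 2 bytes.

// strlen() counts bytes, not characters.
std::size_t byte_length(const char *s) {
    return std::strlen(s);
}

// strcpy()/strcat() copy whole byte sequences, so the result is intact.
// dst must be large enough for both strings plus the terminator.
char *join(char *dst, const char *a, const char *b) {
    std::strcpy(dst, a);
    return std::strcat(dst, b);
}

// strstr() compares byte sequences, so a substring search also works.
bool contains(const char *haystack, const char *needle) {
    return std::strstr(haystack, needle) != nullptr;
}
```

Under that assumption byte_length("Привет") is 12 rather than 6, while join() and contains() behave as expected because they never split a character.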

But other operations can go wrong (the misleading result of strlen() clearly hints at this): strncpy(), strchr(), strsep(), strtok() and so on, wherever a byte count or a single char argument has to stand for a multibyte character. They will give you very unexpected results that are very hard to interpret. Look at how a byte-wise string reversal works, and how its behavior differs between an English and a Russian string:

This is how it works, and it is definitely not what you expected to get:
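A byte-wise reversal of this kind might look as follows. This is a sketch rather than the original listing; reverse_bytes is a name chosen here, and UTF-8 strings are assumed.

```cpp
#include <algorithm>
#include <cstring>

// Reverses a char string in place, byte by byte.
// (reverse_bytes is a name chosen for this sketch.)
char *reverse_bytes(char *s) {
    std::reverse(s, s + std::strlen(s));
    return s;
}
```

Applied to "hello" it yields "olleh". Applied to a UTF-8 Russian string it also swaps the two bytes inside every letter, so the result is not the reversed word but a byte sequence that is no longer valid text.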

This concludes our discussion of representing Russian strings as traditional char[] arrays and processing them with the traditional string functions. The conclusion is this: working with Russian strings as char arrays is possible only:

a). when we use string constants unchanged, only as strings for input-output;

b). or when they are processed by functions (library or our own) that do not take the internal structure of the characters into account, do not look into the content of the strings, and operate on them simply as a meaningless sequence of bytes.

In all other cases, correct work with the Cyrillic alphabet is possible only with an array of wide localized characters wchar_t (with the string terminated by the wide null character L'\0'). To work with the wide representation of strings, the C library provides a set of string functions completely analogous to the traditional string functions, but with the prefix wcs instead of str in their names: wcslen() instead of strlen(), wcsncpy() instead of strncpy(), and so on.

Let's see how this works in an example:
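The sketch below is not the original listing. It shows a few wcs functions used in the same way as their str counterparts, with helper names (reverse_wide, prefix_wide, index_of) invented for illustration and a UTF-8 source file assumed.

```cpp
#include <algorithm>
#include <cstddef>
#include <cwchar>

// Wide counterpart of a byte-wise reversal: works on whole wchar_t elements.
wchar_t *reverse_wide(wchar_t *s) {
    std::reverse(s, s + std::wcslen(s));
    return s;
}

// Copies at most n characters (not bytes) and always terminates the result.
// dst must have room for n + 1 elements.
wchar_t *prefix_wide(wchar_t *dst, const wchar_t *src, std::size_t n) {
    std::wcsncpy(dst, src, n);
    dst[n] = L'\0';
    return dst;
}

// Index of the first occurrence of ch, or -1 if it is absent.
long index_of(const wchar_t *s, wchar_t ch) {
    const wchar_t *p = std::wcschr(s, ch);
    return p ? static_cast<long>(p - s) : -1;
}
```

With these helpers, reversing L"привет" gives L"тевирп", a 3-character prefix is L"при", and index_of(L"привет", L'в') is 3, because every operation counts characters rather than bytes.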

This illustration is quite enough to see the direct analogy with the functions that manipulate wchar_t characters. Anyone who has some experience working with char strings will effortlessly carry it over to wide strings. Setting the locale (the setlocale() call) for the output device (terminal) is mandatory, because a C/C++ program starts with the default locale "C" (for historical reasons), which allows output of only the 128 characters of the lower half of the 8-bit ASCII table.

In the listing shown, the call sets the locale that the operating system uses by default; I am assuming that we are experimenting on a system installed as Russian-speaking. The newer language standard (C99) also provides a format specifier for the string formatting functions (printf(), sprintf()): %ls, the format for wchar_t[] strings.
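A minimal sketch of both steps, assuming the environment names an installed locale. The functions use_system_locale and format_wide are hypothetical wrappers written for this example; only setlocale() and the %ls conversion come from the text.

```cpp
#include <clocale>
#include <cstddef>
#include <cstdio>

// Selects the locale configured in the environment (LANG, LC_ALL, ...).
// Returns false if that locale is not available on the system.
bool use_system_locale() {
    return std::setlocale(LC_ALL, "") != nullptr;
}

// Formats a wide string into a char buffer using the %ls conversion.
// Returns the snprintf() result: negative if some character cannot be
// represented in the current locale's encoding.
int format_wide(char *dst, std::size_t size, const wchar_t *ws) {
    return std::snprintf(dst, size, "%ls", ws);
}
```

A program would typically call use_system_locale() once at startup and then print with printf("%ls\n", ...). Without that call the program stays in the "C" locale, where converting Cyrillic characters usually fails and the formatting function reports an error instead of producing text.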

Just as with char arrays in the transition from C to C++, the C++ library introduces a complete analogue of the string container class, but one that holds wide localized characters; it is known as the class wstring:

Here, the string of wide characters (ws) must be written to the output stream wcout (similar in meaning to cout, but distinct from it).

In the listing shown, locale::global( locale( "" ) ) sets the default locale in the C++ OOP way, similar to what was shown earlier in the C manner.
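The C++ variant can be sketched as follows. This is not the original listing: reversed and print_wide are names invented for the example, and the try/catch and the imbue() call are cautious additions of this sketch rather than something the text requires.

```cpp
#include <exception>
#include <iostream>
#include <locale>
#include <string>

// Returns a reversed copy; wstring iterators step over whole wchar_t elements.
std::wstring reversed(const std::wstring &ws) {
    return std::wstring(ws.rbegin(), ws.rend());
}

// Sets the global C++ locale from the environment and writes ws to wcout.
// Returns false if the environment names a locale that is not installed.
bool print_wide(const std::wstring &ws) {
    try {
        std::locale::global(std::locale(""));
    } catch (const std::exception &) {
        return false;
    }
    std::wcout.imbue(std::locale());
    std::wcout << ws << std::endl;
    return true;
}
```

The exception handling is there because constructing std::locale("") throws when the environment names a locale that is not installed. It is also safer not to mix cout and wcout output within one program.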

Input-output of wide character strings (to the terminal or to a file) is a separate, complicated subject, so its consideration is deferred to a separate note on this topic.


About Olej

About 40 years of practical experience in software development. Teacher at the international software company Global Logic. Regular author of publications for IBM Developer Works. Scientific editor of computer literature at the publishing house "Symbol-Plus", St. Petersburg.
