Windows Unicode API usage in LibreOffice

Windows still provides two variants of many Win32 API functions that take or return strings: a legacy “Ansi” variant (functions named like fooA) and a Unicode variant (named like fooW; available since Windows NT, and on Windows 95 with the Microsoft Layer for Unicode, and thus on any Windows OS supported by LibreOffice).

The “Ansi” functions take 8-bit strings in the current codepage (single- or multibyte). The repertoire of characters representable in those strings is, naturally, limited to that codepage (which is either set up in the system’s “Language for non-Unicode programs” setting, or set explicitly by the running application). Unfortunately, unlike other contemporary OSes, Windows doesn’t allow setting its locale to use UTF-8. If a string passed to such a function contains characters outside that set, the string content will be altered, and the function’s behaviour might change unexpectedly.
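
To make the data loss concrete, here is a minimal sketch in Python that only approximates what happens when a Unicode string is squeezed into an 8-bit codepage. It uses Python's own codecs rather than the Win32 conversion routines, and the helper name `ansi_round_trip` is made up for this illustration. Characters that the codepage cannot represent are replaced with `?`.

```python
def ansi_round_trip(text: str, codepage: str = "cp1252") -> str:
    """Encode text into an 8-bit codepage and decode it back.

    Hypothetical helper: an approximation of the string -> "Ansi" -> string
    conversion, using Python codecs instead of the Win32 API.
    Unrepresentable characters become '?'.
    """
    return text.encode(codepage, errors="replace").decode(codepage)


# ASCII survives in any of these codepages.
print(ansi_round_trip("report.txt"))

# Cyrillic is not representable in the Western European codepage 1252 ...
print(ansi_round_trip("отчёт.txt", "cp1252"))

# ... but survives when the codepage is the Cyrillic one, 1251.
print(ansi_round_trip("отчёт.txt", "cp1251"))
```

The same file name is either preserved or destroyed depending only on which codepage happens to be active, which is why the result differs from one system to another.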

The “W” versions of those functions take UTF-16 strings (plain UCS-2 only on Windows NT 4.0 and earlier), which are able to represent the full Unicode repertoire; and even UCS-2 alone is much wider than most single- or multi-byte codepages.
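
The difference between UCS-2 and UTF-16 shows up only for characters outside the Basic Multilingual Plane: UTF-16 encodes them as two 16-bit code units (a surrogate pair), which pure UCS-2 cannot express. A small sketch in Python, with the helper name `utf16_units` invented for this example:

```python
def utf16_units(text: str) -> list[int]:
    """Return the 16-bit code units of text when encoded as UTF-16.

    Hypothetical helper for illustration only.
    """
    data = text.encode("utf-16-le")
    return [int.from_bytes(data[i:i + 2], "little") for i in range(0, len(data), 2)]


# A BMP character takes one code unit, the same in UCS-2 and UTF-16.
print([hex(u) for u in utf16_units("Ж")])

# U+1F600 lies outside the BMP and needs a surrogate pair.
print([hex(u) for u in utf16_units("\U0001F600")])
```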

In the last two weeks, we have changed many places (0, 1, 2, 3, 4, 5, 6, 7, 8, 9, A, B, C, D) in the LibreOffice codebase where legacy “A” functions were still used, replacing them with explicit calls to their “W” counterparts and removing redundant conversions of strings from LibreOffice’s internal UTF-16 string representation to 8-bit strings and back. One of the most significant effects might be on file-management functions, where such conversions could alter paths or names containing Unicode characters not representable in the currently selected 8-bit codepage, and so lead to failed file operations. One example of such problems is tdf#103525.
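
How a lossy conversion turns into a failed file operation can be simulated without any Windows API at all. The sketch below is an assumption-laden illustration, not LibreOffice code: it creates a file with a Cyrillic name, then looks it up through a name that has been round-tripped through codepage 1252. The function names are hypothetical.

```python
import os
import tempfile


def lossy_name(name: str, codepage: str = "cp1252") -> str:
    """Simulate passing a file name through an 8-bit codepage (hypothetical helper)."""
    return name.encode(codepage, errors="replace").decode(codepage)


def found_via_ansi_path(directory: str, name: str, codepage: str = "cp1252") -> bool:
    """Check whether a file can still be found after its name was converted."""
    return os.path.exists(os.path.join(directory, lossy_name(name, codepage)))


with tempfile.TemporaryDirectory() as tmp:
    for name in ("report.txt", "отчёт.txt"):
        # Create the file under its real, full-Unicode name.
        with open(os.path.join(tmp, name), "w", encoding="utf-8") as f:
            f.write("data")
        # Then try to find it through the converted name.
        print(name, "found:", found_via_ansi_path(tmp, name))
```

The file with the ASCII name is found, while the one with the Cyrillic name is not, because the converted name no longer matches anything on disk. Calling the “W” functions directly avoids the conversion and therefore the mismatch.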

The changes are included in master, on the way to 6.0.
