Reading from MySQL data with BLOBs dumped to CSV

As part of my recent work in upgrading our partner & customer ticketing system to make things sweeter for all our users I have been assigned a task involving importing data from a CSV that was generated using MySQL’s SELECT ... INTO OUTFILE. The problem was that the source table contained BLOB fields.

The resulting CSV was quite non-standard. MySQL simply does not allow to produce CSVs conforming to RFC 4180: it would not escape double quotes by duplicating them, but would use a dedicated escapement symbol (backslash \ by default); and when using that escapement symbol, it will escape byte 0x00 as two characters: \0 (backslash + character “0”). This needs a non-standard processing – e.g., Python’s csv module can’t restore the binary from such a file, no matter which encoding (like latin_1) you use to read file (Python’s csv module only works with files opened in text mode), or which settings you pass to reader. MySQL may be instructed to not use the escapement symbol (using FIELDS ESCAPED BY ''), and then zero bytes will be output as is, but then the double quotes (byte 0x22) inside all fields (including blobs) will be completely not escaped, making the CSV invalid and unreadable. MySQL should just duplicate double-quotes in this case, and it would work out of the box… But for now, I needed to read from the file with escaping by \ and all the quirks.

I hadn’t had time to look for a CSV reader capable of dealing with such a syntax (actually, I did a brief search, but of course not exhaustive). So I had to come with a home-grown Python code for this:

def readMySQL_CSVLine(binfile):
    rec = []
    field = bytearray()
    is_quoted = False # inside quoted field
    is_escaped = False # immediately after "\"
    is_after_quote = False # inside a quoted field, immediately after
                           # a double quote that is not an opening
                           # quote of a field
    while True:
        ch =
        if len(ch) == 0: # EOF
            if (is_after_quote): # There was an opening quote, then
                                 # closing quote, then EOF; the field
                                 # must be added, even if empty
            elif is_quoted or is_escaped: # Missing closing quote or a
                                          # character following "\"
                if is_escaped: # Invalid last "\"
                    field += b'\\'
            elif len(rec) > 0 or len(field) > 0: # Don't return records
                                                 # for empty last line
            return rec
        if is_escaped:
            is_escaped = False
            if ch == b'0':
                ch = b'\x00'
            elif ch not in b'\n"\\,': # "\" should only escape:
                                      # FIELDS ESCAPED BY,
                                      # FIELDS [OPTIONALLY] ENCLOSED BY,
                                      # FIELDS TERMINATED BY,
                                      # LINES TERMINATED BY,
                                      # and ASCII NUL.
                field += b'\\'
            field += ch
        if ch == b'\n' and (not is_quoted or is_after_quote): # EOL
            return rec
        if ch == b',' and (not is_quoted or is_after_quote): # End of field
            is_quoted = False
            is_after_quote = False
            field = bytearray() # no .clear() here, or else it will clear
                                # content inside rec
        if ch == b'"':
            if is_after_quote: # Normal case of two two consequent quotes -
                               # one escapes the other
                is_after_quote = False
                field += ch
            elif is_quoted:
                is_after_quote = True
            else: # not is_quoted
                if len(field) == 0: # First character in a field
                    is_quoted = True
                else: # a quote in the middle of a field that didn't
                      # start with quotes
                    field += ch
        if is_after_quote: # An invalid non-closing lonely quote in the
                           # middle of a quoted field
            is_after_quote = False
            field += b'"' # Add it as usual character
        if ch == b'\\':
            is_escaped = True
        field += ch

It is used like this:

with open('dumped_table.csv', 'rb') as csv_file:
    while True:
        rec = readMySQL_CSVLine(csv_file)
        if not rec:
        numeric_field, text_field, blob_field = rec
        numeric_field = numeric_field.decode('ascii')
        text_field = text_field.decode('utf-8')
        with open('path/to/file', 'w+b') as outfile:

This worked for me. Of course, it may easily be converted to be a proper iterator, but I just leave it here as is. Hope that this could be useful for someone, until MySQL implement an option to produce a standards-compliant export.

Soft edge effect on objects in LibreOffice

After implementing glow effect recently, we at Collabora Productivity also implemented soft edge effect for objects in Draw and Impress. And again, that was done thanks to SUSE who made it possible.

The relevant bug report‘s duplicate contains a sample that I use here for illustration. First, take a look at how it was before:

How it was in LibreOffice 6.4

Now let’s see what it looks like now:

How it looks in LibreOffice 7.0

And finally here is the reference look of the slide:


This will be available in the upcoming LibreOffice 7.0.

Glow effect on objects in LibreOffice

Thanks to SUSE who made this possible, now we have glow effect on objects in upcoming LibreOffice 7.0. Collabora Productivity engineers Tam√°s Bunth and myself together have implemented it for shapes and pictures.

Below are some screenshots of a PPTX slide with glow samples collected from the relevant bug report:

How it was in 6.4
How it looks like in master towards 7.0
Reference look

What puzzles me is why fontworks’ (right bottom) glow is not shown in the reference, although the effect is present in its properties. Somehow now LibreOffice seems to support glow in fontworks better ūüėČ

Glow on pictures is only implemented in Impress and Draw. Glow on shapes is available in all modules.

When Legacy Justifies Errors

Since forever, Basic in had a bug: it didn’t properly check closing parentheses in expressions like

FirstUpper = UCase( Left( sString, 1 ) + LCase( Mid( sString, 2 ) )

(example taken from this AskLibO question). Note the mismatch between opening and closing parentheses: four opening, and only three closing. The author missed one to close UCase function after Left( sString, 1 ), and any compiler would naturally point out to this common mistake – any but StarBasic in OOo and derivatives.

What StarBasic did was auto-closing the expression in the end when compiling. It would only complain if the result of such auto-closing would be ill-formed; sometimes (as in the example above), the result would be syntactically correct – but not necessarily semantically correct: the example above compiled with StarBasic did not what author expected (a text with first character capitalized, and the rest of the text in lowercase), but returned text in all caps instead. For some similar cases, such errors could be not easy to find in the absence of compiler check, especially in a large project.

This has been reported to LibreOffice bug tracker as tdf#80731 back in 2014; and it was addressed in 5.4 development cycle, with the fix backported into 5.3.1. A nice and correct fix, isn’t it?

Well, not actually. It turned out, that over the years, the amount of existing and actually used legacy code having the error has become so big, that it was unrealistic to make sure that all of it is checked and fixed. Of course, some errors were found in the code bundled with LibreOffice itself – and naturally, it was fixed. Some third-party extensions – quite a number of them – also happened to have it; and all authors who could be contacted, had released updated releases with the mistake corrected; thank you! But it wasn’t possible to test any extension out there; and besides publicly available and supported extensions, there were also unsupported (but still used, and useful) ones; and private ones (used by those who developed/paid for their development); and also uncountable macros outside of any extensions, and all of them having the error, that happily worked before, suddenly stopped working for their users … so after some time, the fix was reverted both from 5.3.3, and from still developing 5.4 (tdf#106529). By the way, I was enjoying reading “AltSearch extension put a bugfix release 1.4.2 to work around this bug” there, as if pointing to syntax error was actually a LibreOffice’s bug, not a mistake in the extension’s code.

So LibreOffice kept silently allowing the wrong syntax ever after 5.3.3, until now. Today I have reinstated the original fix by Pierre Lepage from 5.4, with one modification: the check is only active when compiling the code from within Basic IDE. What does it mean? It means, that for anyone writing a new code in the Basic IDE, the syntax error will be properly found and shown. When one opens an existing module in the IDE, and makes a modification (which would of course trigger recompilation), the existing errors in the same module (even in different routines) would be found, too. But if the code is compiled not from the IDE (as when a macro is executed from an event handler; or when an extension runs its code), the old permissive handling is kept, and the code with errors will continue working as before.

I consider this an acceptable compromise, which would both allow existing users of legacy code to keep using their code, and still prevent creation of new buggy code (and also help gradually cleaning up the existing errors in supported Basic programs).

The change will appear in coming version 6.4. It’s still to be seen if this change will survive, or will it also uncover a different can of worms, and be eventually reverted, as its predecessor ūüėä

XLSX interoperability: pivot tables-related improvements

Recently we at Collabora Productivity have made some substantial improvements to XLSX interoperability related to pivot tables, fixing many issues existed in Calc.

Personally I have committed these patches:

These changes allow our customers, and the whole LibreOffice user community, to enjoy better interoperability when using XLSX format. They will be available in LibreOffice version 6.3 later this summer; and they are immediately available for our customers in this week’s Collabora Office 6.0 update 28.

Thanks to our valuable customers who make these improvements possible funding the work!

Microsoft deprecates MSI

Well – obviously. At least, their current actions tell that: they deprecated CRT MSMs (which is reiterated in VS 2019 RC2 release notes), a technology designed to allow MSI-based installers to install the CRT libraries in a centrally-managed manner; and the only recommended way now is using vcredist executable, which is not MSI-compatible.

What else, if not deprecation, might it mean, when an installer technology made unable to deploy applications created using vendor’s own flagship development tool?

Well – I thought: maybe that was an oversight? Why not inform them about the problem that MSI-only installers would be left without any viable option?

So I did. And I got the answer that to me was a clear confirmation:

Sorry that our current plan is like this, if there are a lot of customer complain about this new change, we will revisit this issue

So – “yes, our plan is to make it impossible for MSI; this is not a problem in our eyes; only if we will experience pressure, may we re-think about it”.


Proper console mode for LibreOffice on Windows

LibreOffice has always supported usage of command line switches that allow operations like conversion of documents to different file types, or batch-printing. Using LibreOffice CLI in various scripts is a very common scenario.

But until now, it had somewhat suboptimal support for this on Windows. The main executable module –¬†soffice.bin¬†– being a GUI subsystem application, it could not properly output its messages to the calling console, as well as return error codes to check ERRORLEVEL for success. The¬†hacks used to redirect the output of the GUI application to the calling console were unreliable and didn’t work at all on some supported versions of Windows. Sometimes one could not even see why the entered command line was rejected as invalid.

I have just pushed a commit¬†that changes the situation. Now LibreOffice has proper console mode on Windows. soffice.bin¬†is now built for console subsystem, which allows using it in abovementioned scenarios, having the stdout and stderr output, as well as return code, properly sent to console (or redirected using normal means); in debug builds, the debug output is also visible on the console. To allow comfortable usage, a new console launcher executable is introduced,, in LibreOffice installation’s program/ folder, alongside with familiar¬†soffice.exe, which is retained for all GUI uses, as before. This allows to continue using command lines like
"c:\Program Files\LibreOffice\program\soffice" --convert-to odt file.doc
from cmd.exe¬†command-line interpreter, without specifying the executable extension, and have the¬†launched to have proper console operation (subject to value of PATHEXT environment variable). The command properly “owns” the console (does not return to command prompt) until soffice finishes.

The change will be available in LibreOffice 6.3 scheduled for Summer 2019 (if testing does not reveal a major problem which would require to revert this). I hope this will make use of LibreOffice CLI more comfortable for Windows users, on par with other platforms. If you find any problems with the solution, please report bugs to our bug tracker. Early testing using daily builds is much appreciated!