TIL the Python standard library struct module defaults to interpreting bytestrings using the native endianness of your machine.
Which means that this code:
import struct

def decode_matchinfo(buf):
    # buf is a bytestring of unsigned integers, each 4 bytes long
    return struct.unpack("I" * (len(buf) // 4), buf)
Behaves differently on big-endian vs. little-endian systems.
I found this out thanks to this bug report against my sqlite-fts4 library.
SQLite doesn't change the binary format depending on the endianness of the system, which means that my function here works correctly on little-endian systems but does the wrong thing on big-endian systems.
Update: I was entirely wrong about this. SQLite DOES change the format based on the endianness of the system. My bug fix was incorrect; see this issue comment for details.
On little-endian systems:
>>> buf = b'\x01\x00\x00\x00\x02\x00\x00\x00\x02\x00\x00\x00\x02\x00\x00\x00'
>>> decode_matchinfo(buf)
(1, 2, 2, 2)
But on big-endian systems:
>>> buf = b'\x01\x00\x00\x00\x02\x00\x00\x00\x02\x00\x00\x00\x02\x00\x00\x00'
>>> decode_matchinfo(buf)
(16777216, 33554432, 33554432, 33554432)
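Those big numbers make sense once you look at the raw bytes: read big-endian, the \x01 byte lands in the most significant position, so 1 becomes 0x01000000. A quick sketch using int.from_bytes to confirm the arithmetic:

>>> int.from_bytes(b'\x01\x00\x00\x00', 'little')
1
>>> int.from_bytes(b'\x01\x00\x00\x00', 'big')
16777216
>>> 0x01000000
16777216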
The fix is to prefix the format string with a character specifying the byte order that should be used; see Byte Order, Size, and Alignment in the Python documentation.
>>> struct.unpack("<IIII", buf)
(1, 2, 2, 2)
>>> struct.unpack(">IIII", buf)
(16777216, 33554432, 33554432, 33554432)
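A related detail worth knowing: any explicit byte-order prefix also switches struct to standard sizes with no alignment padding, while the default native mode may pad fields. A small sketch (the exact native size depends on your platform; 8 is typical on x86 and ARM):

>>> import struct
>>> struct.calcsize("@BI")  # native mode: pads the B so the I is aligned
8
>>> struct.calcsize("=BI")  # standard mode: no padding
5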
So the fix for my bug was to rewrite the function to look like this:
def decode_matchinfo(buf):
    # buf is a bytestring of unsigned integers, each 4 bytes long
    return struct.unpack("<" + ("I" * (len(buf) // 4)), buf)
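As an aside, struct format strings also support a repeat count, so a sketch of an equivalent way to write that function (not what the library itself uses) is:

def decode_matchinfo(buf):
    # "<4I" means four little-endian unsigned 32-bit integers
    return struct.unpack("<{}I".format(len(buf) // 4), buf)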
Bonus: How to tell which endianness your system has
Turns out Python can tell you whether your system is big-endian or little-endian like this:
>>> from sys import byteorder
>>> byteorder
'little'
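As a rough cross-check, packing with a prefix-free format should match int.to_bytes using sys.byteorder, assuming a platform where C's unsigned int is 4 bytes (which is nearly universal):

>>> import struct, sys
>>> struct.pack("I", 1) == (1).to_bytes(4, sys.byteorder)
True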