HQR compression

From LBA File Information
Jump to: navigation, search

This compression format is used in High Quality Resources (HQR, VOX, ILE and OBL files) found in LBA games.

The compression types (both 1 and 2) are variants of LZSS (Lempel-Ziv-Storer-Szymanski) compression, based on the LZ77 algorithm by Abraham Lempel and Jacob Ziv.

The compressed data is divided into blocks. Each block can be 9 ~ 17 bytes long. Each block is consists of:

  • flag byte - always the first byte of the block,
  • sub-blocks - there are always 8 sub-blocks, except for the end of the file, where there may be less.

Flag byte[edit]

Flag byte describes the sub-blocks that are in the same block. Each bit of the flag byte describes type of the appropriate sub-block. Least significant bit (LSB) of the flag byte describes the first (closest) sub-block after the flag byte. Most significant bit of the flag byte describes the 8th sub-block after the flag byte.

If a bit in the flag byte is 1, the appropriate sub-block is a regular data byte. In this case the sub-block's length is one byte, and it should be copied directly to the end of the current output (uncompressed) data as it is.

If a bit in the flag byte is 0, the appropriate sub-block is an address. In this case it is two bytes long, and the string described by its content should be copied to the end of the current output (uncompressed) data.

Address sub-blocks[edit]

Address sub-blocks are WORDs in Intel notation (Little Endian).

Their bit meaning follows:

  • bits 15 ~ 4 - address of the string to be copied. It is the number of bytes to go back in the output data to copy the string from. If the value is 0, the address points to the last byte in the current output data.
  • bits 3 ~ 0 - length of the string to be copied. The actual length is this number plus 2 or 3 (2 for compression type 1, 3 for compression type 2 - this is the only difference between compression types). In general the length in bytes is:

length = (bits 3~0) + (compression type) + 1

The string described by the address sub-block begins as the byte pointed to by the address part, and contains the consecutive length number of bytes.

Remarks[edit]

  1. At the end of the file, if there is not so much data to fill all 8 sub-blocks, only the valid blocks are present, and the file ends with the last one that describes valid data. Bits in the flag byte that apply to the non-existent sub-blocks are set to 1.
  2. It is possible that one source byte will be copied several times in a single step, for example when address bits in an address sub-block point to the last byte of the output data (value of 0). In such case the algorithm is following: the first byte of the string to copy is the last byte in the output data, let's assume that the address is 0x035 and final length of the string to copy is 4 bytes. The algorithm meeds to take the byte at address of 0x035, and copy it at the end of the output data (position 0x036). Then it takes the next byte of the string to copy, which now exists, and whose address is 0x036, and copy it to the new position of 0x037, and so the same byte will be copied four times. This always applies when the address sub-block describes a string that ends beyond the current output data end.