MPQ (file format)

From HandWiki
MPQ
Filename extension: .mpq
Developed by: Mike O'Brien

MPQ (Mo'PaQ, short for Mike O'Brien Pack, named after its creator[1]) is an archive file format used in several of Blizzard Entertainment's games.

The MPQ archives used in Blizzard's games generally contain a game's data files, including graphics, sounds, and level data. The format's capabilities include compression, encryption, file segmentation, extensible file metadata, cryptographic signatures and the ability to store multiple versions of the same file for internationalization, platform-specific differences and patching. MPQ archives can use a variety of compression algorithms, which may also be combined.

File indexing

To meet the speed requirements of a computer game, files are indexed in a hash table using a fast, low-collision hashing algorithm. A file's index within the hash table is the case-insensitive hash of its filename modulo the size of the hash table, which allows a file's existence in the archive to be checked quickly. If several files in the archive hash to the same index, the colliding entries follow each other in increasing index order, forming a colliding hash cluster. To identify the exact entry for the requested file within such a cluster, each hash table entry stores two additional hashes of the filename, computed with the same algorithm but different seed values, as well as a locale code and a platform code. The end of a cluster is detected either by reaching an empty hash table entry or by traversing the entire hash table (wrapping around at the end) back to the initial index.
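The lookup described above can be sketched in Python. This is a minimal illustration that assumes the hash function and table conventions used by open-source readers such as StormLib, not anything Blizzard documented: hash type 0 picks the starting slot, types 1 and 2 give the two verification hashes, ASCII letters are uppercased before hashing (which is what makes lookups case-insensitive), and the block-index markers for empty (0xFFFFFFFF) and deleted (0xFFFFFFFE) slots follow StormLib. The in-memory `HashEntry` list and the `insert`/`find` helpers are hypothetical; real archives store these entries encrypted on disk, and this sketch ignores the platform code.

```python
from dataclasses import dataclass
from typing import List, Optional

MASK32 = 0xFFFFFFFF
# Hash types select a 256-entry slice of the crypt table.
HASH_TABLE_INDEX = 0  # starting slot in the hash table
HASH_NAME_A = 1       # first verification hash stored in each entry
HASH_NAME_B = 2       # second verification hash stored in each entry
# Block-index markers (StormLib's conventions, assumed here).
BLOCK_EMPTY = 0xFFFFFFFF    # never-used slot: ends a collision cluster
BLOCK_DELETED = 0xFFFFFFFE  # freed slot: the search continues past it


def build_crypt_table() -> List[int]:
    """Build the 0x500-entry table that drives the MPQ hash."""
    table = [0] * 0x500
    seed = 0x00100001
    for index1 in range(0x100):
        index2 = index1
        for _ in range(5):
            seed = (seed * 125 + 3) % 0x2AAAAB
            high = (seed & 0xFFFF) << 16
            seed = (seed * 125 + 3) % 0x2AAAAB
            table[index2] = high | (seed & 0xFFFF)
            index2 += 0x100
    return table


CRYPT_TABLE = build_crypt_table()


def hash_string(name: str, hash_type: int) -> int:
    """MPQ string hash; ASCII letters are uppercased, so it is case-insensitive."""
    seed1, seed2 = 0x7FED7FED, 0xEEEEEEEE
    for ch in name.encode("ascii").upper():
        seed1 = (CRYPT_TABLE[(hash_type << 8) + ch] ^ (seed1 + seed2)) & MASK32
        seed2 = (ch + seed1 + seed2 + (seed2 << 5) + 3) & MASK32
    return seed1


@dataclass
class HashEntry:
    name_a: int = 0
    name_b: int = 0
    locale: int = 0
    block_index: int = BLOCK_EMPTY


def insert(table: List[HashEntry], name: str, block_index: int, locale: int = 0) -> None:
    """Hypothetical helper: place an entry at the first free slot of its cluster."""
    size = len(table)
    start = i = hash_string(name, HASH_TABLE_INDEX) % size
    while table[i].block_index not in (BLOCK_EMPTY, BLOCK_DELETED):
        i = (i + 1) % size
        if i == start:
            raise ValueError("hash table is full")
    table[i] = HashEntry(hash_string(name, HASH_NAME_A),
                         hash_string(name, HASH_NAME_B), locale, block_index)


def find(table: List[HashEntry], name: str, locale: int = 0) -> Optional[int]:
    """Return the block index for name, or None if it is not in the table."""
    size = len(table)
    start = i = hash_string(name, HASH_TABLE_INDEX) % size
    name_a = hash_string(name, HASH_NAME_A)
    name_b = hash_string(name, HASH_NAME_B)
    while True:
        entry = table[i]
        if entry.block_index == BLOCK_EMPTY:
            return None  # an empty slot ends the collision cluster
        if (entry.block_index != BLOCK_DELETED and entry.name_a == name_a
                and entry.name_b == name_b and entry.locale == locale):
            return entry.block_index
        i = (i + 1) % size  # wrap around at the end of the table
        if i == start:
            return None  # traversed the whole table without a match
```

For example, after inserting a handful of names into a 16-slot table, `find` returns the same block index for `war3map.j` and `WAR3MAP.J`, and `None` for a name that was never added.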

Encryption

Both the block table (which records where each file's data is located in the archive) and the hash table used for file indexing are encrypted when stored. The default encryption process uses a known algorithm and derives its key from the name of the encrypted file, which makes it impractical to decrypt a file without knowing its name, short of brute-forcing candidate filenames or using a dictionary of known filenames.
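The cipher itself can be sketched as follows, assuming the algorithm as reimplemented by open-source readers such as StormLib. Keys are produced with a fourth hash type (3): the hash table and block table use the fixed names "(hash table)" and "(block table)", while an encrypted file uses a key derived from its own name. The cipher works on little-endian 32-bit words, and its key schedule is fed by the plaintext, so encryption and decryption differ only in which word they feed back. The function names here are illustrative, not a real library API.

```python
import struct
from typing import List

MASK32 = 0xFFFFFFFF
HASH_FILE_KEY = 3  # hash type used to turn a name into an encryption key


def build_crypt_table() -> List[int]:
    """Same 0x500-entry table used by the MPQ hash; slice 0x400 feeds the cipher."""
    table = [0] * 0x500
    seed = 0x00100001
    for index1 in range(0x100):
        index2 = index1
        for _ in range(5):
            seed = (seed * 125 + 3) % 0x2AAAAB
            high = (seed & 0xFFFF) << 16
            seed = (seed * 125 + 3) % 0x2AAAAB
            table[index2] = high | (seed & 0xFFFF)
            index2 += 0x100
    return table


CRYPT_TABLE = build_crypt_table()


def hash_string(name: str, hash_type: int) -> int:
    """Case-insensitive MPQ string hash (ASCII letters uppercased first)."""
    seed1, seed2 = 0x7FED7FED, 0xEEEEEEEE
    for ch in name.encode("ascii").upper():
        seed1 = (CRYPT_TABLE[(hash_type << 8) + ch] ^ (seed1 + seed2)) & MASK32
        seed2 = (ch + seed1 + seed2 + (seed2 << 5) + 3) & MASK32
    return seed1


def _crypt(data: bytes, key: int, decrypt: bool) -> bytes:
    """Shared MPQ stream cipher over little-endian 32-bit words."""
    if len(data) % 4:
        raise ValueError("data length must be a multiple of 4")
    words = list(struct.unpack(f"<{len(data) // 4}I", data))
    key1, key2 = key, 0xEEEEEEEE
    for i, word in enumerate(words):
        key2 = (key2 + CRYPT_TABLE[0x400 + (key1 & 0xFF)]) & MASK32
        out = word ^ ((key1 + key2) & MASK32)
        plain = out if decrypt else word  # the key schedule is fed by plaintext
        words[i] = out
        key1 = ((((~key1 & MASK32) << 0x15) + 0x11111111) | (key1 >> 0x0B)) & MASK32
        key2 = (plain + key2 + (key2 << 5) + 3) & MASK32
    return struct.pack(f"<{len(words)}I", *words)


def encrypt_block(data: bytes, key: int) -> bytes:
    return _crypt(data, key, decrypt=False)


def decrypt_block(data: bytes, key: int) -> bytes:
    return _crypt(data, key, decrypt=True)


# The two tables are encrypted with keys derived from fixed names.
HASH_TABLE_KEY = hash_string("(hash table)", HASH_FILE_KEY)
BLOCK_TABLE_KEY = hash_string("(block table)", HASH_FILE_KEY)
```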

Revisions

The file header reserves space for format version data. Warcraft III ignores the format version of the archives it loads and treats them all as version 1.

  • Version 1 was used before World of Warcraft.
  • Version 2 added an extended header to the format which contained data for an extended block table to allow for larger archive sizes.
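A minimal header reader might look like the following. The field layout (a 32-byte version 1 header, extended to 44 bytes in version 2) is taken from StormLib's description of the format and should be treated as an assumption here; note that the raw version field stores 0 for version 1 and 1 for version 2. The `parse_header` helper and its `assume_v1` flag are hypothetical; the flag mimics Warcraft III's behavior of ignoring the version field.

```python
import struct
from dataclasses import dataclass
from typing import Optional

MPQ_SIGNATURE = b"MPQ\x1a"
# Version 1 header: signature, header size, archive size, format version,
# sector size shift, hash/block table offsets, hash/block table entry counts.
HEADER_V1 = struct.Struct("<4sIIHHIIII")  # 32 bytes
# Version 2 appends the hi-block table offset and the high 16 bits of both table offsets.
HEADER_V2_EXT = struct.Struct("<QHH")     # 12 more bytes (44 in total)


@dataclass
class MpqHeader:
    header_size: int
    archive_size: int
    format_version: int   # raw field: 0 means version 1, 1 means version 2
    sector_size: int      # bytes per file sector: 512 << shift
    hash_table_offset: int
    block_table_offset: int
    hash_table_entries: int
    block_table_entries: int
    hi_block_table_offset: Optional[int] = None  # version 2 only


def parse_header(data: bytes, assume_v1: bool = False) -> MpqHeader:
    """Parse an MPQ header located at the start of data."""
    (sig, header_size, archive_size, fmt, shift, hash_off, block_off,
     hash_count, block_count) = HEADER_V1.unpack_from(data, 0)
    if sig != MPQ_SIGNATURE:
        raise ValueError("not an MPQ header")
    if assume_v1:
        fmt = 0  # mimic Warcraft III, which treats every archive as version 1
    header = MpqHeader(header_size, archive_size, fmt, 512 << shift,
                       hash_off, block_off, hash_count, block_count)
    if fmt >= 1:
        hi_block, hash_hi, block_hi = HEADER_V2_EXT.unpack_from(data, HEADER_V1.size)
        # The high 16 bits let table offsets exceed 4 GiB.
        header.hash_table_offset |= hash_hi << 32
        header.block_table_offset |= block_hi << 32
        header.hi_block_table_offset = hi_block
    return header
```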

Archive metadata

MPQ archives do not have specific structures to store metadata beyond what is absolutely necessary to access archived files. Instead, the convention is to use regular files whose names are enclosed in parentheses.

Below are known metadata files.

  • (listfile): Contains a list of the archive's files, one filename per line. It is not necessarily exhaustive.
  • (signature): Contains the weak cryptographic signature of the archive. This type of signature is deprecated.
  • (attributes): Contains extended file metadata. Currently known attributes are file creation date, CRC32 checksum and MD5 checksum.
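Since the (listfile) is a plain list of names that may not be exhaustive, a reader can only treat it as a set of candidate names. The sketch below, built around a hypothetical `parse_listfile` helper, splits the contents on line breaks and drops blank lines and duplicates that differ only in case, since MPQ name lookup is case-insensitive. Assuming the contents are ASCII or UTF-8 text is a simplification.

```python
from typing import List


def parse_listfile(data: bytes) -> List[str]:
    """Split (listfile) contents into names, dropping blanks and case-only duplicates."""
    seen = set()
    names = []
    # splitlines() accepts both CRLF and LF line endings.
    for line in data.decode("utf-8", errors="replace").splitlines():
        name = line.strip()
        key = name.upper()  # MPQ name lookup is case-insensitive
        if name and key not in seen:
            seen.add(key)
            names.append(name)
    return names
```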

Compression

Starting with StarCraft, each segment (or sector) of a file in an MPQ archive can be compressed using a combination of compression algorithms. A header byte prepended to every compressed sector indicates which compressions were used, and the order in which those compressors are applied is hardcoded. Unlike ZIP archives, which support several compression methods but in practice almost always use Deflate, MPQ files have been shown to use a wide variety of compression techniques.

The following algorithms are currently in use by Blizzard games:

  • PKWARE Data Compression Library ("implode", licensed from PKWARE). The first compression algorithm available.[2]
  • Huffman tree compression combined with ADPCM 4:1 compression (both introduced in StarCraft). The latter algorithm is lossy and only suitable for raw PCM input data.
  • zlib (introduced in Warcraft III).
  • bzip2 (introduced in World of Warcraft).
  • LZMA (introduced in StarCraft II).
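To illustrate the compression header byte, the sketch below decodes it into algorithm names. The bit values are the ones used by StormLib (an assumption, as Blizzard never published them). LZMA is stored as the whole-byte value 0x12, which overlaps the zlib and bzip2 bits, so it cannot be combined with other flags and is checked first. The `describe_compression` helper is hypothetical, and any bits it does not list are reported rather than guessed.

```python
from typing import List

# Per-sector compression flags as defined by StormLib (assumed here).
COMPRESSION_FLAGS = [
    (0x01, "Huffman"),
    (0x02, "zlib"),
    (0x08, "PKWARE DCL"),
    (0x10, "bzip2"),
    (0x40, "ADPCM mono"),
    (0x80, "ADPCM stereo"),
]
# LZMA reuses bits 0x10 | 0x02, so it is an exclusive whole-byte value.
COMPRESSION_LZMA = 0x12
# The flags are distinct bits, so their sum equals their bitwise OR.
KNOWN_BITS = sum(bit for bit, _ in COMPRESSION_FLAGS)


def describe_compression(mask: int) -> List[str]:
    """List the algorithms named by a sector's compression byte."""
    if mask == COMPRESSION_LZMA:
        return ["LZMA"]
    names = [name for bit, name in COMPRESSION_FLAGS if mask & bit]
    unlisted = mask & 0xFF & ~KNOWN_BITS
    if unlisted:
        names.append(f"unlisted bits 0x{unlisted:02X}")
    return names
```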

Since only one compression algorithm was available when MPQs were first deployed in Diablo, those archives used a different file metadata flag to indicate compression and did not use a compression header byte.

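Because each sector is compressed independently, an individual sector can also be stored uncompressed when compression would not make it smaller. A BLP1 file, for example, consists of a BLP header containing many null bytes followed by an embedded JPEG image: the first sector would be compressed, but the following ones would be stored as-is, since JPEG data is already compressed and cannot be compressed further.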
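The per-sector behavior described above can be sketched as a reader that first checks whether a sector was stored as-is. This sketch assumes StormLib's convention that a sector whose stored size equals its uncompressed size was not compressed; otherwise the first byte is the compression mask. Only single-algorithm zlib (0x02) and bzip2 (0x10) sectors are handled, because Python's standard library covers those two; Huffman, ADPCM, PKWARE DCL, LZMA and combined masks are left out. `read_sector` is a hypothetical helper, and the sketch does not apply to Diablo-era archives, which lack the header byte.

```python
import bz2
import zlib


def read_sector(raw: bytes, expected_size: int) -> bytes:
    """Decode one sector of a compressed (post-Diablo) file."""
    if len(raw) == expected_size:
        return raw  # stored as-is: compression would not have saved space
    mask, payload = raw[0], raw[1:]
    if mask == 0x02:
        out = zlib.decompress(payload)
    elif mask == 0x10:
        out = bz2.decompress(payload)
    else:
        raise NotImplementedError(f"compression byte 0x{mask:02X} not handled in this sketch")
    if len(out) != expected_size:
        raise ValueError("sector decompressed to an unexpected size")
    return out
```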

Warcraft III cinematics

Warcraft III includes cutscene cinematics with the MPQ extension that, despite the extension, are not actual MPQ files. Rather, they are AVI files encoded with Blizzard's renamed MPEG-4 codec, BLZ0 (which is actually DivX). These files are playable in ordinary media players, provided the proper codecs are installed. World of Warcraft cinematics, by contrast, use the AVI extension.
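Because the extension is misleading, a tool can tell these cinematics apart from real archives by their leading bytes: an MPQ header begins with the signature "MPQ" followed by the byte 0x1A, while an AVI file is a RIFF container whose form type is "AVI ". The hypothetical `sniff_container` helper below only checks offset 0, which is a simplification, since an MPQ archive can also be embedded further into a file.

```python
def sniff_container(head: bytes) -> str:
    """Classify a file by its leading magic bytes rather than its extension."""
    if head[:4] == b"MPQ\x1a":
        return "MPQ archive"
    # RIFF layout: "RIFF", 4-byte little-endian size, then the form type.
    if head[:4] == b"RIFF" and head[8:12] == b"AVI ":
        return "AVI video"
    return "unknown"
```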

Usage in gaming

Blizzard has used the MPQ file format for archiving game files in a number of its games, including Diablo, StarCraft, Warcraft III, World of Warcraft, StarCraft II and Diablo III.

Replacement: CASC

On April 3, 2014, with the beginning of alpha testing for World of Warcraft: Warlords of Draenor, Blizzard announced that it was testing a new proprietary file format dubbed CASC (Content Addressable Storage Container) to replace MPQ in World of Warcraft. The touted improvements include reduced file corruption through a self-maintaining system, improved in-game performance and faster patching. The CASC format was initially tested in the internal alpha for Heroes of the Storm, and later in the alpha and beta tests for Warlords of Draenor, before being implemented in the main game ahead of the expansion's release.[3] StarCraft II and Diablo III were later also switched to CASC for their main data.

Reverse engineering and libraries

This archive format has never been documented by Blizzard, and the use of encryption suggests it is not meant to be easily used by unauthorized third parties. The functions that handle MPQ archives in Blizzard games used to be located in a shared library named storm.dll, so to open an MPQ archive one could simply link this library into one's own program. The first successful attempt to open an MPQ archive without storm.dll was made in 1998,[4] but the author did not wish to share his source code,[4] so an open-source C++ project, StormLib, was started to make it easy to read and write MPQ archives.[5] Over the years this library became the reference point for the format, and every MPQ library ultimately cites it and its documentation as its source of inspiration, whether it is written in C,[6] Go,[7] Python,[8] PHP[9] or Java.[10]

References