BMG
The BMG file format is designed to store a game's "messages", which really refers to any piece of text that may be displayed by the game. Games such as Wind Waker and Twilight Princess use multiple such BMG files to organize all the text they'll display to the player.
This file format is a part of JSystem, and thus is used in many games beyond these two Zelda titles. The file format is tailored to each game's needs, and is designed in such a way that software can be written to handle any game's BMG files generically. This page goes over the generic structure of BMG files, with additional pages set aside to detail info specific to Wind Waker or Twilight Princess.
File Format
A BMG file is designed as a sequence of sections, each handling some aspect of the game's text display needs. A file header describes some very basic information about the BMG file (including, critically, the number of sections), and is immediately followed by the first section. The header and each section are padded to a 32-byte boundary. Sections may be required to appear in a particular order, though this possibly differs between games. For Twilight Princess at least, it seems the INF1 section must come first, and the FLI1 and FLW1 sections must come after all the others; beyond that, it is unknown how strict the ordering is.
Due to the CPU architecture of the GameCube and Wii, all multibyte integers are in big-endian format, and that is assumed throughout the rest of this article.
File Header
The header contains basic information about the file, including its partial size (see below), the character encoding used for message strings, and the number of sections that follow the header.
Offset | Size | Meaning |
---|---|---|
0x00 | 8 | Magic string MESGbmg1, identifies the file as a BMG file. |
0x08 | 4 | Size of the file, up to but not including the FLW1/FLI1 sections. It's not clear why this is, or if other games set this value to the same sort of partial filesize. |
0x0C | 4 | The number of sections in the file. Generic code can use this to ensure it finds all the sections in the file. |
0x10 | 1 | Enumeration determining the character encoding of the file's message strings. Values listed below. |
The character encoding can be one of five possible values:
Value | Meaning |
---|---|
0x00 | Doesn't actually specify an encoding. Indicates that this is actually an older version of the BMG format, which is not necessarily described by this page. |
0x01 | Windows-1252, or at least something close enough to it. Twilight Princess does not render every character correctly, but these are likely not intentional alterations to the character encoding. |
0x02 | UTF-16 (presumably big-endian?) |
0x03 | Shift-JIS |
0x04 | UTF-8 |
Wind Waker and Twilight Princess only use Windows-1252 and Shift-JIS; the other values have been reported in other games. Note that, no matter the encoding, character 0x1A is special, and must be properly handled by any software that reads these strings. See the section on the text format below for details.
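As a minimal sketch of the header layout above, the following Python reads the fixed fields and maps the encoding byte to a codec name. The names `BmgHeader`, `parse_bmg_header`, and `ENCODINGS` are made up for illustration, the codec names are Python's, and treating value 0x02 as big-endian UTF-16 is an assumption, as noted in the table.

```python
import struct
from typing import NamedTuple

# Python codec names for the encoding byte. 0x00 is deliberately absent,
# since it marks an older BMG version rather than an encoding.
ENCODINGS = {
    0x01: "cp1252",
    0x02: "utf-16-be",  # assumption: big-endian, like the rest of the file
    0x03: "shift_jis",
    0x04: "utf-8",
}

HEADER_SIZE = 0x20  # the header is padded to a 32-byte boundary


class BmgHeader(NamedTuple):
    partial_size: int   # file size, excluding the FLW1/FLI1 sections
    section_count: int
    encoding: int


def parse_bmg_header(data: bytes) -> BmgHeader:
    """Read the fixed header fields from the start of a BMG file."""
    if len(data) < HEADER_SIZE:
        raise ValueError("data is too short to hold a BMG header")
    # ">" selects big-endian with no implicit alignment padding.
    magic, size, count, encoding = struct.unpack_from(">8sIIB", data, 0)
    if magic != b"MESGbmg1":
        raise ValueError("not a BMG file (bad magic string)")
    return BmgHeader(size, count, encoding)
```

A caller would typically look up `ENCODINGS.get(header.encoding)` and treat a missing entry as an unsupported or older file.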
Sections
Each section has a common header format, which is followed by whatever data it contains (usually, but not always, it starts with its own specific header). The generic section header is as follows:
Offset | Size | Meaning |
---|---|---|
0x00 | 4 | Magic string identifying what section it is. Always four characters long, not null-terminated. |
0x04 | 4 | Padded size of the section, which includes this header. The last section of a file may omit padding bytes, but they are still counted. |
The following subsections describe each section in detail. Each section is named after its magic string, and describes its contents ignoring the generic header (that is, offsets are relative to the start of section data).
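The generic section header makes it possible to walk a file without knowing what each section contains. The sketch below is one way to do so; `iter_sections` is a hypothetical name, and it assumes the file header occupies exactly 0x20 bytes (its padded size) and that the section count comes from the file header.

```python
import struct
from typing import Iterator, Tuple

HEADER_SIZE = 0x20  # assumed: file header padded to 32 bytes


def iter_sections(data: bytes, section_count: int) -> Iterator[Tuple[str, bytes]]:
    """Yield (magic, section data) pairs, with the generic header stripped."""
    offset = HEADER_SIZE
    for _ in range(section_count):
        magic, size = struct.unpack_from(">4sI", data, offset)
        if size < 8:
            raise ValueError("section size is smaller than its own header")
        # The last section may omit its padding bytes; slicing simply
        # stops at the end of the data in that case.
        body = data[offset + 8 : offset + size]
        yield magic.decode("ascii"), body
        offset += size
```

The yielded section data starts right after the generic header, which matches the offsets used in the subsections below.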
INF1: Message Information
The INF1 section describes all the messages in the BMG file. It points to where each message's text is in the DAT1 section, and holds each message's attributes: data the game has decided to store with each message, usually to control how the game uses the message. This information is stored as an array of entries, and several other parts of the BMG file may refer to messages by indexing this array; for clarity, such an index is referred to as the "INF1 index" wherever it comes up.
An INF1 section starts with a header of its own, as follows:
Offset | Size | Meaning |
---|---|---|
0x00 | 2 | Number of entries in this section. Any data following the last entry is meaningless. |
0x02 | 2 | The size of a single entry. Since each entry's first four bytes are standard, this can also be seen as the number of attribute bytes in an entry + 4. |
0x04 | 4 | Four bytes of padding. |
What follows is the array of entries, each of which has a very simple generic format:
Offset | Size | Meaning |
---|---|---|
0x00 | 4 | Offset into DAT1's data, pointing to where the message's text starts. |
0x04 | depends on INF1 header | Enough bytes of attribute data to match the stated size of an entry. |
Naturally, if you want to do anything interesting with the attribute data, you'll need to know how the game it comes from has defined it. Each game is free to set up this attribute data however it wishes, and may even use different formats in different BMG files. We have specific notes on Wind Waker's message attributes and Twilight Princess's message attributes in their respective detail pages.
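A generic reader can still split each entry into its text offset and raw attribute bytes without interpreting the attributes. The following is a rough sketch under that approach; `parse_inf1` is a hypothetical name, and `body` is assumed to be the section data with the generic section header already removed.

```python
import struct
from typing import List, Tuple


def parse_inf1(body: bytes) -> List[Tuple[int, bytes]]:
    """Return (DAT1 offset, raw attribute bytes) for each INF1 entry."""
    count, entry_size = struct.unpack_from(">HH", body, 0)
    if entry_size < 4:
        raise ValueError("entry size cannot hold the 4-byte text offset")
    entries = []
    for index in range(count):
        start = 8 + index * entry_size  # entries follow the 8-byte INF1 header
        (text_offset,) = struct.unpack_from(">I", body, start)
        attributes = body[start + 4 : start + entry_size]
        entries.append((text_offset, attributes))
    return entries
```

The position of an entry in the returned list is its INF1 index.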
DAT1: Message Data
The DAT1 section contains the actual text belonging to all the messages. The character encoding of each string is as described in the BMG file header. This section contains no header of its own; it is merely a sequence of null-terminated strings. Text is found by looking up its starting offset in the INF1 section, and ends with the first null character encountered. However, you cannot simply search for a 0x00 byte to find the end of the string, because BMG's tag codes may themselves contain null bytes. See the section on the text format below for details.
MID1: Message Indices
This section assigns an ID value to each message, which is distinct from the message's INF1 index. Not all games make use of this information, so some files may not have this section. The section is very simple, consisting of a short header followed by a list of IDs. This section's own header is simply:
Offset | Size | Meaning |
---|---|---|
0x00 | 2 | The number of IDs listed in this section. Ought to be the same as the number of messages in this file, but it is unclear whether this is ever enforced. |
0x02 | 6 | Padding bytes. |
Immediately following the header is a list of unsigned 32-bit integers. Each one is an ID assigned to its respective INF1 entry; that is, the first ID belongs to the first INF1 entry, the second ID to the second INF1 entry, and so on. Thus, this section forms a map of INF1 indices to message IDs, with the INF1 index implied by the position of the ID in this array.
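The mapping described above can be sketched in a few lines. `parse_mid1` is a hypothetical name, and `body` is again assumed to be the section data without the generic section header.

```python
import struct
from typing import Dict


def parse_mid1(body: bytes) -> Dict[int, int]:
    """Map each INF1 index to the message ID stored at the same position."""
    (count,) = struct.unpack_from(">H", body, 0)
    # The IDs start after the 8-byte MID1 header (2-byte count + 6 padding).
    ids = struct.unpack_from(f">{count}I", body, 8)
    return dict(enumerate(ids))
```

Inverting the returned dictionary gives a lookup from message ID to INF1 index, which is probably the direction most tools want.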
STR1: String Pool
This section is only found in one file thus far, Twilight Princess's zel_unit.bmg, and is used to store a pool of null-terminated strings. It's unknown what encoding it uses, whether it supports 0x1A codes, or why it would be used instead of the seemingly equivalent DAT1 section.
FLW1: Message Flow
To do: How much of this is truly game-agnostic, vs. specific to Twilight Princess?
This section, which does not occur in all BMG files, describes how one message may flow into another. This is used to chain together different messages as a larger unit. (For example, a game could split a single conversation between two characters into one message for each change of speaker, and the flow would chain these messages back together into the conversation they make up.) This section consists of a header and two lists of data: first a list of nodes, then an indirection table.
This section's own header is as follows:
Offset | Size | Meaning |
---|---|---|
0x00 | 2 | Number of flow nodes in the first list. |
0x02 | 2 | Number of entries in the indirection table. |
0x04 | 4 | Padding bytes. |
Flow Nodes
Immediately after the header is the list of flow nodes. Each node is effectively a union of three different types of nodes, with the first byte determining what type of node it is. No matter the type, each node is eight bytes long. The three types of nodes are:
- Continuation nodes: tell the game to use a given message and move on to the next node.
Offset | Size | Meaning |
---|---|---|
0x00 | 1 | Constant 0x01 to indicate continuation node. |
0x01 | 1 | "Door query parameter". Meaning unclear. |
0x02 | 2 | INF1 index of a message to use. |
0x04 | 2 | Flow node index, pointing to the next node. 0xFFFF terminates the chain. Does not use the indirection table, unlike the other node types. |
0x06 | 2 | Unused, only exists to match the size of other node types. |
- Branch nodes: tell the game to call a function to determine which node to move to next.
Offset | Size | Meaning |
---|---|---|
0x00 | 1 | Constant 0x02 to indicate branch node. |
0x01 | 1 | "Door query parameter". Meaning unclear. |
0x02 | 2 | Indexes some external list of "query functions" to select one to run; used to determine how to branch. |
0x04 | 2 | A parameter that is supplied to the chosen query function. |
0x06 | 2 | Base offset into the indirection table. The query function ultimately alters this offset to get the next flow node index from the indirection table. |
- Event nodes: trigger an event, then move on to another node.
Offset | Size | Meaning |
---|---|---|
0x00 | 1 | Constant 0x03 to indicate event node. |
0x01 | 1 | Indexes some external list of event functions to select one to run. |
0x02 | 2 | Index into the indirection table, which provides the next flow node to use. |
0x04 | 4 | Four bytes of data to provide as arguments to the event functions. How they are used depends on the event function. |
Note that a flow node list may sometimes include an entry of all zeroes (which would naturally give it a type byte of 0x00); it's unclear what these are for, perhaps padding, but they are counted as nodes like any other, and almost certainly have no effect.
Indirection Table
Immediately after the flow nodes is a list of indices into that prior list of flow nodes. This is used by branch and event nodes to indirectly refer to other nodes. This is merely a list of unsigned 16-bit integers, each an index into the flow nodes. An index of 0xFFFF means the chain of flow nodes is terminated.
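To make the node union and the two kinds of node references concrete, here is a hedged sketch that decodes the node list and indirection table, then follows a run of continuation nodes through their direct indices. The function names and the tuple layout are invented for illustration. Resolving branch nodes is left out, since that depends on the game's query functions.

```python
import struct
from typing import List, Tuple


def parse_flw1(body: bytes) -> Tuple[List[tuple], List[int]]:
    """Decode the flow nodes and the indirection table from FLW1 data."""
    node_count, table_count = struct.unpack_from(">HH", body, 0)
    nodes = []
    for index in range(node_count):
        raw = body[8 + index * 8 : 16 + index * 8]  # every node is 8 bytes
        kind = raw[0]
        if kind == 0x01:
            _, door, message, next_node, _unused = struct.unpack(">BBHHH", raw)
            nodes.append(("continuation", door, message, next_node))
        elif kind == 0x02:
            _, door, query, param, table_base = struct.unpack(">BBHHH", raw)
            nodes.append(("branch", door, query, param, table_base))
        elif kind == 0x03:
            _, event, table_index, args = struct.unpack(">BBH4s", raw)
            nodes.append(("event", event, table_index, args))
        else:
            # Covers the all-zero entries, which still count as nodes.
            nodes.append(("unknown", raw))
    table_start = 8 + node_count * 8
    table = list(struct.unpack_from(f">{table_count}H", body, table_start))
    return nodes, table


def continuation_messages(nodes: List[tuple], start: int) -> List[int]:
    """Collect INF1 indices along a run of continuation nodes."""
    messages, seen, index = [], set(), start
    while (index != 0xFFFF and index < len(nodes) and index not in seen
           and nodes[index][0] == "continuation"):
        seen.add(index)                   # guard against cyclic chains
        messages.append(nodes[index][2])  # INF1 index of the message
        index = nodes[index][3]           # direct node index, no table lookup
    return messages
```

For an event node, the next node would instead be `table[table_index]`, with 0xFFFF again ending the chain.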
FLI1: Flow Indices
To do: How much of this is truly game-agnostic, vs. specific to Twilight Princess?
This section deals with assigning ID values to flow nodes stored in the FLW1 section (and would obviously only exist if FLW1 does). It essentially does the same thing the MID1 section does for messages. It starts out with a similarly simple header:
Offset | Size | Meaning |
---|---|---|
0x00 | 2 | The number of IDs listed in this section. |
0x02 | 6 | Padding bytes. |
Following this header is the stated number of flow ID mappings. Each mapping is a simple key-value pair, stored as follows:
Offset | Size | Meaning |
---|---|---|
0x00 | 2 | The ID value to assign to a flow entry. |
0x02 | 2 | Unknown, possibly padding. |
0x04 | 2 | An index into the FLW1 section's flow node list, pointing to the flow node with this ID. |
Text Format
Message text is stored according to the encoding specified in the BMG file header, but regardless of encoding, the text makes use of tags, as described below. A program wishing to grab text strings from a BMG file must read each string a byte at a time to catch and interpret tags, since tags may contain null bytes that would otherwise prematurely end the string. Sequences of literal text between tags can of course be handled as per the specified encoding.
Tags
Tags are used by the BMG format to introduce special elements into the text. These tags allow the game to define all kinds of special glyphs, special strings of text, or ways of formatting and controlling text. While each game defines its own set of tags, tags have a universal format that makes it easy for a generic BMG file handler to grab them. The format of a tag is as follows:
Offset | Size | Meaning |
---|---|---|
0x00 | 1 | Constant 0x1A to introduce a tag. (Fittingly, this is the "substitute" control code in ASCII.) |
0x01 | 1 | Number of bytes in the tag, including itself and the 0x1A. (This means a tag must have a minimum size of 5, considering the following fields as well.) |
0x02 | 1 | The "tag group". Tags are grouped for ease of organization, and this byte chooses which group this tag belongs to. |
0x03 | 2 | The "tag number". This selects a tag within the specified tag group. |
0x05 | tag size - 5 | Whatever additional bytes of data the tag needs as arguments. Generically referred to as the "payload" here. |
Note: Even though tags report their size, games may assume that particular tags have a particular payload size, and simply read that many bytes after the tag number. The presence of a size value does not imply that tags can accept a variable number of arguments; it merely allows for generic processing of message strings.
The tag group and tag number together uniquely identify the tag's purpose and functionality. See the Wind Waker and Twilight Princess detail pages for more about each game's set of tags.
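The byte-at-a-time reading described in this section could look roughly like the sketch below. `read_message` is a hypothetical name, and the sketch assumes a byte-oriented encoding (Windows-1252, Shift-JIS, or UTF-8), where a 0x00 or 0x1A byte never occurs inside a multibyte character. UTF-16 text would need to be scanned in two-byte units instead, which is not shown.

```python
from typing import List, Tuple, Union

Tag = Tuple[int, int, bytes]  # (tag group, tag number, payload)


def read_message(dat1: bytes, offset: int,
                 encoding: str = "cp1252") -> List[Union[str, Tag]]:
    """Split one message into decoded text runs and raw tags."""
    parts: List[Union[str, Tag]] = []
    literal = bytearray()
    pos = offset
    while pos < len(dat1):
        byte = dat1[pos]
        if byte == 0x00:
            break  # a null outside of any tag ends the message
        if byte == 0x1A:
            size = dat1[pos + 1]  # counts the 0x1A and the size byte too
            if size < 5:
                raise ValueError("tag is smaller than the minimum size of 5")
            if literal:
                parts.append(literal.decode(encoding))
                literal = bytearray()
            group = dat1[pos + 2]
            number = int.from_bytes(dat1[pos + 3 : pos + 5], "big")
            payload = dat1[pos + 5 : pos + size]
            parts.append((group, number, payload))
            pos += size  # skip the whole tag, including any null bytes in it
        else:
            literal.append(byte)
            pos += 1
    if literal:
        parts.append(literal.decode(encoding))
    return parts
```

Relying on the tag's stated size keeps the reader game-agnostic; interpreting the group, number, and payload is left to game-specific code.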