Strings and their IDs


Revision as of 21:34, 2 June 2022

This article gives you a thorough overview of how the radish modding tools handle adding new strings to a The Witcher 3 mod. If you are unsure about the process, about IDs, their spaces and starts, or about how to define custom strings for more diverse usage - you're in the right place!

 

Overview

The radish modding tools are capable of encoding custom strings by means of the w3strings encoder, which is included in the main tools package. Like all other encoders, it doesn't need to be used directly, but is largely automated by the project template, its setup of batch scripts and its combination with the other encoders. For example, you don't need to extract every string specified in your scenes by hand. On the other hand, in order for everything to work smoothly, some settings, namely idspace and idstart, need to be configured correctly, and custom strings aside from the automatically extracted ones have their own workflow.

In this article, we'll cover several topics - but feel free to skip to the parts important to you. Every section is written in such a way that you should be able to gain information on the sub-topic without needing to re-read the whole article; if some part relies on another, this will be mentioned.

 

Strings in RedEngine3

We start with some background info on how strings are stored in the engine and what our options as modders are.

On .w3strings files and IDs

All strings used in the game and any additional mods/DLCs are stored in a database consisting of (several) .w3strings files. Every file consists of entries, where an entry has the following elements:

 a unique ID as key, an optional hex key + corresponding string key, and a string as value

The range of possible IDs is 0.000.000.000 ... 9.999.999.999 (= 10 digits). Wherever you need a string as a user, you will specify its ID or string key (more on that and hex keys in the next section), and the game will automatically look up the string value in the database.

Adding new strings works by creating a .w3strings file in your mod: If such a file contains entries, they are added to the joint database in the game. You could also overwrite an existing string by reusing the ID/hex key of that string and assigning your replacement. Since .w3strings files are encoded in a special way, you need the w3strings encoder to create them.
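The adding and overwriting behaviour described above can be pictured with a small Python sketch. It is only a mental model, not how the engine is actually implemented: the dictionary, the function name merge_strings and the sample IDs and texts are all made up for illustration.

```python
# Toy model of the joint string database: ID -> string value.
# All IDs and texts below are invented examples, not real game data.

def merge_strings(database, mod_entries):
    """Return a new database with the mod's entries added.

    An entry whose ID already exists replaces the old string value,
    which mirrors how a mod can overwrite an existing string.
    """
    merged = dict(database)
    merged.update(mod_entries)
    return merged


base_game = {1000000001: "Original text"}
my_mod = {
    2111410000: "A brand new string",  # new ID -> added
    1000000001: "Replacement text",    # existing ID -> overwritten
}

database = merge_strings(base_game, my_mod)
print(database[2111410000])  # A brand new string
print(database[1000000001])  # Replacement text
```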

An alternative to IDs: string/hex keys

Referencing a string by ID, which is very straightforward, has a downside: It is hard to know which string is used when looking at a referencing ID and it's easy to make a mistake by typing a wrong digit as a user. This is why a second mechanism exists, which allows referencing strings by string keys. Recall what an entry contains:

 a unique ID as key, an optional hex key + corresponding string key, and a string as value

As you can see, you can add a string key directly in the entry of a .w3strings file. Using such a key looks like this:

Example of how a string key is used in the engine.

What is the purpose of the hex key? One problem with using strings as keys to look up elements is that they are (relatively) slow to process. This is especially problematic if they are potentially used a lot. Therefore, an optimization is used.

Upon encoding into .w3strings, any string key is hashed into a hex key, which is stored in the same entry. Hex keys are far more efficient for lookup algorithms: when we use a string key like "boss" in the picture above, the engine won't search for "boss" in the database, but will hash "boss" to its hex key and then search for that. Since the hex key is stored alongside the string key, the right string will be retrieved.

There is one major caveat to this: the hashing procedure can produce collisions. This means it is possible that two different string keys are hashed to the same hex key. Unfortunately, we cannot catch this error, so the only advice that remains is to use string keys only when necessary.
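To see why collisions are possible at all, here is a hedged Python sketch. The real hash function used by the encoder is not described in this article, so toy_hash below is a deliberately weak stand-in (an assumption for illustration only), as are the helper name find_collisions and the sample keys. The sketch groups string keys by their hash value and reports every group that contains more than one key.

```python
from collections import defaultdict


def toy_hash(string_key):
    """Stand-in hash: NOT the engine's algorithm, just small enough to collide."""
    return sum(string_key.encode("utf-8")) % 256


def find_collisions(string_keys, hash_fn=toy_hash):
    """Map each hash value to the distinct keys that share it (collisions only)."""
    buckets = defaultdict(set)
    for key in string_keys:
        buckets[hash_fn(key)].add(key)
    return {h: sorted(keys) for h, keys in buckets.items() if len(keys) > 1}


# "ab" and "ba" contain the same bytes, so the toy hash cannot tell them apart.
collisions = find_collisions(["boss", "ab", "ba"])
for hex_key, keys in collisions.items():
    print(f"{hex_key:08x}: {keys}")
```

Any hash that maps arbitrary strings to a fixed number of digits has this property; a stronger hash only makes collisions rarer, not impossible.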

Localization

There is one last aspect to the strings concept of the engine and that is defining separate strings for several languages. The idea here is simple: Each .w3strings has a given language associated to it. The language is specified by using abbreviations like "en", "de" as a prefix for the .w3strings filename:

LANG_PREFIX.w3strings

A .w3strings file can thus be localized by creating several language versions of it (with the respective prefixes), where each version contains the same keys, but with the string values translated into the respective language. The engine will then automatically choose the right version of the strings based on the user's language setting.
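As a rough illustration of the naming scheme, the following Python sketch builds the filename for a language prefix and picks the file matching a user's language. The function names and the fallback to "en" are assumptions made for this example; this article does not describe what the engine does when a language version is missing.

```python
def w3strings_filename(lang_prefix):
    """Build the filename for a language abbreviation such as "en" or "de"."""
    return f"{lang_prefix}.w3strings"


def pick_strings_file(available_files, user_language, fallback="en"):
    """Pick the file for the user's language (the fallback is a made-up policy)."""
    wanted = w3strings_filename(user_language)
    if wanted in available_files:
        return wanted
    return w3strings_filename(fallback)


files = {w3strings_filename(lang) for lang in ("en", "de")}
print(pick_strings_file(files, "de"))  # de.w3strings
print(pick_strings_file(files, "pl"))  # en.w3strings
```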

 

Strings in Radish Modding Tools

The previous three sections discussed how strings are handled in RedEngine3 using .w3strings files as a database. Now we will proceed to how the radish modding tools are built and geared towards that concept to allow defining and encoding custom strings in a user-friendly way (relatively, that is).

Input format for encoding with w3strings.exe

First, we will look at how w3strings.exe, an executable included in the modding tools, encodes a human readable list of strings into a .w3strings file. The encoder (= w3strings.exe) accepts CSVs formatted in a certain way as input. A typical input CSV might look like this:

Example of a w3strings.exe input CSV.

The first line defines the language prefix, which will be used as the filename for the resulting .w3strings. The second line is a comment which shows what each column means: column 1 is the ID, column 2 is the hex key, column 3 is the string key, and column 4 is the text, which is the actual string we want to add/replace.

This format is a mirror of a .w3strings file. The first line corresponds to the filename; from line 3 on, every line corresponds to an entry of the encoded file (see "On .w3strings files and IDs"). Therefore, the ID needs to have 10 digits. Under text you can write UTF-8 characters and things like line breaks (</br>), and for string keys anything in " a-z0-9_" is allowed. However, for IDs there are some additional constraints imposed by w3strings.exe.

ID spaces

The w3strings.exe encoder intentionally restricts the IDs usable for an encoding of one CSV (= the ID space) in order to "prevent multiple mods to overwrite strings of other mods". To ensure this, two conditions are required:

  1. any ID used as input for the encoder needs to start with 211
  2. IDs used in one input CSV need to share one idspace = nnnn, i.e. every ID in the file must follow the format 211nnnnxxx, with xxx being free to use.

Condition 1. ensures that modders not using this encoder can avoid conflicts with w3strings.exe generated encodings by simply choosing an ID prefix unequal to 211.

Condition 2. ensures that modders using this encoder can avoid conflicts among themselves if they use different idspaces. One way to do this, proposed by the creator, is to use the mod's Nexus ID as idspace.

As an example: The CSV file shown in the section "Input format for encoding with w3strings.exe" only contains IDs prefixed with 211 and with idspace = 1410. Thus this mod is guaranteed to be conflict-free with any non-encoder-using authors who choose a prefix other than 211 and with any encoder-using authors who choose a different idspace.
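The two conditions can be expressed as a short Python sketch. This is not the encoder's own validation code; the function names make_id and is_valid_id are hypothetical, and zero-padding an idspace below 1000 to four digits is an assumption of this example.

```python
ENCODER_PREFIX = "211"  # required by condition 1


def make_id(idspace, offset):
    """Compose a 10-digit ID of the form 211nnnnxxx."""
    if not 0 <= idspace <= 9999:
        raise ValueError("idspace must fit into four digits (nnnn)")
    if not 0 <= offset <= 999:
        raise ValueError("offset must fit into three digits (xxx)")
    return f"{ENCODER_PREFIX}{idspace:04d}{offset:03d}"


def is_valid_id(string_id, idspace):
    """Check both conditions: the 211 prefix and the expected idspace."""
    return (
        len(string_id) == 10
        and string_id.isdigit()
        and string_id.startswith(ENCODER_PREFIX)
        and string_id[3:7] == f"{idspace:04d}"
    )


print(make_id(1410, 0))                 # 2111410000
print(is_valid_id("2111410042", 1410))  # True
print(is_valid_id("2111411042", 1410))  # False (different idspace)
print(is_valid_id("1001410042", 1410))  # False (prefix is not 211)
```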

Automated string extraction & encoding with the Project Template | idstart

The radish modding tools come with a tailored project template, a folder of specialized folders containing batch scripts for automated encoder calls, resource collection and mod/dlc deployment. Since the project template (naturally) uses w3strings.exe (though hidden by a chain of batch script calls), we need to define the idspace for the input CSV that will be generated for the project. See the previous section for information on idspaces.

First of all, you need to specify the idspace in "./_settings_.bat" - this value is used as validation input for w3strings.exe - it checks that every ID used in the project has indeed the same idspace. Now on to the places, where strings are used.

In a radish project, there is a defined subset of places (quest production settings, journals, scenes), where custom strings are defined by the modder. Upon running e.g. full.rebuild.bat, the tools will automatically search those places for strings, assign IDs to them and generate string CSV files in the 'strings' dir where the assignments are listed:

Example of a radish project's ./strings directory.

The generated string CSVs are merged into one all.en.strings.csv, which is the input for w3strings.exe.

Since the automatic generation of IDs is done separately for each place, we need to specify the idspace chosen for the project in each place again. Having different places raises a question: We know from the section "ID spaces" that via prefix and idspace we can assume to have a unique range of IDs to use for our project. But how can we ensure that the IDs generated in the different places are distinct from each other? This is the motivation for the idstart setting, which must be specified for each place. Let's look at a generic radish string ID:

211nnnnxxx

Here nnnn = idspace, which must be the same for all places and _settings_.bat. However, if xxx starts at 0 for each place, then we'll have a conflict. Thus, the idstart setting, which is an offset added to xxx, has to be manually adjusted to a different value for each place. If we for instance specify idstart = 100 for quest production and idstart = 200 for our first scene, then those won't conflict if there aren't more than 99 strings relying on the quest production settings.

Thus you need to set the same idspace and different idstarts in these places:

  • the quest production settings (./definition.quest/prod.quest-newquestproject.yml), which provide IDs for e.g. quest name and caption, journal strings
  • each scene (./definition.scenes/scenes.*.yml), where IDs are needed for the lines spoken by actors
    In a scene dumped from the storyboard UI mod, you will find the idspace/idstart settings under the "production" section.

Note that idstarts in intervals of 100 are usually a good default. However, if you have more than nine places using strings to cover (which is mostly caused by scenes), you might need to adjust the idstarts in a more customized way.
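When many places are involved, it can help to check the planned idstarts before rebuilding. The Python sketch below assumes that each place numbers its strings consecutively from its idstart upwards; this is an assumption of the example, not documented behaviour of the tools. The function name find_overlaps, the place names and the string counts are invented.

```python
def find_overlaps(places):
    """Return pairs of places whose xxx ranges overlap.

    places maps a place name to (idstart, number of strings).
    """
    names = sorted(places)
    overlaps = []
    for i, first in enumerate(names):
        for second in names[i + 1:]:
            start_a, count_a = places[first]
            start_b, count_b = places[second]
            # Two half-open ranges [start, start + count) overlap
            # if each one starts before the other one ends.
            if start_a < start_b + count_b and start_b < start_a + count_a:
                overlaps.append((first, second))
    return overlaps


places = {
    "quest production": (100, 12),
    "scene 1": (200, 80),
    "scene 2": (250, 40),  # starts inside the range used by scene 1
}
print(find_overlaps(places))  # [('scene 1', 'scene 2')]
```

Moving "scene 2" to idstart = 300 resolves the overlap in this example.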

Custom strings in the 'strings' directory

In the ./strings folder of a radish project, string CSV files are stored to be used as input when the time for encoding has come (see the section "Input format for encoding with w3strings.exe"). As discussed in the previous section, string CSVs are automatically generated in this dir for each place (e.g. scenes). But what if you want to define strings which aren't covered by the automatic generation?

In this case, you can simply add another strings CSV file to the ./strings directory. Make sure that each entry you specify has an ID that is unique in your project - and also make sure that you don't choose a filename that would be written by the automatic generation (e.g. avoid 'all.en.strings.csv').
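Both checks can be sketched in a few lines of Python. The sketch works on IDs that have already been read from the CSV files, so it makes no assumption about the exact CSV layout. The function name check_custom_strings and the two sample filenames are hypothetical; only 'all.en.strings.csv' is taken from this article.

```python
from collections import defaultdict

RESERVED_FILENAMES = {"all.en.strings.csv"}  # written by the automatic generation


def check_custom_strings(ids_by_file):
    """Return a list of human-readable problems found in the given files."""
    problems = []
    seen = defaultdict(list)
    for filename, ids in ids_by_file.items():
        if filename in RESERVED_FILENAMES:
            problems.append(f"{filename}: filename is used by the automatic generation")
        for string_id in ids:
            seen[string_id].append(filename)
    for string_id, filenames in sorted(seen.items()):
        if len(filenames) > 1:
            problems.append(f"{string_id}: used in {', '.join(filenames)}")
    return problems


project = {
    "scenes.en.strings.csv": ["2111410200", "2111410201"],  # hypothetical generated file
    "custom.en.strings.csv": ["2111410900", "2111410201"],  # hypothetical custom file
}
for problem in check_custom_strings(project):
    print(problem)
```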