Radish Trial 6 Advanced Scenes

From Nexus Mods Wiki
Revision as of 16:14, 14 May 2022 by Culorin (talk | contribs) (+ note on multiple scene exits and their connection to quest editor)
The trial of the radishes is meant as a guided, self-learning tutorial without step-by-step instructions. Instead it focuses on exploratory learning by actively using the tools to solve increasingly challenging tasks.

>> Trial 6 focuses on creating more advanced scenes <<.

This article provides some background information and various tips required or helpful to accomplish these objectives.

See forum post for more information about the trials.

Trial 6 - Objectives

You already created simple scenes which concentrated on the content (e.g. dialog and choices). In this trial the focus will be on the presentation and on using custom audio for scenes, including setting up the lipsync animations.

Your task is to:

  • create a new scene which tries to follow some cinematography guideline (see cinematography infos)
  • custom voicelines (record something yourself or just use some voicelines from outside the game) should be used by some actor and lipsync animations should work
  • some advanced storyboard features should be used, e.g. using a sword-drawing animation and attaching the sword to the actor’s hand at the appropriate timing
  • a scene should contain at least one camerablend
  • a scene should contain an item or actor placement interpolation, e.g. some looped walking, horse riding
  • some weighted mimic and/or additive animations should be used
  • it should contain at least one timed effect activation, e.g. teleport in or out of one actor which would include visibility changing, too

Some background Information

In the previous trial the focus was set on interactive dialogue scenes and the dialogue flow. Now it’s time to concentrate on the presentation, that is: fine-tune animations, mimics, voicelines and camera framing. This will not only make dialogue scenes more interesting but also allow creating more engaging cinematic cutscenes which present quest progress without player interaction.

Cinematography Techniques

Adjusting animations and camera framing is only one part of the story. A cinematic scene should also have some visual structure to support the narrative. This is a very broad topic and way beyond the scope of this article. As a start read at least this short introduction which touches on some aspects, as this article only focuses on the technical side.

Rule of Thirds

One simple cinematography technique which is also mentioned in the above introduction is the “Rule of Thirds”. It’s a rule of thumb on how to frame the shot to make the composition look more interesting: the camera frame should be divided by two equally spaced horizontal and two equally spaced vertical lines and important elements of the scene should be placed along those lines or their intersections.

Storyboard UI supports this by providing a Rule-of-Thirds overlay in the camera mode:

sbui thirds-rule no guides
sbui thirds-rule guides

The overlay can be toggled on and off with a hotkey.
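As a purely illustrative sketch (not part of SBUI or the radish tools), the guide positions of the rule can be computed for any frame size. The function name and the 1920x1080 frame are assumptions chosen for the example:

```python
def thirds_guides(width, height):
    """Return the guide line positions and their four intersections."""
    xs = [width / 3, 2 * width / 3]    # two vertical lines
    ys = [height / 3, 2 * height / 3]  # two horizontal lines
    points = [(x, y) for y in ys for x in xs]
    return xs, ys, points


xs, ys, points = thirds_guides(1920, 1080)
print(xs)  # [640.0, 1280.0]
print(ys)  # [360.0, 720.0]
```

The four entries in `points` are the intersections where important scene elements are usually placed.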

Moving Cameras

In addition to a visually pleasing frame composition and varying between appropriate static shot types, scenes can also benefit from carefully defined camera movements. In the Witcher 3 scene system this works by defining two or more key frames (meaning a specific camera position and its direction) and letting the scene system smoothly interpolate the camera position and direction (and other parameters) automatically between the key frames.

Storyboard UI does not provide any direct support or preview for camera setting interpolation. Instead it is required to define separate shots with static cameras in SBUI and afterwards connect these cameras in the scene yml definition by setting up the first camera as a start cam and the second one as the end cam. In the encoder this is called camera blends and an example definition may look like this:


cam blend example

In this case the camera blend starts in the element shot_1 and ends in the element shot_3 because it spans multiple voicelines which have to be separate elements. But that is not necessary: it’s possible to define multiple (but not overlapping!) camera blends in one element, e.g. a long pause element without any voicelines. However camera blends cannot span dialog section boundaries and cannot be defined in choice sections.
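The “not overlapping” constraint can be checked before encoding with a small helper. This is only a sketch: it assumes each blend is written down as a hypothetical (start, end) pair of positions within one element, which is not necessarily how the encoder stores them:

```python
def blends_overlap(blends):
    """Return True if any two (start, end) blend ranges overlap."""
    ordered = sorted(blends)
    # After sorting, an overlap means a blend starts before the previous one ends.
    return any(nxt[0] < cur[1] for cur, nxt in zip(ordered, ordered[1:]))


print(blends_overlap([(0.0, 0.4), (0.5, 1.0)]))  # False
print(blends_overlap([(0.0, 0.6), (0.5, 1.0)]))  # True
```

Two blends that merely touch (one ends exactly where the next starts) are not treated as overlapping here; whether the encoder accepts that is untested.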

Depending on the defined duration (that is the time between the key frames) the interpolation between the camera parameters will be slower or faster. Most of the time only a subtle, very slow movement will improve the scene while fast changes will most likely draw too much attention from viewers and distract from the actual content.

Be aware that smoothly also means that interpolation between more than two key frames (which can be inserted between start and end) will be a curve that may overshoot the key frames significantly if not carefully defined. Some experimentation will be necessary.
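The overshoot can be reproduced outside the game with any smooth curve through key values. The sketch below uses a uniform Catmull-Rom spline purely as a stand-in; which interpolation the Witcher 3 scene system actually uses is not documented here, so treat this as an assumption:

```python
def catmull_rom(p0, p1, p2, p3, t):
    """Uniform Catmull-Rom value between p1 and p2 for t in [0, 1]."""
    return 0.5 * (
        2 * p1
        + (p2 - p0) * t
        + (2 * p0 - 5 * p1 + 4 * p2 - p3) * t ** 2
        + (-p0 + 3 * p1 - 3 * p2 + p3) * t ** 3
    )


def sample_curve(keys, steps=20):
    """Sample a smooth curve through all keys (end keys are duplicated)."""
    padded = [keys[0]] + list(keys) + [keys[-1]]
    samples = []
    for i in range(1, len(padded) - 2):
        for s in range(steps):
            samples.append(catmull_rom(*padded[i - 1:i + 3], s / steps))
    samples.append(keys[-1])
    return samples


# One camera parameter with four key frames: hold, move, hold.
keys = [0.0, 0.0, 1.0, 1.0]
curve = sample_curve(keys)
print(min(curve) < 0.0, max(curve) > 1.0)  # True True
```

Although all key values lie between 0 and 1, the curve dips slightly below 0 before the move and rises slightly above 1 after it. For a camera this shows up as a small drift past the intended key frame.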

One simple encode-able example can be found in the docs.scenes/test.examples directory (test_storyboard_cam.blend.yml) which also explains the parameters. The “bigbunny.example” in docs.scenes is an example of a camera flyby with multiple keyframes.

Depth of Field and Focus

An advanced cinematography technique is to adjust the depth of field (DOF). It’s the area in front of the camera that appears sharp and can be used to direct the attention of viewers to certain elements and to blur out unimportant ones, e.g. crowds of npcs surrounding the main actor(s) of the scene.

Here are three example shots of the same scene with different DOF settings to visualize the effect (click on the image to see the video with rapid switches between the dof settings):


dof screens

The camera definitions logged from Storyboard UI contain a default DOF setting:


dof definition default

But in most cases it should be adjusted (or at least tuned down) as it makes the encoded scenes “blurry” if the actors are not in the focused sweet spot area.

Both (blur and focus) settings are specified by two distance values (near, far) which (as a simplified analogy) define the range from the camera position that the blur and focus is centered around. So to put some actor(s) (or the interesting scene element) into focus and blur out the near and far parts of the scene the settings should be spaced roughly like this:

[ ... blur near ... [focus near ... actor(s) ... focus far] ... blur far ... ]

It’s possible to change the values individually in SBUI but most of the time it is easier to use the rough (!) automatic DOF centering (see hotkey help). It tries to adjust the settings to “somewhat adequate” values that put the selected actor into the sweet spot of the current shot.


dof definition auto

Though the settings can be tweaked individually afterwards, too.
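When tweaking the four distances by hand, the spacing shown in the schema above can be checked with a small helper. The parameter names are made up for this sketch and do not necessarily match the keys of the logged camera definition:

```python
def dof_ranges_ok(blur_near, focus_near, focus_far, blur_far, actor_distance):
    """Check the ordering: blur near < focus near <= actor <= focus far < blur far."""
    return blur_near < focus_near <= actor_distance <= focus_far < blur_far


# Actor 2.5 units away from the camera, inside the focused range.
print(dof_ranges_ok(0.5, 1.5, 4.0, 12.0, actor_distance=2.5))  # True
# Focus range starts behind the actor: the actor would appear blurred.
print(dof_ranges_ok(0.5, 3.0, 4.0, 12.0, actor_distance=2.5))  # False
```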

Unfortunately due to technical limitations there is only limited support to preview the effect in SBUI. As a prerequisite for the preview to work at all no active layered environment definitions are allowed (for example no weather environment may be active). Even in this case the preview in SBUI works only partially and only the ‘far’ plane is blurred but not the ‘near’ area. However it is possible to deactivate layered environments with the console command envui_disable_envs() which is part of the envui package. But make sure the depth of field is not switched off in the game’s post-processing settings!

As a side note it’s also noteworthy that the DOF settings like all camera parameters will be interpolated in camera blends which can be used to create interesting visual effects.

Scene (Definition) Tuning

As briefly hinted in the dialog scene tutorial (see docs.scenes/tutorial) the ‘storyboard’ part of a radish scene definition is responsible for the actual presentation of a scene. Most of the tuning for the presentation has to be done in this part of the definition. But it’s important to understand where settings have to be added or changed, first.

For every section in the dialogscript part of the scene definition, a dedicated, equally named section in the storyboard part can be defined: it allows to attach a variety of different ‘scene events’ for specific “dialogline” or “pause” elements of the dialogscript section.

All dialogscript elements can be referenced in the storyboard part either positional (by their position in the respective section, e.g. PAUSE_1 for the first pause in the section or GERALT_2 for the second textline of the actor named GERALT) or by the name of an additional, prepended named HINT/CUE definition (see example below). Most of the time (e.g. in all Storyboard UI dumped definitions) the named CUE references should be preferred as they don’t rely on the order of elements and also improve the readability of the storyboard.


storyboard positional references
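How positional names come about can be sketched in a few lines. This only mirrors the naming rule described above (PAUSE_1, GERALT_2); it is not the encoder’s code, and the actor TRISS is just a hypothetical second speaker:

```python
from collections import Counter


def positional_names(elements):
    """elements: speaker names in section order, "PAUSE" for pause elements."""
    counts = Counter()
    names = []
    for speaker in elements:
        key = speaker.upper()
        counts[key] += 1
        names.append(f"{key}_{counts[key]}")
    return names


section = ["GERALT", "PAUSE", "TRISS", "GERALT", "PAUSE"]
print(positional_names(section))
# ['GERALT_1', 'PAUSE_1', 'TRISS_1', 'GERALT_2', 'PAUSE_2']
```

Inserting a new GERALT line at the top of the section would shift every later GERALT_n by one, which is the reason named CUE references are the safer choice.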


The previously described camera blends are an example for one type of scene events related to cameras (but there are many other types).

Every scene event type has a different purpose and most have different settings that can be adjusted.

See the short example definitions in the ‘docs.scenes/test.examples’ directory as a reference for all supported scene events and their respective settings.

Of particular interest are the animation events (normal, additive and mimic animations): although adding custom animations is not supported by the radish modding tools, Witcher 3 has a lot of reusable animations that can be further tweaked by some settings to customize them for the scene. For example animations can be ‘clipped’ to play back only a specific part, have a smooth transition between the previous and/or next or idle animation (‘blend’), the intensity can be adjusted by a ‘weight’ parameter and animations can be slowed down or played faster with a ‘stretch’ parameter. In addition a sequence of multiple animations can be set up with different starting positions for any dialogline or pause element in a scene section.

For technical reasons the Storyboard UI in-game mod cannot provide a preview for any of these settings and it also cannot assign multiple animations per shot or setup their starting point within a shot. As a consequence tuning these parameters is only possible in the yml definition and the results have to be checked by encoding and reviewing the scene in the game. Nevertheless SBUI is useful to lay out the rough scene, setup some variations of a shot (e.g. different cameras or animations) or simply to search and preselect animations or mimics.

Debug Timeline

To ease the tuning of a scene and to get a better “overview” of the storyboard parts of a scene definition the radish scene encoder automatically “renders” a “debug timeline” which visualizes the positions, duration and sequence of animations and other scene events (e.g. camera changes) into a text file.

It looks like this:


ascii timeline


At this point it is *highly* recommended to use sublime as an editor and to install the radish modding tools sublime support package (from the download section). The package contains auto completions for a couple of scene events, completions for animation names and a coloring scheme for scene timelines. This improves the overall usefulness of the debug timeline, especially with more complex scenes containing many actors and props (note the folded actor timelines in the screenshot):


ascii timeline colored


With some practice many changes can be prepared using the debug timeline without the need to check every change in the game (see this short example video). Nevertheless make sure you setup the ‘Utility Scene Autostarter’ from trial 4 correctly to playback encoded scenes in the game as fast as possible (when you have to).

Adding/Adjusting Animations To A Scene

Once a scene definition is changed by manual tweaking it becomes impractical to use Storyboard UI to adjust and dump the *full* scene, again. One possible way to add or exchange some animation in an already defined scene requires creating a new (or adjusting an existing) shot in a SBUI scene, logging the definition and afterwards manually merging multiple parts from the dump into the scene definition yml. These are namely the part of the dumped repository, part of the production assets and the actual animation usage in the storyboard section referencing the new animation in question:


def adding anim


However an easier way is just to use inline animations in the definition and thus make the manual transfer of the repository and production parts unnecessary. The above example will be considerably reduced to just adding/changing the actual animation and attaching it to the actor directly in the storyboard:


def adding inline-anim


The only required information from the SBUI dump is the repository name of the animation (in this case "anim_6970_high_standing_determined_calm_enter_frontal").

Poses and Animation Blending

Every actor always has a “pose” defined. Poses are basically the “idle” animations and are active all the time when no specific animation is played, e.g. in scene choice sections. For these reasons pose animations are always looped.

If a specific animation is played and stops the pose idle animation takes over. Most of the usable animations intended for dialogs are named in a way which indicates a compatible pose idle animation (e.g. geralt_high_standing_determined… is compatible with all pose idle animations defined as high, standing and determined). But even in a compatible transition there may be a visible, sudden change of the actor’s pose at the end of the animation. To reduce this animation jerk, for every anim a blend-in and also a blend-out can be defined, like this:


def anim blending


These settings define the duration (in seconds) over which the animation blends in from the previous anim (a specific or the idle one) and blends out to the next anim (again, either a specific or the idle one). However making these blends too long will make it look unnatural - so some experimentation will be required. Usually a good starting point is something between 0.3 and 0.7 seconds.

Since this blending is applied to the *played* part of the animation the clipping has to be considered, too:


anim blending schema


It’s also possible to slowdown or speedup animations (‘stretch’ setting) and to change the ‘weight’ of the animation (which is basically the intensity of the overlay of the animation - try out a couple of different weights to get a feeling for the effect).

Also be aware that the ‘stretch’ parameter is applied on the played range of the animation (that is: *after* the clipping). As seen in the above video it’s much easier to get a handle on the resulting duration by using the debug timeline visualization.
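The order “clip first, then stretch” can be written down as a small calculation. This sketch assumes that ‘stretch’ acts as a plain multiplier on the played duration (values above 1.0 slow the animation down); verify this against the debug timeline before relying on it. The function names are made up for the example:

```python
def played_duration(clip_start, clip_end, stretch=1.0):
    """Duration of the played range after clipping, then stretching."""
    if clip_end <= clip_start:
        raise ValueError("clip_end must be greater than clip_start")
    return (clip_end - clip_start) * stretch


def blends_fit(duration, blend_in, blend_out):
    """Blend-in and blend-out together should not exceed the played part."""
    return blend_in + blend_out <= duration


duration = played_duration(clip_start=0.5, clip_end=2.5, stretch=1.5)
print(duration)                        # 3.0
print(blends_fit(duration, 0.5, 0.5))  # True
```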

Looped Animations

Some animations like generic walking, running or riding are very short and contain only one or at most a couple of cycles - then they just stop when used as actor.anim scene events. In addition the actor moves on the spot and does not advance.

There are basically two options to loop these animations:

  1. setup a custom pose with the animation as idle animation in SBUI (select the animation intended to be looped in the pose animation list and add an actor.pose event in the appropriate element in the yml definition)
  2. setup a sequence of the same animation with actor.anim events as many times as required

The first solution is easier to setup but does not work reliably if the animation has to be synced with another actor’s animation (e.g. rider and horse): pose animations seem to start at slightly random positions and this may result in some clipping (e.g. between rider and horse). The second one syncs reliably but the subsequent starting positions have to be set manually which may require more fine-tuning (or some calculations for correct positioning).

Additionally, in both cases the actor has to be moved accordingly by properly defined placement interpolation events.
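For the second option the “calculations for correct positioning” boil down to repeating the cycle length. A minimal sketch, assuming the cycle duration (after clipping and stretching) is known in seconds and that starting positions are given in seconds from the start of the element:

```python
import math


def loop_starts(cycle_duration, total_duration):
    """Start times for back-to-back repeats covering total_duration."""
    repeats = math.ceil(total_duration / cycle_duration)
    return [round(i * cycle_duration, 3) for i in range(repeats)]


# A 1.2 s walk cycle that has to cover a 5 s shot needs five repeats.
print(loop_starts(1.2, 5.0))  # [0.0, 1.2, 2.4, 3.6, 4.8]
```

If the definition expects the positions in another unit, the values have to be converted accordingly.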

Placement Interpolation

Most of the actor animations with some movement also move the involved actor while playing. But at the end of the animation the actor’s position is always reset to the starting position. To reposition an actor (at any time) to a specific position the ‘actor.placement’ scene events can be used.

However for looped animations like walking cycles (or any other animation which does not have actor movement encoded) it is required to manually add continuous, smooth placement updates. In the yml definition this can be done with ‘placement interpolation’ scene events, similar to camera interpolation (see ‘test_storyboard_actor.placement.interpolation.yml’):

def placement blend

Additional key frames can be set between the start and end to define a more specific path. In addition to the position also the rotation will be interpolated. The last parameter defines the ‘ease-in’ and ‘ease-out’ just like for camera blends.

The easiest way to define an interpolation is to create two dedicated shots in SBUI (one for start the other for the end position), place the actor in both shots, log the definition and either change the placement events into interpolation events or just use the coordinates to write the above scene event sequence manually, e.g. attached only to one element (as above). However an appropriate amount of time needs to be between the events or you’ll just get an unnatural sliding - so expect some iterations for fine-tuning.
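A first guess for the “appropriate amount of time” can be derived from the distance between the two logged positions. The speed value below is an assumption for illustration only; the speed that actually matches a given walk or ride animation has to be found by testing in-game:

```python
import math


def travel_time(start, end, speed):
    """Seconds needed to cover start -> end at the given speed (units/s)."""
    distance = math.dist(start, end)
    return distance / speed


start = (0.0, 0.0, 0.0)  # position logged from the first SBUI shot
end = (3.0, 4.0, 0.0)    # position logged from the second SBUI shot
print(travel_time(start, end, speed=1.25))  # 4.0
```

If the interpolation is shorter than this estimate the actor appears to slide forward, if it is longer the actor appears to walk on the spot.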

One thing to keep in mind is that actor movement (from animations) in scenes does not respect any collisions, that includes the ground: actors will clip with any terrain bump or float over terrain dents.

If necessary this can be manually fixed with placement interpolation (Z-Axis) as well. Although most of the time it’s just easier to frame the shot and hide the clipping.

Also, any scene props can be positioned and/or moved along a placement interpolation path with dedicated scene events (see test.examples).

Mimic Animations

Normal animations do not contain any facial movements. There is a dedicated set of animations for this and they are attached with ‘actor.anim.mimic’ scene events. The available mimic animations can be previewed in Storyboard UI.

Unfortunately SBUI does not support assigning a mimic animation and a voiceline at the same time in the same shot. However this constraint does not apply to yml scene definitions.

The settings for ‘actor.anim.mimic’ events are basically the same as for “normal” animation events. And just like normal animations events an ‘actor.anim.mimic’ can be defined as an “inline” animation and directly attached to an actor:

def inline-mimic

Take special care with the weight setting for mimic animations, e.g. if your actors are smiling like idiots you should tune down the weight for the mimic. Here are four examples with anim.mimic weights of 0.0, 0.33, 0.66 and 1.0:

scene mimic 000 small
scene mimic 033 small
scene mimic 066 small
scene mimic 100 small

As a good rule of thumb you should not go overboard with the mimic weight:

the more subtle mimics are the more convincing and natural the result will look. Sometimes even one mildly raised eye brow will be enough to emphasize a reaction.


The encoder automatically generates ‘actor.lookat’ events for scenes with multiple actors and makes actors look at each other. However it’s possible to override these autogenerated scene events by specifying a custom looked-at actor or static point, the speed of change or the involved rotating bodyparts (eyes, head, upper body). It’s also possible to “disable” a specific look-at by defining a ‘none’ target because look-at events are on-top modifiers for animations and some animations may already contain gaze changes, too.

SBUI provides a hotkey to cycle between all actors as look-at target and also the definition of static look-at points (make sure to adjust the distance or the actor will be squint-eyed) but the advanced settings, like speed and the involved bodyparts can only be setup in the definition (see above debug timeline video or the test.examples).

Cutscene Anchoring

Most cutscene-type scenes probably include some camera shots embedding the narrative into the surroundings and are intended to be played back at exactly the same location. As mentioned in the 2nd part of the dialog scene tutorial the ‘placement’ key in the production part of a scene definition can be used to attach the scene to either an actor or to a specific world location (via the tag of a spawned entity).

Attaching a scene to a tagged, statically positioned entity is done via ‘scenepoints’ (a special type of layer entities) which can be created in radish quest UI at the location where the scene should be played back. However a precise placement of the scenepoint is not required: it defines only the anchorpoint (origin of the coordinate system for the scene) for all placement settings of this particular scene.


sbui scenepoint


To ensure every scenepoint can be fetched individually the encoder automatically attaches to all scenepoints an autogenerated tag derived from the project- and scenepoint name.

The exact name of the attached tag can be inspected in modeditor in the appropriate “scenepoints.w2l” encoded layer file. But the scheme is always ‘<modname>_<hubname>_<given scenepoint name>_sp’, so for example the ‘hubtest’ project scenepoint named ‘examine_corpse’ in the prologue area will be expanded to ‘hubtest_prologue_examine_corpse_sp’.
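The naming scheme can be reproduced with a one-line helper, which is handy for filling in the placement tag without looking it up in the encoded layer file. The function itself is hypothetical; only the scheme is taken from the description above:

```python
def scenepoint_tag(modname, hubname, scenepoint_name):
    """Build the tag following '<modname>_<hubname>_<scenepoint name>_sp'."""
    return f"{modname}_{hubname}_{scenepoint_name}_sp"


print(scenepoint_tag("hubtest", "prologue", "examine_corpse"))
# hubtest_prologue_examine_corpse_sp
```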

Setting this tag as placement will always playback the scene at this location IF the layer with the scenepoint is visible. But be aware that if a scenepoint entity’s location is changed afterwards the played scene will also move (or rotate).

If you want just to change (or simply bind to) a scenepoint for a Storyboard UI prepared scene without moving the scene actors and prop placements you can start SBUI with ‘sbui_with_scenepoint(<scenepoint tagname>)’. This will keep the defined scene and just recalculate all positions. So it is safe to use the command on an “existing” storyboard even multiple times.

The other option to bind a scene to a specific location is to create the scene in Storyboard UI and use the coordinates information below the placement settings in the logged definition to create a new scenepoint (either in radish quest UI or manually in a quest layer definition).

Either way the placement tag in the logged scene description still needs to be set manually to the scenepoint tag.

Please note that SBUI logs the rotation in the dumped “world coordinates of used origin” info as ‘[pitch, yaw, roll]’ like the EulerAngles struct in witcher scripts expects it. But the radish quest UI and the radish encoders expect rotations to be ordered as ‘[roll, pitch, yaw]’.
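When copying the rotation from an SBUI dump into a scenepoint definition the three values therefore have to be reordered. A minimal sketch with a made-up function name and example angles:

```python
def sbui_to_radish_rotation(rotation):
    """Reorder [pitch, yaw, roll] (SBUI log) to [roll, pitch, yaw] (radish)."""
    pitch, yaw, roll = rotation
    return [roll, pitch, yaw]


print(sbui_to_radish_rotation([0.0, 135.0, 5.0]))  # [5.0, 0.0, 135.0]
```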

Gameplay Scenes

Aside from dialog-type and cutscene-type scenes there is one other category supported by the radish modding tools: ‘gameplay scenes’. Those scenes do not interrupt the normal 3rd person view and can be used to playback some custom (even randomized) comments by a npc (or the player) or even more complex interactions between multiple actors with animations, dialoglines and even mimics, e.g.:


(click on the image to see the animation)


Be aware that using animations or pose changes in gameplay scenes may lead to poor blending between the currently active npc gameplay pose or animation. In practice, playing voiceline and mimics should work most of the time. And maybe also some gesture animations defined as ‘actor.anim.additive’ scene events as well. Check out the two short example definitions for gameplay scenes in the ‘docs.scenes/test.examples’ folder and experiment for yourself (how about a mod adding some scenes with mimics, gestures and new comments from npc bystanders?).

Custom Voicelines

The radish modding tools support adding custom audio voicelines to the game. Added voicelines can be used in any scene just like vanilla voicelines would be used, including their selection in Storyboard UI to setup new scenes.

In addition it’s also possible to generate somewhat passable lipsync animations for custom voicelines. There is a more detailed HOWTO in the docs.lipsync folder of the encoder package (HOWTO.generate.lipsynced.w3speech.txt) but as a short overview it basically works like this:

  1. prepare audio as wav files and the corresponding spoken text as a strings.csv with string ids assigned
  2. extract initial phoneme timings
  3. tune phoneme timings manually in the GUI
  4. convert the above wav audio into ‘wem’-format
  5. generate lipsync animation and pack wem audio as w3speech file
  6. test voicelines and lipsync animations in SBUI

In step 4 the audio is converted into the ‘wem’ format from Audiokinetic Wwise to be usable in Witcher 3. Since newer versions of Wwise generate incompatible wem formats, an older version that was used to create the Witcher 3 audio has to be downloaded from here.

A detailed HOWTO for step 4 can be found in the ‘docs.speech’ folder (HOWTO.wem.conversion.txt). Coincidentally there is also a video doing more or less the same.

Step 5 is automated by the radish build pipeline so the following will describe the GUI and its usage in steps 2 and 3 and afterwards the necessary tasks to add custom voicelines into SBUI’s selection list (step 6).

The Big Picture In A Nutshell

In order to generate a lipsync animation that is adequately synced with the audio it is required to have (a) correct timings tied to the audio and (b) some information about what kind of lip animation should be generated at those timings.

Information for (b) will be directly acquired from the text corresponding to the audio by translating the text into a ‘phoneme’ sequence, e.g.:


text-to-phoneme example


Phonemes are basically a set of standardized symbols and each defines a unique pronunciation.

Since the translated phoneme sequence does not contain any timing information the Phoneme Extractor GUI tries to extract phonemes from the audio (a) as well. Unfortunately this is rather difficult and while the result does contain phonemes and timings, this phoneme sequence does not necessarily match the expected phoneme sequence from the text. Nevertheless most of the time it is a good baseline to start manual adjustments.

Once the timings are tuned (step 3) the radish speech encoder uses the phonemes from the *translated* phonemes sequence to pick lip animation snippets from a set of already extracted lipsync animation snippets from vanilla game voicelines. Using the translated phoneme sequence as groundtruth ensures the picked snippets resemble the *intended* text much more accurately. Based on the corrected timings these snippets are then combined into an animation sequence (step 5).

A more detailed “big picture” diagram can be found at the beginning of the “HOWTO.tuning-options.txt” in the docs.lipsync folder.

Phoneme Extractor GUI

After step 1 the strings csv with the text strings for the audio voicelines (and only for those!) should be in the speech folder of the project (make sure the string ids are within the project’s string-idspace and do NOT overlap with any other strings of the project!). Additionally all of the audio files should be in the ‘speech/speech.en.wav’ folder.

The Phoneme GUI can be started with the ‘speech/_extract-phonemes-from-audio.bat’ batch file. It automatically scans the speech.en.wav folder and lists the found wav files in the ‘audio selection’ queue at the bottom. It will also automatically start extracting phonemes (step 2) for wav files named with a string id from the strings csv as prefix.

But the string ids can also be assigned to audio files interactively (string id assignment example video). Once the wav files are processed they are also renamed and the duration is added to the prefix of the filename. This information is required for the subsequent packaging into the w3speech file. As soon as the processing of one audio file finishes its phoneme timings can be adjusted while remaining unfinished files from the queue are processed in the background. Restarting the GUI will pick up where it left off when it was stopped last time.

Selecting a processed audio from the queue displays its audio waveform (top) with some phoneme blocks positioned and scaled according to the extracted timing information (directly below the waveform):


phoneme extractor-gui-selected-wav


The waveform can be zoomed (mouse wheel) and dragged (while middle button is pressed) with the phoneme blocks scaling and moving accordingly. The selected audio can be played by pressing the space bar. It’s also possible to playback only a specific part by setting a startpoint (left mouse button) and/or an endpoint (right mouse button) within the waveform. Setting an endpoint before the starting point will remove the endpoint.

A table with exact timings for every phoneme is also shown (left ‘phoneme segment positions’ panel) and allows activating/deactivating specific phoneme segments or adjusting the start, end or intensity (aka weight) by dragging the appropriate sliders. However most of the time it’s easier to adjust the timings by directly dragging (left mouse button) the left or right phoneme block boundary below the waveform. Depending on the ‘phoneme block drag mode’ (changeable in a panel to the right) the neighboring block boundaries will be adjusted slightly differently (proportionally or not at all in ‘unconstrained’ mode). You’ll have to experiment to get a feeling for how it works.

Additional panels (on the right) show useful but read-only information: the assigned input text for the audio, its phoneme translation and a table with all the initial timing and matching information at the time the audio was selected. It also highlights phonemes with a low confidence score from the automatic extraction in yellow.

Phoneme Timing Tuning

As already mentioned the automatic phonemes extraction is prone to errors (see below for explanation of some warnings and errors you might spot).

A common problem is a mismatch between text-translated-phonemes and audio-extracted-phonemes. These mismatches are indicated by a placeholder phoneme segment labeled ‘_’. No valid lipsync animation exists for such a block so all of these blocks will be set inactive by default. But this (may) also create gaps in the phoneme sequence at positions with spoken audio (easily seen in the block sequence below the audio waveform) which need to be corrected.

There is a “gap close” button which automatically tries to close all gaps ripped by ‘_’ by simply extending the neighboring segments. However this may fail in some cases and also close gaps where a gap *should* be according to the audio. In addition the newly extended timings of the neighboring blocks might now be off as well.

It’s *always* a good idea to verify and manually readjust the timings before saving.
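To illustrate why the automatic gap closing needs a manual check, here is a rough sketch of the idea. It is not the GUI’s actual implementation: the segment layout (phoneme, start, end in milliseconds), the phoneme symbols and the choice to split each gap in the middle are all assumptions made for the example:

```python
def close_gaps(segments):
    """Drop '_' placeholders and split every remaining gap between its neighbors.

    segments: list of dicts with 'phoneme', 'start', 'end', sorted by start.
    """
    kept = [dict(s) for s in segments if s["phoneme"] != "_"]
    for left, right in zip(kept, kept[1:]):
        if right["start"] > left["end"]:
            middle = (left["end"] + right["start"]) / 2
            left["end"] = middle
            right["start"] = middle
    return kept


segments = [
    {"phoneme": "h", "start": 0, "end": 80},
    {"phoneme": "_", "start": 80, "end": 140},   # mismatch placeholder
    {"phoneme": "ow", "start": 140, "end": 300},
]
for seg in close_gaps(segments):
    print(seg["phoneme"], seg["start"], seg["end"])
# h 0 110.0
# ow 110.0 300
```

Note that such a naive approach closes every gap, including real pauses in the audio, and that the new boundary at 110 ms is only a guess. Both are exactly the cases that have to be readjusted by hand.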

In theory phoneme block boundaries can be moved to overlap neighboring blocks and sometimes this may be even useful to squeeze in or extend some phoneme blocks that otherwise would be very/too short. You can experiment to see the results in-game and decide for yourself.

As a rule of thumb it is better to include all text-translated-phonemes even if they cannot be heard or are only short or overlapped segments because they still define the lip movement and get smoothed anyway - so it may look better even with short phoneme segments.

Here is an example video showing the phoneme extraction and tuning workflow for two audio files. Notice the usage of the auto-gap feature and the manual adjustments of falsely merged blocks.

Warnings and Errors In Steps 2 and 5

The automatic extraction of phoneme timings from an audio file and the subsequent matching with the phoneme sequence from the text in step 2 may result in a mismatch or a poor match indicated by a low “score”. This situation will be logged in the console window:


phoneme error extraction-alignment


This is merely a strong suggestion to manually adjust this particular phoneme block, especially if it was replaced by a ‘_’ segment - which should be checked anyway.

After adjusting all phonemes, it’s also possible that in step 5 the following error(s) or warning(s) are produced by the lipsync generator:


phoneme errors lipsync-generator


The first warning means that no lipsync animation snippet was found for a specific phoneme id, in the above example for ‘_’, which is by definition an invalid placeholder phoneme segment. The last two warnings indicate that an exact match for a required lipsync animation snippet including the previous and following phonemes (which form its context) was not found and some fallback snippet was used (with a similar but different context). This is not necessarily a bad thing - but it’s not optimal either.

The error at the end means that even fallback snippets could not be found for some phonemes (in this case it is related to the first warning). This should not happen very often. But if it does, you have multiple options as a workaround:

  • deactivate that specific phoneme in the phoneme extractor and just extend the neighboring phonemes. The quality of the lipsync animation result will depend on the lipsync animation of the surrounding phonemes
  • create an alias for the missing phoneme in the repository ‘<encoder dir>/repo.lipsync/phoneme.alias.repo.yml’ file and set it to use some more or less similar phoneme
  • ignore it and live with the fact that no animation will be generated for the unknown phoneme(s) (probably a visible gap in the anim)

But most of the time it will just be a forgotten active ‘_’ block, which should be easy to fix.
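
The difference between an exact match, a fallback snippet and a missing snippet, as well as the effect of an alias, can be illustrated with a short Python sketch. This is not the lipsync generator’s real lookup logic and says nothing about the actual format of the snippet repository or of ‘phoneme.alias.repo.yml’: the dictionary layout, the `find_snippet` function and the scoring of "similar" contexts are all assumptions chosen for illustration.

```python
def find_snippet(repo, prev, phoneme, nxt, aliases=None):
    """Look up a snippet for `phoneme` in the context (prev, nxt).

    `repo` is a hypothetical dict mapping (prev, phoneme, next) -> snippet.
    Returns (snippet, status) with status 'exact', 'fallback' or 'missing'.
    """
    # An alias maps an unknown phoneme to a more or less similar known one.
    phoneme = (aliases or {}).get(phoneme, phoneme)

    exact = repo.get((prev, phoneme, nxt))
    if exact is not None:
        return exact, "exact"

    # Fallback: same phoneme, prefer the context that shares most neighbours.
    candidates = [(key, snip) for key, snip in repo.items() if key[1] == phoneme]
    if not candidates:
        return None, "missing"  # corresponds to the error case
    best = max(candidates, key=lambda item: (item[0][0] == prev) + (item[0][2] == nxt))
    return best[1], "fallback"  # corresponds to the warning case
```

In this picture an active ‘_’ block always ends up as "missing", because no snippet is stored for the placeholder at all.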

Testing Custom Audio In Storyboard UI

In order to easily test custom voicelines in-game the radish speech packer automatically generates a script file containing references to all packed voicelines in a format suitable for extending the selectable voiceline list in SBUI:


sbui speech lines


The file is generated into the ‘mod.scripts-tmp’ folder and needs to be deployed into the Witcher 3 installation with the ‘bin/deploy.tmp-mod-scripts.bat’ or the appropriate sublime build option (every time the text or string ids for voicelines are changed!).

Afterwards the new list has to be manually integrated into the SBUI list: the newly generated function must be called from the ‘SBUI_getExtraActorVoiceLines’ function in the SBUI ‘mod_additional_voicelines.ws’ file, like this:


sbui speech lines.addition


Adding the call has to be done only once as all subsequent updates and deploys of the file do not change the name of the generated function anymore.

The next time SBUI is started the custom voicelines can be selected from the list in the voiceline mode (click on the image to see the video):


sbui speech test
Ioverth model mod by [@Holgar96](https://www.nexusmods.com/witcher3/users/26045214)

Adding Other Languages

Technically it is possible to generate lipsync for other languages, but it requires a little more effort (it was already done in the W1 Prologue mod). For best results the phoneme extraction from audio requires a language-specific phoneme model compatible with the library used (pocketsphinx). But there aren’t many publicly available.

However, as the extracted phoneme timings are only used as a baseline, you can always try the default English model and manually readjust and reassign the phoneme blocks in the GUI. The remaining workflow should be more or less the same.

Generating Lipsync Animations Without Audio

It is also possible to create mods without custom audio and use (old-school) subtitles instead. There is a section in the docs.lipsync/HOWTO.generate.lipsynced.w3speech.txt about generating lipsync animations from text only.

Tips Scene Definitions

  • there are many more supported scene events that are not mentioned in this article. Make sure to check out the examples in the ‘docs.scenes/test.examples’ folder of the encoder package.
  • do not uncomment the ‘duration’ setting in the production/assets/animations for an animation. This is just convenience information about the duration of the unclipped animation. Nowadays it’s easier to use the debug timeline.
  • it’s not advisable to put much time into adjusting animation timings if custom audio is going to be used or still needs to be tweaked: do this after you have tested the voicelines in SBUI, because SBUI will embed the voiceline duration into the dumped scene definition. If the duration of a replacement voiceline differs from the previously used one, the animations will speed up or slow down. So working with placeholder lines may be problematic (IF the durations differ significantly).
  • pose animations: any animation that moves the actor won’t move the actor when it’s used in a pose.
  • invisible actors in scenes:
    • prefer to test without other mods first, as some (e.g. MultiCompanion Mod) may spawn entities as well and that might interact in unexpected ways. First test if the scene works then test if it works with your other mods.
    • beware of extreme camera FOV settings. They distort the proportions of the displayed actors and may also cause some elements to become invisible, probably because of the near/far plane clipping. Instead try to use a “safe” FOV range (30-45), though it depends on the scene, actor positions and the desired effect, too.
    • scene actors have a ‘by_voicetag’ setting; try changing it to false as a test. The flag defines whether the actor should be searched by tag (meaning an entity with the voicetag must be present in this hub). If it is set to false, the actor will be spawned for the duration of the scene. The player should always be searched by tag, but for temporary debugging the flag can be set to false for NPCs.
    • try a different template for this actor (some main/secondary npcs have multiple templates defined): the current template may simply not work in scenes (for whatever reason) even if it works in SBUI.
  • jerky animations:
    • most of the time it is a problem with animation lengths or missing blend-in/blend-out settings. Verify the duration, blend settings and possibly unintended overlapping of animations with the debug timeline.
    • make sure the animation is compatible with the currently set pose or add a pose change while the animation is still playing.
    • you can try to use the actor.anim.additive event instead. It adds the animation on top of the pose. But it will not look good for most animations as most will tweak some parts of the body unnaturally. However for some it might look better.
    • if you want to stop an animation earlier you can use the clipend setting. However, a new section always resets everything.
  • effects in scenes:
    • you can define new effects and start them in scenes (now that you know how to define custom effects) but you can also play any effect that a template already has. Just check the effect names in the cookedeffects of the encoded entity template, e.g. ‘teleport_in’ for some actor templates.
    • some effects may not work in encoded scenes. It’s also possible that they just play their fx only once directly after they are spawned at the beginning of the scene. In this case it should be possible to restart it again with an appropriately timed ‘actor.effect.start’ or ‘prop.effect.start’ scene event.
  • scene props will be spawned for the scene only and will be despawned on scene end. If you want to have persistent scene props, use radish quest UI to create a layer with static layer entities.
  • you can add multiple exits to a scene. If you add a - OUTPUT: key element for each added exit in the scene definition, then you can access the signal output in quest editor by using key as an output socket in the corresponding scene node.

Tips Storyboard UI (SBUI)

  • multiple saved storyboards:
    • you can use (and save) multiple storyboards in SBUI by providing an optional name parameter at startup, e.g. sbui(my_new_experimental_scene). If omitted the default name will be ‘[default]’.
    • a list of currently available storyboards (in the current savegame) can be queried by the console command ‘sbui_list()’.
    • removing a named storyboard can be done via the ‘sbui_clear_storyboard(my_new_experimental_scene)’ command.
    • setting a scenepoint as anchor to a named storyboard can be accomplished by calling ‘sbui_with_scenepoint(<scenepoint tag name>, <sceneid>)’.
  • getting fine looking cam blends can require many iterations. It’s a good idea to create multiple variations of camera setups for the same shot from the beginning (by cloning and adjusting the shot) and pick the best one for the final scene.
  • if for some reason the lipsync doesn’t get triggered when you test new voicelines on actors other than the player, it may be that you changed the templates for actors too quickly: every actor is tweaked on spawn to support mimics - maybe the spawning messed this up. Try to restart SBUI.
  • creating multiple scenes requires managing the used string ids, especially if the scenes generate the strings from text (meaning there are no audio files, yet): string ids for dialoglines must not overlap! You can check what strings (and their respective ids) are generated for every scene in the strings dir of the project. It contains dedicated strings files created for the different scenes and the quest. The “all.en.strings.csv” file in the strings folder is automatically regenerated from all strings.* files. You can use it to check if there are any dupes (within the whole project). If there are: change the string id start for the problematic scene in its scene yml file (‘strings-idstart’ setting). But do NOT change it directly in the strings file as it will get overwritten the next time the scene is generated!
  • you can delete all the generated string csv files “strings.scenes.*.csv”, “strings.quest.csv” and “strings.custom.speech.csv”, but you then need to regenerate them from the yml files either by a full build, a quest build or a scene(s) build. The “all.en.strings.csv” file is the concatenation of all csv snippet files. Do NOT edit it!
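
The check for duplicate string ids mentioned above can be automated with a few lines of Python. This is a minimal sketch under stated assumptions: it assumes the string id is the first column of each line, that columns are separated by ‘|’ and that lines starting with ‘;’ are comments or meta information. Verify these assumptions against your own “all.en.strings.csv” before relying on the result; `find_duplicate_ids` is a hypothetical helper and not part of the radish tools.

```python
from collections import defaultdict


def find_duplicate_ids(lines, delimiter="|"):
    """Return {string_id: [line numbers]} for ids that occur more than once.

    Assumption: the id is the first column and ';' starts a comment line.
    """
    seen = defaultdict(list)
    for number, line in enumerate(lines, start=1):
        stripped = line.strip()
        if not stripped or stripped.startswith(";"):
            continue  # skip empty lines and comment/meta lines
        string_id = stripped.split(delimiter, 1)[0].strip()
        seen[string_id].append(number)
    return {sid: numbers for sid, numbers in seen.items() if len(numbers) > 1}
```

The function accepts any iterable of lines, so an opened file object for the generated strings file can be passed directly. An empty result means no id was found twice.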

Tips Phoneme Extraction/Tuning

  • the GUI requires a csv with the strings for the audio. Those strings are required to extract phonemes and will also be used as subtitles. The project template contains a “MODNAME.speech.csv.template” file in the speech folder as an example, which also contains instructions on how to rename the file so it is automatically used.
  • make sure the text in the strings file intended for the audio lines (speech/<modname>.speech.csv) is *exactly* as spoken in the audio.
  • do NOT edit the all.en.strings.csv file in the strings folder - it is automatically regenerated from all strings.* files
  • if you change the scene string-id start you need to rename the voiceline files accordingly, too!
  • make sure your wav is 16 bit mono for the Phoneme Extractor GUI
  • the wem files need to be named with the string id and duration as prefix for the w3 speech packer to correctly detect and pack them. It’s easier to use the phoneme extractor before converting the wav files to wem: by assigning string ids to the audio the GUI will rename the wav files properly and converting with Wwise afterwards will keep the generated filename as prefix.
  • do NOT use Office for editing the strings csv! Use only a normal text editor (e.g. Notepad++ or Sublime). Office will screw up the file when you save it.
  • it’s possible to increase the verbosity of the log output of the radish tools (a lot!) by temporarily adjusting the log level setting to ‘LOG_LEVEL=--very-verbose’. This may also provide some insights if some audio files are not added to the audio queue in the GUI.
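
Whether a wav file is 16 bit mono can be checked before loading it into the Phoneme Extractor GUI, for example with Python’s standard `wave` module. The sketch below is only a convenience check written for this article and not part of the radish tools; the function name `check_wav` is made up.

```python
import wave


def check_wav(path, sample_width_bytes=2, channels=1):
    """Return a list of problems; an empty list means 16 bit mono (by default)."""
    problems = []
    with wave.open(str(path), "rb") as wav:
        if wav.getsampwidth() != sample_width_bytes:
            problems.append(
                f"sample width is {wav.getsampwidth() * 8} bit, "
                f"expected {sample_width_bytes * 8} bit"
            )
        if wav.getnchannels() != channels:
            problems.append(
                f"file has {wav.getnchannels()} channel(s), expected {channels}"
            )
    return problems
```

Calling `check_wav` on every wav file of the project before starting the GUI quickly shows which files need to be re-exported.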

Tips Speechfile Packing

  • a full.rebuild run does NOT automatically convert updated wav files to wem files: the conversion has to be done manually every time a wav is changed. If you want to change audio, make sure you encode the wem again and put the updated file into the correct project folder. Adjusting phoneme timings is probably required, too. Afterwards it should be enough to encode speech only (and deploy the tmp generated file with the voiceline list).
  • all voiceline audio (except some underwater lines, which are stereo) is 44 kHz 16 bit mono. However, the w3speech packer packs the wem files no matter what content they have. So if you encode higher quality (or stereo) wem files with different settings, the encoder will not stop packing, but the game may have problems with that audio. Make sure it still works!
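
Because the wav to wem conversion is a manual step, it is easy to forget it after changing an audio file. The following Python sketch lists wav files that have no wem file or only an older one. It is a heuristic based on file modification times and on the note above that the converted wem keeps the wav filename as prefix; the function `find_stale_wavs` and the two folder arguments are hypothetical and have to be pointed at the folders used in your project.

```python
from pathlib import Path


def find_stale_wavs(wav_dir, wem_dir):
    """Return names of wav files whose wem is missing or older than the wav.

    Assumption: a wem belongs to a wav if its filename starts with the
    wav filename (without the '.wav' extension).
    """
    wems = list(Path(wem_dir).glob("*.wem"))
    stale = []
    for wav in sorted(Path(wav_dir).glob("*.wav")):
        matches = [wem for wem in wems if wem.name.startswith(wav.stem)]
        wav_time = wav.stat().st_mtime
        # Stale if there is no wem at all or every matching wem is older.
        if all(wem.stat().st_mtime < wav_time for wem in matches):
            stale.append(wav.name)
    return stale
```

Every file reported by the function should be converted again before the speech is encoded.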