Into Depth – Shadow Casting & Portals

A debug view showing shadow cast segments from the vantage point of the actor.

Last month we covered the unified action system, in which we built some basic things players can do with their heroes and set the stage for fancier features and for enemy-agent actions. The most significant action was moving, which was built on a GridView of what the actor sees from their current vantage point. As noted in that post, we eventually wanted to expand the grid view to include edge information, such as thin little walls and platforms, but also portals that act as non-Euclidean connections to other parts of the map.

Here we have a simple two-tile room with a loop-back portal on either end. The result – the hero can see a bunch of versions of themselves!

In the last post, I said I’d be adding other entities in and adding more actions, but I decided that the map representation was so fundamental that prioritizing getting these portals in was more important. Any changes there would affect most other systems, so I wanted to get that done first.

Grid Representation

Prior to this change, the underlying level geometry representation, the Grid, was just a 2D array of cells:

After this change, I wanted to represent both the cells as well as the edges between them:

This was achieved by adding two additional 2D arrays, one for the horizontal edges and one for the vertical edges. There are definitely other ways to do this, but I felt that (1) this was simple, and (2) there are real differences between the two edge types. A player might stand on a (horizontal) platform but open a (vertical) door, for example.
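As a concrete sketch of the layout (the struct name, dimensions, and field names here are illustrative, not the engine’s actual code): the horizontal-edge array gets one extra row and the vertical-edge array one extra column, so every cell boundary has exactly one entry:

```cpp
#include <cassert>
#include <cstdint>

// Illustrative sketch only: a W x H cell grid with per-boundary edge bytes.
constexpr int kGridW = 16;
constexpr int kGridH = 16;

struct Grid {
    std::uint8_t cells[kGridW][kGridH];       // one entry per cell
    std::uint8_t h_edges[kGridW][kGridH + 1]; // edges below/above each cell
    std::uint8_t v_edges[kGridW + 1][kGridH]; // edges left/right of each cell
};
```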

The edges are single-byte entries, to keep things efficient. If we need more information, we can index into a separate array. I’m currently using the first two bits to identify the edge type, and, if the edge is a portal, the remaining six bits index into a Portal array. The Portal stores the information necessary to know which two edges it links.
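A minimal sketch of that packing, assuming the layout just described (the enum and helper names are mine, not the engine’s):

```cpp
#include <cassert>
#include <cstdint>

using u8 = std::uint8_t;

// Hypothetical edge encoding: low 2 bits = edge type, high 6 bits = portal index.
enum EdgeType : u8 { kEdgeOpen = 0, kEdgeSolid = 1, kEdgePlatform = 2, kEdgePortal = 3 };

inline u8 PackEdge(EdgeType type, u8 portal_index = 0) {
    return static_cast<u8>((portal_index << 2) | type); // portal_index must be < 64
}
inline EdgeType GetEdgeType(u8 edge) { return static_cast<EdgeType>(edge & 0x3); }
inline u8 GetPortalIndex(u8 edge) { return static_cast<u8>(edge >> 2); }
```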

This change has the immediate consequence that getting the cell neighbor in a given direction does not necessarily produce the Euclidean neighbor. Its second consequence is that cells themselves no longer need a solid flag; solidity can instead live on the edges between cells, which slightly simplifies cells.
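A hedged sketch of what non-Euclidean neighbor lookup implies (all types and names here are simplified stand-ins for the engine’s): stepping in a direction consults the shared edge, and a portal edge redirects to the linked cell:

```cpp
#include <cassert>
#include <cstdint>

// Simplified stand-ins; the real engine indexes a Portal array via the edge bits.
struct CellIndex { int x, y; };
struct Portal { CellIndex other_side; };

enum EdgeType : std::uint8_t { kOpen, kSolid, kPortal };
struct Edge { EdgeType type; std::uint8_t portal_index; };

// Stepping east yields the Euclidean neighbor unless the shared edge is a portal.
CellIndex StepEast(CellIndex c, const Edge& e, const Portal* portals) {
    if (e.type == kPortal) return portals[e.portal_index].other_side;
    return CellIndex{c.x + 1, c.y};
}
```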

Grid View Representation

The Grid View, introduced in the last post to represent the level geometry as seen by an actor from their vantage point, can now represent views where the same underlying cell might show up multiple times due to portal connections. Furthermore, we might have weird cases where one square cell might contain slices from different areas of the map:

The green arrow points at a view cell which, due to the portal (magenta), has two sections of different grid cells.

Before, the grid view was a simple 2D array with one cell per tile. Now, that is no longer the case.

The updated grid view is organized around ViewNodes, ShadowSegments, and ViewTransforms:

struct GridView {
    // The grid tile the view is centered on
    CellIndex center;

    // The first ViewNode in each view cell.
    // 0xFFFF if there is no view node in that cell.
    u16 head_node_indices[GRID_VIEW_MAX_X][GRID_VIEW_MAX_Y];

    u16 n_nodes;
    ViewNode nodes[GRID_VIEW_MAX_NODES];

    // Shadowcast sectors.
    u16 n_shadow_segments;
    ShadowSegment shadow_segments[GRID_VIEW_MAX_NODES];

    // The transforms for the view as affected by portals.
    u8 n_view_transforms;
    ViewTransform view_transforms[GRID_VIEW_MAX_TRANSFORMS];
};

Most of the time, a given cell in the grid view will contain a single cell in the underlying grid. However, as we saw above, it might see more than one. So each view cell now holds a linked list of ViewNodes, where each node carries an intrusive index to the next node (if any). Each such node knows which underlying grid cell it sees, whether it is connected in each cardinal direction to its neighbor view cells, and what sector of the view it contains:

struct ViewNode {
    CellIndex cell_index;

    // Intrusive data pointer, indexing into the GridView nodes array,
    // to the next view node at the same ViewIndex, if any.
    // 0xFFFF if there is no next view node.
    u16 next_node_index;

    // Whether this view node is continuous with the view node in the
    // given direction.
    // See direction flags.
    u8 connections;

    // The sector this view node represents.
    Sector sector;

    // The stencil id for this view node.
    // Identical to the transform_index + 1.
    u8 stencil_id;
};
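Walking that intrusive list is then a plain index-chasing loop. A minimal sketch, with ViewNode reduced to just the link field and the helper name being mine:

```cpp
#include <cassert>
#include <cstdint>

// ViewNode reduced to just the intrusive link for illustration.
constexpr std::uint16_t kNoNode = 0xFFFF;

struct ViewNode { std::uint16_t next_node_index; };

// Count how many grid cells are visible through one view cell.
int CountNodesInCell(std::uint16_t head, const ViewNode* nodes) {
    int n = 0;
    for (std::uint16_t i = head; i != kNoNode; i = nodes[i].next_node_index) {
        n++;
    }
    return n;
}
```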

We can now represent cases like the motivating portal scenario:

I’ve given the region on the other side of the portal different tile backgrounds to make it more obvious.

In the last post, we constructed the grid view with breadth-first search (BFS) from the center tile. Now, since we’re working with polar segments that strictly expand outwards, we can actually get away with the even simpler depth-first search (DFS). Everything is radial, and the farther out we get, the narrower our beams are:

A debug view showing the final shadow cast segments.

The search logic is a bit more complicated now: we have to narrow the segment every time we pass through a tile’s edge, view nodes are saved into the new intrusive linked-list structure, and edge traversals have to check for portals. Solid edges terminate the search and trigger the saving of a shadow segment.
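The narrowing step can be pictured as clipping an angular interval. This is a simplified sketch, with a [lo, hi] angle pair standing in for the real polar segments:

```cpp
#include <cassert>

// A [lo, hi] angle pair standing in for the real polar segment.
struct Sector { float lo, hi; };

// Clip the current sector by the opening we pass through; an empty result
// means nothing more is visible down this branch of the search.
bool Narrow(Sector& s, Sector opening) {
    if (opening.lo > s.lo) s.lo = opening.lo;
    if (opening.hi < s.hi) s.hi = opening.hi;
    return s.lo < s.hi;
}
```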

Portal traversals have the added complexity of changing physically where we are in the underlying grid. If we think of the underlying grid as one big mesh, we effectively have to position the mesh differently when rendering the stuff on the other side of the portal. That information is stored in the ViewTransform, and it is also given its own stencil id.

A debug view of the view nodes. We fully see many cells, but some cells we only partially see. The origin is currently split into four views in order for view segments to never exceed 180 degrees.

Above we can see an overlay of the resulting view nodes. There is a little sliver of a cell we can see for a passage off to the left, and there are the two passages, one with a portal, that we can see on the right. The platforms (little brown rectangles) do not block line of sight.

The Stencil Buffer

Now that we have the grid view, how do we render it? It is pretty easy to render a square cell, but do we now have to calculate the part of each cell that intersects with the view segment? We very much want to avoid that!

Clean rendering is accomplished via the stencil buffer, which is a feature in graphics programming that lets us mask out arbitrary parts of the screen when rendering:

We’ve already been using several buffers when rendering. We’re writing to the RGB color buffers, and 3D occlusion is handled via the depth buffer. The stencil buffer is just an additional buffer (here, one byte per pixel), that we can use to get a stenciling effect.

At one byte (8 bits) per pixel, the stencil buffer can at first glance only hold 8 overlapping single-bit stencil masks at once. However, with our 2D orthographic setup, our stencil masks never overlap, so we can treat the byte as a whole ID and use up to 255 unique values per pixel (reserving 0 for the clear value).
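Allocating those IDs is then just a counter that skips zero; a trivial sketch (the helper name is mine, not the engine’s):

```cpp
#include <cassert>
#include <cstdint>

// Hand out stencil IDs 1..255, reserving 0 as the clear value.
std::uint8_t NextStencilId(std::uint8_t& counter) {
    counter = static_cast<std::uint8_t>(counter == 255 ? 1 : counter + 1);
    return counter;
}
```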

We simply write the shadow segments to the stencil buffer, setting the buffer to the appropriate stencil ID:

The stencil buffer has four stencil masks – the red region in the main chamber, the little slice of hallway to the left, the green region through the portal leading back into the same room, and the yellow region through a different portal to an entirely different room.

Our rendering pipeline now needs to support masking with these stencil IDs. In January we covered the render setup, which involved seven different types of render commands that the game could send to the backend:

  • render mesh – render a 3D mesh with lighting, already uploaded to the GPU
  • render local mesh – render a 3D mesh without lighting, sending vertices from the CPU
  • render quad – an old simple command to render a quad
  • render sprite – self-explanatory
  • render spritesheet sprite – an old version of render sprite
  • render panel – renders a panel with 9-slice scaling by sending vertices from the CPU
  • render text – renders font glyphs by sending quads from the CPU

Having three versions of everything – rendering without stencils, rendering with stencils, and rendering to the stencil buffer itself – would triple the number of command types. That is too much!

Furthermore, we want to order the render commands such that opaque things are drawn front to back, transparent things are then drawn back to front, and UI elements are finally drawn in order at the end. That would require sorting across command types, which seems really messy.

To achieve this all neatly, I collapsed everything sending vertices from the CPU to a unified command type, and have the backend fill the vertex buffer when the game creates the command:

You will notice that the game side changed a bit. Rather than receiving a command struct and being responsible for populating all of its fields, the game now calls rendering methods that take everything they need as parameters. This is less error prone, allows for different function signatures that populate only what you need, and means I don’t actually need structs for the different command types if we’re directly converting to vertices.

Each unified command (everything but the mesh render calls) is simply:

// Determines the order in which things are rendered.
enum class PassType : u8 {
    OPAQUE = 0,
    TRANSLUCENT = 1,
    UI = 2,
};

// Determines whether and how a command uses a stencil.
enum class StencilRequirement : u8 {
    NONE  = 0, // Does not use stencils
    WRITE = 1, // Writes to the stencil buffer
    READ  = 2, // Reads from the stencil buffer
};

// Determines how the draw gets executed.
enum class DrawMode : u8 {
    ARRAYS,   // No EBO
    ELEMENTS, // With EBO
};

struct RenderCommandUnlitTriangle {
    // The commands are sorted by the sort key prior to rendering, determining the order.
    // Bits 63-62 (2 bits): Pass Type (Opaque, Translucent, UI)
    // Bits 61-60 (2 bits): Stencil Requirement
    // Bits 59    (1 bit):  Draw Mode
    // Bits 58-51 (8 bits): Stencil ID (0 for none)
    // Bits 50-32  <reserved>
    // Bits 31-0  (32 bits)
    //   In Opaque:      front-to-back distance
    //   In Translucent: back-to-front distance
    //   In UI:          sequence ID
    u64 sort_key;

    // Draw Arguments
    u32 n_vertices; // Number of vertices to draw
    u32 i_vertex;   // Index of the start vertex
    u32 n_indices;  // Number of indices, if using ELEMENTS
    u32 i_index;    // Index of the start element index, if using ELEMENTS
};
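Packing the sort key per that bit layout might look like this (the helper name is illustrative; the shifts follow the comment above):

```cpp
#include <cassert>
#include <cstdint>

using u64 = std::uint64_t;

// Pack the sort-key fields per the bit layout in the struct comment.
u64 MakeSortKey(std::uint8_t pass_type, std::uint8_t stencil_req,
                std::uint8_t draw_mode, std::uint8_t stencil_id,
                std::uint32_t depth_or_sequence) {
    return (u64(pass_type)   << 62) | // bits 63-62
           (u64(stencil_req) << 60) | // bits 61-60
           (u64(draw_mode)   << 59) | // bit  59
           (u64(stencil_id)  << 51) | // bits 58-51
           u64(depth_or_sequence);    // bits 31-0
}
```

A single ascending sort on this key then produces the pass ordering for free.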

Apparently, this is much closer to how modern game engines structure their render commands. There is one vertex buffer that we write into as commands are added by the game, and each command just keeps track of its range. The same is true for the index buffer, which in our case is a pre-populated buffer for quad rendering (so each quad takes 4 vertices and 6 indices instead of 6 vertices):

The pass type, stencil requirement, and draw mode are then baked into a sort key that we can sort on in order to automagically get the commands in the order we want: opaque < transparent < UI, with no stencil < write to stencil < read from stencil, and then grouping by whether we use the elements (index) buffer or not. We even use the lower 32 bits for additional sorting, rendering opaque things front to back (since lighting is expensive and we can discard more pixels that way), rendering transparent things back to front (required to get correct results), and rendering UI elements in the order they are created.

Being able to call glBufferData once, sending all of the vertices up at the beginning, is really nice. Before, we were writing and sending just the few vertices each command needed, command by command.

The end result is significantly simpler backend rendering code. It can become even simpler if we eventually unify the mesh rendering with the unlit triangle rendering, but I’m holding off on that because those render commands have a bunch of uniform data and then I’m switching between shaders. I’m not currently rendering transparent meshes, and if I eventually do I’ll bite the bullet and further consolidate.

Conclusion

I am quite happy to have tackled the fundamental grid representation, which as we saw ended up affecting a lot of stuff. The new grid view now needs to be properly integrated into the movement action, and then we can actually start introducing other entities like ropes, buckets and relics. No promises on specifics! We’ll let the project dictate how it should evolve.

Into Depth – Action System

Current title screen using text rendering with the Arial TrueType fonts.

Last month we covered text rendering, which was necessary for getting the scaffolding up that supports the over-arching gameplay loop. We have a title screen, a level select screen, codex screens for seeing information about unlocked heroes and relics, and a level results screen that summarizes what was gained when completing a level. All of these are rudimentary first stabs, but ya got to make it exist first.

A lot happened in the last month, some of which I might cover in future posts. I’m not going to list everything every time, but it is interesting to see just how much stuff goes into making some sort of usable interactive experience when you’re doing a lot from scratch.

  • Added basic hero avatars to display when the hero is selected.
  • Added a tweak file system for rapidly tuning parameters.
  • Expanded the hero struct with a hero level state to distinguish among heroes in a level, heroes that have not yet been deployed, and heroes that have exited a level.
  • Added new screens.
  • Generalized my UI panel work to have fancier button logic that properly detects button presses, accounting for when the player clicks elsewhere but releases over the button, or presses down on the button but releases elsewhere.
  • Moved the game interface to simply receive one large memory buffer that the game itself then chops up into whatever smaller allocations and arenas it needs.
  • Introduced local mesh assets that can be directly rendered via the triangle shader from the previous post, used for basic quads and things like the selection outline.
  • Introduced the grid view. More on that in a bit.
  • Added a way to quickly identify which entities are in which cells in the stage, now more necessary due to the grid view.
  • Introduced schedules and simulating consequences up front rather than live.
  • Added the action selection panel and action sub-selection interfaces for hero deployment, move selection, and turn passing. The focus of this blog post.
  • Removed a bunch of earlier code pre-move selection where the active hero would move one tile or perform one action per key press.
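As an aside, the button rule mentioned above (trigger only when the press both starts and ends over the button) reduces to tracking one flag. A minimal sketch with hypothetical names, not the actual UI code:

```cpp
#include <cassert>

struct Button { bool pressed_inside = false; };

// Trigger only when the press starts AND the release ends over the button.
// mouse_down means "newly pressed this frame"; all names are hypothetical.
bool UpdateButton(Button& b, bool mouse_over, bool mouse_down, bool mouse_released) {
    if (mouse_down && mouse_over) b.pressed_inside = true;
    const bool triggered = mouse_released && mouse_over && b.pressed_inside;
    if (mouse_released) b.pressed_inside = false;
    return triggered;
}
```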

The game now looks like this:

Obviously, all art, layout, UI, etc. is an extremely early first cut and likely not the final version. We do, however, see the basic framework for a game.

Action Panel

The main thing I want to talk about this week is the fledgling action system, starting with the action panel:

The action panel shows the active hero’s avatar, their name, their level state, and has a series of action buttons.

You’ll notice that the panel has the same color as the background. Eventually, when the field of view includes shadow casting, it should just blend with the shadows. Here is my Google Slides mockup of what I’m roughly working toward:

Eventually we’ll see the whole party, along with whatever status bars we need to see, and the actions available to the hero will look fancier. Most notably, I intend to render little equilateral triangles to represent the action points available for various moves. I’m not 100% settled on how that would work, but something like that will happen.

The game loop inside a level tracks which actor is active, and then loops through four states:

  • Generate Actions
  • Select Action
  • Play Schedule
  • Done

The first state is where the game looks at the current state and generates the actions available to the actor. For heroes, these then show up as options in the action panel. The user can then select an action and get access to its dedicated UI, and use that to determine the details.

Here we’ve selected the DEPLOY action and we get a user interface for selecting which cell to deploy the hero to.

Once the user commits to an action, the action generates a schedule. This is the sequence of events that represent the outcome of the action. For example, passing the turn simply produces an event that moves the game to the next actor. Selecting a move to a cell produces a more complicated schedule that traverses multiple cells in sequence, and may involve posture changes like changing from standing to climbing.

Most importantly, a schedule contains the complete outcome of the action. Previously, if the actor planned to move, the game would check live, as the actor moved, for triggered events like ending up over an empty shaft and transitioning into a falling state. This fragments the logic and makes the planning and consequence code harder to test, since we don’t really know the consequence of an action until it has been tediously simulated out over many iterations.

Instead, the schedule does all necessary consequence simulation during construction, and the game then just needs to play that out until it has completed. Given that we have the schedule, it is also quite easy to undo the entire action without having to figure out some weird reverse simulation.

This doesn’t look terribly complicated to implement, but it is actually the system that flummoxed me the most in my previous project attempt, particularly when it came to changing which actions were available based on the actor’s equipment and in enabling undo. I am much happier with how this latest rendition is set up.

The schedule is, at its core, just a DAG of events stored in a topological order:

struct Schedule {
    // Events are in a topological order
    u16 n_timeline;
    Event* timeline; // allocated on the Active linear allocator

    // The net outcome.
    u16 n_outcome;
    Event* outcomes; // allocated on the Active linear allocator

    // The opposite of the net outcome, since events are not inherently reversible.
    u16 n_undo;
    Event* undos; // allocated on the Active linear allocator
};

The schedule contains the full timeline, plus a compressed net outcome that only contains the events necessary to encode the overall delta. For something like a pass action, the timeline and the outcome are the same. For a move, where the actor traverses multiple cells, the timeline contains the sequence of cell traversals but the outcome only contains one cell change – from source to dest.

The schedule also contains the set of events needed to undo the action. This is very similar to the outcome, just reversed. The game events don’t all contain the information necessary to be reversed, so we construct the undo events separately.
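For a move, the compression and undo construction amount to this kind of transformation (MoveEvent is a simplified stand-in for the engine’s Event union):

```cpp
#include <cassert>
#include <vector>

// MoveEvent is a simplified stand-in for the engine's Event union.
struct CellIndex { int x, y; };
struct MoveEvent { CellIndex from, to; };

// The outcome collapses a multi-step timeline into one source-to-dest event.
MoveEvent CompressMoves(const std::vector<MoveEvent>& timeline) {
    return MoveEvent{timeline.front().from, timeline.back().to};
}

// The undo is the outcome with its endpoints swapped.
MoveEvent MakeUndo(const MoveEvent& outcome) {
    return MoveEvent{outcome.to, outcome.from};
}
```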

A simple schedule for moving a hero. The timeline contains multiple cell transitions, but the outcome and undo each consist of a single event.

Events are small, composable game state deltas:

struct Event {
    u16 index;  // The event's index in the schedule. (Events are in a topological order)
    u16 beat;   // The beat that this event should be executed on. Events sharing a beat happen concurrently.
    EventType type;

    union {
        EventEndTurn end_turn;
        EventSetHeroLevelState set_hero_level_state;
        EventMove move;
        EventSetCellIndex set_cell_index;
        EventSetFacingDir set_facing_dir;
        EventSetHeroPosture set_hero_posture;
        EventAttachTo attach_to;
        EventOnBelay on_belay;
        EventOffBelay off_belay;
        EventHaulRope haul_rope;
        EventLowerRope lower_rope;
        // ... etc.
    };
};

Each event knows its event type and then contains type-specific data. This is a pretty straightforward way to interleave them without annoying object-oriented inheritance code.

Events also contain beats. The game logic is discrete, and in order to have events run concurrently, we store them in the same beat:

Whenever we advance a beat, we apply all events in that beat and trigger any animations or whatnot and use that to determine how long to wait until we start the next beat. This keeps the event system clean (it doesn’t need to know how long a given animation will take, or even what animation is associated with an event), and gives us one centralized place for triggering animations and sounds.
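A stripped-down sketch of that beat playback, with Event reduced to a beat number and an abstract state delta (the names and the integer state are illustrative):

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Event reduced to a beat number plus an abstract state delta.
struct Event { std::uint16_t beat; int delta; };

// Apply every event sharing the current beat; events in a beat are concurrent.
int AdvanceBeat(const std::vector<Event>& timeline, std::uint16_t beat, int state) {
    for (const Event& e : timeline) {
        if (e.beat == beat) state += e.delta;
    }
    return state;
}
```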

When the schedule is fully played out, the game enters the done state and checks to see if the level is done. If not, it goes back to action generation.

The state for active levels thus includes the core game state (the stage), which actor’s turn it is, the active schedule (if any), the schedule playback state (event index, time in beat), and data for all of this turn’s actions:

struct ScreenState_Active {
    // The active playable area and the entities in it.
    Stage stage;

    // Stores data for the active actor's turn.
    Turn turn;

    // Stores the events that are scheduled to run.
    Schedule schedule;

    // The playback state of the schedule.
    PlaybackState playback_state;

    // The index of the active actor.
    int i_actor = 0;
    int i_actor_next;

    // Where we allocate the action data.
    // This allocator is only reset when actions are regenerated.
    LinearAllocator action_allocator;
};

Pretty clean when it comes down to it.

The Turn contains the actions generated for the current actor. Each action has a name, which then makes it easy to render the buttons for the actions in the action panel.

Actions

An action is a discrete state change available to an actor, such as passing the turn, moving through the environment, or attacking another actor. The available actions depend on the game state — an actor that is not yet deployed has a deploy action but no move action, and an actor with a bow should get an attack action with a ranged UI whereas an actor with a sword can only select the tiles within melee range.

To handle these things flexibly, actions have methods that can be specialized on a per-action basis. In modern C++, one would probably use objects and inheritance to implement a bunch of action subclasses. I am avoiding classes and am not using inheritance at all, so instead we have a basic struct with some function pointers:

// A function pointer called to run the action UI for the action,
// once the action has been selected in the menu.
typedef void (*FuncRunActionUI)(GameState* game, RenderCommandBuffer* command_buffer, void* data, const AppState& app_state);

// A function pointer for a method that determines whether the action was committed.
typedef bool (*FuncIsActionCommitted)(const void* data);

// A function pointer for a method that builds a schedule from the action.
typedef bool (*FuncBuildSchedule)(GameState* game, void* data);

struct Action {
    char name[16]; // null-terminated

    // The key to press to perform this action
    char shortkey;

    // Data associated with the action, specialized per action type
    void* data;

    // Function pointers
    FuncRunActionUI run_action_ui;
    FuncIsActionCommitted is_action_committed;
    FuncBuildSchedule build_schedule;
};

We use the action name when displaying its button in the action panel, and its shortkey is available if the user doesn’t want to have to click the button with the mouse.

Each action also has a void* data member, which can be populated when the action is created and then used in the member functions. We’ll see an example of that shortly.

Generating the available actions is conceptually straightforward: just run a method for every action in the game that checks whether that action is available and, if it is, allocates it, constructs it, and adds it to the action list:

void GenerateActions(GameState* game, const Hero& hero) {
    Turn& turn = game->screen_state_active.turn;
    turn.n_actions = 0;
    turn.i_action_selected = -1;

    if (MaybeGenerateAction_Deploy(&turn.actions[turn.n_actions], game, hero)) {
        turn.n_actions++;
    }
    if (MaybeGenerateAction_Move(&turn.actions[turn.n_actions], game, hero)) {
        turn.n_actions++;
    }
    if (MaybeGenerateAction_Pass(&turn.actions[turn.n_actions], game, hero)) {
        turn.n_actions++;
    }
    // ...
}

This may seem too simple, but I think it is actually quite an advantage. I had previously been considering having various pieces of equipment be responsible for determining which actions they are associated with, and then having a way to store that metadata on the equipment, save it to disk, etc. Messy! Instead, I can just run all of these methods, every time, and if any require specific equipment, they can check for it and just quickly return false if it isn’t there.
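An equipment-gated generator in that same style just checks and bails. A hypothetical sketch (the bow check, the SHOOT action, and the reduced structs are all made up for illustration):

```cpp
#include <cassert>
#include <cstring>

// Simplified stand-ins; the bow gate and the SHOOT action are hypothetical.
struct Hero { bool has_bow; };
struct Action { char name[16]; };

bool MaybeGenerateAction_Shoot(Action* action, const Hero& hero) {
    if (!hero.has_bow) {
        return false; // no bow: the action simply isn't generated this turn
    }
    std::strncpy(action->name, "SHOOT", sizeof(action->name));
    return true;
}
```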

Every action currently requires at least four methods: action generation, running the action-specific UI, a simple check of whether the user committed to the action, and schedule generation. For the simple pass action, we don’t even need the void* data member:

// ------------------------------------------------------------------------------------------------
bool MaybeGenerateAction_Pass(Action* action, GameState* game, const Hero& hero) {

    strncpy(action->name, "PASS", sizeof(action->name));
    action->shortkey = 'p';
    action->run_action_ui = RunActionUI_Pass;
    action->is_action_committed = IsActionCommitted_Pass;
    action->build_schedule = BuildSchedule_Pass;

    return true;
}

// ------------------------------------------------------------------------------------------------
void RunActionUI_Pass(GameState* game, RenderCommandBuffer* command_buffer, void* data, const AppState& app_state) {
    // Nothing to do here.
}

// ------------------------------------------------------------------------------------------------
bool IsActionCommitted_Pass(const void* data) {
    return true;
}

// ------------------------------------------------------------------------------------------------
bool BuildSchedule_Pass(GameState* game, void* data) {
    ScreenState_Active& active = game->screen_state_active;

    // Build the schedule, which consists just of an end turn action.
    Schedule* schedule = &(game->screen_state_active.schedule);
    
    schedule->n_timeline = 1;
    schedule->timeline = (Event*)Allocate(&(active.action_allocator), sizeof(Event));
    ASSERT(schedule->timeline != nullptr, "BuildSchedule_Pass: Failed to allocate timeline!");
    
    CreateEventEndTurn(schedule->timeline, /*index=*/0, /*beat=*/0);

    // The outcome is the same.
    schedule->n_outcome = 1;
    schedule->outcomes = schedule->timeline;

    // The undo action: TODO

    return true;
}

The deploy action is more involved, and it does allocate a custom data struct:

struct DeployActionData {
    // List of legal entry cells for the hero.
    u16 n_entries;
    CellIndex entries[STAGE_MAX_NUM_ENTRIES];

    // The index of the entry we are looking at.
    int targeted_entry;

    // Whether the entry has been selected.
    bool entry_selected;
};

// ------------------------------------------------------------------------------------------------
void InitActionData_Deploy(DeployActionData* data) {
    data->n_entries = 0;
    data->targeted_entry = 0;
    data->entry_selected = false;
}

// ------------------------------------------------------------------------------------------------
bool MaybeGenerateAction_Deploy(Action* action, GameState* game, const Hero& hero) {

    if (hero.level_state != HERO_LEVEL_STATE_UNDEPLOYED) {
        // Hero does not need to be deployed
        return false;
    }

    ScreenState_Active& active = game->screen_state_active;
    const Stage& stage = active.stage;
    
    // Allocate the data for the action
    action->data = Allocate(&(active.action_allocator), sizeof(DeployActionData));
    DeployActionData* data = (DeployActionData*)action->data;
    InitActionData_Deploy(data);

    // Run through all stage entries and find the valid places to deploy
    for (u16 i_entry = 0; i_entry < stage.n_entries; i_entry++) {
        CellIndex cell_index = stage.entries[i_entry];
        
        ASSERT(!IsSolid(stage, cell_index), "Entry cell is solid!");

        // Ensure that the cell is not occupied by another hero
        if (IsHeroInCell(stage, cell_index)) {
            continue;
        }

        // Add the entry.
        data->entries[data->n_entries++] = cell_index;
    }

    if (data->n_entries == 0) {
        // No valid entries
        return false;
    }
    
    
    strncpy(action->name, "DEPLOY", sizeof(action->name));
    action->shortkey = 'd';
    // action.sprite_handle_icon = // TODO
    action->run_action_ui = RunActionUI_Deploy;
    action->is_action_committed = IsActionCommitted_Deploy;
    action->build_schedule = BuildSchedule_Deploy;

    return true;
}

// ------------------------------------------------------------------------------------------------
void RunActionUI_Deploy(GameState* game, RenderCommandBuffer* command_buffer, void* data, const AppState& app_state) {
    
    DeployActionData* action_data = (DeployActionData*)data;

    const TweakStore* tweak_store = &game->tweak_store;
    const f32 kSelectItemFlashMult = TWEAK(tweak_store, "select_item_flash_mult", 2.0f);
    const f32 kSelectItemReticuleAmplitude = TWEAK(tweak_store, "select_item_reticule_amplitude", 0.25f);
    const f32 kSelectItemFlashAlphaLo = TWEAK(tweak_store, "select_item_flash_alpha_lo", 0.1f);
    const f32 kSelectItemFlashAlphaHi = TWEAK(tweak_store, "select_item_flash_alpha_hi", 0.9f);
    const f32 kSelectItemArrowOffsetHorz = TWEAK(tweak_store, "select_item_arrow_offset_horz", 1.0f);
    const f32 kSelectItemArrowScaleLo = TWEAK(tweak_store, "select_item_arrow_scale_lo", 1.0f);
    const f32 kSelectItemArrowScaleHi = TWEAK(tweak_store, "select_item_arrow_scale_hi", 1.0f);

    // TODO: This is probably lagged by one frame.
    const glm::mat4 clip_to_world = CalcClipToWorld(command_buffer->render_setup.projection, command_buffer->render_setup.view);
    const glm::vec2 mouse_world = CalcMouseWorldPos(app_state.pos_mouse, 0.0f, clip_to_world, app_state.window_size);

    bool deploy_hero_to_target_cell = false;

    // Set the hero's location to one tile over the entry we are looking at.
    // This is a hacky way to handle the fact that the hero is not in the level yet
    // and that the camera is centered on the hero.
    ScreenState_Active& active = game->screen_state_active;
    ScreenState_Active& active = game->screen_state_active;
    Hero* hero = active.stage.pool_hero.GetMutableAtIndex(active.i_actor);
    const CellIndex cell_index = action_data->entries[action_data->targeted_entry];
    MoveHeroToCell(&active.stage, hero, {cell_index.x, (u16)(cell_index.y + 1)});
    hero->offset = {0.0f, 0.0f};

    // Render the entrance we are currently looking at.
    {
        const f32 unitsine = UnitSine(kSelectItemFlashMult * game->t);

        RenderCommandLocalMesh& local_mesh = *GetNextLocalMeshRenderCommand(command_buffer);
        local_mesh.local_mesh_handle = game->local_mesh_id_quad;
        local_mesh.screenspace = false;
        local_mesh.model = glm::translate(glm::mat4(1.0f), glm::vec3(0.0f, -1.0f, RENDER_Z_FOREGROUND));
        local_mesh.color = kColorGold;
        local_mesh.color.a = Lerp(kSelectItemFlashAlphaLo, kSelectItemFlashAlphaHi, unitsine);

        {
            RenderCommandLocalMesh& reticule = *GetNextLocalMeshRenderCommand(command_buffer);
            reticule.local_mesh_handle = game->local_mesh_id_corner_brackets;
            reticule.screenspace = false;
            reticule.model = glm::translate(glm::mat4(1.0f), glm::vec3(0.0f, -1.0f, RENDER_Z_FOREGROUND + 0.1f));
            reticule.model = glm::scale(reticule.model, glm::vec3(unitsine * kSelectItemReticuleAmplitude + 1.0f));
            reticule.color = kColorWhite;
        }

        // Run button logic on the targeted entry.
        const Rect panel_ui_area = {
                .lo = glm::vec2(- 0.5f, - 1.5f),
                .hi = glm::vec2(+ 0.5f, - 0.5f)};
        const UiButtonState button_state = UiRunButton(&game->ui, panel_ui_area, mouse_world);

        if (button_state != UiButtonState::NORMAL) {
            local_mesh.color.x = Lerp(local_mesh.color.x, kColorWhite.x, unitsine);
            local_mesh.color.y = Lerp(local_mesh.color.y, kColorWhite.y, unitsine);
            local_mesh.color.z = Lerp(local_mesh.color.z, kColorWhite.z, unitsine);
        }

        if (button_state == UiButtonState::TRIGGERED) {
            deploy_hero_to_target_cell = true;
        }
    }

    bool pressed_left = IsNewlyPressed(app_state.keyboard, 'a');
    bool pressed_right = IsNewlyPressed(app_state.keyboard, 'd');

    // Render two arrow-like triangles that let us switch between options.
    {
        const f32 unitsine = UnitSine(game->t);
        const f32 scale = Lerp(kSelectItemArrowScaleLo, kSelectItemArrowScaleHi, unitsine);

        const glm::vec2 halfdims = glm::vec2(0.433f * scale, 0.5f * scale); // NOTE: Rotated

        { // Left triangle
            const glm::vec2 pos = glm::vec2(-kSelectItemArrowOffsetHorz, -1.0f);
            
            const Rect panel_ui_area = {
                    .lo = pos - halfdims,
                    .hi = pos + halfdims};
            const UiButtonState button_state = UiRunButton(&game->ui, panel_ui_area, mouse_world);

            RenderCommandLocalMesh& local_mesh = *GetNextLocalMeshRenderCommand(command_buffer);
            local_mesh.local_mesh_handle = game->local_mesh_id_triangle;
            local_mesh.screenspace = false;
            local_mesh.model =
            glm::scale(
                glm::rotate(
                    glm::translate(glm::mat4(1.0f), glm::vec3(pos.x, pos.y, RENDER_Z_FOREGROUND)),
                    glm::radians(90.0f),
                    glm::vec3(0.0f, 0.0f, 1.0f)
                ),
                glm::vec3(scale, scale, 1.0f)
            );
            local_mesh.color = kColorGold;

            if (button_state != UiButtonState::NORMAL) {
                local_mesh.color = glm::mix(local_mesh.color, kColorWhite, unitsine);
            }

            // Check for pressing the button.
            if (button_state == UiButtonState::TRIGGERED) {
                pressed_left = true;
            }
        }
        { // Right triangle
            const glm::vec2 pos = glm::vec2(kSelectItemArrowOffsetHorz, -1.0f);
            
            const Rect panel_ui_area = {
                    .lo = pos - halfdims,
                    .hi = pos + halfdims};
            const UiButtonState button_state = UiRunButton(&game->ui, panel_ui_area, mouse_world);

            RenderCommandLocalMesh& local_mesh = *GetNextLocalMeshRenderCommand(command_buffer);
            local_mesh.local_mesh_handle = game->local_mesh_id_triangle;
            local_mesh.screenspace = false;
            local_mesh.model = 
            glm::scale(
                glm::rotate(
                    glm::translate(glm::mat4(1.0f), glm::vec3(pos.x, pos.y, RENDER_Z_FOREGROUND)),
                    glm::radians(-90.0f),
                    glm::vec3(0.0f, 0.0f, 1.0f)
                ),
                glm::vec3(scale, scale, 1.0f)
            );
            local_mesh.color = kColorGold;

            if (button_state != UiButtonState::NORMAL) {
                local_mesh.color = glm::mix(local_mesh.color, kColorWhite, unitsine);
            }

            // Check for pressing the button.
            if (button_state == UiButtonState::TRIGGERED) {
                pressed_right = true;
            }
        }
    }


    // Process the presses, which can come from keys or clicking the arrows.
    if (pressed_left) {
        CircularDecrement(action_data->targeted_entry, (int)action_data->n_entries);
    } else if (pressed_right) {
        CircularIncrement(action_data->targeted_entry, (int)action_data->n_entries);
    }

    if (deploy_hero_to_target_cell) {
        action_data->entry_selected = true;
    }
}

// ------------------------------------------------------------------------------------------------
bool IsActionCommitted_Deploy(const void* data) {
    const DeployActionData* action_data = (DeployActionData*)data;
    return action_data->entry_selected;
}

// ------------------------------------------------------------------------------------------------
bool BuildSchedule_Deploy(GameState* game, void* data) {
    ScreenState_Active& active = game->screen_state_active;

    const Hero* hero = active.stage.pool_hero.GetAtIndex(active.i_actor);
    const CellIndex src = hero->cell_index;
    const CellIndex dst = {src.x, (u16)(src.y - 1)};

    // Build the schedule, which changes the hero's level state and moves them to the entry (1 cell down).
    Schedule* schedule = &(game->screen_state_active.schedule);
    
    {
        schedule->n_timeline = 2;
        schedule->timeline = (Event*)Allocate(&(active.action_allocator), schedule->n_timeline * sizeof(Event));
        ASSERT(schedule->timeline != nullptr, "BuildSchedule_Deploy: Failed to allocate timeline!");
        
        CreateEventSetHeroLevelState(schedule->timeline, /*index=*/0, /*beat=*/0, hero->id, HERO_LEVEL_STATE_IN_LEVEL);
        CreateEventMove(schedule->timeline + 1, /*index=*/1, /*beat=*/0, hero->id, Direction::DOWN, src, dst);
    }

    // The outcome is the same.
    schedule->n_outcome = 1;
    schedule->outcomes = schedule->timeline;

    // Undo: TODO

    return true;
}

Move Actions

The move action is significantly more complicated than the deploy or pass actions.

The move action highlights all valid target cells, allowing the player to select one to move to. Once selected, the shortest path to that cell is taken by the hero, and all consequences are simulated (e.g. falling, triggering a trap) and built into a schedule.

The move action UI, which here has three valid target cells (since falling ends the search). The tile under the mouse cursor is highlighted in yellow with corner brackets, and the shortest path is shown as a trail of equilateral triangles.

In order to support these features, the move action logic needs to know what the reachable cells are and what the shortest paths to them are. This is achieved by running Dijkstra’s algorithm from the actor’s initial state. The state space is not merely cell positions, but also includes the actor’s facing direction and their posture (standing, on ladder, etc.).
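As a rough, standalone illustration of that idea (the game's actual code avoids the standard containers, and the grid, costs, and transitions here are invented for the sketch), here is Dijkstra's algorithm over a state space that includes facing direction, not just position, with a maximum-cost cutoff. Posture is omitted for brevity:

```cpp
#include <cassert>
#include <climits>
#include <functional>
#include <queue>
#include <utility>
#include <vector>

// Toy state space: (x, facing) on a 1D strip of cells. Stepping moves one
// cell in the facing direction (cost 1); turning flips the facing (cost 1).
constexpr int kStripW = 4;
constexpr int kNumDirs = 2;  // 0 = LEFT, 1 = RIGHT

int StateId(int x, int dir) { return x * kNumDirs + dir; }

// Returns the cheapest cost to reach every state from (x0, dir0), searching
// only up to max_cost, as the move action does with action points.
std::vector<int> DijkstraMoveCosts(int x0, int dir0, int max_cost) {
    std::vector<int> cost(kStripW * kNumDirs, INT_MAX);
    // Min-heap of (cost, state id).
    std::priority_queue<std::pair<int, int>,
                        std::vector<std::pair<int, int>>,
                        std::greater<std::pair<int, int>>> frontier;
    cost[StateId(x0, dir0)] = 0;
    frontier.push({0, StateId(x0, dir0)});
    while (!frontier.empty()) {
        const auto [c, id] = frontier.top();
        frontier.pop();
        if (c > cost[id] || c >= max_cost) continue;  // stale entry or cutoff
        const int x = id / kNumDirs;
        const int dir = id % kNumDirs;
        // Transition 1: turn in place.
        const int turn_id = StateId(x, 1 - dir);
        if (c + 1 < cost[turn_id]) { cost[turn_id] = c + 1; frontier.push({c + 1, turn_id}); }
        // Transition 2: step one cell in the facing direction.
        const int nx = x + (dir == 1 ? 1 : -1);
        if (nx >= 0 && nx < kStripW) {
            const int step_id = StateId(nx, dir);
            if (c + 1 < cost[step_id]) { cost[step_id] = c + 1; frontier.push({c + 1, step_id}); }
        }
    }
    return cost;
}
```

The same cell at different facings gets different costs, which is exactly why the search runs over full states rather than positions.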

There is one additional complication: we want the game to reason only about cells that are visible from the hero’s current vantage point, and we eventually want to support non-Euclidean connections like portals:

In this mockup, the key is visible twice because the hallway has a portal loop-back connection.

In order to achieve this, we introduce a new representation of the level geometry visible to an actor, the GridView:

struct GridView {
    // The grid tile the view is centered on
    CellIndex center;

    CellIndex cell_indices[GRID_VIEW_MAX_X][GRID_VIEW_MAX_Y];
    u8 flags[GRID_VIEW_MAX_X][GRID_VIEW_MAX_Y];
};

This view has a finite size, much smaller than the overall level grid, that is big enough to fit the screen. The view is always centered on an actor, and the cells in the view are then indices into the cells in the underlying level grid. If a level grid cell is connected by a non-Euclidean portal to another cell, the view can just index into the correct cells on either side of the portal. Constructing the grid view is a straightforward breadth-first search from the center tile.
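A minimal sketch of that BFS, under heavy simplifying assumptions: the level is a 1D strip of four cells with a loop-back portal joining its two ends, and the hypothetical Traverse helper stands in for the real edge/portal lookup:

```cpp
#include <cassert>
#include <queue>
#include <vector>

// A 1D toy level: cells 0..3 in a row, with a loop-back portal joining the
// right edge of cell 3 to the left edge of cell 0. The view stores, per view
// slot, which level cell is visible there (-1 if not yet assigned).
constexpr int kLevelW = 4;
constexpr int kViewW = 7;  // view slots, centered on the actor
constexpr int kViewCenter = kViewW / 2;

// Follows an edge from `cell` in direction `dir` (-1 or +1), routing through
// the loop-back portal at the ends. The real game consults the edge arrays
// and the Portal table here.
int Traverse(int cell, int dir) {
    return (cell + dir + kLevelW) % kLevelW;
}

std::vector<int> BuildGridView(int center_cell) {
    std::vector<int> view(kViewW, -1);
    view[kViewCenter] = center_cell;
    std::queue<int> frontier;  // view slots to expand
    frontier.push(kViewCenter);
    while (!frontier.empty()) {
        const int slot = frontier.front();
        frontier.pop();
        for (int dir : {-1, +1}) {
            const int next_slot = slot + dir;
            if (next_slot < 0 || next_slot >= kViewW || view[next_slot] != -1) continue;
            view[next_slot] = Traverse(view[slot], dir);
            frontier.push(next_slot);
        }
    }
    return view;
}
```

Because of the portal, the same level cell lands in multiple view slots, which is exactly the "hero sees several versions of themselves" effect.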

Note: I am temporarily keeping this simpler than it ultimately needs to be. To do this properly, we’ll need a fancier data structure that is not a grid but can handle sectors, because one view cell may actually contain view sectors of multiple level cells:

With this portal (magenta) set up, this view cell is visible twice, once containing a key.

We thus run Dijkstra’s algorithm in this grid view, starting from the center cell where the actor is. The same cell may be visible multiple times in the grid view, and we will correctly be able to route to that cell via multiple paths.

The search assigns a cost to each state change, and only searches up to a maximum cost. Very soon, actors will have action points to spend per turn, and it won’t be possible to move further than the available action points can afford.

The move action custom data struct is:

struct MoveActionData {
    // We can only ever move to a visible tile, so for now, we can
    // just allocate a grid's worth of potential targets.

    // The cheapest cost to reach each move state.
    u16 costs[GRID_VIEW_MAX_X][GRID_VIEW_MAX_Y][/*num directions=*/2][kNumHeroPostures];

    // The parent state on the shortest path that arrives at the given state.
    // I.E., if [1][2][LEFT][STANDING] contains {2,2,LEFT,STANDING}, then we took a step over.
    // The root state points to itself.
    MoveState parents[GRID_VIEW_MAX_X][GRID_VIEW_MAX_Y][/*num directions=*/2][kNumHeroPostures];

    // Used to track whether a view cell has been visited.
    u32 visit_frame;
    u32 visits[GRID_VIEW_MAX_X][GRID_VIEW_MAX_Y];
    
    ViewIndex view_index_target;  // Target view cell index for the move.
    bool entry_selected;
};

Having completed the search, we can render all reachable tiles, draw a reticle over the cell under the user’s mouse, and, if that cell is reachable, backtrack through its parents to render the cells traversed to get there. (Since states include more than just cell changes, and we don’t want to render to the same cell multiple times, we also store a u32 visit frame that marks the view cells we have already rendered.)
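The backtracking step can be sketched as follows; State, the parents layout, and the example path are simplified stand-ins for the real MoveState arrays:

```cpp
#include <cassert>
#include <vector>

// Hypothetical sketch of walking the parents array back from a target state.
// Real states also carry direction and posture; here they are just cells on
// a 1D strip, so parents can be indexed by x alone.
struct State { int x; };

std::vector<State> BacktrackPath(const std::vector<State>& parents, State target) {
    std::vector<State> reversed;
    State s = target;
    while (true) {
        reversed.push_back(s);
        const State p = parents[s.x];
        if (p.x == s.x) break;  // the root state points to itself
        s = p;
    }
    // Reverse so the path runs from the actor to the target.
    return std::vector<State>(reversed.rbegin(), reversed.rend());
}
```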

Finally, when the user clicks on a cell to commit the action, we compute the schedule by:

  1. Extracting the shortest path by traversing back to the source node.
  2. Writing the shortest path out one state change at a time.
  3. Simulating consequences (like falling) after every step, and if any consequences do take place, ending the planned schedule there and appending all consequence events.

This process makes a copy of the current state and applies all changes there. Making a copy, while taking memory, has the advantage of not polluting the actual game state and giving us a second Stage to compare the current Stage to in order to get the overall schedule delta.

Conclusion

With the action system in place, the foundation is laid for actual gameplay. Next time, we’ll look at introducing some core entities back in (ropes, buckets, relics) and managing the overall gameplay cycle.

Into Depth – Unified Triangle Shader

Into Depth is the name of a little game project I have been working on for over a year now. Its roots go back even further, to the 2D platformer I had started working on two years ago. The platformer ended up teaching me a lot about OpenGL, which I have continued to use in a fairly low-level manner in this new project.

Concept art of stone ruins with multiple characters running around.

Concept image by ChatGPT.

In a nutshell, the game is a simplification of the 2D platformer born both from managing complexity and wanting something I could reasonably make multiplayer and play with my friends online. Into Depth is a turn-based 2D sidescroller on a grid. Kind of like Gloomhaven, but vertical:

A square grid with solid walls, a hero in one tile, a rope extending several tiles down from an entryway in the ceiling, and a key. Two edges are marked 'a'.

An early mockup from my Google slides design doc.

The team of heroes descends into dungeons (e.g., down the rope in the image above) with the goal of retrieving relics. Oftentimes these relics are heavy and require hauling to get them back to the surface. Collected relics can be equipped and used in other levels, and teams may revisit levels in order to get even deeper. That’s the rough idea.

There are additional details, of course. For example, I want non-Euclidean geometry to play a role. In the screen above, the two edges labeled ‘a’ are joined, such that the hero’s view is:

Similar to the previous image, but expanded such that looking through edge 'a' is possible.

The hero can see the key from two different vantage directions because of the portal.

This is made more interesting by having each hero only be able to see what is in their line of sight. Not only does that add an element of discovery and the need to juggle teammates’ perspectives, it also enables stealth mechanics.

An in-game screenshot centered on the hero, with shadow casting blocking things not in sight. An enemy is visible twice due to portal effects.

Solid geometry blocks the view of what is behind it. This image was from before I went orthographic.

I am writing this myself, for fun, in my spare time. My primary goal is to learn how things work, and to actually get something playable for myself and my friends.

I am using C++ and VS Code, on Linux. I try to minimize library use, but I leverage SDL2 as my platform library, GLAD for OpenGL, the GLM types, and several Sean Barrett STB libraries. I also avoid the C++ standard library where I can, skipping containers like std::vector and std::map and using an s8 string view, as defined by Chris Wellons, instead of std::string. I haven’t been using ImGui in this project, but I probably will once I have more of an editor.

Why these restrictions? Because I am a big fan of Casey Muratori, Jonathan Blow, and the do-it-yourself attitude of understanding your code. It is a great way to learn how it all works, and it is fun! Presumably it leads to smaller executables and faster code too.

The art you see here is programmer art, since I am a programmer. That is good enough for now. My long-term vision is to have a low-poly aesthetic and mostly not use sprites, since I should be able to author low-poly meshes myself. The scene will be 3D with effects like lighting, but will be rendered in orthographic since that just makes it so much easier to see and work with the grid.

Font Glyphs + Sprites

I was most recently working on some basic shaders for the game. I had just added font rendering with its own shader for rendering glyph quads. A UI system typically needs things beyond just font glyphs, like UI panels and sprite icons, and those are mostly just basic quads as well. It seemed obvious that I could share a shader for the dual uses of rendering font quads and rendering sprite quads:

A vertex consisting of pos, uv, and color. Uniforms are a proj_view matrix and a texture. A font glyph is defined as a 2-triangle quad with uv coordinates indexing into an atlas.

The shader is really simple. In the vertex shader, the position is just:

gl_Position = proj_view * vec4(aPosWorld, 1.0);

And then the fragment shader uses the input color as a componentwise multiplier after looking up the texture value via the UV coordinate:

vec4 tex_color = texture(u_texture, uv);
FragColor = color * tex_color;

The font atlas has white glyphs, which means this color multiplier renders text in whatever color we want. Sprites are typically rendered with color = [1,1,1,1] so that they are unaffected, but you can also tint them by changing the values.

Rendering quads is also straightforward, but when you’re doing it all yourself, there is still a fair amount of stuff to keep track of. Every glyph in the font has a packed quad that tells us its size and its UV coordinates in the font atlas. We write the quad vertices into a VBO, and rather than writing six vertices to get the two triangles per quad, we only write the four unique ones and use an EBO to define the quads by indexing into the vertex array (as is standard practice):

A two-triangle quad showing four vertices in a VBO and six indices in an EBO. This uses 4 vertices * 9 f32s = 4*36 = 144 bytes in the VBO and 6*4 = 24 bytes in the EBO.
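The quad index pattern can be sketched like this; BuildQuadIndices is a hypothetical helper name, and the winding shown is one common choice:

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Minimal sketch of prepopulating a quad index buffer: every quad contributes
// 4 vertices to the VBO and 6 indices (two triangles) to the EBO.
std::vector<uint32_t> BuildQuadIndices(uint32_t n_quads) {
    std::vector<uint32_t> indices;
    indices.reserve(n_quads * 6);
    for (uint32_t q = 0; q < n_quads; q++) {
        const uint32_t base = q * 4;  // first vertex of this quad in the VBO
        // Triangle 1: bottom-left, bottom-right, top-right.
        indices.push_back(base + 0);
        indices.push_back(base + 1);
        indices.push_back(base + 2);
        // Triangle 2: bottom-left, top-right, top-left.
        indices.push_back(base + 0);
        indices.push_back(base + 2);
        indices.push_back(base + 3);
    }
    return indices;
}
```

Since the pattern only depends on the quad count, the EBO can be filled once at startup and reused every frame.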

Each frame, the backend fills up a vertex buffer and sends it to the VBO for rendering. This only needs to happen once per frame if the sprites and the font glyphs are all packed into the same texture and everything uses the same projection and view matrix.

Render Commands

The core game code does not fill the vertex buffer directly. Instead, it issues render commands: simple, higher-level descriptions of what we want rendered. This keeps the core game code fast and leaves the task of deciding how to render everything to the backend.

A simple flowchart showing Core::Tick() followed by Backend::Render().

For example, a render text command is:

struct RenderCommandTextLine {
    FontHandle font_handle;
    glm::vec2 pos;       // Bottom-left of the first glyph
    f32 letter_size;     // Multiplier off of default
    f32 letter_spacing;  // Multiplier off of default
    s8 text;             // Must last until rendered
    glm::vec4 color;     // RGBA componentwise color multipliers
};

It is up to the backend to convert that higher-level specification into the actual quad vertices, moving that non-trivial logic out of the core game loop and into its own location. The render command thus acts as an API between the game and the backend. Any new backend that we support in the future must be able to render these things.
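As a toy illustration of that conversion (with invented names and a fixed glyph advance in place of real font metrics), expanding a text-line command into per-glyph quad positions might look like:

```cpp
#include <cassert>
#include <string>
#include <vector>

struct Vec2 { float x, y; };

// Simplified text-line command, mirroring RenderCommandTextLine above, with
// s8 replaced by std::string so this sketch stays standalone.
struct TextLineCmd {
    Vec2 pos;             // bottom-left of the first glyph
    float letter_size;    // multiplier off of default
    float letter_spacing; // multiplier off of default
    std::string text;
};

// Emits one bottom-left corner per glyph quad. A real backend would emit full
// 4-vertex quads with per-glyph sizes and UVs looked up in the font's packed
// quads; the fixed advance here is a placeholder metric.
std::vector<Vec2> LayOutGlyphs(const TextLineCmd& cmd) {
    const float kDefaultAdvance = 8.0f;  // assumed default glyph advance
    std::vector<Vec2> corners;
    float x = cmd.pos.x;
    for (char c : cmd.text) {
        (void)c;  // a real backend would index this glyph's packed quad
        corners.push_back({x, cmd.pos.y});
        x += kDefaultAdvance * cmd.letter_size * cmd.letter_spacing;
    }
    return corners;
}
```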

In the future, when I might have partially transparent objects, the backend can do more sophisticated things like render all opaque geometry first and then sort the transparent objects by depth and render them in a second pass. Something like that is very hard to do when the core logic is rendering as it goes.

Including Triangles

So far, we have a shader that can render both font glyphs and sprites. That’s nice, but I want Into Depth to have a low-poly, flat-color aesthetic. I want to add UI panels that sit behind the text, and I want those to be tessellated:

A text box backed by a panel consisting of a solid quad surrounded by a low-poly border.

A panel like that is composed of relatively few triangles compared to a modern 3D character mesh (~40 vs. thousands). It should be cheap enough for us to write that geometry out directly every frame.

Our font + sprite shader can actually be used to draw basic triangles; we just need to point all three vertices of the triangle at the UV coordinates of a known all-white texel so that we get the color we want. This is achieved by including a \(1 \times 1\) white sprite.

I thus define some additional render commands, such as RenderCommandPanel, that act as high-level descriptions of what should be rendered, and then have the backend construct the mesh live in the vertex buffer and send it over. Doing this does not use the quad-specific EBO, so for now I’m just using a glDrawArrays call instead of a glDrawElements call.

I did make one adjustment to the shader – rather than switching out the textures depending on my use, I just have an array of textures that I can leave bound and have each vertex declare which one its UV coordinate applies to:

The shader now includes `int texture_index` for each vertex and the uniforms now include an array of textures.

I pack my fonts into a font atlas that is uploaded as the first texture, then have a series of sprite atlases.

Sprite Packing

I need to bake my font glyphs and my sprites into a series of textures that my shader can then use. There are plenty of tools that do sprite packing for you. I was already using stb_truetype and stb_rect_pack to create my font atlas, so I decided to write my own sprite packer on top of stb_rect_pack.

A collection of rectangular images on the left are closely packed into a texture on the right.

Conceptually, it is really simple. I give it a set of individual sprites, and it keeps trying to pack as many as possible into a new texture until everything is packed. The only little gotcha is that a \(w \times h\) sprite is packed as a \((w+2) \times (h+2)\) rectangle so that it can be padded with a 1-pixel extended border, which fights edge bleeding caused by inaccuracies in UV coordinates between mipmap levels.
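The edge-extend padding can be sketched as a clamped copy; PadSprite is a hypothetical helper operating on a flat pixel array:

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Minimal sketch of edge-extend padding: copy a w x h sprite into a
// (w+2) x (h+2) rectangle whose 1-pixel border repeats the nearest edge
// pixel, so slightly-off UVs still sample the sprite's own colors.
std::vector<int> PadSprite(const std::vector<int>& src, int w, int h) {
    std::vector<int> dst((w + 2) * (h + 2));
    for (int y = 0; y < h + 2; y++) {
        for (int x = 0; x < w + 2; x++) {
            // Clamp the destination coordinate back into the source sprite.
            const int sx = std::clamp(x - 1, 0, w - 1);
            const int sy = std::clamp(y - 1, 0, h - 1);
            dst[y * (w + 2) + x] = src[sy * w + sx];
        }
    }
    return dst;
}
```

The UV rectangle handed to the shader then covers only the inner \(w \times h\) region, not the border.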

Shader Code Organization

Beyond assets, shaders themselves tend to add more complexity than you’d think, because the code to use them ends up spread all over the codebase. The vertex and fragment shader definitions are often two separate .glsl files, which live in their own folder alongside the other shaders. The C++ side typically needs a vertex struct defined somewhere, and then we need both CPU-side vertex buffers and references to the GPU-side shader program IDs, VAO, VBO, and EBOs. Whenever we use a shader, we need to activate it, which involves more code that has to use these references and set various shader uniforms.

I’ve adopted a few enhancements to reduce the degree of code spread.

The first thing I did was unify the vertex and fragment shaders into a single .glsl file. Since we read the file in and upload its source to the GPU before calling glCompileShader, we can extract the necessary portion ourselves for each stage. Writing both shaders in the same file helps keep their interdependencies consistent.

#version 330 core

#ifdef VERTEX_SHADER

layout (location = 0) in vec3 aPosWorld;
layout (location = 1) in vec4 aVertexColor; // rgba
layout (location = 2) in vec2 aTexCoords;

out vec4 vertex_color;
out vec2 tex_coords;

uniform mat4 model;
uniform mat4 view;
uniform mat4 projection;

void main()
{
    gl_Position = projection * view * model * vec4(aPosWorld, 1.0);
    vertex_color = aVertexColor;
    tex_coords = aTexCoords;
}

#endif

//////////////////////////////////////////////////////////////////////////

#ifdef FRAGMENT_SHADER

in vec4 vertex_color;
in vec2 tex_coords;

out vec4 FragColor;

uniform sampler2D atlas;

void main()
{
    float atlas_value = texture(atlas, tex_coords).r;
    FragColor = vec4(vertex_color.rgb, vertex_color.a * atlas_value);
}

#endif
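One way to compile the file above per stage (a sketch, not necessarily the project's exact code) is to inject the stage's #define right after the #version line before handing the source to glCompileShader:

```cpp
#include <cassert>
#include <string>

// Hypothetical sketch of selecting one stage from a combined .glsl file:
// inject the stage's #define immediately after the #version line so the
// #ifdef blocks pick out the right half.
std::string SelectStage(const std::string& combined, const std::string& stage_define) {
    // Assumes the #version directive is the first line, as GLSL requires it
    // to appear before anything else.
    const size_t version_end = combined.find('\n') + 1;
    return combined.substr(0, version_end)
         + "#define " + stage_define + "\n"
         + combined.substr(version_end);
}
```

The vertex stage is then compiled from SelectStage(source, "VERTEX_SHADER") and the fragment stage from SelectStage(source, "FRAGMENT_SHADER").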

This was nice, but I actually ended up further simplifying by having a single .h file that defines the shader data relevant to the CPU side. The vertex and fragment shaders are still defined in one place, but as explicit strings, and we additionally define a vertex layout:

#pragma once

#include "util/strings.h"
#include "util/vertex_layout.h"

constexpr int kNumUnlitTriangleShaderTextures = 8;

// The vertex data structure (CPU-side)
struct UnlitTriangleVertex {
    glm::vec3 pos;
    glm::vec4 rgba;
    glm::vec2 uv;
    int texture_index;
};

// The shader definition
struct UnlitTriangleShaderDefinition {
    static constexpr const char* kName = "UnlitTriangle";

    static constexpr VertexElement kVertexElements[4] = {
        {ShaderDataType::Vec3F, "aPosWorld", false, offsetof(UnlitTriangleVertex, pos          )},
        {ShaderDataType::Vec4F, "aRGBA",     false, offsetof(UnlitTriangleVertex, rgba         )},
        {ShaderDataType::Vec2F, "aUV",       false, offsetof(UnlitTriangleVertex, uv           )},
        {ShaderDataType::I32,   "aTexIndex", false, offsetof(UnlitTriangleVertex, texture_index)},
    };

    static constexpr VertexLayout kVertexLayout = {
        kVertexElements,
        sizeof(kVertexElements) / sizeof(kVertexElements[0]),
        sizeof(UnlitTriangleVertex)
    };

    static constexpr const char* kVertexShaderSource = R"(
        #version 330 core

        layout (location = 0) in vec3 aPosWorld;
        layout (location = 1) in vec4 aRGBA;
        layout (location = 2) in vec2 aUV;
        layout (location = 3) in int  aTexIndex;

             out vec4 rgba;
             out vec2 uv;
        flat out int  texture_index; // 'flat' means do not interpolate

        uniform mat4 proj_view; // proj * view

        void main()
        {
            gl_Position = proj_view * vec4(aPosWorld, 1.0);
            rgba = aRGBA;
            uv = aUV;
            texture_index = aTexIndex;
        }
    )";

    static constexpr const char* kFragmentShaderSource = R"(
        #version 330 core

             in vec4 rgba;
             in vec2 uv;
        flat in int  texture_index;

        out vec4 FragColor;

        uniform sampler2D textures[8];

        void main()
        {
            vec4 texture_rgba;
            
            // GLSL 330 does not allow indexing a sampler array by a varying, hence the switch.
            switch(texture_index) {
                case 0:  texture_rgba = texture(textures[0], uv); break;
                case 1:  texture_rgba = texture(textures[1], uv); break;
                case 2:  texture_rgba = texture(textures[2], uv); break;
                case 3:  texture_rgba = texture(textures[3], uv); break;
                case 4:  texture_rgba = texture(textures[4], uv); break;
                case 5:  texture_rgba = texture(textures[5], uv); break;
                case 6:  texture_rgba = texture(textures[6], uv); break;
                case 7:  texture_rgba = texture(textures[7], uv); break;
                default: texture_rgba = vec4(1.0,0.0,1.0,1.0); break; // magenta
            }
            
            FragColor = rgba * texture_rgba;
        }
    )";
};

The vertex layout is just a programmatic representation of the inputs to the vertex shader. This means the VBO-creation code that lives elsewhere does not have to depend on this particular shader. We can go from code like this:

// vertex positions
u8 attribute_index = 0;
glEnableVertexAttribArray(attribute_index);
glVertexAttribPointer(attribute_index++, 3, GL_FLOAT, GL_FALSE, sizeof(Vertex),
                      (void*)offsetof(Vertex, position));
// vertex normals
glEnableVertexAttribArray(attribute_index);
glVertexAttribPointer(attribute_index++, 3, GL_FLOAT, GL_FALSE, sizeof(Vertex),
                      (void*)offsetof(Vertex, normal));
// uv coordinates
glEnableVertexAttribArray(attribute_index);
glVertexAttribPointer(attribute_index++, 2, GL_FLOAT, GL_FALSE, sizeof(Vertex),
                      (void*)offsetof(Vertex, uv));
// bone ids (max 4)
glEnableVertexAttribArray(attribute_index);
glVertexAttribIPointer(attribute_index++, 4, GL_INT, sizeof(Vertex),
                       (void*)offsetof(Vertex, bone_ids));
// bone weights (max 4)
glEnableVertexAttribArray(attribute_index);
glVertexAttribPointer(attribute_index++, 4, GL_FLOAT, GL_FALSE, sizeof(Vertex),
                      (void*)offsetof(Vertex, bone_weights));

to just running:

for (u32 i = 0; i < vertex_layout.n_elements; i++) {
    SetUpVertexAttrib(i, vertex_layout.elements[i], vertex_layout.stride);
}

where SetUpVertexAttrib looks at the element type and executes the appropriate logic.

void SetUpVertexAttrib(const u8 attribute_index, const VertexElement& element, const size_t stride) {
    glEnableVertexAttribArray(attribute_index);
    
    switch (element.type) {
    case (ShaderDataType::F32):
        glVertexAttribPointer(attribute_index, 1, GL_FLOAT, GL_FALSE, stride, (void*)element.offset);
        break;

    case (ShaderDataType::Vec2F):
        glVertexAttribPointer(attribute_index, 2, GL_FLOAT, GL_FALSE, stride, (void*)element.offset);
        break;

    case (ShaderDataType::Vec3F):
        glVertexAttribPointer(attribute_index, 3, GL_FLOAT, GL_FALSE, stride, (void*)element.offset);
        break;

    case (ShaderDataType::Vec4F):
        glVertexAttribPointer(attribute_index, 4, GL_FLOAT, GL_FALSE, stride, (void*)element.offset);
        break;

    case (ShaderDataType::I32):
        glVertexAttribIPointer(attribute_index, 1, GL_INT, stride, (void*)element.offset);
        break;
    }
}

My OpenGL backend has the general SetUpVertexAttrib, but it doesn’t have to change if we change the shader. The backend additionally has structs for the GPU object references, the shader uniforms, and their locations:

struct UnlitTriangleShaderRefs {
    GLuint shader_program_id;

    GLuint vao; // vertex array object
    GLuint vbo; // vertex buffer object
    size_t vbo_tri_capacity;  // Number of triangles we have space for in the VBO.

    GLuint ebo_quads; // prepopulated with quad indices
};

struct UnlitTriangleShaderUniforms {
    GLint uniformloc_proj_view; // proj * view from glGetUniformLocation
    GLint uniformloc_textures;

    GLuint texture_ids[kNumUnlitTriangleShaderTextures];
};

This organization scheme has, so far, been much nicer to work with.

Conclusion

When you are writing everything yourself, there is a lot to keep track of. Most of my previous attempts at coding a game “from scratch” struggled under mounting complexity. Unifying font, sprite, and basic unlit geometry rendering under one shader helps alleviate that complexity. Even more so, a good separation of concerns between the core game logic and the backend lets the game logic hold simple references to assets and focus on the game, while the backend figures out how to fill vertex buffers and ship them to the GPU.

Happy New Year!

3D 2-Bone Inverse Kinematics

I am having fun figuring out how to write a 2D sidescroller from scratch, with 3D graphics. I’ve previously covered how posing and animating skeletal meshes works. This post extends that work with inverse kinematics, which allows for the automatic placement of hands or feet at specific locations.

This wonderful low-poly knight comes from Davmid at SketchFab.

Some Recap

A character’s pose is defined by the transforms of their skeleton. A skeleton is made up of bones, where each bone is translated, rotated, and scaled with respect to a parent bone.

The \(i\)th bone’s transform is given by:

\[\begin{aligned}\boldsymbol{T}_i &= \text{trans}(\boldsymbol{\Delta}_i) \cdot \text{rot}(\boldsymbol{q}_i) \cdot \text{scale}(\boldsymbol{s}) \\ &= \begin{bmatrix}1 & 0 & 0 & \Delta_x \\ 0 & 1 & 0 & \Delta_y \\ 0 & 0 & 1 & \Delta_z \\ 0 & 0 & 0 & 1\end{bmatrix} \cdot \begin{bmatrix} 1 - 2(q_y^2+q_z^2) & 2(q_x q_y - q_w q_z) & 2(q_x q_z + q_w q_y) & 0 \\ 2(q_x q_y + q_w q_z) & 1 - 2(q_x^2 + q_z^2) & 2(q_y q_z - q_w q_x) & 0 \\ 2(q_x q_z - q_w q_y) & 2(q_y q_z + q_w q_x) & 1 - 2(q_x^2 + q_y^2) & 0 \\ 0 & 0 & 0 & 1\end{bmatrix} \cdot \begin{bmatrix}s_x & 0 & 0 & 0 \\ 0 & s_y & 0 & 0 \\ 0 & 0 & s_z & 0 \\ 0 & 0 & 0 & 1\end{bmatrix}\end{aligned}\]

where \(\boldsymbol{\Delta}_i\) is its translation vector, \(\boldsymbol{q}_i\) is its orientation quaternion, and \(\boldsymbol{s}\) is its scaling vector. The transforms are \(4\times 4\) matrices because we’re using homogeneous coordinates such that we can support translations. A 3D position is recovered by dividing each of the first three entries by the fourth entry.
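As a standalone sanity check of the rotation block (GLM provides the same conversion as glm::mat3_cast, so this is purely illustrative): rotating \((1,0,0)\) by \(90^\circ\) about \(+z\) should yield \((0,1,0)\).

```cpp
#include <cassert>
#include <cmath>

struct Quat { float w, x, y, z; };

// Builds the 3x3 rotation part of the transform above from a unit quaternion.
void RotationMatrix(const Quat& q, float R[3][3]) {
    R[0][0] = 1 - 2*(q.y*q.y + q.z*q.z); R[0][1] = 2*(q.x*q.y - q.w*q.z);     R[0][2] = 2*(q.x*q.z + q.w*q.y);
    R[1][0] = 2*(q.x*q.y + q.w*q.z);     R[1][1] = 1 - 2*(q.x*q.x + q.z*q.z); R[1][2] = 2*(q.y*q.z - q.w*q.x);
    R[2][0] = 2*(q.x*q.z - q.w*q.y);     R[2][1] = 2*(q.y*q.z + q.w*q.x);     R[2][2] = 1 - 2*(q.x*q.x + q.y*q.y);
}
```

The first column of the result is where the x-axis lands, which makes checks like this easy to do by hand.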

The resulting transform converts homogeneous locations in the bone’s local coordinate frame into the coordinate frame of its parent:

\[\boldsymbol{p}_{\text{parent}(i)} = \boldsymbol{T}_i \boldsymbol{p}_i\]

We can keep on applying these transforms all the way to the root bone to transform the bone into the root coordinate frame:

\[\boldsymbol{p}_{\text{root}} = \boldsymbol{T}_{\text{root}} \cdots \boldsymbol{T}_{\text{parent}(\text{parent}(i))} \boldsymbol{T}_{\text{parent}(i)} \boldsymbol{T}_i \boldsymbol{p}_i = \mathbb{T}_i \boldsymbol{p}_i\]

This aggregate transform \(\mathbb{T}_i\) ends up being very useful. If we have a mesh vertex \(\boldsymbol{v}\) defined in mesh space (with respect to the root bone), we can transform it into the \(i\)th bone’s frame via \(\mathbb{S}^{-1}_i \boldsymbol{v}\), where \(\mathbb{S}_i\) is the aggregate transform of the original skeleton (bind) pose, and then transform it back into mesh space via \(\mathbb{T}_i \mathbb{S}^{-1}_i \boldsymbol{v}\) using the current pose. That gives us the vertex’s new location according to the current pose.

The Problem

We can define keyframes and interpolate between them to get some nice animations. However, the keyframes are defined in mesh space, and have no way to react to the world around them. If we have a run animation, the player’s foot will always be placed in the same location, regardless of the ground height. If the animation has the player’s arms extend to grab a bar, it can be very easy for the player’s position to be off and the hands not quite be where the edge is.

Here one foot is hovering in the air and the other is sticking into the ground.

The animation system we have thus far produces hand and foot positions with forward kinematics — the positions are based on the successive application of transforms based on bones in the skeleton hierarchy. We want to do the opposite — find the transforms needed such that an end effector like a hand or a foot ends up where we want it to be.

There are several common ways to formulate this problem. I am going to avoid iterative heuristic approaches like cyclic coordinate descent and FABRIK; the method here will instead be closed-form.

Instead of supporting any number of bones, I will stick to 2-bone systems. For example, placing a wrist by moving an upper and lower arm, or placing a foot by moving an upper and lower leg.

Finally, as is typical, I will only allow rotations. We don’t want our bones to stretch or displace in order to reach their targets.

Our inverse kinematic problem thus reasons about four 3D vectors:

  • a fixed hip position \(\boldsymbol{a}\)
  • an initial knee position \(\boldsymbol{b}\)
  • an initial foot position \(\boldsymbol{c}\)
  • a target foot position \(\boldsymbol{t}\)

Our job is to adjust the rotation transform of the hip and knee bones such that the foot ends up as close as possible to the target position.

The image makes it easy to forget that this is all happening in 3D space.

Working it Out

We are going to break this down into two parts. First we will find the final knee and foot positions \(\boldsymbol{b}’\) and \(\boldsymbol{c}’\). Then we’ll adjust the bone orientations to achieve those positions.

Finding the New Positions

There are three cases:

  • we can reach the target location and \(\boldsymbol{c}’ = \boldsymbol{t}\)
  • the bone segments aren’t long enough to reach the target location
  • the bone segments are too long to reach the target location

Let’s denote the segment lengths as \(\ell_{ab}\) and \(\ell_{bc}\), and additionally compute the distance between \(\boldsymbol{a}\) and \(\boldsymbol{t}\):

The target is too close if \(\ell_{at} < |\ell_{ab} – \ell_{bc}|\). When that happens, we extend the longer segment toward the target and the other segment away.

The target is too far if \(\ell_{at} > \ell_{ab} + \ell_{bc}\). When that happens, we extend the segments directly toward the target, to maximum extension.

The remaining, more interesting case is when the target can be reached exactly, and we know \(\boldsymbol{c}’ = \boldsymbol{t}\). In that case we still have to figure out where the knee \(\boldsymbol{b}’\) ends up.

The final knee position will lie on a circle that is the intersection of two spheres: the sphere of radius \(\ell_{ab}\) centered at \(\boldsymbol{a}\) and the sphere of radius \(\ell_{bc}\) centered at \(\boldsymbol{t}\):

We can calculate \(\theta\) from the law of cosines:

\[\cos \theta = \frac{\ell_{bc}^2 – \ell_{ab}^2 – \ell_{at}^2}{-2 \ell_{ab} \ell_{at}} \]
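As a quick sanity check, consider a 3-4-5 right triangle with \(\ell_{ab} = 3\), \(\ell_{bc} = 4\), and \(\ell_{at} = 5\):

\[\cos \theta = \frac{4^2 - 3^2 - 5^2}{-2 \cdot 3 \cdot 5} = \frac{-18}{-30} = 0.6\]

which matches the expected \(\cos \theta = 3/5\) for the angle between the shorter leg and the hypotenuse.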

The radius of the circle that \(\boldsymbol{b}’\) lies on is:

\[r = \ell_{ab} \sin \theta = \ell_{ab} \sqrt{1 – \cos^2 \theta}\]

The center of the circle is then

\[\boldsymbol{m} = \boldsymbol{a} \> – \> \ell_{ab} \cos \theta \hat{\boldsymbol{n}}\]

where \(\hat{\boldsymbol{n}}\) is the unit vector from \(\boldsymbol{t}\) to \(\boldsymbol{a}\).

We can place the knee anywhere we want on this circle using some unit vector \(\hat{\boldsymbol{u}}\) perpendicular to \(\hat{\boldsymbol{n}}\):

\[\boldsymbol{b}’ = \boldsymbol{m} + r \hat{\boldsymbol{u}}\]

For the inverse kinematics to look nice, I want to move the knee as little as possible. As such, let’s try to keep \(\hat{\boldsymbol{u}}\) close to the current knee position:

\[\begin{aligned}\boldsymbol{v} &= \boldsymbol{b} \> – \boldsymbol{m} \\ \boldsymbol{u} &= \boldsymbol{v} \> – (\boldsymbol{v} \cdot \hat{\boldsymbol{n}}) \hat{\boldsymbol{n}} \\ \hat{\boldsymbol{u}} &= \text{normalize}(\boldsymbol{u}) \end{aligned}\]

We can implement this in code as follows:

f32 len_ab = glm::length(b - a);
f32 len_bc = glm::length(c - b);
f32 len_at = glm::length(t - a);

// Unit vector from t to a.
// Guard against division by zero when the hip is already at the target.
glm::vec3 n(1.0f, 0.0f, 0.0f);
if (len_at > 1e-3f) {
    n = (a - t) / len_at;
}

bool reached_target = false;
glm::vec3 b_final, c_final;  // The final positions of the knee and foot.
if (len_at < std::abs(len_ab - len_bc)) {
    // The target is too close.
    // Extend the longer appendage toward it and the other away.
    int sign = (len_ab > len_bc) ? 1 : -1;
    b_final = a - sign * len_ab * n;
    c_final = b_final + sign * len_bc * n;

} else if (len_at <= len_ab + len_bc) {
    // The target is reachable
    reached_target = true;
    c_final = t;

    // The final knee position b_final will lie on a circle that is the intersection of two
    // spheres: the sphere of radius len_ab centered at a and the sphere of radius len_bc
    // centered at t.

    // Cosine of the angle t - a - b_final
    f32 cos_theta =
        (len_bc * len_bc - len_ab * len_ab - len_at * len_at) / (-2 * len_ab * len_at);
    cos_theta = glm::clamp(cos_theta, -1.0f, 1.0f);  // guard against floating-point drift
    f32 sin_theta = std::sqrt(1.0f - cos_theta * cos_theta);

    // Radius of the circle that b_final must lie on.
    f32 r = len_ab * sin_theta;

    // The center of the circle that b_final must lie on.
    glm::vec3 m = a - len_ab * cos_theta * n;

    // Unit vector perpendicular to n and pointing at b from m.
    // If b and m are coincident, we default to any old vector perpendicular to n.
    glm::vec3 u = GetPerpendicularUnitVector(n, b - m);

    // Compute the new knee position.
    b_final = m + r * u;

} else {
    // The target is too far.
    // Reach out toward it to max extension.
    b_final = a - len_ab * n;
    c_final = b_final - len_bc * n;
}
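The helper GetPerpendicularUnitVector isn't shown above. A minimal sketch of what it might look like, using a bare Vec3 struct here instead of glm (the fallback axis choice in the degenerate case is my own assumption, not necessarily the project's):

```cpp
#include <cassert>
#include <cmath>

struct Vec3 { float x, y, z; };

Vec3 operator-(Vec3 a, Vec3 b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
Vec3 operator*(float s, Vec3 v) { return {s * v.x, s * v.y, s * v.z}; }
float Dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }
float Length(Vec3 v) { return std::sqrt(Dot(v, v)); }

// Return the unit vector perpendicular to unit vector n that lies in the
// plane spanned by n and v (i.e. v with its n-component projected out).
// If v is (nearly) parallel to n, fall back to any vector perpendicular to n.
Vec3 GetPerpendicularUnitVector(Vec3 n, Vec3 v, float eps = 1e-6f) {
    Vec3 u = v - Dot(v, n) * n;  // remove the component along n
    float len = Length(u);
    if (len < eps) {
        // Degenerate case: project the world axis least aligned with n.
        Vec3 axis = (std::fabs(n.x) < 0.9f) ? Vec3{1, 0, 0} : Vec3{0, 1, 0};
        u = axis - Dot(axis, n) * n;
        len = Length(u);
    }
    return (1.0f / len) * u;
}
```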

Adjusting the Orientations

Now that we have the updated knee and foot positions, we need to adjust the bone transforms to rotate the segments to match them.

The bone transforms are all defined relative to their parents. Let our end effector (the hand or foot) be associated with bone \(i\), its parent (the elbow or knee) be associated with bone \(j\), and its parent (the hip or shoulder) be associated with bone \(k\). We should already have the aggregate transforms for these bones computed for the existing pose.

First we’ll rotate bone \(k\) to point toward \(\boldsymbol{b}’\) instead of \(\boldsymbol{b}\). We need to calculate both positions in the bone’s local frame:

\[\begin{aligned}\boldsymbol{b}_\text{local} &= \mathbb{T}_k^{-1} \> \boldsymbol{b} \\ \boldsymbol{b}’_\text{local} &= \mathbb{T}_k^{-1} \> \boldsymbol{b}’ \end{aligned}\]

We then normalize these vectors and find the rotation between them. The minimum rotation between two unit vectors \(\hat{\boldsymbol{u}}\) and \(\hat{\boldsymbol{v}}\) has the (unnormalized) axis of rotation:

\[\boldsymbol{a} = \hat{\boldsymbol{u}} \times \hat{\boldsymbol{v}}\]

and a rotation angle:

\[\theta = \cos^{-1}\left(\hat{\boldsymbol{u}} \cdot \hat{\boldsymbol{v}}\right)\]

// Find the minimum rotation between the two given unit vectors.
// If the vectors are (anti)parallel, the axis is ill-defined and we
// return the identity quaternion instead.
glm::quat GetRotationBetween(const glm::vec3& u, const glm::vec3& v, const f32 eps = 1e-3f) {
    glm::vec3 axis = glm::cross(u, v);
    f32 len_axis = glm::length(axis);
    if (len_axis < eps) {
        return glm::quat(1.0f, 0.0f, 0.0f, 0.0f);  // identity; glm order is (w, x, y, z)
    }

    axis /= len_axis;
    f32 angle = std::acos(glm::clamp(glm::dot(u, v), -1.0f, 1.0f));
    return glm::angleAxis(angle, axis);
}

We then adjust the orientation of bone \(k\) using this axis and angle to get an updated transform, \(\boldsymbol{T}_k’\). We then propagate that effect down to the child transforms.

We do the same thing with bone \(j\), using the updated transform. The code for this is:

glm::mat4 iTTk = glm::inverse(TTk);
glm::vec3 local_currt = glm::normalize(To3d(iTTk * glm::vec4(b, 1.0f)));
glm::vec3 local_final = glm::normalize(To3d(iTTk * glm::vec4(b_final, 1.0f)));
frame->orientations[grandparent_bone] =
    frame->orientations[grandparent_bone] * GetRotationBetween(local_currt, local_final);

(*bone_transforms)[grandparent_bone] =
    (*bone_transforms)[grandgrandparent_bone] * CalcTransform(*frame, grandparent_bone);
(*bone_transforms)[parent_bone] =
    (*bone_transforms)[grandparent_bone] * CalcTransform(*frame, parent_bone);

// Do the same with the next bone.
glm::mat4 TTj_new = (*bone_transforms)[parent_bone];
local_currt = glm::normalize(To3d(glm::inverse(TTj) * glm::vec4(c, 1.0f)));
local_final = glm::normalize(To3d(glm::inverse(TTj_new) * glm::vec4(c_final, 1.0f)));
frame->orientations[parent_bone] =
    frame->orientations[parent_bone] * GetRotationBetween(local_currt, local_final);

Conclusion

This approach to inverse kinematics is efficient and effective. I’ve been using it to refine various action states, such as placing hands and feet onto ladder rungs:

It works pretty well when the hand or foot doesn’t have to be adjusted all that much. For larger deltas it can end up picking some weird rotations because it does not have any notion of maximum joint angles. I could add that, of course, at the expense of complexity.

It's really nice to be able to figure out something like this from basic mathematical foundations. There is something quite satisfying about being able to lay out the problem formulation and then systematically work through it.

Cheers!

Skeletal Animation

In the previous post, I updated the sidescroller project with skeletal meshes — meshes that can be posed. In this post we animate those meshes.

As a fun fact, we’re about 4 months into this project and I’ve written about 5600 lines of code. It keeps piling up.

Animations

An animation defines where the \(m\) mesh vertices are as a function of time:

\[\vec{p}_i(t) \text{ for } i \text{ in } 1:m\]

That’s rather general, and would require specifying a whole lot of functions for a whole lot of mesh vertices. We’ll instead leverage the mesh skeletons from the last blog post, and define an animation using a series of poses, also called keyframes:

5 keyframes from the walk animation

A single keyframe defines all of the bone transforms that pose the model in a certain way. A series of them acts like a series of still frames in a movie – they act as snapshots of the mesh’s motion over time:

When we’re in-between frames, we have to somehow interpolate between the surrounding frames. The simplest and most common method is linear interpolation, where we linearly blend between the frames based on the fraction of time we are between them:

Here \(\alpha\) ranges from 0 at the previous frame and 1 at the next frame:

\[\alpha = \frac{t \> – \> t^{(k)}}{t^{(k+1)} – t^{(k)}}\]

Each frame is a collection of bone transforms. We could interpolate between the transform matrices directly, but it is more common to decompose those into position \(\vec{p}\), orientation \(\vec{q}\), and scale \(\vec{s}\), and interpolate between those instead. If we’re a fraction \(\alpha\) between frames \(f^{(k)}\) and \(f^{(k+1)}\), then the transform components for the \(i\)th bone are:

\[\begin{align} \vec{p}_i(\alpha) & = (1- \alpha) \vec{p}^{(k)}_i + \alpha \vec{p}^{(k+1)}_i \\ \vec{q}_i(\alpha) & = \texttt{slerp}(\vec{q}^{(k)}_i, \vec{q}^{(k+1)}_i, \alpha) \\ \vec{s}_i(\alpha) & = (1- \alpha) \vec{s}^{(k)}_i + \alpha \vec{s}^{(k+1)}_i \end{align}\]

You’ll notice that the position and scaling vectors use linear interpolation, but we’re using spherical linear interpolation for the rotation. This is why we decompose it – basic linear interpolation doesn’t work right for rotations.
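In practice glm provides slerp, so this is just for intuition: a minimal sketch of spherical linear interpolation for unit quaternions stored as (w, x, y, z), including the usual short-way-around fix and a linear fallback for nearly identical rotations:

```cpp
#include <cassert>
#include <cmath>

struct Quat { float w, x, y, z; };

// Spherical linear interpolation between unit quaternions. Negating one
// endpoint when the dot product is negative takes the shorter arc, since
// q and -q represent the same rotation.
Quat Slerp(Quat a, Quat b, float alpha) {
    float d = a.w*b.w + a.x*b.x + a.y*b.y + a.z*b.z;
    if (d < 0.0f) { b = {-b.w, -b.x, -b.y, -b.z}; d = -d; }
    float wa, wb;
    if (d > 0.9995f) {
        // Nearly identical rotations: fall back to linear interpolation.
        wa = 1.0f - alpha; wb = alpha;
    } else {
        float theta = std::acos(d);  // angle between the quaternions
        wa = std::sin((1.0f - alpha) * theta) / std::sin(theta);
        wb = std::sin(alpha * theta) / std::sin(theta);
    }
    Quat q{wa*a.w + wb*b.w, wa*a.x + wb*b.x, wa*a.y + wb*b.y, wa*a.z + wb*b.z};
    float len = std::sqrt(q.w*q.w + q.x*q.x + q.y*q.y + q.z*q.z);
    return {q.w/len, q.x/len, q.y/len, q.z/len};  // renormalize for safety
}
```

Halfway between the identity and a 90° rotation about \(z\) should give a 45° rotation about \(z\), which is exactly what linear interpolation of the raw components would *not* give without the renormalization.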

Conceptually, animation is super simple. A keyframe just stores some components and a duration:

struct Keyframe {
    f32 duration;
    std::vector<glm::vec3> positions;  // for each bone
    std::vector<glm::quat> orientations;
    std::vector<glm::vec3> scales;
};

and an Animation is just a named list of keyframes:

struct Animation {
    std::string name;
    std::vector<Keyframe> frames;
};

If you want a simple animation, you slap down some spaced-out frames and let interpolation handle what happens in-between. If you want a nice, detailed animation, you can fill out your animation with a higher density of frames, or define special interpolation methods or easing functions to give you a better effect.

Posing a mesh is as simple as interpolating the keyframes and then applying the resulting transforms to the mesh. To do that, it's useful to keep track of how long our animation has been playing, and which frames we're between. To this end we define an AnimationIndex:

struct AnimationIndex {
    f32 t_total;  // elapsed time since the start of the animation
    f32 t_frame;  // elapsed time since the start of the current frame
    int i_frame;  // index of the current frame
};

Advancing the index requires updating our times and the frame index:

AnimationIndex Advance(
    const AnimationIndex& idx,
    f32 dt,
    const Animation& anim)
{
    AnimationIndex retval;
    retval.t_total = idx.t_total + dt;
    retval.t_frame = idx.t_frame + dt;
    retval.i_frame = idx.i_frame;

    const int n_frames = (int)anim.frames.size();
    while (retval.i_frame < n_frames && 
           retval.t_frame > anim.frames[retval.i_frame].duration) {
        retval.t_frame -= anim.frames[retval.i_frame].duration;
        retval.i_frame += 1;
    }

    return retval;
}
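To see the frame-index bookkeeping in action, here's a self-contained copy of that logic (with the structs trimmed down to just the fields Advance touches) stepping across a frame boundary:

```cpp
#include <cassert>
#include <cmath>
#include <vector>

struct Keyframe { float duration; };
struct Animation { std::vector<Keyframe> frames; };
struct AnimationIndex { float t_total; float t_frame; int i_frame; };

// Same logic as Advance above: accumulate time, then walk the frame
// index forward past any frames whose duration has fully elapsed.
AnimationIndex Advance(const AnimationIndex& idx, float dt, const Animation& anim) {
    AnimationIndex retval{idx.t_total + dt, idx.t_frame + dt, idx.i_frame};
    const int n_frames = (int)anim.frames.size();
    while (retval.i_frame < n_frames &&
           retval.t_frame > anim.frames[retval.i_frame].duration) {
        retval.t_frame -= anim.frames[retval.i_frame].duration;
        retval.i_frame += 1;
    }
    return retval;
}
```

Advancing 1.5 s into an animation with two 1 s frames lands 0.5 s into frame 1; advancing past the end leaves the index one past the last frame, which the Interpolate code below treats as "animation is done".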

The interpolation code almost isn't worth showing, since it's the same as the math:

void Interpolate(
    Keyframe* dst,
    const AnimationIndex& idx,
    const Animation& anim)
{
    int i_lo = idx.i_frame;
    int i_hi = idx.i_frame + 1;
    f32 alpha = 0.0;
    int n_frames = (int)anim.frames.size();
    if (idx.i_frame + 1 >= n_frames) {
        // Animation is done
        i_lo = i_hi = n_frames - 1;
        alpha = 0.0;
    } else {
        alpha = idx.t_frame / anim.frames[idx.i_frame].duration;
    }

    Interpolate(dst, anim.frames[i_lo], anim.frames[i_hi], alpha);
}

We can use our interpolated keyframe to pose our mesh:

void PoseMeshToAnimationFrame(
    std::vector<glm::mat4>* bone_transforms_curr,
    std::vector<glm::mat4>* bone_transforms_final,
    const Mesh& mesh,
    const Keyframe& frame)
{
    size_t n_bones = bone_transforms_final->size();
    for (size_t i_bone = 0; i_bone < n_bones; i_bone++) {
        glm::vec3 pos = frame.positions[i_bone];
        glm::quat ori = frame.orientations[i_bone];
        glm::vec3 scale = frame.scales[i_bone];

        // Compute the new current transform
        glm::mat4 Sbone = glm::translate(glm::mat4(1.0f), pos);
        Sbone = Sbone * glm::mat4_cast(ori);
        Sbone = glm::scale(Sbone, scale);

        // Incorporate the parent transform
        if (mesh.bone_parents[i_bone] >= 0) {
            const glm::mat4& Sparent = 
                (*bone_transforms_curr)[mesh.bone_parents[i_bone]];
            Sbone = Sparent * Sbone;
        }

        // Store the result
        const glm::mat4& Tinv = mesh.bone_transforms_orig[i_bone];
        (*bone_transforms_curr)[i_bone] = Sbone;
        (*bone_transforms_final)[i_bone] = Sbone * Tinv;
    }
}

Adding Animations to the Game

Now that we have animations, we want to integrate them into the sidescroller game. We can do that!

These aren’t beautiful masterpieces or anything, since I clumsily made them myself, but hey – that’s scrappy do-it-to-learn-it gamedev.

We are now able to call StartAnimation on a RigComponent, specifying the name of the animation we want. For example, if the player loses contact with the ground, we can start either the falling or jumping animations with a code snippet like this:

if (!player_movable->support.is_supported) {
    // Fall or Jump!
    StartAnimation(player_rig, jump_pressed ? "jump" : "fall");
    game->player_state_enum = PlayerActionState::AIRBORNE;
}

This looks up the requested animation and sets our animation index to its initial value.

There is one additional thing we need to get right. Instantaneously transitioning to a new animation can look jarring. Watch what happens here where I start punching in the air:

The knight’s legs suddenly snap to their idle positions!

What we’d like instead is to blend seamlessly from the pose the model is at when StartAnimation is called to the poses specified by the new animation. We can already blend between any two poses, so we just leverage that knowledge to blend between the initial pose and the animation pose.

We specify a fadeout time \(t_\text{fadeout}\) whenever we start a new animation. For the first \(t_\text{fadeout}\) seconds of the new animation, we linearly interpolate between the initial pose and the animation's current pose based on how much time has elapsed. If we happen to start in a pose close to what the animation plays from, this effect won't be all that noticeable. If we happen to start in a wildly different pose, we'll move the mesh over more naturally.
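The blend weight is just the elapsed animation time clamped against the fadeout window. A sketch (the function name and the zero-fadeout convention are my own; the captured start pose and the pose interpolation come from the surrounding system):

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Blend factor from the captured start pose toward the animation's pose.
// At t = 0 we show the captured pose exactly; after t_fadeout seconds we
// follow the animation exclusively.
float FadeInWeight(float t_elapsed, float t_fadeout) {
    if (t_fadeout <= 0.0f) return 1.0f;  // no blending requested
    return std::min(t_elapsed / t_fadeout, 1.0f);
}
```

A caller might then do something like `Interpolate(&posed, initial_pose, animation_pose, FadeInWeight(idx.t_total, t_fadeout))`, reusing the same keyframe interpolation that already blends between frames.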

Editing Animations

Animations contain a lot of data. There are multiple frames and a bunch of things bones can do per frame. I didn't want to specify those transforms in code, nor manually in some sort of text file.

I made a basic animation editor using ImGUI and used that to craft my animations:

It's fairly rudimentary, but it gets the job done.

One could argue that I should have stuck with Blender’s animation tools. There are a few advantages to rolling my own editor:

  • Blender doesn’t run well on my Linux laptop and keeps crashing
  • My model format has somewhat departed from Blender’s
  • I want to be able to edit hitspheres and hurtspheres down the road
  • My objective is to learn, not turn a profit or release a real game

The right trade-off will of course depend on your unique situation.

Conclusion

Going from pose-able meshes to animations was mainly about figuring out how to tie a bunch of poses together over time. Now that we have that, and a way to cleanly transition between animations, we’re pretty much set.

Animations in big games can be a lot more complicated. We might have overlapping animations for various parts of a mesh, such as facial animations that play simultaneously with body-level animations. We could implement inverse kinematics to help with automatically placing our feet at natural stepping locations (or when climbing to grasp at natural handholds). There are a bunch of fancier techniques that give you tighter control over the mesh too, but at the core of it, what we did here covers the fundamentals.

Next I plan to work on a basic combat system and get some other entities into the game. In order for this to be a proper level, we kind of need that. And maybe we’ll do some ladders.

Posing Meshes with OpenGL

This month I’m continuing the work on the side scroller and moving from static meshes to meshes with animations. This is a fairly sizeable jump as it involves a whole host of new concepts, including bones and armatures, vertex painting, and interpolation between keyframes. For reference, the Skeletal Animation post on learnopengl.com prints out to about 16 pages. That’s a lot to learn. Writing my own post about it helps me make sure I understand the fundamentals well enough to explain them to someone else.

Moving Meshes

Our goal is to get our 3D meshes to move. This primarily means adding the ability to make our player mesh move in prescribed ways, such as jumping, crouching, attacking, and climbing ladders.

So far we’ve been working with meshes, i.e. vertex and face data that forms our player character. We want those vertices to move around. Our animations will specify how those vertices move around. That is, where the \(m\) mesh vertices are as a function of time:

\[\vec{p}_i(t) \text{ for } i \text{ in } 1:m\]

Unfortunately, a given mesh has a lot of vertices. The low-poly knight model I’m using has more than 1.5k of them, and it’s relatively tiny. Specifying where each individual vertex goes over time would be an excessive amount of work.

Not only would specifying an animation on a per-vertex level be a lot of work, it would be very slow. One of the primary advantages of a graphics card is that we can ship the data over at setup time and not have to send it all over again:

With the rendering we’re doing now, we just update a few small uniforms every frame to change the player position. The mesh data itself is already on the GPU, stored as a vertex buffer object.

So fully specifying the vertex positions over time is both extremely tedious and computationally expensive. What do we do instead?

When a character moves, many of their vertices typically move together. For example, if my character punches, we expect all of the vertices in their clenched fist to move together, with their forearm following. Their upper arm follows that, maybe with a bend, etc. Vertices thus tend to be grouped, and we can leverage that grouping for increased efficiency.

Bones

You know how artists sometimes have these wooden figures that they can bend and twist into poses? That’s the industry-standard way to do 3D animation.

We create an armature, which is a sort of rigid skeleton that we can use to pose the mesh. The armature is made up of rigid entities, bones, which can move around to produce the movement expected in our animation. There are far fewer bones than vertices. We then compute our vertex positions based on the bones’ positions – a vertex in the character’s fist is going to move with the fist bone.

This approach solves our problems. The bones are much easier to pose and thus build animations out of, and we can continue to send the mesh and necessary bone information to the GPU once at startup, and just send the updated bone poses whenever we render the mesh.

The bones in an armature have a hierarchical structure. Every bone has a single parent and any number of children, except the root bone, which has no parent.

Unlike my oh-so-fancy drawing, bones don’t actually take up space. They are just little reference frames. Each bone’s reference frame is given by a transformation \(T\) with respect to its parent. More specifically, \(T\) transforms a point in the bone’s frame to that of its parent.

For example, we can use this transform to get the “location” of a bone. We can get the position of the root bone by transforming the origin of its frame to its parent – the model frame. This is given by \(T_\text{root} \vec{o}\), where \(\vec{o}\) is the origin in homogeneous coordinates: \([0,0,0,1]\). Our transforms use 4×4 matrices so that we can get translation, which is ubiquitous in 3D graphics.

Similarly, the position of one of the root bone’s children with transform \(T_\text{c}\) can be obtained by first computing its position in the root bone’s frame, \(T_\text{c} \vec{o}\), and then converting that to the model frame, \(T_\text{root} T_\text{c} \vec{o}\).

The order of operation matters! It is super important. It is what gives us the property that moving a leaf bone only affects that one bone, but moving a bone higher up in the hierarchy affects all child bones. If you ever can’t remember which order it is, and you can’t just quickly test it out in code, try composing two transforms together.
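That "try it in code" suggestion can be made concrete with one translation and one 90° rotation; 2D homogeneous transforms are enough to see it (plain arrays here, no library assumed):

```cpp
#include <array>
#include <cassert>

struct Mat3 { float m[3][3]; };  // 2D homogeneous transform

Mat3 Mul(const Mat3& A, const Mat3& B) {
    Mat3 C{};
    for (int r = 0; r < 3; r++)
        for (int c = 0; c < 3; c++)
            for (int k = 0; k < 3; k++)
                C.m[r][c] += A.m[r][k] * B.m[k][c];
    return C;
}

// Translation by (2, 0), and rotation by 90 degrees about the origin.
const Mat3 T{{{1, 0, 2}, {0, 1, 0}, {0, 0, 1}}};
const Mat3 R{{{0, -1, 0}, {1, 0, 0}, {0, 0, 1}}};

// T * R keeps the child origin at (2, 0): the rotation happens in the
// child's own frame and only then gets translated. R * T instead rotates
// the already-translated origin out to (0, 2). Order matters.
```

The translation column of each product is where the composed transform sends the origin, which is exactly the "bone position" computation from the text.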

The origin of a bone 4 levels deep is:

\[T_\text{root} T_\text{c1} T_\text{c2} T_\text{c3} T_\text{c4} \vec{o}\]

Let’s call the aggregated transform for bone \(c4\), this product of its ancestors, \(\mathbb{T}_{c4}\).

It is these transformations that we’ll be changing in order to produce animations. Before we get to that, we have to figure out how the vertices move with the bones.

Transforming a Vertex

The vertices are all defined in our mesh. They have positions in model-space.

Each bone has a position (and orientation) in model space, as we’ve already seen.

Let’s consider what happens when we associate a vertex with a bone. That is, we “connect” the vertex to the bone such that, if the bone moves, the vertex moves with it. We expect the vertex to move along in the bone’s local frame:

Here the bone both translates to the right and up, and rotates counter-clockwise about 30 degrees. In the image, the vertex does the same.

The image above has the bone starting off at the origin, with its axis aligned with the coordinate axes. This means the vertex is starting off in the bone’s reference frame. To get the vertex into the bone frame (it is originally defined in the model frame), we have to multiply it by \(\mathbb{T}^{-1}\). Intuitively, if \(\mathbb{T} \vec{p}\) takes a point from bone space to model space then the inverse takes a point from model space to bone space.

The bone moved. That means it has a new transform relative to its parent, \(S\). Plus, that parent bone may have moved, and so forth. We have to compute a new aggregate bone transform \(\mathbb{S}\). The updated position of the vertex in model space is thus obtained by transforming it from its original model space into bone space with \(\mathbb{T}^{-1}\) and then transforming it back to model space according to the new armature configuration with \(\mathbb{S}\):

\[\vec v’ = \mathbb{S} \mathbb{T}^{-1} \vec v\]

We can visualize this as moving a vertex in the original mesh pose to the bone-relative space, and then transforming it back to model-space based on the new armature pose:

This means we should store \(\mathbb{T}^{-1}\) for each bone in the armature – we’re going to need it a lot. In any given frame we’ll just have to compute \(\mathbb{S}\) by walking down the armature. We then compute \(\mathbb{S} \mathbb{T}^{-1}\) and pass that to our shader to properly pose the model.

Bone Weights

The previous section let us move a vertex associated with a single bone. That works okay for very blocky models composed of rigid segments. For example, a stocky robot or simple car with spinning wheels. Most models are less rigid. Yes, if you punch someone you want the vertices in your fist to follow the hand bone, but as you extend your elbow the vertices near the joint will follow both the bone from the upper arm and the bone from the lower arm. We want a way to allow this to happen.

Instead of a vertex being associated with one bone, we allow a vertex to be associated with multiple bones, making the final vertex position a weighted mix of the per-bone transforms:

\[\vec v’ = \sum_{i=1}^m w^{(i)} \mathbb{S}_i \mathbb{T}_i^{-1} \vec v\]

where the nonnegative weights \(\vec w\) sum to one.

In practice, most vertices are associated with one or two bones. It is common to allow up to 4 bone associations, simply because that covers most needs and then we can use the 4-dimensional vector types supported by OpenGL.
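One detail that capping at 4 implies: if a vertex originally had more influences (or the weights were edited), the kept weights need to be renormalized so they still sum to one. A sketch of a hypothetical import-side helper (the name and the all-zero fallback are my own):

```cpp
#include <array>
#include <cassert>

// Renormalize up to 4 bone weights so they sum to one.
// If all weights are zero, assign the vertex entirely to the first slot
// so the blended transform is never the zero matrix.
std::array<float, 4> NormalizeBoneWeights(std::array<float, 4> w) {
    float total = w[0] + w[1] + w[2] + w[3];
    if (total <= 0.0f) return {1.0f, 0.0f, 0.0f, 0.0f};
    for (float& wi : w) wi /= total;
    return w;
}
```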

Data-wise, what this means is:

  • In addition to the original mesh data, we also need to provide an armature, which is a hierarchy of bones and their relative transformations.
  • We also need to associate each vertex with some set of bones. This is typically done via a vec4 of weights and an ivec4 of bone indices. We store these alongside the other vertex data.
  • The vertex data is typically static and can be stored on the graphics card in the vertex array buffer.
  • We compute the inverse bone transforms for the original armature pose on startup.
  • When we want to render a model, we pose the bones, compute the current bone transforms \(\mathbb{S}\), and then compute and send \(\mathbb{S} \mathbb{T}^{-1}\) to the shader as a uniform.

Coding it Up

Alright, enough talk. How do we get this implemented?

We start by updating our definition for a vertex. In addition to the position, normal, and texture coordinates, we now also store the bone weights:

struct RiggedVertex {
    glm::vec3 position;   // location
    glm::vec3 normal;     // normal vector
    glm::vec2 tex_coord;  // texture coordinate

    // Up to 4 bones can contribute to a particular vertex
    // Unused bones must have zero weight
    glm::ivec4 bone_ids;
    glm::vec4 bone_weights;
};

This means we have to update how we set up the vertex buffer object:

glGenVertexArrays(1, &(mesh->vao));
glGenBuffers(1, &(mesh->vbo));

glBindVertexArray(mesh->vao);
glBindBuffer(GL_ARRAY_BUFFER, mesh->vbo);
glBufferData(GL_ARRAY_BUFFER, mesh->vertices.size() * sizeof(RiggedVertex),
                              &mesh->vertices[0], GL_STATIC_DRAW);

// vertex positions
glEnableVertexAttribArray(0);
glVertexAttribPointer(0, 3, GL_FLOAT, GL_FALSE, sizeof(RiggedVertex),
                      (void*)offsetof(RiggedVertex, position));
// vertex normals
glEnableVertexAttribArray(1);
glVertexAttribPointer(1, 3, GL_FLOAT, GL_FALSE, sizeof(RiggedVertex),
                      (void*)offsetof(RiggedVertex, normal));
// texture coordinates
glEnableVertexAttribArray(2);
glVertexAttribPointer(2, 2, GL_FLOAT, GL_FALSE, sizeof(RiggedVertex),
                      (void*)offsetof(RiggedVertex, tex_coord));
// bone ids (max 4)
glEnableVertexAttribArray(3);
glVertexAttribIPointer(3, 4, GL_INT, sizeof(RiggedVertex),
                       (void*)offsetof(RiggedVertex, bone_ids));
// bone weights (max 4)
glEnableVertexAttribArray(4);
glVertexAttribPointer(4, 4, GL_FLOAT, GL_FALSE, sizeof(RiggedVertex),
                       (void*)offsetof(RiggedVertex, bone_weights));

Note the use of glVertexAttribIPointer instead of glVertexAttribPointer for the bone ids. That problem took me many hours to figure out. It turns out that glVertexAttribPointer accepts integers but has the card interpret them as floating point, which messes everything up if you intend to actually use integers on the shader side.

As far as our shader goes, we are only changing where the vertices are located, not how they are colored. As such, we only need to update our vertex shader (not the fragment shader). The new shader is:

#version 330 core

layout (location = 0) in vec3 aPos;
layout (location = 1) in vec3 aNormal;
layout (location = 2) in vec2 aTexCoord;
layout (location = 3) in ivec4 aBoneIds; 
layout (location = 4) in vec4 aBoneWeights;

out vec3 pos_world;
out vec3 normal_world;
out vec2 tex_coords;

const int MAX_BONES = 100;
const int MAX_BONE_INFLUENCE = 4;

uniform mat4 model; // transform from model to world space
uniform mat4 view; // transform from world to view space
uniform mat4 projection; // transform from view space to clip space
uniform mat4 bone_transforms[MAX_BONES]; // S*Tinv for each bone

void main()
{
    // Accumulate the position over the bones, in model space
    vec4 pos_model = vec4(0.0f);
    vec3 normal_model = vec3(0.0f);
    for(int i = 0 ; i < MAX_BONE_INFLUENCE; i++)
    {
        int j = aBoneIds[i];
        float w = aBoneWeights[i];
        mat4 STinv = bone_transforms[j];
        pos_model += w*(STinv*vec4(aPos,1.0f));
        normal_model += w*(mat3(STinv)*aNormal);
    }

    pos_world = vec3(model * pos_model);
    normal_world = mat3(model) * normal_model;
    gl_Position = projection * view * model * pos_model;
    tex_coords = vec2(aTexCoord.x, 1.0 - aTexCoord.y);
}

Note that the normal vector doesn’t get translated, so we only use mat3. The vertex’s texture coordinate is not affected.

Testing it Out

Let’s start with something simple, a 2-triangle, 2-bone system with 5 vertices:

The root bone is at the origin (via the identity transform) and the child bone is at [2,1,0]. The vertices of the left triangle are all associated only with the root bone and the rightmost two vertices are only associated with the child bone.

If we slap on a sin transform for the child bone, we wave the right triangle:

If we instead slap a sin transform on the root bone, we wave both triangles as a rigid unit:

We can apply a sin to both triangles, in which case the motions compose like they do if you bend both your upper and lower arms:

We can build a longer segmented arm and move each segment a bit differently:

Posing a Model

Now that we’ve built up some confidence in our math, we can start posing our model (the low-poly knight by Davmid, available here on Sketchfab). The ideas are exactly the same – we load the model, define some bone transforms, and then use those bone transforms to render the posed model.

I created an animation editing mode that lets us do this posing. As before, the UI stuff is all done with Dear ImGUI. It lets you select your model, add animations, add frames to those animations, and adjust the bone transforms:

A single frame represents an overall model pose by defining the transformations for each bone:

struct Keyframe {
    f32 duration;
    std::vector<glm::vec3> positions;
    std::vector<glm::quat> orientations;
    std::vector<glm::vec3> scales;
};

The position, orientation, and scale are stored separately, mainly because it is easier to reason about them when they are broken out like this. Later, when we do animation, we’ll have to interpolate between frames, and it is also easier to reason about interpolation between these components rather than between the overall transforms. A bone’s local transform \(S\) is simply the combination of its position, orientation, and scale.
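As a sketch of the interpolation we’ll eventually need (hypothetical plain structs in place of glm): positions and scales can be lerped component-wise, while orientations want a quaternion interpolation – here a normalized lerp (nlerp), a cheap stand-in for slerp that is exact at the endpoints and close for small rotations:

```cpp
#include <cassert>
#include <cmath>

// Hypothetical plain math types standing in for glm::vec3 / glm::quat.
struct Vec3 { float x, y, z; };
struct Quat { float w, x, y, z; };  // w + xi + yj + zk

// Component-wise linear interpolation for positions and scales.
Vec3 Lerp(Vec3 a, Vec3 b, float t) {
    return {a.x + t * (b.x - a.x),
            a.y + t * (b.y - a.y),
            a.z + t * (b.z - a.z)};
}

// Normalized lerp for orientations.
Quat Nlerp(Quat a, Quat b, float t) {
    // Flip one endpoint if needed so we interpolate along the shorter arc.
    float dot = a.w * b.w + a.x * b.x + a.y * b.y + a.z * b.z;
    float s = (dot < 0.0f) ? -1.0f : 1.0f;
    Quat q{a.w + t * (s * b.w - a.w), a.x + t * (s * b.x - a.x),
           a.y + t * (s * b.y - a.y), a.z + t * (s * b.z - a.z)};
    float n = std::sqrt(q.w * q.w + q.x * q.x + q.y * q.y + q.z * q.z);
    return {q.w / n, q.x / n, q.y / n, q.z / n};
}
```

Interpolating the broken-out components like this avoids the shearing artifacts you’d get from naively lerping full 4×4 matrices.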

We calculate our final bone transforms \(\mathbb{S} \mathbb{T}^{-1}\) for our frame and then pass that off to the shader for rendering:

size_t n_bones = mesh->bone_transforms_final.size();
for (size_t i_bone = 0; i_bone < n_bones; i_bone++) {
    glm::vec3 pos = frame.positions[i_bone];
    glm::quat ori = frame.orientations[i_bone];
    glm::vec3 scale = frame.scales[i_bone];

    // Compute the new current transform
    glm::mat4 Sbone = glm::translate(glm::mat4(1.0f), pos);
    Sbone = Sbone * glm::mat4_cast(ori);
    Sbone = glm::scale(Sbone, scale);

    if (mesh->bone_parents[i_bone] >= 0) {
        const glm::mat4& Sparent =
            mesh->bone_transforms_curr[mesh->bone_parents[i_bone]];
        Sbone = Sparent * Sbone;
    }

    mesh->bone_transforms_curr[i_bone] = Sbone;

    glm::mat4 Tinv = mesh->bone_transforms_orig[i_bone];
    mesh->bone_transforms_final[i_bone] = Sbone * Tinv;
}

Conclusion

When you get down to it, the math behind rigged meshes isn’t all too bad. The main idea is that you have bones with transforms relative to their parent bone, and that vertices are associated with a combination of bones. You need a mesh representation that stores your bones and vertex weights, a shader that applies the bone transforms, and a way to represent your model pose.

This main stuff doesn’t take too long to implement in theory, but this post took me way longer to develop than normal because I wrote a bunch of other supporting code (plus banged my head against the wall when things didn’t initially work – it’s hard to debug a garbled mess).

Adding my own animation editor meant I probably wanted to save those animations to disk. That meant I probably wanted to save my meshes to disk too, which meant saving the models, textures, and materials. Getting all that set up took time, and its own fair share of debugging. (I’m using a WAD-like data structure like I did for TOOM).

The animation editor also took some work. Dear ImGui makes writing UI code a lot easier, but adding all of the widgets and restructuring a lot of my code to be able to have separate application modes for things like gameplay and the animation editor simply took time. The largest time suck (and hit to motivation) for a larger project like this is knowing you have to refactor a large chunk of it to make forward progress. It’s great to reach the other end, though, and to have something that makes sense.

We’re all set up with posable models now, so the next step is to render animations. Once that’s possible we’ll be able to tie those animations to the gameplay — have the model walk, jump, and stab accordingly. We’ll then be able to attach other meshes to our models, i.e. a sword to its hand, and tie hitspheres and hurtspheres to the animation, which will form the core of our combat system.

Rendering with OpenGL

In the previous post I introduced a new side scroller project, and laid out some of the things I wanted to accomplish with it. One of those points was to render with the graphics card. I had never worked directly with 3D assets and graphics cards before; unless some high-level API was doing the rendering for me under the hood, I had never harnessed the power of modern graphics pipelines.

For TOOM I was not using the graphics card. In fact, the whole point of TOOM was to write a software renderer, which only uses the CPU for graphics. That was a great learning experience, but there is a reason that graphics cards exist, and that reason pretty much is that software renderers spend a lot of time rendering pixels when it’d be a lot nicer to hand that work off to something that can churn through it more efficiently.

Before starting, I thought that learning to render with the graphics card would more-or-less boil down to calling render(mesh). I assumed the bulk of the benefit would be faster calls. What I did not anticipate was learning how rich render pipelines can be, with customized shaders and data formats opening up a whole world of neat possibilities. I love it when building a basic understanding of something suddenly causes all sorts of other things to click. For example, CUDA used to be an abstract, scary performance thing that only super smart programmers would do. Now that I know a few things, I can conceptualize how it works. It is a pretty nice feeling.

Learning OpenGL

I learned OpenGL the way many people do. I went to learnopengl.com and followed their wonderful tutorials. It took some doing, and it was somewhat onerous, but it was well worth it. The chapters are well laid out and gradually build up concepts.

This process had me install a few additional libraries. First off, I learned that one does not commonly use OpenGL directly. Doing so involves a bunch of function pointers corresponding to various OpenGL versions, and getting the right function pointers from the graphics driver can be rather tedious. Thus, one uses a service like GLAD to generate the necessary loader code for you.

As recommended by the tutorials, I am using stb_image by the legendary Sean Barrett for loading images for textures, and assimp for loading 3D asset files. I still find it hilarious that they can get away with basically calling that last one Butt Devil. Lastly, I’m using the OpenGL mathematics library (glm) for math types compatible with OpenGL. I am still using SDL2 but switched to the imgui_impl_opengl3 backend. These were all pretty easy to use and install — the tutorials go a long way and the rest is pretty self-explanatory.

I did run into an issue with OpenGL where I didn’t have a graphics driver new enough to support the version that learnopengl was using (3.3). I ended up being able to resolve it by updating my OS to Ubuntu 22.04. I had not upgraded in about 4 years on account of relying on this machine to compile the textbooks, but given that they’re out I felt I didn’t have to be so paranoid anymore.

In the end, updating to Jammy Jellyfish was pretty easy, except that it broke all of my screen capture software. I previously used Shutter and Peek to take screenshots and capture GIFs. Apparently that all went away with Wayland; now I need to give Flameshot permission every time I take a screenshot, and I didn’t have any solution for GIF recordings, even when logging in with Xorg.

In writing this post I buckled down and tried Kooha again, which tries to record but then hangs forever on save (like Peek). I ended up installing OBS Studio, recording a .mp4, and then converting that to a GIF in the terminal via ffmpeg:

ffmpeg -ss 1 -i input.mp4 \
  -vf "fps=30,scale=320:-1:flags=lanczos,split[s0][s1];[s0]palettegen[p];[s1][p]paletteuse" \
   -loop 0 2d_to_3d.gif

This was only possible because of this excellent stack overflow answer and this OBS blog post. The resulting quality is actually a lot better than Peek’s, and I’m finding I have to adjust some settings to make the file size reasonable. I suppose it’s nice to be in control, but I really liked Peek’s convenience.

Moving to 3D

The side scroller logic was all done in a 2D space, but now rendering can happen in 3D. The side scroller game logic will largely remain 2D – the player entities will continue to move around 2D world geometry:

but that 2D world geometry all exists at depth \(z=0\) with respect to a perspective-enabled camera:

Conceptually, this is actually somewhat more complicated than simply calling DrawLine for each edge, which is what I was doing before via SDL2:

for edge in mesh
    DrawLine(edge)

Rendering in 3D requires:

  • setting up a vertex buffer and filling it with my mesh data, including color
  • telling OpenGL how to interpret said data
  • using a vertex shader that takes the position and color attributes and transforms them via the camera transformations into screen space
  • using a dead-simple fragment shader that just passes on the received vertex colors
  • calling glDrawArrays with GL_LINES to execute all of this machinery
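As a sketch of what “filling the vertex buffer” amounts to (names hypothetical): the per-line positions and colors get interleaved into one flat float array that a single glBufferData upload, and later a single glDrawArrays(GL_LINES, ...) call, can consume:

```cpp
#include <cassert>
#include <vector>

// One debug-line vertex: position followed by color.
struct LineVertex {
    float x, y, z;  // position
    float r, g, b;  // color
};

// Pack line vertices into the interleaved [x y z r g b] layout that
// the vertex attribute pointers would be configured to match.
std::vector<float> PackLines(const std::vector<LineVertex>& verts) {
    std::vector<float> buf;
    buf.reserve(verts.size() * 6);
    for (const LineVertex& v : verts) {
        buf.insert(buf.end(), {v.x, v.y, v.z, v.r, v.g, v.b});
    }
    return buf;
}
```

The matching attribute setup would use a stride of 6 * sizeof(float), with the color attribute offset by 3 * sizeof(float).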

This is not hard to do per se, but it is more complicated. One nice consequence is that there is only a single draw call for all this mesh data rather than a bunch of independent line draw calls. The GPU then crunches it all in parallel:

The coordinate transforms for this are basically what learnopengl recommends:

  • an identity model matrix since the 2D mesh data is already in the world frame
  • a view matrix obtained via glm::lookat based on a camera position and orientation
  • a projection matrix obtained via glm::perspective with a 45-degree viewing angle, our window’s aspect ratio, and \(z\) bounds between 0.1 and 100:
glm::mat4 projection = glm::perspective(
     glm::radians(45.0f),
     (f32)screen_size_x / (f32)screen_size_y,  // cast to avoid integer division
     0.1f, 100.0f);

This setup worked great to get to parity with the debug line drawing I was doing before, but I wanted to be using OpenGL for some real mesh rendering. I ended up defining two shader programs in addition to the debug-line-drawing one: one for textured meshes and one for material-based meshes. I am now able to render 3D meshes!

This low-poly knight was created by Davmid, and is available here on Sketchfab. The rest of the assets I actually made myself in Blender and exported to FBX. I had never used Blender before. Yeah, it’s been quite a month of learning new tools.

The recording above also includes a flashlight effect. This was of course based on the learnopengl tutorials too.

Rendering with shaders adds a ton of additional options for effects, but also comes with a lot of decisions and tradeoffs around how data is stored. I would like to simplify my approach and end up with as few shaders as possible. For now it’s easiest for me to make models in Blender with basic materials, but maybe I can move the material definitions to a super-low-res texture (i.e., one pixel per material) and then only use textured models.

Architecture

It is perhaps worth taking a step back to talk about how this stuff is architected. That is, how the data structures are arranged. As projects get bigger, it is easier to get lost in the confusion, and a little organization goes a long way.

First off, the 3D stuff is mostly cosmetic. It exists to look good. It is thus largely separate from the underlying game logic:

I created a MeshManager class that is responsible for storing my mesh-related data. It currently keeps track of four things:

  1. Materials – color properties for untextured meshes
  2. Textures – diffuse and specular maps for textured meshes
  3. Meshes – the 3D data associated with a single material or texture
  4. Models – a named 3D asset that consists of 1 or more meshes

I think some structs help get the idea across:

struct Vertex {
    glm::vec3 position;   // location
    glm::vec3 normal;     // normal vector
    glm::vec2 tex_coord;  // texture coordinate
};
struct Material {
    std::string name;
    glm::vec3 diffuse;   // diffuse color
    glm::vec3 specular;  // specular color
    f32 shininess;       // The exponent in the Phong specular equation
};
struct Texture {
    GLuint id;             // The OpenGL texture id
    std::string filepath;  // This is essentially its id
};
struct Mesh {
    std::string name;  // The mesh name

    std::vector<Vertex> vertices;

    // Indices into `vertices` that form our mesh triangles
    std::vector<u32> indices;

    // An untextured mesh has a single material, typemax if none.
    // If the original mesh has multiple materials, 
    // the assimp import process splits them.
    u16 material_id;

    // A textured mesh has both a diffuse and a specular texture
    // (typemax if none)
    u16 diffuse_texture_id;
    u16 specular_texture_id;

    GLuint vao;  // The OpenGL vertex array object id
    GLuint vbo;  // The OpenGL vertex buffer id associated with `vertices`
    GLuint ebo;  // The element array buffer associated with `indices`
};
struct Model {
    std::string filepath;       // The filepath for the loaded model
    std::string name;           // The name of the loaded model
    std::vector<u16> mesh_ids;  // All mesh ids
};

Visually, this amounts to:

All of this data can be loaded from the FBX files via assimp. In the future, I’ll also be writing this data out and back in via my own game data format.

My game objects are currently somewhat of a mess. I have an EntityManager that maintains a set of entities:

struct Entity {
    u32 uid;  // Unique entity id
    u32 flags;

    common::Vec2f pos;               // Position in gamespace

    // A dual quarter edge for the face containing this position
    core::QuarterEdgeIndex qe_dual;

    u16 movable_id;       // The movable id, or 0xFFFF otherwise
    u16 collider_id;      // The collider id, or 0xFFFF otherwise
};

Entities have unique ids and contain ids for the various resources they might be associated with. For now this includes a movable id and a collider id. I keep those in their own managers as well.

A movable provides information about how an entity is moving. Right now it is just a velocity vector and some information about what the entity is standing on.

A collider is a convex polygon that represents the entity’s shape with respect to collision with the 2D game world:

The player is the only entity that currently does anything, and there is only one collider size from which the Delaunay mesh is built. I’ll be cleaning up the entity system in the future, but you can see where it’s headed – it is a basic entity component system.

Editing Assets

I want to use these meshes to decorate my game levels. While I could theoretically create a level mesh in Blender, I would much prefer to be able to edit the levels in-game using smaller mesh components.

I created a very basic version of this using ImGUI wherein one can create and edit a list of set pieces:

This is leaps and bounds better than editing the level using a text file, or worse, hard-coding it. At the same time, it is pretty basic. I don’t highlight the selected set piece, I don’t have mouse events hooked up for click and drag, and my camera controls aren’t amazing. I do have some camera controls though! And I can set the camera to orthographic, which helps a lot:

The inclusion of this set piece editor meant that I needed a way to save and load game data. (You can see the import and export buttons on the top-left). I went with the same WAD-file-like approach that I used for TOOM, which is a binary file consisting of a basic header, a list of binary blobs, and a table of contents:

Each blob consists of a name and a bunch of data. It is up to me to define how to serialize or deserialize the data associated with a particular blob type. Many of them just end up being a u32 count followed by a list of structs.
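A sketch of one such blob, with a hypothetical SetPiece struct: serialize a u32 count followed by the raw structs, and read it back the same way. (This relies on the struct being trivially copyable; the exact byte layout is compiler-dependent, which is fine for a single-machine save format.)

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical blob payload type; real blobs each define their own format.
struct SetPiece {
    float x, y;
    uint16_t model_id;
};

// Write: u32 count, then the structs back-to-back.
std::vector<uint8_t> SerializeBlob(const std::vector<SetPiece>& items) {
    std::vector<uint8_t> out(sizeof(uint32_t) + items.size() * sizeof(SetPiece));
    uint32_t count = (uint32_t)items.size();
    std::memcpy(out.data(), &count, sizeof(count));
    std::memcpy(out.data() + sizeof(count), items.data(),
                items.size() * sizeof(SetPiece));
    return out;
}

// Read: the mirror image of SerializeBlob.
std::vector<SetPiece> DeserializeBlob(const std::vector<uint8_t>& in) {
    uint32_t count = 0;
    std::memcpy(&count, in.data(), sizeof(count));
    std::vector<SetPiece> items(count);
    std::memcpy(items.data(), in.data() + sizeof(count),
                count * sizeof(SetPiece));
    return items;
}
```

The nice property mentioned below – referencing data as-is from the binary – falls out of this layout, since the struct array is contiguous in the blob.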

One nice result of using a WAD-like binary is that I don’t always have to copy the data out to another data structure. With TOOM, I was copying things out in the C++ editor but was referencing the data as-is directly from the binary in the C code that executed the game. Later down the road I’ll probably do something similar with a baked version of the assets that the game can refer to.

There are some tradeoffs. Binary files are basically impossible to edit manually. Adding new blob types doesn’t invalidate old save files, but changing how a blob type is serialized or deserialized does. I could add versions, but as a solo developer I haven’t found the need for it yet. I usually update the export method first, run my program and load using the old import method, export with the new method, and then update my import method. This is a little cumbersome but works.

With TOOM I would keep loading and overwriting a single binary file. This was fine, but I was worried that if I ever accidentally corrupted it, I would have a hard time recreating all of my assets. With this project I decided to export new files every time. I have a little export directory and append the timestamp to the filename, ex: sidescroller_2023_12_22_21_28_55.bin. I’m currently loading the most recent one every time, but I still have older ones available if I ever bork things.
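Because the timestamp in the filename is fixed-width and ordered year-to-second, picking the most recent export can be as simple as taking the lexicographic maximum (a sketch, assuming all files share the same prefix):

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// With a fixed-width YYYY_MM_DD_HH_MM_SS timestamp in the name,
// lexicographic order matches chronological order, so "most recent"
// is just the maximum string.
std::string MostRecentExport(const std::vector<std::string>& filenames) {
    return *std::max_element(filenames.begin(), filenames.end());
}
```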

Conclusion

This post was a little less targeted than normal. I usually talk about how a specific thing works. In this case, I didn’t want to re-hash how OpenGL works, and instead covered a somewhat broad range of things I learned and did while adding meshes to my game project. A lot of things in gamedev are connected, and by doing it all I’m learning what the implications of various decisions are, and oftentimes why things are the way they are.

There are a lot of concepts to learn and tools to use. The last month involved:

  • OpenGL
  • GLAD
  • assimp
  • stb_image
  • Blender
  • OBS Studio
  • textures
  • Phong lighting
  • shaders

I am often struck by how something like using a GPU seems foreign and arcane from the outset, but once you dive in you can pull back the curtains and figure it out. Sure, I still have a ton to learn, but the shape of the thing is there, and I can already do things with it.

Happy coding, folks.

A Side Scroller Physics System

With TOOM done, I found myself wanting to try something new. So, I started a new vscode workspace and embarked on a new side project – a 2D side scroller.

My overarching goal, once again, is to explore and take it as far as I find it interesting. That being said, I want to overcome the challenge of building a stable game with general 2D physics. By that I mean, “vectorial” physics of the sort used by Braid, with arbitrary 2D level geometry:

Braid level game view and level editor view (image from Higher-Order Fun)

One reason for this goal is that I’ve tried to noodle it out before, but was surprised to find a significant lack of resources. In fact, I had to laugh out loud when I read this excellent heavily cited blog post about 2D platformers. It covers tile-based and rastered platforms in great detail, only to leave vectorial platformers with a vague “you do it yourself” and a warning not to use a physics engine. Well, challenge accepted! Let’s get into it!

There are a few other things I want to learn from this adventure. I plan to eventually use 3D GPU-accelerated graphics. It’s worth knowing how that works, and the work on TOOM convinced me that drawing with the CPU is super slow. The aesthetic I have in mind is pretty low-poly, so while I’d normally go with pixel art because that’s all I can reasonably attempt myself, I nevertheless want to give 3D a try. The other big reason to go with 3D is that I can build an animation system and animate characters once rather than 8x for the various viewing angles. I wanted to build a skeletal animation system, but felt that 2D skeletal animations very easily feel flat without serious artistic effort.

The general image I have in my mind is a souls-like 2D platformer with movement and combat somewhat similar to playing Link in Super Smash, except with some additional movement mechanics like ladders and other climbable surfaces and no bubble shield. It’ll probably never be a full game but maybe I can make a proper level and boss at the end.

I’ll be using C++ and vscode for this, on Linux. I’ll try to minimize library use, but will leverage SDL2 as my platform library and probably OpenGL for 3D graphics. I love ImGui and will use that for dev UI.

What I Tried First Didn’t Work

When I first started, I tried to go with an object system where the world is made of convex polygonal colliders. Seems reasonable:

The player (and one enemy) have colliders represented by these six-sided shapes. The white polygons are static geometry. So far so good.

When an entity moves, it ticks forward by \(\Delta t\). I could have done this the boring way by:

  1. Moving the entity its full distance
  2. Checking for collisions
  3. Shunting the entity out of any colliding polygons

I’ve written about why this is a bad idea. In short, this approach can cause tunneling or other weird physics effects, especially when the entity is moving fast or when there are multiple intersecting bodies.

The system I implemented sought to instead find the first collision. This is the same as taking the source polygon and sweeping it along the direction of travel and finding the longest sweep \(\delta t \in [0, \Delta t]\) that does not intersect anything:

This sweeping method is harder to formulate. We aren’t really colliding the player’s collider with each static collider. Instead, we are colliding the swept player collider with each static collider, but we still want to back out the maximum step we can take in our direction of motion. So we’re effectively sweeping the player collider along \(\Delta t \cdot \boldsymbol{v}\) and then checking for collisions, and we have to find the smallest \(\delta t\) such that the volume swept along \(\delta t \cdot \boldsymbol{v}\) does not collide.

The way I implemented that was to fall back to our trusty strategy of using the Minkowski sum. That is, we can expand the other polygon by the player’s collision volume first, and then all we have to do is check whether the line segment between the player’s starting location and that location shifted by \(\Delta t \cdot \boldsymbol{v}\) intersects with the polygon:
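A sketch of that query with hypothetical names: after the Minkowski expansion, finding the longest safe step reduces to intersecting the motion segment against each polygon edge and keeping the smallest parameter. (The real version also has to be careful with parallel and touching edges, which is exactly where the corner-case trouble comes from.)

```cpp
#include <cassert>
#include <cmath>
#include <vector>

struct V2 { float x, y; };
static float Cross(V2 a, V2 b) { return a.x * b.y - a.y * b.x; }

// Returns the smallest t in [0,1] at which the segment p -> p + d hits
// any edge of the (expanded) polygon, or 1 if it never does.
float EarliestHit(V2 p, V2 d, const std::vector<V2>& poly) {
    float t_min = 1.0f;
    for (size_t i = 0; i < poly.size(); i++) {
        V2 a = poly[i];
        V2 b = poly[(i + 1) % poly.size()];
        V2 e{b.x - a.x, b.y - a.y};
        float denom = Cross(d, e);
        if (std::fabs(denom) < 1e-9f) continue;  // parallel edge: skip
        V2 ap{a.x - p.x, a.y - p.y};
        float t = Cross(ap, e) / denom;  // parameter along the motion
        float s = Cross(ap, d) / denom;  // parameter along the edge
        if (t >= 0.0f && t <= 1.0f && s >= 0.0f && s <= 1.0f) {
            t_min = std::min(t_min, t);
        }
    }
    return t_min;
}
```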

That mostly worked, but it had a lot of trouble with corner cases. For example, when the player is standing on a surface and traveling along it, we don’t want floating point errors to accidentally place the player inside the supporting polygon. This was particularly problematic in the gameplay gif above, where the player ends up wedged between two squares and slides through one of them.

I had a lot of trouble with walking. I wanted the player to be able to walk across colliders along the floor without treating it like a ramp and launching off for a bit:

This means we have to do some hacky checks to figure out when we’re walking along a polygon and reach its end, and may need to transition to walking along a new polygon. I tried getting this all to work reliably, and got pretty far, but ultimately the number of hacks I had to add made the system untenable and left a bad taste in my mouth.

Redesign

The fundamental problem is that my previous system was fully continuous, but that continuity inherently had issues with floating point effects when the player is supported by geometry. In these cases, we want something more discrete that can concretely tell us which side of the geometry we are on. Furthermore, I wanted something that could more easily tell me whether, when walking along a surface, there was another surface to walk on after passing the end of the current segment. These thoughts led me to the trusty Delaunay mesh data structure.

I’ve written blog posts that use constrained Delaunay meshes (stored as DCELs) a few times now. Now that I know this data structure exists, I keep finding uses for it.

Delaunay meshes are triangle meshes in which we try to keep the triangles as equilateral as possible (to avoid very fractured / long and narrow tiles). Constrained Delaunay meshes are the same thing, except we force certain edges to exist. This makes these meshes useful for things like representing map geometry, where we force walls to be edges in the mesh:

The E1M1 map from DOOM, after I’ve loaded it into a constrained Delaunay mesh

I actually advised that one use constrained Delaunay meshes for 2D player movement in this previous blog post. I hadn’t initially followed my own suggestion because the mesh is dependent on both the level geometry and the player collider (each polygon is expanded by the player collider via a Minkowski sum). If the player ducks and their collider shrinks, then we have to change our mesh. If an enemy has a different sized collider, then it should use a different mesh. If a piece of level geometry moves, such as a platform or door, then we need to change the mesh. All of these problems are still problems, but I’ll have to find workarounds for them. Nevertheless, I’m going with exactly what I spelled out in that earlier blog post.

My new plan was to keep track of both the player’s position and the triangle of the collision mesh that they find themselves in. The collision mesh already accounts for the player’s collision volume, so we need only keep track of their center position. When they move, we check to see if their motion would take them out of their containing triangle, and if it does, we handle collisions (if necessary) at the crossed edge.

Here we see the updated system, along with the collision mesh.
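A minimal sketch of that containment check (hypothetical names; the real version walks the mesh via its quarter-edges): orientation tests against each edge of a counter-clockwise triangle tell us whether a position has left the triangle, and through which edge:

```cpp
#include <cassert>

struct P2 { float x, y; };

// Twice the signed area of triangle (a, b, c);
// positive when c is to the left of the directed edge a -> b.
static float Orient(P2 a, P2 b, P2 c) {
    return (b.x - a.x) * (c.y - a.y) - (b.y - a.y) * (c.x - a.x);
}

// For a counter-clockwise triangle (a, b, c), returns -1 if p is inside,
// else the index of an edge p is outside of (0: ab, 1: bc, 2: ca).
int CrossedEdge(P2 p, P2 a, P2 b, P2 c) {
    if (Orient(a, b, p) < 0.0f) return 0;
    if (Orient(b, c, p) < 0.0f) return 1;
    if (Orient(c, a, p) < 0.0f) return 2;
    return -1;
}
```

The discreteness the redesign is after lives in that sign test: the answer is an exact “which side”, not a distance that can wobble with floating point error.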

Building the Mesh

Since I’ve covered how the movement works before, I’d rather focus on how we get the mesh.

This mesh is obtained by expanding each polygon by the player’s collision volume, and then adding all resulting edges to the Delaunay mesh. These edges are constrained to ensure that they remain in the final mesh (otherwise, we might flip edges in order to make them more equilateral). The main difficulty arises in the fact that the edges of the expanded polygons may intersect, and we nevertheless have to support them in our final mesh:

For example, these expanded polygons overlap.

I had previously tackled this using polygon unions via Clipper2. I could do that again, but I want to avoid the loss of information that comes with replacing two or more overlapping polygons with a single polygon. For example, I might want to know which feature my player is standing on, or I may end up wanting to support removing objects. Plus, figuring out my own solution is a reward unto itself.

What I ended up doing was updating my method for constraining edges in a DCEL. During the development of TOOM, I had made it possible to load the DOOM level geometry and constrain the appropriate edges. Here we just added sides as we went, and constrained them after adding them:

for (Linedef linedef : linedefs) {
     // Insert vertex a and b.
     for (i16 i_vertex : {linedef.i_vertex_start, linedef.i_vertex_end}) {
         const common::Vec2f& v = doom_vertices[i_vertex];
         InsertVertexResult res = mesh.InsertVertex(v);
         if (res.category == IN_FACE || res.category == ON_EDGE) {
             mesh.EnforceLocallyDelaunay(res.i_qe);
             vertex_indices[i_vertex] = mesh.GetVertexIndex(res.i_qe);
         }
     }

     // Now ensure that there is a line between A and B.
     // If there is not, flip edges until there is.
     QuarterEdgeIndex qe_a =
        mesh.GetQuarterEdge(vertex_indices[linedef.i_vertex_start]);
     QuarterEdgeIndex qe_b =
        mesh.GetQuarterEdge(vertex_indices[linedef.i_vertex_end]);
     QuarterEdgeIndex qe_ab = mesh.EnforceEdge(qe_a, qe_b);
     mesh.ConstrainEdge(qe_ab);

     // sidedef and passability stuff...
}

For example, we might add the vertices for A and B, and then end up with a mesh that is locally Delaunay but does not have an edge between A and B:

The call to EnforceEdge then flips CD in order to get the edge we want:

This approach worked for the DOOM maps because the sides there never crossed over one another. For example, if CD is another enforced edge, there was no way to also enforce AB by flipping edges:

I needed a more general way to specify two points A and B and then adjust the mesh such that those two points are joined by a straight line, potentially via multiple collinear segments.

The updated algorithm has an outer loop that alternately tries tackling the problem from A to B and from B to A. As soon as it finds or produces a connection, it terminates. If it fails to make progress from both directions, it exits out.

Attempting to make progress from A to B starts by finding the edge from A that is as close as possible in a counter-clockwise / right-hand sense to the desired segment AB:

We’ll call the other side of this segment C. If C = B, we are done, as then AB already exists.

It is possible that C lies along AB. When this happens, we replace A with C and return to the outer loop:

If we haven’t returned yet, then AC is counter-clockwise of AB. We need to flip CD, where D is on the other side of AB from C. If CD is not constrained, we can flip it.

The most interesting case occurs when CD is constrained. We cannot flip CD because it is fixed – we want it to remain an edge in the final mesh. So we instead introduce a new point E at the intersection of AB and CD, split CD at E, and add AE and EB:
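Computing E is a standard segment-segment intersection. A sketch with hypothetical names, assuming the algorithm has already established that AB and CD genuinely cross (so the segments are not parallel):

```cpp
#include <cassert>
#include <cmath>

struct Pt { float x, y; };

// Intersection point E of segments AB and CD, assuming they cross.
Pt SegmentIntersection(Pt a, Pt b, Pt c, Pt d) {
    float rx = b.x - a.x, ry = b.y - a.y;  // direction of AB
    float sx = d.x - c.x, sy = d.y - c.y;  // direction of CD
    float denom = rx * sy - ry * sx;       // nonzero since AB, CD cross
    // Parameter along AB at which the two lines meet.
    float t = ((c.x - a.x) * sy - (c.y - a.y) * sx) / denom;
    return {a.x + t * rx, a.y + t * ry};
}
```

After splitting CD at E, both halves stay constrained, and AE and EB become the collinear pieces that together stand in for the original AB.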

We can see this in action in the TOOM mesh editor, updated with this method:

Here we have a constrained edge

Here we have to flip or intersect multiple edges

So far I’m just building a mesh with constrained edges matching those of the expanded collision polygons, and only have a single such mesh to reflect the player’s collision polygon. I do not yet track which polygon corresponds to which triangle, nor do edges otherwise have any special properties. Nevertheless, what I have is sufficient for basic 2D player movement.

The player (and any other entity) has a 2D position and a triangle face in which their position lies. The 2D position is continuous, but the triangle face is discrete. Having something discrete resolves the issue with floating point rounding – we know exactly which side of a line we are (supposed to be) on.

Note that the constrained Delaunay mesh does not actually contain the white collider polygons:

I’ll probably talk about this more next time, but movement physics currently get processed in two different ways. If the moving entity is airborne, then they move according to the standard laws of physics. If we collide with a solid edge, we lose all velocity going into it and then continue on in our new direction. Colliding with a solid edge can result in the agent transitioning to the supported state.

If the moving entity is standing or walking on a surface, i.e. is supported, we handle things a bit differently. Here their velocity is projected along the supporting surface. Any time they move past the end of the supporting surface, we find the next supporting edge (if there is one) and allow them to transition over. If they are moving fast enough or there is no supporting edge to connect to, they transition to being airborne.

The player walking across supporting surfaces
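The supported-state velocity handling amounts to projecting the velocity onto the supporting edge’s direction. A sketch with hypothetical names:

```cpp
#include <cassert>
#include <cmath>

struct W2 { float x, y; };

// Project a velocity onto the supporting edge's direction, so a
// supported entity slides along the surface instead of into it.
W2 ProjectOntoSurface(W2 v, W2 edge_dir) {
    float len2 = edge_dir.x * edge_dir.x + edge_dir.y * edge_dir.y;
    float k = (v.x * edge_dir.x + v.y * edge_dir.y) / len2;
    return {k * edge_dir.x, k * edge_dir.y};
}
```

The component of velocity into the surface is simply discarded, which matches losing all velocity going into a solid edge in the airborne case.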

Conclusion

Constrained Delaunay meshes resolved many of the tricky issues that arose with tackling 2D player movement for arbitrary terrain geometry. Knowing which face a player is in is a discrete fact that removes the issues with numeric imprecision in general polygon-on-polygon collisions, without having to resort to shunting. It also naturally allows us to leverage the mesh to query things like the next supporting face.

The code for the updated Delaunay mesh is available here.

In the future I hope to support additional motion states, such as climbing on ladders and surfaces. I’ll want to add support for dropping down “through” a platform, wall slides and jumps, and some other basic sidescroller movement features. I want to add 3D GPU graphics, and with that, animations and combat. It turns out there is a lot that goes into a basic game.