Into Depth – Shadow Casting & Portals

A debug view showing shadow cast segments from the vantage point of the actor.

Last month we covered the unified action system, in which we built some basic things players can do with their heroes and set the stage for fancier features and actions for enemy agents. The most significant action was moving, which was built on a GridView of what the actor sees from their current vantage point. As noted in that post, we eventually wanted to expand the grid view to include edge information, such as thin little walls and platforms, but also portals that act as non-Euclidean connections to other parts of the map.

Here we have a simple two-tile room with a loop-back portal on either end. The result – the hero can see a bunch of versions of themselves!

In the last post, I said I’d be adding other entities in and adding more actions, but I decided that the map representation was so fundamental that prioritizing getting these portals in was more important. Any changes there would affect most other systems, so I wanted to get that done first.

Grid Representation

Prior to this change, the underlying level geometry representation, the Grid, was just a 2D array of cells:

After this change, I wanted to represent both the cells as well as the edges between them:

This was achieved by adding two additional 2D arrays, one for the horizontal edges and one for the vertical edges. There are definitely other ways to do this, but I felt that (1) this was simple, and (2) there are differences between those edge types. A player might stand on a platform but have a door that they can open, for example.

The edges are single-byte entries, to keep things efficient. If we need more information, we can index into a separate array. I’m currently using the first two bits to identify the edge type, and the remaining 6 bits, if the edge is a portal, index into a Portal array. The Portal stores the information necessary to know which two edges it links.
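As a sketch of what such a packed edge byte might look like (the enum values and helper names here are hypothetical, not the project's actual code):

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical bit layout for one edge byte: the low 2 bits hold the edge
// type, and (for portals) the high 6 bits index into the Portal array.
enum EdgeType : uint8_t { EDGE_OPEN = 0, EDGE_SOLID = 1, EDGE_PLATFORM = 2, EDGE_PORTAL = 3 };

inline uint8_t PackEdge(EdgeType type, uint8_t portal_index = 0) {
    return static_cast<uint8_t>(type | (portal_index << 2));
}
inline EdgeType UnpackEdgeType(uint8_t edge) { return static_cast<EdgeType>(edge & 0x3); }
inline uint8_t UnpackPortalIndex(uint8_t edge) { return uint8_t(edge >> 2); }
```

With 6 bits for the portal index, a level can reference up to 64 portals, which is plenty for now.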

This change has the immediate consequence that getting the cell neighbor in a given direction does not necessarily produce the Euclidean neighbor. A second consequence is that cells no longer need to encode whether they are solid; the edges between cells can be solid instead. This slightly simplifies cells.

Grid View Representation

The Grid View, introduced in the last post to represent the level geometry as seen by an actor from their vantage point, can now represent views where the same underlying cell might show up multiple times due to portal connections. Furthermore, we might have weird cases where one square cell might contain slices from different areas of the map:

The green arrow points at a view cell which, due to the portal (magenta), has two sections of different grid cells.

Before, the grid view was a simple 2D array with one cell per tile. Now, that is no longer the case.

The updated grid view is organized around ViewNodes, ShadowSegments, and ViewTransforms:

struct GridView {
    // The grid tile the view is centered on
    CellIndex center;

    // The first ViewNode in each view cell.
    // 0xFFFF if there is no view node in that cell.
    u16 head_node_indices[GRID_VIEW_MAX_X][GRID_VIEW_MAX_Y];

    u16 n_nodes;
    ViewNode nodes[GRID_VIEW_MAX_NODES];

    // Shadow cast segments.
    u16 n_shadow_segments;
    ShadowSegment shadow_segments[GRID_VIEW_MAX_NODES];

    // The transforms for the view as affected by portals.
    u8 n_view_transforms;
    ViewTransform view_transforms[GRID_VIEW_MAX_TRANSFORMS];
};

Most of the time, a given cell in the grid view will contain a single cell in the underlying grid. However, as we saw above, it might see more than one. Thus, each view cell heads a linked list of ViewNodes, with an intrusive index from each node to the next (if any). Each node knows which underlying grid cell it sees, whether it is connected in each cardinal direction to its neighbor view cells, and what sector of the view it contains:

struct ViewNode {
    CellIndex cell_index;

    // Intrusive data pointer, indexing into the GridView nodes array,
    // to the next view node at the same ViewIndex, if any.
    // 0xFFFF if there is no next view node.
    u16 next_node_index;

    // Whether this view node is continuous with the view node in the
    // given direction.
    // See direction flags.
    u8 connections;

    // The sector this view node represents.
    Sector sector;

    // The stencil id for this view node.
    // Identical to the transform_index + 1.
    u8 stencil_id;
};
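Traversing such an intrusive list is straightforward. Here is a minimal sketch using stand-in structs (the real ViewNode carries more fields, and the names below are illustrative):

```cpp
#include <cassert>
#include <cstdint>

constexpr uint16_t kNoNode = 0xFFFF;

// Minimal stand-in for ViewNode; the real struct carries more fields.
struct ViewNodeLite {
    uint16_t cell_index;
    uint16_t next_node_index;
};

// Walks the intrusive list headed at `head`, collecting the underlying grid
// cells visible through one view cell. Returns how many were written.
int CollectCells(uint16_t head, const ViewNodeLite* nodes, uint16_t* out, int max_out) {
    int n = 0;
    for (uint16_t i = head; i != kNoNode && n < max_out; i = nodes[i].next_node_index)
        out[n++] = nodes[i].cell_index;
    return n;
}
```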

We can now represent cases like the motivating portal scenario:

I’ve given the region on the other side of the portal different tile backgrounds to make it more obvious.

In the last post, we constructed the grid view with breadth-first search (BFS) from the center tile. Now, since we’re working with polar segments that strictly expand outwards, we can actually get away with the even simpler depth-first search (DFS). Everything is radial, and the farther out we get, the narrower our beams are:

A debug view showing the final shadow cast segments.

The search logic is a bit more complicated: we have to narrow the segment every time we pass through a tile’s edge, we save view nodes into the new intrusive linked-list structure, and edge traversals have to check for portals. Solid edges terminate the search and trigger the saving of a shadow segment.

Portal traversals have the added complexity of changing physically where we are in the underlying grid. If we think of the underlying grid as one big mesh, we effectively have to position the mesh differently when rendering the stuff on the other side of the portal. That information is stored in the ViewTransform, and it is also given its own stencil id.

A debug view of the view nodes. We fully see many cells, but some cells we only partially see. The origin is currently split into four views in order for view segments to never exceed 180 degrees.

Above we can see an overlay of the resulting view nodes. There is a little sliver of a cell we can see for a passage off to the left, and there are the two passages, one with a portal, that we can see on the right. The platforms (little brown rectangles) do not block line of sight.

The Stencil Buffer

Now that we have the grid view, how do we render it? It is pretty easy to render a square cell, but do we now have to calculate the part of each cell that intersects with the view segment? We very much want to avoid that!

Clean rendering is accomplished via the stencil buffer, which is a feature in graphics programming that lets us mask out arbitrary parts of the screen when rendering:

We’ve already been using several buffers when rendering. We’re writing to the RGB color buffers, and 3D occlusion is handled via the depth buffer. The stencil buffer is just an additional buffer (here, one byte per pixel), that we can use to get a stenciling effect.

At one byte (8 bits) per pixel, the stencil buffer can at first glance only hold 8 overlapping stencil masks at once. However, with our 2D orthographic setup, our stencil masks never overlap, so we can instead use 255 unique stencil values per pixel (reserving 0 for the clear value).

We simply write the shadow segments to the stencil buffer, setting the buffer to the appropriate stencil ID:

The stencil buffer has four stencil masks – the red region in the main chamber, the little slice of hallway to the left, the green region through the portal leading back into the same room, and the yellow region through a different portal to an entirely different room.
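To make the write-then-masked-read idea concrete, here is a toy CPU-side analogue of what the GPU's stencil test does for us (the real work happens per pixel on the GPU; this just illustrates the two passes):

```cpp
#include <cassert>
#include <cstdint>

// Toy 1D "stencil buffer" of 8 pixels, one byte per pixel, like the real one.
constexpr int kWidth = 8;

// Pass 1: a shadow segment writes its stencil ID over the pixels it covers.
void WriteStencil(uint8_t* stencil, int x0, int x1, uint8_t id) {
    for (int x = x0; x < x1; x++) stencil[x] = id;
}

// Pass 2: geometry tagged with `id` only colors pixels whose stencil value
// matches (the equivalent of a GL_EQUAL stencil test). Returns pixels shaded.
int ShadeMasked(const uint8_t* stencil, uint8_t* color, uint8_t value, uint8_t id) {
    int shaded = 0;
    for (int x = 0; x < kWidth; x++)
        if (stencil[x] == id) { color[x] = value; shaded++; }
    return shaded;
}
```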

Our rendering pipeline now needs to support masking with these stencil IDs. In January we covered the render setup, which involved seven different types of render commands that the game could send to the backend:

  • render mesh – render a 3D mesh with lighting, already uploaded to the GPU
  • render local mesh – render a 3D mesh without lighting, sending vertices from the CPU
  • render quad – an old simple command to render a quad
  • render sprite – self-explanatory
  • render spritesheet sprite – an old version of render sprite
  • render panel – renders a panel with 9-slice scaling by sending vertices from the CPU
  • render text – renders font glyphs by sending quads from the CPU

Having three versions of everything (rendering without stencils, rendering with stencils, and rendering to the stencil buffer itself) would triple the number of command types. That is too much!

Furthermore, we want to order the render commands such that opaque things are drawn front to back, transparent things are then drawn back to front, and UI elements are finally drawn in order at the end. That would require sorting across command types, which seems really messy.

To achieve this all neatly, I collapsed everything sending vertices from the CPU to a unified command type, and have the backend fill the vertex buffer when the game creates the command:

You will notice that the game-side changed a bit. Rather than receiving a command struct and being responsible for populating all its fields, I decided to move rendering to methods that expect everything they need to be passed in. This is less error prone, allows for different function signatures that populate what you need, and means I don’t actually need structs for the different command types if we’re directly converting to vertices.

Each unified command (everything but the mesh render calls) is simply:

// Determines the order in which things are rendered.
enum class PassType : u8 {
    OPAQUE = 0,
    TRANSLUCENT = 1,
    UI = 2,
};

// Determines whether and how a command uses a stencil.
enum class StencilRequirement : u8 {
    NONE  = 0, // Does not use stencils
    WRITE = 1, // Writes to the stencil buffer
    READ  = 2, // Reads from the stencil buffer
};

// Determines how the draw gets executed.
enum class DrawMode : u8 {
    ARRAYS,   // No EBO
    ELEMENTS, // With EBO
};

struct RenderCommandUnlitTriangle {
    // The commands are sorted by the sort key prior to rendering, determining the order.
    // Bits 63-62 (2 bits): Pass Type (Opaque, Translucent, UI)
    // Bits 61-60 (2 bits): Stencil Requirement
    // Bits 59    (1 bit):  Draw Mode
    // Bits 58-51 (8 bits): Stencil ID (0 for none)
    // Bits 50-32  <reserved>
    // Bits 31-0  (32 bits)
    //   In Opaque:      front-to-back distance
    //   In Translucent: back-to-front distance
    //   In UI:          sequence ID
    u64 sort_key;

    // Draw Arguments
    u32 n_vertices; // Number of vertices to draw
    u32 i_vertex;   // Index of the start vertex
    u32 n_indices;  // Number of indices, if using ELEMENTS
    u32 i_index;    // Index of the start element index, if using ELEMENTS
};
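Packing the sort key could look something like this (a sketch following the bit layout in the comment above; the helper name is hypothetical):

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical packing helper following the documented sort key layout:
// pass type (2 bits), stencil requirement (2 bits), draw mode (1 bit),
// stencil ID (8 bits), then the 32-bit depth/sequence value in the low bits.
uint64_t MakeSortKey(uint8_t pass_type, uint8_t stencil_req, uint8_t draw_mode,
                     uint8_t stencil_id, uint32_t depth_or_sequence) {
    return (uint64_t(pass_type   & 0x3) << 62) |
           (uint64_t(stencil_req & 0x3) << 60) |
           (uint64_t(draw_mode   & 0x1) << 59) |
           (uint64_t(stencil_id)        << 51) |
           uint64_t(depth_or_sequence);
}
```

A plain ascending sort on these keys then yields the desired ordering for free.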

Apparently, this is much closer to how modern game engines structure their render commands. There is one vertex buffer that we write into as commands are added by the game, and the command just keeps track of the appropriate range. The same is true for the index buffer, which in our case is a pre-populated buffer for quad rendering (so each quad takes 4 vertices and 6 indices instead of 6 full vertices):

The pass type, stencil requirement, and draw mode are then baked into a sort key that we can sort on in order to automagically get the commands in the order we want: opaque < transparent < UI, with no stencil < write to stencil < read from stencil, and then grouping by whether we use the elements (index) buffer or not. We even use the lower 32 bits for additional sorting, rendering opaque things front to back (since lighting is expensive and we can discard more pixels that way), rendering transparent things back to front (required to get correct results), and rendering UI elements in the order they are created.

Being able to call glBufferData once to send all of the vertices at the start of rendering is really nice. Before, we were writing and sending just the few vertices we needed for every command, each time.

The end result is significantly simpler backend rendering code. It could become even simpler if we eventually unify the mesh rendering with the unlit triangle rendering, but I’m holding off on that because those render commands carry a bunch of uniform data and require switching between shaders. I’m not currently rendering transparent meshes, and if I eventually do, I’ll bite the bullet and further consolidate.

Conclusion

I am quite happy to have tackled the fundamental grid representation, which as we saw ended up affecting a lot of stuff. The new grid view now needs to be properly integrated into the movement action, and then we can actually start introducing other entities like ropes, buckets and relics. No promises on specifics! We’ll let the project dictate how it should evolve.

Into Depth – Unified Triangle Shader

Into Depth is the name of a little game project I have been working on for over a year now. Its roots go back even further, to the 2D platformer I had started working on two years ago. The platformer ended up teaching me a lot about OpenGL, which I have continued to use in a fairly low-level manner in this new project.

Concept art of stone ruins with multiple characters running around.

Concept image by ChatGPT.

In a nutshell, the game is a simplification of the 2D platformer born both from managing complexity and wanting something I could reasonably make multiplayer and play with my friends online. Into Depth is a turn-based 2D sidescroller on a grid. Kind of like Gloomhaven, but vertical:

A square grid with solid walls, a hero in one tile, a rope extending several tiles down from an entryway in the ceiling, and a key. Two edges are marked 'a'.

An early mockup from my Google slides design doc.

The team of heroes descends into dungeons (e.g., down the rope in the image above) with the goal of retrieving relics. Oftentimes these relics are heavy and require hauling to get them back to the surface. Collected relics can be equipped and used in other levels, and teams may revisit levels in order to get even deeper. That’s the rough idea.

There are additional details, of course. For example, I want non-Euclidean geometry to play a role. In the screen above, the two edges labeled ‘a’ are joined, such that the hero’s view is:

Similar to the previous image, but expanded such that looking through edge 'a' is possible.

The hero can see the key from two different vantage directions because of the portal.

This is made more interesting by having each hero only be able to see what is in their line of sight. Not only does that add an element of discovery and the need to juggle teammates’ perspectives, it also enables stealth mechanics.

An in-game screenshot centered on the hero, with shadow casting blocking things not in sight. An enemy is visible twice due to portal effects.

Solid geometry blocks the view of what is behind it. This image was from before I went orthographic.

I am writing this myself, for fun, in my spare time. My primary goal is to learn how things work, and to actually get something playable for myself and my friends.

I am using C++ and VS Code, on Linux. I try to minimize library use, but I leverage SDL2 as my platform library, use GLAD for OpenGL, the GLM types, and several Sean Barrett STB libraries. Minimizing libraries includes avoiding the C++ standard containers, such as std::vector and std::map, and I’m using my own string view type, s8 (as defined by Chris Wellons), instead of std::string. I actually haven’t been using ImGui in this project, but I probably will once I have more of an editor.

Why these restrictions? Because I am a big fan of Casey Muratori, Jonathan Blow, and the do-it-yourself attitude of understanding your code. It is a great way to learn how it all works, and it is fun! Presumably it leads to smaller executables and faster code too.

The art you see here is programmer art, since I am a programmer. That is good enough for now. My long-term vision is to have a low-poly aesthetic and mostly not use sprites, since I should be able to author low-poly meshes myself. The scene will be 3D with effects like lighting, but will be rendered in orthographic since that just makes it so much easier to see and work with the grid.

Font Glyphs + Sprites

I was most recently working on some basic shaders for the game. I had just added font rendering with its own shader for rendering glyph quads. A UI system typically needs things beyond just font glyphs, like UI panels and sprite icons, and those are mostly just basic quads as well. It seemed obvious that I could share a shader for the dual uses of rendering font quads and rendering sprite quads:

A vertex consisting of pos, uv, and color. Uniforms are a proj_view matrix and a texture. A font glyph is defined as a 2-triangle quad with uv coordinates indexing into an atlas.

The shader is really simple. In the vertex shader, the position is just:

gl_Position = proj_view * vec4(aPosWorld, 1.0);

And then the fragment shader uses the input color as a componentwise multiplier after looking up the texture value via the UV coordinate:

vec4 tex_color = texture(u_texture, uv);
FragColor = color * tex_color;

The font atlas has white glyphs, which means this color multiplier produces text in whatever color we want. Colored sprites are typically rendered with color = [1,1,1,1] so that we don’t affect them, but you can also tint them by changing the values.

Rendering quads is also straightforward, but when you’re doing it all yourself, there is still a fair amount of stuff to keep track of. Every glyph in the font has a packed quad that tells us its size and its UV coordinates in the font atlas. We write the quad vertices into a VBO, and rather than writing six vertices to get the two triangles per quad, we only write the four unique ones and use an EBO to define the quads by indexing into the vertex array (as is standard practice):

A two-triangle quad showing four vertices in a VBO and six indices in an EBO. This uses 4 vertices * 9 f32s = 4*36 = 144 bytes in the VBO and 6*4 = 24 bytes in the EBO.
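Pre-populating that quad EBO is a small loop; a sketch (assuming the common 0,1,2 / 2,3,0 winding, which may differ from the project's actual vertex order):

```cpp
#include <cassert>
#include <cstdint>

// Pre-populate a quad index buffer: quad q uses vertices 4q..4q+3 and the
// standard two-triangle pattern 0,1,2 / 2,3,0 offset by 4q.
void FillQuadIndices(uint32_t* indices, int n_quads) {
    for (int q = 0; q < n_quads; q++) {
        const uint32_t v = uint32_t(q) * 4;  // first vertex of this quad
        uint32_t* out = indices + q * 6;
        out[0] = v + 0; out[1] = v + 1; out[2] = v + 2;
        out[3] = v + 2; out[4] = v + 3; out[5] = v + 0;
    }
}
```

Since the pattern never changes, this buffer can be filled once at startup and reused for every frame.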

Each frame, the backend fills up a vertex buffer and sends it to the VBO for rendering. This only needs to happen once per frame, if the sprites and the font glyphs are all packed into the same texture and everything uses the same projection and view matrix.

Render Commands

The core game code does not directly fill the vertex buffer. Instead, it issues render commands: simple, higher-level descriptions of what we want rendered. This keeps the core game code fast and leaves the task of deciding how to render everything to the backend.

A simple flowchart showing Core::Tick() followed by Backend::Render().

For example, a render text command is:

struct RenderCommandTextLine {
    FontHandle font_handle;
    glm::vec2 pos;       // Bottom-left of the first glyph
    f32 letter_size;     // Multiplier off of default
    f32 letter_spacing;  // Multiplier off of default
    s8 text;             // Must last until rendered
    glm::vec4 color;     // RGBA componentwise color multipliers
};

It is up to the backend to convert that higher-level specification into the actual quad vertices, moving that non-trivial logic out of the core game loop and into its own location. The render command thus acts as an API between the game and the backend. Any new backend that we support in the future must be able to render these things.
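As a hedged sketch of the kind of conversion the backend does: laying out per-glyph quad origins by advancing a pen position. The real code pulls glyph metrics (and UVs) from the packed font atlas, e.g. via stb_truetype; all names below are hypothetical:

```cpp
#include <cassert>

// Hypothetical, simplified glyph metrics; real metrics come from the atlas.
struct GlyphMetrics { float advance; };

// Computes the bottom-left x of each glyph's quad by advancing a pen.
// `metrics` is indexed by character code. Returns the number of glyphs laid out.
int LayoutLine(const char* text, const GlyphMetrics* metrics,
               float letter_size, float letter_spacing,
               float* out_x, int max_glyphs) {
    float pen = 0.0f;
    int n = 0;
    for (const char* c = text; *c && n < max_glyphs; c++, n++) {
        out_x[n] = pen;  // bottom-left x of this glyph's quad
        pen += metrics[(unsigned char)*c].advance * letter_size * letter_spacing;
    }
    return n;
}
```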

In the future, when I have partially transparent objects, the backend can do more sophisticated things, like rendering all opaque geometry first and then sorting the transparent objects by depth and rendering them in a second pass. Something like that is very hard to do when the core logic is rendering as it goes.

Including Triangles

So far, we have a shader that can render both font glyphs and sprites. That’s nice, but I want Into Depth to have a low-poly, flat-color aesthetic. I wanted to add UI panels that underlay the text, and I want those to be tessellated:

A text box backed by a panel consisting of a solid quad surrounded by a low-poly border.

A panel like that is composed of a relatively small number of triangles compared to a modern 3D character mesh (~40 vs. thousands). It should be cheap enough for us to directly write that geometry out every frame.

Our font + sprite shader can actually be used to draw basic triangles; we just need to point the UV coordinates of all three vertices at an all-white texel so that the color multiplier passes through unchanged. This is achieved by including a \(1 \times 1\) white sprite.
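For example, emitting a solid-color triangle through the shared shader just means pointing every vertex's UV at the white texel (a sketch with a simplified, hypothetical vertex struct):

```cpp
#include <cassert>

// Simplified hypothetical vertex: position, UV, and RGBA color multiplier.
struct Vtx { float x, y, u, v, r, g, b, a; };

// Writes one solid-color triangle; all three UVs point at the white texel.
void EmitSolidTriangle(Vtx* out,
                       float x0, float y0, float x1, float y1, float x2, float y2,
                       float white_u, float white_v,
                       float r, float g, float b, float a) {
    out[0] = {x0, y0, white_u, white_v, r, g, b, a};
    out[1] = {x1, y1, white_u, white_v, r, g, b, a};
    out[2] = {x2, y2, white_u, white_v, r, g, b, a};
}
```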

I thus define some additional render commands, such as RenderCommandPanel, that act as high-level descriptions of what should be rendered, and then have the backend construct the mesh live in the vertex buffer and send it over. Doing this does not use the quad-specific EBO, so for now I’m just using a glDrawArrays call instead of a glDrawElements call.

I did make one adjustment to the shader – rather than switching out the textures depending on my use, I just have an array of textures that I can leave bound and have each vertex declare which one its UV coordinate applies to:

The shader now includes `int texture_index` for each vertex and the uniforms now include an array of textures.

I pack my fonts into a font atlas that is uploaded as the first texture, then have a series of sprite atlases.

Sprite Packing

I need to bake my font glyphs and my sprites into a series of textures that my shader can then use. There are plenty of tools that do sprite packing for you. I was already using stb_truetype and stb_rect_pack to create my font atlas, so I decided to write my own sprite packer on top of stb_rect_pack.

A collection of rectangular images on the left are closely packed into a texture on the right.

Conceptually, it is really simple. I give it a set of individual sprites, and it keeps trying to pack as many as possible into a new texture until everything is packed. The only little gotcha is that a \(w \times h\) sprite is packed as a \((w+2) \times (h+2)\) rectangle so it can be padded with a 1-pixel extended border, which fights edge bleeding caused by inaccuracies in UV coordinates between mipmap levels.
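A sketch of that padding step, assuming a single-channel sprite for brevity (the real packer works on RGBA; the helper name is hypothetical):

```cpp
#include <cassert>
#include <cstdint>

// Copies a w x h sprite into the middle of a (w+2) x (h+2) rect, extending
// the border pixels outward by one via clamped sampling. One byte per pixel
// here for brevity.
void CopyWithBorder(const uint8_t* src, int w, int h, uint8_t* dst) {
    const int dw = w + 2;
    for (int y = -1; y <= h; y++) {
        const int sy = y < 0 ? 0 : (y >= h ? h - 1 : y);  // clamp to source rect
        for (int x = -1; x <= w; x++) {
            const int sx = x < 0 ? 0 : (x >= w ? w - 1 : x);
            dst[(y + 1) * dw + (x + 1)] = src[sy * w + sx];
        }
    }
}
```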

Shader Code Organization

Beyond assets, shaders themselves tend to involve more complexity than you’d think, because the code to use them ends up spread all over the codebase. The vertex and fragment shader definitions are often two separate .glsl files, which live in their own folder alongside the other shaders. The C++ side typically needs a vertex struct defined somewhere, and then we need both CPU-side vertex buffers and references to the GPU-side shader program IDs, VAO, VBO, and EBOs. Whenever we use a shader, we need to activate it, which involves more code that has to use these references and set various shader uniforms.

I’ve adopted a few enhancements to reduce the degree of code spread.

The first thing I did was unify the vertex and fragment shaders into a single .glsl file. Since we read the file in and upload it to the GPU before calling glCompileShader, we can extract the stage we need ourselves by prepending the appropriate #define. Writing both shaders in the same file helps keep their interdependencies consistent.

#version 330 core

#ifdef VERTEX_SHADER

layout (location = 0) in vec3 aPosWorld;
layout (location = 1) in vec4 aVertexColor; // rgba
layout (location = 2) in vec2 aTexCoords;

out vec4 vertex_color;
out vec2 tex_coords;

uniform mat4 model;
uniform mat4 view;
uniform mat4 projection;

void main()
{
    gl_Position = projection * view * model * vec4(aPosWorld, 1.0);
    vertex_color = aVertexColor;
    tex_coords = aTexCoords;
}

#endif

//////////////////////////////////////////////////////////////////////////

#ifdef FRAGMENT_SHADER

in vec4 vertex_color;
in vec2 tex_coords;

out vec4 FragColor;

uniform sampler2D atlas;

void main()
{
    float atlas_value = texture(atlas, tex_coords).r;
    FragColor = vec4(vertex_color.rgb, vertex_color.a * atlas_value);
}

#endif
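The splitting itself can be done by inserting each stage's #define right after the #version line before compiling, so only the matching #ifdef block survives preprocessing (a sketch; std::string is used here for brevity, even though the project itself avoids it):

```cpp
#include <cassert>
#include <string>

// Builds one stage's source from the combined file by inserting its #define
// immediately after the #version line. Assumes #version is the first line.
std::string BuildStageSource(const std::string& combined, const char* stage_define) {
    const size_t version_end = combined.find('\n') + 1;
    return combined.substr(0, version_end)
         + "#define " + stage_define + "\n"
         + combined.substr(version_end);
}
```

The result is then handed to glShaderSource and glCompileShader as usual, once with VERTEX_SHADER defined and once with FRAGMENT_SHADER.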

This was nice, but I actually ended up further simplifying by having a single .h file that defines the shader data relevant to the CPU side. The vertex and fragment shaders are still defined in one place, but as explicit strings, and we additionally define a vertex layout:

#pragma once

#include "util/strings.h"
#include "util/vertex_layout.h"

constexpr int kNumUnlitTriangleShaderTextures = 8;

// The vertex data structure (CPU-side)
struct UnlitTriangleVertex {
    glm::vec3 pos;
    glm::vec4 rgba;
    glm::vec2 uv;
    int texture_index;
};

// The shader definition
struct UnlitTriangleShaderDefinition {
    static constexpr const char* kName = "UnlitTriangle";

    static constexpr VertexElement kVertexElements[4] = {
        {ShaderDataType::Vec3F, "aPosWorld", false, offsetof(UnlitTriangleVertex, pos          )},
        {ShaderDataType::Vec4F, "aRGBA",     false, offsetof(UnlitTriangleVertex, rgba         )},
        {ShaderDataType::Vec2F, "aUV",       false, offsetof(UnlitTriangleVertex, uv           )},
        {ShaderDataType::I32,   "aTexIndex", false, offsetof(UnlitTriangleVertex, texture_index)},
    };

    static constexpr VertexLayout kVertexLayout = {
        kVertexElements,
        sizeof(kVertexElements) / sizeof(kVertexElements[0]),
        sizeof(UnlitTriangleVertex)
    };

    static constexpr const char* kVertexShaderSource = R"(
        #version 330 core

        layout (location = 0) in vec3 aPosWorld;
        layout (location = 1) in vec4 aRGBA;
        layout (location = 2) in vec2 aUV;
        layout (location = 3) in int  aTexIndex;

             out vec4 rgba;
             out vec2 uv;
        flat out int  texture_index; // 'flat' means do not interpolate

        uniform mat4 proj_view; // proj * view

        void main()
        {
            gl_Position = proj_view * vec4(aPosWorld, 1.0);
            rgba = aRGBA;
            uv = aUV;
            texture_index = aTexIndex;
        }
    )";

    static constexpr const char* kFragmentShaderSource = R"(
        #version 330 core

             in vec4 rgba;
             in vec2 uv;
        flat in int  texture_index;

        out vec4 FragColor;

        uniform sampler2D textures[8];

        void main()
        {
            vec4 texture_rgba;
            
            // The 'switch trick' is faster/safer on some drivers.
            switch(texture_index) {
                case 0:  texture_rgba = texture(textures[0], uv); break;
                case 1:  texture_rgba = texture(textures[1], uv); break;
                case 2:  texture_rgba = texture(textures[2], uv); break;
                case 3:  texture_rgba = texture(textures[3], uv); break;
                case 4:  texture_rgba = texture(textures[4], uv); break;
                case 5:  texture_rgba = texture(textures[5], uv); break;
                case 6:  texture_rgba = texture(textures[6], uv); break;
                case 7:  texture_rgba = texture(textures[7], uv); break;
                default: texture_rgba = vec4(1.0,0.0,1.0,1.0); break; // magenta
            }
            
            FragColor = rgba * texture_rgba;
        }
    )";
};

The vertex layout is just a programmatic representation of the inputs to the vertex shader. This avoids having VBO-creation code that lives elsewhere depend on this particular shader. We can go from code like this:

// vertex positions
u8 attribute_index = 0;
glEnableVertexAttribArray(attribute_index);
glVertexAttribPointer(attribute_index++, 3, GL_FLOAT, GL_FALSE, sizeof(Vertex),
                      (void*)offsetof(Vertex, position));
// vertex normals
glEnableVertexAttribArray(attribute_index);
glVertexAttribPointer(attribute_index++, 3, GL_FLOAT, GL_FALSE, sizeof(Vertex),
                      (void*)offsetof(Vertex, normal));
// uv coordinates
glEnableVertexAttribArray(attribute_index);
glVertexAttribPointer(attribute_index++, 2, GL_FLOAT, GL_FALSE, sizeof(Vertex),
                      (void*)offsetof(Vertex, uv));
// bone ids (max 4)
glEnableVertexAttribArray(attribute_index);
glVertexAttribIPointer(attribute_index++, 4, GL_INT, sizeof(Vertex),
                       (void*)offsetof(Vertex, bone_ids));
// bone weights (max 4)
glEnableVertexAttribArray(attribute_index);
glVertexAttribPointer(attribute_index++, 4, GL_FLOAT, GL_FALSE, sizeof(Vertex),
                      (void*)offsetof(Vertex, bone_weights));

to just running:

for (u32 i = 0; i < vertex_layout.n_elements; i++) {
    SetUpVertexAttrib(i, vertex_layout.elements[i], vertex_layout.stride);
}

where SetUpVertexAttrib looks at the element type and executes the appropriate logic.

void SetUpVertexAttrib(const u8 attribute_index, const VertexElement& element, const size_t stride) {
    glEnableVertexAttribArray(attribute_index);
    
    switch (element.type) {
    case (ShaderDataType::F32):
        glVertexAttribPointer(attribute_index, 1, GL_FLOAT, GL_FALSE, stride, (void*)element.offset);
        break;

    case (ShaderDataType::Vec2F):
        glVertexAttribPointer(attribute_index, 2, GL_FLOAT, GL_FALSE, stride, (void*)element.offset);
        break;

    case (ShaderDataType::Vec3F):
        glVertexAttribPointer(attribute_index, 3, GL_FLOAT, GL_FALSE, stride, (void*)element.offset);
        break;

    case (ShaderDataType::Vec4F):
        glVertexAttribPointer(attribute_index, 4, GL_FLOAT, GL_FALSE, stride, (void*)element.offset);
        break;

    case (ShaderDataType::I32):
        glVertexAttribIPointer(attribute_index, 1, GL_INT, stride, (void*)element.offset);
        break;
    }
}

My OpenGL backend has the general SetUpVertexAttrib, but it doesn’t have to change if we change the shader. The backend additionally has structs for the GPU object references, the shader uniforms, and their locations:

struct UnlitTriangleShaderRefs {
    GLuint shader_program_id;

    GLuint vao; // vertex array object
    GLuint vbo; // vertex buffer object
    size_t vbo_tri_capacity;  // Number of triangles we have space for in the VBO.

    GLuint ebo_quads; // prepopulated with quad indices
};

struct UnlitTriangleShaderUniforms {
    GLint uniformloc_proj_view; // proj * view from glGetUniformLocation
    GLint uniformloc_textures;

    GLuint texture_ids[kNumUnlitTriangleShaderTextures];
};

This organization scheme has, so far, been much nicer to work with.

Conclusion

When you are writing everything yourself, there is a lot to keep track of. Most of my previous attempts at coding a game “from scratch” struggled under mounting complexity. Unifying font, sprite, and basic unlit geometry rendering under one shader helps alleviate that complexity. Even more so, a good separation of concerns between the core game logic and the backend lets the game logic keep simple references to assets and focus on the game, while the backend figures out how to fill vertex buffers and ship them to the GPU.

Happy New Year!

Impulse Responses

This post expands on the previous post, which looked at collision detection and resolution for 2D polygons. By the end, we were able to apply impulses to colliding bodies to properly get them to bounce and spin off of one another. Not only will we take it further with friction impulses and joints, but we’ll pay special attention to how these impulses are derived, to emphasize that this line of reasoning is a tool that can be extended to new applications.

There is more math in this post than normal because I want to both provide a resource that I myself could have benefited from and show that this stuff can all be derived.

An impulse \(\boldsymbol{j}\) is an instantaneous change in momentum. It is very similar to a force, which influences velocity over time:

\[\boldsymbol{f} = m \boldsymbol{a} \quad \Rightarrow \quad \boldsymbol{v}' \approx \boldsymbol{v} + \frac{1}{m} \boldsymbol{f} \Delta t\]

An impulse applied to a rigid body at a point \(\boldsymbol{r}\) relative to the body’s center of mass similarly affects its velocity, but without the need for the timestep:

\[\boldsymbol{v}’ = \boldsymbol{v} + \frac{1}{m} \boldsymbol{j}\]

This impulse also affects the angular velocity:

\[\boldsymbol{\omega}’ = \boldsymbol{\omega} + \boldsymbol{I}^{-1} \boldsymbol{r} \times \boldsymbol{j}\]

This update models the angular velocity in three dimensions, treating it as a vector. We can work out the equivalent 2d update where we just keep track of a scalar \(\omega\):

\[\boldsymbol{r} \times \boldsymbol{j} = \begin{bmatrix} r_x \\ r_y \\ 0\end{bmatrix} \times \begin{bmatrix}j_x \\ j_y \\ 0\end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ r_x j_y - r_y j_x \end{bmatrix}\]

If we assume our inertia tensor is diagonal, and just pull out the z entry \(I_{zz}\), we get:

\[\omega’ = \omega + \frac{1}{I_{zz}} \left(r_x j_y – r_y j_x\right)\]

From here on out, if you see a scalar \(I\), it refers to \(I_{zz}\).
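These two updates are all we need to apply an impulse in code. Here is a minimal sketch; the `Body2D` struct and its field names are illustrative, not from the actual codebase:

```cpp
struct Vec2 { float x, y; };

struct Body2D {
    float inv_m; // inverse mass (0 for static bodies)
    float inv_I; // inverse moment of inertia about z (the scalar I above)
    Vec2 v;      // linear velocity
    float w;     // angular velocity
};

// Apply impulse j at offset r from the body's center of mass.
void ApplyImpulse(Body2D* body, Vec2 j, Vec2 r) {
    body->v.x += body->inv_m * j.x;
    body->v.y += body->inv_m * j.y;
    body->w += body->inv_I * (r.x * j.y - r.y * j.x); // 2D cross product
}
```

Storing the inverse mass and inverse inertia lets static geometry be represented with a value of zero, so impulses simply have no effect on it.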

The Normal Impulse

In the last blog post, we used an impulse along the impact normal to cause two colliding rigid bodies to bounce off each other. We're going to cover the same thing, but derive it instead of just taking the result for granted.

We have two bodies, A and B, and a contact point located at \(\boldsymbol{r}_A\) relative to body A’s center and at \(\boldsymbol{r}_B\) relative to body B’s center:

The velocity of the contact points on each body as they enter the collision is:

\[\begin{align}\boldsymbol{v}_{cA} & = \boldsymbol{v}_A + \boldsymbol{\omega}_A \times \boldsymbol{r}_A \\ \boldsymbol{v}_{cB} &= \boldsymbol{v}_B + \boldsymbol{\omega}_B \times \boldsymbol{r}_B \end{align}\]

In 2D that amounts to:

\[\boldsymbol{v}_{cA} = \begin{bmatrix}v_{A,x} \\ v_{A,y} \\ 0\end{bmatrix} + \begin{bmatrix}0 \\ 0 \\ \omega_A\end{bmatrix} \times \begin{bmatrix} r_{A,x} \\ r_{A,y} \\ 0\end{bmatrix} = \begin{bmatrix}v_{A,x} \\ v_{A,y} \\ 0\end{bmatrix} + \begin{bmatrix}-\omega_A r_{A,y} \\ \hphantom{-}\omega_A r_{A,x} \\ 0\end{bmatrix} \]

One can do this stuff in 3D and then drop or ignore the \(z\)-axis, but in my code I just stick with 2D:

Vec2f r; // contact point relative to the center of mass
Vec2f v; // linear velocity
f32 w;   // angular velocity
Vec2f v_c = v + w * Rotr(r);

The contact velocity — how fast the contact points collide — is the relative speed of the contact points:

\[\boldsymbol{v}_\text{rel} = \boldsymbol{v}_{cB} - \boldsymbol{v}_{cA}\]

To derive the impulse, we stipulate two things. First, that the impulse lies in the normal direction:

\[\boldsymbol{j} = j \hat{\boldsymbol{n}}\]

Second, that the relative velocity along the normal direction reverses sign and changes by a factor \(e\):

\[\boldsymbol{v}_\text{rel}' \cdot \hat{\boldsymbol{n}} = \> -e \> \boldsymbol{v}_\text{rel} \cdot \hat{\boldsymbol{n}}\]

This factor \(e\) is called the restitution. Setting it to zero produces a perfectly inelastic collision (because the resulting relative velocity in the normal direction is zero), whereas setting it to 1 produces a perfectly elastic collision that preserves the kinetic energy. Most objects use a value somewhere in-between.

We now have everything we need to derive the impulse. Let’s expand the left-hand side of the restitution equation:

\[\begin{align}&\boldsymbol{v}_\text{rel}' \cdot \hat{\boldsymbol{n}} \\ &\left(\boldsymbol{v}_{cB}' - \boldsymbol{v}_{cA}'\right) \cdot \hat{\boldsymbol{n}} \\ &\left[\left(\boldsymbol{v}_B' + \boldsymbol{\omega}_B' \times \boldsymbol{r}_B\right) - \left(\boldsymbol{v}_A' + \boldsymbol{\omega}_A' \times \boldsymbol{r}_A\right)\right] \cdot \hat{\boldsymbol{n}} \\ &\left[\left(\left[\boldsymbol{v}_B + \frac{1}{m_B}\boldsymbol{j}\right] + \left[\boldsymbol{\omega}_B + \boldsymbol{I}_B^{-1} \boldsymbol{r}_B \times \boldsymbol{j}\right] \times \boldsymbol{r}_B\right) - \left(\left[\boldsymbol{v}_A - \frac{1}{m_A}\boldsymbol{j}\right] + \left[\boldsymbol{\omega}_A - \boldsymbol{I}_A^{-1} \boldsymbol{r}_A \times \boldsymbol{j}\right] \times \boldsymbol{r}_A\right)\right] \cdot \hat{\boldsymbol{n}} \\ &\left[\left(\left[\boldsymbol{v}_B + j \frac{1}{m_B}\hat{\boldsymbol{n}}\right] + \left[\boldsymbol{\omega}_B + j \boldsymbol{I}_B^{-1} \boldsymbol{r}_B \times \hat{\boldsymbol{n}}\right] \times \boldsymbol{r}_B\right) - \left(\left[\boldsymbol{v}_A - j \frac{1}{m_A}\hat{\boldsymbol{n}}\right] + \left[\boldsymbol{\omega}_A - j \boldsymbol{I}_A^{-1} \boldsymbol{r}_A \times \hat{\boldsymbol{n}}\right] \times \boldsymbol{r}_A\right)\right] \cdot \hat{\boldsymbol{n}} \\ &\left[\boldsymbol{v}_\text{rel} + j \left(\frac{1}{m_B}\hat{\boldsymbol{n}} + \left[ \boldsymbol{I}_B^{-1} \boldsymbol{r}_B \times \hat{\boldsymbol{n}}\right] \times \boldsymbol{r}_B + \frac{1}{m_A}\hat{\boldsymbol{n}} + \left[ \boldsymbol{I}_A^{-1} \boldsymbol{r}_A \times \hat{\boldsymbol{n}}\right] \times \boldsymbol{r}_A\right)\right] \cdot \hat{\boldsymbol{n}} \\ &\boldsymbol{v}_\text{rel} \cdot \hat{\boldsymbol{n}} + j \left(\frac{1}{m_A}\hat{\boldsymbol{n}} + \frac{1}{m_B}\hat{\boldsymbol{n}} + \left[ \boldsymbol{I}_A^{-1} \boldsymbol{r}_A \times \hat{\boldsymbol{n}}\right] \times \boldsymbol{r}_A + \left[ \boldsymbol{I}_B^{-1} \boldsymbol{r}_B \times \hat{\boldsymbol{n}}\right] \times \boldsymbol{r}_B\right) \cdot \hat{\boldsymbol{n}} \\ &\boldsymbol{v}_\text{rel} \cdot \hat{\boldsymbol{n}} + j \left(\frac{1}{m_A}\hat{\boldsymbol{n}}\cdot \hat{\boldsymbol{n}} + \frac{1}{m_B}\hat{\boldsymbol{n}}\cdot \hat{\boldsymbol{n}} + \left[ \boldsymbol{I}_A^{-1} \boldsymbol{r}_A \times \hat{\boldsymbol{n}}\right] \times \boldsymbol{r}_A \cdot \hat{\boldsymbol{n}} + \left[ \boldsymbol{I}_B^{-1} \boldsymbol{r}_B \times \hat{\boldsymbol{n}}\right] \times \boldsymbol{r}_B \cdot \hat{\boldsymbol{n}}\right) \\ &\boldsymbol{v}_\text{rel} \cdot \hat{\boldsymbol{n}} + j \left(\frac{1}{m_A} + \frac{1}{m_B} + \left[ \boldsymbol{I}_A^{-1} \boldsymbol{r}_A \times \hat{\boldsymbol{n}}\right] \times \boldsymbol{r}_A \cdot \hat{\boldsymbol{n}} + \left[ \boldsymbol{I}_B^{-1} \boldsymbol{r}_B \times \hat{\boldsymbol{n}}\right] \times \boldsymbol{r}_B \cdot \hat{\boldsymbol{n}}\right) \end{align}\]

We then equate this to \(-e \> \boldsymbol{v}_\text{rel} \cdot \hat{\boldsymbol{n}}\) and solve for the impulse scalar \(j\), yielding:

\[j = \frac{-(1+e) \boldsymbol{v}_\text{rel} \cdot \hat{\boldsymbol{n}} }{ \frac{1}{m_A} + \frac{1}{m_B} + \left(\left[ \boldsymbol{I}_A^{-1} \boldsymbol{r}_A \times \hat{\boldsymbol{n}}\right] \times \boldsymbol{r}_A + \left[ \boldsymbol{I}_B^{-1} \boldsymbol{r}_B \times \hat{\boldsymbol{n}}\right] \times \boldsymbol{r}_B\right) \cdot \hat{\boldsymbol{n}} }\]

This is the equation that Wikipedia gives us.

The equation I gave in the last blog post looks a little different:

\[j = \frac{-(1+e)\boldsymbol{v}_\text{rel} \cdot \hat{\boldsymbol{n}}}{\frac{1}{m_A} + \frac{1}{m_B} + \frac{(\boldsymbol{r}_A \times \hat{\boldsymbol{n}})^2}{I_A} + \frac{(\boldsymbol{r}_B \times \hat{\boldsymbol{n}})^2}{I_B}}\]

It only differs in the terms involving the moment of inertia.

Let’s expand the Wikipedia version:

\[\begin{align}\left[\left(\boldsymbol{I}^{-1} \boldsymbol{r} \times \hat{\boldsymbol{n}}\right) \times \boldsymbol{r}\right] \cdot \hat{\boldsymbol{n}} &= \left(\left(\frac{1}{I}\begin{bmatrix}r_x \\ r_y \\ 0\end{bmatrix} \times \begin{bmatrix} \hat{n}_x \\ \hat{n}_y \\ 0 \end{bmatrix}\right) \times \begin{bmatrix}r_x \\ r_y \\ 0\end{bmatrix}\right) \cdot \begin{bmatrix}\hat{n}_x \\ \hat{n}_y \\ 0\end{bmatrix} \\ &= \left(\frac{1}{I}\begin{bmatrix}0 \\ 0 \\ r_x \hat{n}_y - r_y \hat{n}_x\end{bmatrix} \times \begin{bmatrix}r_x \\ r_y \\ 0\end{bmatrix}\right) \cdot \begin{bmatrix}\hat{n}_x \\ \hat{n}_y \\ 0\end{bmatrix} \\ &= \frac{1}{I}\begin{bmatrix}-r_x r_y \hat{n}_y + r_y^2 \hat{n}_x \\ r_x^2 \hat{n}_y - r_x r_y \hat{n}_x \\ 0 \end{bmatrix} \cdot \begin{bmatrix}\hat{n}_x \\ \hat{n}_y \\ 0\end{bmatrix} \\ &= \frac{1}{I}\left( r_x^2 \hat{n}_y^2 - 2 r_x r_y \hat{n}_x \hat{n}_y + r_y^2 \hat{n}_x^2 \right) \end{align}\]

We get the same thing if we expand the other version:

\[\begin{align}\frac{1}{I}\left(\boldsymbol{r} \times \hat{\boldsymbol{n}}\right)^2 &= \frac{1}{I}\left(r_x \hat{n}_y - r_y \hat{n}_x\right)^2 \\ &= \frac{1}{I}\left( r_x^2 \hat{n}_y^2 - 2 r_x r_y \hat{n}_x \hat{n}_y + r_y^2 \hat{n}_x^2 \right)\end{align}\]

The two formulations are equal, provided that we're working in 2D and that the moment of inertia tensor is diagonal with its \(z\) entry equal to \(I\). The Wikipedia version is more general, and can handle 3D collisions and arbitrary inertia tensors.

The normal impulse \(j\) in 2D is thus

\[j = \> -(1+e) \> m_\text{eff} \> \boldsymbol{v}_\text{rel} \cdot \hat{\boldsymbol{n}} \]

where the effective mass is

\[m_\text{eff} = \left(\frac{1}{m_A} + \frac{1}{m_B} + \frac{(\boldsymbol{r}_A \times \hat{\boldsymbol{n}})^2}{I_A} + \frac{(\boldsymbol{r}_B \times \hat{\boldsymbol{n}})^2}{I_B}\right)^{-1}\]

After calculating it, we apply an impulse \(j \hat{\boldsymbol{n}}\) to body B and the negated impulse to body A. Note that we only apply an impulse if the relative velocity is causing the bodies to move into contact (\(\boldsymbol{v}_\text{rel} \cdot \hat{\boldsymbol{n}} < 0\)), otherwise the contact point already has a separating velocity.

The restitution \(e\) will depend on the colliding bodies. One convenient way to get it is to assign a restitution to each rigid body, and then use the maximum of the two for the collision. That allows bounciness to win out:

\[e = \max\left(e_A, e_B\right)\]
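Putting the pieces together, the scalar normal impulse can be sketched as follows. The function and parameter names here are mine for illustration, not from the actual codebase:

```cpp
struct Vec2 { float x, y; };

float Dot(Vec2 a, Vec2 b) { return a.x * b.x + a.y * b.y; }
float Cross(Vec2 a, Vec2 b) { return a.x * b.y - a.y * b.x; } // 2D cross product

// Scalar normal impulse for a contact with unit normal n, relative contact
// velocity v_rel = v_cB - v_cA, restitution e, inverse masses and inertias,
// and contact offsets r_a and r_b from each body's center of mass.
// Returns 0 if the contact is already separating.
float NormalImpulse(Vec2 v_rel, Vec2 n, float e,
                    float inv_m_a, float inv_m_b,
                    float inv_I_a, float inv_I_b,
                    Vec2 r_a, Vec2 r_b) {
    const float v_n = Dot(v_rel, n);
    if (v_n >= 0.0f) return 0.0f; // already separating, no impulse
    const float ra_x_n = Cross(r_a, n);
    const float rb_x_n = Cross(r_b, n);
    const float inv_m_eff = inv_m_a + inv_m_b
                          + inv_I_a * ra_x_n * ra_x_n
                          + inv_I_b * rb_x_n * rb_x_n;
    return -(1.0f + e) * v_n / inv_m_eff;
}
```

The returned scalar is then applied as \(j \hat{\boldsymbol{n}}\) to body B and as \(-j \hat{\boldsymbol{n}}\) to body A.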

Friction Impulse

The normal impulse takes care of the relative penetration speed, but does nothing in the tangent direction along the contact line. We typically want shapes that slide along each other to have friction — that is what the friction impulse, or tangent impulse, is for.

Wikipedia and other websites will at this point show you a fancy diagram for Coulomb friction depicting the friction cone. I don’t find that to be particularly illuminating.

Instead, I simply like to think about Coulomb friction as:

  • compute a tangent impulse that causes the relative tangent velocity to be zero (\(e=0\))
  • if the required tangent impulse is sufficiently small, apply it (static friction)
  • if the required tangent impulse is too large, cap it out and apply that instead (dynamic friction)

Let’s derive the update ourselves again.

The friction impulse acts in the tangent direction \(\hat{\boldsymbol{t}}\), which we can get by rotating the normal. We once again have a scalar impulse:

\[\boldsymbol{j} = j \hat{\boldsymbol{t}}\]

We compute the tangent impulse that causes a zero relative tangent velocity (\(e = 0\)).

These two conditions are structurally the same as the normal impulse, except \(e\) is fixed. We can directly apply our previous equations to get:

\[j = \> – m_\text{eff} \> \boldsymbol{v}_\text{rel} \cdot \hat{\boldsymbol{t}}\]

where the effective mass is now in the tangent direction:

\[m_\text{eff} = \left(\frac{1}{m_A} + \frac{1}{m_B} + \frac{(\boldsymbol{r}_A \times \hat{\boldsymbol{t}})^2}{I_A} + \frac{(\boldsymbol{r}_B \times \hat{\boldsymbol{t}})^2}{I_B}\right)^{-1}\]

As long as this impulse is small enough (in magnitude), we apply it in order to prevent any tangential sliding. If it is too large, we switch from static friction to dynamic friction:

\[j = \begin{cases} j & \text{if } |j| \leq j_\text{static} \\ \operatorname{sign}(j) \> j_\text{dynamic} & \text{otherwise} \end{cases}\]

Easy peasy.

The static and dynamic friction values are typically multiples of the normal impulse. We multiply the normal impulse by \(\mu_\text{static}\) to get the static friction impulse threshold and by \(\mu_\text{dynamic}\) to get the dynamic friction impulse threshold. Like the restitution, these are properties of each body, and we can obtain values to use for a pair of bodies by combining their values. A common recommendation is the square root of the product (the geometric mean), which lets extremely low-friction (slippery) values dominate:

\[\mu_\text{static} = \sqrt{\vphantom{P}\mu_{A,\text{static}} \cdot \mu_{B,\text{static}}}\]
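The clamping step itself is tiny. A sketch, using a hypothetical helper (the actual codebase may structure this differently):

```cpp
// Apply Coulomb friction to a computed tangent impulse j_t, given the
// magnitude j_n of the normal impulse and the combined coefficients
// mu_s (static) and mu_d (dynamic). All names here are illustrative.
float ClampFrictionImpulse(float j_t, float j_n, float mu_s, float mu_d) {
    const float limit_static = mu_s * j_n;
    if (j_t >= -limit_static && j_t <= limit_static) {
        return j_t; // static friction: cancel tangential sliding entirely
    }
    // dynamic friction: cap the magnitude, keep the sign
    const float limit_dynamic = mu_d * j_n;
    return (j_t > 0.0f) ? limit_dynamic : -limit_dynamic;
}
```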

Revolute Joints

There are other things we can do with impulses. One of the main ones is to model attachment points between rigid bodies. These revolute joints allow rotation, but force the bodies to maintain a constant distance vector to the joint:

Revolute joints are useful for all sorts of things, including swinging:

I didn’t have a lot of luck finding information on how to enforce revolute joints between rigid bodies in 2D. Thankfully, we can derive the necessary impulse ourselves.

The property we want to enforce is that the contact velocity at the joint after the impulse be zero:

\[\boldsymbol{v}_\text{rel}’ = \boldsymbol{0}\]

We can expand that out:

\[\begin{align}\boldsymbol{v}_\text{rel}' &= \boldsymbol{0} \\ \boldsymbol{v}_{cB}' - \boldsymbol{v}_{cA}' &= \boldsymbol{0} \\ \left(\boldsymbol{v}_B' + \boldsymbol{\omega}_B' \times \boldsymbol{r}_B\right) - \left(\boldsymbol{v}_A' + \boldsymbol{\omega}_A' \times \boldsymbol{r}_A\right) &= \boldsymbol{0} \\ \left(\left[\boldsymbol{v}_B + \frac{1}{m_B}\boldsymbol{j}\right] + \left[\boldsymbol{\omega}_B + \boldsymbol{I}_B^{-1} \boldsymbol{r}_B \times \boldsymbol{j}\right] \times \boldsymbol{r}_B\right) - \left(\left[\boldsymbol{v}_A - \frac{1}{m_A}\boldsymbol{j}\right] + \left[\boldsymbol{\omega}_A - \boldsymbol{I}_A^{-1} \boldsymbol{r}_A \times \boldsymbol{j}\right] \times \boldsymbol{r}_A\right) &= \boldsymbol{0} \\ \boldsymbol{v}_\text{rel} + \left(\frac{1}{m_B}\boldsymbol{j} + \left[\boldsymbol{I}_B^{-1} \boldsymbol{r}_B \times \boldsymbol{j}\right] \times \boldsymbol{r}_B\right) + \left(\frac{1}{m_A}\boldsymbol{j} + \left[\boldsymbol{I}_A^{-1} \boldsymbol{r}_A \times \boldsymbol{j}\right] \times \boldsymbol{r}_A\right) &= \boldsymbol{0} \\ \boldsymbol{v}_\text{rel} + \left(\frac{1}{m_A} + \frac{1}{m_B}\right)\boldsymbol{j} + \left(\boldsymbol{I}_A^{-1} \boldsymbol{r}_A \times \boldsymbol{j}\right) \times \boldsymbol{r}_A + \left(\boldsymbol{I}_B^{-1} \boldsymbol{r}_B \times \boldsymbol{j}\right) \times \boldsymbol{r}_B &= \boldsymbol{0}\end{align}\]

If we work out the cross products, then we end up with the following system of equations:

\[\begin{bmatrix}\frac{1}{m_A} + \frac{1}{m_B} + \frac{1}{I_A}r_{Ay}^2 + \frac{1}{I_B}r_{By}^2 & -\left(\frac{1}{I_A}r_{Ax}r_{Ay} + \frac{1}{I_B}r_{Bx}r_{By}\right) \\ -\left(\frac{1}{I_A}r_{Ax}r_{Ay} + \frac{1}{I_B}r_{Bx}r_{By}\right) & \frac{1}{m_A} + \frac{1}{m_B} + \frac{1}{I_A}r_{Ax}^2 + \frac{1}{I_B}r_{Bx}^2\end{bmatrix}\begin{bmatrix} j_x \\ j_y\end{bmatrix} = \begin{bmatrix}-v_{\text{rel},x} \\ -v_{\text{rel},y}\end{bmatrix}\]

We can write a simple function that can solve such a \(2 \times 2\) linear system:

// Solve a 2x2 system of linear equations
// [a b][x1] = [e]
// [c d][x2] = [f]
// This method assumes that it will succeed, which
// requires that the matrix be invertible.
template <typename T>
void Solve(T a, T b, T c, T d, T e, T f, T& x1, T& x2) {
    T det = d * a - c * b;
    x2 = (a * f - c * e) / det;
    x1 = (e - b * x2) / a;
}

and then use that to solve for our impulse.

// Compute the effective mass components.
f32 m_eff_11 = inv_m1 + inv_m2 + inv_I1 * r1.y * r1.y + inv_I2 * r2.y * r2.y;
f32 m_eff_12 = -(inv_I1 * r1.x * r1.y + inv_I2 * r2.x * r2.y);
f32 m_eff_22 = inv_m1 + inv_m2 + inv_I1 * r1.x * r1.x + inv_I2 * r2.x * r2.x;

// Solve for the impulse:
//  [m_eff_11   m_eff_12][j.x] = [-v_rel.x]
//  [m_eff_12   m_eff_22][j.y] = [-v_rel.y]
Vec2f j;
Solve(m_eff_11, m_eff_12, m_eff_12, m_eff_22, -v_rel.x, -v_rel.y, j.x, j.y);

We can then apply the resulting impulse to both bodies and perform a positional correction. I weight the correction such that the target joint location is biased toward the heavier object (that way joints on infinite-mass objects never move).
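A sketch of how such a mass-weighted correction could be split between the two bodies. This is my reading of the idea, not the actual implementation, and all names are illustrative:

```cpp
struct Vec2f { float x, y; };

// Split a positional correction between two bodies in proportion to their
// inverse masses. delta is the vector from body A's joint anchor to body
// B's: A moves forward along it and B moves back, so the anchors meet, and
// infinite-mass bodies (inverse mass 0) never move.
void SplitCorrection(Vec2f delta, float inv_m_a, float inv_m_b,
                     Vec2f* move_a, Vec2f* move_b) {
    const float total = inv_m_a + inv_m_b;
    const float wa = (total > 0.0f) ? inv_m_a / total : 0.0f;
    const float wb = (total > 0.0f) ? inv_m_b / total : 0.0f;
    *move_a = Vec2f{delta.x * wa, delta.y * wa};
    *move_b = Vec2f{-delta.x * wb, -delta.y * wb};
}
```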

Conclusion

This post set the mathematical foundation for impulses, then derived the normal, friction, and revolute joint impulses. In each case, we looked at what was being preserved, formulated that mathematically, and massaged the equations to back out the impulse. While the equations were long at times, they did not lead us astray and we were able to find our way to some solid physics.

2D Collision Detection and Resolution

This post is an update on the sidescroller project. In short, I still find the project interesting, and as long as that is true, I want to work toward my original goal of a 2D sidescroller with arbitrary level geometry (i.e., not based on tiles or AABBs). I’ve achieved my stated goal of learning how to use the GPU and roll my own skeletal animation system, but I obviously haven’t built a game. A full game was, and still is, out of the question, but it would be cool to have something with a game loop and perhaps a complete level and a boss.

Fleshing out the game further requires fleshing out more of the game-play design. As such, physics must happen.

This blog post covers how I’m going about collision detection and resolution for convex polygons.

The Problem

We wish to simulate objects moving around in a 2D space. While we might eventually have fancy objects like ropes, let’s start by restricting ourselves to convex polygons:

A crude rendition of a 2D platformer scene (left) and the convex polygons in it (right).

As we can see, nonconvex polygons can be represented by multiple convex polygons, so this isn’t all that restrictive.

Most of the polygons will be static. The floors and ceilings, for the most part, do not move. Entities like the player, monsters, eventual ropes, crates, etc. will need to move. That movement needs to obey the laws of physics, which mainly means integrating accelerations into velocities, velocities into positions, and handling collisions.

More formally, we have a scene comprised of \(m\) rigid bodies, some of which are static and some of which are dynamic. Each rigid body is defined by a convex polygon and a 2D state. Each convex polygon is represented as a list of vertices in counter-clockwise (right-hand) order:
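As a sketch, a fixed-capacity polygon type consistent with the fields the later snippets use (`n_pts`, `pts`) might look like this; the capacity of 8 is an arbitrary assumption for illustration:

```cpp
typedef float f32;
typedef unsigned char u8;

struct Vec2f { f32 x, y; };

// A fixed-capacity vertex list keeps the type trivially copyable, which
// makes serializing levels to disk straightforward. The capacity is an
// arbitrary choice for this sketch.
constexpr int kMaxPolygonPts = 8;

struct ConvexPolygon {
    u8 n_pts;                  // number of vertices actually in use
    Vec2f pts[kMaxPolygonPts]; // vertices in counter-clockwise order
};
```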

More serious implementations might include additional information, like precomputed edge normals or an enclosing axis-aligned bounding box for broad-phase collision detection, but let's leave those optimizations out for now. And yes, I could have used a std::vector, but I like having trivially copyable types since I'm serializing and deserializing my levels to disk and I don't want to think too hard when doing that.

I made the simplification of keeping track of static bodies separately from dynamic rigid bodies. For the static bodies, the polygon vertices are directly defined in the world frame. For the dynamic bodies, the polygon vertices are defined in the body frame of the entity, and are then transformed according to the entity’s state:

A vertex \(\boldsymbol{v}\) in the body frame of a polygon can be transformed into the world frame according to:

\[\begin{bmatrix}v_x \\ v_y\end{bmatrix}_\text{world} = \begin{bmatrix}\cos \theta & -\sin \theta \\ \sin \theta & \hphantom{-}\cos \theta\end{bmatrix}\begin{bmatrix}v_x \\ v_y\end{bmatrix}_\text{body} + \begin{bmatrix}p_x \\ p_y\end{bmatrix}\]
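This transform is straightforward in code. A sketch, assuming a simple `Vec2f` type:

```cpp
#include <cmath>

struct Vec2f { float x, y; };

// Transform a body-frame vertex v into the world frame given the body's
// position p and orientation theta: rotate, then translate.
Vec2f BodyToWorld(Vec2f v, Vec2f p, float theta) {
    const float c = std::cos(theta);
    const float s = std::sin(theta);
    return Vec2f{c * v.x - s * v.y + p.x,
                 s * v.x + c * v.y + p.y};
}
```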

As the physics evolve, these states change, but the polygons do not. In every frame, we’ll accumulate forces and torques on our bodies and use them to update the state. We’ll also have to perform collision detection and handle any resulting forces and torques.

State Propagation

We need to simulate the laws of motion. That is, solve \(F = m \> a\) and compute

\[\begin{align}v(t + \Delta t) &= v(t) + \int_t^{t+\Delta t} a(\tau) \> d\tau \\ p(t + \Delta t) &= p(t) + \int_t^{t+\Delta t} v(\tau) \> d\tau \end{align}\]

except in 2 dimensions with torques and rotations.

We can't do exact integration, so we'll have to rely on approximations. Games tend to run at pretty high frame rates, so it's reasonable to approximate the update across a single frame using the Euler method:

\[\begin{align} u(t + \Delta t) &= u(t) + a_x(t) \Delta t \\ v(t + \Delta t) &= v(t) + a_y(t) \Delta t \\ \omega(t + \Delta t) &= \omega(t) + \alpha(t) \Delta t \\ x(t + \Delta t) &= x(t) + u(t) \Delta t \\ y(t + \Delta t) &= y(t) + v(t) \Delta t \\ \theta(t + \Delta t) &= \theta(t) + \omega(t) \Delta t\end{align} \]

or written a bit more compactly:

\[\begin{align} u’ &= u + a_x \Delta t \\ v’ &= v + a_y \Delta t \\ \omega’ &= \omega + \alpha \Delta t \\ x’ &= x + u \Delta t \\ y’ &= y + v \Delta t \\ \theta’ &= \theta + \omega \Delta t\end{align} \]

Apparently, doing this would work okay, but using symplectic Euler integration is a bit more stable. It uses the updated velocities for the position update:

\[\begin{align} u’ &= u + a_x \Delta t \\ v’ &= v + a_y \Delta t \\ \omega’ &= \omega + \alpha \Delta t \\ x’ &= x + u’ \Delta t \\ y’ &= y + v’ \Delta t \\ \theta’ &= \theta + \omega’ \Delta t\end{align} \]
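In code, the symplectic Euler step only differs from the naive version in the order of the updates. A sketch, with illustrative field names:

```cpp
struct State2D {
    float x, y, theta; // position and orientation
    float u, v, w;     // linear and angular velocities
};

// One symplectic Euler step: update the velocities first, then integrate
// the positions using the *updated* velocities.
void SymplecticEulerStep(State2D* s, float ax, float ay, float alpha, float dt) {
    s->u += ax * dt;
    s->v += ay * dt;
    s->w += alpha * dt;
    s->x += s->u * dt;
    s->y += s->v * dt;
    s->theta += s->w * dt;
}
```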

Our physics system will perform this update for every dynamic body in every game tick. After doing so, it needs to handle collisions.

Properly handling collisions is a whole big field and there is potentially a lot to talk about. In an ideal sense, if we were to identify a collision, we’d back up in time to where the collision first occurred, and handle it there. That’s more accurate, but also complicated. In a scene with a bunch of objects and a bunch of collisions, we don’t want to be sorting by a bunch of collision times.

Instead, I’ll be taking inspiration from Verlet integration and enforce constraints via iteration. A single tick of the physics system becomes:

void PhysicsEngine::Tick(f32 dt) {
    // Euler integration
    for (auto& rigid_body : rigid_bodies_) {
        EulerIntegrationStep(&rigid_body.state, dt);
    }

    for (int i = 0; i < 4; i++) {
        EnforceConstraints();
    }
}

We're running Euler integration once, then running several iterations of constraint enforcement! This iterative approach helps things shake out because oftentimes, when you resolve one constraint, you start violating another. It's a pretty messy way to do things, but it's so simple to implement and can be surprisingly robust.

What constraints will we enforce? For now we’re only going to prevent polygons from interpenetrating. In general we can enforce a bunch of other types of constraints, most notably joints between bodies.

Collision Detection

Alright, all that pretext to get to the meat and potatoes of this post.

In order to prevent our rigid bodies from interpenetrating, we need to detect when they collide:

We can do this in a pairwise manner, only considering the world-space polygons of two rigid bodies at once.

We start by observing that two polygons are in collision if they share an interior. While that may be technically true, it doesn’t really tell us how to efficiently compute intersections.

A more useful observation is that two polygons A and B are in collision if

  • A contains any vertex of B, or
  • B contains any vertex of A, or
  • any edge in A intersects any edge in B.

This definition actually works for nonconvex as well as convex polygons (but don't ask me about degenerate polygons that self-intersect and such). Unfortunately, it's a bit inefficient. If A has \(n\) vertices and B has \(m\) vertices, then we'd be doing \(n+m\) polygon–point collision checks and then \(nm\) edge intersection checks. If we keep our polygons simple then it isn't all that bad, but there are better methods.
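For illustration, the point-containment check from the first bullet can be done with cross products for a convex counter-clockwise polygon. This is a sketch, not the codebase's actual routine:

```cpp
struct Vec2f { float x, y; };

// Point-in-convex-polygon test. With counter-clockwise winding, p is
// inside iff it lies on the left of (or on) every directed edge.
bool ContainsPoint(const Vec2f* pts, int n, Vec2f p) {
    for (int i = 0; i < n; i++) {
        const Vec2f a = pts[i];
        const Vec2f b = pts[(i + 1) % n];
        const float cross =
            (b.x - a.x) * (p.y - a.y) - (b.y - a.y) * (p.x - a.x);
        if (cross < 0.0f) {
            return false; // p is strictly to the right of edge ab
        }
    }
    return true;
}
```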

One go-to method for 2D polygon collision checking is the separating axis theorem, which states that two convex polygons are not in collision if and only if you can draw a separating line between them. (Sometimes these things seem kind of obvious in retrospect…):

We can draw a separating line between the two polygons on the left, but cannot for the colliding ones on the right.

One direct consequence of the separating axis theorem is that if you can draw a separating line, then the projection of each shape perpendicular to that line will also have a gap:

Similarly, if you project the two shapes in any direction that they cannot be separated in, then their projections will not have a gap:

For convex polygons that are not colliding, a separating axis can always be constructed along one of the polygons' edges. That is great: it means we only need to consider \(m + n\) possible separating axes.

Our collision detection algorithm is thus:

  • project both polygons onto all edge normals
  • if there are any gaps, then they do not intersect
  • otherwise, they intersect

Projecting a polygon along an axis is pretty easy:

struct PolygonProjection {
    f32 lo;
    f32 hi;
};

PolygonProjection ProjectPolygonAlongDirection(
    const ConvexPolygon& polygon,
    const Vec2f normal)
{
    PolygonProjection proj;
    proj.lo = std::numeric_limits<f32>::infinity();
    proj.hi = -std::numeric_limits<f32>::infinity();
    for (u8 i = 0; i < polygon.n_pts; i++) {
        const f32 d = Dot(normal, polygon.pts[i]);
        proj.lo = std::min(d, proj.lo);
        proj.hi = std::max(d, proj.hi);
    }
    return proj;
}

Notice that the direction we pass in is the edge normal, not the edge tangent. This is because the normal is used to measure the distance away from the separating axis.

Two projections have a gap if proj_a.hi <= proj_b.lo || proj_b.hi <= proj_a.lo. Otherwise, they overlap.

Our collision detection algorithm is thus:

bool NoGapInAxesOfA(
    const ConvexPolygon& A,
    const ConvexPolygon& B)
{
    Vec2f a1 = A.pts[A.n_pts-1];
    for (u8 i = 0; i < A.n_pts; i++) {
        const Vec2f a2 = A.pts[i];
        const Vec2f tangent = Normalize(a2 - a1);
        const Vec2f normal = Rotr(tangent);

        PolygonProjection proj_a =
            ProjectPolygonAlongDirection(A, normal);
        PolygonProjection proj_b =
            ProjectPolygonAlongDirection(B, normal);
        if (proj_a.hi <= proj_b.lo || proj_b.hi <= proj_a.lo) {
            return false; // found a gap, thus a separating axis
        }
        a1 = a2;
    }
    return true; // no separating axis among A's edges
}

bool Collides(
    const ConvexPolygon& A,
    const ConvexPolygon& B)
{
   return NoGapInAxesOfA(A, B) && NoGapInAxesOfA(B, A);
}

Penetration Vectors

Once we have determined that two polygons are colliding, we need to separate them. In order to do that, we need to find the penetration vector, or the shortest vector that can be added to one of the polygons such that they no longer overlap:

Rather than figuring this out from scratch, let’s modify our collision-checking routine to also figure this out.

When we project the two polygons into a direction, then the overlap of their projections is the distance we would need to move them in the projection direction to separate them:

We can thus find the penetration vector by keeping track of the smallest overlap among all of the projections we do!

struct CollisionResult {
    bool collides;   // whether there is a collision
    f32 penetration; // the length of the penetration vector
    Vec2f dir;       // the unit penetration direction
};

// A helper function that projects along all of the edge normals of A.
// Sets retval->collides to false if it finds a separating axis.
void CollisionHelper(
    const ConvexPolygon& A,
    const ConvexPolygon& B,
    CollisionResult* retval)
{
    Vec2f a1 = A.pts[A.n_pts-1];
    for (u8 i = 0; i < A.n_pts; i++) {
        const Vec2f a2 = A.pts[i];
        const Vec2f tangent = Normalize(a2 - a1);
        const Vec2f normal = Rotr(tangent);

        PolygonProjection proj_a =
            ProjectPolygonAlongDirection(A, normal);
        PolygonProjection proj_b =
            ProjectPolygonAlongDirection(B, normal);
        if (proj_a.hi <= proj_b.lo || proj_b.hi <= proj_a.lo) {
            retval->collides = false;
            return; // found a gap, thus no collision
        }

        // Keep track of the smallest penetration
        f32 penetration = std::min(
            std::abs(proj_a.lo - proj_b.hi),
            std::abs(proj_a.hi - proj_b.lo));
        if (penetration < retval->penetration) {
            retval->penetration = penetration;
            retval->dir = normal;
        }
        a1 = a2;
    }
}

CollisionResult Collides(
    const ConvexPolygon& A,
    const ConvexPolygon& B)
{
   CollisionResult retval = {};
   retval.collides = true;
   retval.penetration = std::numeric_limits<f32>::infinity();
   CollisionHelper(A, B, &retval);
   if (!retval.collides) return retval;
   CollisionHelper(B, A, &retval);
   return retval;
}

Collision Manifolds

Our collision resolution logic can separate the polygons, but that isn't enough to get realistic physics. We want to affect not just their positions, but also their velocities and angular states. A collision should result in an impulse that changes each body's linear and angular velocity.

The torque is the cross product of the radius from the rigid body's center of mass with the applied force. That means we need the radius, which means knowing where the force is applied. So far we've only calculated the penetration normal, not where the penetration occurred.

Properly determining a collision response requires computing the contact point or contact manifold. Erin Catto has a nice GDC 2007 slide deck on this, but I wasn't able to track down the actual talk that goes along with it, so it leaves something to be desired.

The contact manifold represents the points at which two objects come into contact. Ideally that would be the very first points that touch as two objects collide, but because of our discrete time steps, we have to work with partial overlaps:

In some cases, a contact manifold should be an entire edge, but we’ll be approximating it using at most two points.

The contact manifold can be constructed using either polygon as a reference. We’ll be moving the polygons out of collision afterwards, so the contact manifold will be roughly the same independent of which polygon we choose:

The reference polygon will be the one with the edge that the penetration vector is normal to. We’ll call that edge the reference edge. The other polygon is called the incident polygon, and we’ll need to identify an incident edge that crosses the reference edge.

We will start by running our latest collision detection algorithm, which gives us a penetration unit vector and magnitude. We have to slightly modify that code to keep track of which polygon it came from, so that we know which polygon is the reference polygon.

If it determines that there is no collision, we are done. Otherwise, we then identify the incident edge, which shall be the edge in the incident polygon whose outward normal is most opposed to the reference edge normal:

int inc_edge_index = 0;
int inc_edge_index_succ = 0;
f32 min_dot = std::numeric_limits<f32>::infinity();

int i_prev = I->n_pts - 1;
for (u8 i = 0; i < I->n_pts; i++) {
    common::Vec2f normal_i = Rotl(Normalize(I->pts[i] - I->pts[i_prev]));
    f32 dot = common::Dot(normal, normal_i);
    if (dot < min_dot) {
        min_dot = dot;
        inc_edge_index = i_prev;
        inc_edge_index_succ = i;
    }
    i_prev = i;
}

We then clip the incident edge to lie between the end caps of the reference edge:

This clipping is done in two steps — first by clipping the incident edge (which is a line segment) to lie on the negative side of the left halfspace and then again to lie on the negative side of the right halfspace. We can clip a line segment to a halfspace using the dot product of the endpoints with the halfspace normal vector (which is tangent to the reference edge) and comparing that to the dot product of the reference edge endpoint.

struct ClipResult {
    int n_pts;
    Vec2f pts[2];
};
ClipResult ClipSegmentToHalfspace(
    const Vec2f a,
    const Vec2f b,
    const Vec2f plane_normal,
    const f32 plane_offset) {

    ClipResult retval = {};

    // Compute the signed distances of the endpoints to the plane
    f32 dist_a = Dot(plane_normal, a) - plane_offset;
    f32 dist_b = Dot(plane_normal, b) - plane_offset;

    // If a point is on the negative side of the plane, keep it
    if (dist_a <= 0.0f) {
        retval.pts[retval.n_pts] = a;
        retval.n_pts += 1;
    }
    if (dist_b <= 0.0f) {
        retval.pts[retval.n_pts] = b;
        retval.n_pts += 1;
    }

    // If the points are on different sides of the plane,
    // then we need to find the intersection point.
    // (We don't have to do anything in the case that the
    // points are both on the positive side of the plane.)
    if (dist_a * dist_b < 0.0f) {
        // Keep the intersection point
        f32 interp = dist_a / (dist_a - dist_b);
        retval.pts[retval.n_pts] = a + interp * (b - a);
        retval.n_pts += 1;
    }

    return retval;
}

If we fully clip the segment away (which shouldn’t happen), then we don’t have a real intersection.
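Putting the two clips together might look something like the following sketch. The helper `ClipIncidentEdge` and the minimal `Vec2f` operators are my own illustrative additions (the post only defines `ClipSegmentToHalfspace`); the idea is just to clip once against each end cap and bail out if the segment no longer has two points:

```cpp
#include <cmath>

typedef float f32;
struct Vec2f { f32 x, y; };
static Vec2f operator+(Vec2f a, Vec2f b) { return {a.x + b.x, a.y + b.y}; }
static Vec2f operator-(Vec2f a, Vec2f b) { return {a.x - b.x, a.y - b.y}; }
static Vec2f operator*(f32 s, Vec2f v) { return {s * v.x, s * v.y}; }
static f32 Dot(Vec2f a, Vec2f b) { return a.x * b.x + a.y * b.y; }

struct ClipResult {
    int n_pts;
    Vec2f pts[2];
};

// Same routine as above.
ClipResult ClipSegmentToHalfspace(Vec2f a, Vec2f b, Vec2f plane_normal, f32 plane_offset) {
    ClipResult retval = {};
    f32 dist_a = Dot(plane_normal, a) - plane_offset;
    f32 dist_b = Dot(plane_normal, b) - plane_offset;
    if (dist_a <= 0.0f) retval.pts[retval.n_pts++] = a;
    if (dist_b <= 0.0f) retval.pts[retval.n_pts++] = b;
    if (dist_a * dist_b < 0.0f) {
        f32 interp = dist_a / (dist_a - dist_b);
        retval.pts[retval.n_pts++] = a + interp * (b - a);
    }
    return retval;
}

// Clip the incident edge (a, b) between the end caps of the reference
// edge (ref_a, ref_b). `tangent` is the unit vector from ref_a to ref_b.
ClipResult ClipIncidentEdge(Vec2f a, Vec2f b, Vec2f ref_a, Vec2f ref_b, Vec2f tangent) {
    // First cap: keep the side of ref_a that faces ref_b.
    Vec2f neg_tangent = {-tangent.x, -tangent.y};
    ClipResult c1 = ClipSegmentToHalfspace(a, b, neg_tangent, Dot(neg_tangent, ref_a));
    if (c1.n_pts < 2) return ClipResult{};  // fully clipped away; no real intersection
    // Second cap: keep the side of ref_b that faces ref_a.
    return ClipSegmentToHalfspace(c1.pts[0], c1.pts[1], tangent, Dot(tangent, ref_b));
}
```

For example, an incident edge that overhangs both ends of a reference edge from (0, 0) to (1, 0) gets its endpoints pulled in to x = 0 and x = 1.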

The collision manifold will consist of all endpoints in the clipped incident edge that penetrate across the reference edge. This explains how we can end up with either a 1- or a 2-point collision manifold. We can once again use a dot product and compare it to a base offset:

f32 face_offset = Dot(reference_normal, reference_a);
for (int i = 0; i < clip.n_pts; i++) {
    // Get the distance of the ith endpoint along the normal from the
    // reference face. A negative value indicates penetration into the
    // reference polygon.
    f32 separation = Dot(reference_normal, clip.pts[i]) - face_offset;
    if (separation <= kEps) {
        manifold.points[manifold.n_pts] = clip.pts[i];
        manifold.n_pts += 1;
    }
}

I coded up a visualization of this computation to help with debugging. The red vector is the unit normal along the reference edge, the little aligned blue vector is the same vector scaled by the penetration, the little red square is the single point in this collision manifold, and the little green square is the other endpoint of the clipped incident edge, which in this case was not accepted into the collision manifold:

We can set up a case where there are two points in our collision manifold:

I can interact with my visualization by translating or rotating one square around:

Collision Resolution

Now that we can detect collisions, calculate penetration vectors, and calculate a collision manifold, we can finally do the work of resolving a collision. Wikipedia has a good overview of this, but I’ll write it out here for good measure.

If we have a collision, then our contact manifold will contain one or two contact points. For starters let’s work with the case of a single contact point \(\boldsymbol{c}\). We’ll assume that body A is the reference polygon.

We can simulate the effect of a collision by imparting opposed impulses to the colliding bodies. This impulse is the integral of the force acting on each body over that collision timestep.

The impulse force depends on the relative velocity between the two objects. Let \(\boldsymbol{r}_A\) and \(\boldsymbol{r}_B\) be the vectors from each body’s center of mass (usually its centroid) to the contact point. Then we can calculate the relative velocity at the contact point:

\[\boldsymbol{v}_\text{rel} = (\boldsymbol{v}_B + \omega_B \times \boldsymbol{r}_B) - (\boldsymbol{v}_A + \omega_A \times \boldsymbol{r}_A)\]

The impulse is zero if this relative velocity along the normal is positive, i.e., if the bodies are already moving apart.

The normal impulse for the objects is:

\[j = \frac{-(1+e)\boldsymbol{v}_\text{rel} \cdot \boldsymbol{n}}{\frac{1}{m_A} + \frac{1}{m_B} + \frac{(\boldsymbol{r}_A \times \boldsymbol{n})^2}{I_A} + \frac{(\boldsymbol{r}_B \times \boldsymbol{n})^2}{I_B}}\]

where \(e \in [0,1]\) is the coefficient of restitution that determines how elastic the collision is, \(m_A\) and \(m_B\) are the polygon masses, and \(I_A\) and \(I_B\) are their moments of inertia.

We then update each polygon’s linear and angular speed according to:

\[\begin{align} \boldsymbol{v}_A' &= \boldsymbol{v}_A - \frac{j}{m_A} \boldsymbol{n} \\ \boldsymbol{v}_B' &= \boldsymbol{v}_B + \frac{j}{m_B} \boldsymbol{n} \\ \omega_A' &= \omega_A - \frac{j}{I_A} \left(\boldsymbol{r}_A \times \boldsymbol{n}\right) \\ \omega_B' &= \omega_B + \frac{j}{I_B} \left(\boldsymbol{r}_B \times \boldsymbol{n}\right) \end{align}\]
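As a sanity check, the formulas above can be coded up directly. This is a minimal sketch, not the post's actual implementation; the `Body` struct and field names are illustrative. In 2D, \(\boldsymbol{r} \times \boldsymbol{n}\) reduces to a scalar and \(\omega \times \boldsymbol{r}\) to a perpendicular vector:

```cpp
#include <cmath>

typedef float f32;
struct Vec2f { f32 x, y; };
static Vec2f operator+(Vec2f a, Vec2f b) { return {a.x + b.x, a.y + b.y}; }
static Vec2f operator-(Vec2f a, Vec2f b) { return {a.x - b.x, a.y - b.y}; }
static Vec2f operator*(f32 s, Vec2f v) { return {s * v.x, s * v.y}; }
static f32 Dot(Vec2f a, Vec2f b) { return a.x * b.x + a.y * b.y; }
static f32 Cross(Vec2f a, Vec2f b) { return a.x * b.y - a.y * b.x; }  // scalar 2D cross
static Vec2f CrossSV(f32 w, Vec2f r) { return {-w * r.y, w * r.x}; }  // omega x r

// Hypothetical rigid-body state (names are illustrative).
struct Body {
    Vec2f pos;       // center of mass
    Vec2f vel;       // linear velocity
    f32 omega;       // angular velocity
    f32 inv_mass;    // 1/m
    f32 inv_inertia; // 1/I
};

// Apply the normal impulse for a single contact point `c`, with contact
// normal `n` pointing from A (the reference body) toward B, and
// coefficient of restitution `e`.
void ApplyContactImpulse(Body* A, Body* B, Vec2f c, Vec2f n, f32 e) {
    Vec2f r_a = c - A->pos;
    Vec2f r_b = c - B->pos;

    // Relative velocity at the contact point.
    Vec2f v_rel = (B->vel + CrossSV(B->omega, r_b)) - (A->vel + CrossSV(A->omega, r_a));
    f32 v_rel_n = Dot(v_rel, n);
    if (v_rel_n > 0.0f) return;  // already separating; no impulse

    f32 ra_x_n = Cross(r_a, n);
    f32 rb_x_n = Cross(r_b, n);
    f32 inv_effective_mass = A->inv_mass + B->inv_mass
        + ra_x_n * ra_x_n * A->inv_inertia
        + rb_x_n * rb_x_n * B->inv_inertia;
    f32 j = -(1.0f + e) * v_rel_n / inv_effective_mass;

    // Equal and opposite impulses.
    A->vel = A->vel - (j * A->inv_mass) * n;
    B->vel = B->vel + (j * B->inv_mass) * n;
    A->omega -= j * A->inv_inertia * ra_x_n;
    B->omega += j * B->inv_inertia * rb_x_n;
}
```

A quick way to convince yourself this is right: for a head-on collision between two equal-mass bodies with \(e = 1\), the velocities should simply swap.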

When there are two points in the contact manifold, one might be tempted to approximate the collision with their mean. I found that that doesn’t quite work correctly in all cases. For example, in the situation below, the rotation causes the estimated contact point to move right, which imparts a torque in the wrong direction:

In order to fix this, I process both contact points, but weight the impulse in proportion to the penetration depth.

Positional correction is similar. To resolve the collision, we simply need to separate the two bodies along the penetration unit normal by a distance equal to the penetration \(\delta\). We could split the difference and shift the incident polygon by \(\delta / 2\) and the reference polygon by \(-\delta / 2\) along the normal, but we also want to account for their relative masses.

To account for the masses, we'd want to move the incident polygon by

\[\delta \frac{m_A}{m_A + m_B}\]

and the reference polygon by

\[-\delta \frac{m_B}{m_A + m_B}\]

where \(m_B\) is the mass of the incident polygon and \(m_A\) is the mass of the reference polygon. The heavier polygon is moved less.

However, as stated earlier, rather than fully resolve the collision, we will perform multiple, smaller resolution steps. If our penetration has a magnitude \(\delta\), we only apply \(\beta \delta\) where \(\beta\) is between 0 and 1, typically something like 0.4. We move the incident polygon in the penetration normal direction by that amount, and the reference polygon by the opposite amount. This will help jostle multiple conflicting collisions in a nice way such that they hopefully find a stable, valid solution.

For a nice code reference for all this, I recommend Randy Gaul's Impulse Engine.

Conclusion

And that’s enough to get us some pretty solid rigid-body physics!

We actually did quite a lot in this post. We set off to do polygon collision resolution, started with polygon intersections, realized we also needed to figure out the penetration vector, did that, and then realized that to get appropriate impulses we needed to compute approximate collision manifolds. We then used that information to compute impulses, apply them, and to physically separate any colliding bodies.

We don’t have any friction yet. It’s actually pretty easy to add, but I figured this post was already long enough.

We haven’t yet talked about any sort of spatial partitioning or other broad-phase collision tactics. No special memory allocation strategies, stable contact points, or joints. As always, there is a whole world of potential complexity out there that can be tackled if and when it’s needed and we want to tackle it.

I highly recommend the Box2D repo as a helpful reference. It is a bit more mature, which can lead to some excessive complexity, but on the whole it is actually very cleanly written and extremely well documented.

Happy coding.