Optimization Enhancements

Throughout my years at Muse, I built automated optimization pipelines to reduce bottlenecks across our stack. We decided what to optimize by ruthlessly assessing performance profiles and identifying where we could cut the most cost. Below I highlight three of those efforts: caching colliders, automating GPU texture compression, and instancing draw calls.

Collider Caching

Fundamental to any 3D engine is out-of-the-box collider support for uploaded models. At Muse, we provided three solutions:

  1. Bounding box collider

  2. Dynamic trimesh generated on the fly

  3. Custom collider support via mesh naming

Tutorial I made demonstrating our various collider solutions:

These functions worked well, but the trimesh operation had a fundamental problem. Because the new geometry was generated client-side at load time, any world with hundreds of high-poly models loaded painfully slowly. To resolve this, I moved the trimesh operation to upload time: the function runs once per model when it is uploaded, then appends the generated trimesh geometry to the .glb file as a custom collider. On load, we fetch the appended file and use that geometry directly, so no real-time calculation is needed each time a world URL is fetched.

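  • //Example of deep copying mesh and removing reference so you can join and simplify
    //Imports assume glTF-Transform v3, where weldPrimitive and simplifyPrimitive take the Document first
    //and getBounds is exported from core (v4 moved it to @gltf-transform/functions)
    import { Accessor, Document, getBounds } from "@gltf-transform/core";
    import { simplifyPrimitive, weldPrimitive } from "@gltf-transform/functions";
    import { MeshoptSimplifier } from "meshoptimizer";
    
    //NOTE: await MeshoptSimplifier.ready before calling primitiveClone
    function primitiveClone(doc: Document) {
      //snapshot the node list up front so the colliders added below are not revisited
      const nodes = doc.getRoot().listNodes();
    
      //iterate over each node that has a mesh and deeply-copy both into a new 'collider'
      //IMPORTANT! You must deeply-copy or else .copy() just references original! https://github.com/donmccurdy/glTF-Transform/discussions/495
      for (const node of nodes) {
        //read the mesh from the node itself; listMeshes() and listNodes() are not index-aligned
        const mesh = node.getMesh();
        if (!mesh) continue;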
    
        const clonedNode = doc
          .createNode("collider")
          .copy(node, (ref) => ref.clone());
        clonedNode.setName("collider");
        const clonedMesh = doc
          .createMesh("collider")
          .copy(mesh, (ref) => ref.clone());
        clonedMesh.setName("collider");
        clonedNode.setMesh(clonedMesh);
    
        // see if there are savings you can get out of manipulating the primitives more while preserving geometry
        // eventually we should make ratio a dynamic prop users can adjust value of
        clonedMesh.listPrimitives().forEach((prim) => {
          prim.setMaterial(null);
          weldPrimitive(doc, prim, { overwrite: false, tolerance: 0.0001 });
          simplifyPrimitive(doc, prim, {
            simplifier: MeshoptSimplifier,
            ratio: 0.99,
            error: 0.1,
          });
        });
        doc.getRoot().getDefaultScene()?.addChild(clonedNode);
      }
    }
    
    //Example of simply adding a bounding box collider to a model
    function addBoundingBoxCollider(doc: Document) {
      //calculate the bounding box
      const sc = doc.getRoot().getDefaultScene() || doc.getRoot().listScenes()[0];
      const bbox = getBounds(sc);
    
      //store bounding box min + max values in a vertex array
      const vertices = new Float32Array([
        bbox.min[0], bbox.min[1], bbox.min[2], // 0
        bbox.max[0], bbox.min[1], bbox.min[2], // 1
        bbox.min[0], bbox.max[1], bbox.min[2], // 2
        bbox.max[0], bbox.max[1], bbox.min[2], // 3
        bbox.min[0], bbox.min[1], bbox.max[2], // 4
        bbox.max[0], bbox.min[1], bbox.max[2], // 5
        bbox.min[0], bbox.max[1], bbox.max[2], // 6
        bbox.max[0], bbox.max[1], bbox.max[2] // 7
      ]);
    
      //two triangles per face; each label names the plane the face lies on
      const indices = new Uint16Array([
        0, 1, 2, // z = min face, tri 1
        2, 1, 3, // z = min face, tri 2
        4, 6, 5, // z = max face, tri 1
        5, 6, 7, // z = max face, tri 2
        0, 4, 1, // y = min (bottom) face, tri 1
        1, 4, 5, // y = min (bottom) face, tri 2
        2, 3, 6, // y = max (top) face, tri 1
        6, 3, 7, // y = max (top) face, tri 2
        0, 2, 4, // x = min face, tri 1
        4, 2, 6, // x = min face, tri 2
        1, 5, 3, // x = max face, tri 1
        3, 5, 7 // x = max face, tri 2
      ]);
    
      //create position accessor
      const positionAccessor = doc
        .createAccessor()
        .setArray(vertices)
        .setType(Accessor.Type.VEC3);
    
      //create index accessor
      const indexAccessor = doc
        .createAccessor()
        .setArray(indices)
        .setType(Accessor.Type.SCALAR);
    
      //create primitive referencing accessors
      const colliderPrimitive = doc
        .createPrimitive()
        .setAttribute("POSITION", positionAccessor)
        .setIndices(indexAccessor);
    
      //add primitive to mesh
      const colliderMesh = doc
        .createMesh("collider")
        .addPrimitive(colliderPrimitive);
    
      //create node and set collider mesh to that node
      const colliderNode = doc.createNode("collider");
      colliderNode.setMesh(colliderMesh);
    
      //add node to the same scene the bounds were computed from
      sc.addChild(colliderNode);
    }
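
At load time, the cached geometry only pays off if the client checks for it before falling back to on-the-fly generation. The following is a minimal sketch of that check, not the Muse implementation: `SceneNode`, `ColliderPlan`, and `planCollider` are hypothetical names, and the only convention taken from the code above is that baked collider nodes are named "collider".

```typescript
// Hypothetical, minimal view of a loaded glTF node: just what the check needs.
type SceneNode = { name: string; hasMesh: boolean };

type ColliderPlan =
  | { kind: "cached"; nodes: SceneNode[] } // reuse geometry baked in on upload
  | { kind: "generate" }; // fall back to building a trimesh at load time

// Assumption: baked colliders are identified purely by the node name "collider",
// matching the name assigned in primitiveClone and addBoundingBoxCollider.
function planCollider(nodes: SceneNode[]): ColliderPlan {
  const cached = nodes.filter((n) => n.name === "collider" && n.hasMesh);
  return cached.length > 0 ? { kind: "cached", nodes: cached } : { kind: "generate" };
}
```

With a check like this, a world full of processed uploads never reaches the "generate" branch, which is where the load-time cost came from.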

KTX2 Automation

As we expanded support for more file types, mobile site crashes spiked because creators were exceeding VRAM limits. I knew the solution was to dive into the KTX2 spec and implement GPU texture compression for uploaded images. I worked with a teammate to plan the required infrastructure, weighing options against user needs and what we could afford as a company. I conducted R&D on toktx, the Khronos texture tool, to decide which key-value pairs to apply to all uploaded textures, and got to work. We chose an 'optimize on upload' approach, which benefited both creators and us: it let us distribute jobs efficiently, preventing network congestion and reducing wait times.
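
To see why uncompressed textures exhaust mobile VRAM, it helps to run the arithmetic. A PNG or JPEG is decoded to raw pixels before it reaches the GPU, so its file size says little about its memory cost. The sketch below estimates GPU memory for a single mip level at a given bit rate; `textureBytes` is a hypothetical helper, and the bit rates are the standard ones for the named formats, not figures measured at Muse.

```typescript
// Bytes of GPU memory for one mip level of a texture.
// bitsPerPixel: 32 for uncompressed RGBA8, 4 for ETC1 or BC1, 8 for ASTC 4x4 or BC7.
function textureBytes(width: number, height: number, bitsPerPixel: number): number {
  return (width * height * bitsPerPixel) / 8;
}

const MiB = 1024 * 1024;

// One 2048x2048 texture at three different bit rates.
const rawMiB = textureBytes(2048, 2048, 32) / MiB; // uncompressed RGBA8
const fourBppMiB = textureBytes(2048, 2048, 4) / MiB; // e.g. ETC1 / BC1
const eightBppMiB = textureBytes(2048, 2048, 8) / MiB; // e.g. ASTC 4x4 / BC7

console.log(rawMiB, fourBppMiB, eightBppMiB); // 16 2 4
```

A world with a few dozen 2048x2048 textures therefore needs hundreds of MiB uncompressed, while the same textures in a GPU-compressed format need a fraction of that.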

On upload, we converted files to KTX2, accounting for the common causes of conversion failure (e.g. resolutions not divisible by 4, min and mag filters needing to be set to linear, and the y-axis flip). We achieved an average 90% reduction in GPU payloads and a 50% reduction in CPU payloads across all uploaded textures, whether standalone images or 3D model textures.
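
The dimension constraint is the easiest of these failure cases to guard against before invoking the encoder. Here is a minimal sketch, assuming textures are resized to the nearest multiple of 4 (the rounding policy Muse used is not stated here). `snapToMultipleOf4` and `buildToktxArgs` are hypothetical helpers, and the flags shown (`--t2`, `--bcmp`, `--genmipmap`, `--resize`) are options from the toktx documentation, not necessarily the set we shipped.

```typescript
// Snap one dimension to the nearest multiple of 4, never below 4.
function snapToMultipleOf4(size: number): number {
  return Math.max(4, Math.round(size / 4) * 4);
}

// Build an argument list for: toktx [options] <outfile> <infile>
function buildToktxArgs(infile: string, outfile: string, width: number, height: number): string[] {
  const w = snapToMultipleOf4(width);
  const h = snapToMultipleOf4(height);
  // --t2: write KTX2, --bcmp: ETC1S / BasisLZ supercompression, --genmipmap: generate a mip chain
  const args = ["--t2", "--bcmp", "--genmipmap"];
  // Only ask toktx to resize when the source dimensions violate the constraint.
  if (w !== width || h !== height) {
    args.push("--resize", `${w}x${h}`);
  }
  return args.concat([outfile, infile]);
}
```

For example, `buildToktxArgs("rock.png", "rock.ktx2", 1023, 512)` adds `--resize 1024x512`, while a 512x512 source is passed through untouched.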

Instancing Models

To optimize rendering when a single asset caused a high number of draw calls, I developed a prototype that scraped the site's manifest.json tree to find URLs that instantiated separate components but rendered the same file. For each such asset, I extracted the position, rotation, and scale values into three separate arrays, then used those arrays to set the TRS values of my instance component. The result was a single draw call for all duplicates of a file instead of a separate draw call for each. This method also let users keep precise control over the TRS values of duplicated assets within the builder tools, which they had told us they needed.

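  • //Example passing in an asteroid to instance
    import { useMemo } from "react";
    import { Mesh, Object3D } from "three";
    import { GroupProps } from "@react-three/fiber";
    //useModel and InstancedObject come from the SpacesVR engine; GLTFResult is a project type (imports omitted)
    
    type ModelProps = {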
      model?: string;
      positionArray?: string;
      sizeArray?: string;
      rotationArray?: string;
      count?: number;
    } & GroupProps;
    
    export default function InstancedModel(props: ModelProps) {
        const { 
          model = "./asteroid.glb", 
          positionArray = "[0, 0, 0]", // paste the position array for all 2000+ asteroids
          sizeArray = "[1, 1, 1]", // paste the scale array for all 2000+ asteroids
          rotationArray = "[0, 0, 0]", // paste the rot array for all 2000+ asteroids
          count = 1, // define how many asteroids to instance
          ...restProps 
        } = props;
    
        //Custom React hooks from our development on the SpacesVR engine
        const { nodes, materials } = useModel(model) as unknown as GLTFResult;
        const keys = Object.keys(nodes);
        const meshArr: Mesh[] = [];
      
        // @ts-ignore
        keys.forEach((key) => meshArr.push(nodes[key]));
      
        //Transform function to parse arrays and set values of each instance
        const transform: Object3D[] = useMemo(() => {
          const transes = [];
          const parsePositionArray = JSON.parse(positionArray).map(Number);
          const parseRotationArray = JSON.parse(rotationArray).map(Number);
          const parseSizeArray = JSON.parse(sizeArray).map(Number);
          
          //build exactly one transform per instance (count + 1 would read past the end of the arrays)
          for (let i = 0; i < count; i++) {
            const obj = new Object3D();
      
            const positionX = parsePositionArray[i * 3];
            const positionY = parsePositionArray[i * 3 + 1];
            const positionZ = parsePositionArray[i * 3 + 2];
    
            const rotationX = parseRotationArray[i * 3];
            const rotationY = parseRotationArray[i * 3 + 1];
            const rotationZ = parseRotationArray[i * 3 + 2];
            
            const scaleX = parseSizeArray[i * 3];
            const scaleY = parseSizeArray[i * 3 + 1];
            const scaleZ = parseSizeArray[i * 3 + 2];
      
            obj.position.set(positionX, positionY, positionZ);
            obj.rotation.set(rotationX, rotationY, rotationZ);
            obj.scale.set(scaleX, scaleY, scaleZ);
      
            transes.push(obj);
          }
          return transes;
        }, [positionArray, sizeArray, rotationArray, count, model]);
      
        return (
          <group name={"instance-idea"} {...restProps}>
            {meshArr.map((mesh) => (
              <InstancedObject key={mesh.uuid} mesh={mesh} transforms={transform} />
            ))}
          </group>
        );
      }
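
The grouping step that feeds this component can be sketched independently of the renderer. The manifest shape below (`Placement` with `url`, `position`, `rotation`, `scale`) is a hypothetical stand-in, since the real manifest.json schema is not shown here. `groupPlacements` is likewise a hypothetical helper: it collects every placement of the same file and flattens its TRS values into the three JSON-encoded arrays the component parses.

```typescript
type Vec3 = [number, number, number];

// Hypothetical manifest entry: one placed component that renders `url`.
type Placement = { url: string; position: Vec3; rotation: Vec3; scale: Vec3 };

// Props in the shape InstancedModel reads: flat [x, y, z, x, y, z, ...] arrays as JSON strings.
type InstanceProps = {
  model: string;
  positionArray: string;
  rotationArray: string;
  sizeArray: string;
  count: number;
};

// Flatten [[x, y, z], [x, y, z]] into [x, y, z, x, y, z].
function flatten(vectors: Vec3[]): number[] {
  return vectors.reduce<number[]>((out, v) => out.concat(v), []);
}

function groupPlacements(placements: Placement[]): InstanceProps[] {
  // Bucket placements by the file they render, keeping first-seen order.
  const byUrl: Record<string, Placement[]> = {};
  for (const p of placements) {
    if (!byUrl[p.url]) byUrl[p.url] = [];
    byUrl[p.url].push(p);
  }
  // One props object (and therefore one instanced draw per mesh) per distinct file.
  return Object.keys(byUrl).map((url) => {
    const group = byUrl[url];
    return {
      model: url,
      positionArray: JSON.stringify(flatten(group.map((p) => p.position))),
      rotationArray: JSON.stringify(flatten(group.map((p) => p.rotation))),
      sizeArray: JSON.stringify(flatten(group.map((p) => p.scale))),
      count: group.length,
    };
  });
}
```

Each returned object can be spread onto the component, with `count` equal to the number of placements of that file. Because every placement keeps its own TRS triple, editing one duplicate in the builder only changes three numbers in each array.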