Patch 1.9.1 Notes - New Prison Control Point! Massive Performance Boost (+50%)
Author: indiefoldcreator
Date: Mon, 02 Sep 2024
Game: Voxel Turf
HOTFIX 1.9.1 - Fixes some issues, scroll down to see the changelog
Old Prison Control Point:
Finally a new big control point! The Old Prison:
The prison is about the same size as the Military base but can be attacked right from the start of the game (although it's really not recommended). The reward for holding it is +5 Allure, which will help you unlock perks faster. It starts at Level 3 and has a maximum level of 10.
Massive Performance Increase:
This patch uses a combination of rendering techniques to increase performance and also removes redundant rendering and uploading of meshes to the GPU. I was going to implement these in Turf2, but I needed some scenes to see what kind of performance gain could be obtained. Rather than making a dummy scene in T2, I decided to juice up VT's rendering engine and get some real-world experience implementing them.
Voxel Turf uses different rendering techniques for near chunks and far chunks. Near chunks have 1 graphics mesh (VBO henceforth) per texture, while far chunks use a texture atlas and therefore need only 1 VBO per rendering layer (typically 1-3 layers per chunk). The problem is that each time a VBO is bound for rendering there is an overhead cost. Because of this overhead the GPU is not utilised as much as it could be, and performance is lower than it should be.
For near chunks I now use Texture Arrays. Textures of the same size are batched together for each chunk, massively reducing the number of VBOs created (and therefore the binding overhead).
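To make the batching idea concrete, here is a minimal sketch of grouping textures by size so that each group can become one texture array. It is not the game's code: the function name, the texture names and the sizes are all made up for illustration, and the real engine would create OpenGL array textures rather than Python lists.

```python
from collections import defaultdict

def build_texture_arrays(textures):
    """Group textures by (width, height); each group becomes one array.

    `textures` maps a (hypothetical) texture name to its size in pixels.
    Returns {size: [names]} plus {name: (size, layer)}, so a mesh can
    store a layer index instead of needing one VBO per texture.
    """
    arrays = defaultdict(list)
    lookup = {}
    for name, size in sorted(textures.items()):
        layer = len(arrays[size])      # next free layer in this array
        arrays[size].append(name)
        lookup[name] = (size, layer)
    return dict(arrays), lookup

# Illustrative data only.
textures = {"grass": (32, 32), "dirt": (32, 32),
            "brick": (64, 64), "stone": (32, 32)}
arrays, lookup = build_texture_arrays(textures)
print(len(textures), "textures ->", len(arrays), "texture arrays")
```

With one mesh per array instead of one per texture, the number of binds per chunk drops from the number of textures to the number of distinct texture sizes.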
For far chunks I now use Indirect Rendering. Chunks are bucketed into regions of 256m x 256m and all the geometry is sent to one big, fixed-size VBO (a "MegaVbo"). If this VBO fills up, the excess geometry goes into the next one, and so on. MegaVbos are created once and reused, which avoids creation/destruction overhead and memory fragmentation. Each MegaVbo is drawn with one bind and one 'glMultiDrawElementsIndirect' draw command.
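The packing step can be sketched roughly as follows. This is an illustration under assumptions, not the engine's implementation: the capacity value, the choice to pack per region, and the helper names are invented. The five fields of `DrawCommand` do follow the layout of OpenGL's `DrawElementsIndirectCommand`, which is what 'glMultiDrawElementsIndirect' reads from its command buffer.

```python
from collections import defaultdict, namedtuple

# Same five fields as OpenGL's DrawElementsIndirectCommand.
DrawCommand = namedtuple(
    "DrawCommand",
    "count instance_count first_index base_vertex base_instance")

REGION_SIZE = 256         # region edge length, from the notes
MEGA_VBO_INDICES = 1000   # hypothetical capacity, counted in indices

def region_of(x, z):
    """Bucket a chunk's world position into its region."""
    return (x // REGION_SIZE, z // REGION_SIZE)

def pack_region(meshes):
    """Pack (chunk_id, index_count, vertex_count) meshes into MegaVbos.

    Returns one list of DrawCommands per MegaVbo; when a buffer is
    full, the remaining geometry spills into the next one.
    """
    vbos = [[]]
    used_indices = used_vertices = 0
    for chunk_id, index_count, vertex_count in meshes:
        if index_count > MEGA_VBO_INDICES:
            raise ValueError(f"chunk {chunk_id} does not fit in a MegaVbo")
        if used_indices + index_count > MEGA_VBO_INDICES:
            vbos.append([])                       # start the next MegaVbo
            used_indices = used_vertices = 0
        vbos[-1].append(
            DrawCommand(index_count, 1, used_indices, used_vertices, 0))
        used_indices += index_count
        used_vertices += vertex_count
    return vbos

# Illustrative chunks: (chunk_id, x, z, index_count, vertex_count)
chunks = [("a", 0, 0, 600, 400), ("b", 16, 0, 300, 200),
          ("c", 32, 0, 300, 200), ("d", 300, 0, 600, 400)]
regions = defaultdict(list)
for cid, x, z, ic, vc in chunks:
    regions[region_of(x, z)].append((cid, ic, vc))
packed = {r: pack_region(meshes) for r, meshes in regions.items()}
for region, vbos in sorted(packed.items()):
    print(region, [len(cmds) for cmds in vbos], "commands per MegaVbo")
```

Each inner list corresponds to one bind plus one multi-draw call, no matter how many chunks it contains.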
In older versions of VT a mesh was generated for both the Near and Far rendering modes for each chunk within the player's draw distance. Now only chunks within the Near draw range (typically ~100-150m) have their near meshes generated and uploaded to the GPU. This stops a lot of probably-unused VBOs from being created and uploaded. In addition, when a chunk is updated or loaded in, it can sometimes trigger a rerender event on adjacent chunks, causing their VBOs to be regenerated. Sometimes the regenerated VBO is identical to the old one. Now the game computes a hash as the VBO is being generated; if it's the same as the previous VBO's hash, the new one is discarded and not uploaded to the GPU.
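A rough sketch of the hash check, assuming a running CRC32 over the vertex data. The notes do not say which hash the game uses, so `zlib.crc32` is a stand-in, and the class and method names are hypothetical.

```python
import zlib

class ChunkMesh:
    """Tracks the hash of the last uploaded mesh for one chunk."""

    def __init__(self):
        self.last_hash = None
        self.uploads = 0

    def regenerate(self, vertex_batches):
        """Hash the mesh while it is built; upload only if it changed."""
        digest = 0
        data = bytearray()
        for batch in vertex_batches:
            digest = zlib.crc32(batch, digest)   # running checksum
            data += batch
        if digest == self.last_hash:
            return False        # same as the previous VBO: discard it
        self.last_hash = digest
        self.upload(bytes(data))
        return True

    def upload(self, data):
        self.uploads += 1       # stand-in for the real GPU upload

mesh = ChunkMesh()
print(mesh.regenerate([b"abc", b"def"]))   # first build is uploaded
print(mesh.regenerate([b"abc", b"def"]))   # identical rebuild is skipped
```

A short checksum like CRC32 can collide, in which case a changed mesh would be skipped by mistake; a longer hash makes that less likely at some extra cost.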
Finally, the game batches the destruction of VBOs and temporary textures (e.g. minimap pieces). Instead of individual assets being destroyed on object destruction, assets are added to a garbage list. This list is kept in a delay queue for 4 frames and then batch destroyed with 1 API call. This massively speeds up chunk unloading on the client.
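The delay queue could look something like the sketch below. The class name and the `delete_many` callback are hypothetical; in an OpenGL client the callback would plausibly wrap a call such as `glDeleteBuffers`, which accepts a whole array of handles at once.

```python
from collections import deque

DELAY_FRAMES = 4   # delay from the notes

class GarbageQueue:
    """Collects released handles and destroys them in delayed batches."""

    def __init__(self, delete_many):
        self.delete_many = delete_many   # called once per batch
        self.current = []                # handles released this frame
        self.pending = deque()           # one list per earlier frame

    def release(self, handle):
        self.current.append(handle)

    def end_frame(self):
        self.pending.append(self.current)
        self.current = []
        if len(self.pending) > DELAY_FRAMES:
            batch = self.pending.popleft()
            if batch:
                self.delete_many(batch)  # one call for the whole batch

deleted = []
queue = GarbageQueue(deleted.append)
queue.release(101)
queue.release(102)
for frame in range(5):
    queue.end_frame()
    print("frame", frame, "deleted so far:", deleted)
```

Objects only append their handles to the list, so destroying many chunks in one frame costs a handful of list operations rather than one driver call per asset.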
Texture Arrays and Indirect Rendering are enabled by default in the Graphics Settings menu. They require Shaders to be turned on and OpenGL 4.3+.
Test Scene Results:
Performance preset: "Extreme 2", Shadows Ultra. AMD R9-3900X + Nvidia GTX 1080. Best case improvement = 88 FPS/49 FPS ≈ +80%.
A more typical performance boost is on the order of +50% with both techniques enabled.
Test scene (the screen resolution changed from 1080p to 900p in these tests, but this does not affect the results)
Full Changelog
VERSION 1.9.1 - 26/09/2022
===========
- vtserver: flagShutdown() now waits for currently saving files to flush to disc (not just orderly shutdown)
- vtclient: Use manual mipmap generation for texture arrays rather than automatic (AMD drivers can crash on mipmap gen for texture arrays)
- vtclient: If the game crashes when trying to create Texture Arrays then Texture Arrays will be automatically disabled on next startup
- vtclient: Resized some non-power-of-2 block textures. Only power-of-2 block textures can now be made into texture arrays.
- vtserver: When a player is in a dungeon, the server now only checks ~150 chunks for dungeon updates every tick instead of every chunk loaded on the server (with 40k chunks loaded this could take 16ms a tick!).
- vtserver: Made unloading chunks a bit faster
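As a rough illustration of the two texture-array items above, the sketch below checks for a power-of-2 size and builds a mip chain by hand. The 2x2 box filter is an assumption (the notes do not say which filter the game uses), and the image is a plain grid of grey values rather than real texture data.

```python
def is_power_of_two(n):
    """True for 1, 2, 4, 8, ... (exactly one bit set)."""
    return n > 0 and (n & (n - 1)) == 0

def downsample(image):
    """Halve a square image by averaging each 2x2 block (box filter)."""
    half = len(image) // 2
    return [[(image[2 * y][2 * x] + image[2 * y][2 * x + 1] +
              image[2 * y + 1][2 * x] + image[2 * y + 1][2 * x + 1]) // 4
             for x in range(half)]
            for y in range(half)]

def build_mip_chain(image):
    """Return [full size, half size, ..., 1x1] for a power-of-2 image."""
    size = len(image)
    if not is_power_of_two(size) or any(len(row) != size for row in image):
        raise ValueError("only square power-of-2 images are supported")
    chain = [image]
    while len(chain[-1]) > 1:
        chain.append(downsample(chain[-1]))
    return chain

image = [[0, 0, 8, 8],
         [0, 0, 8, 8],
         [4, 4, 12, 12],
         [4, 4, 12, 12]]
for level in build_mip_chain(image):
    print(len(level), "x", len(level), level)
```

Power-of-2 sizes halve cleanly all the way down to 1x1, which is one reason a manual mip chain is simpler when textures are restricted to them.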
VERSION 1.9.0 - 24/09/2022
===========
- Added the Old Prison control point. Gives +5 Allure to whoever captures it
- MegaVbos now use a free list to reuse free memory
- Planes now go ~12% faster
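The MegaVbo free list mentioned above might work along these lines. This is a generic first-fit free-list allocator written for illustration; the class name, the units and the merging strategy are assumptions rather than details taken from the game.

```python
class FreeListAllocator:
    """First-fit allocator over a fixed-size buffer of `capacity` units."""

    def __init__(self, capacity):
        self.free = [(0, capacity)]   # sorted list of (offset, size) holes

    def alloc(self, size):
        """Return the offset of a hole that fits, or None if none does."""
        for i, (offset, hole) in enumerate(self.free):
            if hole >= size:
                if hole == size:
                    del self.free[i]
                else:
                    self.free[i] = (offset + size, hole - size)
                return offset
        return None   # full: the caller would try the next MegaVbo

    def release(self, offset, size):
        """Return a range to the free list and merge adjacent holes."""
        self.free.append((offset, size))
        self.free.sort()
        merged = [self.free[0]]
        for off, sz in self.free[1:]:
            last_off, last_sz = merged[-1]
            if last_off + last_sz == off:
                merged[-1] = (last_off, last_sz + sz)
            else:
                merged.append((off, sz))
        self.free = merged

vbo = FreeListAllocator(100)
a = vbo.alloc(40)
b = vbo.alloc(40)
vbo.release(a, 40)            # an unloaded chunk frees its range
print(vbo.alloc(30), vbo.free)
```

Merging neighbouring holes on release keeps the buffer from splintering into ranges too small to reuse.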
VERSION 1.9.0-RC2 - 21/09/2022
===========
- Fixed world holes
- Fixed minimap pieces not appearing
- Fixed buildings not rendering correctly in preview window
VERSION 1.9.0-RC1 - 17/09/2022
===========
- Shaders are now enabled by default for Intel iGPUs, provided that they support OpenGL 4.6. Drivers newer than 2019 should support this.
- vtserver: Speculative fix for save corruption - when requesting an orderly shutdown, wait for the LotContainer saving thread to finish writing.
- vtclient: Fixed some crash-on-exit issues
- vtclient: Fixed some memory leaks
- Fixed white text decals not being visible on distant chunks
- vtclient: Implemented array-textures for near chunks. ~25% FPS increase
- vtclient: Implemented indirect draw for far chunks & shadow maps. Far chunks are batched into groups of 256m x 256m. Another ~25% FPS increase at 600+ draw distance
- vtclient: Only generate high quality renders of chunks that are near the player (typically closest 100m). Up to 50% reduction in VBO usage/graphics memory consumption at long draw distances
- vtclient: Now batches the destruction of OpenGL buffers and textures. Destruction is delayed by 4 frames to prevent OpenGL sync/stall. Makes unloading/unrendering chunks much faster
- Both vtclient/vtserver: freshly discarded chunks are now stored in a free list to be reused. Saves a bunch of big allocations/deletions. This pool will shrink over time to ~2048 chunks
- vtclient: Up to 5% (additional) FPS increase by removing redundant OpenGL calls
- vtclient: Stop processing network packets if we've spent 25ms already in a frame doing them (wait until next frame to continue)
- Added a timeout for drawing the map (the one that appears when you press M). This is to prevent the game becoming unresponsive when zoomed out on massive maps
- You can now choose how many render threads you have in the graphics settings menu. Default amount is square_root(num_cpu_threads), min 2.
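Two of the smaller items above (the default render thread count and the 25ms packet budget) are easy to sketch. Both functions are illustrative guesses: the notes do not say how the square root is rounded (rounding down is assumed here), and the function names, the queue type and the injectable clock are hypothetical.

```python
import math
import time
from collections import deque

def default_render_threads(num_cpu_threads):
    """square_root(num_cpu_threads), rounded down, with a minimum of 2."""
    return max(2, math.isqrt(num_cpu_threads))

def process_packets(queue, handle, budget_s=0.025, clock=time.perf_counter):
    """Handle queued packets until the per-frame time budget is spent.

    Packets that are not reached stay in the queue for the next frame.
    """
    start = clock()
    handled = 0
    while queue and clock() - start < budget_s:
        handle(queue.popleft())
        handled += 1
    return handled

print(default_render_threads(16))     # 4 render threads on a 16-thread CPU

packets = deque(["login", "chunk", "chat"])
seen = []
process_packets(packets, seen.append)
print(seen, "left for next frame:", list(packets))
```

Checking the clock before each packet means a single slow packet can still overrun the budget; the cap only stops further packets from being started.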