Game rendering thread
The rendering thread can kick off draw-call submission to the job system if you've got a lot of draws. On D3D11, I use a deferred context to record my GUI commands on another thread, simply because GUI traversal is expensive and intertwined with draw submission. For the main scene, I cull and collect drawables in a platform-independent manner. These jobs could submit their command buffers to the GPU themselves, but that would result in non-deterministic draw ordering.

To preserve ordering, one thread submits all of the command buffers in a single call after those jobs have completed.

Ah, I see. Do you cull and collect drawables on the rendering thread?

Yeah, I take a snapshot of the gameplay state and pass it from the game thread to the render thread; it includes object transforms, etc. The render thread then culls and extracts draw-items. Generally there are many draw-items for a single model, which the game doesn't care about.

It just wants to place a model in the world, not caring that it's made up of sub-meshes.

Universal data that could be relevant to sound, graphics, or any other subsystem is kept in a sequence of objects that are volatile, i.e. universally available to all threads but never kept in thread-local memory.

There's a slight performance penalty there, but used properly, it has allowed me to flexibly assign audio to one thread, graphics to another, physics to yet another, and so forth, without tying them into the traditional and dreaded "game loop." So as a rule, all OpenGL calls go through the Graphics thread, all OpenAL calls through the Audio thread, all input through the Input thread, and all the organizing control thread needs to worry about is thread management.

Game state is held in the GameState class, which they can all look at as they need to. Hopefully you see what I'm saying; I have a few thousand lines on this project already. If you would like me to try and scrape together a sample, I'll see what I can do.

This way the render thread has, say, bucket 0 that it reads from. The logic thread uses bucket 1 as its input source for the next frame and writes the frame data to bucket 2.

At sync points, the indices of what each of the three buckets means are swapped, so that the next frame's data is given to the render thread and the logic thread can continue forward. You can in fact keep the game loop serial and decouple your render frame rate from the logic step using interpolation.
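As a sketch of that bucket rotation (in C++, with an illustrative FrameData type standing in for the real frame snapshot; this is one possible rotation scheme, not the poster's actual code):

```cpp
#include <array>

// Hypothetical snapshot of one frame's state; the field is illustrative.
struct FrameData {
    int frameNumber = 0;
};

// Three buckets: one the render (and logic) thread reads, one the logic
// thread writes the next frame into, and one spare. At the sync point the
// roles rotate, so the frame just written becomes the new read source.
class TripleBuffer {
public:
    FrameData&       writeBucket()      { return buckets[writeIdx]; }
    const FrameData& readBucket() const { return buckets[readIdx]; }

    void swap() {
        int oldRead = readIdx;
        readIdx  = writeIdx;   // newest frame becomes the read source
        writeIdx = spareIdx;   // next frame goes into the free bucket
        spareIdx = oldRead;    // the previous read bucket is now spare
    }

private:
    std::array<FrameData, 3> buckets{};
    int readIdx = 0, writeIdx = 1, spareIdx = 2;
};
```

Because the read bucket is only ever read, the render thread and the logic thread can both look at it safely while the logic thread writes the next frame elsewhere.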

To take advantage of multi-core processors with this kind of setup, you would have a thread pool that operates on groups of tasks. These tasks can be simple things: rather than iterating a list of objects from 0 to 100, you iterate the list in 5 buckets of 20 across 5 threads, effectively increasing your performance without over-complicating the main loop.
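A minimal sketch of that bucketed split, assuming a placeholder Object type and plain std::thread rather than a real thread pool:

```cpp
#include <thread>
#include <vector>

// Placeholder object; the update below is a stand-in for real game logic.
struct Object {
    float position = 0.0f;
    float velocity = 1.0f;
};

void updateRange(std::vector<Object>& objects, size_t begin, size_t end, float dt) {
    for (size_t i = begin; i < end; ++i)
        objects[i].position += objects[i].velocity * dt;
}

// Split the list into numThreads contiguous buckets, one thread each.
void parallelUpdate(std::vector<Object>& objects, float dt, size_t numThreads = 5) {
    std::vector<std::thread> workers;
    size_t bucket = objects.size() / numThreads;
    for (size_t t = 0; t < numThreads; ++t) {
        size_t begin = t * bucket;
        size_t end = (t == numThreads - 1) ? objects.size() : begin + bucket;
        workers.emplace_back(updateRange, std::ref(objects), begin, end, dt);
    }
    for (auto& w : workers)
        w.join();  // wait for every bucket before the main loop continues
}
```

A real engine would reuse pooled threads instead of spawning them per frame, but the partitioning idea is the same.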

Usually, the logic that deals with graphics rendering passes, their schedule, when they're going to run, etc., is handled by a separate thread. However, that thread is already implemented and running in the platform you use to develop your game loop and game. So to obtain a game loop where the game logic updates independently of the graphics refresh schedule, you don't need to make extra threads; you just tap into the already existing thread for those graphics updates.

Basically what you want is a mixed-step game loop: you have some code that updates the game state, which is called on the main thread of your game, and you also periodically tap into (or get called back by) the already existing graphics rendering thread for a heads-up on when it's time to refresh the graphics.
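One common shape for such a mixed-step loop is a fixed-timestep accumulator with render interpolation. The sketch below uses a single double as a stand-in for the whole game state; names and the 60 Hz step are illustrative:

```cpp
// Fixed-timestep update with render interpolation: logic runs at a fixed
// dt, rendering happens whenever the platform asks, and the renderer
// blends the last two states by the leftover fraction "alpha".
struct LoopState {
    double accumulator = 0.0;
    double previous = 0.0;  // previous logic state (a single value here)
    double current = 0.0;   // current logic state
};

constexpr double kFixedDt = 1.0 / 60.0;

// Consume a chunk of real elapsed time, stepping the logic zero or more
// times, and return the interpolation factor for rendering.
double advance(LoopState& s, double frameTime) {
    s.accumulator += frameTime;
    while (s.accumulator >= kFixedDt) {
        s.previous = s.current;
        s.current += 1.0;          // placeholder "simulation step"
        s.accumulator -= kFixedDt;
    }
    return s.accumulator / kFixedDt;  // alpha in [0, 1)
}

// The renderer then draws the previous state blended toward the current one.
double renderValue(const LoopState& s, double alpha) {
    return s.previous * (1.0 - alpha) + s.current * alpha;
}
```

The render callback calls advance() with the real elapsed time and draws with the returned alpha, so logic stays deterministic while rendering stays smooth.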

In Java there's the "synchronized" keyword, which locks on the object you pass to it to make access thread-safe. Locking makes sure the object doesn't change while the code that follows runs, so variables don't get changed by your update thread while you are rendering them (in fact they DO change, but from the standpoint of your rendering thread they don't). Note that the attributes of an object can still change without the reference itself changing. To compensate for this, you could copy the object yourself or synchronize on all attributes of the object you don't want to change.

You might also need some kind of correlation id to determine which game object the data belongs to. How you do it depends on what language you're working with. I like immutable data, so I tend to return a new immutable object for every update. This wastes a bit of memory, but with modern computers it's not such a big deal. Still, if you want to lock shared data structures, you can do it. Check out the Exchanger class in Java; using two or more buffers can speed things up.

Before you get into sharing data between threads, work out how much data you actually need to pass. If you have an octree partitioning your 3D space, and you can see 5 game objects out of 10 total, then even if your logic needs to update all 10, you only need to redraw the 5 you're seeing. Having queues like this will increase the memory usage of the application due to all the data duplication, and inserting the data into these thread-safe queues is not free.

The core synchronization primitives for communicating data between cores are atomic operations, and C++ has had them integrated into the standard library since C++11. Atomic operations are a specific set of instructions that are guaranteed to work as specified even if multiple cores are touching the same data at once.

Things like parallel queues and mutexes are implemented with them. Atomic operations are often significantly more expensive than normal operations, so you can't just make every variable in your application atomic, as that would harm performance a lot. They are most often used to aggregate data from multiple threads or to do some light synchronization. As an example of what atomics are, we are going to continue with the particle-system example above, and we will use an atomic add to count how many vertices we have across all particles, without splitting the parallel for.
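A sketch of that atomic-add vertex count. The Particle type and the manual thread split below are illustrative stand-ins for the engine's particle system and parallel for:

```cpp
#include <atomic>
#include <thread>
#include <vector>

// Hypothetical particle; vertexCount would come from the particle's mesh.
struct Particle {
    int vertexCount = 6;  // e.g. two triangles per billboard
};

// Sum vertex counts from several worker threads with an atomic add, so
// the loop over particles doesn't need to be split up or locked.
int countVertices(const std::vector<Particle>& particles, int numThreads) {
    std::atomic<int> total{0};
    std::vector<std::thread> workers;
    size_t chunk = particles.size() / numThreads;
    for (int t = 0; t < numThreads; ++t) {
        size_t begin = t * chunk;
        size_t end = (t == numThreads - 1) ? particles.size() : begin + chunk;
        workers.emplace_back([&, begin, end] {
            int local = 0;                        // accumulate locally first:
            for (size_t i = begin; i < end; ++i)  // far cheaper than doing
                local += particles[i].vertexCount; // one atomic op per item
            total.fetch_add(local, std::memory_order_relaxed);
        });
    }
    for (auto& w : workers)
        w.join();
    return total.load();
}
```

Note the per-thread local sum: one fetch_add per thread, rather than one per particle, keeps the atomic cost negligible.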

You can have atomic variables of multiple base types, such as integers and floats. When used wrong, atomic variables will fail in very subtle ways that depend on the hardware of the CPU.

Debugging these sorts of errors can be hard. A great presentation about using atomics to implement synchronized data structures, and how hard it really is, is this talk from CppCon.

For a more in-depth explanation of how exactly std::atomic works, this other talk explains it well. A mutex is a higher-level primitive that is used to control the execution flow of threads. You can use one to make sure that a given operation is only being executed by one thread at a time. API details can be found here. Mutexes are implemented using atomics, but if they block a thread for a long time, they can ask the OS to put that thread in the background and let a different thread execute.

Continuing with the example, we are going to implement the parallel queue above as a normal vector protected with a mutex. Only one thread at a time can lock the mutex. If a second thread tries to lock it, that second thread will have to wait until the mutex unlocks. This can be used to define sections of code that are guaranteed to be executed by only one thread at a time. Whenever mutexes are used, it's very important that they are held for a short amount of time and unlocked as soon as possible.

Mutexes have the major pitfall that if they are used wrong, the program can completely block itself (deadlock). Using one, the code above would look like this.
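The original snippet isn't reproduced here, but a minimal sketch of such a vector-plus-mutex queue, with illustrative names, could look like this:

```cpp
#include <mutex>
#include <vector>

// A "parallel queue" as a plain vector guarded by a mutex. T could be a
// mesh, a draw command, a compile request, and so on.
template <typename T>
class LockedQueue {
public:
    void push(const T& item) {
        std::lock_guard<std::mutex> lock(mtx);  // held only for the push
        items.push_back(item);
    }

    // Grab everything at once, so the consumer holds the lock briefly
    // instead of locking once per element.
    std::vector<T> drain() {
        std::lock_guard<std::mutex> lock(mtx);
        std::vector<T> out;
        out.swap(items);
        return out;
    }

private:
    std::mutex mtx;
    std::vector<T> items;
};
```

std::lock_guard unlocks automatically when it goes out of scope, which keeps the critical sections as short as the advice above demands.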

We have explained the ways one can parallelize things in the engine, but what about GPU calls themselves?

With older APIs such as OpenGL and DX11, the API can generally only be used from one thread. Not even one thread at a time, but one specific thread. You can see that in both the Doom 3 engine and the UE4 engine. Those older APIs have parts that can be used from multiple threads in very specific cases, for example texture loading in OpenGL from other threads, but support can be very hit-and-miss, given their nature as extensions on APIs that were never designed with multithreading in mind.

In the case of Vulkan, the spec defines rules about which resources must be protected from being used by multiple threads at once. We are going to look at some typical examples of the kinds of things you can multithread in Vulkan, and their rules. For compiling pipelines, vkCreateShaderModule and vkCreateGraphicsPipelines are both allowed to be called from multiple threads at once.

A common approach to multithreaded shader compilation is to have a background thread dedicated to it, constantly polling a parallel queue to receive compilation requests and putting the compiled pipelines into another queue that the main render thread then drains to hook them back into the simulation. Compiling shader pipelines can take a very long time, so if you have to compile pipelines at runtime outside of a load screen, you need to implement this kind of multithreaded async compile scheme for your game to work well.
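As a sketch of that request/result scheme: here compilePipeline is a placeholder standing in for the real (slow) call to vkCreateGraphicsPipelines, so the threading structure can be shown without any Vulkan objects. All names are illustrative:

```cpp
#include <mutex>
#include <queue>
#include <string>
#include <thread>

struct Pipeline {
    std::string name;
};

// Placeholder for the expensive work; a real renderer would build a
// VkGraphicsPipelineCreateInfo and call vkCreateGraphicsPipelines here.
Pipeline compilePipeline(const std::string& request) {
    return Pipeline{request + "_compiled"};
}

class PipelineCompiler {
public:
    PipelineCompiler() : worker([this] { run(); }) {}
    ~PipelineCompiler() {
        { std::lock_guard<std::mutex> lock(mtx); quitting = true; }
        worker.join();
    }

    void requestCompile(const std::string& name) {
        std::lock_guard<std::mutex> lock(mtx);
        requests.push(name);
    }

    // The render thread polls this each frame for finished pipelines.
    bool popCompiled(Pipeline& out) {
        std::lock_guard<std::mutex> lock(mtx);
        if (compiled.empty()) return false;
        out = compiled.front();
        compiled.pop();
        return true;
    }

private:
    void run() {
        for (;;) {
            std::string job;
            {
                std::lock_guard<std::mutex> lock(mtx);
                if (quitting && requests.empty()) return;
                if (!requests.empty()) {
                    job = requests.front();
                    requests.pop();
                }
            }
            if (!job.empty()) {
                Pipeline p = compilePipeline(job);  // slow work, off-thread
                std::lock_guard<std::mutex> lock(mtx);
                compiled.push(p);
            } else {
                std::this_thread::yield();
            }
        }
    }

    std::mutex mtx;
    std::queue<std::string> requests;
    std::queue<Pipeline> compiled;
    bool quitting = false;
    std::thread worker;
};
```

A production version would use a condition variable instead of yield-polling, but the two-queue shape is the same.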

For descriptor set building, that can also be done from multiple threads, as long as access to the VkDescriptorPool you are using to allocate the descriptor sets is synchronized so that it is not used from multiple threads at once.

Command recording and submission can also be done in parallel, but there are some rules around it. Only a single thread can submit to a given queue at any time. If you want multiple threads calling vkQueueSubmit, you need to create multiple queues. As the number of queues can be as low as 1 on some devices, what engines tend to do is something similar to the pipeline-compile thread or the OpenGL API-call thread: have one thread dedicated to just calling vkQueueSubmit.

When you record command buffers, their command pool can only be used from one thread at a time. While you can allocate multiple command buffers from one command pool, you can't record into them from multiple threads. If you want to record command buffers from multiple threads, you will need more command pools: one per thread. Vulkan command buffers also have a system of primary and secondary command buffers.

The primary buffers are the ones that open and close renderpasses and can be submitted directly to a queue. Secondary command buffers can't be submitted to a queue on their own; they are executed from within a primary command buffer, and their main purpose is multithreading.

Let's say you have a ForwardPass renderpass. Before recording the main command buffer that will get submitted, you make sure to get 3 command pools, allocate 3 secondary command buffers from them, and then send them to 3 worker threads to record one third of the forward-pass commands each. Once the workers finish, you execute those secondary buffers from the primary command buffer with vkCmdExecuteCommands and submit it as usual.


