Shogun 2 Multithreading Issues

This article takes a look at one of the most successful and fun strategy games ever to grace the PC: Total War: Shogun 2. I have even redesigned my blog in its honour. However, there are underlying issues with this game, which we will examine in this article.

Let me first outline the parts of the game that have come under question from a technical standpoint: the way it is programmed, and how well it is able to take advantage of multi-core systems. The issue I'm referring to is the claim, repeated by virtually everyone who has played the game, that Shogun 2 is not optimized for multi-core systems. First, let me quote from this document, an article describing how Intel TBB helps Shogun 2.

“The game is too big now to rewrite it again,” Broadhurst said. “So we’re revolutionizing and evolving parts of the game with each iteration. For Shogun 2, there have been massive changes on the campaign map and the technology behind it. Now there’s more code shared with the battle map, so we can show a lot more detail.”

Other new features include Microsoft DirectX* 11 support, which will be matured thoroughly with the next title in the series, and a deferred renderer. The latter is being used to light dramatic night-time battles, in which the only sources of illumination are paper lanterns held aloft on sticks. In one awe-inspiring sequence, the pre-fight camera pans high over the heads of several thousand neatly aligned silhouettes marked out only by small pools of light.

Since the Empire rewrite, Broadhurst has been especially keen to refine the use of multi-threaded code in the game engine. In this respect Shogun 2 represents significant progress over its immediate predecessor, with an improved ability to scale effects and details according to the number of processor cores available.

“We’ve kept the minimum spec where it was with Napoleon,” explained Mike Simpson. “Keeping the spec the same helps to retain our customer base. At the top end, you want to make sure that if somebody buys an absolutely state-of-the-art machine that they’re going to get something out of it that they wouldn’t otherwise.”

This makes it clear that Total War: Shogun 2 does share the load on multi-core systems, albeit in its own way; even so, the strength of a single core is what counts most. According to the document, graphical features were practically the only part of the game built around multi-core technology, and it appears this was done for a very good reason. A Total War forum user called ShellShock gave an opinion on how difficult it is to spread work among the cores without sacrificing speed, owing to the overhead of communicating data between them. Here is an extract explaining what is happening inside the game.

There is a common misconception that equally dividing the processing across the available cores will automatically improve performance. This is not necessarily the case. There is an overhead for threads to communicate with each other, share common data, and synchronize their work. This can outweigh any benefit from using more than one core, and actually result in worse performance. Don’t you think CA would have done this by now if it actually helped performance?

S2 already runs with multiple threads – well over 30 when in a battle. The issue then becomes how you divide the processing between these threads, and how you divide the threads between the available cores. With four cores you can run four threads simultaneously – the others are temporarily suspended. Switching between threads is in itself relatively expensive, so it is a good idea not to have too many threads that you need to constantly switch between. Also, threads need to communicate with each other and synchronize access to shared data, which can slow things down.
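To put ShellShock's overhead point in concrete terms, here is a minimal C++ sketch of my own (purely illustrative, not CA's code) that sums a small array once on a single thread and once split across four threads. For work this small, the cost of creating, switching between, and joining the threads can easily exceed the time saved:

```cpp
#include <chrono>
#include <cstddef>
#include <iostream>
#include <numeric>
#include <thread>
#include <vector>

int main() {
    std::vector<int> data(10'000, 1);

    // One thread, no coordination overhead.
    auto t0 = std::chrono::steady_clock::now();
    long long single = std::accumulate(data.begin(), data.end(), 0LL);
    auto t1 = std::chrono::steady_clock::now();

    // The same work split across four threads: each sums a quarter of the array.
    long long partial[4] = {};
    std::size_t chunk = data.size() / 4;
    std::vector<std::thread> pool;
    for (int i = 0; i < 4; ++i) {
        pool.emplace_back([&, i] {
            auto begin = data.begin() + i * chunk;
            auto end = (i == 3) ? data.end() : begin + chunk;
            partial[i] = std::accumulate(begin, end, 0LL);
        });
    }
    for (auto& t : pool) t.join();  // creating and joining threads is not free
    long long threaded = partial[0] + partial[1] + partial[2] + partial[3];
    auto t2 = std::chrono::steady_clock::now();

    using us = std::chrono::microseconds;
    std::cout << "single-threaded: " << single << " ("
              << std::chrono::duration_cast<us>(t1 - t0).count() << " us)\n"
              << "four threads:    " << threaded << " ("
              << std::chrono::duration_cast<us>(t2 - t1).count() << " us)\n";
}
```

For small inputs the threaded version is often the slower of the two; it only pulls ahead once each thread has enough work to amortize the startup cost.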

Imagine you have a room of 30 people all reading from two books. 29 of them are reading from the same book, so they have to take it in turns to read one word before passing the book to another person to read the next word. Also there are four chairs in the room, and they are only allowed to read from the book whilst they are sitting in one of the chairs (each person gets the same amount of time to sit down). The 30th person has their own book to read from. The books all have the same number of words in them. Which book do you think is finished first – the book shared between the 29 people, or the book being read by just one person? All 30 people have to take it in turns to sit down to read – although they each get the same amount of time sitting down, the person with his/her own book gets through the book much quicker because they do not have to share it with anyone else.

The same thing can be true of programs:

people = threads
books = shared data
chairs = cores/processors

When people say that Total War does not support multiple cores, what they usually mean is that all the cores are not running at a high percentage utilization – in reality all the cores are being used, but some (maybe one) is doing most of the work. This is comparable to a single person being able to read a book much quicker than sharing the same book between lots of people. It really can be better to let a single thread run flat out and not have to worry about it constantly stopping to share data with other threads.
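ShellShock's analogy translates almost line for line into code. In this rough sketch (again my own, purely illustrative), four threads take turns on one mutex-guarded counter, the shared book, while a lone thread works through a private counter of the same size:

```cpp
#include <iostream>
#include <mutex>
#include <thread>
#include <vector>

int main() {
    const long long words = 2'000'000;  // both books have the same length

    // The shared book: four "readers" must take turns holding the lock
    // for every single word.
    long long sharedBook = 0;
    std::mutex bookMutex;
    std::vector<std::thread> readers;
    for (int i = 0; i < 4; ++i) {
        readers.emplace_back([&] {
            for (long long w = 0; w < words / 4; ++w) {
                std::lock_guard<std::mutex> turn(bookMutex);
                ++sharedBook;  // one "word" read while holding the book
            }
        });
    }

    // The private book: one reader, no lock, no waiting.
    long long ownBook = 0;
    std::thread loner([&] {
        for (long long w = 0; w < words; ++w) ++ownBook;
    });

    for (auto& r : readers) r.join();
    loner.join();
    std::cout << sharedBook << " words shared, " << ownBook << " words alone\n";
}
```

The lone thread typically finishes its book first, even though it reads just as many words, because every increment of the shared counter has to wait its turn for the lock – exactly like passing the book around the room.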

This is where we get into the detail of the amount of potential parallelism inherent in the algorithms used by the game. In my comparison this equates to how much each person can read (one word/one paragraph/a whole book) before passing on to the next person. I can only guess that the amount of inherent parallelism is not sufficient to allow an even distribution of the processing load across all the cores. This is not unusual – a lot of (most?) algorithms do not have a lot of parallelism, so if they are split across multiple threads, the threads have to spend a lot of time communicating with each other, which can kill performance.
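ShellShock's guess about limited inherent parallelism has a textbook formalization: Amdahl's law. If only a fraction p of the work can run in parallel, n cores can never speed the program up by more than 1 / ((1 − p) + p/n). A quick calculation shows how punishing this is:

```cpp
#include <iostream>

// Amdahl's law: if a fraction p of the work can run in parallel,
// n cores speed the whole program up by at most 1 / ((1 - p) + p / n).
double amdahl(double p, int n) {
    return 1.0 / ((1.0 - p) + p / n);
}

int main() {
    std::cout << amdahl(0.5, 4) << '\n';    // half parallel, 4 cores: 1.6x
    std::cout << amdahl(0.9, 4) << '\n';    // 90% parallel, 4 cores: ~3.08x
    std::cout << amdahl(0.5, 1000) << '\n'; // half parallel never beats ~2x
}
```

So if half of a frame's work is inherently serial, four cores top out at a 1.6x speedup, and no number of cores will ever reach 2x.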

Another example: imagine a battle between two armies, with 5,000 men each. In the real world each man has his own processor (his brain). In the game world, the 10,000 men have to share, say, 4 processors (cores). A very naive implementation would spawn a thread for each man, and the game would run like a slide show (if at all), because of the overhead in managing so many threads. So what is the best way to split the processing for 10,000 men across the 4 cores? Perhaps we could have one thread per army, but when the armies come into contact, the threads would be constantly synchronizing data with each other (soldier X in army A has hit soldier Y in army B etc), and this synchronization can cause a bottleneck too. And what about all the other objects in the battle – do we have dedicated threads for them too?
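One common middle ground between "one thread per soldier" (far too many threads) and "one thread per army" (constant cross-army synchronization) is to split all the soldiers into as many slices as there are cores, update the slices independently, and resolve cross-army combat in a serial pass. The sketch below is my own illustration under that assumption; Soldier, updateMovement, and resolveCombat are placeholder names, not CA's code:

```cpp
#include <cstddef>
#include <thread>
#include <vector>

// Illustrative placeholders, not CA's actual engine code.
struct Soldier { float x = 0, y = 0; int health = 100; };

void updateMovement(Soldier& s) { s.x += 0.1f; }              // stand-in for pathing/AI
void resolveCombat(std::vector<Soldier>&) { /* cross-army hits, handled serially */ }

int main() {
    std::vector<Soldier> soldiers(10'000);
    const unsigned workers = 4;                               // pretend we detected 4 cores

    // Per-frame movement: each worker owns a disjoint slice, so no locks
    // are needed while the slices are being updated.
    std::vector<std::thread> pool;
    const std::size_t chunk = soldiers.size() / workers;
    for (unsigned i = 0; i < workers; ++i) {
        pool.emplace_back([&, i] {
            std::size_t begin = i * chunk;
            std::size_t end = (i == workers - 1) ? soldiers.size() : begin + chunk;
            for (std::size_t s = begin; s < end; ++s)
                updateMovement(soldiers[s]);
        });
    }
    for (auto& t : pool) t.join();

    // The part that genuinely needs shared state (soldier X in army A hits
    // soldier Y in army B) stays on one thread, avoiding constant syncing.
    resolveCombat(soldiers);
}
```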

Improving the performance of the game is not necessarily the same as maxing out all the cores. It may not be possible to evenly spread the load across the cores without making performance worse. My assumption is that the game algorithms are already optimized for performance, and that happens to be when most of the work is done on one thread (running on one core). It is probably more efficient to have a single game loop running on one thread that is able to iterate through all the thousands of game objects without worrying too much about having to share state with other threads.
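That single-loop shape is easy to picture. A toy version (GameObject and update are placeholder names of mine, not CA's engine) looks like this:

```cpp
#include <vector>

// Placeholder names, not CA's engine.
struct GameObject {
    float x = 0;
    void update(float dt) { x += dt; }   // whatever per-object work a frame needs
};

int main() {
    std::vector<GameObject> objects(10'000);
    // One thread walks every object each frame: no locks, no handoffs,
    // no waiting on other threads.
    for (int frame = 0; frame < 60; ++frame)
        for (auto& obj : objects)
            obj.update(1.0f / 60.0f);
}
```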

And there you have it: a very good reason why Shogun 2 cannot fully utilize all four cores. It will be a while before Intel comes up with a processor-level solution that can increase the parallelism of code on its own. In fact, this is a common problem among developers, but it is steadily being dealt with as more games ship with better support for parallelism. It has been said repeatedly that minimizing the need for synchronization is a better alternative than searching for a more efficient model of synchronization: synchronizing less often makes code easier to debug and will usually make it run faster.
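For what "synchronizing less often" can look like in practice, here is one common pattern (my sketch, not from the article): each thread accumulates into its own slot, and the results are combined exactly once after all the workers finish, so the only synchronization points are the joins:

```cpp
#include <iostream>
#include <numeric>
#include <thread>
#include <vector>

int main() {
    const int workers = 4;
    std::vector<long long> partial(workers, 0);

    // Each worker writes only to its own slot, so no locks are taken
    // inside the hot loop.
    std::vector<std::thread> pool;
    for (int i = 0; i < workers; ++i) {
        pool.emplace_back([&partial, i] {
            for (int n = 0; n < 1'000'000; ++n)
                partial[i] += n;
        });
    }
    for (auto& t : pool) t.join();   // the joins are the only sync points

    // Combine the per-thread results exactly once, after everyone is done.
    long long total = std::accumulate(partial.begin(), partial.end(), 0LL);
    std::cout << total << '\n';
}
```

A production version would also pad each slot to its own cache line to avoid false sharing, but the core idea stands: reduce how often threads must coordinate rather than coordinating more cleverly.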

However, some games still show signs of a lack of parallelism, such as Assassin's Creed 3, though it is well known that consoles were the priority during that game's development rather than the PC, so the engine catered more to consoles than it did to PCs. There is also the overhead of running on Windows itself, since the operating system consumes resources of its own, giving the programmers even more overhead to deal with. Perhaps Rome 2 will do a better job of this sort of thing, but for now we will just have to wait and see.

Author: Kingsmin1994

Someone who likes to gather information across many different fields.
