Hello,
If any game developers are around, feel free to chime in. I come from the other side: the hardware. Now that we've got dual- and quad-cores (and we are working on an 8-core design), we've got all this untapped potential for multithreading. And as far as I can see, the gaming industry hasn't really found a way to take advantage of it yet. Am I wrong? Instead, it seems most of the work is being done on the high-end graphics side.
What are the ways the gaming industry is taking/could take advantage of multicore/multithreading? The only one I can think of is MMORPG, and that is only on the server side. How could you take advantage of it on the client side? Could you use it effectively on the AI? I've seen it done in Chess, where the computer thinks at the same time as the human player takes his turn (i.e. one thread is the main event loop, the other is the AI thinking). But even that, while it is technically multithreading, it's not really taking advantage of the multiple cores.
I'll pay attention to this thread... could be interesting.
I would like to get information on this too, as long as it does not get too technical.
Personally, I think multithreading isn't that uncommon - after all, it works even with single-core CPUs. Sound, filesystem IO, AI, anything that streams content, multiplayer components (not only server-side, but client-side as well)... There are many places that can utilise proper threading.
Whether the added development cost is worth it in every instance is another question. Making a complex piece of software thread-safe isn't easy.
On the other hand, even in the small game that I'm coding in my free time I'm using threads...
I'm not a games developer; I'm a real-time embedded and C# developer.
Conceptually, though, if I were to design a game, there are a few ways I could take advantage of multi-core processors.
Like you say, one way is for the AI to continually assess its situation and update behaviours in a background thread. This could potentially be more than one thread: an immediate or combat AI for enemies in the vicinity, who would have more advanced senses and behaviour, while other threads handle world-level AI. Depending on the game you could add even more realism and complexity by giving each enemy its own thread, although it would be some task to make the AI work together. In a game such as Sins, each AI player could have its own thread.
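Just to put a rough shape on that idea, here's a minimal Java sketch of an AI that keeps re-assessing its situation on a background thread while the game loop reads whatever plan it last published. All the names (BackgroundAi, assessSituation, the "plan" strings) are invented for the illustration:

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical sketch: an AI that re-evaluates its behaviour on a background
// thread while the main game loop keeps running and reads the latest plan.
public class BackgroundAi implements Runnable {
    private final AtomicReference<String> currentPlan = new AtomicReference<>("idle");
    private volatile boolean running = true;

    public String getCurrentPlan() {            // called by the game loop
        return currentPlan.get();
    }

    public void stop() {
        running = false;
    }

    @Override
    public void run() {
        while (running) {
            String newPlan = assessSituation(); // stand-in for real combat/world AI
            currentPlan.set(newPlan);           // publish the result atomically
            try {
                Thread.sleep(100);              // "think" roughly ten times a second
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
        }
    }

    private String assessSituation() {
        return "patrol";                        // placeholder decision
    }

    public static void main(String[] args) throws Exception {
        BackgroundAi ai = new BackgroundAi();
        Thread aiThread = new Thread(ai, "ai-thread");
        aiThread.start();
        System.out.println("Game loop sees: " + ai.getCurrentPlan());
        Thread.sleep(300);
        System.out.println("Game loop sees: " + ai.getCurrentPlan());
        ai.stop();
        aiThread.join();
    }
}
```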
In a roaming level environment I could use a thread that handles the loading and freeing of assets as the player moves around a level.
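Again just as a sketch (the asset names and loader are invented), the streaming case could be a queue that the game thread feeds and a background thread drains:

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Sketch of a background asset loader: the game thread requests assets as the
// player approaches new areas; the loader thread does the slow disk I/O.
public class AssetStreamer {
    private final BlockingQueue<String> requests = new LinkedBlockingQueue<>();

    public void request(String assetName) {         // called from the game thread
        requests.offer(assetName);
    }

    public void startLoaderThread() {
        Thread loader = new Thread(() -> {
            try {
                while (true) {
                    String asset = requests.take(); // blocks until work arrives
                    load(asset);                    // slow read happens off the game thread
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "asset-loader");
        loader.setDaemon(true);
        loader.start();
    }

    private void load(String asset) {
        System.out.println("Loading " + asset);     // stand-in for real file I/O
    }

    public static void main(String[] args) throws Exception {
        AssetStreamer streamer = new AssetStreamer();
        streamer.startLoaderThread();
        streamer.request("forest_textures.pak");    // hypothetical asset names
        streamer.request("castle_meshes.pak");
        Thread.sleep(200);                          // give the demo loader a moment
    }
}
```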
Any large algorithms to do with physics, line of sight, pathfinding, etc. could belong in their own threads, but this is all work that needs to be synchronised. So while you are taking advantage of multi-core systems, you are adding inefficiency to your processing and a lot of complexity to the design, and single-core systems would suffer.
Like I said, I'm not a games developer, but I can't really see how adding ever more cores to the processor helps games a great deal. Taking advantage of all the extra power is going to push development costs up a LOT. It raises your potential, but it's not like the good old days of processors simply getting faster.
I strongly regret having to drop out of college before we got on to multithreading; I would have loved to follow this.
The question should not be 'what can we do with the extra cores' but 'how much benefit can we get from it, for the effort it takes'.
As mentioned before, making multithreaded software safe and stable is hard, and it takes a lot of extra work. The more threads, the more problems.
GPUs are sort of multi-core in their set-up, but it doesn't require much special work from the developer to make use of that, because it is dealt with by the driver. The best way to get multi-core usage adopted widely is to create drivers that process general code in parallel. That puts the effort on the hardware designer and frees up resources for the game developer.
I think Intel is going down that path with their next GPU.
GPUs are not multicore, but rather just highly parallel, as graphics rendering consists of a huge number of small floating-point calculations which mostly don't depend on each other's outcomes (excepting stuff like anti-aliasing). Such parallelization doesn't really work for general-purpose processing, where things are more linear and less predictable (this is also the reason why the Cell CPU is underwhelming in a lot of tasks).
Source, the engine behind Valve's games, is multithreaded, and I believe they have some articles about it.
There are a couple major ways to break it down:
By the way, I'm a student: Computer Science major, hoping to graduate in a year.
Well, I highly recommend going back. The payoff of getting that degree is well worth the costs.
I've done some multithreading - I wrote some threaded code for an open source project, and for one of my homework assignments I'm using threading. The low-level stuff can be pretty tough, but increasingly there are more techniques and tools that can be used to abstract that away a bit.
You're absolutely right - and you'd be surprised how many applications are heavily threaded, and just how many threads they have!
And some interesting things about the games we already play (using Process Explorer, which gives me some detailed info about the threads):
GalCiv 2 TA: 15 threads while a game is in progress - 3 directly related to the executable, 5 related to DirectX, 1 related to Bink, and the rest look related to the OS.
Sins of a Solar Empire: 2 directly related to the executable, 4 for the nVidia drivers(?), 1 system, 8 DirectX.
Vista's Minesweeper (Minesweeper was rewritten for Vista): 1 RPC-related, 2 directly from the executable, 2 DirectSound, 1 Direct3D.
Thanks Cobra, I wasn't home so I couldn't check out exact figures. But your findings more or less confirm what I expected.
On the hardware side, we are taking advantage of the GPU to run our simulations. We don't care about the video; we just want the systolic array to solve our system of matrix computations. When you analyze the electrical behavior of circuits, it all boils down to a series of matrices. We're getting about a 100x speedup over a traditional single core, and we're not even using the GPU as a video card.
It can be good practice generally to offload certain subsystems into their own threads. User interfaces, for example, so they don't have that sticky feel to them while the program is busy. Many graphics frameworks handle it that way by default.
But multithreading can be very hard. It almost always makes things more difficult to debug, and it adds a whole new class of potential errors. It really depends on your problem. The first threaded application I wrote had a MessageHandler class that other threads had to register with to get a Queue<Message> postBox they could pull from whenever they were free. It turned into a horrible mess of switch-case statements.
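For what it's worth, the java.util.concurrent queues make that kind of postbox much less painful than hand-rolled synchronisation. A bare-bones Java sketch of the idea (the Message type and handler here are placeholders, not the actual classes I wrote):

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Bare-bones "postbox" sketch: the owning thread drains its queue when it is
// free; any other thread can drop messages in without extra locking.
public class Mailbox {
    static class Message {                          // placeholder message type
        final String type;
        Message(String type) { this.type = type; }
    }

    private final BlockingQueue<Message> postBox = new LinkedBlockingQueue<>();

    public void post(Message m) {                   // safe to call from any thread
        postBox.offer(m);
    }

    public void drain() {                           // called by the owning thread
        Message m;
        while ((m = postBox.poll()) != null) {
            handle(m);
        }
    }

    private void handle(Message m) {
        System.out.println("Handling " + m.type);   // dispatch instead of a giant switch
    }

    public static void main(String[] args) {
        Mailbox box = new Mailbox();
        box.post(new Message("MOVE"));
        box.post(new Message("FIRE"));
        box.drain();
    }
}
```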
On the other end of the spectrum, some problems are just made for it. If you can design the program so that threads don't need to communicate, it is very simple to get it all working. A good general strategy is to separate things into immutable parts, which can safely be passed around to all threads, and a per-thread instance in which all the dynamic data is generated. Sometimes you get lucky with the task you're solving and can take such an approach, sometimes not.
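A tiny Java sketch of that split (LevelData and Worker are invented names): the immutable part is built once and handed to every thread, and each thread keeps its own mutable state.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Sketch of the "immutable shared part + per-thread mutable part" approach.
final class LevelData {
    private final List<String> tiles;

    LevelData(List<String> tiles) {
        this.tiles = Collections.unmodifiableList(tiles);  // never changes after construction
    }

    List<String> tiles() {
        return tiles;
    }
}

public class Worker implements Runnable {
    private final LevelData shared;     // safe to share freely: it is immutable
    private int processed = 0;          // mutable state owned by this thread only

    Worker(LevelData shared) {
        this.shared = shared;
    }

    @Override
    public void run() {
        for (String tile : shared.tiles()) {
            processed++;                // no locking needed; nobody else touches this
        }
        System.out.println(Thread.currentThread().getName()
                + " processed " + processed + " tiles");
    }

    public static void main(String[] args) throws Exception {
        LevelData level = new LevelData(Arrays.asList("a", "b", "c"));
        Thread t1 = new Thread(new Worker(level), "worker-1");
        Thread t2 = new Thread(new Worker(level), "worker-2");
        t1.start(); t2.start();
        t1.join(); t2.join();
    }
}
```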
Supreme Commander is so far the only game I've played where you can really see the benefit of multithreading. I believe their approach was to use it for unit AI and physics.
Yeah - a note on that - message passing should be built into languages better. Current languages don't do it so well, but there are languages like Erlang and Smalltalk where it's a natural part of the language and works very well with concurrency.
In fact, message passing was supposed to be a part of OOP, but it sorta fell out of favor with C-like languages.
I find that with .NET and C#, event-based messaging is a lot easier. At least I didn't break my fingers doing it in the server apps I wrote.
Would it be possible to cut any game into some core blocks?
-World (can be anything from a chessboard to a first-person map)
-Objects (anything that is just subject to physics, such as bullets and vehicles)
-Agents (this can be either a human player input directly into the computer, or over the net, or a computer AI player)
-Graphics (really just an 'Agent' that only observes)
I would imagine that for a game each of these entities can run on a separate core, interacting with 'World' in its own way and independently of it. 'World' then would not really need to know if an 'Agent' is a computer player or a human player.
The objective of the 'World' is only to pass parameters between the different cores, and each core then plays its own role in the grand scheme of things.
Depending on the type of game you might have more cores dedicated to 'Objects' and fewer to 'Agents', or the other way around. Something like GalCiv would be heavy on the 'Agents' (one for each player, or each group of players), and something like an RTS would be heavy on the 'Objects' (every unit and projectile).
I guess you would lose one core to the interaction between the different layers, but if you are talking about 4 or more cores there should still be a benefit. With proper APIs it should even be quite easy to configure a game at startup for however many cores are available (up to a maximum defined by the agents and objects).
The 'Agent' for the human user would deal with inputs, and the 'Agent' for graphics & sound just presents stuff in whatever way the game designer wants. The information handed to these agents should be the same as for any other agent (any computer player).
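Just to make that concrete, here's one very simplified Java sketch of the idea (all names invented): agents run on their own threads and post actions to the World, which is the only thing that mutates state and doesn't care whether an action came from a human, an AI, or across the net.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Very simplified World/Agent sketch: agents submit actions from their own
// threads; the World thread applies them one at a time.
public class WorldAgentSketch {
    static class Action {
        final String description;
        Action(String description) { this.description = description; }
    }

    static class World implements Runnable {
        private final BlockingQueue<Action> actions = new LinkedBlockingQueue<>();

        void submit(Action a) { actions.offer(a); }   // any agent thread can call this

        @Override
        public void run() {
            try {
                while (!Thread.currentThread().isInterrupted()) {
                    Action a = actions.take();        // apply actions serially
                    System.out.println("World applies: " + a.description);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();   // shut down cleanly
            }
        }
    }

    public static void main(String[] args) throws Exception {
        World world = new World();
        Thread worldThread = new Thread(world, "world");
        worldThread.start();

        // Two "agents" - the World has no idea which one is the human.
        Thread ai = new Thread(() -> world.submit(new Action("AI moves a unit")), "ai-agent");
        Thread player = new Thread(() -> world.submit(new Action("player fires")), "player-agent");
        ai.start(); player.start();
        ai.join(); player.join();

        Thread.sleep(100);          // let the World drain the queue in this demo
        worldThread.interrupt();
    }
}
```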
I am probably thinking too simple since I mostly work with embedded micro's with only a few simple threads.
I was taking a Bachelor Honors Degree in Computer Games Development, though, and was working heavily in C++. If you have some multithreading notes I could yoink, PM me.
The problem I see with action games is that the objects really need to speak to each other at the register level, or at the very least in cache. You can't safely assume a cache hit or anything if they're in separate threads. Actually, that's the single biggest reason why multithreading in games does not look good to me. But it does make sense if you can split the I/O access off from the CPU, such as pre-loading the levels in Halo. Even then, in a lot of online games it seems the network traffic has to be blocking, i.e. the game does need to lag if the network packets lag. So there's not a lot of I/O you can split off, either.
Kinda, sorta, not really.
World and objects are usually just data. They're stored in memory and don't take up CPU time.
Physics is generally on a separate thread(s), as it is CPU/GPU intensive.
In a game, input and graphics are deeply divided: on the motherboard, graphics is done on the GPU, which hangs off the northbridge and uses a huge amount of bandwidth, while input goes through the southbridge and doesn't need much bandwidth at all. Usually it's good to have graphics on a separate thread.
This is true - you can generally detect how many cores/CPUs a system has and use that info to decide how many threads you want.
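In Java, for instance, it's a one-liner to ask, and you can size a thread pool accordingly:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Ask the JVM how many logical processors are available and size a thread
// pool to match, so the same binary adapts to dual-, quad-, or eight-core boxes.
public class CoreCount {
    public static void main(String[] args) {
        int cores = Runtime.getRuntime().availableProcessors();
        System.out.println("Available cores: " + cores);

        ExecutorService pool = Executors.newFixedThreadPool(cores);
        // ... submit per-frame or per-subsystem tasks to 'pool' here ...
        pool.shutdown();
    }
}
```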
Or maybe it's time to re-think object level interaction. GPUs are moving towards having physics support, and nVidia now has physics APIs (thanks to their recent purchase of PhysX), which means they can take care of object interaction, and you don't have to. You just take care of the logic part if you want your game to do something interesting when certain objects interact. That can generally be done in a single thread separate from the physics.
Actually, one of the great things about multithreading is that you can turn blocking calls into non-blocking calls by performing the blocking call in a separate thread. I just did that for a recent homework assignment.
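For example (slowLookup here is just a stand-in for whatever blocks - a socket read, a disk load), you push the blocking call onto another thread and poll the Future from the main loop:

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Turn a blocking call into a non-blocking one: run it on another thread and
// let the main loop poll (or eventually wait on) the Future.
public class NonBlockingWrapper {
    // Stand-in for a blocking network or disk call.
    static String slowLookup() throws InterruptedException {
        Thread.sleep(500);
        return "result";
    }

    public static void main(String[] args) throws Exception {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        Future<String> pending = executor.submit((Callable<String>) NonBlockingWrapper::slowLookup);

        while (!pending.isDone()) {             // the main loop keeps running meanwhile
            System.out.println("main loop tick");
            Thread.sleep(100);
        }
        System.out.println("Got: " + pending.get());
        executor.shutdown();
    }
}
```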
The thing with games is trying to figure out how objects are moving around and interacting while you're experiencing the lag. You can extrapolate movement a bit, but you have no idea what the other player is doing until you start receiving from that player again. And once you start receiving packets again, now you have to sync the game.
Most games take one of two paths: Either they simply "pause" the game when packets aren't received (just let it block), or if the game is using client/server the server decides what the current state of the game is (sync with the server when packets start flowing again).
Java 7 is going to have a new concurrency setup - Fork/Join - but it's basically just a new API. Should be useful for splitting algorithms over cores in the most efficient way, but doesn't look like it adds anything for components. Actually I think we might get access to it before Java 7, and it should probably be backportable.
http://www.ibm.com/developerworks/java/library/j-jtp11137.html
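To give a feel for the API, here's the usual toy example - summing an array by splitting the range until the pieces are small, then letting the pool's work-stealing scheduler spread them across the cores:

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

// Fork/Join sketch: recursively split a sum until chunks are small enough to
// compute directly; the pool schedules the pieces over the available cores.
public class ParallelSum extends RecursiveTask<Long> {
    private static final int THRESHOLD = 1000;
    private final long[] data;
    private final int lo, hi;

    ParallelSum(long[] data, int lo, int hi) {
        this.data = data;
        this.lo = lo;
        this.hi = hi;
    }

    @Override
    protected Long compute() {
        if (hi - lo <= THRESHOLD) {
            long sum = 0;
            for (int i = lo; i < hi; i++) sum += data[i];
            return sum;
        }
        int mid = (lo + hi) / 2;
        ParallelSum left = new ParallelSum(data, lo, mid);
        ParallelSum right = new ParallelSum(data, mid, hi);
        left.fork();                              // run the left half asynchronously
        return right.compute() + left.join();     // do the right half here, then wait
    }

    public static void main(String[] args) {
        long[] data = new long[1000000];
        for (int i = 0; i < data.length; i++) data[i] = i;
        long total = new ForkJoinPool().invoke(new ParallelSum(data, 0, data.length));
        System.out.println("Sum = " + total);
    }
}
```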
I don't know where C# is going to go with its concurrency. At present, it's only superficially different from Java's Thread.start() stuff in my opinion, so it has the same problem. Edit: I should add that C# does have a better delegate/event system already, which probably helps.
But a more modern language built with concurrency in mind is probably the way to go. I'm thinking I might try Scala.
This dataflow model is stressing out the PCIX bus. Before, you were sending data in real time to the video card; now you are depending on two-way traffic, and you need a handshake. Plus, a lot of the high bandwidth on a PCIX line assumes pipelineable data; this would no longer be true. Assuming the bus is the bottleneck--and it probably is--that is worse than where we started. OTOH, if the hardware guys integrate a GPU onto the SoC, we may be onto something. That would be a very innovative and interesting problem to solve--one that involves 4 cores and 2 GPUs on a single SoC, sharing a single L1/L2 cache and with some pretty interesting bus arbitration logic. Not an easy engineering problem.
Right, I understand. But the problem is, for a lot of online games the network calls *SHOULD* be blocking. If I swing my sword at a guy, I need to know where the other guy is first. So threading it would actually be the wrong thing to do. A good rule of thumb is, if you can use a UDP protocol, thread it. If you have to have a socket, don't.
That looks interesting, and *NIX based OSes have had fork and join for a while. I'm mostly familiar with Java's current implementation of threads, as I've been using Java at my college.
I've heard of it, and I'd like to learn it sometime.
Whoah, whoah - let's not go crazy and re-invent TCP/IP here. We don't need a connection-oriented protocol if we're just sending stuff across a bus. AFAIK, buses don't have the reliability issues that the Internet has, and there's no "firewall" on a bus (you can send data in both directions without impediment), so a connection-oriented protocol would be pretty much overkill and a waste of bus resources.
If we're transferring information to/from a video card that contains the ability to perform physics calculations and is being used for gaming, I think it's pretty safe to say it's running on a PCIE bus.
Depends. Some game designers don't like freezing the entire game just because some packets have been lost. They usually want to show *something* and allow some interactivity until the client can communicate with the server again. At the very least, the UI should remain responsive so the player can select a different server or quit if he or she is not willing to wait.
Some MMORPGs I've played allow the player to keep playing for a period of time while the client isn't really communicating with the server. It queues up the actions, and when it can communicate with the server again, the server decides whether it can replay the queued actions it receives from the client, or whether it just tosses them out and forces the client to reset its state to whatever the server's current state is.
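In outline (all names here are invented), that client-side queueing is pretty simple - remember actions while offline, then hand them to the server when the link returns and let it decide:

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Sketch of "queue while disconnected, replay on reconnect": the client keeps
// recording actions; when the link comes back they are sent in order and the
// server decides whether to accept or discard them.
public class OfflineActionQueue {
    private final Deque<String> pending = new ArrayDeque<>();
    private boolean connected = false;

    public synchronized void perform(String action) {
        if (connected) {
            send(action);                 // normal path: send immediately
        } else {
            pending.addLast(action);      // offline: remember it for later
        }
    }

    public synchronized void onReconnect() {
        connected = true;
        while (!pending.isEmpty()) {
            send(pending.removeFirst());  // server may replay or reject these
        }
    }

    private void send(String action) {
        System.out.println("Sending: " + action);   // stand-in for real network I/O
    }

    public static void main(String[] args) {
        OfflineActionQueue queue = new OfflineActionQueue();
        queue.perform("swing sword");     // connection is down, so these queue up
        queue.perform("drink potion");
        queue.onReconnect();              // both queued actions go out now
    }
}
```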
In addition, I've seen some games take a "hybrid" approach, where actions that don't normally affect the world or other players take place without blocking, but important events (such as dying or killing somebody) block.
Any word on the core usage of the newest Supreme Commander/Demigod engine? I know a dual-core machine is ideal with this technology - if I may diverge momentarily with this query, I would be interested to know whether GPG/Ironclad have begun tapping a 3rd or 4th core for upcoming products. I've got a Q9650 myself. Otherwise, continue on, folks...
Are you familiar with pipelining? What enables PCIX and makes other board-level buses so fast is that they pipeline. The problem is, you can't pipeline the way you're proposing, because you have dependencies. Historically with a video card, you just throw a bunch of data at it and it does its thing. No dependencies in that. But now you're talking about throwing objects at a video card, having it do its thing, then having it send back physical data from which you derive more object information. That has dependencies all over the place. You just saturated the bus. You're better off just taking advantage of the MMX/SIMD instructions on the core.
Also, you DO need protocols for shared buses. There are shared buses even within a single core, and lots of engineering time is spent designing/verifying them. Though they're more transaction protocols than connection protocols.
That's what you do with old PCI, PCI-X, and AGP buses, which are only half duplex: you send data in large bursts so you're not spending a lot of time with only a few bits of data in a one-way pipe.
PCI-E, on the other hand, is full duplex.
. . . and I'm not proposing it. nVidia is doing it already. Their latest upgrade installed PhysX support in their drivers for their DirectX 10 video cards, and they have a whole section of their website dedicated to CUDA.
You're joking, right? My video card has 64 stream processors, and their high-end cards have 200+ stream processors. Double or triple that if you're doing SLI. Anything that can easily be split into parallel tasks will get a boost no CPU can currently match, even with SIMD and even accounting for bus issues.
nVidia has a demo of their physics as well - a fluid simulation with 60,000+ particles running simultaneously. Something like that can certainly be split into many parallel tasks that run on their stream processors.
. . . which the hardware and OS do for you, as far as I know. PCI-E already defines those protocols including the transaction layer, so the person designing the game doesn't have to worry about it.
No, that is not pipelining. Pipelining has nothing to do with half-duplex or full-duplex. Or at least it does, in the sense that if your full duplex protocol has data dependencies on both sides of the bus, that will seriously degrade the pipeline's performance. Which is exactly the problem I'm describing.
The hardware does it. Who does it doesn't matter; what matters is that it's being done. You can't pipeline instructions with data dependencies between them. Well, you CAN...at the software level. At least, it'll work. But the hardware serializes the instructions, which defeats the entire purpose of pipelining.
No, NOT accounting for bus issues. The bus is the problem.
But look what you're doing: running the whole task on the GPU. Maybe a few interrupts here and there from the UI. You're not splitting the task across a CPU core and a GPU and sharing multiple data dependencies across the bus. That's where the problem is.