Dev Diary: Synchronization Woes!
V1.12 Sync Test: http://www.ironcladgames.com/sins/Sins112SyncTest.zip
Hello all,
The goal this past weekend was to seek and destroy the elusive desync problem. It's disappointing that it's showing up again with higher frequency, no doubt due to the increase in the number of people joining Sins multiplayer after the 1.1 release. Unfortunately, we didn't get consistent desyncs during the lengthy beta so we were never able to nail it down. I also really want it out of the way before Entrenchment is released. Its multiplayer component is a lot of fun so I don't want it blemished by sync issues.
For the longest time I was sure it was some mod-related problem. We hadn't personally seen a desync in over 6 months. However, when the reports started coming in again after 1.1 went live, I decided to dedicate the entire weekend to doing nothing but tracking it down. All of Friday, Saturday and most of Sunday I played with a lot of people who were all as dedicated as I was to eradicating the beast. There were theories to test, combinations to try, players to track down (particularly those who reportedly could produce desyncs consistently), IRC debates, logs to submit and analyze, and much more.
Not one problem occurred all of Friday and Saturday, but then on Sunday afternoon I got word through the grapevine that a mysterious player named "Krunk" was currently the desync king. So a bunch of Sins fans on ICO hunted him down for me and we set up a game. Sure enough, no more than a few seconds into the game, he desynced. I couldn't believe it when the red desync text materialized and proceeded to burn my eyes. Well, I guess there really is a desync problem.
Sh!t.
What is a Sync Bug?
In the world of developing RTS games, there is nothing more painful than a sync bug. They consume vast amounts of time to track down and much of the engine design is focused around methodology to prevent them. Just because I think it’s interesting, I'm going to explain what they are and how they are typically caused.
Unlike MMOs, FPSs or many other multiplayer games, in an RTS there is no master server that has the final say on the current state of the game. You may notice in something like World of Warcraft that your avatar suddenly gets his position corrected. What is happening there is that your local simulation of the WoW universe has the character moving a certain way, but then the master server says that is incorrect and tells him to reposition himself. You were out of sync with the boss and the boss set you in your place. For an RTS game it's far too impractical (in many different respects) to have a master server checking over everybody, so we rely on determinism to make sure everyone stays in sync. Determinism is about making sure everything happens in the exact same way on every machine. If you can guarantee everything happens in the same way, you don't need a master server telling everyone what the results are and you don't have to send much information between the players' computers.
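As a rough illustration of that idea, here is a toy Python sketch. It is not code from Sins: the `simulate` function, the seed value, and the `orders` table are all made up for the example. The point is only that if every machine runs the same steps from the same seed and the same player orders, the results match without anyone sending results around.

```python
import random

def simulate(seed, orders, ticks):
    """Run the whole toy simulation from a shared seed and shared player orders."""
    rng = random.Random(seed)            # every machine starts from the same seed
    fleet_position = 0
    ai_targets = []
    for tick in range(ticks):
        # Player orders are the only data that would need to cross the network.
        fleet_position += orders.get(tick, 0)
        # The AI "randomly" picks a player to attack every 10 ticks.
        if tick % 10 == 0:
            ai_targets.append(rng.choice(["X", "Y", "Z"]))
    return fleet_position, ai_targets

orders = {3: 5, 12: -2}                  # hypothetical: tick -> movement order
my_result = simulate(2009, orders, 30)
your_result = simulate(2009, orders, 30)
print(my_result == your_result)          # True
```

Both calls stand in for two different computers. Neither one tells the other which targets the AI picked; they agree because the inputs and the sequence of operations are identical.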
Here is a simple example from Sins: if an AI player on my machine randomly decides to attack Player X, then I need to trust that the AI on your machine will also randomly decide to attack Player X. There is no communication between our two computers about that decision; the synchronization is implicit in the math, logic, and structure. If something goes wrong with this, the AI on your machine may decide to attack someone else. From this point onward our universes diverge and we see completely different results, or in the worst case our games crash. Sometimes the divergence starts so small and grows so slowly that we don't notice for a very long time, if at all. Regardless, any form of divergence is a desync, or a sync bug.
Sync Bug Causes
So what can cause the divergence? Why are my ships in a different position than yours? Why did the AI make a different decision on my machine than on yours? Here are a few examples straight from Sins:
1. First, we might be using different CPUs. Different architectures can generate slightly different results, particularly between brands (say Intel vs AMD). This usually comes from the Floating Point Unit, so one of the first things we do is make sure we synchronize the FPUs on everyone's machines using a special command. You may remember a sync bug shortly after we released the ability to load up mods much earlier in the year. This was caused by the mod setup codepath bypassing the FPU Control call that the regular setup of the game used. From that point onward, there was a small chance your mathematical calculations would give slightly different results than mine, which usually shows up first as a miscalculation in the orientation or position of a ship since there is a lot of floating point math going on there (particularly with the matrix multiplies required for rotation).
2. Next, we might call non-deterministic operations in a deterministic code block. An RTS engine is really broken up into two separate parts: the Simulation and the Presentation. The Simulation is what is actually happening (AI, physics, gameplay etc) and is the part that has to be deterministic and in sync. The Presentation (rendering, particle systems, etc) is what you see and it doesn't have to be in sync or be deterministic. Typically, the Presentation is a custom interpretation of the Simulation (e.g. it looks at the Simulation and decides how best to show you that information based on how powerful your computer is). For example, the Simulation says that one of your ships blew up. The Presentation then realizes you have a very powerful graphics card so it decides to render a ton of particle effects to make the explosion look pretty. On an older graphics card it may decide to show a boring white blob grow and shrink. Now to make the nice pretty explosion the Presentation may make many calls to a random number generator to spew various fireball images in random directions, while the crappy white explosion didn't make any calls to the random number generator. It's important to note here that random number generators aren't really random; they spit out numbers that look random to humans but really follow a very predictable pattern and order. The generator "remembers" where it was the last time it was called so that each successive call doesn't generate the same starting pattern over and over again. But what happens if the AI decides to use that same random number generator to randomly decide which player to attack? Because my pretty explosion made many calls to it and your crappy explosion didn't make any, our random number generators will generate different results because they were left at different positions in the sequence of numbers. Your random numbers are behind mine in the sequence.
In order to solve this problem we have to use two separate random number generators: the Deterministic Generator and the Non-Deterministic Generator. Everything in the Presentation calls the ND-Generator and everything in the Simulation calls the D-Generator. One of the early sync bugs in Sins was caused by the autocast code on the Novalith cannon calling the ND-Generator when trying to decide which enemy planet to fire at. This of course would give completely different results on every machine. It took a long time to find this one because A) it takes a long time to tech up to the Novalith and B) not many people use the Novalith's autocast.
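The two-generator split can be sketched in a few lines of Python. This is a toy model under stated assumptions, not the Sins engine: the `Machine` class, the `d_rng` / `nd_rng` names, the seed 42, and the particle counts are all hypothetical. A machine with a fancy graphics card burns 200 random numbers on an explosion, and a machine with an old card burns none.

```python
import random

class Machine:
    """One player's computer in a toy model (hypothetical, for illustration)."""

    def __init__(self, seed, particles, split_generators):
        self.d_rng = random.Random(seed)   # Simulation generator, shared seed
        # Presentation generator: its own stream if split, else the shared one (the bug).
        self.nd_rng = random.Random() if split_generators else self.d_rng
        self.particles = particles

    def render_explosion(self):
        # Presentation: one random draw per particle.
        for _ in range(self.particles):
            self.nd_rng.random()

    def ai_rolls(self, count=5):
        # Simulation: the draws the AI would use to pick its target.
        return [self.d_rng.random() for _ in range(count)]

def rolls_after_explosion(particles, split_generators, seed=42):
    machine = Machine(seed, particles, split_generators)
    machine.render_explosion()
    return machine.ai_rolls()

in_sync = rolls_after_explosion(200, True) == rolls_after_explosion(0, True)
buggy_in_sync = rolls_after_explosion(200, False) == rolls_after_explosion(0, False)
print(in_sync, buggy_in_sync)
```

With separate generators the AI draws match on both machines no matter how many particles were rendered. With a single shared generator the pretty explosion moves one machine further along the sequence, so the two machines should no longer agree.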
3. Finally, desyncs can be caused by bad state initialization. One of the key ideas of determinism is that given the same initial conditions, a series of operations on the state of the system will generate the same result on any machine. Naturally, if the initial conditions are different you are screwed from the get go. When programming RTS games it's very important that when you create various objects (ships, buildings etc) they always have the same state from the start of the game. To be honest we rarely release code to the public that has bad state initialization, because this type of desync is pretty easy to detect as soon as the game starts (as opposed to the other kinds, which take a long time to occur) and our testers don't have to play for hours upon hours to see if one exists. However, there are some special cases where bad state initialization sneaks in and is very difficult, if not nearly impossible, to detect because it is so improbable. In a sense, bad state initialization is both the easiest and most difficult type of desync to find. It turns out the sync bug I was tracking on the weekend was one such bug. It has existed since last spring, and until Sunday afternoon I swore it didn't exist. Here is what caused it and why it was so elusive:
The Failing Market:
The market system has two special state variables called "stateStartTime" and "stateEndTime" that control the time interval of various market states (e.g. Metal Boom, Crystal Crash etc). These values were not initialized properly. As I said above this is typically caught pretty quickly, but this case falls under the very improbable. If two players start their first multiplayer game, both their market state variables will be incorrectly initialized in the same way, so even though they are incorrect, at least they are in sync. Every time they end a game those two values are left in whatever state they were in for the start of the next game. But even then, those two players can continue playing all day long with each other without any sync problems. So suppose they decide to play against someone else. That new player's market values are not screwed up in the same way theirs are. Normally, in this case they would go out of sync right away and we wouldn't have much of a problem. Easy find, easy fix. Nope, not in this case. How is this possible? It's clear that everyone's market values are completely different but they stay in sync? As I said to myself on Sunday, "wtf!!!???!!!"
The reason they stay in sync is because the market simulation code that uses those particular state values is very rarely executed. It takes a very particular set of conditions to cause the market to enter the state that will use these values to determine the evolution of the market.
You need the following conditions to get this sync bug to occur:
1. One player who already played a game of Sins.
2. His original game must have entered one of a few, very rare market states.
3. This player must play someone who he hasn't already played a game with.
4. He must not have restarted Sins.
5. Their new game must also enter the same, very rare market state.
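The conditions above can be reproduced in a toy model. The Python below is only a sketch of this class of bug, not the actual Sins market code: the `Market` class, the "boom" state, the prices, and the way the interval is consulted are all invented for the example. Only the two variable names come from the description above (written here as `state_start_time` and `state_end_time`).

```python
class Market:
    """Toy market that lives for the whole process, across games (hypothetical)."""

    def __init__(self):
        self.state = "normal"
        self.price = 100
        self.state_start_time = 0
        self.state_end_time = 0

    def start_game(self):
        self.state = "normal"
        self.price = 100
        # BUG (illustrative): state_start_time / state_end_time are not reset,
        # so they keep whatever the previous game left behind.

    def enter_boom(self, now, duration):
        self.state = "boom"
        # Only starts a new interval if the stored one has expired,
        # so a stale end time from an earlier game wins here.
        if self.state_end_time <= now:
            self.state_start_time = now
            self.state_end_time = now + duration

    def tick(self, now):
        # The interval is only ever read while in the rare state.
        if self.state == "boom":
            if now < self.state_end_time:
                self.price += 10
            else:
                self.state = "normal"

def play_game(market, ticks, boom_at=None, boom_len=5):
    market.start_game()
    for now in range(ticks):
        if now == boom_at:
            market.enter_boom(now, boom_len)
        market.tick(now)
    return market.price

veteran, newcomer = Market(), Market()
play_game(veteran, 60, boom_at=50, boom_len=100)   # earlier game hit the rare state

quiet = (play_game(veteran, 20), play_game(newcomer, 20))
rare = (play_game(veteran, 20, boom_at=10), play_game(newcomer, 20, boom_at=10))
print(quiet)   # (100, 100)
print(rare)    # (200, 150)
```

In the quiet game the leftover values are different on the two machines but never read, so the prices agree. Only when the new game also enters the rare state do the stale values get used and the two markets split.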
So on Sunday, I finally got to play against two players (Krunk and ZanZ) who met these rare conditions. After we desynced, they sent me their sync logs. I compared them against my own, saw that their market state variables differed, and used that information to trace back what caused the divergence. The process of detecting a divergence and tracing its causes is also a very interesting topic, so maybe I'll do a write-up on that if there is some interest.
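For a rough idea of what comparing two sync logs might look like, here is a small Python sketch. It makes no claim about the real Sins log format or tooling: the log layout (one dictionary of state values per tick), the field names, and all the numbers are invented for the example.

```python
def first_divergence(log_a, log_b):
    """Return (tick, differing_fields) for the first mismatch, or None if none."""
    for tick, (a, b) in enumerate(zip(log_a, log_b)):
        differing = sorted(k for k in a.keys() | b.keys() if a.get(k) != b.get(k))
        if differing:
            return tick, differing
    return None

# Made-up snapshots: one dict of logged state per simulation tick.
my_log = [
    {"ship_x": 10, "metal_price": 100, "stateEndTime": 0},
    {"ship_x": 12, "metal_price": 100, "stateEndTime": 0},
    {"ship_x": 14, "metal_price": 110, "stateEndTime": 15},
]
their_log = [
    {"ship_x": 10, "metal_price": 100, "stateEndTime": 0},
    {"ship_x": 12, "metal_price": 100, "stateEndTime": 0},
    {"ship_x": 14, "metal_price": 120, "stateEndTime": 150},
]
print(first_divergence(my_log, their_log))   # (2, ['metal_price', 'stateEndTime'])
```

Knowing the first tick and the first fields that disagree narrows the search to the code that touched those fields just before that tick.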
Before we officially release 1.12 I'd like to have the fix tested with a lot more people. You can grab a special 1.12 build at http://www.ironcladgames.com/sins/Sins112SyncTest.rar or http://www.ironcladgames.com/sins/Sins112SyncTest.zip if you want to give it a shot. It also fixes the buff stacking issue. Just extract the exe into your Sins install folder and run it. You will only be able to play against people also using this build so don't overwrite your 1.11 exe if you want to jump back and forth between versions. Also keep in mind that this new exe will need to be allowed through your firewall.
It's also possible there is another sync bug out there, but I doubt it very much given that all the logs on the weekend pointed to the same cause. But just in case, I won't be making any "monkey's uncle" claims like I did before.
Blair
Special thanks to everyone who participated in tracking this down, especially:
(in no particular order)
Annatar, Raknor, HowDidYouDoThat (HowThe?), Krunk, ZanZ, CoolJets, Cykur, Sting and of course SpaceFish for his sync snapshot code.
If you need an app to extract rar files, you can use WinRAR or WinAce. I think both are free to use.
Aaaah! That would have been such fun!
I've added a zip file in the original post as well for those of you having trouble with the rar file.
fyi
i use this for zip's and rar's. got to love open source.
http://www.7-zip.org/
I like a program called Extract Now.
Great that this has been found.
I play mostly LAN multiplayer games - and we have not had a SINGLE game successfully completed since the patch. Every single game diverges - they are still linked in the market and in the Pirate screens to some extent, but otherwise you are basically playing in solitary mode. Had one game last for about 2hrs before divergence - best we have seen so far.
so, just a quick "time to make sure" when 1.12 comes out, will it be a hotfix like 1.11 where we don't need to re-code our mods? thanks
Yes, it's just an exe change. Mods and save games are safe.
If someone has problems with archives, get http://www.7-zip.org/download.html. It is open source and I use it. It is free.
I think I know one desync problem in 1.1. It does not actually say "desync", but what I found out is that me and the other guy had completely different games going on. I need to encounter it in the new version of Sins first, to make sure it's not some old and solved problem.
I want to ask: is it possible to choose to copy all the data (where ships are, what they are doing and such) from a chosen computer, so you don't have to restart the game? I mean a fix according to the information on one of the computers?
Holy mother of... thanks for fixing this. Seriously.
Good job Blair. It's awesome to know this game is being continuously worked on. This thread is really, really a very good idea. Write me a PM if you ever head to Poland, you've got a beer ready for ya.
My experience is that Vasari RA does not work, nor Advent Cap automatic upgrades to ten, nor the TEC Nov... gun. After months of waiting, nobody has debugged the 1.1 version. IS THIS A FEATURE?
wth are u talking about? RA is fine. What Advent automatic upgrades? And what in the world is wrong with novalith??
used 1.12 today... and got syncerror too after start of the game. we were 4 players...
Yes i was also in the game of Whity2XLC and i have 5 error Logs if someone is interested.
Seems to be a desync if someone gives build orders and other things when some players are still loading?
Blair, we just had a game with v1.12 and it had a desync. JohnJames is sending you the logs via e-mail. The person it desynced with is shalom_don.(de) Please hunt this guy down. We couldn't get him to hang around long enough to explain to him that you need his files.
My brother and I had an interesting sync issue that is worth noting. Basically we started a game yesterday, saved it and picked it back up today. Well, I discovered him during our session today, but he had a single planet and was doing nothing. According to him he had 10 planets. So we allied and he only saw that I had one planet that was likewise doing nothing, when on my end I had a dozen or so planets. We then compared planets and he had some that I had and vice versa. What is really odd though is that many of our coinciding planets had been captured yesterday. So basically we played out of sync for hours yesterday and an hour or so today without an error, warning or anything.
The chat system worked perfectly. The metal black market was in sync, but the crystal and bounties were not. He would see 600 for crystal where I would see 480 for purchase. We were different factions, but I don't think that matters for the BM. He would see orange with the highest bounty where I would see blue, and he would see blue as no bounty where I would see orange with a low bounty. Really messy stuff.
Anyway, I thought that was worth noting if it helps identify and fix the problems.
he posted above your post, dude
Did I mention I was legally blind? Shalom, send your whole checksum folder to Blair.
Pretty screenshot...interesting article...
P.S. Can't wait for beta...any hint as to when it will show up Mr. Fraser?
Judging by the last few posts - Blair and Craig have a truckload of sync work to do before they start working on the beta again.
err, i read the desync post at the top, but why is the market the cause?
i remember having a desync on 1.1 on a 1 v 1 map (sadly i do not have a replay because shortly after (1 week???) my comp decided it was tired and entered a state of hibernation, never to awaken again)
anyways, everything was reinstalled afterwards, without of course the replays i had before it crashed
about this desync
-happened in 1.096
-1 v 1 map
-i played as Vasari
-opponent played as Advent
-in my view the opponent didn't do anything, not expanding, nothing....
-i noticed an Advent trade port on one of my planets (i was Vasari) (it was also under my control)
-i hadn't built any trade ports myself yet
-i got confused, and asked my opponent how he was doing and about my discovery of an Advent trade port on my planet
-he got confused, explaining to me that not me, but he was the one controlling that particular planet
-i started to suspect a bug, and to try and confirm this i scuttled the Advent trade port
-when scuttle % reached 100%: Desync
in the lobby we talked a little, he wasn't very good at English however.... so i was never able to explain to him why it happened
now i don't see the market being a problem in this one... and i haven't encountered or heard of something like this anytime before...
issue #2 probably has nothing to do with a desync, and i'm still unsure whether this has been fixed or even reported / seen...
to try and keep it short, i entered a 3 v 3 match. one of my allies had around 6 capital ships at 7 min into the game
i asked for some metal, i received 2500 (!?)
i have a replay of this game, though after watching it, he didn't have 6 capital ships after 7 min (instead he had 1)
the replay does however show him giving me 2500 metal, and the text log with us chatting about him having large amounts of capital ships (confirmed by my other teammate)
If the topic continues on like this, Blair will simulate a suicide, change his name and move into the ice cream industry.
as long as he'll give em free to us >.>
Ugh