Introduction
Capture the Flag (CTF) contests are a staple of security conferences, and BSides Las Vegas is no exception. However, the Pros vs Joes (PvJ) CTF I help support there is a bit unique. Not only is it a blue vs blue CTF with red aggressor and gray user teams, but the game dynamics are a fundamental development point for the CTF team. (There's a lot more to it, such as its educational goal or that we allow blue teams to attack each other on the second day. You can read more about it at http://prosversusjoes.net/.)
Game Dynamics
When we say 'game dynamics', we mean a couple of things. First we mean what's scored and how much. In our case that is currently four things:
- hosts (score given to teams for maintaining service availability)
- beacons (score deducted when the red team signals a host is compromised)
- flags (score deducted when the red team breaches specific files)
- tickets (score deducted when the gray team is not being appropriately supported)
At a more fundamental level though, we mean the scenario the CTF is meant to represent. As a blue team CTF, we try to simulate the real world. As such, starting last year, we began to transition our game model to simulate an economy. Score is not granted so much as transferred. For example, the gold team pays the gray team for accomplishing some task, then the gray team pays a portion of that score to the blue team for maintaining the services necessary to accomplish that task. Conversely, when the red team (or another blue team) installs a beacon, the score isn't lost, but instead transferred to the team that placed the beacon.
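To make the flow concrete, here is a minimal sketch of score as a transfer rather than a grant. The team names, amounts, and the `transfer()` helper are hypothetical, for illustration only:

```r
# Hypothetical starting balances; the gold team funds the economy
scores <- c(gold = 10000, gray = 0, blue_a = 0, blue_b = 0, red = 0)

# Move score between teams rather than creating or destroying it
transfer <- function(scores, from, to, amount) {
  scores[from] <- scores[from] - amount
  scores[to]   <- scores[to] + amount
  scores
}

scores <- transfer(scores, "gold", "gray", 100)   # gold pays gray for a completed task
scores <- transfer(scores, "gray", "blue_a", 60)  # gray pays a blue team for keeping the service up
scores <- transfer(scores, "blue_a", "red", 25)   # a beacon moves score to the team that placed it
```

The total score in the system stays constant; only its distribution among the teams changes.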
Beginning last year, we also started simulating the way we expect the game to run. This year we captured detailed scoring logs as well. This blog is about our analysis of the score from this year's game and how it helps us plan for the future.
Simulation
The first thing we do is create a game narrative and scoring profile for the game. The profile specifies which servers will come online, when they will go offline, and how much they will be scored per (5-minute) round. It is chosen to produce specific outcomes such as inflation (to decrease the point value early in the game when teams are just getting going and to allow dynamism throughout the game).
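A scoring profile can be represented as a simple table. The hosts, rounds, and point values below are purely illustrative, not the actual 2018 profile:

```r
# Hypothetical profile fragment: when each host enters and leaves the game
# (in 5-minute rounds) and what it is worth per round while it is up
profile <- data.frame(
  host          = c("web", "dns", "puzzle-1", "puzzle-2"),
  online_round  = c(1, 1, 24, 96),
  offline_round = c(192, 192, 192, 192),
  points_round  = c(5, 3, 20, 20)
)
```

Bringing high-value puzzle servers online later in the game is one way to produce the inflation described above.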
We then try to build distributions of how likely servers are to go offline, how likely beacons are and how long they will last, and how many flags will be found. This year we used previous years' simulations and logs, as well as expert opinion, to build the distributions. The distributions we used are below:
```r
### Define distributions to sample from
## Based on previous games/simulations and expert opinion

# H&W outage distributions
doutage_count  <- distr::Norm(mean = 8, sd = 8/3)
doutage_length <- distr::Norm(mean = 1, sd = 1/3)

# flag distributions
dflags <- distr::Norm(mean = 2, sd = 2/3) # model 0 to 4 flags lost with an average of 2

# beacon distributions
gamma_shapes <- rriskDistributions::get.gamma.par(p = c(0.5, 0.7), c(0.75, 4)) # gamma distribution to draw beacon lengths from
dbeacons_length <- distr::Gammad(shape = gamma_shapes['shape'], scale = 1/gamma_shapes['rate']) # in hours
dbeacon_count <- distr::Norm((4-3)/2 + 3, (4-3)/3)
```

Based on these distributions, we ran Monte Carlo simulations to try to predict the outcome of the game.
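The actual simulation harness isn't reproduced here, but a minimal sketch of drawing one game's worth of events from these distributions might look like the following (the `simulate_game()` helper and the rounding choices are assumptions, not the real code):

```r
# One Monte Carlo trial: draw how many outages, flags, and beacons occur
# and how long outages and beacons last, using the distributions defined above
simulate_game <- function() {
  n_outages    <- max(0, round(distr::r(doutage_count)(1)))
  outage_hours <- pmax(0, distr::r(doutage_length)(n_outages))
  n_flags      <- max(0, round(distr::r(dflags)(1)))
  n_beacons    <- max(0, round(distr::r(dbeacon_count)(1)))
  beacon_hours <- distr::r(dbeacons_length)(n_beacons)
  list(outages = outage_hours, flags = n_flags, beacons = beacon_hours)
}

# Many trials give a distribution of possible games to score against the profile
trials <- replicate(1000, simulate_game(), simplify = FALSE)
```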
First, we analyzed the expected overall score.
Finally, we wanted to look at the distributions of potential final scores and the contributions from the individual scoring types.
The Game
And then we run the game.
The short answer is that the game played out VERY differently from the simulation. We had technical issues that prevented starting the game on time. We were not able to complete some development, which prevented automatic platform deployment; some hosts were not available, and some user simulation was also unavailable. This is not a critique of the development team, who did a crazy-awesome job both rebuilding the infrastructure for this game in the months leading up to it and dynamically deploying hosts during the game. It's just reality. The scoring profile was built for everything we wanted, and I am pleased with how much of it we got on game day.
The Scoreboard
The Final Scoreboard
You can find the final scoreboard and scores here. It gives you an idea of what the game looked like at the end, but it doesn't tell you a lot about how we got there. I'm personally more interested in the journey than the destination, so that I can support improving the game narrative and scoring profile for the next game.
Scores Over Time
The first question is how did the scores progress over time? (You'll have to forgive the timestamps as they are still in UTC I believe.) What we hoped for was relatively slow scoring the first two hours of the game. This allows teams the opportunity to make up ground later. We also do not want teams to follow a smooth line or curve. A smooth line or curve would mean very little was happening. Sudden jumps up and down, peaks and valleys, mean the game is dynamic.
What we see is a relatively slow beginning to the game. This is due to beacons initially being scored below the scoring profile and one of three highly-scored puzzle servers being mistakenly scored lower from its start late in day 1 until it was corrected at the beginning of day 2.
We do see an amount of trading back and forth. ForkBomb (as an aside, I know they wanted the _actual_ fork bomb code for their name, but for this analysis text is easier) takes an early lead while Knights suffer some substantial losses (relative to the current score). Day two scores take off. The teams are relatively together through the first half of day 2, however, Arcanum takes off mid-day and doesn't look back.
The biggest difference is that when teams started to have several beacons, they tended to suffer self-inflicted downtime as part of their remediation. This caused a compound loss of score: the host scoring they would have had plus the cost of the beacons. We did not account for this compounding in our modeling, but plan to in the future.
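As a purely illustrative bit of arithmetic (the point values and durations below are hypothetical, not from the game), the compounding looks like this:

```r
# A team takes a host down for 30 minutes (6 rounds) to remove a beacon
host_points_per_round   <- 5   # host availability score forgone while down
beacon_points_per_round <- 4   # score transferred to the attacker while beaconed
rounds_affected         <- 6

compound_loss <- rounds_affected * (host_points_per_round + beacon_points_per_round)
compound_loss  # 54: roughly double what either effect would cost on its own
```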
Ultimately I take this to mean scoring worked as we wanted it to. The game was competitive throughout and the teams that performed were rewarded for it.
It does leave the question of what contributed to the score...
Individual Score Contributions
What we find is roughly what we expected, but not quite. The rate of host contribution on day two is more pronounced than expected for both Paisley and Arcanum, suggesting the second-day services may have been scored slightly high.
Also, no flags were captured. However, we do have tickets which were used by the gold team to incentivize the blue teams to meet the needs of the gray team.
The biggest difference is in beacons. We see several interesting things. First, for a period on day two, Knights employed a novel (if ultimately overruled) method for preventing beacons. We see that in their level beacon score for an hour or two. We also see a shorter level stretch in beacon scores later on, when the red team employed another novel (if ultimately overruled) method that was significant enough that it had to be rolled back. We also see how Arcanum benefited heavily from the day 2 rule allowing blue-on-blue aggression. Their beacon contribution actually goes UP (meaning they were gaining more score from beacons than they were losing) for a while. On the other side, Paisley suffers heavily from blue-on-blue aggression, with significant beacon losses.
Ultimately this is good. We want players _playing_, especially on day 2. Next year we will try to better model the blue-on-blue action, as well as find ways to incentivize flags and provide a more substantive and direct way for the gray team to motivate the blue teams.
Before we move on, two final figures to look at. The first lets us see individual scoring events per team and over time. The second shows us the sum of beacon scores during each round. It gives an idea of the rate of change of score due to beacons and provides an interesting comparison between teams.
But there's more to consider, such as the contributions of individual hosts and beacons to the score.
Hosts
The first thing we want to look at is how the individual servers influenced the scores. What we want to see is starting servers contributing relatively little by the late game, desktops contributing less, and puzzle servers contributing substantially once initiated. This is ultimately what we do see. (This was the analysis, done at the end of day 1, that allowed us to notice puzzle-3 scoring substantially lower than it should have. We can see its uptick on day 2 as we corrected its scoring.)
More interesting are the differences in scoring for servers such as Redis, Gitlab, and Puzzle-1. This suggests these servers may be harder to defend, as they provided score differentiation between teams. We also notice teams strategically disabling their domain controller. This suggests the domain controller should be worth more, to disincentivize this approach.
Beacons
For the purpose of this analysis, we consider a beacon new if it misses two scoring rounds (is not scored for 10 minutes).
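A minimal sketch of that rule, assuming a hypothetical data frame `beacon_events` with one row per team, beacon, and scored round (the column names and dplyr approach are assumptions, not the actual scoring code):

```r
library(dplyr)

beacon_instances <- beacon_events %>%
  arrange(team, beacon_id, time) %>%
  group_by(team, beacon_id) %>%
  mutate(
    gap_min = as.numeric(difftime(time, lag(time), units = "mins")),
    # missing two scoring rounds (a gap of more than 10 minutes between
    # scored events) starts a new beacon instance
    instance = cumsum(is.na(gap_min) | gap_min > 10)
  ) %>%
  group_by(team, beacon_id, instance) %>%
  summarise(
    start          = min(time),
    end            = max(time),
    duration_hours = as.numeric(difftime(max(time), min(time), units = "hours")),
    .groups = "drop"
  )
```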
First it's nice to look at the beacons over time. (Note that beacons are restarted between day 1 and day 2 during analysis. This doesn't affect scoring.) I like this visualization as it really helps show both the volume and the length of beacons and how they varied by team. You can also clearly see the breaks in beacons on day two that are discussed above.
The beacon data is especially helpful for building distributions for future games (see the sketch after the counts below for how observed data can feed back into the simulation). First we want to know how many beacons each team had:
Day 1:
- Arcanum - 17
- ForkBomb - 24
- Knights - 18
- Paisley - 21
Day 2:
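One way those observations feed back into the simulation is by fitting distributions to them. This is a minimal sketch, assuming a hypothetical vector `beacon_hours` of observed beacon durations; the values shown are illustrative, not the actual game data:

```r
library(MASS)

# Illustrative durations (hours); in practice these come from the scoring logs
beacon_hours <- c(0.25, 0.5, 0.5, 1, 1.5, 2, 3, 4.5, 8, 12)

# Fit a gamma distribution to the observed lengths; the resulting shape and
# rate can seed next year's distr::Gammad() in the simulation
fit <- MASS::fitdistr(beacon_hours, densfun = "gamma")
fit$estimate
```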
We also want to know how long the beacons last. The aggregate distribution isn't particularly useful. However, the distributions broken out by team are interesting. They show substantial differences between teams. Arcanum had few beacons, but they lasted a long time. Paisley had very few long beacons (possibly due to self-inflicted downtime). Rather than following a power-law distribution, the beacon lengths are actually relatively even with specific peaks. (This is very different from what we simulated.)
Conclusion
In conclusion, the take-away is certainly not how any given team did. As the movie "Any Given Sunday" implied, sometimes you win, sometimes you lose. What is truly interesting is both our ability to attempt to predict how the game will go as well as our ability to then review afterwards what actually happened in the game.
Hopefully if this blog communicates anything, it's that the scoreboard at the end simply doesn't tell the whole story and that there's still a lot to learn!
Future Work
This blog is about scoring from the 2018 BSides Las Vegas PvJ CTF, so it doesn't go into much detail about the game itself. There's a lot to learn on the PvJ website. We are also in the process of streamlining the game while making it more dynamic. As mentioned above, the process started in 2017 and will continue for at least another year or two. Last year we added a store so teams can spend their score. We also started treating score as a currency rather than a counter.
This year we added additional servers coming on and off line at various times as well as began the process of updating the gray team's role by allowing them to play a puzzle challenge hosted on the blue team servers.
In the next few years we will refine the score flow, update the gray team's ability to seek compensation from the blue teams for poor performance, and add methods to maximize the blue teams' flexibility in play while minimizing their requirements. Look forward to future posts as we get the details ironed out!