Nintendo Console Dashboard

If it weren’t apparent enough from my Splatoon 2 analyses, I love Nintendo. So, of all the things one could make a Dashboard about, of course I chose data about Nintendo’s array of home consoles.

This was done using Excel – but I’m really interested in working on my Python coding and turning this into a stand-alone embeddable tool.

If you want to have a poke around yourself and see how it all works, you can download the Dashboard here.


Excel Examples

HOW TO… Excel: A Truly Non-Volatile Solution for Dynamic Data Tables

This is a follow-up to the last tutorial. Thanks to reddit user /u/small_trunks for noticing my blunder!

Unfortunately, ‘SUMIFS’ couldn’t be used on its own to create the dynamic data tables. But don’t fret! I have another solution up my sleeve that doesn’t rely on volatile functions like ‘INDIRECT’ and ‘OFFSET’! It relies entirely on ‘INDEX’ and ‘MATCH’.

We’ll need to add an additional list next to the measure-name and back-sheet-name lists on our output sheet, which I’ve named the ‘Master Sheet Start Point’ column. In this column we can use the ‘MATCH’ function to work out the first occurrence of each of the measure names in our master back-sheet. The formula in this column is:
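As a sketch (assuming the measure names sit in A16:A18 as in the earlier tutorial, and the master back-sheet’s measure names run down column C in rows 2 to 31 – these cell references are illustrative), the start-point formula for the first measure would look something like:

=MATCH(A16,MasterBacksheet!$C$2:$C$31,0)

This returns the row within the data range where that measure’s block of data begins, and can be copied down for the other measures.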


Then, below the two cells that work out which sheet name and measure name correspond to the combo box selection (H16 and H17), we can add another cell that will find the starting point of the selected measure in the master back-sheet. The formula will therefore be:
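A sketch of the formula in H18, assuming the start-point list sits in C16:C18 beside the other two lists and the combo box’s linked cell is G16 (both assumptions based on the layout from the earlier tutorials):

=INDEX($C$16:$C$18,$G$16)-1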


We need to include the ‘-1’ because we’ll also be using ‘MATCH’ in the main formula to find the row number of each organisation, and we don’t want to ‘double count’.


The formula we’ll be using in the dynamic data table will be an ‘INDEX’, ‘MATCH’ formula as follows:

=INDEX(MasterBacksheet!$D$2:$AA$31,MATCH('Output Sheet'!$B4,MasterBacksheet!$B$2:$B$11,0)+'Output Sheet'!$H$18,MATCH('Output Sheet'!C$3,MasterBacksheet!$D$1:$AA$1,0))

Breaking this down gives us:

  1. =INDEX(Full Array of Master Back-sheet Data,
  2. MATCH(Org name, Org Name list in Master Back-sheet, Look for an exact match) + Measure’s Master Sheet Starting Point,
  3. MATCH(Date, Date Headings in Master Back-sheet, Look for an exact match))

And then this formula can be copied across the entire data table! With that, we’ve solved the issue: we have a non-volatile method of creating a dynamic data table.


Once again, if anyone has any other methods of solving this problem please share them in the comment section below – I’d love to see what creative methods the community can come up with!

The workbook can be downloaded here.

How to... Excel

HOW TO… Excel: Using SUMIFS to Create a Dynamic Data Table

This is a continuation of the previous tutorial.

Due to the volatility of the INDIRECT function, it may be necessary to avoid using it when creating large dashboards which will be handling and outputting large amounts of data. So, we’ll need an alternative in these cases. One way to do this is to use the SUMIFS function and structure our data to be compatible with it.

This has been achieved using SUMIFS, but OFFSET (another volatile function) has been used to cover one of SUMIFS’ weaknesses. A completely non-volatile method can be found here.

This will be done on the worksheet from the previous tutorial. A download link is available at the bottom of that page so you can follow along.

Step 1.

First, we’ll need to create a new ‘master’ data back-sheet that compiles all the data back-sheets to allow SUMIFS to work correctly.


A new column called ‘Measure’ has been added to the old structure which contains the measure names, and the data has been pulled from the other back-sheets by simple cell references.

Step 2.

Now we need to change the formula in the output sheet’s data table to use SUMIFS. But before that, we’ll need to add a new INDEX that checks the measure name selected by the combo box. H16 on the output sheet can be copied down to H17 and then have its INDEX array changed to the measure-name list instead of the back-sheet-name list, so the new formula in H17 is:
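As a sketch, assuming the measure-name list sits in A16:A18 and the combo box’s linked cell is G16 (the layout used in the previous tutorial), H17 would be:

=INDEX($A$16:$A$18,$G$16)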


We can now change the dynamic data table’s formula. The formula used in the top left (C4) of the table is as follows:

=SUMIFS(MasterBacksheet!D$2:D$31,MasterBacksheet!$C$2:$C$31,'Output Sheet'!$H$17,MasterBacksheet!$B$2:$B$31,'Output Sheet'!$B4)

If we break this down we get:

  1. =SUMIFS(Sum_Range, Criteria_Range1, Criteria1, Criteria_Range2, Criteria2)
  2. =SUMIFS(Apr-15 Column (frozen to row numbers using absolute referencing, $), Measure Column, Selected Measure, Org Name Column, Org Name (frozen to column number using absolute referencing, $))

A weakness of this formula as it stands is the way we’ve referenced the SUM range, using a direct reference. To allow this formula to be more flexible, we can instead use another function to work out the SUM range rather than referencing it directly. One way to do this is the OFFSET function. Constructed within this formula to determine the SUM range dynamically, it is as follows:

=OFFSET(MasterBacksheet!$C$2:$C$31,,MATCH('Output Sheet'!C$3,MasterBacksheet!$D$1:$AA$1,0))

As pointed out by reddit user /u/small_trunks, OFFSET is also volatile so isn’t a good alternative to the ‘INDIRECT’ function! This is still a valid method to create the dynamic data table if volatility is not an issue, but a better method can be found here – although unfortunately ‘SUMIFS’ hasn’t been used in that instance.

UPDATE: Reddit user /u/ubbm has come to the rescue! The OFFSET in the example above can be replaced with an INDEX function. INDEX can be used in a very similar way to OFFSET; if you specify an array and then only choose a row number or column number, the output will be the entire row or column within the array. So for this example the following can be used instead of OFFSET:

=INDEX(MasterBacksheet!$D$2:$AA$31,,MATCH('Output Sheet'!C$3,MasterBacksheet!$D$1:$AA$1,0))

See the full comment for a breakdown of the logic behind this method of using INDEX.

The structure of the OFFSET function is:

  1. =OFFSET(Reference Range, Number of Rows to Move from Reference, Number of Columns to Move from Reference, [OPTIONAL] Height of the Reference, [OPTIONAL] Width of the Reference)
  2. So, our ‘OFFSET’ function is: = OFFSET(Measure Name Column (1 column before the date columns), Move 0 Rows, Find ‘Apr-15’ in the date headings and move that many columns across). We start one column before the date headings as ‘MATCH’ returns 1 if it finds a match in the first cell of the range.

So our full formula is:

=SUMIFS(OFFSET(MasterBacksheet!$C$2:$C$31,,MATCH('Output Sheet'!C$3,MasterBacksheet!$D$1:$AA$1,0)),MasterBacksheet!$C$2:$C$31,'Output Sheet'!$H$17,MasterBacksheet!$B$2:$B$31,'Output Sheet'!$B4)

This formula can then be copied across the entire data table.


And there we have it! This of course is just one way of going about solving this issue, and part of the beauty of Excel is the ability to bend it to your will when you know all the different functions and tricks of the trade. If you have any other ways in which you would go about creating a dynamic data table, you can share it in the comment section below. Any feedback on my methodology is also welcome!

You can download the finished workbook here.

How to... Excel

HOW TO… Excel: Create a Dynamic Data Table Using INDIRECT and Combo Boxes


Creating a full and cohesive user experience in an Excel-based dashboard can be really difficult if you don’t know where to start. Often, the easiest way to approach the situation will be to utilise Pivot Tables and Pivot Charts, which come with useful tools like slicers to dynamically filter the data at the touch of a button. I have a few pet peeves with Pivots, however: they require data to be in a particular layout; they can be restrictive in the type and layout of the output they produce (although they do provide a good level of options); and when data within a spreadsheet is updated or changed, Pivoted data does not refresh unless it’s explicitly told to (which can create a jarring experience when handling and presenting larger data sets).

So, how would you go about creating a dynamic data table and dashboard without using Pivot Tables? There are a seemingly massive number of ways to go about this task, but the way I find most versatile and that I’ve become comfortable using is to make use of Combo Boxes and different Excel functions like INDIRECT (covered in this tutorial), OFFSET, SUMIFS and INDEX.

In this tutorial I will show how to utilise combo boxes and the INDIRECT function to create a dynamic data table which can be used to better visualise your data and serve as a client- or manager-facing reporting tool. This tutorial will be using dummy data to show a basic Key Performance Indicator dashboard with data from April 2015 to March 2017. The INDIRECT function will be used specifically to dynamically reference all of our backing sheets.

Step 1.

First we’ll need to create structured backing sheets which follow a consistent layout. This is essential to allow the INDIRECT function to consistently output the correct arrays for our data.


The above example shows my preferred structure:

The top-left of the sheet includes a quick description of where we got the data from and how up-to-date our data is. Including this is general best practice for maintaining data back-sheets to allow other people to understand the data’s history and for us to better quality assure our data. All three of the back-sheets in this workbook will follow the same format, including the same organisation list and dates.

Step 2.

Now we’ll create the output sheet that will hold the dynamic data table. In this example I’ve copied the same structure used in the back-sheets, with the same organisations and all the dates. It would be possible to pick and choose dates and organisations for this output sheet, as the formula used will be able to deal with any combination of the 2 variables.


Step 3.

We can now create the combo box and the lists of measures and sheet names that feed the combo box. To create a combo box, go to the Developer tab of the ribbon.

(If the Developer tab of the ribbon isn’t visible for you, go to:
File -> Options -> Customise Ribbon
then in the list on the right-hand side of this tab, tick the selection box for the Developer tab.)

From the developer tab, select Insert and under ‘Form Controls’ select the combo box option. You should then be able to draw a combo box on the spreadsheet.


Now we will create 2 lists: one of measure names and one of the corresponding back-sheet names. In this example we have 3 measures:

  1. ‘Average Time Taken’;
  2. ‘Total Throughput’; and
  3. ‘Number of Staff’

Corresponding to our back-sheets:

  1. ‘AverageTimeData’;
  2. ‘TotalThroughputData’; and
  3. ‘StaffData’

I’ve created the lists next to one another below the data table as below:


Step 4.

Now we need to link the combo box we created earlier to the measure list. Right click the combo box and select ‘Format Control’. For the ‘Input Range’ select the measure list; in this example it’s A16:A18. For the ‘Cell Link’ select an empty cell nearby; in this example the cell chosen was G16. Hit OK; the combo box should now allow you to select the measures in the measure list, with the linked cell showing the number corresponding to the measure’s position in the list. In the cell beside the linked cell we will now use an INDEX function to work out the sheet name from the list created beside the measure names.


INDEX looks at an array and returns the value of the cell at a given row and column number. As the linked cell gives the position of the measure in the measure list (thus acting as the row number), using it in an INDEX function will return the corresponding back-sheet name.
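As a sketch, assuming the back-sheet-name list sits in B16:B18 beside the measure list (an illustrative placement) and G16 is the linked cell, the formula in the sheet-name cell would be:

=INDEX($B$16:$B$18,$G$16)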

Step 5.

Now we can construct the INDIRECT function; in this example it will be done within a named range, but it’s also possible to do this within a cell’s formula. Using named ranges just cleans up the final formula and makes it easier to parse for other users. To create a new named range go to the ‘Formulas’ tab of the ribbon and select the ‘Define Name’ option. Give the named range an appropriate name like ‘SelectedData’, and insert the INDIRECT function. In this example the formula will be:

=INDIRECT("'"&'Output Sheet'!$H$16&"'!$C$7:$Z$16")

The purpose of the INDIRECT function is to allow Excel to interpret text as a reference to a cell or array. Breaking down this formula gives us the following:

=INDIRECT(“ ‘ “& CELL WITH SHEET NAME &” ‘! DATA TABLE’S ARRAY”), which Excel then interprets as a direct reference to the selected back-sheet’s data table (e.g. 'AverageTimeData'!$C$7:$Z$16 when ‘Average Time Taken’ is selected).

Note: The data table’s array does not include the column headings or the row headings (i.e. the dates or the organisation names) – it starts at the actual data.


Step 6.

We can now construct the formula used in the dynamic data table. We’ll be using an INDEX function, in conjunction with our ‘SelectedData’ named range and the MATCH function.

The formula in the top left of our dynamic data table is:
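Putting the pieces together (organisation names in B4:B13 and dates in C3:Z3, as described in the breakdown that follows), the formula in the top-left cell is:

=INDEX(SelectedData,MATCH($B4,$B$4:$B$13,0),MATCH(C$3,$C$3:$Z$3,0))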


Breaking this down to each part gives us:

  1. =INDEX(SelectedData, this is the link to our named range, so our named range is the array we are ‘indexing’.
  2. MATCH($B4,$B$4:$B$13,0), this is where we work out the row number to look for. B4 is our first organisation name ‘Supplier A’ (its column is ‘frozen’ by using the ‘$’ before B), and the array B4:B13 is the full list of organisation names. The 0 shows we’re looking for an exact match. MATCH then returns the row number of ‘Supplier A’ in the list of organisation names; 1 in this case. The array (the list of organisation names) in this function can be the list on the output sheet as it’s the same as the list in the data back-sheets. If you were to create a custom list of organisations to show, you would need to check B4 against the full list of organisations used within the data back-sheets.
  3. MATCH(C$3,$C$3:$Z$3,0), this is where we work out the column number to look for. C3 is our first date ‘Apr-15’ (its row is ‘frozen’ by using the ‘$’ before the 3), and the array C3:Z3 is the full list of dates. The 0 again shows we’re looking for an exact match. MATCH then returns the column number of ‘Apr-15’ in the list of dates; 1 in this case. As mentioned in the previous point, the array (the list of dates) in this function can be the list on the output sheet as it’s the same as the list in the data back-sheets.

This formula can then be copied and pasted across the entire data table. This should then result in a table that can dynamically update like so:


Step 7.

Now that we have a dynamically updating data table, we can add dynamically updating conditional formatting! Alongside the measure and sheet lists we can include a boundary at which the formatting turns from green to red and vice versa. I’ve set these to 2, 200, and no boundary for ‘Average Time Taken’, ‘Total Throughput’ and ‘Number of Staff’ respectively. Next to these lists I’ve included a cell that contains the boundary for the selected measure. This is done in the same way as the sheet name cell, using the INDEX function. For this example the formula is:
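With the boundaries in C16:C18 and the combo box’s linked cell in G16, the boundary cell (D16) would be:

=INDEX($C$16:$C$18,$G$16)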


This looks at the array of boundaries (C16:C18) and returns the one corresponding to the chosen measure.

Now you can highlight the data table, go to the ‘Home’ tab of the ribbon and select the ‘Conditional Formatting’ then ‘Highlight Cells Rules’ and ‘Greater Than…’.


Select the cell that contains the boundary for the selected measure; D16 in this case. Choose the ‘Light Red Fill with Dark Red Text’ option and hit OK. Do the same for ‘Less Than…’, selecting the same cell and this time choosing ‘Light Green Fill with Dark Green Text’. Sorted!

Step 8.

Now you may have realised upon changing the measures that these rules don’t make sense for our ‘Total Throughput’ measure, and that the ‘Number of Staff’ measure is incorrectly showing the conditional formatting. To combat this we’ll need to change the rules to use an IF statement rather than a simple greater-than or less-than rule. To change the rules, go to the ‘Manage Rules’ section of the ‘Conditional Formatting’ menu. For the red formatting, select ‘Edit Rule’ and change the rule to ‘Use a formula to determine which cells to format’. The formula used in this example is:
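A sketch of the red rule, assuming H16 holds the selected back-sheet name, D16 holds the selected measure’s boundary, and the data table starts at C4:

=IF($H$16="TotalThroughputData",C4<$D$16,C4>$D$16)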


This checks the selected data back-sheet: if it’s the back-sheet for the Total Throughput data, cells less than D16 (our cell containing the selected measure’s boundary) are highlighted red; if it’s not, cells greater than D16 are highlighted red. This means that the rule now correctly shows those organisations with low throughput as red, instead of green.

Now edit the green formatting in the same way, using the formula:
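Following the same pattern as the red rule (again assuming H16 holds the selected back-sheet name, D16 the boundary, and a table starting at C4), a sketch of the green rule:

=IF($H$16="TotalThroughputData",C4>=$D$16,C4<=$D$16)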


This checks the selected data back-sheet: if it’s the back-sheet for the Total Throughput data, cells greater than or equal to D16 (our cell containing the selected measure’s boundary) are highlighted green; if it’s not, cells less than or equal to D16 are highlighted green. This means that the rule now correctly shows those organisations with high throughput as green, instead of red. By using the ‘greater/less than or equal to’ inequality we have now also included the boundary’s number within the formatting, which is another error that needed to be corrected.

The final error that needs to be corrected is that ‘Number of Staff’ will still show the conditional formatting despite not having a boundary. To correct this, add a new rule with ‘Use a formula to determine which cells to format’, and add a formula that will exclude those measures with no boundary – in this example we’ve used a blank entry to denote no boundary in the list of boundaries. The formula used is then as follows:
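A sketch of this rule, assuming D16 holds the selected measure’s boundary and a blank cell means no boundary:

=$D$16=""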


Leave the format as ‘No Format Set’ and hit OK. Then tick the ‘Stop If True’ checkbox in the ‘Manage Rules’ options window for this rule.

The sheet is now complete from a functionality standpoint, so I’ve made some general formatting changes to try and make the table clearer and easier to interpret. I’ve increased the height of the table rows to 20, and removed the gridlines on the sheet by going to ‘View’ on the ribbon and unchecking the ‘Gridlines’ box. I added a title, ‘Select Measures’, above the combo box to help aid the user experience, and used the ‘Freeze Panes’ setting in the ‘View’ tab at cell C2 to lock the organisation names and combo box in the view of the spreadsheet. Finally, I’ve hidden rows 15-18 and added a filter to the column headings, by highlighting them all and then going to the ‘Home’ tab and selecting ‘Filter’ under the ‘Sort & Filter’ option on the right of the tab.

One thing to be aware of with this method is a weakness of the INDIRECT function. INDIRECT is known as a volatile function: whenever any change is made to the workbook, every formula containing an INDIRECT function recalculates. Other functions that share this property include RAND and RANDBETWEEN. This means incredibly large data tables which use INDIRECT can cause massive slow-down when making changes to the workbook. A way to avoid the INDIRECT function would be to format the data back-sheets in an alternative way – with one master data back-sheet instead of multiple. I’ll cover an example of this in future tutorials!

So after following this method you should be able to create a dynamic data table perfect for Key Performance Indicator dashboards! It offers more flexibility (in my opinion) than Pivot Tables, and using combo boxes allows us to link multiple parts of a spreadsheet to a single dynamic cell to create a more cohesive user experience.

Thanks for reading, if you have any questions or feedback please do comment below!

You can access the workbook here.

How to... Excel

How Cryptocurrencies are Shaking Up the World of Big Data

Following the hubbub of Bitcoin reaching $10,000 (and shortly thereafter $11,000) per coin, the cryptocurrency world piqued my interest. After following a variety of subreddits regarding cryptocurrency, and doing a lot of reading around the subject I came across a really interesting new development – IOTA, the non-blockchain based cryptocurrency. What particularly interested me with the project was its attempt at monetising the data collected by the ever-expanding internet of things. It seems a new and potentially disruptive player in the world of big data has entered the world stage, with endorsements and partnerships from many bigger players in the industry like Microsoft and Fujitsu amongst others.

Utilising the ‘tangle’ as opposed to the traditional blockchain that most other cryptocurrencies use, the team behind IOTA have been able to create a system wherein there are no transaction fees and transactions are fast and get faster the more people utilise the network.

The Data Marketplace announced a few days ago feels like a stroke of genius and if it takes off is definitely going to be a massive game changer. The ability to easily and quickly purchase data from sensors around the world for small fees is in my opinion revolutionary and it’s incredibly exciting to think about the potential this idea has.

For more information (especially regarding the nitty gritty of the tangle) here are some links to look into:

All in all, this is an incredibly interesting development that I’m definitely going to be watching closely!

Data News

Splatfest Analysis (Part 3)

In this final Splatfest Analysis, we will be taking a closer look at the distributions of the 9 games I had with the Splattershot to determine whether its consistent place in the top 3 was due to me consistently playing well with the weapon, or whether the averages were skewed by particularly high leverage outliers.

The points scored with the Splattershot were relatively consistent, at around 800-900. A particularly good game where I scored 1180 points skewed the mean positively, but it wasn’t so dramatic a swing as to cause the mean to drift far from the median; it stayed near where most of the points were clustered.

The number of kills I got each game with the Splattershot was also consistent, at around 5-6 kills per game. A slightly low game with 3 kills did skew the mean negatively, but again it stayed close to the median and where most points were clustered.

The number of deaths each game also showed very little variation, with the most frequent number of deaths being 0. It ranged from 0-4 with a fairly even spread, meaning no individual point had a particularly large influence on the mean of 1.67.

In conclusion, it does seem that I was consistently good with the Splattershot – meaning it was well deserving of its top 3 positions despite its lower sample size.

And with that this utterly riveting multi-part analysis comes to a close! My main take-aways from a data visualisation and analysis stand-point when carrying out these analyses were that:

  • Box and Whisker Plots in Excel 2016 are fickle and I’ll likely avoid them in future!
  • A more heavily annotated set of graphics may have been more suitable, instead of writing up everything in these report-style blog posts. This would allow the analyses to better live as separate entities from this blog, which is especially important when doing analyses of sensitive data to make sure the data can’t easily be divorced from the analysts’ findings and any caveats to the data.
  • Using Excel to do ‘proper’ data analysis with the Analysis ToolPak is very easy. Until starting this I was unaware Excel could do things like linear regression (almost) natively; usually I would move my data over to a more specialised analytical software package like SPSS (and I have a particularly large disdain for SPSS!).

Thanks to those of you who took the time to read my ramblings and please don’t be afraid to leave any feedback you have in the comments section.

Here is the Excel file I used to create all of the graphics for this analysis: Link.


Zipf’s Law

After watching a long session of ‘VSauce’ videos (great brain food, albeit very addictive!), I came across this video discussing ‘Zipf’s Law’. Zipf’s law states that in any corpus of natural language, the frequency of any word is inversely proportional to its rank in the frequency table. This Zipfian distribution applies to many different types of data studied across a variety of fields (the video discusses a large variety of these instances). Zipfian distributions also follow the ‘Pareto Principle’, the 80-20 rule: roughly 80% of the word occurrences in a corpus come from just 20% of the unique words used.
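In symbols, Zipf’s law says the word at rank r in the frequency table appears with frequency roughly proportional to 1/r:

f(r) ∝ 1/r

so the 2nd most frequent word appears about half as often as the most frequent, the 3rd about a third as often, and so on.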

After being utterly captivated by this phenomenon, I decided it would be fun to check whether my blog, being a corpus of natural language, followed Zipf’s law.

To my delight it does! I also thought it would be a nice opportunity for me to experiment with colour, utilising a dark midnight blue as the background instead of the usual white. Using the complementary orange for the data points allows them to ‘pop’ from the background, which I think produces a nice effect.

If you disagree or have any other non-traditional colour combinations for charts feel free to let me know in the comment section!

General Write-Up

Splatfest Analysis (Part 2)

Up next in this multi-part analysis is a look at my performance with each weapon I used during the Splatfest. I’ll take a closer look at my Win to Loss ratio by each weapon and how well I did overall with them regardless of outcome of the game.

Weapon Distribution

To get things started we’ll take a look at the number of games in which I used each weapon – this is necessary to help contextualise the rest of this analysis. Weapons with a high number of uses are much more likely to be reliable when making assumptions about how good or bad I was with them, as they are much less likely to be influenced by random chance. For example, with the Krak-On Splat Roller I can’t be 100% sure whether any of my results are because I’m bad with the weapon: with only 2 uses, my performance could be the result of a number of factors, like my team composition (what weapons my team mates were using); the enemy team’s composition (did I just get unlucky and fight against the weapon types that counter Rollers?); or whether I was just having a bad/off game. Weapons like the Sploosh-o-Matic are more likely to have the effects of those factors smoothed out simply by having a larger sample size – I’m much less likely to have an uncomplementary team composition every game for 21 games than for 2 games, for example.

Now let’s look at my Win to Loss ratio with each weapon:

Win Loss ratio by weapon v2

Ketchup vs Mayo, indicating my best (ketchup-coloured) and worst (mayo-coloured) Win:Loss ratios, yum.

Those weapons I did worst with (Blaster, Tentatek Splattershot and .52 Gal) are all weapons I used less than 5 times, so it’s hard to draw any concrete conclusions about whether I would have averaged out higher/lower with more games – but I can tell you that I did purposefully stop using these weapons because I felt completely useless with them (when we look at the next couple of charts we’ll see whether I actually was useless with them). The Tri-Slosher is the clear winner out of these – there are rumblings within the Splatoon 2 community that this weapon is overpowered so I’m not massively surprised by this outcome. Its large ink spread, ability to 2-hit kill from a decent range and lack of ‘RNG’ (it will always have the same ink spread every shot unlike other guns) mean it’s much easier to consistently do well with, which is most likely why I did so much better with it than other weapons over the course of the Splatfest. Of the weapons I used a lot of times (> 5 uses), the lowest win to loss ratio was with the Kelp Splat Charger, which as discussed in my previous post I got significantly (statistically speaking!) worse with throughout the Splatfest, overall losing a game for every game I won.

Mayo coloured bars are those weapons I did particularly poor with… Or you could think of it as the colour for the Krak-On Splat Roller, either works.

There’s one clear conclusion we can draw from these… I suck with the Krak-On Splat Roller. As you can see I definitely was not doing well with this weapon with the lowest average points, KDA and average number of kills (and highest average number of deaths). I stopped using that weapon with good reason! Even accounting for potential bad team compositions or enemy counters I don’t think I would have done any better, I was absolutely terrible with it.

Shooter weapons like the Splattershot and Sploosh-o-matic tended to be at the top of the average points overall, and as these weapons are generally recommended for Turf War it makes sense.

Despite my high win to loss ratio with the Tri-Slosher it ended up fairly middle of the pack for average points scored, being at the bottom of the group of weapons that I used frequently. However, it topped the average number of kills and was very close to the front of the pack with regards to KDA, potentially being why I won more games with it.

The other weapons I used less than 5 times also don’t seem to have performed particularly awfully. The .52 Gal was consistently at the bottom of the pack but was a country mile better than my performance with the Krak-On Splat Roller, and wasn’t too far from the rest of the pack, while the others were fairly spread out, with the Blaster even being the weapon with the 2nd most kills and 3rd highest KDA. The Blaster’s low point scores were its major downside, meaning I may be good with it, but it’s not particularly suited for Turf War so I may have been correct in swapping out from it.

The Splattershot maintained a top 3 position in each of these metrics, which is probably why it averaged out to a positive win to loss ratio too. Being the all-rounder weapon it fits in well here – especially as it was the first weapon I used a good amount in both Splatoon 1 and during the first week after Splatoon 2 was released. The lower sample size (9) makes it hard to properly compare it with my other most used weapons, so I may take a look at its distributions for each of these metrics in a future part to see if any high leverage points may have skewed the average heavily for this weapon.

So in conclusion, I should never ever use the Krak-On Splat Roller (without a good amount of practice) and I was right to only use it twice. I tended to do best with the Tri-Slosher, with high amounts of kills and a decent KDA. My most consistently good weapon seems to be the Splattershot, keeping top 3 positions across all the metrics I looked at (although its lower sample size means it’s less easily comparable with my other most used weapons).

That’s it for this part! I’ll be taking a deeper dive into some of the specifics I brought up within this part of the analysis and looking at how I performed when winning and losing in the next exciting (… ahem) installment of this analysis.



Splatfest Analysis (Part 1)

Splatoon 2 came out on the Nintendo Switch a couple weeks back, and they’ve had their first ‘Splatfest’ since the game was released. ‘Splatfests’ are worldwide events that pit two similar things against each other in an opinion poll (for this Splatfest it was Mayo vs Ketchup). After voting for your favourite, you’ll then be playing the game mode ‘Turf War’ against members of the opposite team, to determine which side is the best at Splatoon. I picked Ketchup, and I battled valiantly to prove that Mayo is bad and everyone who likes it deserves to lose at Splatoon.

During this Splatfest, I used Nintendo’s phone app to collect the data of my Splatfest matches in order to analyse my performance over the 24-hour event. I’ll be using this data for a multi-part analysis, covering a variety of sections like which weapons I was best with, which stages I was better at, and more!

Today’s post will be covering whether I improved at the game throughout the Splatfest, mainly focused on correlation analysis.

Did I improve my performance during Splatoon 2’s ‘Splatfest’ event?

Correlation Matrix

I wanted to check whether I had improved at all throughout the Splatfest – I did this by looking at my points scored, number of kills and number of deaths, and seeing whether any improved as I played more.

Looking at the correlation matrix and the P-Values for each of the options, most of my performance stayed relatively similar and the main reason my performance tended to vary was down to random chance, not me showing marked improvements or deterioration; apart from two things:

  1. I actually got worse at covering turf the more I played with the Charger weapon type (shown by the highly negative correlation coefficient of -0.63). This wasn’t simply due to random chance (and was therefore significant), as its P-Value was below the Alpha value of 0.05.
  2. I tended to die less the more I played. I had slightly negative correlation (with a coefficient of -0.27) between me playing more games in the Splatfest and the number of deaths I had. This was also not due to random chance, as the P-Value again was less than the Alpha value.

I could tell that both things were happening as I played. When I first started playing with the ‘Kelp Splat Charger’ I was having absolutely amazing games, often being top of my team’s scoreboard, but as I played with it more, and when playing during the second day of the Splatfest I could definitely tell my aim was getting worse and I was generally not playing as tightly as when I started. As the Splatfest went on I could also tell that I was slowly becoming more conservative in my play style; I felt that my team mates were always going in and dying constantly and I thought it would be best to generally stay back and make sure my team could super-jump to me (it rarely ever happened though… my team mates were playing like they had never played the game before, despite most being a similar level to me).

I also did a quick check to see whether points scored, number of kills and number of deaths all correlated, and it wasn’t really a surprise to see that as I scored more points I tended to get more kills and die less, but it’s nice to confirm it!


Tinkering with Bullet Charts

After reading some of my other favourite data visualisation blogs (Storytelling with Data to be precise) I came across a new type of visualisation I’d not seen before – the Bullet Chart.

Once I saw Bill Dean convert a two-series bar chart into a bullet chart, I couldn’t help but think of my last blog post and the set of two-series charts I produced comparing British and Non-British populations in London. Curious, I went about converting the charts I had made into this new format, and I’m very pleased with the results:

Data source: ONS, Population of the United Kingdom by Country of Birth and Nationality 2015

This way of displaying the data dealt with the main issues I had with the original versions:

  1. It was hard to tell which bars belonged to which Local Authority (LA);
  2. Due to the visual clutter of having so many bars in one chart, it was hard to pick out and compare the two series for each LA.

Bullet charts are definitely going to be something I consider in future when making two-series charts or gauges – I’m sold!

Download the Excel file with the charts here.