Quick Steps to Make Effective Data Visualisations

Original.png

This is an example of the default settings for a bar chart in Excel. This is functional but could be far more effective; here are my main steps I use to create quick, effective data visualisations.

  • Remove chart borders.
  • Show data in a useful order (alphabetically, or ranked, or some other important order).
  • Keep gridlines and reduce their colour, or remove them entirely.
  • If you remove gridlines, consider adding data labels.
  • If data labels have been added consider removing y-axis labels.
  • Add axis titles where the values shown aren’t inherently obvious.
  • Reduce the colour used in the chart.
  • Highlight the important data points.
  • Use white space effectively, but don’t leave excessive amounts.
  • Where possible use a chart title that helps the user see the points that you’ve highlighted and their context. Annotate any other points you’ve highlighted, why are they interesting?
  • Use colour to create a relationship between your annotations/title and the highlighted data points.

These steps will help you turn your default Excel charts into impactive and attractive data visualisations:

New.png

Advertisements
Data Visualisation Excel Examples How to

Analytics and Ethics

Upon the enlightening release of the Cambridge Analytica exposé that revealed the company’s underhanded tactics it utilises to influence and manipulate people with a combination of data analytics, espionage, and ‘honeypotting’, I think it’s incredibly important to discuss how the analytical community needs to ensure a commitment to honest and unbiased analytics, and ethically sourced data.

It should always be on the analyst’s mind about how their work may be used and interpreted by all parties who consume the analysis that they’ve produced. This is underlined by the importance of clarity in data visualisation. Producing data visualisations that have little to no confounding elements (such as dual axis and scales that don’t start at zero) is important to ensure that our analysis can be correctly and easily interpreted by our audience, and minimise the risk that our audience draws false conclusions from the visualisation. If there are any particular nuances to the data that may cause an incorrect interpretation of that data, it’s important that it is made clear to the user and isn’t allowed to cloud their judgement. Techniques like visualising uncertainty are therefore incredibly useful when showing analysis that includes things like forecasting or correlation analysis that rely heavily on confidence intervals.

Evidently when it comes to enacting these methods, to create clarity in data visualisations, there is a wide scope for interpreting how this should be carried out. This creates the subjectivity of data visualisation which opens a gap which is ripe for ethical dilemmas! One thing that we, as analysts, have to be salient of is making sure we don’t infer our biases in the analyses we carry out. When ensuring our audience don’t mistakenly draw false conclusions from the data we’re presenting to them, we need to make sure that these ‘false’ conclusions aren’t simply the conclusions that we don’t want them to make because it doesn’t align with our world view. This is especially important in the land of data journalism. The vast majority of publications come with a litany of biases; which side of the political spectrum they lean toward, their views on particular hot-button topics, and whomever sponsors their content. It’s important that the data being presented in these publications isn’t being twisted to create misleading and false narratives that align with the publication’s biases. Doing so is morally reprehensible and incredibly manipulative, but we see it time and time again, and the general public that aren’t well versed in statistics and good data visualisation will fall for these cheap tricks again and again.

This is simply a microcosm of the scope for this topic. It’s obvious that despite my misgivings, analytics will still be used for a variety of dishonest and malicious practices – hell, most marketing companies use analytics to make sure us chumps in the public are enticed as much as possible by their product/message/service and that’s not likely to stop any time soon. Hopefully, in time and likely after a large enough scandal with big data (e.g. accounting and the Enron scandal influencing the profession’s push for a greater emphasis on ethics, and physics with the Manhattan project) we’ll solidify a code of ethics that all analysts will strive to follow.

Write-Up

HOW TO… Excel: Create a Dynamic Data Range for a Chart

Continuing the theme of automatically updating charts, we’re going to look at how to make a chart deal with a dynamically changing data range. This is useful for situations where you have multiple measures that each have a different number of independent variables (the variables plotted on our x-axis, which was the month of the year in our last example). If you use the method as lined out in the last tutorial for this you will find that the chart updates automatically, but the number of variables shown on the x-axis will not change. To solve this, we will need to use Named Ranges; OFFSET; and COUNT to produce the ranges that will feed the automatically updating charts.

We’ll be continuing with the workbook used in the last tutorial (downloadable here) for our example.

Step 1.

Step1 Example

First, we’ll need to create our Named Ranges. We will need a named range for our horizontal axis labels and for our data. We’ll name these ‘ChartDateRange’ and ‘ChartOrgRange’. Now we’ll need to construct the OFFSET function that will be used to determine the ranges of the data dynamically. For both named ranges we will be using almost the same formula, just pointed at a different reference cell. Our ‘ChartDateRange’ will be as follows:

=OFFSET(Charts!$C$30,,,,COUNT(Charts!$C$31:$CF$31))

The first parameter is the ‘Reference Cell’; C30 – the starting cell of our date ranges. We will then pass through the next 3 parameters which are as follows:

  1. The number of rows to move from the reference cell.
  2. The number of columns to move from the reference cell.
  3. The height of the range starting from the reference cell.

Which means the final parameter with our COUNT function determines the width of the range starting from the reference cell. We count all the data within the data range; in this instance I’ve extended the range of the COUNT to include extra columns. We will be extending our data range further out to allow the chart to handle extra data in the backing sheets and adding in additional months of data.

Our ‘ChartOrgRange’ is then simply:

=OFFSET(Charts!$C$31,,,,COUNT(Charts!$C$31:$CF$31))

We’ve just shifted the reference cell down one row to C31 in this case.

Step 1b.

We’ll quickly extend the data ranges of our back sheets and our INDIRECT named range ‘SelectedData’ to also cover the new extended data range. We’ve extended the back sheets to column CF and made the blank data cells show an error by using:

 

=NA()

 

This is necessary to prevent us counting these cells as they will show as 0s when not errors.

 

To match this, we’ve extended ‘SelectedData’ to include columns CF, so the formula is now:

 

=INDIRECT("'"&'Output Sheet'!$H$16&"'!$C$7:$CF$16")

 

We also need to change our chart’s data table to have the formula:

 

=INDEX(SelectedData,OrgNameLinkedCell,MATCH(C$30,AverageTimeData!$C$6:$CF$6,0))

 

We’ve changed the MATCH function to now look at the range C6:CF6 in our back sheets. We then need to extend the data table all the way across to column CF too.

Step 2.

Step 2 Example

Now we can set the chart to look at these Named Ranges instead of the actual data range.

Right click the chart and go to ‘Select Data’. First, we’ll assign a new series data range, so click the ‘Legend Entries’ option for our current series then select ‘Edit’. Change the ‘Series values’ to:

=Charts!ChartOrgRange

It’s important to include the worksheet reference ‘Charts!’ as otherwise the chart will not accept the new input.

Next, we’ll change the horizontal axis labels range, click the ‘Edit’ button for ‘Horizontal (Category) Axis Labels’. Change the ‘Axis label range’ to:

=Charts!ChartDateRange

Once again, make sure you include the worksheet reference ‘Charts!’.

Step 3.

We can now add in new data to test our changes. To showcase this, we’re going to add in some new dummy data to just one of our back sheets; ‘StaffData’ in this case.

To achieve this, we’ve added in 2 more months of data using:

 

=RANDBETWEEN(20, 500)

 

This generates some random numbers for us between 20 and 500. To ‘freeze’ these values we’ll just copy them and paste them as values.

Once this is done you should be able to change freely between the measures and note that for Number of Staff we have data all the way up to May-17, and for the other two measures they only extend up to Mar-17.

And that should be it! This method is also especially useful for automation of Excel reports. If you set up the back sheets to automatically pull their data from wherever their data source is, utilising this method will allow your front end to also automatically update meaning you will have more time to check your data and free up your time to do more proper, less tedious work!

You can download the completed workbook here.

Once again thanks so much for reading and I really do hope this helps. If you have any comments or queries about this tutorial, please do reply below or email us at staticinterest@gmail.com.

How to How to... Excel

Nintendo Console Dashboard Update

I was taking a glance around the Excel tag in the WordPress Reader, when I came across the following blog post about Linked Pictures in Excel.

This was a little bit of a revolution to me; when I made the Nintendo console dashboard I struggled to work out a passable fix for including dynamic pictures in VBA, ultimately deciding to scrap the code and just go with no pictures. Now that I know how to use Linked Pictures with named ranges to allow dynamic updates I could easily go back and update the old Dashboard!

Dashboard With PicsWhere it’s appropriate I’ll definitely use this method elsewhere, it saves busting out VBA and causing a potential hiccup when a user doesn’t enable macros which is always a concern I have when using VBA.

The link to the blog post above is a tutorial for using this method if you’d like to find out how to do it yourself!

The file is available here if you want to prod around yourself.

 

Excel Examples

Nintendo Console Dashboard

If it weren’t apparent enough from my Splatoon 2 analyses, I love Nintendo. So, of all the things one can make a Dashboard about, of course I went ahead and created a Dashboard using data about Nintendo’s array of home consoles.

This was done using Excel – but I’m really interested in working on my Python coding and implementing this into a stand-alone embeddable tool.

If you want to have a poke around yourself and see how it all works, you can download the Dashboard here.

 

Excel Examples

HOW TO… Excel: A Truly Non-Volatile Solution for Dynamic Data Tables

This is a follow-up to the last tutorial. Thanks to reddit user /u/small_trunks for noticing my blunder!

Unfortunately, ‘SUMIFS’ couldn’t be used on its own to create the dynamic data tables. But don’t fret! I have another solution up my sleeve that doesn’t rely on volatile functions like ‘INDIRECT’ and ‘OFFSET’! It relies entirely on ‘INDEX’ and ‘MATCH’.

We’ll need to add an additional list next to our measure names and back-sheet names lists on our output sheet, which I’ve named the ‘Master Sheet Start Point’ column. In this column we can use the ‘MATCH’ function to work out the first iteration of each of the measure names in our backsheet. The formula in this column is:

=MATCH($A16,MasterBacksheet!$C$2:$C$31,0)

Then, below our two cells that work out which sheet name and measure name (H16 and H17) is corresponding to what has been selected by the combo box we can add another cell that will find the starting point of the selected measure in the master back-sheet. The formula will therefore be:

=INDEX($E$16:$E$18,$G$16)-1

We need to include the ‘-1’ as we’ll also be using ‘MATCH’ in the main formula for finding the row number of each organisation so we don’t want to ‘double count’.

NewSolution

The formula we’ll be using in the dynamic data table will be an ‘INDEX’, ‘MATCH’ formula as follows:

=INDEX(MasterBacksheet!$D$2:$AA$31,MATCH(‘Output Sheet’!$B4,MasterBacksheet!$B$2:$B$11,0)+’Output Sheet’!$H$18,MATCH(‘Output Sheet’!C$3,MasterBacksheet!$D$1:$AA$1,0))

Breaking this down gives us:

  1. =INDEX(Full Array of Master Back-sheet Data,
  2. MATCH(Org name, Org Name list in Master Back-sheet, Look for an exact match) + Measure’s Master Sheet Starting Point,
  3. MATCH(Date, Date Headings in Master Back-sheet, Look for an exact match))

And then this formula can be copied across the entire data table! With that we’ve solved the issue, we have a non-volatile method of creating a dynamic data table.

NewSolution2.PNG

Once again if anyone has any other methods of solving this solution please share in the comment section below – I’d love to see what creative methods the community can come up with!

The workbook can be downloaded here.

How to How to... Excel

HOW TO… Excel: Using SUMIFS to Create a Dynamic Data Table

This is a continuation of the previous tutorial.

Due to the volatility of the INDIRECT function, it may be necessary to avoid using it when trying to create large dashboards which will be handling and outputting large amounts of data. So, we’ll need an alternative in these cases. One of the ways this can be done is by using the SUMIFS function, and structuring our data to be compatible with the SUMIFS function.


This has been achieved using SUMIFS, but OFFSET (another volatile function) has been used to cover one of SUMIFS weaknesses. A completely non-volatile method can be found here.


This will be done on the worksheet from the previous tutorial. A download link is available at the bottom of that page so you can follow along.

Step 1.

First, we’ll need to create a new ‘master’ data back-sheet that compiles all the data back-sheets to allow SUMIFS to work correctly.

Step1.PNG

A new column called ‘Measure’ has been added to the old structure which contains the measure names, and the data has been pulled from the other back-sheets by simple cell references.

Step 2.

Now we need to change the formula in the output sheet’s data table to use SUMIFS. But before that, we’ll need add a new INDEX that checks the Measure Name selected by the combo box.  H16 on the output sheet can be copied down to H17 and then have its INDEX array changed to the measure name list as opposed to the back-sheet name list so the new formula in H17 is:

=INDEX($A$16:$A$18,$G$16)

We can now change the dynamic data table’s formula. The formula used in the top left (C4) of the table is as follows:

=SUMIFS(MasterBacksheet!D$2:D$31,MasterBacksheet!$C$2:$C$31,’Output Sheet’!$H$17,MasterBacksheet!$B$2:$B$31,’Output Sheet’!$B4)

If we break this down we get:

  1. =SUMIFS(SUM_Range , Criteria_Range1 , Criteria1, Criteria_Range2, Criteria 2)
  2. =SUMIFS(Apr-15 Column (frozen to row numbers using absolute referencing, $), Measure Column, Selected Measure, Org Name Column, Org Name (frozen to column number using absolute referencing, $)

A weakness of this formula as it stands is the way we’ve referenced the SUM range, using direct references. To allow this formula to be more flexible we can instead use another function to work out the SUM range instead of directly referencing it. One way to do this is to use the OFFSET function. The way to construct the OFFSET function within this formula to correctly determine the SUM range dynamically is as follows:


As pointed out by reddit user /u/small_trunks, OFFSET is also volatile so isn’t a good alternative to the ‘Indirect’ function! This is still a valid method to create the dynamic data table if volatility is not an issue, but a better method can be found here – although unfortunately ‘SUMIFS’ hasn’t been used in this instance.

UPDATE: Reddit user /u/ubbm has come to the rescue! The example below with OFFSET can be replaced with an INDEX function. INDEX can be used in a very similar way to OFFSET; if you specify an array and then only choose a row number or column number, the output will be the entire row or column within the array. So for this example the following can be used instead of OFFSET:

=INDEX(MasterBacksheet!$D$2:$AA$31,,MATCH(‘Output Sheet’!C$3,MasterBacksheet!$D$1:$AA$1,0))

See the full comment for a break down of the logic behind this method of using INDEX.


=OFFSET(MasterBacksheet!$C$2:$C$31,,MATCH(‘Output Sheet’!C$3,MasterBacksheet!$D$1:$AA$1,0))

The structure of the OFFSET function is:

  1. =OFFSET(Reference Range, Number of Rows to Move from Reference, Number of Columns to Move from Reference, [OPTIONAL] Height of the Reference, [OPTIONAL] Width of the Reference)
  2. So, our ‘OFFSET’ function is: = OFFSET(Measure Name Column (1 column before the date columns), Move 0 Rows, Find ‘Apr-15’ in the date headings and move that many columns across). We start one column before the date headings as ‘MATCH’ returns 1 if it finds a match in the first cell of the range.

So our full formula is:

=SUMIFS(OFFSET(MasterBacksheet!$C$2:$C$31,,MATCH(‘Output Sheet’!C$3,MasterBacksheet!$D$1:$AA$1,0)),MasterBacksheet!$C$2:$C$31,’Output Sheet’!$H$17,MasterBacksheet!$B$2:$B$31,’Output Sheet’!$B4)

This formula can then be copied across the entire data table.

Step2

And there we have it! This of course is just one way of going about solving this issue, and part of the beauty of Excel is the ability to bend it to your will when you know all the different functions and tricks of the trade. If you have any other ways in which you would go about creating a dynamic data table, you can share it in the comment section below. Any feedback on my methodology is also welcome!

You can download the finished workbook here.

How to How to... Excel

HOW TO… Excel: Create a Dynamic Data Table Using INDIRECT and Combo Boxes

IndirectTutorialGif

Creating a full and cohesive user experience in an Excel based dashboard can be really difficult if you don’t know where to start. Often, the easiest way to approach the situation will be to utilise Pivot Tables and Pivot Charts, which come with useful tools like slicers to dynamically filter the data with a touch of a button. I have a few pet peeves with Pivots however, mainly being that they require data be in a particular layout; that they can be restrictive on the type and layout of the output they produce (although they do provide a good level of options); and when updating or changing data within a spreadsheet Pivoted data does not refresh unless it’s explicitly told to (which can create a jarring experience when handling and presenting larger data sets).

So, how would you go about creating a dynamic data table and dashboard without using Pivot Tables? There are a seemingly massive number of ways to go about this task, but the way I find most versatile and that I’ve become comfortable using is to make use of Combo Boxes and different Excel functions like INDIRECT (covered in this tutorial), OFFSET, SUMIFS and INDEX.

In this tutorial I will show how to utilise combo boxes and the INDIRECT function to create a dynamic data table which can be used to better visualise your data and be used as a solution for a client or manager-facing reporting tool. This tutorial will be using dummy data to show a basic Key Performance Indicator dashboard with data from April 15 to March 17. The INDIRECT function will be used specifically to dynamically reference all of our backing sheets.

Step 1.

First we’ll need to create structured backing sheets which follow a consistent lay-out. This is essential to allow the INDIRECT function to consistently output the correct arrays for our data.

Step1

The above example shows my preferred structure:

The top-left of the sheet includes a quick description of where we got the data from and how up-to-date our data is. Including this is general best practice for maintaining data back-sheets to allow other people to understand the data’s history and for us to better quality assure our data. All three of the back-sheets in this workbook will follow the same format, including the same organisation list and dates.

Step 2.

Now we’ll create the output sheet that will hold the dynamic data table. In this example I’ve copied the same structure used in the back-sheets, with the same organisations and all the dates. It would be possible to pick and choose dates and organisations for this output sheet, as the formula used will be able to deal with any combination of the 2 variables.

Step2

Step 3.

We can now create the combo box and the lists of measures and sheet names that feed the combo box. To create a combo box, go to the developer tab of the ribbon

(If the developer tab of the ribbon isn’t visible for you, go to:
File -> Options -> Customise Ribbon
then on the list on the right hand side of this tab tick the selection box for the developer tab).

From the developer tab, select Insert and under ‘Form Controls’ select the combo box option. You should then be able to draw a combo box on the spreadsheet.

Step3a

Now we will create 2 lists: one of measure names and one corresponding to each of our back-sheets. In this example we have 3 measures:

  1. ‘Average Time Taken’;
  2. ‘Total Throughput’; and
  3. ‘Number of Staff’

Corresponding to our back-sheets:

  1. ‘AverageTimeData’;
  2. ‘TotalThroughputData’; and
  3. ‘StaffData’

I’ve created the lists next to one another below the data table as below:

Step3

Step 4.

Now we need to link the combo box we created earlier to the measure list. Right click the combo box and select ‘Format Control’. For the ‘Input Range’ select the measure list; in this example it’s A16:A18. For the ‘Cell Link’ select an empty cell nearby; in this example the cell chosen was G16. Hit OK, the combo box should now allow you to select the measures in the measure list, with the linked cell showing the number corresponding to the measure’s position in the list. In the cell beside the linked cell we will now use an INDEX function to work out the sheet name from the list created beside the measure names.

Step4a

INDEX looks at an array and works out the position of a cell based on a given row and column number. As the linked cell gives the position of the measure in the measure list (thus acting as the row number), it will also return the corresponding back-sheet if we use it in an INDEX function.

Step 5.

Now we can construct the INDIRECT function; in this example it will be done within a named range, but it’s also possible to do this within a cell’s formula. Using named ranges just cleans up the final formula and makes it easier to parse for other users. To create a new named range go to the ‘Formulas’ tab of the ribbon and select the ‘Define Name’ option. Give the named range an appropriate name like ‘SelectedData’, and insert the INDIRECT function. In this example the formula will be:

=INDIRECT("'"&'Output Sheet'!$H$16&"'!$C$7:$Z$16")

The purpose of the INDIRECT function is to allow excel to interpret text as an array or cell. Breaking down this formula gives us the following:

  1. =INDIRECT(“TEXT TO BE INTERPRETED”)
  2. =INDIRECT(“ ‘ “& CELL WITH SHEET NAME &” ‘! DATA TABLE’S ARRAY”) this is then interpreted by excel as:
  3. =’SHEETNAME’!DATA TABLE’S ARRAY
Note: The data table’s array does not include the column headings or the row headings i.e. the dates or the organisation names – it starts at the actual data.

Step5

Step 6.

We can now construct the formula used in the dynamic data table. We’ll be using an INDEX function, in conjunction with our ‘SelectedData’ named range and the MATCH function.

The formula in the top left of our dynamic data table is:

=INDEX(SelectedData,MATCH($B4,$B$4:$B$13,0),MATCH(C$3,$C$3:$Z$3,0))

Breaking this down to each part gives us:

  1. =INDEX(SelectedData, this is the link to our named range, so our named range is the array we are ‘indexing’.
  2. MATCH($B4,$B$4:$B$13,0), this is where we work out the row number to look for. B4 is our first organisation name ‘Supplier A’ (its column is ‘frozen’ by using the ‘$’ before B), and the array B4:B13 is the full list or organisation names. The 0 shows we’re looking for an exact match. MATCH then returns the row number of ‘Supplier A’ in the list of organisation names; 1 in this case. The array (the list of organisation names) in this function is able to be the list on the output sheet as it’s the same as the list in the data back-sheets. If you were to create a custom list of organisations to show, you would need to check B4 against the full list of organisations used within the data back-sheets.
  3. MATCH(C$3,$C$3:$Z$3,0) this is where we work out the column number to look for. C3 is our first date ‘Apr-15’ (its row is ‘frozen’ by using the ‘$’ before C), and the array C3:Z3 is the full list of dates. The 0 again shows we’re looking for an exact match. MATCH then returns the column number of ‘Apr-15’ in the list of dates; 1 in this case. As mentioned in the previous point, the array (the list of dates) in this function is able to be the list on the output sheet as it’s the same as the list in the data back-sheets.

This formula can then be copy and pasted across the entire data table. This should then result in a table that can dynamically update like so:

IndirectTutorialGif2

Step 7.

Now that we have a dynamically updating data table we can add in dynamically updating conditional formatting! Alongside the measure and sheet lists we can include a boundary at which our green turns to red and vice versa. I’ve set these to 2, 200, and no boundary for ‘Average Time Taken’, ‘Total Throughput’ and ‘Number of Staff’ respectively. Next to these lists I’ve included a cell that contains the boundary for the selected measure. This is done in the same way as the sheet name cell, using the INDEX function. For this example the formula is:

=INDEX($C$16:$C$18,$J$1)

This looks at the array of boundaries (C16:C18) and returns the one corresponding to the chosen measure.

Now you can highlight the data table, go to the ‘Home’ tab of the ribbon and select the ‘Conditional Formatting’ then ‘Highlight Cells Rules’ and ‘Greater Than…’.

Step7

Select the cell that contains the boundary for the selected measure; D16 in this case. Choose the ‘Light Red Fill with Dark Red Text’ option and hit OK. Do the same for ‘Less Than…’, selecting the same cell and this time choosing ‘Light Green Fill with Dark Green Text’. Sorted!

Step 8.

Now you may have realised upon changing the measures that these don’t make sense for our ‘Total Throughput’ measure and the ‘Number of Staff’ measure is incorrectly showing the conditional formatting. To combat this we’ll need to change the rules to an IF statement rather than a simple greater than or less than rule. To change the rules, go to the ‘Manage Rules’ section of the ‘Conditional Formatting’ menu. For the Red formatting, select ‘Edit Rule’ and change the rule to ‘Use a formula to determine which cells to format’. The formula used in this example is:

=IF($H$16="TotalThroughputData",C4<$D$16,C4>$D$16)

This checks the selected data back-sheet, and if it’s the back-sheet for the Total Throughput data: highlight the cells that are less than D16 (our cell containing the selected measure’s boundary) red; and if it’s not highlight the cells that are greater than D16 red. This means that the rule now correctly shows those organisations with low throughput as red, instead of green.

Now edit the green formatting in the same way, using the formula:

=IF($H$16="TotalThroughputData",C4>=$D$16,C4<=$D$16)

This checks the selected data back-sheet, and if it’s the back-sheet for the Total Throughput data: highlight the cells that are greater than or equal to D16 (our cell containing the selected measure’s boundary) green; and if it’s not highlight the cells that are less than or equal to D16 red. This means that the rule now correctly shows those organisations with high throughput as red, instead of green. By using the ‘greater/less than or equal to’ inequality we have now also included the boundary’s number within the formatting which is another error that needed to be corrected.

The final error that needs to be corrected is that ‘Number of Staff’ will still show the conditional formatting despite not having a boundary. To correct this add a new rule with ‘Use a formula to determine which cells to format’, and add the formula that will exclude those measures with no boundary – in this example we’ve used ‘‘ to denote no boundary in the list of boundaries. The formula used is then as follows:

=$D$16="-"

Leave the format as ‘No Format Set’ and hit OK. Then tick the ‘Stop If True’ checkbox in the ‘Manage Rules’ options window for this rule.

The sheet is now complete from a functionality standpoint – I’ve made some general formatting changes to try and make the table clearer and easier to interpret. I’ve increased the row size of the table rows to 20, removed the gridlines on the sheet by going to ‘View’ on the ribbon and unchecking the ‘Gridlines’ box. I also added a title: ‘Select Measures’, above the combo box to help aid user experience. I’ve also used the ‘Freeze Panes’ setting in the ‘View’ tab at cell C2 to lock the organisation names and combo box in the view of the spreadsheet. I’ve hid rows 15-18 and added a filter to the column headings, by highlighting them all and then going to the ‘Home’ tab and selecting ‘Filter’ under the ‘Sort & Filter’ option on the right of the tab.

One thing to be aware of with this method is a weakness of the INDIRECT function. INDIRECT is known as a volatile function, this means whenever any change is made to the workbook it will cause any formulae with an INDIRECT function in it to recalculate. Other functions that share this property are the functions like RAND and RANDBETWEEN. This means incredibly large data tables which use INDIRECT can cause massive slow down when making changes to the workbook. A way to avoid the INDIRECT function would be to format the data back-sheets in an alternative way – with one master data back-sheet instead of multiple. I’ll cover an example of this in future tutorials!

So after following this method you should be able to create a dynamic data table perfect for Key Performance Indicator dashboards! It offers more flexibility (in my opinion) than Pivot Tables, and using combo boxes allows us to link multiple parts of a spreadsheet to a single dynamic cell to create a more cohesive user experience.

Thanks for reading, if you have any questions or feedback please do comment below!

You can access the workbook here.

How to How to... Excel

Splatfest Analysis (Part 3)

In this final Splatfest Analysis, we will be taking a closer look at the distributions of the 9 games I had with the Splattershot to determine whether its consistent place in the top 3 was due to me consistently playing well with the weapon, or whether the averages were skewed by particularly high leverage outliers.

The points scored with the Splattershot were relatively consistent around 800-900, but a particularly good game where I scored 1180 points has skewed the mean positively, but it wasn’t so dramatic a swing to cause the mean to drift far from the median; it was near where most of the points were clustered.

The number of kills I got each game with the Splattershot were also consistently around 5-6 kills each game. A slightly low game with 3 kills did cause some skew negatively to the mean, but again it stayed close to the median and where most points were clustered.

The number of deaths each game also showed very little variation, with the most frequent number of deaths being 0. It ranged from 0-4 with a fairly even spread, meaning no individual point had a particularly large influence on the mean of 1.67.

In conclusion, it does seem that I was consistently good with the Splattershot – meaning it was well deserving of its top 3 positions despite its lower sample size.

And with that this utterly riveting multi-part analysis comes to a close! My main take-aways from a data visualisation and analysis stand-point when carrying out these analyses were that:

  • Box and Whisker Plots in Excel 2016 are fickle and I’ll likely avoid them in future!
  • A more heavily annotated set of graphics may have been more suitable, instead of writing up everything in these report-style blog posts. This would allow the analyses to better live as separate entities from this blog, which is especially important when doing analyses of sensitive data to make sure the data can’t easily be divorced from the analysts’ findings and any caveats to the data.
  • Using Excel to do ‘proper’ data analysis with the AnalysisToolPak is very easy. Until starting this I was unaware Excel could do things like linear regression (almost) natively, usually I would move my data over to a more specialised analytical software package like SPSS (and I have a particularly large disdain for SPSS!).

Thanks to those of you who took the time to read my ramblings and please don’t be afraid to leave any feedback you have in the comments section.

Here is the Excel file I used to create all of the graphics for this analysis: Link.

Write-Up

Zipf’s Law

After watching a long session of ‘VSauce’ videos (great brain food videos, albeit very addictive!), I came across this video discussing ‘Zipf’s Law’. Zipf’s law states that in any corpus of natural language, the frequency of any word is inversely proportional to its rank in the frequency table. This Zipfian distribution applies to many different types of data studied across a variety of fields (the video discusses a large variety of these instances). Zipfian distributions also follow the ‘Pareto Principle’, the 80-20 rule. 80% of the words used in any corpus are only 20% of the unique words used.

After being utterly captivated by this phenomenon, I decided it would be fun to check whether my blog, being a corpus of natural language, followed Zipf’s law.

To my delight it does! I also thought it would be a nice opportunity for me to experiment with colour, utilising a dark midnight blue as the background instead of the usual white. Using the complimentary orange for the data points allows them to ‘pop’ from the background which I think produces a nice effect.

If you disagree or have any other non-traditional colour combinations for charts feel free to let me know in the comment section!

General Write-Up