Patient Pathway Analysis Tool – Update

This post is a discussion of the ongoing development of the Patient Pathway Analysis tool I was working on over the winter of 2018. I’ll be covering the techniques I’ve used, some of the challenges I’ve had while producing the tool, and techniques I’d like to use in future.

As mentioned in my previous blog post regarding the Patient Pathway Tool, it utilises Chi Squared tests (on contingency tables) to determine whether a CCG’s patients are being treated differently to their similar 10 peers. This is done across a selection of indicators from the SUS inpatient spells and episodes tables. Once an indicator has been determined to have been carried out significantly differently compared with the CCG’s peers, I’ve worked on providing the tools to investigate what happened to the patients for each category of the indicator across their inpatient stay.

This was achieved by using the interactive elements of Jupyter Notebooks and the visualisation package Altair. The user can select a category from a given indicator and see what occurred to that group of patients across all other indicators from the data set. This allows an analyst to determine the possible reasons for any given difference – for example some differences may be explained by the specific diagnosis or group of procedures that are being carried out on a set of patients, showing that the difference in patient cohort might have led to the differences determined by the tool. Or that one of the CCG’s peers are admitting patients for a set of procedures that might normally be done as outpatient procedures, causing differences throughout each indicator.

This allows the analyst to then remove any group of patients that might be considered outliers and examine whether any differences have occurred in the new, more homogenous subset of patients.
As an example, looking at Head and Neck Cancer patients for a CCG – the tool identified a difference in the proportion of patients having an operation between the CCG and its peer group (the best 5 CCGs for length of stay (LoS) for the Head and Neck Cancer patients).

Operation Status Patient Pathway Example

The tool then allows us to drill down on the group of patients coded as having an operation status of 8 (Not Applicable), to see what occurred with this group of patients across the rest of their inpatient care. We identified that the patients were most commonly having the ‘X654’ procedure, the code for a type of Radiotherapy. Patients having this procedure from the best 5 CCGs were almost entirely from one large CCG – skewing the proportions a large amount. This cohort of patients could then be removed to explore whether any differences occurred across a more similar set of patients.

Drilldown Patient Pathway Example
Procedures with small numbers have been removed

A large part of this project that I generally didn’t account for has been managing the queries to the SUS database. Trimming the query down to fewer fields (for example dropping procedure codes past the 10th due to the small number of patients this affects), creating more efficient joins and creating stored procedures the tool calls were all important parts of reducing the load times from 5 minutes (!) to around 30 seconds. This has allowed small changes to be made to the query (adding/removing diagnosis codes etc.) without having to worry about the massive amount of time that would be wasted waiting for the data to be passed to the tool.

As the tool increased in complexity it was also important to create modules of functions to be called upon within the notebook environment – to abstract the complexity and reduce the cognitive load on the user when making small changes to the tool. It required breaking down the actions carried out in each cell into functions; working out where similar functions existed and then slightly adjusting those similar functions or their outputs to work for all the required cells. I used Visual Studio Code as my editor of choice for writing the .py files that housed the functions, but any text editor or IDE would suffice. Here are some examples of the functions that were created; functions for producing each of the types of visualisation used in the tool; functions for calling SQL queries; functions for transforming the data into contingency tables and functions for enabling suppression of small numbers from outputs.

def ClusteredBarChart(data: pd.DataFrame, indicatorCol: str, descCol: str, x_Sort: list, indicatorTitle=''):

    ''' The standard clustered bar chart showing the chosen CCG against the best 5 or similar 10.'''

    chart = alt.Chart(data[data['Include'] == 1]).mark_bar(stroke='transparent').encode(

        x = alt.X('variable:N', axis = alt.Axis(title=''), sort=x_Sort),

        y = alt.Y('value:Q', axis = alt.Axis(title='% of Patients', format='.0%')),

        color= alt.Color('variable:N', legend = alt.Legend(title = ''), sort=x_Sort),

        column=alt.Column(indicatorCol, type='nominal', title=indicatorTitle),



                            title='% of Patients',

                            format='.2%'), 'Counts', descCol]


        stroke ='transparent'


    return chart

An example of a function used to produce visualisations in the tool, making use of the Altair package.

I’ve looked into other techniques that we may be able to use in the future; my favourite of the techniques so far has been clustering using K-prototypes, a technique combining the K-means and K-modes methods for clustering data. K-prototypes would be needed due to the mix of categorical (e.g. operation status and procedures carried out) and continuous variables (e.g. the patient’s distance from provider and their length of stay). The package needed to do this type of analysis in python is kmodes. Unfortunately, I’ve moved roles recently so won’t be able to work on this myself – but hopefully this will be able to be implemented by my colleagues when they get the time to work out the code base!

Data Visualisation General python Write-Up

Data Science Project – Patient Pathway Analysis

This is a short piece on a project I’m developing at NHS England, using Python to analyse patients’ inpatient care and determining where a Clinical Commission Group’s (CCG) patients are being treated significantly differently to those patients in its peer group CCGs (similar 10 CCGs as determined by NHS Rightcare Methodologies).

General Write-Up

HOW TO… Excel: Create a hierarchical series of drop down boxes

As a result of prodding around the excel subreddit I found a query from user innocuous_gorilla about creating drop downs that are dependent upon each other. Due to working at NHS England where we have a massive array of organisation types and hierarchies, I quickly learnt how to do this! For example, in NHS England we have different regions, sub-regions and commissioning groups which tend to be massive long lists if un-filtered, so it makes it hard to select one in a drop down where all options available. Using the following method, you can make a set of drop downs that all depend on a master drop down(s), so choosing a region or sub-region filters the commissioning group drop down box to only those groups that belong to the chosen parent region.

How to How to... Excel

HOW TO… VBA: Compare VBA Combo Box Values with Their Lists


So, a recent issue I’ve had using VBA to create UserForms to handle data input is data validation. Throughout my UserForm I’ve included a variety of Combo Boxes that contain lists of information the user can select. In most instances I would like the user to have inputted exactly one of the options within the drop down provided with the combo box; VBA has its own property to handle whether Combo Boxes require an exact match – the MatchRequired property. While this can be used, it creates an issue where if a user leaves a combo box unfilled the combo box will say there is a no match and therefore an error – locking the user out of the form until they input something that is a match. My UserForm is used for modifying and inputting requests our team has had, therefore certain Combo Boxes won’t be able to be filled until a request itself has been completed, with fields such as ‘Quality Assured By:’ only being relevant when the work done has been QA’d. In this instance this Match Required error would cause the user to be forced to enter incorrect data to proceed with the form if they’d accidently clicked into this combo box.

To get around this I turned off the MatchRequired property and created my own data validation Subroutine to check if the value held by each combo box matches a value within the combo box’s list. To do this we will need to create an array of arrays and then iterate through those arrays, something that’s fairly straight forward in more robust languages like Python and C# but a bit more finicky in VBA. Below is some example code which we’ll go through step by step.

How to How to... Excel how to... vba

Analytics and Ethics

Upon the enlightening release of the Cambridge Analytica exposé that revealed the company’s underhanded tactics it utilises to influence and manipulate people with a combination of data analytics, espionage, and ‘honeypotting’, I think it’s incredibly important to discuss how the analytical community needs to ensure a commitment to honest and unbiased analytics, and ethically sourced data.


HOW TO… Excel: Create a Dynamic Data Range for a Chart

Continuing the theme of automatically updating charts, we’re going to look at how to make a chart deal with a dynamically changing data range. This is useful for situations where you have multiple measures that each have a different number of independent variables (the variables plotted on our x-axis, which was the month of the year in our last example). If you use the method as lined out in the last tutorial for this you will find that the chart updates automatically, but the number of variables shown on the x-axis will not change. To solve this, we will need to use Named Ranges; OFFSET; and COUNT to produce the ranges that will feed the automatically updating charts.

How to How to... Excel

HOW TO… Excel: Create an Automatically Updating Chart

Following on from our last series of HOW TO… Excel, we’re going to expand the concepts used to create the dynamic data table and create another sheet for the same workbook that will be the home of an automatically updating chart! Unlike pivot charts, this will be able to update as soon as new data is inputted into the data backsheets, and will be connected to the same ‘linked cell’ as our previous combo box, so the changes will be persistent throughout the workbook. This will allow us to create a consistent and pleasing user experience for whom ever will be using the tool.

How to How to... Excel

Florida Ticket Price Comparisons

I’m going on holiday soon! I’m headed to Florida with my parents and being the cost savvy Brits that we are we’re always on the hunt for a bargain. We wanted to go do some of the Theme Parks (we settled on Disney and/or Universal Studios) and went about finding the cheapest ticket rates we could. During the internet hunt my Mum kept using the Calculator on her PC to do things like exchange rate conversions and to calculate price differences – but, being the Excel monkey that I am I proposed a much better solution: let’s whack it all in Excel and let me make some pretty charts!

General Write-Up