Introduction

This document details the data cleaning and analysis applied to gain reported results. Supplementary results are also presented here.

Structure

  1. Preliminaries
  2. Data Cleaning
  3. Descriptive statistics
  4. Model results

1.Preliminaries

1.1 Packages

Load required packages, additional functions, and set themes. additional comments are provided in the code chunk.

library(finalfit)  #Summary table
library(cowplot)   #for multiplots
library(grid)      #for multiplots
library(gridExtra) #for multiplots
library(tidyverse) #Meta package including dplyr, ggplot2 etc.
library(lubridate) #Dates (not loaded with tidyverse)
library(broom)     #Tidy model coefficients and fit stats
library(margins)   #Calculate Average Marginal Effects from models
library(DT)        #Print pretty tables in html with csv buttons
library(ggthemes)  #scale_*_ptol() argument for plot colour scheme
library(ggrepel)   #plot labelling
library(rlang)     #for exprs() function used below

#Set the base plotting options
theme_set(theme_minimal(base_size = 16) + 
            theme(panel.grid.minor = element_blank()))

#My own helper functions
`%nin%` <- Negate(`%in%`) #negate the %in% function
source("Research/functions/round_df.R") #Round-up all numeric values in a df
source("Research/functions/my_datatable.R") #quick datatable with csv button
source("Research/functions/meds_pct_plot.R")#specific desc plot for medicines
source("Research/functions/simd_pct_plot.R")#specific desc plot for SIMD
source("Research/functions/sex_pct_plot.R") #specific desc plot for sex


#From @kjhealy's book "Data visualisation"
#Two functions to tidy up variable names in model outputs
source("Research/functions/prefix_funcs.R")  

#As of October 2018 I am using margins v0.3.23. An update in 
#August 2018 included the below function to summarise the margins command
#in a way that can be mapped using purrr::map. This has not been pushed
#to CRAN yet and there will be delay even after it is before it will be available
#in the safe haven. It is fairly simple so have recreated it here
source("Research/functions/margins_summary.R")

#Set labels for financial years to be used in plots
year_labels <- c(`2015-03-31` = "2014/15", 
                 `2016-03-31` = "2015/16")

#Finally an expression that can be used programatically with case_when()
#note: needs library(rlang)
year_char <- exprs(
           year == ymd(20110331) ~ "2010/11",
           year == ymd(20120331) ~ "2011/12",
           year == ymd(20130331) ~ "2012/13",
           year == ymd(20140331) ~ "2013/14",
           year == ymd(20150331) ~ "2014/15",
           year == ymd(20160331) ~ "2015/16",
           year == ymd(20170331) ~ "2016/17"
)

1.2 Load data

Main data frame created in thesis project “Multimorbidiy and social care: Exploiting emerging adminstrative datasets in Scotland”.

load("Research/created_objects/clean_data/sc_mm_df.rds")

2. Data Cleaning

2.1 Drop variables and rows

The thesis project contains more data than currently required. As described in the methods section, only required variables for two years of data (2014/15 & 2015/16) are kept.

df %<>%
  select(index,       #ID variable
         year,        #Financial year
         sex,         #Male/Female
         age,         #at 31st March in each financial year
         age_grp,     #5 year bands 65-69, 70-74 etc. to 95 plus
         dod,         #Date of death (if present in fin year otherwise NA)
         simd,        #Deprivation decile of residence 1(deprived)-10(affluent)
         care_home,   #Indicator if resident in a care home
         scs_flag,    #Flag if individual has any record in Social Care Survey
         total_meds,  #Total number or repeat medicines prescribed in fin year
         total_ch) %>% #Number of BNF chapters these meds are prescribed from
  filter(year %in% c(ymd(20150331), ymd(20160331))) #Keep only these years

2.2 Tidy flags

Some of the flags require additional cleaning here - specifically coercing from integer to factor format. NA values in the care home, social care, total medicines, and total BNF chapter variables are replaced with zero here also.

df %<>%
  mutate(
    #Replace NA values with zero
    care_home = replace_na(care_home, "0"),
    #factorise care home
    care_home = factor(
      care_home,
      levels = c("0", "1"),
      labels = c("Not Care Home", "Care Home")
    ),
    #Replace NA values with zero
    scs_flag = replace_na(scs_flag, "0"),
    #factorise scs_flag
    scs_flag = factor(
      scs_flag,
      levels = c("0", "1"),
      labels = c("No Social Care", "Social Care")
    ),
    #replace NA with zero in count variables
    total_meds = if_else(is.na(total_meds) | total_meds == 0, 0L,
                         as.integer(total_meds)),
    total_ch = if_else(is.na(total_ch) | total_ch == 0, 0L,
                       as.integer(total_ch))
  )

2.3 Collapse BNF chapters

BNF chapters are used as the proxy measure of multimorbidity in the primary models. Higher numbers of BNF chapters are less frequent so are collapsed down into one category. The decision where to collapse was based by looking for as an even a distribution as possible.

df %<>%
  mutate(total_ch = fct_collapse(
    factor(total_ch),
    `6+` = c("6", "7", "8", "9", "10", "11", "12", "13")
  ))

The most even fit is to collapse everything above 6 chapters into one level. This plot shows the distribution.

df %>%
  ggplot(aes(total_ch)) +
  geom_bar(aes(y = ..prop.., group = 1)) +
  scale_y_continuous(limits = c(0, 0.2),
                     labels = scales::percent_format(accuracy = 1)) +
  facet_wrap(~ year, labeller = as_labeller(year_labels)) +
  labs(y = "",
       x = "Number of BNF chapters meds prescribed from")

2. 4 Add medicine group

Here the medicine group variable used in sub-analysis 1 (not reported in the main paper) is created here and called meds_grp. The aim here was also to get groups of roughly similar size in the population (as values are integers algorithms were not useful here). The end result is four groups: 0-2, 3-5, 6-8, and 9plus medicines.

#create meds_grp variable
df %<>%
  mutate(meds_grp = factor(
    case_when(
      total_meds <= 2 ~ "0-2",
      total_meds >= 3 & total_meds <= 5 ~ "3-5",
      total_meds >= 6 & total_meds <= 8 ~ "6-8",
      total_meds >= 9  ~ "9+"
    )
  ))

A bar plot shows the distribution in each year of data. A zero category would be nice but skews the distribution making similar sized groups impossible.

#Show distribution
df %>%
  ggplot(aes(meds_grp)) +
  geom_bar(aes(y = ..prop.., group = 1)) +
  scale_y_continuous(limits = c(0, 0.3),
                     labels = scales::percent_format(accuracy = 1)) +
  facet_wrap(~ year, labeller = as_labeller(year_labels)) +
  labs(y = "",
       x = "Repeat medicine group")

2.5 Add a flag for multimorbidity

Multimorbidity is defined as prescribing from more than two BNF chapters. A flag denoting those with multimorbidity is created here. As mentioned in the paper, BNF chapter counts are a crude measure of body systems treated.

df %<>% 
  mutate(mm = if_else(total_ch %nin% c("0", "1"),
                      "multimorbid", "not mm"),
         mm = factor(mm))

3. Descriptive statistics

3. 1 Table 1 - Characteristics of cohort

Using the finalfit package to create table 1. The process is started here but further numbers still need to be added. The output from this chunk of code is a count of the total number of unique individuals across both years of data.

# Coerce the year variable to a character and then factor
# in order to play nicely with finalfit
df$year <- as.character(df$year)
df$year <- factor(df$year)

#Set variables
dependent = "year"
explanatory = c("is.na(dod)", "sex", "age", "age_grp", "simd", "scs_flag",
                "care_home", "total_meds", "meds_grp", "total_ch")

#Create table and assign to a new object: tab_main
df %>% 
  summary_factorlist(dependent, explanatory, 
                     column = TRUE, 
                     add_dependent_label = TRUE, 
                     dependent_label_prefix = "") -> tab_main

#Now calculate total number of individuals across both years
df %>% 
  summarise(n = n_distinct(index))

Now add a count for each year and then add that to the overall table. Pressing the CSV button will automatically download the data to your machine.

df %>% 
  group_by(year) %>%
  summarise(n = n()) %>% 
  ungroup %>% 
  mutate(n = as.character(n)) %>% 
  spread(year, n) %>% 
  mutate(year = "n") %>%  
  bind_rows(., tab_main) %>% 
  select(c(3:4), everything()) -> tab_main

my_datatable(tab_main)

3.2 Drop care home residents

3.2.1 Save full data frame

Save df_all as an object before dropping rows for further analysis. This is used in sub-analyses later.

df_all <- df
save(df_all, file = "Research/created_objects/clean_data/sc_mm_df_all.rds")
rm(df_all) # remove from memory for now

3.2.2 Drop rows

Those resident in a care home are dropped from analysis here as they are a distinct population and not the focus of the study. These rows account for ~2.2% of the total population and ~8.6% of those receiving social care - most likely free personal care.

This chunk is also used for convenience to coerce the year variable back to a date format.

df %<>%
  filter(care_home == "Not Care Home") %>% 
  mutate(year = ymd(year))

3.3. Table 2 - Characteristics by multimorbidity

Breakdown by multimorbidity (measured by two or more BNF chapters) using the previously created flag. Only data for 2015/16 is reported for the main paper. 2014/15 results are also shown in this table (and are similar across all variables).

#Name variables
explanatory = c("sex", "age_grp", "simd", "scs_flag")
dependent = "mm"
#Nest dataframe by year then apply summary_factorlist()
df %>% 
  group_by(year) %>%
  nest() %>% 
  mutate(tab = map(data,
                   summary_factorlist,
                   dependent, explanatory,
                   na_include = TRUE, column = TRUE,
                   add_dependent_label = TRUE,
                   dependent_label_prefix = "", 
                   total_col = TRUE)) %>%
  select(-data) %>%
  ungroup -> sc_table_2

#Pull correct labels out into a vector
V1_labels <- sc_table_2$tab[[1]][[2]]
#Apply these labels repeatedly for the 2 year of data and
#drop uneccessary empty variables. Also rename year data to better labels
sc_table_2 %<>% 
  unnest(cols = tab, names_repair = tidyr_legacy) %>% 
  arrange(year) %>% 
  mutate(V1 = rep(V1_labels,2),
         year = case_when(!!!year_char)) 

my_datatable(sc_table_2)

3.4 Table 3 - Characteristics by outcome variable

Now print the descriptive table using receipt of social care as outcome.

#Name variables
explanatory = c("sex", "age_grp", "simd", "meds_grp", "total_ch")
dependent = "scs_flag"
#Nest dataframe by year then apply summary_factorlist()
df %>% 
  group_by(year) %>%
  nest() %>% 
  mutate(tab = map(data,
                   summary_factorlist,
                   dependent, explanatory,
                   na_include = TRUE, column = TRUE,
                   add_dependent_label = TRUE,
                   dependent_label_prefix = "", 
                   total_col = TRUE)) %>%
  select(-data) %>%
  ungroup -> sc_table_3
#Pull correct labels out into a vector
V1_labels <- sc_table_3$tab[[1]][[2]]
#Apply these labels repeatedly for the 2 years of data and
#drop uneccessary empty variables. Also rename year data to better labels
sc_table_3 %<>% 
  unnest(cols = tab, names_repair = tidyr_legacy) %>% 
  arrange(year) %>% 
  mutate(V1 = rep(V1_labels,2),
         year = case_when(!!!year_char)) 

my_datatable(sc_table_3)

3. 5 Figure 1

Visualise receipt of social care by age acrsoss three variables: sex, deprivation, and multimorbidity for the year 2015/16 only (2014/15 has very similar patterns)

Each plot is created seperately with previously defined functions and then merged into one plot utilising the cowplot package.

#Plot by sex
#Summarise data -> counts and percentages at each age
df %>%
  filter(year == ymd(20160331)) %>%
  group_by(age, sex, scs_flag) %>%
  count %>%
  group_by(age, sex) %>%
  mutate(pct = n / sum(n)) %>%
  ungroup %>%
  #Keep only those receiving social care and under 95
  filter(scs_flag == "Social Care" & age <= 95) %>%
  #apply plot function
  sex_pct_plot(., age, pct, limits = c(0, 0.65)) +
  #customise axis labels and legend
  labs(colour = "",
       y = "",
       x = "") +
  theme(
    legend.position = "top",
    legend.text = element_text(size = 12),
    legend.title = element_text(size = 12)
  )  -> age_sex_sc_plot

#Plot by deprivation
#Summarise data -> counts and %
df %>%
  filter(year == ymd(20160331)) %>%
  group_by(age, simd, scs_flag) %>%
  count %>%
  group_by(age, simd) %>%
  mutate(pct = n / sum(n)) %>%
  ungroup %>%
  #keep only those receiving social care and under 95
  filter(scs_flag == "Social Care" & age <= 95) %>%
  #apply plot function
  simd_pct_plot(., age, pct, limits = c(0, 0.65), scale_alpha = FALSE) +
  #custom plot labels
  geom_label_repel(data = . %>% filter(simd == "1" & age == 73), 
                   aes(label = "1 - most deprived"), 
                   colour = "#332288", nudge_y = 0.05, nudge_x = -3) +
  geom_label_repel(data = . %>% filter(simd == "10" & age == 81), 
                   aes(label = "10 - most affluent"), 
                   colour = "#AA4499", nudge_y = -0.05, nudge_x = 5) +
  #custom axis labels and legend
  labs(x = "",
       y = "",
       colour = "SIMD decile") +
  scale_x_continuous(breaks = scales::pretty_breaks(n = 8)) +
  scale_colour_ptol(
    guide = guide_legend(nrow = 1),
    labels = c("1", "2", "3", "4", "5", "6", "7", "8", "9", "10")) +
  theme(
    legend.position = "top",
    legend.text = element_text(size = 8),
    legend.title = element_text(size = 10)
  ) -> age_simd_sc_plot
## Scale for 'colour' is already present. Adding another scale for 'colour',
## which will replace the existing scale.
#Plot by multimorbidity
#summarise data -> counts and %
df %>%
  filter(year == ymd(20160331)) %>%
  group_by(age, total_ch, scs_flag) %>%
  count %>%
  group_by(age, total_ch) %>%
  mutate(pct = n / sum(n)) %>%
  ungroup %>%
  #keep only those with social care and under 95
  filter(scs_flag == "Social Care" & age <= 95) %>%
  #apply plot function
  meds_pct_plot(., age, pct, limits = c(0, 0.65)) +
  #custom axis labels and legend
  labs(x = "",
       y = "",
       colour = "BNF chapters") +
  scale_x_continuous(breaks = scales::pretty_breaks(n = 8)) +
  scale_colour_ptol(guide = guide_legend(nrow = 1),
                    labels = c("0", "1", "2", "3", "4", "5",
                               "6+")) +
  theme(
    legend.position = "top",
    legend.text = element_text(size = 8),
    legend.title = element_text(size = 10)
  ) -> age_meds_sc_plot
## Scale for 'colour' is already present. Adding another scale for 'colour',
## which will replace the existing scale.
#Combine plots together
p1 <- plot_grid(
  age_sex_sc_plot,
  age_simd_sc_plot,
  age_meds_sc_plot,
  nrow = 1,
  align = "h"
)

#Single x axis title
x_grob <- textGrob("Age", gp = gpar(fontface = "bold", fontsize = 16))
#Combine to p1
p1 <- grid.arrange(arrangeGrob(p1, bottom = x_grob))

#Now add title 
title <- ggdraw() +
  draw_label(
    "Percentage of individuals at specific ages receiving any form of social care\nby sex, by SIMD decile, and by multimorbidity group. 2015/16",
    fontface = "bold",
    size = 18
  )
p1 <- plot_grid(title, p1, ncol = 1, rel_heights = c(0.1, 1))
#Add caption
p1 <-
  add_sub(
    p1,
    "over 95s removed due to small numbers",
    x = 1,
    hjust = 1,
    size = 10
  )
p1 <- ggdraw(p1)
p1

4. Models results

4. 1 Main reported model.

Presence in Social care survey as outcome. Independent variables and interactions as commented in the code chunk below. R automatically fits each individual variable seperately before applying interactions. Both years of data with those living in a care home removed.

4.1.1. Fit model

Model data. Logistic regression with Average Marginal Effects (AME) calculated. To do this over both years of data the dataframe is “nested” and then the model function is applied over the column containing data for each year. New columns are created with model coefficients and AMEs in a tidy fashion so they can be easily tabulated and plotted.

df %<>% 
  #keep only required variables for the model
  select(index, year, scs_flag, sex, age_grp, simd, total_ch) 
df$simd <- fct_rev(df$simd) #Reverse order of SIMD so Decile 10 is reference
start <- Sys.time() #Set timer start

#Create a function for the model
sc_model <- function(my_df){
  glm(scs_flag ~                 #The formula for the model
        sex*age_grp +            #Interaction: Sex and age group       
        simd*total_ch +          #Interaction: SIMD and total BNF chapters
        age_grp*total_ch,        #Interaction: Age group and BNF chapters
      family = binomial(),       #Logistic regression
      data = my_df)              #identify which data to use
  }
 

sc_mm_mod_1 <- 
  df %>%  #Main dataframe
  group_by(year) %>% 
  #Nest data for each year
  nest() %>% 
  #Create model column using above function
  mutate(mod_var = map(data, sc_model),
         #Tidy coefficients using broom::tidy into a column
         tidy_var = map(mod_var, tidy), 
         #compute AMEs and add as a column
         marginals = map2(mod_var, data, ~margins_summary(.x, data = .y))) 

print(Sys.time()-start)

The above code chunk took 12hrs to run - mostly in calculating variances for marginal effects.

4.1.2 Main analysis - Tabulate Odds Ratios

This next chunk unnests the tidied model coefficients from both years of data, calculates odds-ratios (and CIs) for all coefficients, then tidies up labels for the resulting table.

sc_mm_mod_1 %>% 
  select(year, tidy_var) %>%
  unnest(cols = tidy_var, names_repair = tidyr_legacy) %>%
  #exponentiate the coefficents for OR
  mutate(odds_ratio = exp(estimate),  
         #Calculate Lower CI and exponentiate
         conf_int_low = exp(estimate - 1.96*std.error),
         #Calculate Upper CI and exponentiate
         conf_int_high = exp(estimate + 1.96*std.error)) %>% 
  #round to three decimal places
  round_df(., dig = 3) %>% 
  #Tidy up p-values so they don't show as "0"
  mutate(p.value = as.character(p.value),
         p.value = if_else(p.value == "0", "<0.001", p.value)) -> tidy_tab

#Tidy up labels
prefixes <- c("sex", "scs_flag")
tidy_tab$term <- prefix_strip(tidy_tab$term, prefixes)
tidy_tab$term <- prefix_replace(tidy_tab$term, "Total_ch",
                                       "BNF chpts: ")
tidy_tab$term <- prefix_replace(tidy_tab$term, 
                                       "Age_grp", "Age: ")
tidy_tab$term <- prefix_replace(tidy_tab$term,
                                       "Simd", "SIMD Decile: ", 
                                       toTitle = FALSE)
tidy_tab %<>% 
  mutate(year = case_when(!!!year_char))

my_datatable(tidy_tab)

4.1.3. Main analysis - Tabulate goodness-of-fit

Here (utilising broom::glance) an additional column is added to the main model object with fit statistics for both models.

sc_mm_mod_1 %<>% 
  mutate(glance_var = map(mod_var, glance))

Now manually calcluate McFadden’s pseudo \(R^{2}\) which has the formula:

\[R^{2}_{McFadden} = 1 - \frac {ln(LM_{1})}{ln(LM_{0})} \]

Where \(ln(LM_{1})\) is the log likelihood value for the fitted model and \(ln(LM_{0})\) is the log likelihood for the null model with only an intercept as a predictor.

Calculate the pseudo \(R^2\), then rearrange variable order- Print this out.

sc_mm_mod_1 %>% 
  ungroup %>%
  select(year, glance_var) %>%
  unnest(cols = glance_var, names_repair = tidyr_legacy) %>% 
  mutate(pseud_r2 = 1 - (logLik / (null.deviance / -2)), #Calculate McFadden's 
         year = case_when(!!!year_char),
         null_LogLik = null.deviance/-2) %>% 
  select(year, pseud_r2, deviance, logLik, df.residual,
         null.deviance, null_LogLik, df.null, AIC, BIC) %>% 
  #round to three sig figs
  round_df(dig = 3) -> gof_tab

my_datatable(gof_tab)

Anything with pseudo \(R^2\) 0.2-0.4 is considered a very good fit

4.1.4 Main analysis - Tabulate and Plot AMEs

Now look at AMEs. Unnest the values from the model object and then print out the table

sc_mm_mod_1 %>%  
  select(year, marginals) %>%
  unnest(marginals, names_repair = tidyr_legacy) %>% 
  #round to 3 sig fig
  round_df(dig = 3) %>% 
  arrange(year) -> sc_mm_marg_1

#Tidy up labels
prefixes <- c("sex")
sc_mm_marg_1$factor <- prefix_strip(sc_mm_marg_1$factor, prefixes)
sc_mm_marg_1$factor <- prefix_replace(sc_mm_marg_1$factor, "Total_ch",
                                       "BNF chpts: ")
sc_mm_marg_1$factor <- prefix_replace(sc_mm_marg_1$factor, 
                                       "Age_grp", "Age: ")
sc_mm_marg_1$factor <- prefix_replace(sc_mm_marg_1$factor,
                                       "Simd", "SIMD Decile: ", 
                                       toTitle = FALSE)
sc_mm_marg_1 %<>% 
  mutate(year = case_when(!!!year_char),
         p = as.character(p),
         p = if_else(p == "0", "<0.001", p))

my_datatable(sc_mm_marg_1)

Rather than comparing lots of figures over two models it is always easier to visualise these values….

sc_mm_marg_1 %>% 
  ggplot(aes(factor(factor), AME, 
             ymin = lower, ymax = upper,
             colour = factor(year))) +
  geom_hline(yintercept = 0, colour = "#555555") +
  geom_pointrange(fatten = 6, position = position_jitter(width = 0.3)) +
  coord_flip() +
  scale_colour_ptol(labels = year_labels) +
  scale_y_continuous(limits = c(-0.01, 0.55),
                     breaks = scales::pretty_breaks(n = 8),
                     labels = scales::percent_format(accuracy = 1)) +
  theme(legend.position = "top",
        plot.title = element_text(hjust = 0.5)) +
  labs(x = "",
       y= "Average Marginal Effect", 
       colour = "",
       title = "Influence of variables on receiving social care",
       caption = "Ref group SIMD: Decile 10\nRef group BNF Chpts: 0\nRef group Age: 65-69") -> sc_mm_bnf_AME

sc_mm_bnf_AME

4.2. Sub-analysis 1

The first sub-analysis is a repeat of the main reported model but replaces the BNF chapter count with the variable showing repeat medicine groups.

4.2.1 Sub-analysis 1 - Model fit

The code is virtually identical but printed here for clarity

df %<>% 
  #keep only required variables for the model
  select(index, year, scs_flag, sex, age_grp, simd, meds_grp) 
df$simd <- fct_rev(df$simd) #Reverse order of SIMD so Decile 10 is reference
start <- Sys.time()

#Create a function for the model
sc_model <- function(my_df){
  glm(scs_flag ~                   #The formula for the model
        sex*age_grp +              #Interaction: Sex and age group
        simd*meds_grp +            #Interaction: SIMD and repeat medicine group
        age_grp*meds_grp,          #Interaction: Age group and repeat medicines
      family = binomial(),         #Logistic regression
      data = my_df)                #identify which data to use
  }
 

sc_mm_mod_2 <- 
  df %>%  #Main dataframe
  group_by(year) %>% 
  #Nest data for each year
  nest() %>% 
  #Create model column using above function
  mutate(mod_var = map(data, sc_model),
         #Tidy coefficients using broom::tidy into a column
         tidy_var = map(mod_var, tidy), 
         #compute AMEs and add as a column
         marginals = map2(mod_var, data, ~margins_summary(.x, data = .y))) 

print(Sys.time()-start)

This took 5 hrs. Much shorter than the previous model as there are four, as opposed to 7, values in the medicine variable which exponentially reduces the calculations for interaction terms.

4.2.2 Sub-analysis 1. Tabulate Odds ratios

Again, for completeness…

sc_mm_mod_2 %>% 
  select(year, tidy_var) %>% 
  unnest(tidy_var, names_repair = tidyr_legacy) %>% 
  #exponentiate the coefficents for OR
  mutate(odds_ratio = exp(estimate),  
         #Calculate Lower CI and exponentiate
         conf_int_low = exp(estimate - 1.96*std.error),
         #Calculate Upper CI and exponentiate
         conf_int_high = exp(estimate + 1.96*std.error)) %>% 
  round_df(., dig = 3) %>% 
  mutate(p.value = as.character(p.value),
         p.value = if_else(p.value == "0", "<0.001", p.value)) -> tidy_tab_2

#Tidy up labels
prefixes <- c("sex", "scs_flag")
tidy_tab_2$term <- prefix_strip(tidy_tab_2$term, prefixes)
tidy_tab_2$term <- prefix_replace(tidy_tab_2$term, "Total_ch",
                                       "BNF chpts: ")
tidy_tab_2$term <- prefix_replace(tidy_tab_2$term, 
                                       "Age_grp", "Age: ")
tidy_tab_2$term <- prefix_replace(tidy_tab_2$term,
                                       "Simd", "SIMD Decile: ", 
                                       toTitle = FALSE)
tidy_tab_2 %<>% 
  mutate(year = case_when(!!!year_char))

my_datatable(tidy_tab_2)

4.2.3 - Sub-analysis 1. Goodness-of-fit

Code is the same as used for the main model. Outputting results only here.

my_datatable(gof_tab_2)

Anything with pseudo \(R^2\) 0.2-0.4 is considered a very good fit

4.2.4 - Sub-analysis 1. Tabulate and plot AMEs

Sub-analysis 1 AMEs

my_datatable(sc_mm_marg_2)

And associated plot.

sub_an_1_plot

4.3 Sub-analysis 2

This second analysis applies the main model (using BNF chapters as the measure of multimorbidity) but includes all available observations i.e. those living in care homes are included.

4.3.1 Fit model

First of all, code to run the model

#Using df_all object created earlier
df_all %<>% 
  mutate(year = ymd(year)) %>% 
  #keep only required variables for the model
  select(index, year, scs_flag, sex, age_grp, simd, total_ch) 
df_all$simd <- fct_rev(df_all$simd) #Reverse order of SIMD so Decile 10 is reference
start <- Sys.time()

#Create a function for the model
sc_model <- function(my_df){
  glm(scs_flag ~                 #The formula for the model
        sex*age_grp +            #Interaction: Sex and age group       
        simd*total_ch +          #Interaction: SIMD and total BNF chapters
        age_grp*total_ch,        #Interaction: Age group and BNF chapters
      family = binomial(),       #Logistic regression
      data = my_df)              #identify which data to use
  }
 

sc_mm_mod_3 <- 
  df_all %>%  #Main dataframe
  group_by(year) %>% 
  #Nest data for each year
  nest() %>% 
  #Create model column using above function
  mutate(mod_var = map(data, sc_model),
         #Tidy coefficients using broom::tidy into a column
         tidy_var = map(mod_var, tidy), 
         #compute AMEs and add as a column
         marginals = map2(mod_var, data, ~margins_summary(.x, data = .y))) 

print(Sys.time()-start)

12 hrs to complete this model

4.3.2 Sub-analysis 2. Tabulate Odds ratios

Output the table of odds-ratios.

my_datatable(tidy_tab_3)

4.3.3 Sub-analysis 2 Tabulate Goodness-of-fit

Print out table with goodness-of-fit

my_datatable(gof_tab_3)

Anything with pseudo \(R^2\) 0.2-0.4 is considered a very good fit.

4.3.4 Sub-analysis 2. Tabulate and plot AMEs

Print out table of marginal effects

my_datatable(sc_mm_marg_3)
sc_mm_suban_3

4.4 Sub analysis 3

The final sub-analysis runs the main model again, but with a subset where those in a care home are included, but those that died during the financial year on question are ommitted.

4.4.1 Fit model

Model code…

#Again using df_all object
df_all %<>% 
  #Drop all those that died
  filter(is.na(dod)) %>% 
  mutate(year = ymd(year)) %>% 
  #keep only required variables for the model
  select(index, year, scs_flag, sex, age_grp, simd, total_ch) 
df_all$simd <- fct_rev(df_all$simd) #Reverse order of SIMD so Decile 10 is reference
start <- Sys.time()

#Create a function for the model
sc_model <- function(my_df){
  glm(scs_flag ~                 #The formula for the model
        sex*age_grp +            #Interaction: Sex and age group       
        simd*total_ch +          #Interaction: SIMD and total BNF chapters
        age_grp*total_ch,        #Interaction: Age group and BNF chapters
      family = binomial(),       #Logistic regression
      data = my_df)              #identify which data to use
  }
 

sc_mm_mod_4 <- 
  df_all %>%  #Main dataframe
  group_by(year) %>% 
  #Nest data for each year
  nest() %>% 
  #Create model column using above function
  mutate(mod_var = map(data, sc_model),
         #Tidy coefficients using broom::tidy into a column
         tidy_var = map(mod_var, tidy), 
         #compute AMEs and add as a column
         marginals = map2(mod_var, data, ~margins_summary(.x, data = .y))) 

print(Sys.time()-start)

11 hrs to run this model

4.4.2 Sub-analysis 3 Tabulate Odds Ratios

Print out table with Odds-ratios.

my_datatable(tidy_tab_4)

4.4.3 Sub-analysis 3 Tabulate Goodness-of-fit

Print out goodness of-fit table for sub_analysis 3

my_datatable(gof_tab_4)

Anything with pseudo \(R^2\) 0.2-0.4 is considered a very good fit.

4.4.4 Sub-analysis 3 Tabulate and plot AMEs

Print out table of marginal effects for sub-analysis 3

my_datatable(sc_mm_marg_4)
sc_mm_suban_4

5. Session Information

devtools::session_info()
## - Session info ---------------------------------------------------------------
##  setting  value                       
##  version  R version 3.6.1 (2019-07-05)
##  os       Windows Server 2012 R2 x64  
##  system   x86_64, mingw32             
##  ui       RTerm                       
##  language (EN)                        
##  collate  English_United Kingdom.1252 
##  ctype    English_United Kingdom.1252 
##  tz       Europe/London               
##  date     2020-03-18                  
## 
## - Packages -------------------------------------------------------------------
##  package      * version  date       lib source        
##  acepack        1.4.1    2016-10-29 [1] CRAN (R 3.6.1)
##  assertthat     0.2.1    2019-03-21 [1] CRAN (R 3.6.1)
##  backports      1.1.5    2019-10-02 [1] CRAN (R 3.6.1)
##  base64enc      0.1-3    2015-07-28 [1] CRAN (R 3.6.0)
##  boot           1.3-22   2019-04-02 [2] CRAN (R 3.6.1)
##  broom        * 0.5.2    2019-04-07 [1] CRAN (R 3.6.1)
##  callr          3.3.2    2019-09-22 [1] CRAN (R 3.6.1)
##  cellranger     1.1.0    2016-07-27 [1] CRAN (R 3.6.1)
##  checkmate      1.9.4    2019-07-04 [1] CRAN (R 3.6.1)
##  cli            1.1.0    2019-03-19 [1] CRAN (R 3.6.1)
##  cluster        2.1.0    2019-06-19 [2] CRAN (R 3.6.1)
##  colorspace     1.4-1    2019-03-18 [1] CRAN (R 3.6.1)
##  cowplot      * 1.0.0    2019-07-11 [1] CRAN (R 3.6.1)
##  crayon         1.3.4    2017-09-16 [1] CRAN (R 3.6.1)
##  crosstalk      1.0.0    2016-12-21 [1] CRAN (R 3.6.1)
##  data.table     1.12.6   2019-10-18 [1] CRAN (R 3.6.1)
##  desc           1.2.0    2018-05-01 [1] CRAN (R 3.6.1)
##  devtools       2.2.1    2019-09-24 [1] CRAN (R 3.6.1)
##  digest         0.6.22   2019-10-21 [1] CRAN (R 3.6.1)
##  dplyr        * 0.8.3    2019-07-04 [1] CRAN (R 3.6.1)
##  DT           * 0.10     2019-11-12 [1] CRAN (R 3.6.1)
##  ellipsis       0.3.0    2019-09-20 [1] CRAN (R 3.6.1)
##  evaluate       0.14     2019-05-28 [1] CRAN (R 3.6.1)
##  fastmap        1.0.1    2019-10-08 [1] CRAN (R 3.6.1)
##  finalfit     * 0.9.5    2019-09-10 [1] CRAN (R 3.6.1)
##  forcats      * 0.4.0    2019-02-17 [1] CRAN (R 3.6.1)
##  foreign        0.8-71   2018-07-20 [2] CRAN (R 3.6.1)
##  Formula        1.2-3    2018-05-03 [1] CRAN (R 3.6.0)
##  fs             1.3.1    2019-05-06 [1] CRAN (R 3.6.1)
##  generics       0.0.2    2018-11-29 [1] CRAN (R 3.6.1)
##  ggplot2      * 3.2.1    2019-08-10 [1] CRAN (R 3.6.1)
##  ggrepel      * 0.8.1    2019-05-07 [1] CRAN (R 3.6.1)
##  ggthemes     * 4.2.0    2019-05-13 [1] CRAN (R 3.6.1)
##  glue           1.3.1    2019-03-12 [1] CRAN (R 3.6.1)
##  gridExtra    * 2.3      2017-09-09 [1] CRAN (R 3.6.1)
##  gtable         0.3.0    2019-03-25 [1] CRAN (R 3.6.1)
##  haven          2.1.1    2019-07-04 [1] CRAN (R 3.6.1)
##  Hmisc          4.3-0    2019-11-07 [1] CRAN (R 3.6.1)
##  hms            0.5.2    2019-10-30 [1] CRAN (R 3.6.1)
##  htmlTable      1.13.2   2019-09-22 [1] CRAN (R 3.6.1)
##  htmltools      0.4.0    2019-10-04 [1] CRAN (R 3.6.1)
##  htmlwidgets    1.5.1    2019-10-08 [1] CRAN (R 3.6.1)
##  httpuv         1.5.2    2019-09-11 [1] CRAN (R 3.6.1)
##  httr           1.4.1    2019-08-05 [1] CRAN (R 3.6.1)
##  jomo           2.6-10   2019-10-22 [1] CRAN (R 3.6.1)
##  jsonlite       1.6      2018-12-07 [1] CRAN (R 3.6.1)
##  knitr          1.26     2019-11-12 [1] CRAN (R 3.6.1)
##  labeling       0.3      2014-08-23 [1] CRAN (R 3.6.0)
##  later          1.0.0    2019-10-04 [1] CRAN (R 3.6.1)
##  lattice        0.20-38  2018-11-04 [2] CRAN (R 3.6.1)
##  latticeExtra   0.6-28   2016-02-09 [1] CRAN (R 3.6.1)
##  lazyeval       0.2.2    2019-03-15 [1] CRAN (R 3.6.1)
##  lifecycle      0.1.0    2019-08-01 [1] CRAN (R 3.6.1)
##  lme4           1.1-21   2019-03-05 [1] CRAN (R 3.6.1)
##  lubridate    * 1.7.4    2018-04-11 [1] CRAN (R 3.6.1)
##  magrittr       1.5      2014-11-22 [1] CRAN (R 3.6.1)
##  margins      * 0.3.23   2018-05-22 [1] CRAN (R 3.6.1)
##  MASS           7.3-51.4 2019-03-31 [2] CRAN (R 3.6.1)
##  Matrix         1.2-17   2019-03-22 [2] CRAN (R 3.6.1)
##  memoise        1.1.0    2017-04-21 [1] CRAN (R 3.6.1)
##  mice           3.6.0    2019-07-10 [1] CRAN (R 3.6.1)
##  mime           0.7      2019-06-11 [1] CRAN (R 3.6.0)
##  minqa          1.2.4    2014-10-09 [1] CRAN (R 3.6.1)
##  mitml          0.3-7    2019-01-07 [1] CRAN (R 3.6.1)
##  modelr         0.1.5    2019-08-08 [1] CRAN (R 3.6.1)
##  munsell        0.5.0    2018-06-12 [1] CRAN (R 3.6.1)
##  nlme           3.1-140  2019-05-12 [2] CRAN (R 3.6.1)
##  nloptr         1.2.1    2018-10-03 [1] CRAN (R 3.6.1)
##  nnet           7.3-12   2016-02-02 [2] CRAN (R 3.6.1)
##  pan            1.6      2018-06-29 [1] CRAN (R 3.6.1)
##  pillar         1.4.2    2019-06-29 [1] CRAN (R 3.6.1)
##  pkgbuild       1.0.6    2019-10-09 [1] CRAN (R 3.6.1)
##  pkgconfig      2.0.3    2019-09-22 [1] CRAN (R 3.6.1)
##  pkgload        1.0.2    2018-10-29 [1] CRAN (R 3.6.1)
##  plyr           1.8.4    2016-06-08 [1] CRAN (R 3.6.1)
##  prediction     0.3.14   2019-06-17 [1] CRAN (R 3.6.1)
##  prettyunits    1.0.2    2015-07-13 [1] CRAN (R 3.6.1)
##  processx       3.4.1    2019-07-18 [1] CRAN (R 3.6.1)
##  promises       1.1.0    2019-10-04 [1] CRAN (R 3.6.1)
##  ps             1.3.0    2018-12-21 [1] CRAN (R 3.6.1)
##  purrr        * 0.3.3    2019-10-18 [1] CRAN (R 3.6.1)
##  R6             2.4.1    2019-11-12 [1] CRAN (R 3.6.1)
##  RColorBrewer   1.1-2    2014-12-07 [1] CRAN (R 3.6.0)
##  Rcpp           1.0.2    2019-07-25 [1] CRAN (R 3.6.1)
##  readr        * 1.3.1    2018-12-21 [1] CRAN (R 3.6.1)
##  readxl         1.3.1    2019-03-13 [1] CRAN (R 3.6.1)
##  remotes        2.1.0    2019-06-24 [1] CRAN (R 3.6.1)
##  rlang        * 0.4.1    2019-10-24 [1] CRAN (R 3.6.1)
##  rmarkdown      1.17     2019-11-13 [1] CRAN (R 3.6.1)
##  rpart          4.1-15   2019-04-12 [2] CRAN (R 3.6.1)
##  rprojroot      1.3-2    2018-01-03 [1] CRAN (R 3.6.1)
##  rstudioapi     0.10     2019-03-19 [1] CRAN (R 3.6.1)
##  rvest          0.3.5    2019-11-08 [1] CRAN (R 3.6.1)
##  scales         1.0.0    2018-08-09 [1] CRAN (R 3.6.1)
##  sessioninfo    1.1.1    2018-11-05 [1] CRAN (R 3.6.1)
##  shiny          1.4.0    2019-10-10 [1] CRAN (R 3.6.1)
##  stringi        1.4.3    2019-03-12 [1] CRAN (R 3.6.0)
##  stringr      * 1.4.0    2019-02-10 [1] CRAN (R 3.6.1)
##  survival       2.44-1.1 2019-04-01 [2] CRAN (R 3.6.1)
##  testthat       2.3.0    2019-11-05 [1] CRAN (R 3.6.1)
##  tibble       * 2.1.3    2019-06-06 [1] CRAN (R 3.6.1)
##  tidyr        * 1.0.0    2019-09-11 [1] CRAN (R 3.6.1)
##  tidyselect     0.2.5    2018-10-11 [1] CRAN (R 3.6.1)
##  tidyverse    * 1.2.1    2017-11-14 [1] CRAN (R 3.6.1)
##  usethis        1.5.1    2019-07-04 [1] CRAN (R 3.6.1)
##  vctrs          0.2.0    2019-07-05 [1] CRAN (R 3.6.1)
##  withr          2.1.2    2018-03-15 [1] CRAN (R 3.6.1)
##  xfun           0.11     2019-11-12 [1] CRAN (R 3.6.1)
##  xml2           1.2.2    2019-08-09 [1] CRAN (R 3.6.1)
##  xtable         1.8-4    2019-04-21 [1] CRAN (R 3.6.1)
##  yaml           2.2.0    2018-07-25 [1] CRAN (R 3.6.0)
##  zeallot        0.1.0    2018-01-28 [1] CRAN (R 3.6.1)
## 
## [1] C:/Users/dhenderson_1617-0304/Documents/R/win-library/3.6
## [2] C:/Program Files/R/R-3.6.1/library