This paper outlines an extensive study of applying machine learning to publicly available health and social care data for Scotland, with a focus on learning the most significant variables behind key health care outcomes, such as male life expectancy and premature deaths. It uses the publicly available ScotPHO Profiles data set and trains on the important metrics from the Profiles. The paper analyses 56 routinely available variables reported by local authority region within Scotland, and first uses linear regression to match them to health risks. A forest regression method is then used to find the best-predicting models. Each variable being trained against is modelled using three other variables, which provides 26,235 different models, the number of ways of choosing three of the remaining 55 variables. These models are then assessed for their success using the complete dataset, and the top models are examined for the metrics they use. Finally, a frequency analysis determines which variables appear most often for each of the variables being trained against. The results outline the significant factors that match key health care objectives using a best-match machine learning method. Some variables are gender-specific, for example crime rates for male life expectancy and the claiming of pension credits for female life expectancy. There is a range of success scores for the variables, with many giving a success rate of over 87%. There are also several significant findings, a key one being that obesity at primary school has a strong relationship with deaths among those aged 15-44. In conclusion, the method provides a way of analysing open data and gives new insights into the factors that contribute to health and social care conditions. It provides a ranked listing of the matches of variables to health and social care factors, and an ordered list of the most significant variables. These can be used to better focus population health surveys.
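The model count and the frequency analysis described above can be illustrated with a minimal sketch. It is not the authors' implementation: the variable names (`var_00` to `var_55`), the helper functions, and the stand-in list of "top" models are hypothetical placeholders, and the regression and scoring steps are omitted. It only shows how choosing three predictors from the 55 remaining variables yields 26,235 candidate models for one target, and how predictors could be tallied across a shortlist.

```python
from collections import Counter
from itertools import combinations
from math import comb

N_VARIABLES = 56   # variables analysed in the study
N_PREDICTORS = 3   # each model uses three other variables

# Hypothetical placeholder names; the real ScotPHO metric names are not used here.
variables = [f"var_{i:02d}" for i in range(N_VARIABLES)]


def candidate_models(target, all_variables, k=N_PREDICTORS):
    """Return every k-variable predictor set that excludes the target."""
    others = [v for v in all_variables if v != target]
    return list(combinations(others, k))


def predictor_frequency(top_models):
    """Count how often each predictor appears across a list of models."""
    return Counter(v for model in top_models for v in model)


target = variables[0]
models = candidate_models(target, variables)

# Both values are 26235: C(55, 3) = 55 * 54 * 53 / 6.
print(len(models), comb(N_VARIABLES - 1, N_PREDICTORS))

# Assumption: in a real run the models would be fitted, scored, and sorted
# by success rate. Here an arbitrary slice stands in for the top models.
top_models = models[:5]
freq = predictor_frequency(top_models)
print(freq.most_common(3))
```

In a full pipeline, the slice would be replaced by the highest-scoring models for each target, and the resulting counts would give the ordered list of most frequently used variables.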
Strengths and limitations of this study are: a new methodology, using machine learning, for assessing variables within health and social care and their linkages with gathered health assessment metrics; time-efficient processing for the selection of every possible model for 56 variables; new observations found within variables for health and social care conditions; a scope covering local authority regions in Scotland, which range from highly populated areas, such as cities, to less populated areas; metrics gathered can vary across different countries, such as in England; and short-listing of key variables for health and social care related metrics.
Buchanan, W. J., Smales, A., Lawson, A., & Chute, C. (2019, November). Machine Learning for Health and Social Care Demographics in Scotland. Paper presented at HEALTHINFO 2019, Valencia, Spain.