Blog Post 16 May 2019

Home Analytics: building a unique dataset

Energy Saving Trust’s Home Analytics service pulls together data on residential properties across Great Britain. It combines energy efficiency metrics with the full range of property attributes, geographical factors (such as region or rurality) and socio-demographic information (such as tenure and fuel poverty).

Combining this wide range of variables produces a unique dataset of homes throughout Great Britain, with detail available down to the level of individual addresses. This information can help local authorities and businesses target intervention strategies, growth plans, products and energy efficiency initiatives.

Our blog on targeting energy efficiency interventions gives a broader overview of the service. Read on to find out how we create the dataset.

How does Home Analytics work?

Most Home Analytics variables are produced using statistical modelling techniques. Statistical modelling means mapping the available data, writing formulas (the statistical models) that describe the associations within the data, and then using those formulas to fill in the gaps where data is currently unavailable.

We combine two types of data: actual values – where we have concrete data to use – and modelled values, where we use formulas to produce the data.
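To make the idea of "writing formulas, then filling gaps" concrete, here is a minimal sketch, not Energy Saving Trust's actual model. It fits a simple straight-line formula (ordinary least squares) on properties where a value is known, then uses that formula to estimate the value where it is missing, tagging each result as actual or modelled. The variables (number of rooms predicting floor area) and the function names are hypothetical, chosen only for illustration.

```python
# Illustrative sketch only: fit y = a + b * x on records with actual values,
# then use the fitted formula to fill in records where y is missing.
# "rooms" and "floor area" are hypothetical example variables.
from typing import Optional


def fit_line(xs: list[float], ys: list[float]) -> tuple[float, float]:
    """Ordinary least-squares fit of y = a + b * x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def fill_gaps(rooms: list[float], area: list[Optional[float]]) -> list[tuple[float, str]]:
    """Return (value, source) pairs, where source is 'actual' or 'modelled'."""
    known = [(r, a) for r, a in zip(rooms, area) if a is not None]
    a0, b1 = fit_line([r for r, _ in known], [a for _, a in known])
    result = []
    for r, a in zip(rooms, area):
        if a is not None:
            result.append((a, "actual"))  # keep the observed value
        else:
            result.append((a0 + b1 * r, "modelled"))  # estimate from the formula
    return result
```

For example, `fill_gaps([2, 3, 4, 5, 4], [50.0, 70.0, 90.0, None, None])` keeps the three observed areas and estimates the two missing ones from the fitted line. A real model would use many predictors and a more suitable method; the point here is only the split between actual and modelled values.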

We get the actual values from a variety of sources, such as:

  • Energy Performance Certificates (EPCs)
  • Home Energy Efficiency Database (HEED)
  • various national datasets (such as the English Housing Survey, Census data or Index of Multiple Deprivation data)
  • proprietary Energy Saving Trust data from the Home Energy Check tool
  • geospatial data from Ordnance Survey

The problem is that all these datasets list their records in different formats, which means they cannot easily be combined.
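As an illustration of why differing formats matter, the sketch below normalises free-text addresses into a common key and looks that key up in a reference table of Unique Property Reference Numbers. It is a simplified assumption of how exact-key matching could work, not the service's matching process: the abbreviation list, the reference table and the UPRN value are all hypothetical, and real address matching also needs fuzzy matching and far richer rules (for instance, "St" can mean "Saint" as well as "Street").

```python
# Illustrative sketch only: normalise addresses, then match them on an exact key.
# ABBREVIATIONS and any reference table passed in are hypothetical examples.
import re
from typing import Optional

ABBREVIATIONS = {"rd": "road", "st": "street", "ave": "avenue", "flt": "flat"}


def normalise(address: str) -> str:
    """Lower-case, replace punctuation with spaces, expand abbreviations."""
    text = re.sub(r"[^a-z0-9 ]", " ", address.lower())
    words = [ABBREVIATIONS.get(w, w) for w in text.split()]
    return " ".join(words)  # split/join also collapses repeated spaces


def match_uprn(address: str, reference: dict[str, int]) -> Optional[int]:
    """Return the UPRN whose normalised address equals this one, or None."""
    return reference.get(normalise(address))


# Hypothetical reference table keyed by normalised address.
reference = {normalise("12 High Street, AB1 2CD"): 100000000001}
```

With this table, `match_uprn("12, High St. AB1 2CD", reference)` returns the hypothetical UPRN because both spellings normalise to `"12 high street ab1 2cd"`, while an address with no entry returns `None` and would need further matching.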

The first step towards making sense of this messy data is address matching: linking each record to its correct Unique Property Reference Number (UPRN). After address matching, we can move on to cleaning up the data. This involves:

  • bringing the records up to date and ensuring we use the most recent data
  • cross-referencing with other datasets
  • correcting for skews and other errors in the raw data by applying different logical assumptions
  • sorting data into useful categories. For example, we’ll condense the dozens of wall description categories from the EPC database into four significant categories
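Two of the cleaning steps above can be sketched in a few lines: keeping only the most recent record for each UPRN, and condensing free-text wall descriptions into a small set of categories. This is a minimal sketch under stated assumptions. The post does not name its four wall categories, so the names used here ("cavity", "solid", "timber frame", "other") and the keyword rules are hypothetical.

```python
# Illustrative sketch only: two cleaning steps on EPC-style records.
# The four category names and the keyword rules are hypothetical.
from datetime import date

# (keyword to look for, category it maps to), checked in order.
WALL_RULES = [
    ("cavity", "cavity"),
    ("timber frame", "timber frame"),
    ("solid brick", "solid"),
    ("sandstone", "solid"),
    ("granite", "solid"),
]


def latest_per_uprn(records: list[dict]) -> dict[int, dict]:
    """Keep the record with the latest date for each UPRN."""
    latest: dict[int, dict] = {}
    for rec in records:
        current = latest.get(rec["uprn"])
        if current is None or rec["date"] > current["date"]:
            latest[rec["uprn"]] = rec
    return latest


def wall_category(description: str) -> str:
    """Map a free-text wall description to one of four broad categories."""
    text = description.lower()
    for keyword, category in WALL_RULES:
        if keyword in text:
            return category
    return "other"  # anything unmatched falls into a catch-all category


records = [
    {"uprn": 1, "date": date(2015, 3, 1), "walls": "Solid brick, as built, no insulation"},
    {"uprn": 1, "date": date(2018, 6, 1), "walls": "Solid brick, with internal insulation"},
    {"uprn": 2, "date": date(2016, 1, 1), "walls": "Cavity wall, filled cavity"},
]
```

Here `latest_per_uprn(records)` keeps only the 2018 record for UPRN 1, and `wall_category` maps each remaining description to a broad group. Ordering the rules matters: the first keyword found wins.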

The raw data from the available records typically covers about 45% of the housing stock. We then use this data to model the remaining properties, for which no data is currently available, to provide a more complete overview.

Last but not least, we compare the final distribution of each variable (such as tenure or fuel type) with regional or national datasets and calibrate the data where necessary. For example, we use the national Census to check our modelling of property tenure, i.e. whether a home is a social rental, private rental, privately owned and so on, in order to eliminate possible biases.
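One common way to calibrate a modelled distribution against a benchmark is post-stratification weighting: each category gets a weight equal to its target share divided by its modelled share. The sketch below uses that technique purely as an illustration; the post does not say how Home Analytics calibrates, and the tenure labels and target shares are hypothetical.

```python
# Illustrative sketch only: post-stratification weights that make the weighted
# modelled category shares match target shares (e.g. from a census).
# Category labels and target shares are hypothetical.
from collections import Counter


def calibration_weights(modelled: list[str], target_shares: dict[str, float]) -> dict[str, float]:
    """Weight per category = target share / modelled share."""
    counts = Counter(modelled)
    total = len(modelled)
    weights = {}
    for category, target in target_shares.items():
        if counts[category] == 0:
            raise ValueError(f"no modelled records in category {category!r}")
        weights[category] = target / (counts[category] / total)
    return weights


def weighted_shares(modelled: list[str], weights: dict[str, float]) -> dict[str, float]:
    """Share of each category after applying the weights (every category needs a weight)."""
    totals: Counter = Counter()
    for category in modelled:
        totals[category] += weights[category]
    grand_total = sum(totals.values())
    return {c: w / grand_total for c, w in totals.items()}


modelled = ["owner"] * 6 + ["private"] * 2 + ["social"] * 2
target = {"owner": 0.5, "private": 0.25, "social": 0.25}
```

In this toy example owner-occupied homes are over-represented (60% modelled versus a 50% target), so they receive a weight below 1 and the two rental categories receive weights of 1.25. An alternative design is to reassign individual records until the distributions agree, which keeps every record unweighted.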

We use a range of other modelling techniques in addition to our statistical modelling, including:

  • Combining a variety of the dataset’s values to create new variables, for example working out how suitable renewable technologies are for a property based on its location, energy efficiency, size and type
  • Spatial modelling for variables that are too difficult to predict using statistical models. For example, we’ll use geographic information system software and algorithms to analyse a building’s position in relation to other buildings, to understand whether it’s a detached, semi-detached or terraced house
  • Extrapolating aggregate values from larger geographical levels down to the address level (e.g. Census data such as the population in poor health)
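The spatial modelling described above, working out built form from a building's position relative to its neighbours, can be sketched with simplified geometry. This is a hypothetical illustration rather than the GIS algorithm the service uses: footprints are reduced to axis-aligned rectangles `(x0, y0, x1, y1)`, two buildings count as attached if they share an edge of positive length, and the labelling rules are assumptions (one attached neighbour that is itself attached on both sides suggests an end-terrace).

```python
# Illustrative sketch only: classify built form from shared walls between
# footprints simplified to axis-aligned rectangles (x0, y0, x1, y1).
# Real building polygons would need GIS tooling; the rules here are hypothetical.

Rect = tuple[float, float, float, float]


def overlap(a0: float, a1: float, b0: float, b1: float) -> float:
    """Length of overlap between intervals [a0, a1] and [b0, b1]."""
    return max(0.0, min(a1, b1) - max(a0, b0))


def share_wall(a: Rect, b: Rect, tol: float = 1e-6) -> bool:
    """True if rectangles a and b touch along an edge of positive length."""
    ax0, ay0, ax1, ay1 = a
    bx0, by0, bx1, by1 = b
    # Vertical edges meet: one rectangle's right side is the other's left side.
    if (abs(ax1 - bx0) < tol or abs(bx1 - ax0) < tol) and overlap(ay0, ay1, by0, by1) > tol:
        return True
    # Horizontal edges meet: one rectangle's top is the other's bottom.
    if (abs(ay1 - by0) < tol or abs(by1 - ay0) < tol) and overlap(ax0, ax1, bx0, bx1) > tol:
        return True
    return False


def classify_built_form(footprints: list[Rect]) -> list[str]:
    """Label each footprint from the number of neighbours it shares a wall with."""
    n = len(footprints)
    neighbours = [
        [j for j in range(n) if j != i and share_wall(footprints[i], footprints[j])]
        for i in range(n)
    ]
    labels = []
    for i in range(n):
        count = len(neighbours[i])
        if count == 0:
            labels.append("detached")
        elif count >= 2:
            labels.append("mid-terrace")
        elif len(neighbours[neighbours[i][0]]) >= 2:
            labels.append("end-terrace")  # its only neighbour is attached on both sides
        else:
            labels.append("semi-detached")
    return labels


footprints = [
    (0, 0, 10, 8),                                   # standalone building
    (20, 0, 26, 8), (26, 0, 32, 8),                  # a pair sharing one wall
    (40, 0, 45, 8), (45, 0, 50, 8), (50, 0, 55, 8),  # a row of three
]
```

On these footprints, `classify_built_form` labels the standalone building detached, the pair semi-detached, and the row of three as end-terrace, mid-terrace, end-terrace. The pairwise comparison is quadratic in the number of buildings, so a real implementation would use a spatial index.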

By using these modelling techniques, we are able to assemble a dataset containing detailed and varied information on the entire GB housing stock.

Find out more…

Energy Saving Trust’s Home Analytics service is in the unique position of combining this large number of data points to provide targeted insight.