Big data takes flight

By Ian Moore

If just three more students each year choose a course in Science, Technology, Engineering or Maths (STEM) after taking the British Science Association’s (BSA) CREST Silver Award, then the programme delivers more societal benefits than it costs to run. This was the key takeaway from the Pro Bono Economics report “Graduate Earnings and the STEM Premium”, produced on behalf of the BSA and released in June this year. Beyond this headline message, however, the report’s publication offers reason to reflect on the journey Pro Bono Economics has made, and the datasets it relies on, since launching ten years ago.

Having first engaged with the BSA back in 2016, we prepared an analysis for the charity showing that students who took CREST Silver achieved half a grade higher on their best science GCSE result than a statistically matched control group. How did we go from this conclusion to the one at the heart of our latest report?

In the 10 years since Pro Bono Economics was formed, the range of datasets available to economists doing cost-benefit evaluations has expanded hugely, surfacing new insights about the cost-effectiveness of social programmes in education (and other areas). Typically these datasets are drawn directly from government administrative data, and they can be found across health, justice, education and many other spheres.

A decade ago, a trawl of the literature might yield some partial insights from similar programmes, but economists had to work out whether the same benefits would accrue in slightly different contexts, such as a different geography or age group. This was particularly difficult to judge if there had been a shift in policy or economic context. These challenges led to many caveats and/or heroic assumptions.

In 2019, we were able to use the publicly available summary statistics from the Longitudinal Education Outcomes (LEO) dataset to take the analysis further.
With it we can establish the size of the “STEM premium”: how much higher are earnings for those who study STEM at university than for those who study non-STEM subjects? The answer is – a lot! Enough, in fact, to give us the headline above.

Unfortunately, the detailed LEO dataset is not directly accessible – yet. But the Department for Education has confirmed it is working on it. Once access is granted, it will become possible to do a true longitudinal cohort study, with a propensity-matched control group and results covering both educational attainment and subsequent earnings profiles.

Put simply, access to these datasets is producing far better assessments of impact than those available 10 years ago.

But what of the next decade? Instead of answering the question “Does this intervention work?” or “Does it work better?”, we might get closer to answering “Is this intervention more or less cost-effective than other ones?”. With this, we could then start to allocate scarce resources more effectively on the basis of evidence.

As proponents of evidence-based policy making, we at Pro Bono Economics will be taking advantage of the opportunities for enhanced analysis these datasets provide. Of course, we will need economists who can also speak data science to do so. And for government, I’d say – let’s have more data please!

The CREST Silver programme includes around 30 hours of project work and is typically undertaken by pupils at the end of Key Stage 3 or start of Key Stage 4.