RAPping in the public sector (long version)

Published

July 27, 2021

Straggling legacy scripts (often SAS) requiring manual steps are a common ‘feature’ of data analysis in the public sector. However, refactoring such scripts to modern, reproducible analytical pipelines (RAP) can be challenging due to a lack of IT infrastructure or high complexity. In such situations, interim solutions can at least reduce manual effort and mental overhead.

The first part of the talk will present one type of "interim" solution: using Python as a glue to create one-click execution pipelines. Manual tasks like downloading data from email, updating new data file names in scripts, running scripts in sequence etc. can be managed with Python and its rich ecosystem of packages. In this talk, I will showcase how three Python packages: exchangelib, jupyter and saspy - can create quick and easy automated versions of legacy SAS code that contain many types of manual steps.

The second part of the talk will briefly describe how the interim solution can be upgraded for longer term use with docker, pipeline orchestration tools and, writing tidy "online" documentation.

Slides