Data product perspective
In my experience at a small public sector organisation, analysis outputs are largely cobbled together into an email with a paragraph or two of explanation, followed by plenty of back and forth to get the message or caveats well understood. This approach has many disadvantages, from asset confusion (which version has the right changes?) to amnesia on both sides about the original request and context as email exchanges pile change upon change. These experiences are not a standard, but I suspect they happen more often than not, leading to frustration for technical analysts and business stakeholders alike.
Following a series of such interactions and observing others, I realised that core to the frustrations was the poor definition of the deliverable provided to the end user. Figures and tables, while seemingly simple, require explanation and narrative to make sense to a non-technical person, and a short paragraph in an email may not be sufficient for understanding. Furthermore, some analyses are better provided with interactivity so that the user can explore and take what they need.
I also found that we were all developing analyses as a collection of scripts with some instructions to run them. Even when analysts built their analyses as reproducible analytical pipelines, the analyses often remained technically obscure, without a useful output tailored to the end users.
These thoughts coalesced into the idea that an analysis is best developed as a data product. Shaping a piece of work as a product makes it tractable. We can summarise a product by its purpose, audience and value. A data product can be a PDF report, a web output, an interactive dashboard or a web app. It is simply a tangible asset for a piece of work that can be designed and shaped by user needs and preferences.
I’ve chosen to use the generic term “data product” rather than something like “analysis report”. This is because there are many instances where a simple dashboard with interactive charts, or perhaps a web app, is the best data product to package up an analysis. However, the majority of analysis work fits well in a “report” format, especially with the wide flexibility offered by Quarto.
Quarto documents (be they PDF outputs, websites or dashboards) offer the best balance for the technical analyst: the executable document contains code and links to data, useful to the technical person, while the rendered output is designed for the target audience. I’ll go through a brief transport example to highlight the use and value of data products.
Throughout the COVID-19 pandemic and its lingering aftermath there were many questions around maritime resilience and connectivity. For transport policymakers, questions abounded about the nature and type of shipping lanes connecting New Zealand and the Pacific islands to global trading partners. There were specific questions but also a desire for a holistic understanding. Given this need, a web-book was the best data product. The introduction provided the wider view while individual chapters presented detailed analyses on the requested topics. The website was hosted on GitLab Pages and made available to users as a URL.
A web-book data product functions like a reference document, with clear global navigation that lets users jump to the area of most interest or most relevant to their questions. This format suited the work partly because there were several users and several related but distinctly different analyses.
This data product (and others like it) was actually constructed from user stories. The term hails from Agile methodologies but can be used for data analyses without the work management aspects. At their core, user stories underlie the product [1] with no tie to a specific type of product. User stories can become the features of a data product, that is, its individual data analyses. A good user story includes the need (the question) and the context of the user, giving the analyst sufficient information to provide the most useful information and insights.
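As a rough illustration of how user stories might feed a data product, the sketch below records each story with its user, need and context, and only turns a story into a planned feature (for example, a web-book chapter) once all three are present. The class name `UserStory`, the helper `plan_features` and the two example stories are hypothetical and invented for illustration; they are not the records or tooling used for the shipping web-book.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UserStory:
    """A hypothetical record of one analysis request."""
    user: str     # who is asking
    need: str     # the question to answer
    context: str  # why they are asking and how the answer will be used

    def is_complete(self) -> bool:
        # A story is usable only when user, need and context are all filled in.
        return all(part.strip() for part in (self.user, self.need, self.context))


def plan_features(stories):
    """Split stories into planned features and stories needing follow-up."""
    features = []
    follow_up = []
    for story in stories:
        if story.is_complete():
            # One complete story becomes one feature, e.g. a chapter.
            features.append(f"{story.need} (for {story.user})")
        else:
            follow_up.append(story)
    return features, follow_up


# Made-up example stories, for illustration only.
stories = [
    UserStory(
        user="policy advisor",
        need="Which shipping lanes connect the Pacific islands to trading partners?",
        context="Preparing advice on maritime resilience",
    ),
    UserStory(
        user="analyst",
        need="How has the number of port calls changed?",
        context="",  # missing context: go back to the user before building
    ),
]

features, follow_up = plan_features(stories)
```

Holding back stories that lack context is one possible design choice: it pushes the clarifying conversation to the start of the work rather than into a later email chain.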
The best products evolve by iteration, through presentation to and feedback from the end users. Unlike an email chain, where people can get lost finding the artefacts with the changes they requested, a data product, especially a web-based one, can be iterated upon based on feedback, with the improved version available at the same location as the old. For complex projects, cheap data product prototypes built with made-up data can be iterated with the users, so that the full build is only undertaken once the core use and functionality are agreed upon [2].
While every analysis request or project I have worked on in the last couple of years has had an associated data product, the process has been far from smooth, with persistent points of friction. For example, the lack of a general tool for detailed feedback has been particularly problematic.
Since data products are fundamentally technical assets, they are hosted on a version control system (VCS), GitLab in our case. The rendered output (be it a web app or plain HTML) is provided to end users at an accessible URL. These outputs can be explored, and feedback is usually solicited in a face-to-face (in person or virtual) meeting. However, the tools we have chosen don’t have an equivalent of Microsoft’s “Track Changes”. This limits the data products to ones that will be edited by the technical analysts alone. We are unable to collaborate on the text of the data products in a manner that still allows for technical development with the appropriate tools. While the friction of collaboration is galling, the model does work well for most analyses.
To recap, a data product is a tangible asset of data analysis. Depending on use and user, it can be a PDF report, website, web app or interactive dashboard. Work is best done by first collecting the requests as user stories, which are developed as analyses and consolidated into an appropriate data product, remembering to write design documents before coding! The data product is then iterated upon based on user feedback. To avoid search fatigue and asset confusion, data products are best hosted at a centralised location.