Design documents before coding

data science
good practice
new zealand
public sector
Save time and headspace in data science projects
Author

Shrividya Ravi

Published

February 9, 2024

Data analysts write code, often quite a lot of it. We are effectively software developers regardless of our job titles. This need not put any analyst off: a developer’s job is to solve problems rather than to write code [1], and the same is true of data analysts, data scientists, scientists and others. Good practices from seemingly disparate domains can therefore cross-pollinate because they share the same ethos. I’ve already written about improving reproducibility, a key practice among scientists, and suggested that analysts build analyses as reproducible analytical pipelines. I will now do the same for design, borrowing design considerations from software development for data analysis.

For a long time, I considered myself separate from software developers because I didn’t equate the analysis work I did with feature design, development and testing. However, the underlying cycle of design, development and testing is fundamental to solving any problem for users via a product. Even if the product is just for ourselves or a single business stakeholder, a data product perspective can be very beneficial to data analysts.

Before the development work to structure data analysis as a package and build analyses as reproducible analytical pipelines even begins, it is useful to reflect upon the design. Every analysis makes assumptions about the data required, chooses a particular methodology and presents the output in a certain way. All these choices are part of the analysis design. They are a core part of the actual analysis and must be incorporated into the review process of any product that is delivered to a user. A design document facilitates this vital step in a consistent manner.

Design documents describe implementation strategy and design decisions with robust discussion of trade-offs [1]. They start with “why” and “what” questions that elaborate context, requirements and constraints [2]. These questions place the analysis in the frame of the request or need, on which the subsequent decisions around data, method and presentation are predicated.

The core of a design document suggests solutions to the problem before justifying a particular solution [1]. This should be reasonably easy to write as there is always an alternative, even if it is trivial. In my experience, writing about alternatives has challenged the solution I homed in on too quickly. In some cases, the rationale for the preferred solution came down to ease and speed, but in others, features from the alternatives improved it.

Describing the preferred solution should cover most of the components of data science methodology: data, techniques, validation and communication [2]. Depending on preference, alternatives can be listed for each component. However, I have found that more alternatives exist for techniques than for the other components.
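The structure described above can be sketched as a small template. This is only a minimal illustration under my own assumptions: the class names `DesignDoc` and `Solution`, the `COMPONENTS` tuple and the section headings are hypothetical and do not come from the cited references.

```python
from dataclasses import dataclass, field

# The four components named in the text; the tuple itself is an assumption
# about how to encode them, not part of any standard.
COMPONENTS = ("data", "techniques", "validation", "communication")


@dataclass
class Solution:
    """One candidate approach, with a note on its trade-offs."""
    name: str
    trade_offs: str


@dataclass
class DesignDoc:
    """Hypothetical skeleton of a design document for an analysis."""
    title: str
    why: str   # context: the request or need behind the analysis
    what: str  # requirements and constraints
    preferred: Solution
    alternatives: list[Solution] = field(default_factory=list)
    # Description of the preferred solution, keyed by component name.
    components: dict[str, str] = field(default_factory=dict)

    def missing_components(self) -> list[str]:
        """Return the components that have not been described yet."""
        return [c for c in COMPONENTS if not self.components.get(c)]

    def render(self) -> str:
        """Render the document as plain text, one section per heading."""
        lines = [self.title, "", "Why", self.why, "", "What", self.what, ""]
        lines += [
            "Preferred solution",
            f"{self.preferred.name}: {self.preferred.trade_offs}",
            "",
            "Alternatives considered",
        ]
        for alt in self.alternatives:
            lines.append(f"- {alt.name}: {alt.trade_offs}")
        lines.append("")
        for name in COMPONENTS:
            # Unwritten components are marked so reviewers can spot the gaps.
            lines += [name.capitalize(), self.components.get(name, "TODO"), ""]
        return "\n".join(lines)


if __name__ == "__main__":
    # Entirely made-up example content, for illustration only.
    doc = DesignDoc(
        title="Forecasting service demand",
        why="Planning team needs quarterly demand estimates.",
        what="Estimates by region, refreshed each quarter.",
        preferred=Solution("Seasonal regression", "simple and quick to explain"),
        alternatives=[Solution("Last year's value", "trivial baseline")],
        components={"data": "Quarterly counts by region"},
    )
    print(doc.render())
    print("Still to write:", doc.missing_components())
```

Listing the unwritten components explicitly is one way to make gaps visible to reviewers before any analysis code exists.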

In an organisation or data team where projects are pursued based on priority, design documents can be instrumental in proving value before building the data product [3].

However, the greatest value is that design documents augment organisational knowledge [1]. This applies as much to a large organisation, where similar skills are spread across multiple teams, as to a small organisation with a mix of experience.

Capturing the design considerations and discussion in a document is a boon for future team members who might find themselves needing to work with code that was designed and written by someone long gone. The design document is a benevolent “Ghost of Analyst Past” gently informing and guiding the new analyst.

The threshold at which an analysis is too simple for a design phase can be difficult to set. A simple rubric is to write a design document for complex and ambiguous problems [1]. For simpler problems, I have found it sufficient to detail the methodology to be implemented in the description of the Merge Request (GitLab’s equivalent of the Pull Request).

Design documents exemplify the valuable perspective that writing documents is expensive in the short term but cheap in the long run [2]. Time and effort are expended in writing and reviewing the document, but this effort pays off over the life of the project. Consensus is set upfront and kinks in the approach are ironed out at the outset, allowing the project to progress smoothly.

Credit

Photo by Firmbee.com on Unsplash

References

[1]
M. Ubl, “Design Docs at Google,” Jul. 06, 2020. https://www.industrialempathy.com/posts/design-docs-at-google/ (accessed Jan. 15, 2024).
[2]
E. Yan, “How to Write Design Docs for Machine Learning Systems,” Mar. 07, 2021. https://eugeneyan.com/writing/ml-design-docs/ (accessed Jan. 15, 2024).
[3]
A. Viana, “Built It Once & Build It Right: Prototyping for Data Teams.” https://www.getdbt.com/coalesce-2021/prototyping-for-data-teams/ (accessed Oct. 21, 2023).