Don’t Underestimate Durability along with Scalability in Analytics Practice

One of the things our team has been working on over the past few years is improving the durability of our work. People often talk about the importance of scalability, and rightly so: scale of data and customer reach is the most obvious source of both opportunity and impact. While the benefits of scalability are generally understood, I see durability as a related quality that is less understood but even more fundamental. It has benefits of its own and is often a prerequisite to scalability.

I get interesting answers – or just confused looks – when I ask job candidates how they build durability and scalability into their work, and this often helps me separate the newbies from the more experienced analysts. That’s another reason I thought it would be worth discussing.

First, let’s get clear on how we’re using these terms: what are we actually talking about when we say a project is ‘scalable’ or ‘durable’?

The definition of scalable, according to Merriam-Webster:

scalable - capable of being easily expanded or upgraded on demand.

Ok, we get that. Now, the definition of durable, also according to Merriam-Webster:

durable - able to exist for a long time without significant deterioration in quality or value.

I like to think of it as boiling down to two dimensions:

  • durability = time
  • scalability = size

The idea here is that we should think not only about scale but also about durability when we design, structure and execute our analytics work. This creates additional opportunities for impact and pays dividends over time, especially for projects where scale is not much of a factor – which is often the case in analytics work:

  • smaller projects that may be repeated in future, grow with follow-up requests, or need to be reproduced to confirm results.
  • projects that may benefit from collaboration or hand-off to other team members.
  • work that can be re-used or repurposed down the road to answer similar questions.
  • work that helps to build a knowledge repository within the group to expand upon (rather than having to start from scratch).
  • work that supports automated processes.
  • work that may form the MVP for a larger-scale project undertaken further down the road.

The end result of incorporating durability into your analytics practice will include:

  • Quicker turn-around time: using accessible, repeatable processes.
  • Better standardization and consistency of deliverables: leveraging established structures, less need to re-think/design from scratch.
  • Increased reliability of results, quality control: better reproducibility, less opportunity for manual error, stray formulas/references, over-written values.
  • Improved clarity of communication: ease of understanding of your analytical logic by teammates, partners and stakeholders.
  • Easier collaboration within the team: smoother transfer of work from one colleague to another, combination of work from different teammates, opportunities to build on each other’s work.
  • Progressive accumulation of capabilities and impact over time: building an expanding practice, less running in circles.

Ok, so how do we get there? The most obvious and critical step to increasing durability in your work:

  • ditch those spreadsheets for R or other statistical processing software of your choosing!

Plenty has been written and said about the advantages of R over spreadsheets, so I’m not going to get into that here, other than to focus on durability as one specific and significant advantage.

Even if we have R skills, spreadsheets can be sooo tempting: we can quickly throw some data together, do some manual manipulation if necessary, create some pivot tables, build some easy charts or attractive tables, send an email to stakeholders, and move on. When we get the follow-up question that requires a change of parameters or additional data, or need to walk a teammate through it, or get a similar question 3 months later… that’s where things fall apart. Yes, spreadsheets have their place, but these and other scenarios demonstrate the point: with spreadsheets, you will never achieve as much durability as you can with R – and that durability has significant value.

Once you commit to moving your practice to R, it opens up all kinds of opportunities for increased durability:

  • Organizing your code in a logical, easy-to-follow flow that can be understood and adapted over time.
  • Using reusable methods that can be easily modified as needed, such as using variables wherever possible rather than hard-coding values.
  • Commenting your code for additional understanding of ‘why’ certain choices were made.
  • Using version control (GitHub, GitLab, etc.) for even small, individual projects to build up a coherent, accessible repository.
  • Using RMarkdown to integrate reporting and analysis with data processing, for better logical flow, easier updating and more transparency.
  • Extending the workflow to integrated slide decks, flexdashboards, Shiny, etc.
  • Providing complete source info within any end products, so origin can be tracked back by anyone.
  • Sharing links to the end product stored in a standard location/format, rather than dropping tables or charts into email.

Of course, there are other practices along these lines, but you get the idea – the important thing is the ‘durability’ mindset.
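To make a couple of those practices concrete, here is a minimal sketch in base R of keeping parameters in variables rather than hard-coding them, with comments that explain the ‘why’. The `sales` data, the column names and the `min_revenue` threshold are all made up for illustration; they are assumptions, not anything from a real project.

```r
# Hypothetical example data: monthly revenue by region (invented for illustration).
sales <- data.frame(
  region  = c("East", "East", "West", "West", "North", "North"),
  month   = as.Date(c("2023-01-01", "2023-02-01", "2023-01-01",
                      "2023-02-01", "2023-01-01", "2023-02-01")),
  revenue = c(120, 135, 90, 110, 60, 75)
)

# Parameters live in one place at the top, so a follow-up request
# ("same thing, but only February") means changing one line, not hunting
# through the analysis for literal values.
start_date  <- as.Date("2023-01-01")
end_date    <- as.Date("2023-02-01")
min_revenue <- 150  # why: assumed cut-off for reporting a region separately

# Filter and summarise using the parameters, never literal values.
in_period <- sales[sales$month >= start_date & sales$month <= end_date, ]
by_region <- aggregate(revenue ~ region, data = in_period, FUN = sum)
major     <- by_region[by_region$revenue >= min_revenue, ]

print(major)
```

When the question comes back three months later with a different period or threshold, only the parameter block needs to change, and the comment still tells whoever picks it up why the cut-off is there.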

Pretty much all of the above supports and contributes to scalability as well. On this foundation, it becomes easy to add scalability features, such as:

  • writing custom functions to avoid duplicating code.
  • combining components from smaller projects to meet the demands of larger ones, since the components are well organized and documented.
  • building on pre-existing components and incorporating into machine learning models and workflows.
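As a rough illustration of the first point, here is one way a small custom function can replace copy-pasted summary code. This is only a sketch in base R: `summarise_by()` and the `orders` data are hypothetical names invented for this example, not part of any package.

```r
# Hypothetical helper: summarise any numeric column by any grouping column,
# so the same logic is written once and reused across projects.
summarise_by <- function(data, value_col, group_col, fun = mean) {
  # Fail early with a clear error if the inputs are not what we expect.
  stopifnot(is.data.frame(data),
            value_col %in% names(data),
            group_col %in% names(data))
  result <- aggregate(data[[value_col]],
                      by = list(data[[group_col]]),
                      FUN = fun)
  # aggregate() returns generic column names; restore the meaningful ones.
  names(result) <- c(group_col, value_col)
  result
}

# Made-up data for illustration only.
orders <- data.frame(
  region  = c("East", "West", "East", "West"),
  channel = c("web", "web", "store", "store"),
  revenue = c(100, 80, 50, 70)
)

# The same function answers two different questions without duplicated code.
revenue_by_region  <- summarise_by(orders, "revenue", "region",  fun = sum)
revenue_by_channel <- summarise_by(orders, "revenue", "channel", fun = sum)

print(revenue_by_region)
print(revenue_by_channel)
```

Because the grouping column and the summary function are arguments, the next variation of the question is a one-line call rather than another copy of the same block.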

Ultimately, durability and scalability are intertwined. The point is not to draw a hard distinction but to promote the idea that even without the need for scalability, durability still matters – and can make a huge difference to your practice and to impact on the business over the long term, both individually and as a team.
