
Data-oriented design for functional programmers

a moderately dense introduction


I recently went through an interesting pipeline while working on my main project. This, along with a few signals and requests, motivated me to write this down. So here I am today, ready to write about concepts I unknowingly used long before learning they had a name.

Let me start by talking about the pipeline I went through (it’s relevant). I’m currently implementing a specific data structure for High Performance Computing applications. The final product should achieve a few different things, notably (a) great performance, and (b) great usability. The main challenge in achieving both lies in hiding the right amount of boilerplate while exposing an API that prevents mistakes by design.

Practically, this resulted in an implementation using data-oriented design principles, with an API designed for functional usage. I liked the idea behind the functional approach so much that I tried Haskell, learned a few things, and came back to my main project with a rewired brain to take my model further.

Now, I’m going to reverse-engineer my approach (emphasis on reverse), and attempt to explain how to bring data-oriented design principles to a functional environment | mindset | programming language.

First, some definitions:

  • Data-oriented Design, or DoD, refers to a programming approach that focuses on how data is stored and manipulated in memory, in order to achieve better performance through cache usage. You can think of it as handling your software’s data in a way that pleases your hardware.
  • Functional Programming, or FP, will not refer to the eponymous programming language, but to the paradigm. I won’t attempt to define it as I’m out of my depth here; good thing this is not an article about programming language theory.

Now that the subject is defined, it’s time to deliver what you clicked for.


Coming from functional programming will give you a significant head start in learning and applying DoD. This is because the two worlds share a common component: the emphasis on data manipulation. One of the core principles behind data-oriented design is to think of your program as an articulation of successive data transforms, all leveraging the quirks of your hardware. This is achieved both through the transformations themselves and through how data is laid out in memory.
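To make that framing concrete, here is a deliberately tiny Haskell sketch of a program expressed as a composition of whole-collection transforms (pipeline is just a name I picked for illustration):

-- A toy pipeline: each stage is a whole-collection transform, and the
-- program is their composition. DoD adds one more question to ask of
-- each stage: how does it move through memory?
pipeline :: [Int] -> Int
pipeline = sum . map (* 2) . filter even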

Functional programmers naturally acquire the ability to reason about data input, output, intermediate states, and transform compositions. Given this, there are “only” two things left to learn about:

  • Data storage practices – How much of it you can apply will be determined by the language you are willing to use and how deeply you are willing to tinker.
  • Hardware behavior – This will guide your reasoning and resulting optimizations.

The idea is to take your existing thought process and inject DoD considerations into it. It should feel like adding one more criterion against which to evaluate your solutions. The rest of this article takes a more practical approach to show you what to look out for.


Translating common DoD concepts to FP is slightly disorienting because of how much depends on the programming language you’re using. Garbage collection, ownership, and manual memory management all have different implications; some languages may not even give you the degree of control necessary to implement common DoD patterns.

Applicability of each method is left to the reader’s discretion.

Example 1 -- Improving locality of reference using the SoA/AoS concept

Let’s take a simple problem: you have a list of users, each with a name and an age. Your goal is to filter out underage users. Here’s a possible implementation in Haskell:

-- AoS approach

type Age  = Int
type Name = String

type Users = [(Name, Age)]

getSomeUsers :: Users -> [Name]
getSomeUsers list = [ name | (name, age) <- list, age >= 18 ]

This version works, but we can do better. Instead of using an Array of Structures, we can rearrange items into a Structure of Arrays, prioritizing locality between items of the same field (as opposed to fields of the same object):

-- SoA approach

type Age  = Int
type Name = String

type Users = ([Name], [Age]) -- notice the []/() swap

getSomeUsers :: Users -> [Name]
getSomeUsers (names, ages) = [ name | (name, age) <- zip names ages, age >= 18 ]

The second implementation should yield better results. Why?

This works because of how caches behave: when we access memory that isn’t already loaded in the cache, an entire block of memory (a cache line) is fetched, not just the bytes we asked for. By laying out data so that successively used items sit in a contiguous block of memory, we ensure each look-up maximizes the amount of useful data loaded into the cache.
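One caveat worth noting: Haskell’s default lists are linked lists of boxed values, so neither snippet above is truly contiguous in memory; they illustrate the shape of the idea rather than the machine-level layout. A more literal SoA would use unboxed vectors from the vector package, which do store their elements in one contiguous block. Here is a minimal sketch under that assumption (UsersSoA and adultNames are names I’m introducing for illustration):

import qualified Data.Vector.Unboxed as V

type Name = String

-- Ages live in a contiguous unboxed vector: the field scanned on every
-- item is the one that benefits most from contiguity. Names stay boxed
-- (String cannot be unboxed) and are only touched for the survivors.
data UsersSoA = UsersSoA [Name] (V.Vector Int)

adultNames :: UsersSoA -> [Name]
adultNames (UsersSoA names ages) =
  [ name | (name, age) <- zip names (V.toList ages), age >= 18 ]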

There are more complex mechanisms involved though:

  • Performance improvements are influenced by the size of your data. For example, there won’t be any changes if your entire input fits in the first-level cache.
  • The structure of your data also influences code generation. This specific AoS/SoA tweak can enable vectorization in some cases.
  • If your iterator processes each item completely before moving on to the next, the first layout may do better cache-wise. However, if the iterator first filters all items, then maps them to the desired output, the SoA layout will do better.

These aspects operate simultaneously to create the changes you observe using profiling tools. This can make it challenging to predict what a change in source code will translate to in the final program (more on this in example #3).

Example 2 -- Trading compute for compute in preprocessing steps

Just like the previous example, the idea is to improve locality of reference. Unlike the previous example, we’re not going to tamper with data layouts, but instead make some runtime adjustments. The idea is to add a new transformation (or computation) in your composition to improve the following transformation.

In practice this can take many different forms. Here are a few examples:

  1. sorting input data to create predictable memory access patterns for other data to match
  2. moving or copying data chunks to each individual thread in a concurrent setting
  3. dividing the problem for distributed execution

The common pattern is the creation of that first “intermediate” step before running your actual transformation. Each of these works for a different reason:

  1. leverages the same cache dynamics as in the previous example
  2. offsets the combined effects of first-touch allocation and Non-Uniform Memory Access (NUMA) designs
  3. creates locality by restricting the relative workload per compute unit

Since each solution comes at a cost, you should always keep the tradeoff in mind. In case #2, copying data has both a temporal and a spatial cost. In cases #1 and #3, the time spent preparing the data (sorting and partitioning, respectively) must be less than the time gained from better cache usage. A sketch of case #1 follows below.
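To illustrate case #1, here is a hedged Haskell sketch: summing values gathered from a large lookup table, where sorting the keys first turns a random access pattern into a monotonic one. sumGathered is a hypothetical name, and whether the sort actually pays off depends on the table size and key distribution:

import Data.List (sort)
import qualified Data.Vector.Unboxed as V

-- Preprocessing step: sort the keys so successive table accesses move
-- forward through memory instead of jumping around. The sort is the
-- cost we pay; better cache behavior during the gather is the gain.
sumGathered :: V.Vector Double -> [Int] -> Double
sumGathered table keys = sum [ table V.! k | k <- sort keys ]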

Example 3 -- Reconciling laziness and predictability

I mentioned at the end of example #1 how multiple mechanisms can interact and create surprising results. You will find some Compiler Explorer links below to tinker with; this example focuses on one such interaction.

There seems to be some degree of conflict between DoD and FP:

  • the former builds its principles on assumptions about the predictability of memory accesses,
  • the latter, and more generally iterator-based approaches, often allow (or default to) lazy evaluation.

Laziness as a design choice is perfectly fine, but it sometimes prevents optimizations from being made. Take an any operator as an example: it checks whether any item of an iterator satisfies a given predicate. If the predicate is trivial (operation-wise), we could throw in some vectorization; laziness may prevent this, since values aren’t computed until they are needed.

One option to fix this is to introduce an additional loop level: by making the original iterator operate on SIMD-sized blocks, we can hint at a possible vectorization.
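Here is a hedged sketch of that idea over unboxed vectors. anyBlocked is a name I’m introducing, and whether GHC actually vectorizes the inner reduction depends on the backend and flags, so treat this as illustrating the structure rather than promising a speedup:

import qualified Data.Vector.Unboxed as V

-- Extra loop level: reduce each fixed-size block with a strict fold,
-- and only allow an early exit at block boundaries. The inner loop over
-- a block is straight-line work over contiguous data, which is the
-- shape a vectorizer wants to see.
anyBlocked :: Int -> (Double -> Bool) -> V.Vector Double -> Bool
anyBlocked blockSize p = go
  where
    go v
      | V.null v  = False
      | otherwise =
          let (block, rest) = V.splitAt blockSize v
          in V.foldl' (\acc x -> acc || p x) False block || go rest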

This example generalizes to a concept I refer to as looping patterns: the idea of moving through your data in a more deliberate way than the naive solution does. I’m not saying you should forget about big-O notation, but introducing hierarchies and multiple nesting levels can indeed improve performance.


These examples are only the tip of the iceberg. The interesting (and hard!) part of data-oriented design is that each problem has its own uniquely crafted solution. Because we’re talking about a design philosophy, there’s no step-by-step guide on how to infuse your programs with DoD.

That being said, I can guarantee that you will get better at this if you learn about hardware behavior and actively try to apply basic principles. To that end, the best advice I can give is to think about the shape of the data you are transforming, and practice.


More reading for you lovely people: