Getting a legacy system under test


So many of us end up in the same place: with an application that we’ve built as quickly as possible, that has found enough users, and that seems to be getting harder and harder to change. We describe the state we end up in using terms like legacy system, technical debt, long manual regression cycles, or less polite terms.

Working with a variety of companies that have come to be in that position, I’ve found an approach to dealing with these situations. The details vary according to the needs of each client, but in each case the plan of attack is the same: chart the high-level functionality of the system; provide a broad-but-shallow level of testing; prioritise; and dive deep into relevant areas. All of this uses techniques that can be applied just as easily to developing new functionality.

Last October I gave a workshop, together with Karel Boekhout, at Lean-Agile Scotland to teach our preferred way of getting a legacy system under test. We teach this 2-day workshop at companies who need help with their legacy system.

Our approach

Our approach is based on the experience of many different fixer-upper projects. Those projects were about getting back in control of a system that was hard to work with. Some resulted in extensive refactoring, some in a complete rewrite, and some were somewhere in the middle. All needed to get to a point where the organisation had enough confidence in the automated tests to release based on that set of tests, instead of the usual manual tests.

A very short description of the main steps is:

  • Find out what the current functionality is
  • Find out how (and if) you are testing that functionality currently
  • Iteratively refine and extend that testing so it fits into a defined testing strategy

These steps are important because we never really know what the existing system does in any detail. We often do not know how things are being tested currently, and the tests (both manual and automated) that we find are almost never designed with the testing pyramid in mind.

At the same time, we need the development teams working on the system to start learning the skills that will prevent the same problem from recurring. We prefer to use some of the more popular tools from Behaviour Driven Development to iteratively refine our understanding of the functionality and drive how we should be testing it. To that end, we apply Story Mapping and Example Mapping and stay close to the way we use those in new development work.

Start with a functional breakdown

What does your application do, exactly? Frequently there’s no clear, let alone documented, answer to that question. And even if there is, the level of detail usually varies greatly over different parts of the app. So in our efforts to structurally get a system under test, we like to start with a high-level functional breakdown.

This can take many forms, but I usually start by drawing something on a whiteboard, using post-its for functional areas, together with the people who know the application. The three amigos are a good starting point, though you might find you’ll need different amigos to get more detail on different parts of the application.

Later, you can make it look better; making a pretty chart, or capturing the different functions in some sort of spreadsheet to add more detail. But first, just start with a very high-level overview of functional areas.

slide 1 - high-level functional breakdown

The images and examples used here are from our workshop, where we use an application that many people will have worked with. We don’t know any of the implementation details, but we work outside-in to create an overview.
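
Once the whiteboard version exists, the breakdown can be captured in something as simple as a small tree. The following is a minimal Python sketch, not part of our workshop material: the `FunctionalArea` class and every area name except ‘Manage Playlist’ are made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class FunctionalArea:
    """One post-it on the whiteboard: a functional area and its sub-areas."""
    name: str
    children: list["FunctionalArea"] = field(default_factory=list)

    def walk(self, depth: int = 0):
        """Yield (depth, area) pairs, parents before their children."""
        yield depth, self
        for child in self.children:
            yield from child.walk(depth + 1)


# Hypothetical areas for a music app; only 'Manage Playlist' comes from the article.
breakdown = FunctionalArea("Music app", [
    FunctionalArea("Search"),
    FunctionalArea("Playback"),
    FunctionalArea("Manage Playlist"),
    FunctionalArea("Account"),
])


def render(root: FunctionalArea) -> str:
    """Render the breakdown as an indented outline, one area per line."""
    return "\n".join("  " * depth + area.name for depth, area in root.walk())


print(render(breakdown))
```

Keeping the structure this plain makes it easy to add more detail to one area later without having to redo the rest.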

Create a heatmap of current functional test coverage

Now that we have a first overview of the functions we should be testing, we can have a look at whether we are actually doing that. We can look at whatever kind of testing is being done before we feel confident enough to release, and mark-up our functional breakdown to show how each functional area is being tested.

When we do this, we often find that there are manual tests being run. Sometimes those are documented, sometimes not. Sometimes that documentation is (too) detailed, sometimes it’s just a high-level checklist. But at the very least we will get an initial view of what our current testing looks like.

slide 2 - high-level heatmap
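
If the breakdown lives in a spreadsheet or a small script, the heatmap is just one extra column. A minimal sketch in Python, assuming three coverage levels and a handful of invented functional areas; the `Coverage` levels and the statuses below are illustrative, not taken from a real project.

```python
from enum import Enum


class Coverage(Enum):
    """Assumed coverage levels; use whatever distinctions matter to your team."""
    NONE = "none"
    MANUAL = "manual"
    AUTOMATED = "automated"


# Hypothetical functional areas and how each is tested before a release.
heatmap = {
    "Search": Coverage.MANUAL,
    "Playback": Coverage.AUTOMATED,
    "Manage Playlist": Coverage.NONE,
    "Account": Coverage.MANUAL,
}


def summarise(heatmap):
    """Count the functional areas at each coverage level."""
    counts = {level: 0 for level in Coverage}
    for level in heatmap.values():
        counts[level] += 1
    return counts


def untested(heatmap):
    """List the areas that currently have no testing at all."""
    return sorted(area for area, level in heatmap.items() if level is Coverage.NONE)


for level, count in summarise(heatmap).items():
    print(f"{level.value}: {count}")
print("untested:", untested(heatmap))
```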


Of course, our goal is to improve testing and functional test coverage, and to automate as much of it as we can. So how do we look at the existing test coverage in our heatmap? And how do we use it to know where to start with getting to the next level of testing? First, we decide on priorities.

slide 3 - priorities

We do not want to spend too much time prioritising at this point, but as we learn more and add more detail to our heatmap we will have to make choices on what to test first, and how.

To help remind us which considerations matter when prioritising, I recommend having a look at the RCRCRC mnemonic (Recent, Core, Risk, Configuration, Repaired, Chronic), as coined by Karen Johnson. Since we are initially mostly working on existing code, ‘Core’ and ‘Risk’ might be our main prioritisation considerations. As we start running the tests we create while also building new functionality, other considerations might quickly rise in importance.
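
To make the prioritisation discussion concrete, the RCRCRC considerations (Recent, Core, Risk, Configuration, Repaired, Chronic) can be turned into a rough score per functional area. This is a sketch under stated assumptions: the weights are invented to reflect that ‘Core’ and ‘Risk’ count most for existing code, and the areas and their flags are hypothetical. Treat the outcome as input for a conversation, not as a verdict.

```python
from dataclasses import dataclass

# Assumed weights: Core and Risk dominate while we work on existing code.
WEIGHTS = {
    "recent": 1,
    "core": 3,
    "risk": 3,
    "configuration": 1,
    "repaired": 1,
    "chronic": 1,
}


@dataclass(frozen=True)
class Area:
    name: str
    flags: frozenset  # which RCRCRC considerations apply to this area


def score(area: Area) -> int:
    """Sum the weights of every consideration flagged for the area."""
    return sum(WEIGHTS[flag] for flag in area.flags)


def prioritise(areas):
    """Highest score first; ties are broken alphabetically by name."""
    return sorted(areas, key=lambda area: (-score(area), area.name))


# Hypothetical assessment of three functional areas.
areas = [
    Area("Search", frozenset({"core"})),
    Area("Manage Playlist", frozenset({"core", "risk"})),
    Area("Account", frozenset({"configuration"})),
]

for area in prioritise(areas):
    print(score(area), area.name)
```

When new development starts, bumping the weight of ‘recent’ or ‘repaired’ is a cheap way to let other areas rise in the ordering.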

For the purposes of an example, we pick one of the features of the application: ‘Manage Playlist’.

Story Mapping to get more detail on a feature

A great way to go into more depth for a feature is the Story Map. There are a few subtle differences when creating a story map for an existing feature, and certainly we might prioritise a little differently. But overall the process is very similar.

We break down the flow of the feature into steps and start writing stories down for the different parts of the flow. We focus on stories that illuminate a part (or aspect) of what the system can do in that part of the workflow. The best way to go about that is to have all Three Amigos generate the stories. That means including everyone who has a part in development: product, UX, test, development, ops, …

Story Map of the Manage Playlist feature

In our example you can see we look at the Manage Playlist functionality of our unnamed music app. The first step in generating the story map is to simply go through the lifecycle of use of the feature. For the playlist that means we might create a new playlist, add some songs to it, find the playlist again later, see what’s in the list and play the songs, change the playlist later, and delete it.

Note that there are different levels of detail you might have for a story map. Since we are talking about an existing feature, looking at it from the full lifecycle of the playlist gives us a nice complete picture.

Once we have this conceptual view of the flow of a feature, we discuss different variations of each step of the flow. When we are creating a new feature that means trying to find a way to incrementally build the feature, so we can deliver small improvements frequently. When looking at an existing feature it’s mostly the same. But sometimes we can take bigger steps, since testing the story (or its variations) might be less work than creating it.

Slices to Prioritise

Once we have generated a bunch of stories for the flow steps, we can first order them under the flow step: which is the simplest version of this step we can think of? Then we cluster those simplest versions to form a ‘slice’ across the steps. That slice represents a good, simple version of the feature. If we ran through its stories step by step, it would represent a basic use of the feature. We then do the same for later steps.

Story Map slices

Notice that not every slice has all the flow steps represented. In fact, we decided that even the first slice doesn’t include any edit or delete functions. That is fine: we are going for coverage that we care about, not for completeness. As above, we apply the RCRCRC prioritisation heuristics to reason about what we put at the top. There is another heuristic, though, which is simply how easy it will be to make the test. Sometimes it’s better to get started and get some value than to go for the highest value first.

I tend to think a slice is a good set of stories if I can give it a name. Similar to naming a method in code, a good set of stories is easily described in a few words.

Story Map with named slices
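
The story map with named slices can be written down as plain data, which also gives us a first manual checklist per slice. A minimal sketch in Python; the flow steps follow the playlist lifecycle described above, but the stories and the slice names ‘Basic playlist’ and ‘Tidy up’ are invented for illustration.

```python
from collections import defaultdict

# Flow steps of the feature, in lifecycle order.
FLOW = ["create", "add songs", "find", "view and play", "edit", "delete"]

# (flow step, story, slice name); all stories and slice names are hypothetical.
STORIES = [
    ("edit", "Rename the playlist", "Tidy up"),
    ("create", "Create an empty playlist with a name", "Basic playlist"),
    ("add songs", "Add a single song from search results", "Basic playlist"),
    ("find", "Find the playlist in my library", "Basic playlist"),
    ("view and play", "Play the playlist from the top", "Basic playlist"),
    ("edit", "Remove a song", "Tidy up"),
    ("delete", "Delete the playlist", "Tidy up"),
]


def slices(stories):
    """Group stories by slice, in flow order within each slice."""
    grouped = defaultdict(list)
    # sorted() is stable, so stories within the same step keep their order.
    for step, story, slice_name in sorted(stories, key=lambda s: FLOW.index(s[0])):
        grouped[slice_name].append((step, story))
    return dict(grouped)


def checklist(slice_name, stories):
    """Render one slice as a manual test checklist."""
    lines = [f"Slice: {slice_name}"]
    lines += [f"  [ ] {step}: {story}" for step, story in slices(stories)[slice_name]]
    return "\n".join(lines)


print(checklist("Basic playlist", STORIES))
```

Note that the first slice deliberately has no edit or delete step, which matches the choice described above.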

Slices to end-to-end tests

Our goal was to get tests that give us a good coverage of functionality. For some types of tests, the state that we are in now already gives us some guidance. Each slice through the story map is a variant of the feature. And simply going through each of those slices as a manual test is already a good starting point for structured testing.

Let’s update our heatmap and list each slice under the main feature. We could even use the heatmap view to mark which slices we have tested for a particular release and which we decided to skip.

A side note: manual vs ‘automated’ testing

That might not be what you were expecting, especially considering the source. Remember that this is an iterative process. Just extending our understanding of what functionality we need to cover is already giving us more control and confidence, even if we are still doing these tests manually. At least we are doing them, or we are consciously choosing not to do them. See, we have even prioritised the tests!

It is advisable to limit the number of classical end-to-end tests. These tests (often run as automated Selenium-type tests through a UI) are very expensive to maintain, break easily, and tend to give only very generic “It doesn’t seem to work” feedback. It is still useful to have a few of them. The first slice or two in our story maps are a very good starting point for deciding what should be in those end-to-end or journey tests.

Of course, once you have automated those flows, don’t forget to update your heatmap to reflect that!

Updated heatmap

Example Mapping to get into low-level detail of a story

But how does this help us to get the sort of detailed functional coverage that we need to be fully confident in our system? For that, we need to go into more detail. The journey tests give us a broad coverage of a part of the functionality. If we want to know whether the details are right, we need to go in depth to discover all the variations of business rules. We can use another technique borrowed from the BDD community: Example Mapping.

Rules as functional detail

Example Mapping

The idea of Example Mapping is simple: we ask for specific examples of the business rules (often called ‘Acceptance Criteria’) of a story. By talking about concrete examples and situations we can quickly discover the specifics of those business rules. We often find quite a few that we didn’t think of in advance. Doing this with new stories quickly gets us to a point where there’s clarity about what is expected of the new functionality. Doing it for existing functionality gives us the same type of clarity about what variations in functionality there are.

Examples as tests

The examples we generate are the starting point for creating the more detailed tests of the system. Like the Story Map, our examples both document the system behaviour and help us discover where we need to test the more involved business rules of the system.

During Example Mapping we just write down examples in such a way that we all understand what we mean. That can be a few bullets, the Friends Episode Name (“The one where ….”) or even a few sketches. As long as it’s clear what we mean. The purpose is to explore the system behaviour. And by discussing it in terms of concrete examples we quickly find out whether we agree on that behaviour and see what parts are missing.

Example Examples
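
The result of an Example Mapping session can be recorded as a story with its rules and, under each rule, the examples that illustrate it. The following is a minimal Python sketch; the story, rules and examples are hypothetical and only meant to show the shape of the data, including how to spot a rule nobody has illustrated yet.

```python
from dataclasses import dataclass, field


@dataclass
class Rule:
    """A business rule (acceptance criterion) with its concrete examples."""
    text: str
    examples: list[str] = field(default_factory=list)


@dataclass
class ExampleMap:
    """One story, the rules we found for it, and the examples per rule."""
    story: str
    rules: list[Rule] = field(default_factory=list)

    def rules_needing_examples(self) -> list[str]:
        """Rules without examples: we do not yet know how to test these."""
        return [rule.text for rule in self.rules if not rule.examples]


# Hypothetical rules and examples for one story from the 'Manage Playlist' map.
example_map = ExampleMap(
    story="Add a single song to a playlist",
    rules=[
        Rule("A song can only be in a playlist once", [
            "The one where the song is not yet in the playlist",
            "The one where the song is already in the playlist",
        ]),
        Rule("Songs are added at the end of the playlist"),
    ],
)

print(example_map.rules_needing_examples())
```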

Formulating examples into Scenarios

Next we can use the examples to formulate ‘Acceptance Scenarios’: detailed test cases that describe and verify the expected behaviour around those business rules. These tests go into the nitty-gritty of the business rules of the system and will cover the permutations of different ways the system can react.

Acceptance Scenarios

In this form the scenarios can be the basis of automated checks. The format above (‘Gherkin’) is the one popularised by the Cucumber framework. This framework lets us take the description of the test, written in English (or whatever language you speak), and implement it as a test against the system. By building up a library of those tests, guided by our story and example mapping structures, we can create a set of tests that documents and verifies the detailed behaviour of the system.

A side note: testing where the functionality lives

A short side note for a large subject: when we do implement the Acceptance Scenarios using a tool such as Cucumber, it’s important to recognise that there will be a lot of those scenarios, and that those tests therefore need to be very fast to run. We try to keep the implementation of those sorts of tests as simple as possible, requiring as little additional work as we can. To do that, we implement the tests as close as possible to the code where the business rule is implemented.

Implement close to the functionality

That means that most of those tests are implemented as unit or component level programmer tests (linked to the scenario by the glue of Cucumber step definitions), and not as large end-to-end through-the-UI tests, as is often assumed.
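
To illustrate what ‘close to the functionality’ can look like, here is a minimal Python sketch in which a scenario runs directly against a domain object, with no UI or server involved. Everything in it is an assumption made for illustration: the `Playlist` class, the rule that a song appears only once, and the scenario text are invented, and the tiny `step` registry is a hand-rolled stand-in for step definitions, not the API of Cucumber or any other real BDD tool.

```python
import re


class Playlist:
    """Hypothetical domain object holding the business rule under test."""

    def __init__(self, name):
        self.name = name
        self.songs = []

    def add(self, song):
        # Assumed rule for illustration: a song appears at most once per playlist.
        if song in self.songs:
            return False
        self.songs.append(song)
        return True


STEPS = []  # (compiled pattern, function) pairs; a stand-in for real step glue


def step(pattern):
    """Register a function as the implementation of steps matching `pattern`."""
    def register(func):
        STEPS.append((re.compile(pattern), func))
        return func
    return register


@step(r'a playlist named "(.+)"')
def create_playlist(ctx, name):
    ctx["playlist"] = Playlist(name)


@step(r'I add the song "(.+)"')
def add_song(ctx, song):
    ctx["added"] = ctx["playlist"].add(song)


@step(r"the playlist contains (\d+) songs?")
def check_song_count(ctx, count):
    assert len(ctx["playlist"].songs) == int(count)


SCENARIO = """
Given a playlist named "Road trip"
When I add the song "Song A"
And I add the song "Song A"
Then the playlist contains 1 song
"""


def run(scenario):
    """Run each scenario line against the first matching step function."""
    ctx = {}
    for line in scenario.strip().splitlines():
        text = re.sub(r"^(Given|When|Then|And|But)\s+", "", line.strip())
        for pattern, func in STEPS:
            match = pattern.fullmatch(text)
            if match:
                func(ctx, *match.groups())
                break
        else:
            raise LookupError(f"No step definition for: {text}")
    return ctx


run(SCENARIO)
```

Because the steps call the domain object directly, hundreds of scenarios like this can run in well under a second, which is what makes it feasible to cover the many permutations of the business rules.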

The Heat Map - revisited

With our detailed deep tests in the form of our examples, we add a new level of coverage to our functionality. We can reflect that in our Heat Map. We do that by adding a new level to the hierarchy and marking the functionality as being automatically tested.

Heat Map with deep tests marked

Of course, we might decide that not all of the functionality needs that sort of testing and will again use our heuristics (and common sense) to prioritise. The more we have under test, the higher our confidence in the automated test set.


In this article we have discussed a simple process to incrementally and iteratively build a map of our knowledge of the functionality of the system. We expand our understanding of the main functional parts using Story Mapping and use the slices of the Story Map as basis for our broad-but-shallow end-to-end tests. We execute on those manually first and automate when we know what they should look like in detail.

We prioritise and go in-depth using Example Mapping to create a detailed understanding of the business rules that our system is based on and we implement tests for those rules based on the examples we generate. This set of tests will give us our deep tests that can cover the many permutations through our functionality needed to get full confidence that it works as expected.

  1. Create Functional overview
  2. Create Heat Map marking current (manual?) tests against functionality
     • Prioritise
  3. Use Story Mapping to gain insight into variants of high level functionality
     • Prioritise
  4. Use slices from the Story Map to formulate broad-but-shallow tests
     • Prioritise
  5. Use Example Mapping on stories from the slices to explore detailed functionality
  6. Implement examples as deep-and-fast tests
  7. Repeat the cycle

If you have any questions, let me know by email or on Twitter. If you want to go through this in practice with your team, you can book the full training.