Reasons for doing a stepped wedge trial

In earlier posts I’ve given an introduction to what stepped wedge trials are, and in what situations they tend to be used (see What is a stepped wedge trial? and Three kinds of stepped wedge trial). But why would you use a stepped wedge design in particular in a given situation, in place of something … I don’t know … simpler?

Clarifying the question

Suppose we are planning a cluster randomised trial, and all of our clusters (the hospitals, general practices, local communities, or whatever they are) are ready to be randomised at the same time, ready to provide data over a particular calendar period, and able to be crossed over from the routine care “control” condition to the experimental intervention condition (but not from the intervention back to the control). What then makes us choose a stepped wedge design in particular? That is, why choose a design in which clusters are randomised to one of a number of different sequences, each determining the timetable for implementing the intervention?

What about two-way cross-over?

OK, let’s back up a little. What if the clusters can happily be crossed over in both directions – from control to intervention and also from intervention to control? Then the answer is simple: you should design a trial in which clusters cross in both directions. This website is about stepped wedge trials, so I don’t want to get too distracted by a discussion of two-way cross-over designs for cluster randomised trials (which would need a whole new website). Suffice it to say that two-way cross-over designs tend to be a lot more efficient than one-way cross-over approaches like the stepped wedge.

As I said in an earlier post, stepped wedge designs are something you would only consider for those interventions that might be easy enough to introduce to a cluster, but are much harder (practically or politically) to take away again. These are the interventions that change practice or are difficult to unlearn, or that policy has decreed will eventually be rolled out everywhere.

What if clusters are not all ready at the same time?

What if the clusters are not all ready and waiting to be randomised at the same time? This is an interesting question. To date the vast majority of published stepped wedge trials, and the vast majority of work on methods for the design and analysis of stepped wedge trials, have assumed that all the clusters join the trial at the same time. But this need not, realistically, be the case. What if you recruit clusters in separate batches over time: is it still possible to do a stepped wedge trial?

This situation will add more complexity to what is already a complex design, and one thing I want to champion in these posts is simplicity. There is also not very much written, as I say, on methods for this kind of batched stepped wedge trial: so, for now, I’d advise that applied health researchers avoid them, and methodological researchers look into them.

Incidentally, if you are recruiting clusters one by one, over a period of time, rather than all together, and you want a simple approach, you might consider just randomising each cluster either to a short period of data collection under the control condition or to a short period of data collection under the intervention condition.

Two good reasons for choosing a stepped wedge

To answer my original question, there are two very good reasons why you might consider a stepped wedge trial (and you only need one of them): the first is practicality, and the second is efficiency.

Reason 1: Practicality

Practicality is probably the easiest to understand: sometimes it just isn’t practical to implement the new intervention across half of the available clusters all at the same time. If you have a trained team of researchers who set up the intervention in each cluster then they may be stretching themselves too thin to try and set it up in more than one cluster simultaneously. Much more sensible in this case for the team to get one cluster set up, then move to the next – in other words to stagger the introduction of the intervention at different clusters.

Or consider the Gambia Hepatitis Intervention Study (see The first stepped wedge trial). In this case the vaccine needed for the intervention programme was not being manufactured on a large scale when the trial began, and a phased introduction of the programme allowed time for manufacturing to pick up.

Reason 2: Efficiency

Efficiency is another key motivator. To a research funder or a policy maker, “efficiency” tends to mean doing something as quickly or as cheaply as possible. To a clinical trial statistician like myself, “efficiency” usually means finding a way to get the evidence you need with the fewest participants (or fewest clusters, or fewest measurements), or else squeezing more evidence out of the same number of participants (or clusters, or measurements). These different usages essentially amount to the same thing.

Many published stepped wedge trials cite efficiency as the reason for using the design, and indeed in some cases stepped wedge designs are very efficient and this is a good argument to make. But can you invariably squeeze more evidence out of a stepped wedge design than out of a simpler design, such as one where you randomise half the clusters to the intervention for the duration of the trial, and half to the control? The answer, interestingly, is “no”.

A counter-example

The following example is taken from this paper. Suppose we are designing a trial where we recruit pregnant women with suspected pre-eclampsia, and suppose we are randomising by the maternity unit that the women attend. Suppose we have identified ten maternity units willing to take part. For simplicity, let’s imagine that the recruitment of eligible women is divided into month-long blocks, and let’s suppose that in each maternity unit we can expect to recruit four women to the trial every month. Note that this is an example of a trial with continuous recruitment (see Three kinds of stepped wedge trial).

(Technical note: the optimal choice of trial design also depends on a parameter called the intracluster correlation (ICC), which measures the degree of similarity of outcomes of individuals who belong to the same cluster. In this example the ICC was assumed to be 0.01.)

The classic stepped wedge design in this case would be scheduled over a total of 11 months and would look like this (where dark green represents recruitment after the intervention, whatever it is, has been introduced, and light green recruitment under routine care):

But it turns out that in this example you can do a much more efficient trial over 11 months in ten maternity units by randomising half the units to get the intervention from day one, and half to deliver routine care throughout, like this:
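Both layouts can be sketched as simple text grids. The following is a minimal Python sketch, not taken from the paper: it assumes one maternity unit per sequence in the stepped wedge, with the first unit crossing over at the start of month 2 and the last at the start of month 11, and it assumes the first five units are the ones allocated to the intervention in the parallel design. The function names are my own, for illustration only.

```python
def stepped_wedge(n_clusters, n_periods):
    """Classic stepped wedge: row = cluster, column = period.

    1 means recruitment under the intervention, 0 under routine care.
    Assumption: one cluster per sequence, cluster k crosses over
    at the start of period k + 1.
    """
    return [[1 if period > cluster else 0
             for period in range(1, n_periods + 1)]
            for cluster in range(1, n_clusters + 1)]


def parallel(n_clusters, n_periods):
    """Parallel design: half the clusters get the intervention throughout.

    Assumption: the first half of the rows are the intervention clusters.
    """
    half = n_clusters // 2
    return [[1 if cluster < half else 0] * n_periods
            for cluster in range(n_clusters)]


def show(design):
    """Render a design as text: '#' = intervention, '.' = routine care."""
    return "\n".join("".join("#" if cell else "." for cell in row)
                     for row in design)


print("Stepped wedge, 10 units x 11 months:")
print(show(stepped_wedge(10, 11)))
print()
print("Parallel, 10 units x 11 months:")
print(show(parallel(10, 11)))
```

Each row is a maternity unit and each column a month. Both grids contain the same number of intervention unit-months (55 out of 110); they differ only in how those unit-months are arranged.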

If you saw my post Three kinds of stepped wedge trial you might be particularly surprised by this, since none of the clusters in the schematic above act as their own controls. In some situations, it seems, this isn’t so helpful or efficient after all. Roughly speaking, this happens when the number of people recruited at each cluster in each step is relatively small, and when outcomes of people from the same cluster are not that much more similar than outcomes of people from different clusters.
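To see roughly how such a comparison can be made, here is a hedged Python sketch that computes the variance of the estimated intervention effect for each design. It uses the closed-form expression from Hussey and Hughes (2007), which assumes a linear mixed model with a fixed effect for each period, a random cluster effect, and a fresh sample of participants in every cluster-period. That model, the total outcome variance of 1, and the function names are my assumptions for illustration; the paper behind the example may have used a different correlation structure, so treat the numbers as indicative only.

```python
def stepped_wedge(n_clusters, n_periods):
    # Assumption: one cluster per sequence, cluster k crosses at period k + 1.
    return [[1 if p > c else 0 for p in range(1, n_periods + 1)]
            for c in range(1, n_clusters + 1)]


def parallel(n_clusters, n_periods):
    # Assumption: first half of the clusters get the intervention throughout.
    half = n_clusters // 2
    return [[1 if c < half else 0] * n_periods for c in range(n_clusters)]


def hh_variance(design, m, icc, total_var=1.0):
    """Variance of the treatment effect estimator (Hussey & Hughes, 2007).

    design    : list of rows (clusters) of 0/1 entries (periods)
    m         : participants per cluster per period
    icc       : intracluster correlation
    total_var : total outcome variance (assumed 1 here)
    """
    tau2 = icc * total_var                # between-cluster variance
    sigma2 = (1 - icc) * total_var / m    # variance of a cluster-period mean
    I, T = len(design), len(design[0])
    U = sum(sum(row) for row in design)                             # treated cells
    W = sum(sum(row[j] for row in design) ** 2 for j in range(T))   # by period
    V = sum(sum(row) ** 2 for row in design)                        # by cluster
    numerator = I * sigma2 * (sigma2 + T * tau2)
    denominator = ((I * U - W) * sigma2
                   + (U ** 2 + I * T * U - T * W - I * V) * tau2)
    return numerator / denominator


# The example in the text: 10 units, 11 months, 4 women per unit-month, ICC 0.01.
var_sw = hh_variance(stepped_wedge(10, 11), m=4, icc=0.01)
var_parallel = hh_variance(parallel(10, 11), m=4, icc=0.01)
print(f"stepped wedge variance: {var_sw:.4f}")
print(f"parallel variance:      {var_parallel:.4f}")
print(f"ratio (SW / parallel):  {var_sw / var_parallel:.2f}")
```

Under these assumptions the parallel design has the smaller variance (roughly 0.013 against roughly 0.017 for the stepped wedge), which is the direction of the result described above. Re-running `hh_variance` with a larger ICC and more participants per period, say `icc=0.2` and `m=20`, reverses the ranking in favour of the stepped wedge, which is why the comparison needs to be worked out for each trial rather than assumed.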

In fact, the most efficient design involving ten clusters in 11 time periods in the example above turns out to be a kind of mixture of the two previous designs, known as a hybrid design (but be careful with this terminology, because in implementation science a “hybrid design” can also mean something else):

The moral of the story

I’m not necessarily advocating the rather unusual-looking hybrid design over something simpler, but I do want you to understand that a classic stepped wedge isn’t always the most efficient design choice, and if you want to justify your use of a stepped wedge design it isn’t enough simply to say “because it’s efficient”. You should consider alternatives and work out the resources you would need for the trial in each case. (I’m not pretending this is always easy to do, but some tools to help you calculate the number of clusters needed for a trial, or related design parameters, can be found on the Resources for researchers page of this website.)

Bad reasons for doing a stepped wedge trial

Finally, if you’ve read some of my other posts – particularly What is a stepped wedge trial? and The first stepped wedge trial – you may have picked up on what I think are a couple of very poor justifications for using a stepped wedge design rather than a simpler approach: (1) that you want everyone to get the intervention (see You want everyone to have the intervention? You don’t need a stepped wedge trial for that), and (2) that the intervention was being rolled out anyway. The latter is likely, in all honesty, to make it harder to do a stepped wedge trial, since the timescale may be pressured and sites may be resistant to complying with your beautifully randomised stepped wedge.

No – in any scenario where you’re considering a stepped wedge, ask first “how long can I reasonably ask any site to continue without the intervention?” This will help you understand how much time you have to conduct a truly randomised evaluation. Then you can consider the efficiency and practicality of different approaches.