Saturday, October 26, 2013

Complex Theories of Change: Recipes for failure or for learning?


The diagram below is a summary of a Theory of Change for interventions in the education sector in Country X. It did not stand on its own; it was supplemented by an extensive text description.



It is complex in the sense that there are many different parts to it and many interconnections between them, including some feedback loops. It seems realistic in the sense of capturing some of the complexity of social change. But it may be unrealistic if it is a prescription for achieving change. Whether it is the latter depends on how we interpret the diagram, which I discuss below.

One way of viewing the Theory of Change is in terms of conditions (the elements in the diagram) that may or may not be necessary and/or sufficient for the final outcome to occur. The ideas of necessary and/or sufficient causal conditions are central to the notion of “configurational” models of causation, described by Mahoney and Goertz (2012) and others. A configuration is a set of conditions that may be either sufficient or necessary for an outcome e.g. Condition X + Condition T + Condition D + Condition P -> Outcome. This is in contrast to simpler notions of an outcome having a single cause e.g. Condition T -> Outcome.

The philosopher John Mackie (1974) argued that most of the “causes” that we talk about in everyday life are what are called INUS causes. That is, each is a condition that is an Insufficient but Necessary part of a configuration of conditions, where that configuration is itself Unnecessary but Sufficient for an outcome to occur. For example, smoking is a contributory cause of lung cancer, but it is neither necessary nor sufficient to get cancer. There are other ways of getting cancer, and not all smokers get cancer.


The interesting question for me is whether the above Theory of Change represents one or more than one causal configuration. I look at both possibilities and their implications.

If the Theory of Change represents a single configuration then each element, such as “More efficient management of teacher recruitment and deployment”, would be insufficient by itself, but a necessary part of the whole configuration. In other words, every element in the Theory of Change has to work or else the outcome won’t occur. This is quite a demanding expectation. The more complex this “single configuration” model becomes (i.e. by having more conditions), the more vulnerable it will become to implementation failure, because even if only one part does not work, the whole process will fail. One saving grace is that it would be relatively easy to test this kind of theory. In any locations where the outcome did occur it would be expected that all elements would be present. If some were not, then the missing elements would not qualify as insufficient but necessary conditions.
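The test described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the case data are invented, and the condition labels X, T, D and P are borrowed from the earlier example rather than taken from the actual Theory of Change.

```python
# Hypothetical cases: the conditions observed in each location and
# whether the outcome occurred there. All values are illustrative.
cases = [
    {"conditions": {"X", "T", "D", "P"}, "outcome": True},
    {"conditions": {"X", "T", "D", "P"}, "outcome": True},
    {"conditions": {"X", "T", "P"}, "outcome": True},
    {"conditions": {"X", "D"}, "outcome": False},
]


def necessary_conditions(cases, candidates):
    """Return the candidate conditions present in every case where the outcome occurred."""
    successes = [c["conditions"] for c in cases if c["outcome"]]
    if not successes:
        return set()  # no successful cases, so nothing can be tested
    return set(candidates).intersection(*successes)


# The "single configuration" reading claims every element is necessary.
theory = {"X", "T", "D", "P"}
still_necessary = necessary_conditions(cases, theory)
refuted = theory - still_necessary  # absent in at least one successful case
```

With these invented data, condition D is missing from one successful case, so it would not qualify as a necessary part of the configuration.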

The alternative perspective is to see the above Theory of Change as representing multiple causal configurations, i.e. multiple possible combinations of conditions, each of which can lead to the desired outcome. So any condition, again such as “More efficient management of teacher recruitment and deployment”, may not be necessary under all circumstances. Instead it may be an insufficient but necessary part of one of the configurations, but not the others. Viewed from this perspective, the Theory of Change seems less doomed to implementation failure, because there is more than one route to success.

However, if there are multiple routes, the challenge is then how to identify the different configurations that may be associated with successful outcomes. As it stands the current Theory of Change gives little guidance. Like many Theories of Change at this macro-level / sector perspective, it tends towards showing “everything connected to everything”. In fact this limitation seems unavoidable, because with increasing scale there is often a corresponding increase in the diversity of actors, interventions and contexts. In such circumstances there are likely to be many more causal pathways at work. This view suggests that at such a macro level it might be more appropriate for a Theory of Change to initially have relatively modest ambitions and to limit itself to identifying the conditions that are likely to be involved in the various causal configurations.

The focus would then move to what can be done through subsequent monitoring and evaluation efforts. This could involve three tasks: (a) identifying where the outcomes have and have not occurred; (b) identifying how these cases differed in terms of the configurations of conditions that were associated with the outcomes (and absent where the outcomes did not occur), which would involve across-case comparisons; and (c) establishing plausible causal linkages between the observed conditions within each configuration, which would involve within-case analyses. Ideally, the overall findings about the configurations involved would help ensure the sustainability and replicability of the expected outcomes.
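The across-case comparison in task (b) amounts to building something like a truth table: group the cases by the configuration of conditions observed, then count how often each configuration occurs with and without the outcome. A minimal sketch, with invented cases and hypothetical function names (`truth_table`, `consistent_configurations`):

```python
from collections import defaultdict

# Hypothetical cases: (conditions present, did the outcome occur?)
cases = [
    ({"X", "T"}, True),
    ({"X", "T"}, True),
    ({"D", "P"}, True),
    ({"D", "P"}, False),
    ({"X", "D"}, False),
]


def truth_table(cases):
    """Count, for each observed configuration, the cases with and without the outcome."""
    table = defaultdict(lambda: {"with_outcome": 0, "without_outcome": 0})
    for conditions, outcome in cases:
        key = frozenset(conditions)  # frozenset so a configuration can be a dict key
        table[key]["with_outcome" if outcome else "without_outcome"] += 1
    return dict(table)


def consistent_configurations(table):
    """Configurations that were only ever observed together with the outcome."""
    return [
        cfg
        for cfg, counts in table.items()
        if counts["with_outcome"] > 0 and counts["without_outcome"] == 0
    ]


table = truth_table(cases)
consistent = consistent_configurations(table)
```

Configurations with mixed results, such as D + P in this invented data, are the ones where within-case analysis (task c) would be most needed to find out what else differed.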

The Theory of Change will still be useful in as much as it successfully anticipates the various conditions making up the configurations associated with outcomes, and their absence. It will be less useful if it has omitted many elements, or included many that are irrelevant. Its usefulness could actually be measured! Going back to the recipe metaphor in the title, a good Theory of Change will have at least an appropriate list of ingredients but it will be really up to subsequent monitoring and evaluation efforts to identify what combinations of these produce the best results and how they do so (e.g. by looking at the causal mechanisms connecting these elements).

Some useful references to follow up:
Causality for Beginners, Ray Pawson, 2008
Qualitative Comparative Analysis, at Better Evaluation
Process Tracing, at Better Evaluation
Generalisation, at Better Evaluation

Postscript:

I have just read Owen Barder's review of Ben Ramalingam's new book "Aid on the Edge of Chaos". In that review he makes two comments that are relevant to the argument presented above:
"As Tim Harford showed in his book Adapt, all successful complex systems are the result of adaptation and evolution.  Many in the world of development policy accepted intellectually the story in Adapt but were left wondering how they could, practically and ethically, manage aid projects adaptively when they were dealing with human lives"
"Managing development programmes in a complex world does not mean abandoning the drive to improve value for money. Iteration and adaptation will often require the collection of more data and more rigorous analysis - indeed, it often calls for a focus on results and 'learning by measuring' which many people in development may find uncomfortable."
The point made in the last paragraph about requiring the collection of more data needs to be clearly recognised, as early as possible. Where there are likely to be many possible causal relationships at work, and few if any of these can be confidently hypothesised in advance, the coverage of data collection will need to be wider. Data collection (and then analysis) in this situation is like casting a net onto the waters, albeit still with some idea of where the fish may be. The net needs to be big enough to cover the possibilities.




Wednesday, August 14, 2013

Measuring the impact of ideas: Some testable propositions



Evaluating the impact of research on policy and practice can be quite a challenge, for at least three reasons: (a) Our ideas of the likely impact pathways may be poorly developed. (b) Actors within those pathways may not provide very reliable information about exposure to and use of the research we are interested in. Some may be over-obliging, others may be very reluctant to acknowledge its influence, and others may not even be conscious of the influence that did occur. (c) It is quite likely that there are many more pathways through which the research results travel than we can yet imagine, let alone measure. This is even more so when we are looking at impact over a longer span of time. When I look back to the first paper I wrote about MSC, which I put on the web in 1996, I could never have imagined the diversity of users and usages of MSC that have happened since then.

I am wondering if there is a proxy measure of impact that might be useful, and whose predictive value might even be testable, before it is put to work as a proxy. A proxy is conventionally defined as "a person authorized to act on behalf of another". In this case it is a measure that can be justifiably used in place of another, because that other measure is not readily available.

What would that proxy measure look like? Let's start with an assumption that the more widely dispersed an idea is, the more likely someone will encounter it, if only by chance, and then make some use of it. Let's make a second assumption, that impact is greater when not only is the idea widely dispersed, say amongst 1000 people rather than 100, but when it is dispersed amongst a wide variety of people, not just one kind of people. Combined together, the proxy measure could be described as Availability.

While one can imagine some circumstances where impact will be bigger when the idea is widely dispersed but within a single type of people, I would argue that the success of these more "theory led" predictions will often be outnumbered by serendipitous encounters and impact, especially where there has been large scale dissemination, as will often be the case when research is disseminated via the web. This is a view that could be tested, see below.

How would the proxy measure be measured? As suggested by the assumptions above, Availability could be tracked using two measures. One is the number of references to the research that can be found (e.g. on the web), which we could call Abundance. The other is the Diversity of sources that make these references. The first measure seems relatively simple. The second, the measurement of diversity, is an interesting subject in its own right, and one which has been widely explored by ecologists and other disciplines for some decades now (for a summary of ideas, see Scott Page - Diversity and Complexity, 2001, chapter 2). One simple measure is Simpson's Reciprocal Index (1/D), which combines Richness (the number of species [/ number of types of reference sources]) and Evenness (the relative abundance of species [/ number of references] across those types). High diversity is a combination of high Richness and high Evenness (i.e. all species are similarly abundant). A calculation of the index is shown below:
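As a minimal sketch of one common form of the index, D can be taken as the sum of the squared proportions of references coming from each type of source, with the index being 1/D. The function name and the counts are hypothetical, and this uses the simple proportion-based version of D rather than the finite-sample variant.

```python
def simpson_reciprocal_index(counts):
    """Simpson's Reciprocal Index (1/D), where D is the sum of squared proportions.

    `counts` holds the number of references found per type of source
    (the analogue of the number of individuals per species).
    """
    total = sum(counts)
    if total == 0:
        raise ValueError("at least one reference is needed")
    d = sum((n / total) ** 2 for n in counts if n > 0)
    return 1 / d


# Hypothetical examples: 100 references spread across four types of source.
even_spread = simpson_reciprocal_index([25, 25, 25, 25])
uneven_spread = simpson_reciprocal_index([97, 1, 1, 1])
```

The index ranges from 1 (all references come from one type of source) up to the number of types (references spread perfectly evenly), so the even spread here scores 4 while the uneven spread scores only slightly above 1.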
How could the proxy measure be tested, before it can be widely used? We would need a number of test cases where not only can we measure the abundance and diversity of references to a given piece of research, but we can also access some known evidence of impact(s) of that research. With the latter we may be able to generate a rank ordering of impact, through a pair comparison process - a process that can acknowledge the differences in the kinds of impact. We could then use data from these cases to identify which of the following distributions existed:



We could also compare cases with different combinations of abundance and diversity. It is possible that abundance is all that matters and diversity is irrelevant.
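One way such a comparison might be run is with a rank correlation between the impact ordering and each of the two measures in turn. The sketch below uses Spearman's rank correlation, which is my choice of technique rather than something specified above. All the numbers are invented, `impact_score` is a hypothetical stand-in for the result of the pair comparison process (e.g. the number of comparisons each case won), and the simple formula used assumes there are no tied values.

```python
def ranks(values):
    """Rank values from 1 (largest) to n (smallest). Assumes no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result


def spearman(xs, ys):
    """Spearman's rank correlation, using the no-ties formula."""
    n = len(xs)
    d_squared = sum((a - b) ** 2 for a, b in zip(ranks(xs), ranks(ys)))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))


# Hypothetical data for five pieces of research.
impact_score = [5, 4, 3, 2, 1]            # from pair comparisons (invented)
abundance = [900, 700, 400, 300, 100]     # number of references found
diversity = [2.1, 3.5, 1.2, 3.0, 1.8]     # Simpson's Reciprocal Index

rho_abundance = spearman(impact_score, abundance)
rho_diversity = spearman(impact_score, diversity)
```

In this invented data abundance tracks impact perfectly and diversity only weakly, which is the pattern we would expect to see if abundance were all that mattered.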

Now, does anyone have a set of cases we could look at, to test the propositions outlined above?

Postscript: There are echoes of evolutionary theory in this proposal. Species that are large in number and widely dispersed, across many different habitats, tend to have better long term survival prospects in the face of changing climates and the co-evolution of competitors.


Friday, July 26, 2013

A reverse QCA?



I have been talking to a project manager who needed some help clarifying their Theory of Change (and maybe the project design itself). The project aims to improve the working relationships between a particular organisation (A) and a number of organisations they work with (B). There is already a provisional scale that could be used to measure the baseline state of relationships, and changes in those relationships thereafter. Project activities designed to help improve the relationships have already been identified and should be reasonably easy to monitor. But the expected impacts of the improved relationships on what Bs do elsewhere via their other relationships have not been clarified or agreed to, and in all likelihood they could be many and varied. They will probably be easier to identify and categorise after the activities have been carried out, rather than at any planning stage.

I have been considering the possible usefulness of QCA as a means of analysing the effectiveness of the project. The cases will be the various relationships between A and Bs that are assisted in different ways. The conditions will be different forms of assistance provided as well as differences in the context of these relationships (e.g. the people, organisations and communities involved). The outcome of interest will be the types of changes in the relationships between A and Bs. Not especially problematic, I hope.

Then I thought... perhaps one could do a reverse QCA analysis to identify associations between specific types of relationship changes and the many different kinds of impacts that were subsequently observed on other relationships. The conditions in this analysis would be various categories of observed change (with data on their presence and absence). The configurations of conditions identified by the QCA analysis would in effect be a succinct typology of impact configurations associated with each kind of relationship change, as distinct from the causal configurations sought via a conventional QCA.
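To make the idea a little more concrete, here is a rough sketch of the first step only: for each kind of relationship change, work out how consistently each category of impact was observed afterwards. It is not a QCA (there is no Boolean minimisation), and the change types, impact categories and function name are all invented for illustration.

```python
from collections import defaultdict

# Hypothetical cases: (type of relationship change, impacts observed afterwards)
cases = [
    ("more_trust", {"joint_planning", "shared_data"}),
    ("more_trust", {"joint_planning"}),
    ("more_trust", {"joint_planning", "new_partners"}),
    ("faster_response", {"shared_data"}),
    ("faster_response", {"shared_data", "new_partners"}),
]


def impact_consistency(cases):
    """For each change type, the share of its cases in which each impact was observed."""
    by_change = defaultdict(list)
    for change, impacts in cases:
        by_change[change].append(impacts)

    result = {}
    for change, impact_sets in by_change.items():
        observed = set().union(*impact_sets)  # every impact seen for this change type
        result[change] = {
            impact: sum(impact in s for s in impact_sets) / len(impact_sets)
            for impact in observed
        }
    return result


consistency = impact_consistency(cases)
```

Impacts with a score of 1.0 for a given change type were present in every case of that change, so they would be candidates for inclusion in that change type's impact configuration.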

This reversal of the usual QCA analysis should be possible and legitimate because relations between conditions and outcomes are set theoretic relations, not temporal relationships. My next step will be to find out if someone has already tried to do this elsewhere (that I could learn from). These days this is highly likely.

Postscript 1: The same sort of reverse analyses could be done with Decision Tree algorithms, whose potential for use in evaluations has been discussed in earlier postings on this blog and elsewhere.

Postscript 2: I am slowly working my way through this comprehensive account of QCA, published last year:
Schneider, Carsten Q., and Claudius Wagemann. 2012. Set-Theoretic Methods for the Social Sciences: A Guide to Qualitative Comparative Analysis. Cambridge University Press.