
ab0cd

APPROACH PAPER – GUIDANCE NOTES

Evaluation performance rating system

May 2013

EBRD EVALUATION DEPARTMENT


The Evaluation department (EvD) produces approach papers to guide the conduct of an evaluation and to inform stakeholders of the approach proposed. Although comments are welcome, EvD is the sole decision-maker on the incorporation of comments. An approach paper is the first output and milestone of an evaluation. Approach papers will generally be approved by the Chief Evaluator 2–4 weeks after a study has started, and be based on a preliminary document review and initial internal consultations. Complex studies may require more time before clarity emerges on the objectives, evaluation questions, conceptual basis for the study and methods to be employed.


Contents

Summary

1. Product type

2. Rationale for inclusion in the work programme

3. Principles

4. Opportunities and Issues

Annex 1: Issues and Opportunities

For further details contact: Evaluation department Keith Leonard Tel: +44 20 7338 6721 Email: [email protected]


1. Product type

This paper provides the rationale and approach for preparing a guidance note on the EBRD's evaluation performance rating system. It identifies a number of opportunities to strengthen the existing project performance rating system and extend it to a wider range of evaluation products.

2. Rationale for inclusion in the work programme

The new Evaluation Policy approved by the Board in January 2013 establishes that "execution of the Bank’s evaluation policy on a consistent and systematic basis will require clear guidance on matters such as technical standards and processes". Further, it mandates the Evaluation department (EvD) to prepare such guidance material in the form of guidance notes to "ensure the integrity of the EBRD’s evaluation system by: developing methods and processes for evaluation, in consultation with Board and Management wherever necessary…"

Guidance on the project performance rating system was embedded in previous evaluation policies. It was purposefully not included in the new policy. Rather, matters of a procedural and methodological nature will now be contained in separate guidance notes. This creates a clear separation between strategic and procedural matters, and permits easier updating of technical and procedural guidance without having to revisit the policy.

Aside from project performance rating, EvD does not have codified guidance on performance rating for technical cooperation (TC) projects, or for use in the various types of special studies carried out. For TC projects and special studies, practice has coalesced around using some mix of the OECD-DAC evaluation criteria of relevance, effectiveness, efficiency, impact and sustainability. The absence of formal guidelines on performance rating for these other evaluation products has allowed a range of approaches to be tested and a degree of evolution over time. However, it is now timely to develop a common understanding of which criteria should be used, how rating should be carried out, and how ratings of components should be aggregated into an overall rating. Given the nature of evaluation, a degree of evaluator discretion is essential; but there must also be consistency in the application of both standards and judgment. In all cases such judgments and standards must be transparent. The new guidance will seek to address these and a number of other issues.

3. Principles

It is proposed that the following principles should guide development of a new evaluation performance rating system:

− A rating system should provide a robust measure of performance. While this may seem self-evident, as the appended issues paper discusses, it means including in the overall performance assessment only criteria and sub-criteria that are directly (or largely) attributable to the EBRD and which affect performance. Criteria for which the operation is not solely or largely responsible, or which are a consequence of success rather than a determinant of it, could instead be assessed and rated on a standalone basis, an option to be considered during preparation of the approach paper.

− A rating system should be easily understood with a logical and persuasive linkage between the criteria used and the overall rating.

− To the extent possible, a common framework should apply to all types of evaluation that rate performance. This includes self-evaluations by Management and independent evaluations by EvD, as well as all objects of evaluation: investments, frameworks, technical cooperation, non-TC grants, strategies and policies.

− Application of a rating system should result in a high degree of consistency and comparability across evaluation products and over time. This means finding an appropriate balance between following a standard approach and the exercise of evaluator judgment (the nature and rationale of which should be fully documented)

− A performance rating methodology should follow international good practice standards (principally the Evaluation Cooperation Group and the Evaluation Network of OECD-DAC) to the extent possible to allow comparability of results among international financial institutions (IFIs) and development agencies

− Application of a rating system must be evidence based − guidance should indicate the type of evidence that should be provided

− There should be a close relationship between the ways in which ex-ante expected performance and ex-post actual performance are assessed since ex-post evaluation largely evaluates actual performance against expectations.

− The criteria chosen should reflect the purpose of the EBRD and its unique mandate.

− Ideally, changes made should allow comparability with past evaluations.

4. Opportunities and Issues

Issues and opportunities that may be addressed by the new guidance note (more may emerge during the course of the work) are:

− To rate or not to rate?

− Lack of clarity and consistency in the definition of expected results and how to deal with this in evaluation

− How to assess and rate performance when expected results are unrealistic (whether overly optimistic or pessimistic)

− Dealing with incomplete specification of expected results

− Handling changes of scope during implementation

− Which criteria should we use for the assessment of performance and should we group them?

− Which criteria should we use to derive an overall performance rating and which might be assessed but not used in the derivation of an overall rating?



− How should we deal with the future nature of transition impact at evaluation?

− Should we change the way we evaluate environmental and social performance?

− Clarifications needed on the use of project and company financial performance and the use of FIRR and EIRR (financial and economic internal rates of return)

− Issues surrounding the assessment of the fulfilment of operational objectives

− Issues surrounding the rating of additionality

− How should the Bank's investment performance be measured and should it be included in the overall rating?

− Should Bank handling be included in the overall rating?

− How many benchmarks should be used for each criterion and should descriptors be used, and if so, what descriptors?

− Currently, there is no guidance on the evidence base required − this needs to be addressed

− How to derive the overall performance rating − should a process of weighted scores be used with cut-off points?

Embedded within these is the need to find an appropriate balance between following a standard approach and the exercise of evaluator judgment, and the importance of transparency in how performance assessments have been arrived at. Consideration will be given to current guidance on project performance rating. Recently adopted guidance for self-evaluation will in part foreshadow changes that may be made to the performance rating approach.

There are a number of exercises underway that make the preparation of a guidance note on performance rating methodology timely. First, the Transition Impact Monitoring System (TIMS) is under review as part of the Results Taskforce. The Working Group on TIMS is expected to submit its final report by April 2013. EvD is a member of the working group and is engaged with those responsible for implementation of the recommendations of the Grant Strategic Review. EvD will also stay in close touch with initiatives underway to make country and sector strategies more results oriented. A close dialogue will be required with the Portfolio Management Group.


Annex 1: Issues and Opportunities

This paper identifies 17 issues that will probably need to be taken into account during preparation of the guidance note. The issues are not mutually exclusive, so some overlap and repetition is unavoidable.

1. To rate or not to rate?

While a decision on when to use ratings need not be part of the guidance note, the rating methodology obviously needs to take account of the circumstances in which it might be used. Opinions are divided in the evaluation world as to whether rating is a good or a bad thing. Without going into the details, one's point of view tends to revolve around whether accountability or learning is seen as the principal purpose of evaluation. Rating is generally seen as a necessary part of accountability, but many believe it acts as a hindrance to learning, as the resulting discussion may focus on the rating rather than the findings, lessons and recommendations.

However, the derivation of a performance rating serves other purposes besides being a measure for accountability. It also brings structure, standardisation and transparency to the performance assessment that may not be present without the discipline of rating. Ratings can also focus the attention of the users of evaluation; it may be hard to get their attention without the clear-cut conclusion that a rating provides. For these reasons, the use of a rating system can be beneficial for both individual operations and special studies. It is likely that rating will remain the norm for the evaluation of individual operations. An option that can be considered for special studies is that the decision to rate or not could be made on a case-by-case basis, with the pros and cons argued in the approach paper. It is worth noting that an evaluation can be structured along the lines of a rating system without actually including ratings in the final report.

2. Lack of clarity and consistency in the definition of expected results

Results in the EBRD are specified in a number of ways but particularly in terms of transition impact. Because transition impact is considered to occur at the client, sector or industry and economy-wide levels, it cuts across the spectrum of outputs, outcomes and impacts as defined by OECD-DAC. When efforts were first made (in 1997) to define transition impact more clearly, it was stated as occurring at the level of the economy or society, as the following quote shows: "additionality refers to the Bank’s influence on a project… Transition impact, by contrast, refers to an influence of the project on the economy or society." OCE's website now defines transition impact as "the likely effects of a project on a client, sector or economy, which contribute to their transformation from central planning to well-functioning market-based structures." The new OPA requires transition impact to be identified at the levels of corporate/client and industry (sector)/economy as a whole. Benchmarks used in the transition impact monitoring system have also evolved over time, which has had the effect of widening the scope of transition impact (with more recent clarification on the incorporation of social inclusion). The new guidance will need to clarify the meaning of transition impact for the purposes of evaluation, the levels at which it is expected to occur and the terms to be used.

Beyond this, results are also specified in terms of operational objectives (generally outputs and outcomes), the EBRD investment performance and environmental and social impact (a mix of outputs, outcomes and impact). As the OCE website says, "this also means not everything that is good about a project is necessarily transition impact." A move to organise results under the categories of outputs, outcomes and impact would be a way to establish a clearer hierarchy of results than currently exists. This would assist in performance rating, and indeed in monitoring, since the achievement of outcomes depends on successful delivery of outputs, and the contribution to impact depends upon the successful achievement of outcomes.
By OECD-DAC definition, outputs and outcomes would be results solely attributable to the EBRD's intervention, whereas impact would be considered something the EBRD contributes to but is not solely responsible for. Accepting these definitions would lessen the burden of having to assess and attribute impact for each transaction. Since transition impact as currently defined can take place at the sector and economy-wide levels, we may be trying to measure something at the operation level that is not attributable solely to it. Also, the impact of an individual operation may not be observable at the normal time of evaluation, and/or impact may only be realised as the cumulative effect of a series of transactions. For these reasons, impact as defined under OECD-DAC is sometimes discussed in an evaluation of an individual operation (and maybe even rated, in terms of either actual or likely achievement) but it is not commonly used as a criterion for deriving an overall performance rating.

If EvD moves to using a clear hierarchy of results following OECD-DAC definitions, transition impact could be broken down into its component parts following the OECD-DAC definitions of output, outcome and impact, with ordinarily only the first two elements included in the rating system for individual transactions. The impact part of transition impact (which would mostly be sector or economy-wide) could be discussed in an evaluation of an individual transaction in terms of potential contribution but not included in the operation's performance rating. On the other hand, impact could be a criterion for deriving an overall performance rating for special studies of groups of operations that have taken place over a time period long enough for sector and economy-wide impact to be observable (as may be the case for framework or facility evaluations, and sector and country programme evaluations).

3. Realism in the statement of expected results

Ex-post evaluation as practised in IFIs and the development community more generally is largely (though not entirely) an assessment of results achieved against expectations established at approval. Unexpected results, whether positive or negative, are also taken into account ex-post, but the achievement of expected results is generally the focus of attention and gets the greatest weight. Because ex-ante potential transition impact is used as a hurdle to be cleared for approval (not less than a satisfactory rating), and because this indicator is used in team and corporate scorecards, there is a clear incentive bias toward: (i) unrealistically high expected transition impact; and/or (ii) achievement of transition impact that is highly dependent upon covenants unlikely to be realised (e.g. tariffs set at full cost recovery); and/or (iii) a projected decisive transition impact contribution from technical cooperation that may not receive a high priority once the operation is approved. In addition to the incentive structure, there can be an embedded institutional view that Board approval requires unrealistic claims to be made regarding expected results. This is not to say that people necessarily act on these incentives, but the incentives quite likely exist whether acted on or not.

Theoretically, teams could also face an incentive to be unrealistically pessimistic about expected results, so that it would be easy to meet or exceed expectations. This does not appear to be a significant issue because of the more immediate incentives provided by getting approval. However, there are specific circumstances where low expectations may be accepted at approval for certain aspects of a project; commonly, these may be bank investment performance or environmental and social performance. The Board may decide that a project can be approved with negative projected profitability because other considerations (e.g. high transition potential) outweigh that factor. Or it may approve a project which has no chance of meeting EU environmental standards in the near future (the basic standard set out in the Bank’s Environmental & Social Policy) because the EBRD’s involvement will nevertheless have an important positive effect (significant environmental change). If the project then performs as planned, do we rate these aspects unsatisfactory according to the normal benchmarks, or satisfactory because they are in line with expectations approved by the Board?

The challenge of realism in the specification of expected results is of course outside the purview of a guidance note on performance rating; however, ways of dealing with it do need to be covered.

4. Incomplete specification of expected results

During due diligence, bankers are encouraged to focus on what are expected to be the main sources of transition impact, and subsequent monitoring through the transition impact monitoring system tracks only the benchmarks identified in Operation Reports for these. While selectivity in terms of what transition impact is most likely to occur may make sense for screening purposes, it can understate the achievements. EvD, on the other hand, evaluates all transition impact whether specified ex-ante or not. Projects also have operational objectives (mostly outputs) and in addition may produce a range of environmental and social outputs, outcomes and impacts. These are specified to varying degrees of evaluability in Operations Reports but are seldom drawn together in one place in a hierarchy of results with time-bound and measurable indicators, including target values and baselines (a finding of EvD's Project Performance Metrics study, which rated the evaluability of a sample of recently approved operations). Strategies and policies generally provide little on expected results and how success will be measured.

Poor and/or incomplete specification of expected results ex-ante makes evaluation more difficult and less value-adding than it might be. Evaluators can, and often do, address this common reality by constructing a results framework ex-post from the fragmentary and incomplete information available in order to provide an evaluative framework. This is very much a second-best solution. Clear guidance is needed that, in clarifying expected results, evaluators should not restate them based on their own assessment of what the results should have been. Rather, the evaluator should make clearer what the expected results were and organise these into a hierarchy, along with targets and baselines where these were provided.


5. Changes in scope during implementation

If the scope of the operation changes during implementation, should evaluation be against the original or revised objectives? Currently, there is no guidance on this and practice is probably variable. There are various ways in which this issue might be addressed - the guidance note will need to determine the most appropriate. Two options might be:

i) If a change of scope is required for reasons which should have been foreseen during due diligence (the risk identified and a mitigation strategy planned) then this is a negative for project performance. The negative may be best taken into account by evaluating the project against its stated objectives at approval rather than the revised objectives after the change in scope. If a change of scope is required for reasons that could not reasonably have been foreseen, and if the EBRD's response was timely and appropriate, then performance should be assessed against the revised objectives.

ii) Irrespective of the reason for the change in scope, the evaluation needs to assess against the revised objectives as the change of scope approval is an official change to the objectives. However, the assessment needs to include an analysis of the reason for the change, which, through the exercise of evaluator judgment, can influence the rating given.

6. Current evaluation criteria

EvD uses 7 evaluation criteria for the performance rating of operations (9 criteria if project and company financial performance, and environmental and social performance and change, are counted as separate criteria, which, unlike the 3 TI sub-criteria, are not amalgamated into a single rating). Within the 7 criteria, the overall transition impact criterion is derived from 3 sub-criteria, each with its own rating: realised transition impact, transition impact potential, and risk to transition potential. The criteria are:

i) Transition impact

− Realised TI at time of evaluation

− TI that can still be achieved

− Risk attached to achieving remaining TI

ii) Environmental and social performance and change

− Environmental and social performance of the project and the sponsor

− Extent of environmental and social change

iii) Additionality

iv) Project and company financial performance

v) Fulfilment of project objectives

vi) The Bank's investment performance

vii) Bank handling of the project

Given the EBRD's particular mandate, these criteria are unique to the EBRD, and they provide for a very close alignment between the Bank's mandate and the assessment of its performance. However, a possible downside of unique criteria is that it may be difficult to compare the EBRD's ratings with those done by others for similar projects. The International Finance Corporation (the EBRD's closest comparator) generates an aggregate Development Outcome rating using four criteria:

i) Project/company financial performance (this also considers the fulfilment of the project business objectives)

ii) Project/company economic sustainability

iii) Project/company's contribution to the IFI's mandate objectives (this is where transition impact rating comes in for the EBRD)

iv) Project/company's environmental and social performance

Through IFC influence, the Development Outcome rating and the criteria for determining it have been adopted in the Evaluation Cooperation Group Good Practice Standards (ECG GPS) for the evaluation of private sector operations. However, the EBRD never fully adopted this approach, as it was judged not reflective of the Bank's mandate and unique characteristics. In fact, the EBRD formally opted out of the second criterion as being irrelevant for its circumstances. For some years, EvD did calculate a Development Outcome rating and reported it in its annual evaluation report, but this has never been used as the basis of its overall rating and its reporting on performance trends. Four of the EBRD's criteria can be accommodated under the three ECG criteria that the EBRD accepts (being [i], [iii] and [iv] above). The three not included in the Development Outcome rating are additionality, the Bank's investment performance and Bank handling.

Although it is unlikely that the EBRD will follow ECG GPS as the means of deriving an overall performance rating, the GPS provide detailed and useful guidance on measurement methods for the three criteria, some of which can be used in the new guidance note. The ECG GPS are also vague as to how a Development Outcome (i.e. overall) rating should be derived from the component rating criteria. ECG GPS states "the rating of project outcome reflects summary qualitative performance judgments based on a synthesis of all the following indicator ratings, taking into consideration the sustainability of results." This is rather non-specific and so an unsatisfactory basis for ensuring transparency, replicability and comparability in the derivation of an overall rating (which does not detract from the fact that, at the individual criterion level, the ECG GPS provide useful material for the new EBRD guidance note).

For special studies and technical cooperation, EvD generally uses some mix of the standard OECD-DAC criteria of relevance, effectiveness, efficiency, impact and sustainability for rating purposes, although without guidance on how these are applied in practice. The opportunity exists to group EBRD-specific criteria under a standard set of high-level criteria such as the OECD-DAC criteria: for example, relevance, effectiveness and efficiency could be defined as core criteria for deriving an overall performance rating, with the other criteria of impact and sustainability either forming part of the overall rating or acting as criteria outside the overall performance assessment, as circumstances warrant. As well as being in widespread use in the evaluation world, the OECD-DAC criteria are relatively easy to understand: did we do the right things (relevance), what results were produced (effectiveness), and what was the relationship between results and costs (efficiency)? An added advantage is that EvD already uses the OECD-DAC criteria in evaluation of technical cooperation initiatives and for special studies.

Use of the standard criteria could also draw attention to important aspects not well covered by the current criteria; in particular, a broader consideration of relevance beyond additionality, a more extensive consideration of efficiency, and consideration of the sustainability of results. In suggesting a move to use OECD-DAC criteria for projects as well as for other types of evaluation, the intent would not be to replace the EBRD-specific criteria with the OECD-DAC criteria. Rather, it would be possible to use the OECD-DAC criteria as a means of grouping existing criteria, which in effect would become sub-criteria, either as part of the overall rating or outside it. In fact, discussion has occurred within EvD over many years about moving to the "standard" OECD-DAC evaluation criteria, and several useful efforts have been made to "map" existing criteria to the standard criteria such that some or all of these would become sub-criteria. An example of this mapping is shown below. This could be a useful starting point for reviewing criteria and sub-criteria.


Relevance

− Additionality

Effectiveness

− Fulfilment of project objectives

− Outputs and outcomes associated with transition, environmental and social impact that are largely attributable to the project (mostly client-level transition impact, the extent of environmental and social change, and the environmental and social performance of the project)

− Project and company financial performance

Efficiency

− FIRR and/or other financial ratios demonstrating efficient use of funds

− Bank investment performance

− Bank handling in terms of efficiency of process

Sustainability

− The EBRD's understanding of sustainability differs from that of OECD-DAC. At the EBRD the concept is one of environmental and social sustainability rather than the sustainability of project results (the dimensions being assurance, impact and engagement). However, it would be possible to consider company/financial performance at the time of evaluation under effectiveness, and projected future financial performance as evidence of the sustainability of benefits.

− The currently used idea of remaining transition impact, and the risk to achieving it, could be utilised as part of a sustainability assessment.

Impact

− Transition impact at the sector and economy-wide levels

− Environmental and social performance of the sponsor

7. Overall performance assessment

Current guidance starts by stating that the “overall performance rating is the composite of the following [seven] individual ratings.” It then provides a matrix of ratings for “four major performance indicators” (transition impact, project/company financial performance, fulfilment of objectives, and environmental and social performance), the mix of which is used to derive the overall rating. The policy adds that “Transition impact gets the highest weight with judging the overall performance of an operation” but it does not say what that weight should be. The guidance notes that, “apart from these four major indicators, of course the remaining indicators…also play a role when assigning the overall performance rating, but to a lesser degree define the overall performance outcome of a project.” Again, it is not clear how this is to be done or what weight should be assigned to these other indicators. Either the guidance should be more specific or evaluators should be required to be clear about the approach they have adopted and why.

Current guidance shows four possible combinations of the four principal criteria that can result in an overall highly successful rating, 10 combinations that can give rise to a successful rating, six combinations that would lead to a partly successful rating and four combinations for an unsuccessful rating. The commentary on the table notes that “the combinations of ratings for assigning an overall performance rating in the above table are not exhaustive.” No guidance is given on the weights to be applied or, as noted above, on how and to what extent the other three criteria should be taken into account. Although the current project performance rating system says it provides guidance on the weightings to be used in aggregating the individual criteria into an overall performance rating, it does so only in the most general terms, by saying that transition impact should get the highest weighting.
The 2010 Annual Evaluation Overview presented correlations between overall performance ratings and criteria ratings (among others), showing that the criteria most closely correlated with overall performance (correlation coefficients in brackets) were fulfilment of objectives (0.86), transition impact (0.85), project financial performance (0.73) and bank handling (0.73). The least correlated were environmental change (0.32), environmental performance (0.44) and bank investment performance (0.53). Three of the four most highly correlated criteria are those expected to be the main determinants of overall performance under current guidance; the last of the four core criteria, environmental performance, is less highly correlated. That EvD has felt it necessary to conduct such analysis a number of times suggests a lack of clarity about how evaluators arrive at overall performance ratings, in terms of the relative weight given to the various criteria. Standard weights (with discretion available to vary these in a transparent manner) would overcome this problem. The system for determining an overall rating is less clear than it should be, which may result in (i) a lack of transparency, (ii) difficulty in replicating the result, and (iii) difficulty in analysing underlying


trends in overall ratings by looking at trends in aggregate criteria ratings. While a degree of evaluator judgment is necessary and desirable given the nature of evaluation at the EBRD (things are often not clear cut and the evidence base is frequently patchy), too much discretion (or the exercise of evaluator discretion without explanation) risks undermining consistency in the derivation of ratings and transparency about how overall ratings are arrived at. Readers should be able to follow the logic of how a rating was derived, and have the opportunity to consider the evidence on which it is based, if they are to have confidence in the rating. Other evaluators should be able to replicate the rating using the same approach, and to discern a clear relationship between overall performance ratings and aggregate criteria ratings. The system for deriving overall performance ratings for technical cooperation and in special studies is undocumented and is largely driven by past practice, which has evolved over time. This gives rise to the same problems as exist for project performance assessments. Turning now to issues surrounding EvD's current project performance evaluation criteria (not including benchmarks, which are addressed more generally as Issue 15):

8. Handling remaining transition impact potential at evaluation

According to the current guidance, overall transition impact will be based on an assessment of the short term realised transition impact, the longer term transition impact potential that can still be realised, and a risk rating attached to the latter. There are a number of issues here.

i) First, has project/company-level transition impact been adequately accounted for under project and company financial performance (unlikely, since many sources of transition impact, for example improved corporate governance, would not be captured in financial performance within the evaluation timeframe) or under fulfilment of operational objectives (possibly)?

ii) Second, should transition impact at the industry and economy levels be considered as part of the performance rating of an individual operation, since these impacts are unlikely to have been achieved by the time of evaluation and/or may take more than a single transaction to achieve? The possibility of having so much transition impact achievement still in the future, and subject to a risk assessment, is problematic for what is supposed to be an ex-post performance assessment (the results


should be observable when the evaluation is done, at least by proxy measures if not directly). A partial solution might be to do evaluations later than has typically been the case at the EBRD (particularly EvD's independent evaluations; this could be achieved by leaving a greater gap between self-evaluation and EvD's in-depth evaluation). While the option to do evaluations later remains on the table, it is not proposed to consider it in the performance rating guidance note, which is intended to focus solely on the rating methodology.

iii) Third, the policy does not give guidance on how evaluator judgment should be exercised in reaching a conclusion on overall transition impact. What weights should be given to the achievement (or non-achievement) of expected versus unexpected transition impact, and how should the realised, remaining potential and risk to potential transition impact be combined into an overall rating? If the current approach to rating transition impact is to be retained, clearer guidance on this, including the possible use of weights, would be desirable.

iv) Fourth, current guidance requires evaluators to look at the relevance of the transition impact benchmarks selected during preparation. Relevance of the expected transition impact is probably best assessed separately from the actual results because of the difficulty of making trade-offs between relevance and results (for example, modest achievement of highly relevant results versus significant achievement of less relevant results). If we adopt the first-level criteria of relevance, effectiveness and efficiency, among others, the relevance of transition impact benchmarks would be more appropriately considered under relevance.
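On the question in (iii) above of how realised impact, remaining potential and risk might be combined, one possibility, purely illustrative and not EvD policy, is to discount the remaining potential by the risk of non-achievement and take a weighted sum with realised impact. The weights and scales below are hypothetical.

```python
# Purely illustrative sketch: remaining transition impact potential is
# discounted by the risk that it will not be achieved, then combined
# with realised impact using hypothetical weights. All inputs are on a
# 0-1 scale; "risk" is the probability that the potential is NOT realised.
def overall_transition_impact(realised: float,
                              remaining_potential: float,
                              risk: float,
                              w_realised: float = 0.6,
                              w_potential: float = 0.4) -> float:
    assert 0.0 <= risk <= 1.0
    expected_potential = remaining_potential * (1.0 - risk)  # risk discount
    return w_realised * realised + w_potential * expected_potential
```

For example, with realised impact of 0.8, remaining potential of 0.5 and a 40 per cent risk, the combined score is 0.6. Any scheme of this kind would of course need the weights and the risk treatment to be settled in the guidance note itself.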

9. Environmental and social performance

Current guidance requires two separate assessments under environmental and social performance. The separate ratings are for “environmental and social performance of the project and sponsor” and “extent of environmental change.” These are not amalgamated into a single rating as is the case for the three dimensions of TI. If it is decided to retain the two separate ratings, the possibility of deriving a single overall environmental rating should be considered. Current guidance is not particularly clear on the difference between these two assessments. Environmental and social performance of the project and sponsor "measures how well the environmental and social objectives of the project … have been met" while the extent of environmental change "is measured as the difference between the


environmental and social performance before the project started and its performance at the time of evaluation." In practice the difference is clearer: the former is about meeting a set of standards (generally EU standards) while the latter is more of a before-and-after assessment, irrespective of whether standards have been met. The extent of environmental and social change assessment has its roots in the legacy-type projects the EBRD frequently engages in; the intent was to give recognition to environmental achievements even where these did not fully comply with environmental standards. The rationale for two separate assessments is not strong: it is inherently unsound to set objectives that cannot be met and then devise a criterion to countervail the negative assessment of failing to meet impossible objectives. The new performance assessment guidance note will revisit how environmental performance/change is assessed and rated.

A further issue is that social dimensions are becoming more important to the EBRD, so the rationale for including them under environmental and social performance (in which the environmental element dominates) is perhaps weaker than it was. Issues such as gender and social inclusion are rarely addressed in EvD or self-evaluations, though they form part of the standard client environmental reporting proforma. This probably needs to change as these issues come more to the fore in the Bank's work. There is also a wider issue: environmental and social results are results like any others, that is, outputs, outcomes and impacts of what the EBRD does. The question therefore arises whether they should be addressed along with other results (TI and achievement of project objectives) under the effectiveness, efficiency and impact criteria rather than as a separate rating. Again, the new guidance note will need to come to a conclusion on this. Another issue is that current guidance does not differentiate between projects of differing environmental categories.
This is something the guidance note should also consider.

10. Project and company financial performance

Current guidance lacks clarity on when project financial performance, company financial performance, or both should be assessed. Again, this has been clarified in the new OPA guidance, which states that "in most cases, you should analyse and rate either project financial performance or company financial performance" and goes on to outline the circumstances in which each is appropriate. This could be carried through to the new guidance note. Current guidance talks about FIRR and EIRR, but these are not often used now. The new guidance should consider whether to drop mention


of these or retain them and, if retained, how to incorporate them into a rating framework (they are generally considered under efficiency, being measures of the efficiency of funds use).

11. Fulfilment of operational objectives

The current policy defines this criterion as “the extent of verified and expected risk weighted fulfilment of the operation’s process and project objectives (efficacy) upon validation of their relevance.” There are several issues here for consideration as the new guidance is developed.

i) First, by assessing only against project objectives, the evaluator is not directed to consider unintended results whether positive or negative.

ii) Second, it is best not to combine relevance and results under the same criterion (as noted above) as it is not clear how to make trade-offs between relevance and results (more but less relevant results versus less but more relevant results for example).

iii) Third, since the statement of project objectives is likely to reflect the outputs and perhaps outcomes of an operation, if achievement of these is still in the future we may be conducting independent evaluation too early (as noted above, it is proposed to cover the timing of evaluation separately).

iv) Fourth, process objectives may best be assessed under efficiency (of process).

v) Fifth, within OECD-DAC definitions efficacy is a synonym for effectiveness (one which derives from medical experimentation). It is probably best to stick with the single term of effectiveness.

12. Additionality

Current guidance identifies two elements of the EBRD's additionality: the additionality of its financing, and whether the Bank "can influence the design and functioning of a project to secure transition impact." These sources of additionality are supported by the current review. However, there are methodological issues surrounding how they are measured.

i) First, guidance states "in judging additionality at evaluation one tries to verify whether the Bank was additional or not at the time the project was financed by the Bank." What this is saying, albeit maybe not as clearly as it might, is that we should not be using the benefit of hindsight about how things turned out in practice.


This certainly makes sense for the additionality of financing, since it is impossible to predict how the market will play out (for example, the onset of the global financial crisis). For rating purposes, it is sensible for evaluation to assess only whether the claims made for the additionality of financing at approval were justified. Mention can of course still be made of whether the claims made for the additionality of EBRD financing proved correct in practice.

ii) Second, it is appropriate to assess the plausibility of the claims made at approval for EBRD additionality in securing transition impact, and whether these claims were borne out.

iii) Third, does it make sense to have degrees of additionality? The Bank was either additional or not - as current guidance notes “there is a critical level of conditions above which a project becomes and remains additional. In judging additionality at evaluation one tries to verify whether the Bank was additional or not at the time the project was financed by the Bank.” However, we currently have four degrees of additionality in the benchmarks.

iv) Fourth, what conclusion should be reached on project performance if it is judged that the Bank was not additional at the time of approval? Should the operation automatically be rated unsuccessful?

v) Fifth, current guidance seems to provide for some degree of double counting. Under the assessment of TI “EvD will also question whether the most relevant TI criteria/objectives were selected…” while under additionality the assessment will determine “whether the Bank can influence the design and functioning of a project to secure TI." The potential for double counting should be addressed in the new guidance.

As noted in work being carried out by a consultant as input to the 2012 Annual Evaluation Review, guidance in the Bank's Operations Manual identifies four dimensions, as shown in the following table:

Table 1: Additionality dimensions recorded in an Operation Report

Additionality dimension | Verification and/or counterfactual results | Timing

Terms | Market (country and segments) benchmarks [annex on capital markets review] | e.g. already achieved as a result of project preparation

EBRD attributes | Preferred creditor status, political risk carve-out, dialogue with federal or local governments, regional relationship with sponsor, experience in country, sector or with innovative financial instrument… | e.g. before signing

Conditionalities | Corporate governance standards, board representation, procurement, environment… | e.g. during implementation

Commercial mobilization | Syndication, local parallel financing, underwriting IPO arrangements, co-financing from private equity funds… | Others…please seek to indicate precise dates

Source: EBRD Operations Manual Section 1.5.4

Use of this ex-ante assessment framework for ex-post evaluation, so as to harmonise the ex-ante and ex-post assessments, could be considered. Subsuming aspects of additionality under a relevance criterion and following the ex-ante assessment guidelines more closely could be one way of dealing with the above problems. Alternatively, a case could be made for rating additionality but not including it in the overall project performance rating. The rationale here is that additionality may not influence project performance much (particularly in the case of the additionality of financing), or that the Bank's success in leveraging TI and other results should be picked up under other overall rating criteria, particularly effectiveness.
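One way to harmonise ex-ante and ex-post assessment would be to record each Operations Manual dimension in a common structure, with the ex-ante claim alongside the evaluator's later finding. The sketch below is hypothetical; the field names are illustrative and not drawn from any EBRD system.

```python
# Hypothetical sketch of a shared record for ex-ante and ex-post
# additionality assessment. Field names are illustrative only.
from dataclasses import dataclass
from typing import Optional

# The four dimensions identified in the Operations Manual table.
DIMENSIONS = ("terms", "EBRD attributes", "conditionalities",
              "commercial mobilization")

@dataclass
class AdditionalityRecord:
    dimension: str                  # one of DIMENSIONS
    ex_ante_claim: str              # claim recorded at approval
    timing: str                     # e.g. "before signing"
    # Ex-post: was the claim justified at the time of approval?
    # None until the evaluation is carried out.
    verified_at_evaluation: Optional[bool] = None

    def is_open(self) -> bool:
        """True while the ex-post verification has not yet been done."""
        return self.verified_at_evaluation is None
```

Filling the same record at approval and again at evaluation would keep the two assessments directly comparable, dimension by dimension.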

13. The Bank’s investment performance

There are a number of issues surrounding this criterion that the new guidance note could consider. First, is it relevant to include this criterion as a measure for assessing the performance of an individual operation since it is an outcome of project success and not a determinant of it? Second, is it relevant from an evaluation perspective to assess the Bank’s investment performance on a transaction-by-transaction basis? The Bank may have very valid reasons for funding small projects, projects in early transition countries, and other types of projects that may result in poor performance under this criterion. Should we give negative ratings for projects that are only so-rated because they are delivering on a particular policy objective? Third, questions surround the adoption of an arbitrary “twice the direct costs” figure because indirect costs cannot be attributed to individual transactions. Does this render the performance measure largely invalid? It is understood that this issue is under review by Management so the guidance note will need to take account of changes made. Fourth, does it make sense to use appraisal estimates as the benchmark for determining the rating? The Audit Committee has


previously rejected the use of hurdle rates of return as a basis for assessment, but it may be worth revisiting this. For these reasons, perhaps Bank investment performance should be a criterion that is rated but not included in the overall performance rating. The pros and cons of this will be considered during preparation of the guidance note.

14. Bank handling

Bank handling and sponsor performance can help explain performance, so it is appropriate to include consideration of these in an operations evaluation. However, is it double counting to have bank handling as a rating criterion for the overall performance rating, since it will already be reflected in the assessments of the relevance, effectiveness and efficiency of the operation? The same goes for sponsor performance. Would a better alternative be to rate bank handling (and possibly sponsor performance) but not include it in the overall performance rating? Rating bank handling and sponsor performance outside the overall performance rating could still contribute to explaining the reasons for that rating. Turning back now to some more general issues:

15. Benchmarks for criteria rating

There are a number of discrete issues under the broader one of benchmarks for deriving criteria ratings as provided for in current guidance. First, seven of the nine criteria/sub-criteria have six rating categories with benchmarks for each, while two (extent of environmental and social change, and additionality) have four. Where six rating categories are provided for, they are: excellent, good, satisfactory, marginal, unsatisfactory and highly unsatisfactory (the last category is termed negative for transition impact). Experience has shown that six rating categories provide an unnecessary degree of detail. Four rating categories are generally considered sufficient, of which two are the main categories "above and below the line": one for where expectations were met and another for where there were achievements against expectations but with significant shortfalls. Two further categories cover the situations where expectations were significantly exceeded and where they fell well short. Second, the bottom rating of the six is almost never given, so effectively EvD has three benchmarks "above the line" and two "below the line", which is not good practice.


Third, the descriptors used imply value judgments: excellent, good, satisfactory, marginal, unsatisfactory and highly unsatisfactory. It would perhaps be better to be more neutral, using terms such as "expectations exceeded", "expectations met", "expectations partly met" and "expectations not met". Other alternatives exist; for example, each criterion could have descriptors reflecting the criterion name, such as "highly relevant", "relevant", "partly relevant" or "irrelevant". A further option is to drop descriptors altogether and just use a numeric score. Fourth, does it make sense to base assessments solely on being better or worse than appraisal estimates, since appraisal estimates may be flawed in some way (overly optimistic, too conservative or just wrong in some respect)? Would it be better to check the robustness of the appraisal estimate and make comparisons against benchmarks such as hurdle rates of return or cost of capital (recognising that the EBRD does not currently establish either), or against levels set arbitrarily by EvD for the purposes of rating?

16. Absence of guidance on the evidence base required

Current guidance does not specify the types of evidence that should be provided in support of ratings. The Evaluation Cooperation Group’s Good Practice Standards provide useful guidance here that the new EBRD guidance note can draw on.

17. Deriving the overall performance rating

How should the individual criterion ratings be aggregated into an overall rating? As noted in Issue 6, current guidance for deriving an overall rating may not ensure a consistent approach is followed. If change is judged to be necessary, at least two options could be considered:

− A weighted score approach, whereby each criterion is rated on a four-point scale from 3 (highest) to 0 (lowest), a total weighted score is calculated, and cut-off points determine the overall rating categories of highly successful, successful, partly successful and unsuccessful.

− A sequential system, whereby the criteria are looked at in turn and a decision tree followed to determine the overall rating. For example, if an operation is rated irrelevant it is automatically rated unsuccessful, and the other criteria are not taken into account in the overall rating (though they are still assessed). This is a more logical approach, as it avoids the double counting inherent in the weighted score approach, but it would represent a big


change from current practice, so it is probably not a viable option.

Implicitly, EvD has been following a weighted score approach, even if it was not clear what weights were being applied, and the weights applied may have varied considerably by evaluator. It is suggested that weights be made explicit in the new guidelines, with provision for variation among different evaluation products and for the reasoned and transparent exercise of evaluator judgment.
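The weighted score approach can be sketched as follows. The weights and cut-off points are hypothetical placeholders, not values proposed here; effectiveness (which, under the grouping discussed earlier, would subsume transition impact) is given the largest weight to reflect current policy.

```python
# Illustrative weighted-score aggregation. Weights and cut-offs are
# hypothetical; the guidance note would need to set the actual values.
WEIGHTS = {"relevance": 0.2, "effectiveness": 0.5, "efficiency": 0.3}  # sum to 1.0

CUTOFFS = [  # (minimum weighted score on the 0-3 scale, overall rating)
    (2.5, "highly successful"),
    (1.5, "successful"),
    (0.75, "partly successful"),
    (0.0, "unsuccessful"),
]

def overall_rating(scores: dict) -> str:
    """scores: criterion -> integer 0 (lowest) to 3 (highest)."""
    total = sum(WEIGHTS[c] * scores[c] for c in WEIGHTS)
    for floor, label in CUTOFFS:  # descending order of floors
        if total >= floor:
            return label
    return "unsuccessful"
```

For example, scores of 3 for relevance and 2 for both effectiveness and efficiency give a weighted total of 2.2, falling in the "successful" band. Making the weights and cut-offs explicit in this way is what would allow ratings to be replicated and trends analysed.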


European Bank for Reconstruction and Development
One Exchange Square
London EC2A 2JN
United Kingdom

Switchboard/central contact
Tel: +44 20 7338 6000
Fax: +44 20 7338 6100

Information requests
For information requests and general enquiries, please use the information request form at www.ebrd.com/inforequest

Evaluation department
Tel: +44 20 7338 6467
Fax: +44 20 7338 6726
Email: [email protected]

Web site
www.ebrd.com