By Declan Hunt

Over the last two decades, worldwide investor interest in socially responsible investing (SRI) has grown significantly. SRI has evolved rapidly from negative screening, which removes firms operating in socially undesirable industries from consideration, to the assessment of a wide range of factors specific to the individual firm. In choosing a socially responsible portfolio, today’s investor will also consider non-financial issues such as the company’s environmental, social, and corporate governance (ESG) aspects.

Across developed markets, sustainability reporting remains at the discretion of the organisation; however, recent trends indicate ongoing increases in both the percentage of firms presenting sustainability reports and the level of detail those reports contain, largely in response to investors’ growing adoption of SRI. Nevertheless, evaluating the performance of individual firms across ESG aspects is research-intensive, and this has given rise to a number of institutions offering ESG ratings. Unlike financial ratings, ESG scores are derived without any consensus methodology or regulation.

Concerningly, a 2015 study by Dorfleitner, Halbritter, and Nguyen finds a lack of similarity between ratings produced by different measurement methodologies, whereas Romero, Jeffers, Lin, Aquilino, and DeGaetano (2018) find strong similarity across a different sample of rating methodologies. This article summarises each paper and examines the methodologies used by rating agencies to better understand why ESG scores differ. It also aims to encourage readers to critically evaluate any ESG rating system they use in personal investing.

Dorfleitner, Halbritter, and Nguyen examine three rating databases: ASSET4 by Thomson Reuters, Kinder Lydenberg Domini & Co. (KLD) by MSCI, and Bloomberg Sustainability. ASSET4 began rating ESG aspects of 1000 firms in 2002 and has since grown to assess over 4300 companies across the S&P 500, Russell 1000, MSCI Europe, FTSE 250, ASX 300, and other world indices. The KLD ratings cover the largest 3000 US firms by market capitalisation. Bloomberg claims to have assessed over 20000 companies; however, data is provided for only 4100 companies across 52 nations. In most databases, a firm’s ESG credentials are captured in a single score that reflects sub-criteria scores in each of the environmental, social, and governance aspects. The ASSET4 rating is unique in including an additional ‘economic’ aspect, which assesses the firm’s ability to generate long-term value through best management practices. Scores are derived from a mix of positive and negative indicators drawn from firms’ reports and other public information.

In the environmental aspect, agencies largely rate companies on the same metrics: emissions, water, waste, resource reduction, biodiversity, and impact of operations. However, differences do exist. Most prominently, ASSET4 is the only agency to consider animal testing, whereas only Bloomberg and KLD consider the legality of environmental practices.

The social aspect accounts for approximately half of all data points for each agency. Broadly, all agencies assess the same areas: employment quality, health & safety, diversity & opportunity, human rights, and product responsibility. However, they measure each of these areas quite differently. KLD evaluates health & safety through two indicators covering the firm’s OH&S programs and related controversies, whilst Bloomberg and ASSET4 include detailed ratings of specific industry-relevant policies and the number of reported workplace accidents. Another significant difference is that KLD does not consider employee training and development in the social aspect, whereas ASSET4’s data captures the indirect impact of further training on local communities. ASSET4’s data additionally allows it to rate compliance with political donation laws, levels of business ethics, and tax controversies, none of which are monitored by the other providers.

The governance aspect has the fewest broadly assessed areas shared across providers: reporting quality, public policy, governance structure, and ethics controversies. Bloomberg and ASSET4 also consider shareholder and stakeholder engagement, and board function and structure. KLD does assess the firm’s board; however, this contributes to its social aspect score.

ASSET4 forms ratings from 850 binary data points, which are then aggregated into over 250 ESG KPIs. The KPIs are combined into 18 category scores, which are components of the four “pillars” assessed. Each pillar is scored between 0 and 100, where 100 represents the best score. A score for each pillar and a combined total score are presented. The total ESG rating is an equally weighted average of the four pillars. The scores are then normalised from 0 to 100 and benchmarked across the complete library of 4300+ companies.
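The roll-up from data points to a benchmarked total can be made concrete with a short sketch. This is not Thomson Reuters’ implementation: the nested data layout, the use of unweighted means at every level, and min-max rescaling as the “benchmarking” step are all assumptions made for illustration.

```python
from statistics import fmean

# Hypothetical layout (assumption): pillar -> category -> KPI -> list of binary (0/1) data points.
Firm = dict[str, dict[str, dict[str, list[int]]]]


def pillar_scores(firm: Firm) -> dict[str, float]:
    """Roll binary data points up into KPI, category and pillar scores on a 0-100 scale.

    Every level uses an unweighted mean; ASSET4's actual weights are not assumed known here.
    """
    scores = {}
    for pillar, categories in firm.items():
        category_scores = []
        for kpis in categories.values():
            # A KPI's score is the share of its data points equal to 1, scaled to 0-100.
            kpi_scores = [100 * fmean(points) for points in kpis.values()]
            category_scores.append(fmean(kpi_scores))
        scores[pillar] = fmean(category_scores)
    return scores


def total_score(firm: Firm) -> float:
    """Equally weighted average of the pillar scores (four pillars for ASSET4)."""
    return fmean(pillar_scores(firm).values())


def normalise(totals: dict[str, float]) -> dict[str, float]:
    """Assumed benchmarking step: min-max rescale raw totals to 0-100 across all rated firms."""
    lo, hi = min(totals.values()), max(totals.values())
    if hi == lo:
        return {name: 50.0 for name in totals}  # no spread: place every firm mid-scale
    return {name: 100 * (t - lo) / (hi - lo) for name, t in totals.items()}
```

Because each pillar carries a quarter of the weight regardless of how many data points sit beneath it, a pillar built from few indicators (such as the ‘economic’ pillar in the hypothetical test data) can move the total as much as one built from hundreds.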

Bloomberg incorporates over 100 data points in forming its overall “ESG Total Disclosure Score”. The combination of environmental, social, and governance disclosure scores is tailored to each sector so that firms are evaluated on the sub-criteria most relevant to their industry. Disclosure scores range from 0.1 for companies that disclose the minimum amount of ESG data to 100 for companies that disclose every sustainability data point collected. Although Bloomberg’s rating measures only the firm’s level of disclosure, Eccles et al (2014) show that firms with high ESG performance also disclose considerably more non-financial information. Accordingly, Dorfleitner et al used ESG Total Disclosure Scores as proxies for ESG ratings in their evaluation.
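A disclosure score measures how much a firm reports, not how well it performs. The simplified sketch below assumes every sector-relevant field carries equal weight (Bloomberg’s real weighting is not reproduced), ignores fields outside the firm’s sector template, and leaves a firm that discloses none of the applicable fields unscored; the field names in the test are hypothetical.

```python
from typing import Optional


def disclosure_score(disclosed: set[str], applicable: set[str]) -> Optional[float]:
    """Simplified disclosure score on a 0.1-100 scale.

    Counts only the fields that apply to the firm's sector, weights them equally (an
    assumption) and leaves a firm that discloses none of them unscored (None).
    """
    reported = disclosed & applicable  # fields outside the sector template are ignored
    if not reported:
        return None
    # Share of applicable fields disclosed, scaled to 100 and floored at 0.1.
    return max(0.1, 100 * len(reported) / len(applicable))
```

For example, a firm whose sector template has four applicable fields and which reports two of them scores 50.0, however strong or weak the reported figures are. That is exactly why using the score as a performance proxy rests on the empirical link reported by Eccles et al.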

KLD forms ESG ratings from binary indicators across seven ESG-related groups: environmental, governance, community, human rights, employee relations, diversity, and customers. Company information and news are classified as either positive impacts (strengths) or negative impacts (concerns); an indicator scores 1 when such information exists and 0 when it is absent. KLD has historically used between 62 and 80 indicators for components of the ESG rating, and used 70 as of 2012. Unlike ASSET4 and Bloomberg, KLD does not provide aggregate scores for either sub-criteria or a total ESG rating.

To allow for comparison, KLD scores require aggregation, and two approaches have become popular in academic work. The first simply subtracts the sum of negative impacts from the sum of positive impacts; however, because the number of indicators varies across years, the usefulness of this method is limited. The alternative approach, developed by Kempf and Osthoff (2007), transforms concerns into strengths by taking the opposite binary value. To generate sub-criteria scores, the binary indicators are then totalled and normalised from 0 to 100 so that the results are in line with other methodologies. A similar approach is used to generate the total score.
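The two aggregation approaches can be sketched side by side. The net score follows the description directly; for the Kempf and Osthoff style score, this sketch assumes the 0 to 100 normalisation divides the indicator total by the number of indicators, which is one plausible reading rather than the exact procedure used in the papers.

```python
def net_score(strengths: list[int], concerns: list[int]) -> int:
    """First approach: number of strengths minus number of concerns.

    Unbounded, and its range shifts whenever the number of indicators changes.
    """
    return sum(strengths) - sum(concerns)


def kempf_osthoff_score(strengths: list[int], concerns: list[int]) -> float:
    """Kempf and Osthoff (2007) style: flip each concern so that 1 means 'no concern',
    total all indicators, then rescale to 0-100 by the indicator count (assumed normalisation)."""
    flipped = [1 - c for c in concerns]
    indicators = strengths + flipped
    return 100 * sum(indicators) / len(indicators)
```

With strengths `[1, 0, 1]` and concerns `[0, 1, 0]`, the net score is 1 and the rescaled score is about 66.7. Adding two more strength and concern indicators that are all 0 leaves the net score at 1 but moves the rescaled score to 60, showing how the choice of method interacts with a changing indicator list.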

To test convergence between rating methods, Dorfleitner et al begin by presenting descriptive statistics for the three methods. Given the different methodologies used by each provider, it is unsurprising that the distributions of ratings show no similarity. ASSET4 ESG scores exhibit a bimodal distribution, with concentrations of higher and lower values; Bloomberg ESG scores exhibit a right skew; and KLD ESG scores (under both approaches) are largely concentrated in the range of 60 to 80. The distributions of the scores are shown in Figure 1, and descriptive statistics are presented in Table 1. A random effects panel model confirms the descriptive analysis, finding significant differences in the average ESG score across the three rating methodologies.

Figure 1: Distribution of ESG scores
Table 1: Descriptive statistics of Dorfleitner et al full sample
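The shape comparison above can be partly quantified. As a hedged illustration rather than Dorfleitner et al’s own code, the right skew noted for Bloomberg scores corresponds to a positive moment-based skewness coefficient. A bimodal shape such as ASSET4’s is better seen in a histogram, since skewness alone does not detect two peaks.

```python
from statistics import fmean


def skewness(scores: list[float]) -> float:
    """Moment-based (Fisher-Pearson) sample skewness: positive means a long right tail.

    Requires scores that are not all identical (otherwise the variance is zero).
    """
    m = fmean(scores)
    m2 = fmean([(s - m) ** 2 for s in scores])  # second central moment
    m3 = fmean([(s - m) ** 3 for s in scores])  # third central moment
    return m3 / m2 ** 1.5
```

A sample with most scores bunched at low values and a few high outliers, as described for Bloomberg, gives a positive result; a symmetric sample gives roughly zero.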

To allow for changes in rating methodologies over the sample period, the average ESG ratings of companies in the sample are also compared. Table 2 contains descriptive statistics for each firm’s mean ESG score over the sample period, for all agencies. The mean values are not noticeably different from those of the full sample; however, there are notable differences in the standard deviation, minimum, and maximum values. Consistent with differences in the amount of data collected to form ratings, ASSET4 has the largest sample standard deviation, followed by Bloomberg and then KLD. The standard deviations of firms’ annual average scores were also computed for each agency: ASSET4 showed the lowest year-to-year fluctuation, with standard deviations between 1.26 and 2.29, compared with 2.71 to 6.26 for Bloomberg and 8.03 to 12.51 for KLD, the latter reflecting changes to KLD’s number of indicators and methodology.

Table 2: Descriptive statistics of cross-sectional data
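The firm-level averaging behind Table 2 can be sketched as follows. The panel layout (firm mapped to a dictionary of year and score) is an assumption for illustration; the idea is simply to average each firm over the years it is rated and then describe those averages across firms.

```python
from statistics import fmean, stdev

# Hypothetical panel layout (assumption): firm -> {year: ESG score}.
Panel = dict[str, dict[int, float]]


def firm_averages(panel: Panel) -> dict[str, float]:
    """Average each firm's scores over the years in which it was rated."""
    return {firm: fmean(by_year.values()) for firm, by_year in panel.items() if by_year}


def cross_section_summary(panel: Panel) -> dict[str, float]:
    """Descriptive statistics of the firm-level averages (needs at least two firms)."""
    avgs = list(firm_averages(panel).values())
    # stdev is the sample (n - 1) standard deviation.
    return {"mean": fmean(avgs), "sd": stdev(avgs), "min": min(avgs), "max": max(avgs)}
```

Averaging over years smooths out methodology changes within a firm’s history, which is why the means in Table 2 stay close to the full sample while the spread statistics differ.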

To determine whether the different ratings converge, correlations between the scores were analysed. Because of the panel structure of the data, correlations could not be computed directly; instead, the cross-sectional correlations among all rating types were first calculated for each year. A sub-sample containing the same companies across ratings was then created, and years with data for fewer than 300 overlapping companies were omitted. Finally, pairwise correlation tests were conducted, and the mean of the annual correlations was computed. Results are presented in Table 3.

Table 3: ESG Rating Correlations
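The year-by-year procedure can be sketched for one pair of agencies. This assumes a simple layout (year mapped to firm scores), intersects the firms rated by both agencies within each year, and omits the significance tests; it illustrates the approach rather than reproducing the authors’ code. The 300-firm cutoff is a parameter so the tests can use tiny samples.

```python
from math import sqrt
from statistics import fmean
from typing import Optional

# Hypothetical layout (assumption): year -> {firm: score} for one rating agency.
Ratings = dict[int, dict[str, float]]


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length samples."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def mean_annual_correlation(a: Ratings, b: Ratings, min_overlap: int = 300) -> Optional[float]:
    """Correlate two agencies' scores year by year on the firms both rate, skip years
    with fewer than min_overlap common firms, and average the remaining annual correlations."""
    annual = []
    for year in sorted(a.keys() & b.keys()):
        common = sorted(a[year].keys() & b[year].keys())  # firms rated by both agencies that year
        if len(common) < min_overlap:
            continue  # too few overlapping firms for a reliable cross-section
        annual.append(pearson([a[year][f] for f in common], [b[year][f] for f in common]))
    return fmean(annual) if annual else None
```

Computing a correlation within each year and then averaging avoids pooling observations of the same firm across years, which would overstate the effective sample size.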

Total ESG scores for each provider are highly correlated with their own sub-scores, with correlations between 0.41 and 0.93, all significant at the 1% level. The governance pillar is generally the least strongly correlated with the overall score and with the other pillars within each agency. Between agencies, ASSET4 and Bloomberg show the greatest similarity, with significant positive correlations across sub-scores and the total ESG rating. KLD shows little relation to the other providers, particularly ASSET4, with which the correlation of environmental sub-scores is almost non-existent. Further analysis shows that these differences cannot be explained by an affine transformation, leading Dorfleitner et al to conclude that the ESG ratings of different agencies largely do not coincide and should not be used as a comparison tool by investors.
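An affine transformation here means that one agency’s score could be written as `a + b * x` of another agency’s score `x`, so the two would differ only in level and scale. One simple way to check this, not necessarily the procedure either paper used, is a least-squares fit of that form: an R² close to 1 means an affine mapping explains the scores well. For a single predictor with an intercept, R² equals the squared Pearson correlation, which is why low correlations rule out an affine relationship.

```python
from statistics import fmean


def fit_affine(x: list[float], y: list[float]) -> tuple[float, float, float]:
    """Least-squares fit of y ≈ a + b * x; returns (a, b, r_squared).

    Requires x to vary and y to vary (otherwise the denominators are zero).
    """
    mx, my = fmean(x), fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx          # slope: how one agency's scale maps onto the other's
    a = my - b * mx        # intercept: difference in level
    ss_res = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    return a, b, 1 - ss_res / ss_tot
```

Scores of `[12, 14, 16, 18]` against `[1, 2, 3, 4]` fit exactly with intercept 10 and slope 2: the agencies disagree on absolute values but rank firms identically.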

Romero, Jeffers, Lin, Aquilino, and DeGaetano also examine three databases: RobecoSAM (now owned by S&P Global), Sustainalytics by Morningstar, and Bloomberg Sustainability.

Sustainalytics covers over 12000 firms across the majority of global indices, with data for over 4000 firms available to non-subscribers. Sustainalytics incorporates over 250 data points, aggregated into 20 material ESG issues. Issues are evaluated on the firm’s preparedness to manage ESG issues, the standard of the firm’s voluntary non-financial disclosures, the firm’s performance in meeting internal and external quantitative targets, and a qualitative assessment of the firm’s involvement in controversies. Sustainalytics also engages directly with companies to collect more data on ESG matters. A final “ESG Risk Rating” between 0 and 100 is then generated, where 100 represents the largest amount of risk. To allow for comparison, Romero et al converted this score into a number between 0 and 100 representing the firm’s percentile rank relative to its industry peers, where 100 represents the best score.
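Converting a risk score, where higher is worse, into an industry percentile, where higher is better, can be sketched as below. Romero et al do not spell out their exact formula, so this assumes the percentile is the share of industry peers (including the firm itself) whose risk rating is at least as high as the firm’s; the firm and industry names in the test are hypothetical.

```python
from collections import defaultdict


def industry_percentiles(risk: dict[str, float], industry: dict[str, str]) -> dict[str, float]:
    """Convert risk ratings (higher = riskier) into 0-100 percentile ranks within each
    industry, where 100 is the best (lowest-risk) firm in its industry."""
    peers = defaultdict(list)
    for firm, score in risk.items():
        peers[industry[firm]].append(score)
    result = {}
    for firm, score in risk.items():
        group = peers[industry[firm]]
        # Share of industry peers (including the firm) whose risk is at least as high.
        result[firm] = 100 * sum(1 for s in group if s >= score) / len(group)
    return result
```

Ranking within industries means a firm is compared only with peers facing similar ESG exposures, so a relatively clean oil producer can score highly even if its raw risk exceeds that of a typical software company.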

RobecoSAM (now S&P Global ESG Score) covers over 7300 firms representing 95% of global market capitalisation. Between 600 and 1000 data points are collected for each firm, and 80 to 120 industry-specific questions are also considered. This data is aggregated into 16 to 27 criteria scores depending on the industry, and these criteria then form sub-scores for each of the three ESG aspects. Analysis of the firm’s media coverage provides a multiplier applied to the criteria scores: a firm with no negative coverage receives a multiplier of 1, which falls as the number of negative stories increases. Aspect scores are normalised between 0 and 100 around the industry mean, and the final ESG rating is the average of the firm’s aspect scores.
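The media multiplier and the final averaging step can be illustrated with a toy example. The multiplier’s functional form here (a fixed, hypothetical penalty of 0.1 per negative story, floored at zero) and the choice to apply it per aspect are assumptions, since the actual schedule is not given; the industry normalisation step is omitted.

```python
from statistics import fmean


def media_multiplier(negative_stories: int, penalty: float = 0.1) -> float:
    """Hypothetical form: 1 with no negative coverage, falling linearly per story, floored at 0."""
    return max(0.0, 1.0 - penalty * negative_stories)


def esg_score(criteria: dict[str, list[float]], negative_stories: dict[str, int]) -> float:
    """criteria maps each aspect ('E', 'S', 'G') to its criteria scores (0-100);
    negative_stories maps an aspect to its count of negative media stories."""
    aspect_scores = []
    for aspect, scores in criteria.items():
        m = media_multiplier(negative_stories.get(aspect, 0))
        aspect_scores.append(fmean([s * m for s in scores]))  # aspect score after the multiplier
    # Final rating: simple average of the aspect scores (industry normalisation omitted).
    return fmean(aspect_scores)
```

With criteria scores averaging 70 in each aspect, a clean media record leaves the rating at 70, while two negative environmental stories pull the environmental aspect down to 56 and the overall rating to about 65.3.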

Bloomberg Sustainability data was drawn from the same source as in Dorfleitner et al, and the same assumption of using ESG disclosure scores as a proxy for ESG ratings was adopted. Descriptive statistics of the dataset used by Romero et al are presented in Table 4.

Table 4: Descriptive statistics of Romero et al full sample

Correlation analysis of the three rating methodologies shows significant positive correlations between each pair of agencies. Additionally, differences in the absolute values of ESG scores can be adequately described by an affine transformation, suggesting that ratings from each agency are consistent and a suitable indicator of a firm’s ESG profile. Romero et al provide figures supporting these conclusions; they have not been reproduced here owing to poor image quality.

Romero et al further test whether ESG disclosure is driven by the firm operating in an industry the public perceives as harmful, since such a firm may seek to legitimise its activities through disclosure. The existence of such a relationship would invalidate the assumption adopted for the Bloomberg data. After controlling for factors found in other literature to affect ESG disclosure, a significant positive relationship remained between the Bloomberg ESG Total Disclosure Score and the ratings of both other providers. These findings uphold the assumption that good ESG disclosure practices are an indicator of better performance.

Across both studies, the authors identify differences in data collection and methodology as the primary determinants of score differences. Neither study aims to present any agency as giving superior ratings; however, the results show that correlations between scores increase as the number of data points collected increases. This suggests that investors who source firm ratings from an agency using a large number of data points can be more confident that the rating is relatively close to the consensus ESG rating of that firm.

It is important to consider that, in calculating an ESG rating, the agency’s views on what constitutes socially responsible behaviour form the criteria against which a firm is judged. Greater similarity in results among agencies would largely be driven by standardisation of criteria; however, achieving this would be understandably difficult and may in fact hinder individual investors. An investor may choose a provider based on whether the criteria it considers align with their personal view of socially responsible corporate behaviour, rather than on the provider’s similarity to other agencies. Standardising data collection and calculation methodology would eliminate this choice by imposing a single view of what constitutes ethical behaviour.

Socially responsible behaviour and its quantification into ESG ratings are inherently subjective. Whilst the work of ratings agencies undoubtedly reduces information asymmetry about firms’ activities, it is vital for investors to know the criteria used in forming ratings in order to properly understand a firm’s ESG score. Whether standardisation of metrics would benefit investors is a topic meriting deeper discussion than this article provides.

Do you use ESG ratings in your personal investing, and do you have a method for synthesising different ratings? If so, we would be happy for you to share how. UQES Publications welcomes reader correspondence on our articles. If you wish to reply to points raised, or believe a perspective has been missed, feel free to send respectful responses to publications@uqes.com.au. We will endeavour to publish correspondence in subsequent articles.

References

Dorfleitner, G., Halbritter, G., and Nguyen, M. (2015). Measuring the level and risk of corporate responsibility – An empirical comparison of different ESG rating approaches. Journal of Asset Management, 16(7), 450–466.

Romero, S., Jeffers, A. E., Lin, B., Aquilino, F., and DeGaetano, L. (2018). Using ESG Ratings to Build a Sustainability Investing Strategy. The CPA Journal.
