Introduction to the Role of Data Analytics in Anti-Corruption and Fraud

Bangkok (Thailand), 7 January 2021
The rise in the use of information and communication technologies, accelerated by the pandemic, has both altered and amplified global corruption patterns. UNODC research shows that the Covid-19 crisis has increased the risks of embezzlement, bribery and fraud, and has allowed forms of corruption-driven crime to flourish. In Southeast Asia, these risks are compounded by a lack of financial transparency and the relaxation of due diligence checks surrounding the sharp increase in emergency public spending in response to the pandemic. Even before the Covid-19 crisis, the region was already losing an estimated USD 73.4 to 110.4 billion annually to transnational organized crime, much of it enabled by corruption (UNODC 2019).

At the same time, there is a growing consensus among governments worldwide on the role that technology can play in fighting and preventing corruption. Crucially, it allows vast amounts of data to be analyzed rapidly to identify potential instances of corruption in areas such as public procurement, asset disclosures, tax records and financial allocations. For law enforcement, this can save resources through smarter and more proactive investigative strategies, while reducing large losses of public funds by improving the detection, investigation and analysis of corruption.

To build a picture of how data analytics is used to prevent corruption across Southeast Asia, UNODC sought government and non-government perspectives on current practices, challenges and recommendations. The analysis below is based on a UNODC questionnaire, compiled primarily in the first two weeks of September 2020 with the cooperation of NGOs and anti-corruption, anti-money laundering and digital governance agencies across the region. To supplement this data, additional research was carried out with government and civil society representatives on an ad hoc basis. The study covers Cambodia, Indonesia, Lao PDR, Malaysia, Myanmar, the Philippines, Thailand and Viet Nam.

Data Analytics in Southeast Asia: The Current State of Play

Across the countries surveyed, focal persons from only three Anti-Corruption Agencies (in Indonesia, Malaysia and Thailand) confirmed using data analytics and having exposure to data analytics theory. Among respondents from other government agencies, the survey captured familiarity with data analytics in four countries (the Indonesian National Police, the National Audit Department of Malaysia, the Audit Commission of the Philippines and the Digital Government Development Agency of Thailand). Malaysia was unique in that all individual respondents from government agencies appeared to be familiar with data analytics.

The visualization below (Figure 1) shows the most common uses of data analytics among all government and civil society respondents. The most prominent of these related to operational oversight activities (e.g. public procurement, PEP data and financial statement auditing), strategic planning (e.g. developing Key Performance Indicators as part of the latest National Anti-Corruption Plan, and research projects) and whole-of-government approaches (e.g. innovation in government and open data initiatives).

Figure 1: Uses of Data Analytics (All Respondents, Consolidated)

On closer analysis, however, there was a discrepancy between governmental and non-governmental organizations: 55% of surveyed civil society organizations confirmed familiarity with data analytics, compared to just 41% of government entities. On the NGO side, respondents generally stated that they had used open data sources and data mining tools to identify red flags in procurement data and in data on Politically Exposed Persons (PEPs). Although it did not report having data analytics capacity in place, the Economic Police of Viet Nam did disclose its ability to draw upon existing analysis from the banking sector. Indonesia’s Corruption Eradication Commission (KPK) specified the application of graph analysis for syndicate mapping, while the Audit Commission of the Philippines stated that it uses data analytics for citizen participatory audit reports.
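Red-flag screening of procurement data is often rule-based. The Python sketch below is purely illustrative and is not drawn from any respondent’s tools: the `Tender` fields, the 15-day minimum bidding period, the 50% supplier-share threshold and the minimum of three tenders per buyer are all hypothetical assumptions, chosen to show how single-bidder tenders, unusually short bidding windows and repeated awards to the same supplier might be flagged automatically.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date


@dataclass
class Tender:
    """One awarded tender (hypothetical fields, not a real dataset schema)."""
    tender_id: str
    buyer: str
    supplier: str
    bidders: int
    published: date
    deadline: date


def red_flags(tender: Tender, min_days: int = 15) -> list[str]:
    """Return simple rule-based red flags for a single tender."""
    flags = []
    if tender.bidders == 1:
        flags.append("single bidder")
    if (tender.deadline - tender.published).days < min_days:
        flags.append("short bidding period")
    return flags


def concentrated_suppliers(
    tenders: list[Tender], share: float = 0.5, min_tenders: int = 3
) -> set[tuple[str, str]]:
    """Flag (buyer, supplier) pairs where one supplier wins more than `share`
    of a buyer's tenders, ignoring buyers with fewer than `min_tenders`."""
    per_buyer = Counter(t.buyer for t in tenders)
    per_pair = Counter((t.buyer, t.supplier) for t in tenders)
    return {
        pair
        for pair, wins in per_pair.items()
        if per_buyer[pair[0]] >= min_tenders and wins / per_buyer[pair[0]] > share
    }


# Invented example records
tenders = [
    Tender("T1", "Ministry A", "Company 1", 1, date(2020, 5, 1), date(2020, 5, 8)),
    Tender("T2", "Ministry A", "Company 1", 4, date(2020, 6, 1), date(2020, 7, 1)),
    Tender("T3", "Ministry A", "Company 2", 3, date(2020, 8, 1), date(2020, 9, 1)),
    Tender("T4", "Ministry B", "Company 3", 5, date(2020, 8, 1), date(2020, 9, 15)),
]

for t in tenders:
    print(t.tender_id, red_flags(t))
print(concentrated_suppliers(tenders))
```

A flag of this kind is a prompt for further review, not evidence of wrongdoing; in practice, thresholds would need to reflect national procurement rules.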

Structured Datasets

Studies of data analytics distinguish between structured and unstructured data. Structured data refers to information organized according to predefined categories, making it easily searchable in relational databases. Unstructured data, such as images and videos, does not have a predefined format and so is generally harder to collate, search and analyze.
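To make the distinction concrete, the hedged sketch below uses Python’s built-in `sqlite3` module with a hypothetical `declarations` table of asset declarations. Because the data is structured, a single SQL query can compare each official’s declared assets year on year; the same kind of fact buried in a free-text report must first be extracted with pattern matching, which is fragile. The table, names, figures and doubling threshold are invented for illustration.

```python
import re
import sqlite3

# Structured data: a hypothetical table of asset declarations with fixed columns
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE declarations (official TEXT, year INTEGER, total_assets REAL)"
)
conn.executemany(
    "INSERT INTO declarations VALUES (?, ?, ?)",
    [
        ("Official A", 2019, 120000.0),
        ("Official A", 2020, 480000.0),
        ("Official B", 2019, 90000.0),
        ("Official B", 2020, 95000.0),
    ],
)

# One query finds officials whose declared assets more than doubled in a year
rows = conn.execute(
    """
    SELECT cur.official, prev.total_assets, cur.total_assets
    FROM declarations AS cur
    JOIN declarations AS prev
      ON cur.official = prev.official AND cur.year = prev.year + 1
    WHERE cur.total_assets > 2 * prev.total_assets
    """
).fetchall()
print(rows)

# Unstructured data: the same kind of fact in free text must be extracted first,
# and this pattern only works for this exact phrasing
note = "Local media reported that Official A bought a house worth 300,000 in 2020."
amounts = re.findall(r"worth ([\d,]+)", note)
print(amounts)
```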

Table 1 (below) sets out the availability of structured datasets for data analytics on payroll, procurement, tax records, asset/income declarations and financial allocations/budgets. Among Anti-Corruption Agencies (ACAs), only those in Indonesia, Malaysia and Thailand confirmed electronic access to such datasets. In Thailand, the National Anti-Corruption Commission (NACC) stated that although it does not currently hold in-house datasets on payroll, tax records or budgets, it is able to “formally ask from partner agencies” when required. At the time of the survey, the availability of structured datasets to the government agencies responding from Cambodia, Lao PDR, Myanmar and Viet Nam appeared to be limited.

Among other government agencies surveyed, access to structured datasets was mixed. Of the three audit commissions in the sample (Cambodia, Malaysia and the Philippines), only the Philippines confirmed any availability, and only for procurement data. With the exception of the Indonesian National Police, the surveyed law enforcement authorities (the Republic of Viet Nam National Police and the Anti-Money Laundering Offices of the Lao PDR and Thailand) did not appear to have access to structured data at the time of the survey. Within every country surveyed, access to structured data appeared to be inconsistent between agencies, suggesting limited information sharing. Region-wide, the datasets available to Anti-Corruption Agencies (ACAs) did not appear to be fully at the disposal of other government agencies. And while Thailand’s Digital Government Development Agency confirmed its ability to draw upon a repository of over 2,000 datasets, we did not collect evidence that specialized anti-corruption entities in Thailand make habitual use of these datasets in their investigative work.

Finally, the availability of tools and specialized data for conducting analysis varied greatly across the sample of government agencies. For instance, Indonesia’s Corruption Eradication Commission (KPK) reported having access to geo-spatial data, while the Audit Commission of Malaysia reported analyzing specialized financial accounting data. The Government Procurement Policy Board (GPPB) of the Philippines, a multi-agency task force on public procurement, confirmed the development of software for analyzing procurement data and identifying fraud and corruption.

For each form of structured data, the proportion of surveyed NGOs reporting access to government datasets was consistently low. Access to information in the region remains problematic, and the penetration of open data initiatives is currently limited. Of the countries surveyed, only the Philippines and Indonesia are members of the Open Government Partnership and have introduced clear commitments on open data. NGOs in Myanmar and Thailand noted that their structured datasets were often limited to internal data because of difficulties in accessing government data.

Unstructured Datasets

For all entities surveyed, unstructured data for conducting analytics, such as communications, images, videos and social media information, appeared to be much more readily available than structured datasets. This suggests that anti-corruption efforts in the region already make some use of communications technology, without yet systematizing data approaches. It also indicates that government agencies are not yet leveraging the full potential of data analytics, instead conducting analysis on an ad hoc basis.

Within most of the countries surveyed, the availability of unstructured datasets varied considerably among government agencies. The most consistently high access was reported in Thailand, where all government agencies in the sample confirmed access to at least three forms of unstructured datasets for the purposes of data analytics.

Overall, government agencies were more likely to confirm access to social media information than to images, videos or communications between individuals/organizations. This was particularly pronounced for the law enforcement authorities in the sample, which may draw on social media information for investigative and evidentiary purposes. ACAs, meanwhile, were least likely to report having access to video-based datasets, possibly due to storage space limitations. NGOs appeared to be the most likely to access image and video-based datasets, which may indicate a greater reliance on campaigning than among government agencies.

Data Analytics Methods & Focal Areas

Among ACAs across Southeast Asia, the data analytics methods used differed significantly (see Table 2, below). Indonesia, Malaysia and Thailand stood out in terms of capability across technical areas, while Cambodia, Myanmar and Viet Nam showed limited familiarity with specific data analytics methodologies.

Social network analysis, predictive analytics and modelling, data visualization and automated red flags appeared to be the skillsets with which Anti-Corruption Agencies (ACAs) in the region were most familiar. With the exception of Indonesia’s Corruption Eradication Commission (KPK) and Malaysia’s National Centre for Governance, Integrity and Anti-Corruption, no agency reported applying the full range of data analytics methods in its anti-corruption work.

As with ACAs, the application of data analytics among other government agencies appeared very mixed. None of the audit commissions in the survey (from Cambodia, Malaysia and the Philippines) reported applying any of the analytics methodologies. The law enforcement authorities in the sample (the Indonesian National Police, the Economic Department of the Republic of Viet Nam National Police and the Anti-Money Laundering Offices (AMLO) of the Lao PDR and Thailand) were most likely to apply link analysis and social network analysis and least likely to use machine learning, possibly reflecting the different levels of technical complexity each method requires.
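Link analysis of this kind can be framed as a graph problem. In the hypothetical Python sketch below, companies are nodes and two companies are linked whenever they share an identifier such as a director, address or phone number; breadth-first search then groups them into connected clusters that an analyst might review as a possible network. The records and identifier labels are invented, and real syndicate mapping (such as the graph analysis KPK described) would involve far richer data and careful entity resolution.

```python
from collections import defaultdict, deque

# Hypothetical records: each company mapped to identifiers it may share with others
records = {
    "Company 1": {"director:Person X", "address:12 Main St"},
    "Company 2": {"director:Person Y", "address:12 Main St"},
    "Company 3": {"director:Person Y", "phone:555-0100"},
    "Company 4": {"director:Person Z"},
}


def build_links(records: dict[str, set[str]]) -> dict[str, set[str]]:
    """Link two companies whenever they share at least one identifier."""
    holders = defaultdict(set)
    for company, identifiers in records.items():
        for identifier in identifiers:
            holders[identifier].add(company)
    graph = {company: set() for company in records}
    for companies in holders.values():
        for company in companies:
            graph[company] |= companies - {company}
    return graph


def clusters(graph: dict[str, set[str]]) -> list[list[str]]:
    """Return connected components (candidate networks) via breadth-first search."""
    seen, result = set(), []
    for start in graph:
        if start in seen:
            continue
        seen.add(start)
        queue, component = deque([start]), []
        while queue:
            node = queue.popleft()
            component.append(node)
            for neighbor in graph[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    queue.append(neighbor)
        result.append(sorted(component))
    return sorted(result)


graph = build_links(records)
print(clusters(graph))
```

Here Company 1 and Company 3 share no identifier directly but end up in the same cluster through Company 2, which is the kind of indirect connection link analysis is meant to surface.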

Table 3 (below) shows the activities for which the region’s Anti-Corruption Agencies (ACAs) most commonly use data analytics. ACAs were most likely to confirm using analytics in contracting and procurement. This may reflect recent advances in transparency and open access data in both areas, as well as awareness of them as high-risk areas for corruption. Nevertheless, for each of the areas specified, ACAs in only three or four countries reported using data analytics.

The application of data analytics among the surveyed audit commissions appeared to be limited: Cambodia’s National Audit Authority confirmed its use only in procurement and contracting, the Philippines only in local-level financial management, and Malaysia in none of the areas specified in the survey. Neither of the Anti-Money Laundering Offices in the study (from the Lao PDR and Thailand) reported using data analytics.

NGOs that confirmed the use of data analytics (Open Development Cambodia, Indonesia Corruption Watch, Malaysia’s Sinar Project, Myanmar’s East-West Management Institute and Hivos Southeast Asia in the Philippines) were most likely to apply it to contracting and procurement and least likely to apply it to payroll and licensing. The lack of available data is probably a key factor here, once again indicating the importance of an open data culture in government for sustaining a strong multi-stakeholder approach to fighting corruption. In an effort to navigate such a “data constrained environment”, the Sinar Project in Malaysia confirmed ongoing efforts to examine Politically Exposed Persons using Beneficial Ownership data, an issue discussed further in a recent UNODC webinar.

Digitalization and Corruption Risk Typologies

The survey asked national agencies which types of corruption risk they examine when applying data analytics. Respondents were invited to consider four risks: bribery, embezzlement, conflicts of interest and money laundering.

In much of the region, the consideration of risk typologies appeared to be inconsistent when applying data analytics. Only in Cambodia, Indonesia and Malaysia did ACAs confirm a focus on all four types of risk. In Indonesia, the Corruption Eradication Commission (KPK) also reported using data analytics to examine collusion and beneficial ownership. The fewest ACAs reported applying data analytics to money laundering, possibly because such tasks are delegated to Anti-Money Laundering Units instead.

Among other government agencies, including audit commissions and anti-money laundering units, the focus on corruption risk typologies was likewise inconsistent. The surveyed police agencies, in Indonesia and Viet Nam, were the only agency type to report focusing on all forms of risk identified in the survey. NGOs in the region were most likely to confirm looking at conflicts of interest and embezzlement.

Regional Challenges: Difficulties Affecting the Rollout of Data Analytics

In seeking to integrate data analytics into efforts to prevent corruption, national agencies may encounter a wide array of difficulties. The survey asked respondents about their concerns regarding five broad, interrelated types of challenge: policy (restrictive legal framework or lack of organizational policies), infrastructural (deficient software or inefficient data recording, processing and analysis systems), analytical (lack of electronic, structured, compatible or publicly accessible datasets), coordination (organizations that collect data not cooperating) and capacity (staff unfamiliar with data analytics).

Table 5 (below) summarizes responses for Anti-Corruption Agencies (ACAs) by country. The data suggests that “legal frameworks that restrict access to datasets” is the issue on which the most progress is currently being made. Meanwhile, the table shows three areas of key concern, involving challenges that are generally not yet being addressed. First and foremost, all responding ACAs identified a lack of cooperation among organizations that collect data as problematic, with only two ACAs (in Indonesia and Thailand) suggesting that solutions are being sought. Secondly, the majority of responding ACAs identified datasets not being in electronic form as a challenge, a problem which only the ACA in Thailand reported working to resolve. Thirdly, most respondents also cited the unstructured or non-compatible nature of datasets as an obstacle, possibly a cause or a result of limited inter-agency cooperation. This was also identified as a major issue in interviews with staff from the State Inspectorate and Anti-Corruption Authority (SIAA), the primary ACA of the Lao PDR. By contrast, ACAs did not regard the public accessibility of data as such a severe challenge. However, this view is not necessarily shared by other government organizations or NGOs, as seen below.

In contrast, a greater proportion of other government agencies, regardless of agency type, identified the public accessibility of datasets as a significant challenge. One possible explanation is that ACAs are more likely to have internally developed anti-corruption datasets, leaving other agencies more reliant on publicly available ones. Once again, the lack of compatible, structured and electronically formatted datasets and the lack of cooperation were highlighted as problematic by the majority of surveyed government agencies in every country. Audit commissions (surveyed in Cambodia, Malaysia and the Philippines) were most likely to state that their organization had “no overall policy for the development and utilization of data analytics techniques”, which may mean that analysis has to be carried out using information already in the public realm. Respondents from the audit commission in the Philippines and several NGOs identified the need to create partnerships with other organizations to mitigate the lack of available data.

Unlike government agencies, NGOs identified the lack of familiarity with data analytics among staff as their most significant challenge. NGOs’ reliance on readily available datasets was also apparent in their prioritization of existing data in electronic, structured and compatible formats. Compared with government agencies, NGOs were far less likely to regard organizational policies on data analytics as a challenge; in fact, only one NGO, in Cambodia, identified this issue. This is likely due to the working approach of smaller organizations, which can often operate with fewer policy procedures than government entities. As one NGO in Malaysia explained, rather than emphasizing legal frameworks or internal policies, a priority for civil society is often to “increase institutional capacity and find other ways to acquire data” in what is commonly a very data-sparse landscape.

A short summary of key challenges within the anti-corruption context of each country is as follows:

In Cambodia, all respondents acknowledged challenges related to the lack of electronic data in a structured and compatible format, as well as deficient software and infrastructure. Civil society respondents were more likely than government counterparts to cite the lack of staff familiarity with data analytics, inefficient data systems and a lack of publicly available datasets as significant obstacles. The majority of respondents deemed the level of data sharing between organizations problematic.

In Indonesia, all government respondents acknowledged every challenge flagged in the survey as problematic to some degree. The two issues that both government and civil society practitioners appear to regard as most serious are the need for structured and compatible datasets and the need for inter-agency cooperation. Respondents did not regard infrastructural challenges as equally urgent, although they acknowledged that some improvements would be beneficial.

In the Lao PDR, government-led efforts have been underway to gather anti-corruption data and convert it into electronic formats. However, these formats are typically not machine-readable, placing an undue burden on staff when it comes to analyzing the data. A representative of the State Inspectorate and Anti-Corruption Authority (SIAA) stressed that datasets are typically “not unified, not comprehensive and time consuming” to analyze, which has so far impeded the rollout of data analytics initiatives.

In Malaysia, the majority of respondents identified the absence of publicly accessible data as a particularly significant challenge. Against this backdrop, one civil society expert noted that a key way to mitigate this problem is to build organizational capacity so that analysts can be more resourceful in seeking out and using viable data. Other concerns, shared to some degree by all government and civil society organizations, included legal/policy restraints, information sharing and the need for staff to be more familiar with data analytics.

In Myanmar, the most widely recognized challenges related to staff capacity, software and infrastructural limitations. Because data analytics has yet to be fully utilized in many contexts, a number of other challenges (e.g. inter-agency cooperation, dataset formatting, open data) were likely recognized but not deemed as high a priority at this stage. One senior government interviewee noted that, rather than specific technical knowledge, what is needed is a broader culture of transparency and open data that enables innovative anti-corruption approaches to thrive.

The Philippines showed a clear consensus among government and civil society organizations on the types of challenges faced. The four most significant challenges identified by all respondents related to inter-organizational cooperation, infrastructure, dataset formatting and publicly available data. Both government and civil society respondents cited ongoing efforts to build partnerships as a way to overcome shortcomings in the availability of viable data.

In Thailand, stakeholders acknowledged a broad array of challenges, including those concerning the legal framework, data sharing, dataset formatting and infrastructure. On the civil society side, the most pressing needs related to staff capacity and data systems. Government interviewees cited the importance of partnerships to help “fill the gaps” by enhancing capacities and compensating for a lack of technical expertise.

In Viet Nam, the three most significant challenges identified were datasets not being available in electronic formats, the need for greater cooperation among organizations that collect data, and the quality of software and infrastructure. Other concerns included legal and policy restraints, public access to data and staff familiarity with data analytics.

Next Steps: Request for Support on the Use of Data Analytics to Prevent Corruption

UNODC training in Cambodia on financial investigations into corruption, 14 October 2020

The previous section established that, despite broad variation in the degree of implementation, many civil society organizations and government agencies face common challenges in harnessing the potential of data analytics to prevent corruption and fraud. When asked about support from UNODC, respondents across the region showed a high degree of interest in taking steps to improve digitalization practices.

Conclusion / Summary: Regional Prospects for the Digitalization of Anti-Corruption Approaches

While progress has been made in recent years, the uptake of data analytics to fight corruption and fraud remains limited. Of those surveyed, only three Anti-Corruption Agencies (ACAs) reported being familiar with data analytics. Likewise, ACAs in only two or three countries reported having any access to structured datasets relating to payroll, tax records, asset/income declarations or financial allocations/budgets. By contrast, unstructured datasets appeared to be much more heavily used, particularly social media, from which ACAs in five countries reported drawing information. This suggests that, while there is some awareness of basic concepts and techniques, data analytics methods have yet to be applied at the institutional level to systematically collected data. A reliance on unstructured data can leave institutional outcomes highly contingent on the capacities and networks of individual officers rather than on organization-wide processes. More emphasis therefore needs to be placed on access to structured datasets, which would allow preparatory elements of data analysis to be automated and make analysis less reliant on individual efforts.

The use of data analytics to prevent corruption and fraud appears to be selectively applied across government activities. Regionally, the government agencies surveyed were most likely to report using analytics on procurement and contracting, with the least digitalization taking place around licensing and payroll. The prioritization of contracting and procurement is likely because, as highly lucrative forms of public expenditure, they account for a large share of known corruption, both globally and within Southeast Asia.

The unequal application of data analytics across anti-corruption efforts in the region is evident in many elements of the study. For example, the data showed that only half of the surveyed ACAs in Southeast Asia consider bribery, money laundering or conflict of interest risk typologies when applying data analytics. Meanwhile, NGOs were more likely to examine conflicts of interest and embezzlement than bribery and money laundering. In part, this may reflect a perceived culture of impunity with respect to bribery in contexts throughout the region, and a belief that focusing on conflicts of interest and embezzlement may have more impact.

Perhaps the most significant issue outlined in this study is the need to promote cooperation among organizations that collect data on anti-corruption issues, including government agencies, civil society and the business sector. As the study makes clear, ACAs across Southeast Asia identify cooperation as the number one challenge in digitalizing the response to corruption. Very often, the data available to ACAs is less available to other government agencies and even less accessible to NGOs, demonstrating the powerful effect an open data culture can have on the health of the wider anti-corruption landscape.

For data sharing to be meaningful, particularly in the context of data analytics, some degree of standardization is necessary. The vast majority of government agencies and NGOs in the survey identified the need for structured, compatible and electronic data as the foremost challenge. Initiatives to enhance the compatibility of anti-corruption data include the Beneficial Ownership Data Standard (BODS) developed by Open Ownership and the Open Contracting Data Standard (OCDS) developed by the Open Contracting Partnership (OCP), both of which seek to promote frictionless data sharing through compatible formats and standards.
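As an illustration of why a shared standard helps, the sketch below maps procurement records published in two different shapes by two hypothetical agencies into one common structure, after which they can be analyzed together. The field names `ocid`, `buyer`, `tender.numberOfTenderers`, `tender.value` and `awards[].suppliers` are loosely modeled on the Open Contracting Data Standard, but the output is not a complete or validated OCDS release; the agencies, records and the `ocds-example-` prefix are hypothetical, and the OCDS documentation should be consulted for the actual schema.

```python
import json

# Two hypothetical agencies publishing the same kind of information in different shapes
agency_a = [{"ref": "A-001", "ministry": "Ministry of Works", "bids": 1,
             "amount_usd": 250000, "winner": "Company 1"}]
agency_b = [{"contract_no": "B/17", "procuring_entity": "Provincial Office",
             "num_offers": "3", "value": "1,200,000", "currency": "USD",
             "awardee": "Company 2"}]

PREFIX = "ocds-example-"  # hypothetical; real OCDS publishers register their own prefix


def release(ocid, buyer, tenderers, amount, currency, supplier):
    """Build a common record whose field names are loosely modeled on OCDS."""
    return {
        "ocid": PREFIX + ocid,
        "buyer": {"name": buyer},
        "tender": {"numberOfTenderers": tenderers,
                   "value": {"amount": amount, "currency": currency}},
        "awards": [{"suppliers": [{"name": supplier}]}],
    }


def from_agency_a(r):
    """Agency A already uses numbers; amounts are assumed to be in USD."""
    return release(r["ref"], r["ministry"], r["bids"], float(r["amount_usd"]),
                   "USD", r["winner"])


def from_agency_b(r):
    """Agency B publishes numbers as text, so they are parsed first."""
    amount = float(r["value"].replace(",", ""))
    return release(r["contract_no"], r["procuring_entity"], int(r["num_offers"]),
                   amount, r["currency"], r["awardee"])


releases = [from_agency_a(r) for r in agency_a] + [from_agency_b(r) for r in agency_b]

# Once harmonized, one rule can be applied across both sources
single_bidder = [r["ocid"] for r in releases if r["tender"]["numberOfTenderers"] == 1]
print(json.dumps(releases[1], indent=2))
print(single_bidder)
```

The per-agency conversion functions are where the cost of incompatible formats shows up; a common published standard would let each agency do this mapping once, at the source.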

Operationalizing data analytics will require building the capacity of government and civil society personnel over time, particularly as technology advances. This will need to be complemented by a strong legal framework; internal policies that enable the ethical and effective use of data analytics; and effective software, infrastructure and up-to-date systems for recording, processing and analyzing data. Crucially, this will require not only the integration of technical approaches but also the adoption of a collaborative open data culture. As one senior Anti-Corruption Agency representative stated during the study, organizations will need to overcome a “persistent lack of trust and secrecy” in order to “pave the way for more innovative approaches to data analytics”.

Ultimately, a truly robust anti-corruption approach will require the even uptake and application of innovative tools across national and regional contexts. Ways to achieve this could include inter-sectoral information exchanges (e.g. drawing upon the notable success of Southeast Asia’s AI-powered start-up scene), interagency mentoring and digital strategies coordinated by technologically integrated agencies. Alongside partners, UNODC will continue to organize bespoke trainings, advocacy, webinars and research to promote the ethical use of data analytics among governments and civil society organizations across Southeast Asia.
