Connecting data silos with the World Bank
Thinking Machines built OnTrackPH, proof-of-concept software for matching records across data silos in a fast, cost-effective, and scalable way.
The World Bank is an international development agency that fights poverty by providing funding and technical assistance to middle-income and low-income countries.
Structured Information Extraction
Word Count Vectorization
Tracking and auditing government spending in the Philippines has been a time-consuming, laborious process, with project-level data fragmented into silos across agencies from budgeting through implementation.
With millions of unmatched project records, automation of data matching can lead to millions in cost savings from reduced manpower needs and averted corruption, as well as strengthened public trust through better government transparency and accountability.
Connecting the Dots for Transparency & Accountability
The World Bank, as part of its Open Government Data initiative, sought to assist the Philippine Government by partnering with Thinking Machines to use matching algorithms to create a fast, cost-effective, and scalable way to track the budget.
Thinking Machines developed OnTrackPH, an algorithm that leverages the same techniques technology companies use to process and analyze big data. It uses a multi-step “sieve approach”: it first finds the most precise matches, then scores the more ambiguous candidates with bag-of-words vectorization and cosine similarity. Natural language processing is applied alongside these steps to boost match accuracy.
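The sieve idea can be illustrated with a minimal sketch. This is not the OnTrackPH code itself, which is not described in detail here: the function names, the sample project descriptions, the record IDs, and the 0.5 score threshold are all assumptions chosen for illustration. The first pass pairs records whose normalized text is identical, and the second pass scores what is left using word-count vectors and cosine similarity.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two word-count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def sieve_match(left, right, threshold=0.5):
    """Match records in `left` to records in `right` (dicts of id -> text).

    Returns a dict of left_id -> (right_id, score, stage).
    The threshold is a hypothetical value, not one taken from OnTrackPH.
    """
    matches = {}
    unmatched_right = dict(right)

    # Stage 1: most precise matches first (identical normalized text).
    index = {}
    for rid, text in right.items():
        index.setdefault(" ".join(tokenize(text)), rid)
    remaining = {}
    for lid, text in left.items():
        rid = index.get(" ".join(tokenize(text)))
        if rid is not None and rid in unmatched_right:
            matches[lid] = (rid, 1.0, "exact")
            del unmatched_right[rid]
        else:
            remaining[lid] = text

    # Stage 2: score ambiguous records with bag-of-words cosine similarity.
    # For brevity, a right-hand record may be chosen by more than one
    # left-hand record in this stage.
    right_vecs = {rid: Counter(tokenize(t)) for rid, t in unmatched_right.items()}
    for lid, text in remaining.items():
        vec = Counter(tokenize(text))
        best_score, best_rid = 0.0, None
        for rid, rvec in right_vecs.items():
            score = cosine(vec, rvec)
            if score > best_score:
                best_score, best_rid = score, rid
        if best_rid is not None and best_score >= threshold:
            matches[lid] = (best_rid, best_score, "cosine")
    return matches


# Hypothetical records from two agency databases.
budget = {
    "B1": "Concreting of San Jose Road",
    "B2": "Rehabilitation of Mabini Bridge, Phase 2",
}
works = {
    "W1": "concreting of san jose road",
    "W2": "Mabini Bridge rehabilitation phase 2",
    "W3": "Construction of school building",
}

result = sieve_match(budget, works)
for left_id, (right_id, score, stage) in result.items():
    print(left_id, right_id, round(score, 3), stage)
```

In this toy data, B1 is caught by the exact stage and B2 is paired with W2 by the cosine stage, because word order does not affect a bag-of-words vector. A real pipeline would likely add more sieve stages and more careful text normalization before scoring.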
Using these methods, OnTrackPH accurately matched data in a fraction of the man-hours needed to do so manually. It took 6 months for 5 World Bank research analysts to correctly match 2,268 records across three government databases. During its pilot test, OnTrackPH correctly matched 85% of the 2,268 records in 15 minutes.
Enabling Fiscal Auditing for Government and CSOs
To demonstrate a possible use case for the matching algorithm, Thinking Machines built a small web demo around the OnTrackPH engine that shows its capabilities on a limited scale. OnTrackPH.com shows the benefits of a dashboard where spending can be tracked from budgeting to infrastructure maintenance.
In the coming months, the Thinking Machines team will explore developing OnTrackPH into a tool that government, civil society organizations, journalists, researchers, and the general public can use to monitor not only road projects, but other areas of public spending. The project is a great example of how entity-matching algorithms used in the tech sector for contact management and user matching can be applied quickly and effectively for good governance.