Using Geospatial AI to Support Infrastructure Rollout for the Philippines’ Largest Telco
We worked with Globe Telecom to build an Artificial Intelligence model that detects the location and measures the socioeconomic class (SEC) of households across the entire Philippines. Using a combination of satellite imagery, telco usage data, and external location information, we were able to generate SEC classification estimates for each populated 50m x 50m tile in the Philippines.
Globe now uses these estimates as a key data source when making high-CAPEX decisions about where to build service infrastructure, so that it can maximize the number of people who benefit from its service.
As the Philippines’ largest telecommunications company, our client Globe Telecom faced an actual billion-dollar question: where should it build new mobile and broadband infrastructure? The decision directly affects the quality of service Globe can provide to its customers, and it also commits millions of dollars of investment. Where should Globe focus its efforts to maximize the ROI on these investments?
For this endeavor, Globe looked to data. With no recent census data, no reliable housing registry, and no granular income data for the Philippines, Globe had to resort to manually tagging and classifying the socioeconomic status of houses using Google Earth. This process was extremely labor-intensive: at their current rate, they estimated that it would take 45 full-time employees 6-9 years to finish tagging all households in the Philippines.
Needless to say, they needed a more scalable and efficient way to answer their question.
Our task was plain and simple: how could we automate and speed up the tagging process being done by Globe’s team? We broke the task down into two problems: detecting the location of every house in the Philippines and identifying the socioeconomic status of each one.
To deal with both problems, we needed three key datasets:
Satellite Imagery: Using satellite imagery, we automatically detected where households are by applying key computer vision techniques. In addition to this, we used visual features such as house sizes and roof quality to assess the socioeconomic status of an area.
Globe Information: Globe’s existing dataset of manually tagged houses and their SEC labels was used as the ground-truth data for training the model.
External Geospatial Datasets: These include external location information, such as the presence of malls, supermarkets, and schools, as well as aggregated Globe user information, such as average revenue and usage in specific areas. These were also used to determine the socioeconomic status of an area.
We divided the entire Philippines into a grid of 50m x 50m tiles, resulting in 152 million tiles for the whole country, and for each tile we collected the satellite image and the relevant datasets mentioned above.
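For readers who want to reproduce the gridding step, here is a minimal sketch of mapping a latitude/longitude point to its 50 m tile. The article does not say how its grid was anchored or projected, so the origin, the reference latitude, and the simple equirectangular projection below are all assumptions; a production pipeline would more likely use a proper projected coordinate system (for example a UTM zone) through a library such as pyproj.

```python
import math

# ASSUMPTIONS (not from the article): grid origin, reference latitude,
# and the use of a simple equirectangular projection on a spherical Earth.
ORIGIN_LAT = 4.5         # roughly south of the southernmost Philippine islands
ORIGIN_LON = 116.0       # roughly west of Palawan
REF_LAT = 12.0           # standard parallel near the middle of the country
TILE_SIZE_M = 50.0
EARTH_RADIUS_M = 6_371_000.0


def tile_index(lat: float, lon: float) -> tuple[int, int]:
    """Return the (row, col) of the 50 m x 50 m tile containing a point.

    Tiles are 50 m tall everywhere, but exactly 50 m wide only near
    REF_LAT; the east-west error grows with distance from that parallel.
    """
    # Metres north of the origin along a meridian.
    y = math.radians(lat - ORIGIN_LAT) * EARTH_RADIUS_M
    # Metres east of the origin, using one fixed scale so columns stay aligned.
    x = math.radians(lon - ORIGIN_LON) * EARTH_RADIUS_M * math.cos(math.radians(REF_LAT))
    return int(y // TILE_SIZE_M), int(x // TILE_SIZE_M)


def tile_id(lat: float, lon: float) -> str:
    """Encode a tile's row and column as a single string key."""
    row, col = tile_index(lat, lon)
    return f"{row}_{col}"


# A point in Manila (coordinates approximate).
print(tile_id(14.5995, 120.9842))
```

A string key like this makes it easy to join satellite imagery and the other per-tile datasets on a common identifier.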
Detecting locations of houses
To solve the first problem, we trained a deep learning computer vision model to detect whether or not a tile contains a house. In practice, this meant showing the model hundreds of thousands of satellite-image examples of different kinds of houses until it learned to associate certain visual patterns with a house. We made sure to train it on a wide variety of houses in different kinds of areas so that the model works reliably regardless of the setting. Even though the model could detect the actual location and shape of the houses, we reduced its output to a binary one (house / no house) due to privacy considerations.
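The privacy-driven reduction described above can be sketched as follows. This assumes, hypothetically, that the detector emits a per-pixel house probability for each tile (as a segmentation network would); the two thresholds are illustrative values, not ones from the article. Only the final boolean is returned, so no house positions or outlines leave the function.

```python
from typing import Sequence

# Illustrative thresholds (assumptions, not values from the article).
PIXEL_THRESHOLD = 0.5   # a pixel counts as "house" at or above this probability
MIN_HOUSE_PIXELS = 20   # smaller blobs are treated as noise


def tile_has_house(prob_mask: Sequence[Sequence[float]]) -> bool:
    """Collapse a per-pixel house-probability mask into a single flag.

    The mask could encode where each house is and what shape it has,
    but only a yes/no answer is returned.
    """
    house_pixels = sum(1 for row in prob_mask for p in row if p >= PIXEL_THRESHOLD)
    return house_pixels >= MIN_HOUSE_PIXELS
```

The minimum-area check is one simple way to keep isolated noisy pixels from flipping a tile to "house"; a real pipeline might tune or replace it.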
Identifying socioeconomic status
To solve the second problem, we used the satellite imagery again to extract specific features about each tile. How many houses are in this picture? How big are the roofs? How much built-up area is in the picture? We also aggregated the geospatial datasets in each tile to create features such as the presence of a mall or an ATM, the average mobile data usage, or the percentage of 4G users. Using all of these features, we then trained a machine learning model to predict the average socioeconomic status of each tile. What’s fascinating is that we can also extract how important each feature is to the model’s predictions, as shown below.
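The per-tile aggregation of geospatial data described above can be sketched like this. It assumes each point of interest and each user record has already been assigned to a tile id; the record fields (`data_mb`, `is_4g`) and the feature names are hypothetical, since the article does not describe its exact feature pipeline. The article also does not name the model it trained; rows like these, together with image-derived features such as roof size and built-up area, would form the model's inputs.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Iterable


@dataclass
class UsageRecord:
    tile_id: str
    data_mb: float  # hypothetical field: one user's monthly mobile data usage
    is_4g: bool     # hypothetical field: whether that user is on 4G


def tile_features(
    pois: Iterable[tuple[str, str]], usage: Iterable[UsageRecord]
) -> dict[str, dict]:
    """Aggregate points of interest and usage records into per-tile features.

    pois: (tile_id, category) pairs, e.g. ("t1", "mall").
    usage: one UsageRecord per user, already assigned to a tile.
    """
    categories = defaultdict(set)
    for tile_id, category in pois:
        categories[tile_id].add(category)

    data_mb = defaultdict(list)
    n_4g = defaultdict(int)
    for rec in usage:
        data_mb[rec.tile_id].append(rec.data_mb)
        n_4g[rec.tile_id] += int(rec.is_4g)

    features = {}
    for tile_id in set(categories) | set(data_mb):
        n_users = len(data_mb[tile_id])
        features[tile_id] = {
            "has_mall": "mall" in categories[tile_id],
            "has_atm": "atm" in categories[tile_id],
            # Missing usage stays None rather than 0 so it is not mistaken for "no usage".
            "avg_data_mb": sum(data_mb[tile_id]) / n_users if n_users else None,
            "pct_4g_users": 100.0 * n_4g[tile_id] / n_users if n_users else None,
        }
    return features
```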
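After all that work, we were able to develop a model that predicts the socioeconomic status of a 50m x 50m tile in the Philippines with an F1 score of 0.73. The F1 score is the harmonic mean of precision (of the tiles the model assigns to a class, the share that truly belong to it) and recall (of the tiles that truly belong to a class, the share the model finds), so it penalizes both missed tiles and false alarms. As a rough illustration, if precision and recall were both 0.73, then out of 100 tiles that are truly SEC AB, the model would correctly identify about 73, and about 73 of every 100 tiles it labels AB would truly be AB.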
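To make the metric concrete, here is a small sketch of F1 computed from raw counts, plus a macro average across SEC classes. The article does not say whether its 0.73 is a per-class, micro-averaged, or macro-averaged score; macro averaging here is an assumption.

```python
def f1_from_counts(tp: int, fp: int, fn: int) -> float:
    """F1 = harmonic mean of precision and recall = 2TP / (2TP + FP + FN)."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def macro_f1(y_true: list[str], y_pred: list[str]) -> float:
    """Unweighted mean of the per-class F1 scores (e.g. over SEC classes)."""
    classes = sorted(set(y_true) | set(y_pred))
    if not classes:
        return 0.0
    scores = []
    for c in classes:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == c and p == c)
        fp = sum(1 for t, p in pairs if t != c and p == c)  # false alarms
        fn = sum(1 for t, p in pairs if t == c and p != c)  # missed tiles
        scores.append(f1_from_counts(tp, fp, fn))
    return sum(scores) / len(scores)


print(f1_from_counts(73, 27, 27))
print(f1_from_counts(73, 50, 27))
```

With 73 true positives, 27 false positives, and 27 false negatives, precision and recall are both 0.73, and so is F1. Keeping the same 73 of 100 true tiles found but raising false positives to 50 drops F1 to about 0.65, which shows that F1 depends on false alarms as well as on how many true tiles are found.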
This matters because it allows our client to maximize the utilization of their infrastructure by matching the products they deploy to the predicted purchasing power of each area. For example, areas identified as socioeconomic class AB will likely have higher subscription and utilization rates for fiber and 5G.
But that’s just half the work.
We then had to roll both models out to all 152 million tiles in the Philippines. For context, this meant running two models over around 15 TB of satellite imagery, which is not a trivial task. Had we run the models the same way we did during development, it would have taken us three months to get the output for everything.
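A quick back-of-envelope calculation puts these numbers in perspective. It reads "around 15 TB" as decimal terabytes and "three months" as 90 days, both assumptions, and computes the average imagery per tile and the average throughput needed to cover every tile in three months versus two weeks.

```python
TILES = 152_000_000            # 50 m x 50 m tiles covering the Philippines
IMAGERY_BYTES = 15 * 10**12    # "around 15 TB", read as decimal terabytes (assumption)
SECONDS_PER_DAY = 86_400


def per_tile_kb() -> float:
    """Average imagery size per tile, in kilobytes (1 kB = 1000 bytes)."""
    return IMAGERY_BYTES / TILES / 1_000


def tiles_per_second(days: float) -> float:
    """Average throughput needed to process every tile in `days` days."""
    return TILES / (days * SECONDS_PER_DAY)


print(f"~{per_tile_kb():.0f} kB of imagery per tile")
print(f"3 months (~90 days): {tiles_per_second(90):.1f} tiles/s")
print(f"2 weeks (14 days): {tiles_per_second(14):.1f} tiles/s")
```

This works out to roughly 99 kB of imagery per tile, and an average of about 20 tiles per second for a three-month run versus about 126 tiles per second for a two-week run.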
To handle this, we used Google Cloud Platform’s Dataflow, a managed service for running Apache Beam data-processing pipelines that automatically parallelizes the work and scales it across many machines. This allowed us to use up to 200 virtual machines, with each machine loading the image, running the models, and saving the output. As a result, we finished running the models for the entire Philippines in just two weeks.
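The article does not show its Dataflow pipeline code, so the sketch below is a hedged local stand-in rather than Beam or Dataflow code: it uses Python's `concurrent.futures` to illustrate the same embarrassingly parallel shape, where every tile is an independent unit of work (load the image, run both models, return a result row for saving). `load_image`, `detect_house`, and `predict_sec` are hypothetical placeholders, and skipping the SEC model for tiles without a house is an assumption based on SEC estimates being produced only for populated tiles. On Dataflow, `process_tile` would roughly correspond to the function passed to a Beam `Map` or `ParDo` step, with workers being VMs rather than threads.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Iterable


def load_image(tile_id: int) -> list[float]:
    # Placeholder: a real worker would read the tile's imagery from storage.
    return [float(tile_id % 7)] * 4


def detect_house(image: list[float]) -> bool:
    # Placeholder for the house / no-house model.
    return sum(image) > 0


def predict_sec(image: list[float]) -> str:
    # Placeholder for the SEC model.
    return "AB" if image[0] >= 5 else "CDE"


def process_tile(tile_id: int) -> dict:
    """One unit of work: load a tile, run both models, return a result row."""
    image = load_image(tile_id)
    has_house = detect_house(image)
    # Assumption: only tiles with a detected house get an SEC estimate.
    sec = predict_sec(image) if has_house else None
    return {"tile_id": tile_id, "has_house": has_house, "sec": sec}


def run_all(tile_ids: Iterable[int], workers: int = 8) -> list[dict]:
    """Fan tiles out across a pool of workers; results keep the input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(process_tile, tile_ids))


rows = run_all(range(10), workers=4)
```

Threads keep the sketch simple and portable; for CPU-heavy model inference, separate processes or machines (as Dataflow provides) are what actually deliver the speed-up, because each tile's work never depends on another tile's.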
Detecting houses and estimating wealth across the entire Philippines, a job that would have taken 45 people 6-9 years to do manually, now takes us just two weeks. Now that the workflow is set up, we can easily rerun the analysis to update the results and regenerate the models as often as the client needs.
Using the output, they are now able to make strategic decisions on what services best fit specific areas to maximize ROI on their infrastructure. With our up-to-date and granular data, they’re able to prioritize and allocate resources using a data-driven approach.