Econometrics in the humanitarian sector

December 18, 2018


Across many sectors, companies value econometricians more than ever. Firms that strive for innovation and a data-driven approach will sooner or later hire a data analyst to extract meaningful insights from their data. Research-based solutions make improvements more efficient, faster and less costly. In the financial sector, strategy consulting and actuarial work, companies invest in this so heavily that they employ entire teams of data analysts. But have you ever wondered how an econometrician can do his work in the humanitarian sector, and at which organisations this is possible?
A different path
Thomas Plaatsman did. After finishing high school, Thomas chose a study that combined his technical profile with his interest in economics. During his bachelor in econometrics in Rotterdam he sharpened his analytical skills and learned more about various economic fields. Although the programme contained a lot of interesting mathematical content, he missed a broader variety of subjects, such as economics, business and languages. Not sure which field and type of work would suit him best, Thomas decided to take a gap year, which included an exchange in Barcelona for his minor, an internship and a voluntary work programme in the Philippines. Through several social start-up projects in the Philippines, and in Barcelona as well, he realised that there are many interesting options besides the commonly taken route. He therefore wanted to work for an organisation that creates a positive social or environmental impact alongside a financial return.
After his bachelor and gap year, Thomas continued studying econometrics in the master Business Analytics and Quantitative Marketing. At the end of the master he had to write his thesis. Feeling that many students simply accepted a predetermined subject and lacked motivation for their thesis, Thomas decided to take a different approach. He approached over fifty NGOs (non-governmental organisations) to ask whether they had an interesting project on which he could write his thesis. The Red Cross quickly came up with a concrete opportunity. Maarten van der Veen, the initiator of 510, who saw the need for better use of data in the humanitarian sector, told Thomas about this team and its vision. 510 global is an initiative of the Red Cross. Its mission is to shape the future of humanitarian aid by converting data into understanding and putting it in the hands of humanitarian relief workers, decision makers and affected people, so that they can better prepare for and cope with disasters and crises. The name 510 refers to the total surface area of the earth: roughly 510 million square kilometres. When agreeing to the thesis at 510 global, Thomas' university supervisor understood that it would not come with a large ready-made dataset or a clearly defined research question.
Priority-Based Humanitarian Aid Modelling for Flood Impact in Malawi
Thomas decided to investigate flooding in Malawi. Data was gathered with the help of other volunteers, online secondary sources and contacts with other NGOs. The lack of data made it a challenging thesis subject. In the Netherlands we are used to institutions like CBS (Statistics Netherlands) that hold a lot of information, but not every country has the same institutional capacity when it comes to collecting and using data. This was two years ago; students who write their thesis with 510 today can build on the experience and information available within 510. This makes it easier, but there is still a big difference between working with data from large corporations and working with data from governmental institutions in developing countries.
The team lead was part of the shelter cluster in Malawi during the 2015 flood and knew the challenges on the ground. It was therefore of great interest to research this operation, and especially the level of aid needed in these areas, to improve future disaster responses. Focusing on people living in flood-prone areas, Thomas and his team established three levels of vulnerability. The data to populate the PIMs (Priority Index Models) still had to be collected for Malawi, so the team started searching for baseline and flood-related data for the models. The intention was to use four different machine learning techniques, two decision trees and two random forests, to create accurate forecasts of vulnerability. This became the basis for his thesis. Because it is difficult in many developing countries to gather large amounts of relevant and reliable data, the thesis focused as much on trying different (new) methods as on the results themselves.
The team collected geographical, infrastructural and socio-demographic data as baseline data. These data sets were combined with event-specific variables, such as the amount of rainfall and the percentage of the area flooded. Thomas engaged the most important stakeholders, such as Malawian governmental institutions and humanitarian organisations, to better understand how data is collected and shared. Using the 2015 flood as training data, four different machine learning techniques (CART Decision Tree, Conditional Inference Tree, CART Random Forest and Conditional Forest) were used to predict the amount of help needed in each Traditional Authority. Every country is divided into multiple administrative levels, such as regions, subregions and districts. Malawi consists of 32 districts, which are divided into 367 Traditional Authorities.
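To make the tree-based techniques a little more concrete: a CART decision tree (and each tree inside a CART random forest) is grown by repeatedly choosing the split that most reduces the impurity of the resulting groups, whereas conditional inference trees select splits using statistical significance tests instead. The sketch below shows only the CART step for one numeric variable, using Gini impurity. The variable name `pct_flooded`, the labels and the numbers are hypothetical illustration data, not the thesis data.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 means the group is pure)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(values, labels):
    """Return (threshold, weighted impurity) of the best binary split on one
    numeric feature, as a CART classification tree would choose it.
    The threshold is None if no split improves on the unsplit node."""
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best = (None, gini(labels))  # baseline: do not split at all
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold can separate identical values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [label for _, label in pairs[:i]]
        right = [label for _, label in pairs[i:]]
        impurity = (len(left) * gini(left) + len(right) * gini(right)) / n
        if impurity < best[1]:
            best = (threshold, impurity)
    return best


# Hypothetical example: share of each area flooded (%) and its aid-need label.
pct_flooded = [2, 5, 8, 20, 35, 40]
priority = ["low", "low", "low", "high", "high", "high"]
print(best_split(pct_flooded, priority))  # (14.0, 0.0)
```

A full tree applies this search to every variable and recurses on each side of the chosen split; a random forest grows many such trees on bootstrap samples (with a random subset of variables considered at each split) and lets them vote.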
The thesis forecast which areas would be most affected by flooding. This was done using several socio-demographic factors, such as whether houses were built from stone or from mud and cane. Beyond this specific factor, a lot of time went into finding the most influential indicators. The variables describing the vulnerability of each area were used to forecast the number of people affected during a flood. Because exact numbers of victims were not available, the team combined data on internally displaced persons (IDPs) and on the shelter aid received to construct a categorical dependent variable. The main goal of forecasting the level of aid needed is to provide aid more effectively to the most vulnerable communities.
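The article does not say exactly how the IDP and shelter-aid figures were turned into categories. As a purely hypothetical illustration of the idea, the sketch below maps both counts for one Traditional Authority to one of three aid-need levels. The function name, the use of the larger of the two counts (to avoid double-counting people who were both displaced and received shelter aid) and the 5% and 20% cut-offs are all assumptions, not the thesis method.

```python
def aid_need_level(idps, shelter_recipients, population, low_cut=0.05, high_cut=0.20):
    """Hypothetical rule mapping flood-impact proxies for one area to 'low',
    'medium' or 'high' aid need.

    Assumptions (illustrative only): the larger of the two counts is a rough
    proxy for people affected, and the cut-offs are arbitrary example values.
    """
    if population <= 0:
        raise ValueError("population must be positive")
    share_affected = max(idps, shelter_recipients) / population
    if share_affected >= high_cut:
        return "high"
    if share_affected >= low_cut:
        return "medium"
    return "low"


# Made-up figures for three areas of 10,000 inhabitants each.
print(aid_need_level(idps=300, shelter_recipients=120, population=10_000))    # low
print(aid_need_level(idps=800, shelter_recipients=1_200, population=10_000))  # medium
print(aid_need_level(idps=2_500, shelter_recipients=900, population=10_000))  # high
```

Labels produced this way would then serve as the dependent variable that the tree and forest models learn to predict from the baseline and event-specific variables.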
In researching this topic, Thomas found that the Random Forest model made the most accurate predictions, although its accuracy was only 64%. The two most important variables in this research were 1) the percentage of each area that was flooded and 2) the percentage of drinking water coming from natural sources. A flood map can be derived from satellite images during or right after the disaster and intuitively gives an idea of the impact on a certain area (although the impact also depends on other factors, such as elevation and the number of people living there). The percentage of drinking water coming from natural sources may be a proxy indicator for poverty. Although this indicator emerged as one of the most important in this research, more data is needed to substantiate this finding.
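Assuming the 64% refers to classification accuracy, it is the share of areas whose predicted aid-need level matches the observed one. The article also does not say how variable importance was measured; permutation importance is one common choice for random forests: shuffle one variable across areas and see how much accuracy drops. The sketch below shows both ideas with a hypothetical rule-based stand-in (`toy_model`) instead of a trained forest; the data, threshold and variable names are illustrative assumptions.

```python
import random


def accuracy(predicted, observed):
    """Share of areas whose predicted aid-need level equals the observed level."""
    return sum(p == o for p, o in zip(predicted, observed)) / len(observed)


def permutation_importance(predict, rows, observed, feature, n_repeats=20, seed=0):
    """Average drop in accuracy when the values of one feature are shuffled
    across rows, breaking that feature's link with the outcome."""
    rng = random.Random(seed)
    baseline = accuracy([predict(r) for r in rows], observed)
    drops = []
    for _ in range(n_repeats):
        shuffled = [r[feature] for r in rows]
        rng.shuffle(shuffled)
        permuted_rows = [{**r, feature: v} for r, v in zip(rows, shuffled)]
        drops.append(baseline - accuracy([predict(r) for r in permuted_rows], observed))
    return sum(drops) / n_repeats


def toy_model(row):
    """Hypothetical stand-in for a trained model: uses only the flooded share."""
    return "high" if row["pct_flooded"] >= 14 else "low"


# Made-up data for six areas.
rows = [
    {"pct_flooded": f, "pct_natural_water": w}
    for f, w in zip([2, 5, 8, 20, 35, 40], [60, 15, 40, 70, 25, 55])
]
observed = ["low", "low", "low", "high", "high", "high"]

print(accuracy([toy_model(r) for r in rows], observed))  # 1.0
for feature in ("pct_flooded", "pct_natural_water"):
    print(feature, permutation_importance(toy_model, rows, observed, feature))
```

Because `toy_model` ignores `pct_natural_water`, its importance here is exactly 0.0; in a real analysis, importance should be computed on held-out or out-of-bag data rather than the training data.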
To develop this research further, several steps are needed. First of all, the model is trained on a single data set (from one flood); this needs to be expanded before the model can be used in practice. The predictions are still too uncertain to be used in the field, but collecting more data could significantly improve their accuracy and reliability. To better understand the practical implications of the research, the team talked with Red Cross staff and made field trips to determine how the model could be used in the field. There the team learned more about the current EWS (early warning system) and river gauges.
The main points mentioned were:
1) The model needs to be enhanced with other information to become more useful, for example a map that shows the prediction together with information on poverty and on the number of people living in an area, including women and children. (Over-saturating the map would make the information overwhelming and reduce its efficiency and usefulness.)
2) These forecasts are very sector-specific, whereas some areas may need a lot of help with medical support (health) or food assistance (if people lose their livelihoods). It is therefore important to be clearer about how the model was created and to define its scope.
A good vehicle for implementing these models in the field would be SIMS (Surge Information Management Support). Using SIMS would ensure that the right information is disseminated on the ground and could allow for faster forecasting (preferably within 1 to 3 days). As models are never 100% accurate, it is important to enrich them with local knowledge from villages, the Malawian Red Cross and other NGOs. Elements of PIM have already been implemented in humanitarian aid efforts, and 510 global continues to enhance its capabilities despite the challenges. The steps taken for this thesis in Malawi were just the beginning. More information about the follow-up projects can be found here.
At 510 global, Thomas has since worked on a variety of projects, including one in Peru and Ecuador, where vulnerability and baseline data were used to provide help before a disaster had taken place (based on the forecast), and one in Kenya on the food crisis in Africa. All these applications of data in the humanitarian sector yield very useful outcomes. In the future, Thomas wants to keep contributing valuable information to the world with his knowledge of econometrics. Want to learn more about all the possibilities yourself? Visit the website of 510 global.


This article has been written by: Anne Dumoulin.
