Internationalization: how do you scale your data model?
Imagine a company that grants credit in France, but now wants to expand into other countries, particularly in Europe. It's not that simple! Even within the European Union, contexts vary widely. The company may be in for a nasty surprise, and be forced to rebuild its scoring model from scratch.
Far from being hypothetical, this situation was commonplace until recently. Designing a data model that applies to all countries is complicated, but not impossible, especially now that a powerful resource is available: Open Banking data.
At Algoan, we've been striving since day one to build an API-based Credit Scoring model, scalable internationally, based on this Open Banking data. How did we do it? Camille Charreaux, our Head of Data Science, explains the data choices behind a product that suits every geography without becoming an overengineered machine.
Accessible but disparate data
Do you remember your last credit application experience? If it was more than 5 years ago, it probably looked something like this:
"Traditionally, in France, credit institutions ask consumers applying for a loan to fill in an online form. 10-20 questions to answer, declaring their family situation, income, expenses, other outstanding loans, etc."
And if you've contacted several banks or credit agencies, you'll know: questionnaires vary from one to the next... as do credit decisions. (Imagine the differences between countries!)
This approach is not without its problems:
- The data is declarative. This means that errors and omissions can occur. Income is often overestimated and expenses underestimated, and applicants tend not to declare other outstanding loans in order to improve their chances.
- The data is not financial. At least, not exclusively: much of it is demographic and socio-professional data (age, family situation, job category, etc.). However, when it comes to knowing whether a person will be able to repay their loan, financial data is the most reliable.
Whether in terms of data collection or data type, this credit scoring model is not optimal. On top of this comes another layer of complexity: operating procedures differ from country to country, notably with the presence of credit bureaus.
"It's an approach we're not familiar with in France, but which is common in many countries. Credit bureaus are agencies that gather information on the credit held by consumers. They provide this information to credit institutions, which then obtain a global view of a citizen's situation."
With credit bureaus, data is no longer purely declarative, which solves part of the problem. However, other problems emerge:
- The data collected varies from country to country, even with credit bureaus operating in several countries.
- Depending on the country, not all credits are recorded in credit bureaus, which creates disparities.
- The data collected is not as granular as banking data.
- Only citizens who have already taken out a loan are included in the databases. It can be difficult to obtain a credit score - and therefore access to credit - for a first-time applicant. In the United States, for example, without a FICO score, it is difficult to obtain a loan.
Even when credit institutions use credit bureaus, they have to adapt their scoring models to the data collected by each one.
The good news is that over the past few years, this approach has been evolving.
- First, aggregators that collect consumers' banking data appeared in the 2010s. They used web scraping to gather financial data and assess a person's capacity to repay a loan.
- That approach lacked security until the requirements of PSD2, the second European payment services directive, took effect in 2019. PSD2 secures access to banking data by requiring banks to set up secure APIs with strict authentication mechanisms. This is what is known as "Open Banking". It has enabled aggregators to develop secure, reliable connections that provide universal, granular banking data.
This Open Banking data creates a solid basis for building scalable international models.
Open Banking: an undeniable opportunity for data processing
Why is Open Banking radically changing the way data is processed?
"Open Banking data comes in formats that are well known in the world of data science. They are numerical (transaction amounts) and textual. We have experience of the types of models and architectures that work on these categories of data."
Their very nature solves many of the problems encountered in traditional credit scoring:
- It's always the same data, in a similar format.
- The data comes from a single source and is hard to tamper with. It is no longer declarative: it is obtained directly from users' bank accounts.
In short, this data is representative of the financial situation of loan applicants. It provides a solid basis for building a data model that can be reproduced internationally.
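To make "the same data, in a similar format" concrete, here is a minimal sketch of how payloads from two aggregators could be mapped onto one common transaction shape. Everything in it is an assumption made for illustration: the `Transaction` type, the two aggregators and their field names are hypothetical and do not describe any real aggregator or Algoan's actual schema.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Transaction:
    """Common shape for one bank transaction, whatever the source."""
    booked_on: date
    amount: float       # negative = money out, positive = money in
    currency: str
    description: str


def from_aggregator_a(raw: dict) -> Transaction:
    # Hypothetical aggregator A: signed amount, ISO date string.
    return Transaction(
        booked_on=date.fromisoformat(raw["date"]),
        amount=float(raw["amount"]),
        currency=raw["currency"],
        description=raw["label"],
    )


def from_aggregator_b(raw: dict) -> Transaction:
    # Hypothetical aggregator B: unsigned amount in cents plus a direction flag.
    sign = -1 if raw["direction"] == "DEBIT" else 1
    return Transaction(
        booked_on=date.fromisoformat(raw["bookingDate"]),
        amount=sign * raw["amountCents"] / 100,
        currency=raw["ccy"],
        description=raw["text"],
    )
```

Once every source is converted to the same shape, everything downstream (feature computation, scoring) can ignore where the data came from.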
"Open Banking, far from being a niche, is profoundly changing the way credit is granted. At Algoan, we have chosen to work with several aggregators, which connect directly to banking APIs to collect data. In this way, we avoid developing our own connectors. We're concentrating on the development of our Credit Scoring API."
Open Banking is a paradigm shift for credit granting and opens up possibilities for developing scalable international products.
How do you scale a data-driven product?
Open Banking data is a powerful resource. The next step is to build data models that can scale with it.
"We knew from day one that we wanted to offer a global product. This is important because that ambition was built into our data models from the start: how we collect the data, how we process it, etc."
Here's how we do it at Algoan:
Step 0 → Design the functionalities:
The aim of this first phase is to identify the functionalities that are required, so as to know which data to collect. In our case, we knew the different stages leading up to the granting of credit. We therefore reviewed each stage of the granting decision to decide on the most relevant variables to select. These are the variables that enable us to establish the precise banking profile of consumers (income, volatility of expenditure, use of bank overdraft, incidents, etc.).
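As an illustration of what such variables can look like, the sketch below derives three of them (average monthly income, volatility of expenditure, use of overdraft) from raw transactions and balances. The function name, input shapes and exact definitions are assumptions chosen for the example, not Algoan's actual variables.

```python
from collections import defaultdict
from datetime import date
from statistics import mean, pstdev


def banking_profile(transactions, daily_balances):
    """Compute a few illustrative profile variables.

    transactions: list of (date, amount), amount < 0 for spending.
    daily_balances: list of end-of-day account balances.
    """
    income = defaultdict(float)
    spending = defaultdict(float)
    for day, amount in transactions:
        month = (day.year, day.month)
        if amount > 0:
            income[month] += amount
        else:
            spending[month] += -amount

    monthly_income = list(income.values())
    monthly_spending = list(spending.values())
    avg_spending = mean(monthly_spending) if monthly_spending else 0.0

    return {
        "avg_monthly_income": mean(monthly_income) if monthly_income else 0.0,
        # Coefficient of variation: spread of monthly spending relative to its mean.
        "spending_volatility": (
            pstdev(monthly_spending) / avg_spending if avg_spending else 0.0
        ),
        # Share of days on which the account balance was negative.
        "overdraft_day_share": (
            sum(b < 0 for b in daily_balances) / len(daily_balances)
            if daily_balances else 0.0
        ),
    }
```

Because these variables only need amounts, dates and balances, the same computation can be applied to accounts from any country.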
Step 1 → Build the skeleton of the generic algorithm:
The aim is to build a skeleton adapted to all contexts. This is manageable for banking data. Once this architecture has been defined, the bulk of the work is done on the data, its collection and labeling strategy. With Open Banking data, the pre-processing is similar in all countries (cleaning and simplifying the data to be injected into the algorithms).
The next step is to train the algorithms, which learn from the labels they are given. Their aim is to predict these labels on their own for future data.
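The "cleaning and simplifying" part of this step can be sketched for the textual side of the data, the transaction labels. The function below is a hypothetical example of such pre-processing, not Algoan's pipeline. It also assumes labels written in Latin script: any character that is not an unaccented Latin letter after normalization is dropped.

```python
import re
import unicodedata


def clean_label(label: str) -> str:
    """Normalize a raw transaction label before it is fed to a model."""
    # Decompose accented characters, then drop the combining marks,
    # so that "Électricité" and "ELECTRICITE" end up identical.
    text = unicodedata.normalize("NFKD", label)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    text = text.lower()
    # Replace digits and punctuation (dates, references, card numbers) with spaces.
    text = re.sub(r"[^a-z\s]", " ", text)
    # Collapse runs of whitespace into single spaces.
    return re.sub(r"\s+", " ", text).strip()
```

For instance, `clean_label("PRLV SEPA  Électricité 12/03 REF:998877")` returns `"prlv sepa electricite ref"`.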
Step 2 → Customize the product with specific data:
Once the basic, universal layer has been built up, we can turn our attention to more country-specific data (customs, social habits, lifestyle, etc.), and implement a complementary labeling strategy.
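The idea of a universal layer plus a country-specific layer can be sketched with a deliberately simple stand-in: keyword rules instead of trained models. The rule tables, categories and country codes below are hypothetical and only illustrate the layering.

```python
# Universal layer: keywords assumed to be meaningful in every market.
BASE_RULES = {
    "salary": "income",
    "rent": "housing",
    "loan": "credit_repayment",
}

# Specialization layer: keywords that only make sense in one country.
COUNTRY_RULES = {
    "FR": {"salaire": "income", "loyer": "housing", "caf": "benefits"},
    "ES": {"nomina": "income", "alquiler": "housing"},
}


def categorize(label: str, country: str) -> str:
    """Return a category for a transaction label, or "other" if no rule matches."""
    words = set(label.lower().split())
    # Country-specific rules are tried first, then the shared base rules.
    for rules in (COUNTRY_RULES.get(country, {}), BASE_RULES):
        for keyword, category in rules.items():
            if keyword in words:
                return category
    return "other"
```

Opening a new country then means adding one entry to the specialization layer; the base layer and the calling code stay untouched.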
"To scale a data-driven product, having an international vision from the outset is almost unavoidable. It saves precious time later on, since the algorithms have been designed for it. So we need to think universal first, which will be a duplicable base for all countries, then add a layer of specialization, to respond precisely to the local context."
The good news is that in data, improvement is continuous. Algorithms are constantly being improved. The more data we have, the better we do. The better we do, the more data we have. This virtuous circle is only possible if the right data collection and scaling strategies are in place.
In credit, this is made easier thanks to Open Banking. It's a solid foundation for the entire value proposition developed by Algoan: improving access to credit. With Open Banking data, we're able to offer a product that works better, and works everywhere.