How L&T Finance’s Framework Enhanced Projections on Loan Default Probabilities

The multi-data underwriting engine, ‘Project Cyclops’, has shown a significant, evolving difference in early default patterns.

Dr. Debarag Banerjee, Chief AI & Data Officer, L&T Finance

India’s financial services sector is becoming increasingly data-rich, with lenders leveraging internal customer records alongside external datasets to refine credit underwriting. However, the translation of this data into efficient credit outcomes remains uneven. As noted in PwC’s analysis of India’s retail lending landscape, lenders are dealing with a growing range of data inputs and customer segments, complicating risk assessment rather than simplifying it. 

This gap, increasingly framed as the challenge of return on data investment (RODI), is where initiatives such as L&T Finance’s Project Cyclops seek to intervene, embedding data directly into underwriting decision engines rather than treating it as a passive input.

The Need for a RODI Framework

Lenders have access to baseline data on customers who have previously borrowed: credit bureaus hold their records along with repayment history. For first-time borrowers, however, no such data is available.

Even for people with credit bureau records, the bureau score is a backward-looking indicator. Decisions have to be made on the basis of historical information so that the probability of default stays low. The question is: are there other sources of information about applicants that give a sharper picture of their probability of default? For somebody who would eventually default, does the additional data raise the estimated default probability? For those who would not, does it lower the estimate?

Sharper estimates make underwriters more confident: if an applicant’s probability of default is above a certain threshold, the application can be rejected as too risky. If it turns out that more of the declined applicants go on to default, the decision effectively translates into a reduction in credit losses. This is the return from using additional sources of data.
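
As a rough illustration, consider how an extra data source might shift probability-of-default (PD) estimates around a decline threshold. The sketch below is hypothetical: the threshold, applicants, and PD values are invented, and the decision rule is simplified to a single cutoff.

```python
# Minimal sketch: how an extra data source can sharpen probability-of-default
# (PD) estimates and change approve/decline decisions at a fixed threshold.
# All numbers are illustrative, not L&T Finance's actual figures.

PD_THRESHOLD = 0.08  # decline if estimated PD exceeds 8% (hypothetical cutoff)

def decide(pd_estimate: float) -> str:
    """Simple threshold rule: reject applicants deemed too risky."""
    return "DECLINE" if pd_estimate > PD_THRESHOLD else "APPROVE"

applicants = [
    # (name, bureau-only PD, PD with extra data, eventually defaulted?)
    ("A", 0.07, 0.12, True),   # extra data raises PD -> correctly declined
    ("B", 0.09, 0.05, False),  # extra data lowers PD -> correctly approved
]

for name, pd_bureau, pd_enriched, defaulted in applicants:
    print(name, decide(pd_bureau), "->", decide(pd_enriched),
          "| eventual default:", defaulted)
```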

The additional source of data is, however, not free. It may have been acquired from various credit bureaus, bank account aggregators, payment aggregators, maybe it is a source of location information giving affluence scores based on the locality, or even macroeconomic data. There can be many different possible sources of data. Whatever was the cost to acquire that data (acquisition, processing, storing, cleaning, etc.) is the lender’s investment. Companies have to extract returns on the acquired data. The return is the numerator with investment as denominator; hence RODI—Return on Data Investment.
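
Since the article defines RODI as return over investment, a back-of-the-envelope calculation might look like the following. All figures are invented for illustration.

```python
# Illustrative RODI arithmetic under assumed figures (not L&T Finance data).
# Return: credit losses avoided by declining applicants the enriched model
# correctly flags as high risk. Investment: total cost of the data source.

losses_avoided = 30_000_000          # e.g. INR saved from averted defaults
acquisition_cost = 5_000_000         # licence fees paid to the data vendor
processing_storage_cost = 3_000_000  # pipelines, cleaning, storage

investment = acquisition_cost + processing_storage_cost
rodi = losses_avoided / investment   # return on data investment

print(f"RODI = {rodi:.2f}x")  # here: 3.75x -> the data source pays for itself
```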

Changing The ‘IT as Cost Centre’ Narrative

Companies can go wrong in their data management journey in a couple of ways. Either investments in internal data for better credit decisioning are not giving adequate results, or companies do the heavy lifting of storing external data acquired from various vendors but, in the absence of organised data-cleansing exercises, are unable to use that data to make better decisions. Unless the data is being stored for compliance, it is just a cost centre.

“There may be a legitimate need for investing more in data acquisition or storage and processing for making better credit decisions. If the RODI is significantly higher, then although the data investments may seem higher, the returns may be proportionate or even higher. This is especially useful because historically, not just data, but the narrative has been that IT organization is a cost center,” says Dr. Debarag Banerjee, Chief AI & Data Officer, L&T Finance.

However, the RODI framework proposes that even the internal data within companies, if stored, processed, and fed into the right kind of machine learning models, can be used for credit decisioning. If the returns are some multiple ‘X’ of the investments, then IT will no longer be perceived as a cost centre. “Moreover, the narrative now will be what can be done more and what is the future potential to get the best from the data investments,” Banerjee says.

The opposite is also true. Many organisations that acquire data from external sources, or hold on to irrelevant and outdated data, simply don’t do their ‘spring cleaning’, ending up with junk eating into expensive storage capacity. If the data doesn’t help in making decisions that yield proportionate returns, throwing it away is a reasonable option, unless it is being kept for record-keeping or compliance reasons.

“So, both decisions can actually be driven, but the way to drive that decision is to do the math: is the cost justified by the return? That's the whole point of this framework,” Banerjee explains.

How L&T Finance Leveraged the RODI Framework

L&T Finance has adopted a methodical approach to devising its credit decisioning models.

“As more and more sources of data are ensembled, we measure them in terms of the ‘lift’ each data source is giving us, both by themselves as well as in the ensemble total. This runs in parallel as we are building the models, helping us to decide what data sources to use for which kind of decisions, and then what level of confidence we can get as a result of driving various credit parameters up or down, such as credit pricing, approved loan amounts, etc.,” says Banerjee.
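
A minimal version of this lift measurement, on synthetic data with made-up source names (the article does not disclose the actual sources or metrics), could compare model AUC with and without each data block, both standalone and within the ensemble:

```python
# Sketch of per-source "lift" measurement: train a default model with and
# without each external data source and compare AUC. Feature groups and
# labels here are synthetic placeholders.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 5000
sources = {              # hypothetical data sources -> their feature columns
    "bureau":   rng.normal(size=(n, 3)),
    "bank_agg": rng.normal(size=(n, 2)),
    "location": rng.normal(size=(n, 1)),
}
# Synthetic default labels correlated with all sources.
signal = sum(block.sum(axis=1) for block in sources.values())
y = (signal + rng.normal(scale=2.0, size=n) > 0).astype(int)

def auc_with(names):
    X = np.hstack([sources[s] for s in names])
    Xtr, Xte, ytr, yte = train_test_split(X, y, random_state=0)
    model = LogisticRegression().fit(Xtr, ytr)
    return roc_auc_score(yte, model.predict_proba(Xte)[:, 1])

base = auc_with(["bureau"])
full = auc_with(list(sources))
print(f"bureau only: AUC={base:.3f}; full ensemble: AUC={full:.3f}")
for s in ["bank_agg", "location"]:
    solo = auc_with([s])
    without = auc_with([k for k in sources if k != s])
    print(f"{s}: standalone AUC={solo:.3f}, "
          f"lift in ensemble={full - without:+.3f}")
```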

As the models operate in the real world, they are reviewed at regular intervals: are they getting the company the desired returns, and can the targeted RODI be achieved successively over the years, whether by acquiring new sources of data or more uncorrelated ones?

Another aspect of this is the dilemma of pricing the value of data, especially when it comes from a less conventional source. How do companies decide the right price for a particular dataset?

“When we run these ‘data room’ exercises, it gives us a very good understanding of what is a wise price to pay for that data. This helps us in being better buyers of data, because number one, you don’t want to overpay, and number two, if you find a truly high-RODI source of data, you also want to make sure that you really acquire it,” says Banerjee.
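
One way such a data-room exercise could translate into a price ceiling, assuming the measured lift has already been converted into an expected annual saving, is to back the maximum price out of a target RODI. The figures and the rule itself are invented for illustration:

```python
# Hypothetical pricing rule: given the expected annual savings a candidate
# data source would generate and a target RODI, solve for the maximum price
# worth paying. All numbers are assumed.

expected_annual_savings = 12_000_000  # value of the measured lift (assumed)
running_cost = 2_000_000              # processing/storage for the source
target_rodi = 2.0                     # require returns of at least 2x costs

# savings / (price + running_cost) >= target_rodi  =>  solve for price
max_price = expected_annual_savings / target_rodi - running_cost
print(f"Walk away above INR {max_price:,.0f}")  # here: INR 4,000,000
```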

Project Cyclops

L&T Finance has designed and implemented a multi-data underwriting engine called ‘Project Cyclops’, which has been live for more than a year now. “We started with our two-wheeler portfolio with three additional sources of data beyond traditional bureaus. Since then, we have extended Cyclops to small business loans, to farm tractor loans, and are increasing the footprint of it in personal loans,” says Banerjee.

The company has run A/B tests (also known as split testing) pitting underwriting without these sources of data (the old methods) against Cyclops. “We have seen a significant, evolving difference in the early default patterns,” Banerjee says.
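
A simplified version of such an A/B comparison, with invented counts, would track early default rates in the two arms and test whether the gap is statistically significant:

```python
# Sketch of the A/B comparison: early default rate for loans underwritten the
# old way (control) vs. with Cyclops (treatment), with a two-proportion
# z-test for significance. Counts are made up for illustration.
from math import sqrt, erf

control = dict(loans=10_000, early_defaults=320)    # old underwriting
treatment = dict(loans=10_000, early_defaults=210)  # Cyclops

p1 = control["early_defaults"] / control["loans"]
p2 = treatment["early_defaults"] / treatment["loans"]
p = (control["early_defaults"] + treatment["early_defaults"]) / (
    control["loans"] + treatment["loans"])
se = sqrt(p * (1 - p) * (1 / control["loans"] + 1 / treatment["loans"]))
z = (p1 - p2) / se
p_value = 1 - erf(abs(z) / sqrt(2))  # two-sided, normal approximation

print(f"control {p1:.2%} vs Cyclops {p2:.2%}, z={z:.2f}, p={p_value:.4f}")
```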

The other part of Project Cyclops is that, where these models are already making extremely accurate decisions, “we no longer have a need for a human underwriter to be present when deciding on a personal loan or a two-wheeler loan. As the delay of a human reading through pages of documents is eliminated, underwriting can be automated at scale and speed. The Cyclops underwriting engine in the background calls all of the APIs, runs about 24 or so scorecards (each containing a machine learning model), and comes up with the decision on approval/rejection, loan amount, and pricing—all within a few tens or hundreds of milliseconds, and almost always less than four seconds. That is instantaneous for any practical purpose.”
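
A schematic of such an engine, not L&T Finance’s actual implementation, might fan out the API calls concurrently, score the assembled features with a bank of scorecards, and enforce a latency cap; every source name, score, and cutoff below is a placeholder:

```python
# Architectural sketch: concurrent data fetches, a bank of ~24 scorecards,
# and an aggregated approve/decline decision under a four-second timeout.
import asyncio, random

async def call_api(source: str) -> dict:
    await asyncio.sleep(random.uniform(0.01, 0.05))  # stand-in for a real API
    return {source: random.random()}

def run_scorecard(i: int, features: dict) -> float:
    # Each scorecard would wrap its own ML model; here, a dummy score.
    return sum(features.values()) / len(features) + 0.01 * i

async def underwrite(application_id: str) -> dict:
    sources = ["bureau", "bank_aggregator", "payments", "location"]
    results = await asyncio.gather(*(call_api(s) for s in sources))
    features = {k: v for r in results for k, v in r.items()}
    scores = [run_scorecard(i, features) for i in range(24)]  # ~24 scorecards
    risk = sum(scores) / len(scores)
    approved = risk < 0.7  # hypothetical cutoff
    return {"id": application_id, "approved": approved,
            "amount": 100_000 if approved else 0,
            "rate": 0.14 + 0.05 * risk}

print(asyncio.run(asyncio.wait_for(underwrite("APP-1"), timeout=4.0)))
```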

This fundamentally changes the entire experience of getting a loan from L&T Finance.

Future Roadmap

Project Cyclops remains a live project for several reasons. To begin with, it is trying to model credit behaviour, which is never static: as new developments unfold in the relevant areas of the economy, the models have to keep changing to truly capture the real world.

Secondly, this journey to external data has only started. “As we look at newer and newer forms of data—maybe public data, data on legal actions, weather patterns, satellite data—and as newer techniques are evolving, including reasoning model-based AI that can make sense of the unstructured data in the world, the combination of those two (making updates to the model and bringing in newer sources of data) will continue to evolve,” says Banerjee. 

To make the project more sustainable, “We have built ‘Project Nostradamus’. It is a portfolio monitoring engine. Every month, it looks at the behaviour of each customer and their various data footprints to understand both at an individual level and at a cluster level where are the emerging risks evolving, and also where is it that we should lend, which kind of borrowers and what kind of products where we are under-indexed. All of that would feed back into the system. Cyclops essentially will be a closed-loop, continuously evolving machine learning system,” he adds.
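
A Nostradamus-style monthly pass, sketched here under assumed features and thresholds since the article describes only its intent, could cluster the book and tag clusters as emerging risk or under-indexed opportunity:

```python
# Sketch of a monthly portfolio-monitoring pass: cluster customers on
# behavioural features, flag clusters with rising delinquency as emerging
# risk and thin low-risk clusters as under-indexed. All data is synthetic.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(1)
features = rng.normal(size=(2000, 4))   # monthly behavioural footprints
delinquent = rng.random(2000) < 0.05    # placeholder delinquency flags

labels = KMeans(n_clusters=5, n_init=10, random_state=0).fit_predict(features)
for c in range(5):
    mask = labels == c
    rate = delinquent[mask].mean()      # cluster delinquency rate
    share = mask.mean()                 # cluster share of the book
    tag = "EMERGING RISK" if rate > 0.06 else (
        "UNDER-INDEXED" if share < 0.15 and rate < 0.04 else "stable")
    print(f"cluster {c}: {share:.0%} of book, delinquency {rate:.1%} -> {tag}")
```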
