Summary – In the financial sector, most new customer-facing initiatives involve data being made available in the correct format, whether new products (such as green loans or MacBook insurance cover), campaigns (such as informing customers of their rights or obligations based on their account balance), or services (such as annual tax statements or spending summaries).
Wherever data is needed for a new purpose, a change to an existing data model or an entirely new data model may be needed.
Your data model will drive the build and test phases and will remain the foundation of how your data flows work. Therefore, it needs to be designed to accommodate change and scrutiny long into the future. To help you achieve this, in this article we explain some principles drawn from diverse use cases in the financial sector, so you can borrow from our experience.
Introduction
Internal initiatives also rely on data. For example, new criteria for decision making could involve a score of how environmentally friendly borrowers in a particular industry are, or a measure of how many products a customer holds with your institution based on the value and longevity of those products rather than just a simple count. These, too, involve new datasets being created from existing sources.
A data model is a diagram defining how data is organised in a storage system, including the relationships between data elements. In many cases, we also include the detailed logic on how the data elements are sourced when we say, ‘data model.’ This is because a diagram is essentially a schematic which is useful for conversations but does not have the detail needed for implementation.
The use cases that inform these principles are:
1. Insurance pricing.
2. Bank regulatory capital.
3. Marketing communications.
I found these principles to be of roughly equal importance in all three use cases.
Principle 1: Use a generic structure.
We designed insurance pricing rate tables keyed by postcode and CRESTA zone, with several tables joining those location tables to bushfire, flood, theft, and storm risk data. Some of the more modern flood risk datasets offer finer granularity, down to the street address and even the level of the building an apartment is on.
Naturally, for each customer we had their postcode, which was used to look up their CRESTA zone.
New data sources on the various risk categories became available from time-to-time, so we needed to design the data model to enable new data sources to be adopted without having to change the base tables or the data flow.
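As a minimal sketch of this kind of extensible design (the zone codes, perils, and multipliers below are all hypothetical, not the actual pricing tables), the risk factors can be held in one long, generic lookup keyed by location and peril, so adopting a new risk source is a data load rather than a schema change:

```python
# Illustrative sketch of a generic risk-lookup structure; the zone
# codes, perils, and multipliers are all hypothetical.
postcode_to_cresta = {"2000": "AU-12", "3000": "AU-27"}

# Risk factors in one long table keyed by (location, peril), so a new
# peril or a refreshed data source is new rows, not a new table.
risk_factors = {
    ("AU-12", "flood"): 1.30,
    ("AU-12", "bushfire"): 1.05,
    ("AU-27", "flood"): 0.95,
    ("AU-27", "theft"): 1.10,
}

def risk_multiplier(postcode: str, peril: str, default: float = 1.0) -> float:
    """Postcode -> CRESTA zone -> risk multiplier, defaulting to neutral."""
    cresta = postcode_to_cresta.get(postcode)
    return risk_factors.get((cresta, peril), default)

print(risk_multiplier("2000", "flood"))   # 1.3
```

When a vendor ships an updated flood dataset, only the flood rows are replaced; the base tables and the lookup logic stay the same.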
For regulatory capital, we had the loan in force and the collateral, whether that was the property, or a guarantor and the guarantor's property, or several of these. We also had the value of the property when the loan started, which we used to calculate the loan-to-valuation ratio (LVR).
The source data tables were simply the loans in force, and the property valuations which were purchased from an external vendor and fed in daily.
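The LVR calculation itself is simple division; here is a sketch with hypothetical figures (summing multiple collateral items is one simple convention, and the actual regulatory treatment of multiple collateral may differ):

```python
# Hypothetical figures only.
loan_balance = 400_000
collateral_values = [450_000, 50_000]   # e.g. the property plus a
                                        # guarantor's property

# Summing collateral items is one simple convention; the actual
# regulatory treatment of multiple collateral may differ.
lvr = loan_balance / sum(collateral_values)
print(f"LVR = {lvr:.0%}")   # LVR = 80%
```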
The regulations will change in the future in ways that we cannot predict, because banking regulation needs to respond to unexpected changes in risk, banking products, and customer trends. The data model needs to allow the bank to update the calculations in the future, and even add new data sources, without changing the basic framework, including the base tables and the data flow.
For marketing communications, we have the list of customers, the characteristics of each customer that determine whether the customer is contacted for a particular campaign and, if contacted, the content of the communication to the customer, known as the ‘creative.’ Many communications are based on the kind of account a customer has. So, if a customer has a term deposit and a mortgage, they may be eligible for a particular term deposit campaign and a mortgage campaign. There is a table with all active term deposit accounts and another with all active mortgage accounts, and these can be joined to the customer table. A customer-account relationship table sits between the customer table and each account-type table to enable the many-to-many relationship.
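The table layout described above can be sketched with an in-memory SQLite database (the table and column names here are illustrative, not any bank's actual schema):

```python
import sqlite3

# Illustrative schema only: table and column names are hypothetical.
db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE term_deposit (account_id TEXT PRIMARY KEY, balance REAL);
-- Relationship table enabling the many-to-many link between
-- customers and accounts (e.g. joint accounts).
CREATE TABLE customer_account (customer_id INTEGER, account_id TEXT,
                               account_type TEXT);

INSERT INTO customer VALUES (1, 'Alice'), (2, 'Bob');
INSERT INTO term_deposit VALUES ('TD-100', 25000), ('TD-200', 8000);
INSERT INTO customer_account VALUES
    (1, 'TD-100', 'term_deposit'),   -- joint account: two customers,
    (2, 'TD-100', 'term_deposit'),   -- one account row
    (2, 'TD-200', 'term_deposit');
""")

# Target population for a hypothetical term-deposit campaign:
# customers holding a term deposit with a balance of at least 10,000.
rows = db.execute("""
    SELECT DISTINCT c.name
    FROM customer c
    JOIN customer_account ca ON ca.customer_id = c.customer_id
    JOIN term_deposit td     ON td.account_id = ca.account_id
    WHERE ca.account_type = 'term_deposit' AND td.balance >= 10000
""").fetchall()
print(sorted(name for (name,) in rows))   # ['Alice', 'Bob']
```

The relationship table lets a joint account link to two customers without duplicating the account row, which is exactly what the many-to-many structure is for.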
New campaigns are being devised continually for new products, new marketing initiatives, and new and changed regulations. The data model needs to be able to accommodate new requirements within the basic framework and data flow.
A generic data model allows new requirements to be met by adding columns to existing tables, with an entirely new table needed only rarely.
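One way to read this principle, sketched below with hypothetical campaigns, account types, and thresholds: hold campaign rules as rows of data in a generic table, so that a new campaign becomes a new row (and occasionally a new column) rather than a new table or a new code path:

```python
# Hypothetical campaigns, account types, and thresholds.
campaign_rules = [
    {"campaign": "td_renewal",   "account_type": "term_deposit", "min_balance": 10_000},
    {"campaign": "mortgage_top", "account_type": "mortgage",     "min_balance": 100_000},
]

accounts = [
    {"customer_id": 1, "account_type": "term_deposit", "balance": 25_000},
    {"customer_id": 2, "account_type": "mortgage",     "balance": 350_000},
]

def eligible(rule: dict, account: dict) -> bool:
    """One shared rule-evaluation path serves every campaign row."""
    return (account["account_type"] == rule["account_type"]
            and account["balance"] >= rule["min_balance"])

targets = {(r["campaign"], a["customer_id"])
           for r in campaign_rules for a in accounts if eligible(r, a)}
print(sorted(targets))   # [('mortgage_top', 2), ('td_renewal', 1)]
```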
Principle 2: Start with the best source available.

All the above use cases are subject to audit and even regulatory scrutiny. Regulatory scrutiny may not come on a regular cycle, but when it does, we need to be able to stand behind our decisions, even in front of the media (including social media), and explain the purpose of the data sources, the data flow, and the data model.
If our source is wrong, or even questionable, customers may be mistreated and suffer negative impacts, with legal and reputational consequences for the financial institution.
If the insurance risk data is not up to date, or is based on outdated technologies such as an older flood risk mapping methodology, then some customers may be overpaying for flood cover, some underpaying, and the insurer exposed to risk in a way that is inconsistent with its pricing. Therefore, we need to regularly vet our data suppliers and check what other data sources are available, as well as what our competitors are using.
Property valuation data is another case in point, relevant to calculating bank capital. Are the values of the real estate backing our mortgage loans really what our data source says they are, and are we therefore applying the correct denominator when we calculate the LVR? This has a direct bearing on how much capital the bank must hold.
In an audit or regulator review, a small systematic difference in valuations can mean billions of dollars too little, or too much, capital being held. How can a bank hold too much capital? Clearly more capital means more safety and stability, but it also means the bank is more constrained in its lending, making it slightly harder for customers to get a mortgage to own a home to live in or invest in.
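A back-of-the-envelope sketch of that sensitivity (the portfolio size, risk weights, and capital ratio below are all hypothetical, loosely in the style of LVR-banded standardised risk weights, not any actual regulation):

```python
# Back-of-the-envelope sketch: all figures are hypothetical.
portfolio = 500e9               # $500bn of mortgage loans in force
true_value = 610e9              # true collateral value
reported_value = true_value * 1.05   # valuations systematically 5% high

def risk_weight(lvr: float) -> float:
    """Hypothetical LVR-banded risk weight."""
    return 0.35 if lvr <= 0.80 else 0.50

capital_ratio = 0.105           # hypothetical required capital ratio

true_lvr = portfolio / true_value          # ~0.82 -> 50% risk weight
reported_lvr = portfolio / reported_value  # ~0.78 -> 35% risk weight

capital_true = portfolio * risk_weight(true_lvr) * capital_ratio
capital_reported = portfolio * risk_weight(reported_lvr) * capital_ratio
print(f"capital shortfall: ${capital_true - capital_reported:,.0f}")
```

With these made-up numbers, a 5% overstatement pushes the portfolio across an LVR band and understates required capital by roughly $7.9bn, which is exactly the kind of audit finding described above.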
For marketing campaigns, because the parameters are based on internal data sources, it may seem that getting the best source data is straightforward. In practice, a large bank may have many data sources that are not consistent with each other, because they are collected or calculated for different purposes and on different timeframes.
For example, the definition of vulnerable customers may change over time, and this may affect whether or not a customer is in the target population for a particular campaign.
Similarly, the point in time at which an account balance is measured on a monthly cycle, or whether an average balance over a period is used instead, will affect whether a customer's balance is within the threshold to be in the target population for a particular campaign.
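A tiny sketch of that sensitivity, with hypothetical daily balances and a hypothetical campaign threshold:

```python
# Hypothetical daily balances for one account over a 30-day month,
# and a hypothetical campaign threshold.
daily_balances = [15_000] * 15 + [5_000] * 15
threshold = 10_000

month_end_balance = daily_balances[-1]                       # 5,000
average_balance = sum(daily_balances) / len(daily_balances)  # 10,000.0

print(month_end_balance >= threshold)   # False: out of the target population
print(average_balance >= threshold)     # True: in the target population
```

The same account passes or fails the threshold depending purely on which balance definition the source system uses, so the definition must be chosen and documented deliberately.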
Saying ‘use the best source’ is much easier than doing it. An ideal data source might not be in a system that is linked to the system where your application sits, or it might not use the same customer key or account key that your application uses, requiring a mini project to add these.
Principle 3: Use a practical design.
Regulatory constraints and the potential glare of the media do not mean that we need to aim for perfection or over-spend on the solution. Perfectionism and over-spending are also detrimental to customers, who will pay higher prices due to the financial institution's costly internal data projects, and who may miss out on new products or communication campaigns due to delays from overly ambitious internal data project goals.
Most projects will be on safe ground if options are socialised internally for review, feedback, and refinement against clear criteria, such as cost, time to implement, data quality, and customer benefit, and if decisions are explained in writing.
Conclusion
Typically, data modelling is part of a larger project, because it is a task in service of some business goal. Whatever your project, I hope these principles and examples help you in your next data modelling sub-project.
About the Author – Dan Misra, 10 April 2025
Find me on LinkedIn here https://www.linkedin.com/in/danmisra/