Capco Institute Blog

Demand-side-driven data warehousing: Building what you need, not what you could

Massive centralized data warehouses that could presumably contain all the data within a financial institution have been the holy grail of information management architecture ever since storage costs plummeted to immaterial levels. In this “supply-side warehousing” school of thought, the prevailing rule is that anything that is available should be harnessed because someone, somewhere, at some point in time may have some use for it. This has led to the development of ever-larger and more complex systems, as diversified organizations have attempted to harmonize detailed models of a large array of disparate instruments, transactions and analytic measures.

As budgets tighten, regulatory demands multiply, and the maintenance costs of such systems grow exponentially (along with the number of workarounds), many financial institutions are seeing the pendulum swing in the opposite direction: avoid centralizing the data and rely on direct inter-system messaging instead. In a recent online article on Wall Street & Technology, well-known experts in information management systems argue that “[...] firms are no longer spending money on gigantic warehouses and agreeing on a single data model that would fit all the data in one place [and instead] are looking at more messaging-oriented data management, where the reference and infrastructure lie in the systems of record, and they then pass that information to subsequent systems [...].”

This approach, however, is applicable only for firms that have spent considerable amounts of resources over the past several years to improve and standardize their primary source systems at least to some reasonable extent — in other words, the largest commercial and investment banks. For the vast majority of second- and third-tier banks that are now experiencing the risk management and regulatory pressures previously reserved for the top tier, this kind of approach is not feasible. Their systems are simply too numerous and too disparate and their processes involve too many manual adjustments to allow for such a decentralized approach.

For example, an assessment of the requirements for fairly basic credit risk reporting at a large super-regional bank found that information had to be harvested from more than 30 source systems, whose data was stored in proprietary formats, various flavors of relational and Lotus Notes databases, myriad spreadsheets, text files and other “repositories” of manual overrides and adjustments. Customer information existed in no fewer than nine customer relationship management systems.

This case study, by no means unique, is representative of the state of infrastructure within institutions that are just below the “too-big-to-fail” radar. Clearly, direct communication among those systems is out of the question. But what is the answer? One solution is a demand-side-driven approach to data warehousing, where a centralized repository still provides a harmonized view of the data and acts as the “traffic cop” between all these disjointed systems but is built under much more carefully defined guidelines. Financial institutions interested in developing a successful demand-side-driven data warehouse should:

  • Design an initial, generic, structurally flexible and parsimonious model that can easily accommodate new requirements.
  • Enrich the warehouse, as the term “demand side” implies, only with data required for a specific business use (outside of the original source system), instead of the old “just in case” approach.
  • Ensure that even with the addition of specific business requirements, the structure of the warehouse remains “foundational,” that is, it includes only data and business logic likely to be used enterprise-wide, and that data can be accessed only through properly designed handles.
  • Build a secondary layer of specialized data marts, where additional data and logic relevant to one or a few “consumers” (which can be a business unit, a particular type of reporting package, etc.) can enrich or transform the data to suit its own needs.
  • Establish a responsible governance system, in which the development costs are borne, at least in part, by the “consumer” of the data, thus ensuring that any new requirements are carefully evaluated.
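The placement rule implied by these guidelines can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names `Layer` and `place_attribute` are hypothetical, and the cut-off for what counts as "enterprise-wide" (`enterprise_threshold`) is an invented parameter that each institution would have to set through its own governance process.

```python
from __future__ import annotations

from enum import Enum


class Layer(Enum):
    SOURCE = "source system"
    MART = "data mart"
    WAREHOUSE = "foundational warehouse"


def place_attribute(consumers: set[str], enterprise_threshold: int = 3) -> Layer:
    """Decide where an attribute should live, given who has asked for it.

    `consumers` holds the names of consumers outside the source system.
    `enterprise_threshold` is a hypothetical cut-off for "enterprise-wide".
    """
    if not consumers:
        return Layer.SOURCE      # no outside demand: leave the data where it is
    if len(consumers) >= enterprise_threshold:
        return Layer.WAREHOUSE   # broad demand: part of the foundation
    return Layer.MART            # one or a few consumers: specialized mart
```

The point of the sketch is that demand, not availability, drives every decision: an attribute with no consumer outside its source system never leaves it.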

This kind of pragmatic approach retains most of the benefits of a centralized, harmonized and carefully managed data warehouse that is essential for institutions with a large number of legacy systems, while avoiding the costs and pitfalls of the universal, monolithic behemoths that have been created over the years. Banks that have lagged behind in infrastructure spending should learn these lessons from the past as they struggle to cope with the growing transparency, auditability and reporting requirements that will only increase in the future.

What benefits could your organization reap from a demand-side-driven approach to data warehousing? Join the discussion.

Comments

Good article
Let me see if I understand this correctly.
Let's say I am an investment manager who has recently converted to various vendor-supplied systems for placing orders, portfolio management, middle and back office, etc.
The demand model is to keep all of the required data in the source systems versus building the traditional data warehouse for all data from all systems. Now, based on data subscriber 1's requirements, extract the data from the source systems into a flexible database for data subscriber 1. The data subscriber will then obtain their data from this database.
For other subscribers who need additional data as well as the first subscriber's data, build another data mart and repeat this process as needed.
What are your recommended reconciliation solutions for this model?

Thanks Dave.
I think your interpretation is right, but just to reiterate, I'm assuming that what you mean by "subscriber" is what I called "consumer". In that case, what you have is a three-layer model: 1) data that nobody but the original system of record needs stays there and doesn't flow anywhere else; 2) data that is needed by other "consumers" does go into a warehouse (and passes through a proper data quality and certification process, hence no reconciliation processes are needed); 3) additional enrichment of that data (or the model holding it) can happen either in the warehouse, if it is likely to be needed by multiple consumers, or in the next downstream layer (the mart), if it's needed by only one of them. Let me give an example: a loan booking system may hold dozens of collateral attributes, most of which are irrelevant to other LOBs or reporting groups. So the "collateral" model in the warehouse can be simply "type" and, in the case of real estate, "zip code". Now one particular type of risk reporting may need to know if this property is in an area prone to floods, so you need another attribute, "is in flood zone?". However, since that report is the only user, this addition should be made in its own mart and not complicate the foundational warehouse. On the other hand, many reports may need some form of "region" derived from the zip code. If that is the case, that attribute is enterprise-wide and should be added to the warehouse model.
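To make the collateral example a bit more concrete, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the class names, the zip-prefix-to-region mapping and the set of flood-zone zip codes are invented, and a real warehouse would of course hold this in tables rather than in-memory objects.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical reference data, invented for illustration only.
ZIP_PREFIX_TO_REGION = {"0": "Northeast", "1": "Northeast", "7": "South", "9": "West"}
FLOOD_ZONE_ZIPS = {"70112"}


@dataclass(frozen=True)
class WarehouseCollateral:
    """Foundational warehouse record: only enterprise-wide attributes."""
    collateral_type: str
    zip_code: Optional[str] = None  # populated for real estate only

    @property
    def region(self) -> Optional[str]:
        # Needed by many reports, so it is derived once, in the warehouse.
        if self.zip_code is None:
            return None
        return ZIP_PREFIX_TO_REGION.get(self.zip_code[0], "Other")


@dataclass(frozen=True)
class FloodRiskMartCollateral:
    """Mart record for the single consumer that needs flood exposure."""
    base: WarehouseCollateral

    @property
    def is_in_flood_zone(self) -> bool:
        # Special-purpose enrichment stays out of the foundational model.
        return self.base.zip_code in FLOOD_ZONE_ZIPS


house = WarehouseCollateral("real estate", "70112")
mart_row = FloodRiskMartCollateral(house)
print(house.region, mart_row.is_in_flood_zone)
```

Note that the warehouse record knows nothing about flood zones; only the mart that needs the attribute carries it.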

Very good article. The number of reconciliation business and system flows at financial institutions is ridiculous. Given that it's not rocket science or a huge competitive advantage within a firm, it's a great candidate for smart consolidation instead of simply taking output from many systems and throwing it all into a giant bit bucket in the basement. The tricky part is to design the 'structurally flexible and parsimonious model' which will satisfy all the stakeholders and get itself into the critical path.
Can you think of any real-world systems outside finance that would serve as a model?

Thanks.

I agree that the design of the foundational model can be very tricky and that there is never one right answer. However, this is where experience comes in: having people who have seen it done in many ways (right and wrong) and have struggled with it in different contexts (and have gotten it wrong themselves) is critical.

As far as examples outside finance, I'm not sure I can think of a specific system, but well-designed object-oriented inheritance models follow the same paradigm: you build abstract base classes (foundational) that no "user" accesses directly. Instead, you have derived classes that are tailored for a particular usage. Any common elements reside in the base; anything that is special-purpose gets moved to specialized (derived) versions. Note that the derived classes (or the marts, in our case) can both add complexity (specialized rules and calculations) and remove it (hiding from the user anything that is not needed for that user's purpose).
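A rough Python sketch of that analogy, using the standard `abc` module, might look like the following. The class names and the flood-zone lookup are hypothetical and only meant to show the shape of the idea, not a recommended design.

```python
from abc import ABC, abstractmethod

FLOOD_ZONE_ZIPS = {"70112"}  # hypothetical lookup, invented for illustration


class FoundationalCollateral(ABC):
    """Plays the role of the warehouse: common elements, never used directly."""

    def __init__(self, collateral_type: str, zip_code: str):
        self.collateral_type = collateral_type
        self.zip_code = zip_code

    @abstractmethod
    def view(self) -> dict:
        """Each derived class (mart) decides what its consumer sees."""


class FloodRiskCollateral(FoundationalCollateral):
    """Adds complexity: a specialized rule that only this consumer needs."""

    def view(self) -> dict:
        return {
            "type": self.collateral_type,
            "zip_code": self.zip_code,
            "is_in_flood_zone": self.zip_code in FLOOD_ZONE_ZIPS,
        }


class SummaryCollateral(FoundationalCollateral):
    """Removes complexity: hides the zip code from this consumer."""

    def view(self) -> dict:
        return {"type": self.collateral_type}
```

Because `view` is abstract, Python refuses to instantiate `FoundationalCollateral` itself, which mirrors the idea that no consumer reads the foundational layer directly.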
