Are you creating a Canonical (or “Common”) Information Model?

It’s been almost two years since I last wrote about this topic, but since then this trend has continued to accelerate. I have not had an opportunity to do another survey myself, but have seen:

  • Anecdotally, among many clients doing SOA, more than half are also creating and managing one or more canonical information models for their SOA and/or information management strategies. These are all focused on “data in motion,” not “data at rest.”
  • Surveys from other sources have shown that 50-60% of those doing SOA are creating a canonical information model (up from the 39% rate our 2007 survey found). Last week I saw data shared informally by a major vendor of SOA suites, from a survey of hundreds of their customers (all of whom are doing SOA), showing more than 60% are creating a canonical model.

So what’s behind this growing trend? The forces we identified in our original research piece are all still in operation, but to give a quick view, stories I’ve heard typically go like this:

  • We have thousands of XML Schemas (XSDs) about the place, rapidly proliferating out of control – with each representing the information model as the local team sees it for their application interchanges or service interfaces.
  • From the point of view of an individual developer or small team, it’s not a big deal, but the lack of a canonical/common model is a huge obstacle to any integration or interoperability we require across multiple applications.
  • Our issues with schema governance are exacerbated by the rapid evolution of the industry schema standards with which we must comply.
  • The lack of interoperability is especially painful when:
    • We’re integrating with one of our ecosystems of B2B partners.
    • We’re integrating/automating a cross-functional business process, like order-to-cash, or order-to-provision.

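The economics behind those stories are worth making concrete: with N local schemas mapped point-to-point, you need on the order of N*(N-1) translators, while routing everything through a canonical model needs only 2N. A minimal sketch in Python (all field names such as "cust_no" and "customerId" are invented for illustration, not from any real application):

```python
# Hypothetical sketch: every translation goes through the canonical vocabulary,
# so each application needs only a mapping to and from the canonical model,
# never a direct mapping to every other application.

def erp_to_canonical(rec: dict) -> dict:
    """Map a hypothetical ERP order record onto the canonical vocabulary."""
    return {"customer_id": rec["cust_no"], "order_total": rec["amt"]}

def crm_from_canonical(rec: dict) -> dict:
    """Project the canonical record into a hypothetical CRM vocabulary."""
    return {"customerId": rec["customer_id"], "total": rec["order_total"]}

erp_order = {"cust_no": "C-42", "amt": 99.5}
crm_order = crm_from_canonical(erp_to_canonical(erp_order))
print(crm_order)  # {'customerId': 'C-42', 'total': 99.5}
```

Adding a new application then means writing two mappings (in and out of the canonical model) rather than one per existing partner.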
When these folks try to establish a canonical model, results vary:

  • If the work happens in a context where industry standards like SID, Acord, or FPML can provide a starting point, the effort tends to succeed. This is true even when those standards have not previously been adopted by that enterprise.
  • Where such industry standards don’t exist, it’s often much harder to get enough agreement among the interested parties to get the effort off the ground.
  • And since two years ago I’ve seen one other interesting dimension to the problems of establishing a model: the need for a federated approach. In very large organizations with multiple business domains, it sometimes turns out that it’s not possible to establish one canonical model. Instead, multiple domain models are necessary, interlinked with one another and with an enterprise-level canonical model. These domains may reflect different external ecosystems, such as securities trading participants, as opposed to customers of a wholesale bank, or international banking exchange operation.
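The federated arrangement described in that last bullet can be pictured as each domain keeping its own vocabulary, with cross-domain exchange resolving through the enterprise-level canonical model. A hedged sketch, with all entity and field names invented for illustration:

```python
# Hypothetical federated model: each business domain keeps its own terms and
# maps them to a shared enterprise-level canonical entity ("party" here).

DOMAIN_MAPPINGS = {
    "securities_trading": {"counterparty_code": "id", "counterparty": "legal_name"},
    "wholesale_banking":  {"client_ref": "id", "client_name": "legal_name"},
}

def to_enterprise(domain: str, record: dict) -> dict:
    """Lift a domain-local record into the enterprise canonical vocabulary."""
    mapping = DOMAIN_MAPPINGS[domain]
    return {mapping[k]: v for k, v in record.items() if k in mapping}

trade = {"counterparty_code": "CP-7", "counterparty": "Acme Corp"}
print(to_enterprise("securities_trading", trade))
# {'id': 'CP-7', 'legal_name': 'Acme Corp'}
```

The point of the interlinking is that the securities-trading domain and the wholesale-banking domain never need to know each other's terms; each only maintains its own mapping to the enterprise model.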

Fortunately, since I last wrote, the state of the art has moved on, with more tools coming on the market, as well as evolution of the tools I mentioned in the earlier piece. The categories covered in the 2007 document included:

  • Enterprise architecture tools. Casewise, IDS Scheer, MEGA International, Proforma, and Telelogic led the EA tools market (IDS Scheer has since been acquired by Software AG, and Telelogic by IBM). But some folks are relying, for their information modeling needs, on vendors like Embarcadero that made the transition from data modeling tools to EA tools more recently.
  • Tools embedded in ESB, Information-as-a-Service, or BPM suites. Major vendors of ESB, IaaS, and BPM suites often include information modeling tools as part of their solution. For example, TIBCO ActiveMatrix, Composite Software Composite Studio, Red Hat MetaMatrix Enterprise, and IBM Information Server (which includes semantic technology from the acquisition of Unicorn Systems) can be good options if you're using those suites for multiple other parts of your SOA or IaaS strategy.
  • Independent specialist vendor tools. For the most advanced modelers, especially when semantic technology is required, tools from specialists like Contivo, Metatomix, or Revelytix are a good solution. Other independent tools include Progress Software's DataXtend Semantic Integrator.

Since then I’ve also heard of others, like TopQuadrant’s TopBraid Suite. Oh, and my colleague Dave West has written a great report on the ways that semantic technology is being used by application developers nowadays. Dave and I are doing more new research in this area, about both canonical modeling and semantic technology. So please help us with our research:

Are you creating a canonical model? What tools and techniques are you using to drive your success (whether based on XSDs, or semantic technology, or both)? What issues have you encountered along the way? Can we interview you for our research?

Please comment here if possible, or tweet with the hashtag #ForrCanon, or email me (if you must retain confidentiality) at


Canonical Model (Industry Standard)

We have adopted the industry-standard canonical model from the Open Applications Group (OAGi). Our SOA implementations use this as the base model. Often I find that modeling tools lack the capability to accommodate the out-of-the-box models from OAGi (XML Schema) and the recommended architecture of extensions with versioning, so that the model can be maintained and governed well.

Re: Canonical Model (Industry Standard)

Thanks, John. In another more recent blog post I mentioned that I had attended the first annual meeting of the Canonical Model Management Forum (last March). While there, I met a gentleman from Dell, who mentioned that they had based their canonical model on OAGi, too - or at least, a version of it, as extended by Oracle in AIA (Application Integration Architecture). Also, they were using IgniteXML to manage that model and its versioning over time, and usage in other application connectors.

So you're in good company! I recall years ago (maybe 10?) when I first encountered someone using OAGi - Lockheed Martin. Their rationale was that they had a heterogeneous ERP portfolio (that is, not all SAP, or Oracle, or whatever), and that OAGi did a good job of not only integrating across those disparate apps, but also that it made them more independent of any one of their ERP vendors.

Oracle made so many acquisitions of other ERP vendors that they had a similar requirement, all inside one company (to serve their customers). Oracle AIA is not identical to OAGi - are you also an Oracle customer? Have you compared them, and noted any differences of any importance?

re: Are you creating a Canonical (or “Common”) Info

Mike, thanks for reiterating the question. It was one of the items on my list: without a data model you get a 'Big Ball of Mud' implementation, and yes, you can call it 'federated' to make it sound more 'agile'. We can import existing XSDs and XPDL processes into Papyrus, and NO ONE spends serious thought on how these are going to fare once you try to change anything. They are not linked into a change management concept! We link them together, but that does not help if the outside world is not coordinated. A canonical data model managed in a different environment than your execution environment is another piece of the modeling straightjacket that reduces the wished-for and promised agility.

re: Are you creating a Canonical (or “Common”) Info

Max, I think you’re absolutely right to point out the trade-off between trying to standardize anything (including an information model) across more than one application or project, versus agility. Dave West and I have had some interesting discussions about this; Dave wrote a great piece about it with regard to architecture, pointing out that some trade-offs are necessary to balance short-term and long-term objectives. Plus, the kind of interoperability requirements between partners or applications that a canonical model can address are not a luxury; they are an essential component of business.

So how to manage the trade-off? You’ve identified one approach: linking the model to the runtime in a direct way. As long as all the systems in question are built in Papyrus, that’s a good approach.

Another approach that works with heterogeneous runtimes is good integration of the model-management environment with the various development environments in use. If each project can easily obtain a “projection” of the canonical model into its local vocabulary of choice, auto-generated into an XSD that’s imported into your tool of choice, that works, too.

I also think it’s important to do everything possible to unburden developers from trying to align with someone else’s model. It’s not always possible to completely shield them from such requirements, but a lot of isolation can be achieved if the integration medium around them takes care of a lot of the details.
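That "projection" idea can be sketched very simply: a tool generates a local XSD from the canonical model, so each project imports only the slice of the vocabulary it needs. A minimal, purely illustrative Python sketch (the canonical model structure and entity names here are invented, not any vendor's format):

```python
# Hypothetical canonical model: entity name -> list of (field, XSD type) pairs.
CANONICAL = {
    "Customer": [("CustomerId", "xs:string"), ("LegalName", "xs:string")],
}

def project_to_xsd(entity: str) -> str:
    """Auto-generate a minimal XML Schema for one canonical entity,
    suitable for import into a project's local development tool."""
    fields = "\n".join(
        f'        <xs:element name="{name}" type="{xsd_type}"/>'
        for name, xsd_type in CANONICAL[entity]
    )
    return (
        '<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">\n'
        f'  <xs:element name="{entity}">\n'
        '    <xs:complexType>\n'
        '      <xs:sequence>\n'
        f'{fields}\n'
        '      </xs:sequence>\n'
        '    </xs:complexType>\n'
        '  </xs:element>\n'
        '</xs:schema>'
    )

print(project_to_xsd("Customer"))
```

A real model-management tool would of course handle namespaces, versioning, and extension points; the sketch only shows the generation direction: canonical model in, local XSD out.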

re: Are you creating a Canonical (or “Common”) Info

Canonical formats are used today as a way of managing master data in motion. We still lose huge amounts of time and money doing semantic gap analysis when we need to do integration, but at least we have a core set of business data defined and shared across the enterprise.

Compliance is also a reason for us to create canonical formats. The lineage and usage path of each data element is then easy to create and analyse.

It is also the result of a company increasing its maturity by defining its core business objects, their dependencies, and their implementation and distribution across different channels.

In order to do sustainable development, we do SOA by defining core objects and their lifecycle. Then each lifecycle step can be a service to be offered. It goes beyond CRUD now.
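William's "each lifecycle step can be a service" point is worth illustrating: instead of exposing generic CRUD operations, the core business object exposes its legal lifecycle transitions as explicit operations. A hedged sketch (the Order states and transitions below are invented for illustration):

```python
# Hypothetical lifecycle of an Order object: state -> allowed next states.
# Each transition is a candidate for exposure as a dedicated service operation
# (e.g. "submit order", "approve order"), rather than a generic "update".
LIFECYCLE = {
    "draft": {"submitted"},
    "submitted": {"approved", "rejected"},
    "approved": {"fulfilled"},
}

def transition(state: str, target: str) -> str:
    """Perform one lifecycle step, rejecting transitions the model forbids."""
    if target not in LIFECYCLE.get(state, set()):
        raise ValueError(f"illegal transition {state} -> {target}")
    return target

state = transition("draft", "submitted")
state = transition(state, "approved")
print(state)  # approved
```

The payoff over CRUD is that the service contract itself encodes the business rules: a consumer simply cannot "update" an order from draft straight to fulfilled.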

re: Are you creating a Canonical (or “Common”) Info

William, thanks for your comments. Have you used a tool for defining or managing your XML schemas or other canonical modeling artifacts?