Metadata Mining

Another idea that Mike Hoskins threw at me in our brief conversation was Metadata Mining. You only need to consider the term to realize there’s potential in it. If you gather sufficient metadata (and quite a lot of it is easy to harvest) then, as long as you have the necessary transformation and cleaning rules, you can begin to build a federated corporate data resource. But some of the metadata is not easy to harvest, and you are going to have to “mine” it. It is buried in the source code of programs written in any number of languages, from COBOL to Ruby and all points in between.
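To give a flavor of what "mining" source code might look like, here is a minimal sketch in Python that pulls field names and picture clauses out of a COBOL data definition. The record layout, the field names and the `mine_fields` helper are all invented for illustration, and the regular expression only copes with the simplest one-line declarations. A real miner would need a proper parser.

```python
import re

# Hypothetical COBOL fragment; the record and field names are made up.
COBOL_SOURCE = """\
       01  CUSTOMER-RECORD.
           05  CUSTOMER-ID      PIC 9(8).
           05  CUSTOMER-NAME    PIC X(30).
           05  BALANCE          PIC S9(7)V99.
"""

# Matches a level number, a data name and an optional PIC clause,
# for declarations that fit on a single line and end with a period.
FIELD_PATTERN = re.compile(
    r"^\s*(?P<level>\d{2})\s+(?P<name>[A-Z0-9-]+)"
    r"(?:\s+PIC\s+(?P<picture>\S+?))?\.\s*$",
    re.MULTILINE,
)


def mine_fields(source):
    """Return one metadata dict per data item found in the source text."""
    return [
        {"level": int(m["level"]), "name": m["name"], "picture": m["picture"]}
        for m in FIELD_PATTERN.finditer(source)
    ]


fields = mine_fields(COBOL_SOURCE)
for field in fields:
    print(field)
```

Group items such as the 01-level record have no picture clause, so they come back with `picture` set to `None`. Even this crude pass yields names, nesting levels and types that could be loaded into a metadata store.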

And that’s just the data in storage pools. There is also the data that is passed in messages, and there is a lot more data than the simple structured stuff of databases. There is also text, graphics, voice data and video, and normally a great deal of that data has some simple structured stuff appended to it. And now that we are in the era of XML, we can parcel our data up and pass it around in neatly self-defining bundles, from application to application or, to be more accurate, from context to context.
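Because an XML message carries its own element and attribute names, that kind of metadata is on the easy-to-harvest side. The sketch below uses Python's standard `xml.etree.ElementTree` module; the order message and the `harvest_structure` helper are hypothetical, chosen only to show the idea.

```python
import xml.etree.ElementTree as ET

# Hypothetical message; element and attribute names are made up.
MESSAGE = (
    '<order id="1001">'
    "<customer>Acme</customer>"
    '<total currency="USD">250.00</total>'
    "</order>"
)


def harvest_structure(xml_text):
    """List each element's tag and its attribute names, in document order."""
    root = ET.fromstring(xml_text)
    return [(element.tag, sorted(element.attrib)) for element in root.iter()]


structure = harvest_structure(MESSAGE)
for tag, attributes in structure:
    print(tag, attributes)
```

This only recovers names, not meaning: knowing that a message has a `total` with a `currency` says nothing yet about how it relates to a similarly named column in some database, which is where the transformation and cleaning rules come in.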

I could push this a lot further, into SOA and into the semantic web, but the truth is that there’s no great point in venturing to such exciting frontiers while our current pools of data are dirty and growing weeds. The most important point about a Metadata Warehouse is that it can start to fix those problems, which we all know are a drag on the business.

