With SAP chewing up Business Objects and IBM swallowing Cognos whole, you might think that the BI market has reached the end-game stage, with just a few giants left standing (IBM, SAP, SAS, Oracle and Microsoft) and one or two as-yet-uneaten critters, like MicroStrategy, flitting around in the shadows. Dr Fern Halper and I believe otherwise.
As the old guard disappears, we see it being superseded by a new set of energetic companies with new approaches to BI. Most such companies have come to our notice very recently and they are all innovative in their approach. In particular, they emphasize new analytical approaches, new and better ways to illustrate data, and quite often, a real-time or near real-time approach to BI.
QlikView
QlikView, from QlikTech, falls into the first category. It offers a new analytical approach. In fact, imho, QlikView has dug a hole for the once admired “data cube” and is gradually burying it.
Data cubes, and even multi-dimensional cubes (multi-dimensional data structures), were once the “new new thing” in BI. They were the much-vaunted child of OLAP (On-Line Analytical Processing). The idea was simple enough: you grabbed a “subject area” (or set up a data mart) and built multi-dimensional data structures that allowed you to analyze data in terms of any of the dimensions you had specified.
The problem OLAP products solved was that relational databases held data in two-dimensional tables and didn’t provide easy ways to build and analyze three-dimensional or multi-dimensional data structures. Add this to the fact that one of the most important BI needs for a company was to analyze sales figures, which had at least three distinct dimensions (sales totals, time and geography), and the attraction of the OLAP cube became obvious.
So what does QlikView do that’s new?
First of all, it doesn’t hold data in cubes or similar OLAP structures; it holds data within a specially designed in-memory structure that is far faster to access. It is faster to access for two reasons:
- It is in memory, not on disk (see this posting for more on the memory/disk trend).
- It is designed to provide fast access to all data by any given item in any table.
The way that QlikView works is that you extract data from one or more databases (even very large databases) and it builds its data structure in memory on a server using the associated schema information from the databases.
The power is in that data structure. It means that QlikView can build any multi-dimensional view into that data in a fraction of a second. You want to see sales by month by product by state? OK. How about weekly sales by discount scheme by city by customer age? OK. Or maybe product by customer, by marital status by popularity? OK. With QlikView you can analyze on any data item combined with any other data items that link to it, in a fraction of a second.
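To make the idea of "any view, on demand" concrete, here is a minimal sketch in Python. It is not a description of QlikView's internal structure, which is not documented here; it simply assumes a handful of invented sales rows held in memory and shows that a total by any combination of dimensions can be computed at query time, with no cube declared in advance. The field names and the `total_by` helper are hypothetical.

```python
from collections import defaultdict

# Hypothetical sales rows held in memory; field names are invented for illustration.
SALES = [
    {"month": "Jan", "product": "Widget", "state": "TX", "amount": 100},
    {"month": "Jan", "product": "Gadget", "state": "TX", "amount": 50},
    {"month": "Feb", "product": "Widget", "state": "CA", "amount": 70},
    {"month": "Feb", "product": "Widget", "state": "TX", "amount": 30},
]


def total_by(rows, dimensions, measure="amount"):
    """Sum `measure` over rows grouped by any list of dimension names."""
    totals = defaultdict(int)
    for row in rows:
        # The grouping key is built at query time from whichever dimensions were asked for.
        key = tuple(row[d] for d in dimensions)
        totals[key] += row[measure]
    return dict(totals)


# Two different "views" over the same rows, neither declared ahead of time.
by_month_product = total_by(SALES, ["month", "product"])
by_state = total_by(SALES, ["state"])
```

A plain scan like this would be far too slow for large data; the point is only that the choice of dimensions is made when the question is asked, not when the structure is built.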
QlikView also has the advantage that it is really easy to use. You can get the gist of it by going to the web site and trying the demo. One of the realities of the old data cube was that users had to have a reasonably good understanding of what they were doing in order to decide which cubes to build.
With QlikView you don’t think in cubes at all. You can literally play with the data and, to be honest, you could have no idea that you’re carrying out multidimensional analysis. You are just navigating through the data, discovering interesting information as you go. And if you want dials or pie charts or bar charts or tables of data, you can have them. It’s all ticks in boxes.
QlikView doesn’t require “knowledge workers” to use it and it provides results dramatically quickly. It is game changing. It is going to bury the data cube.
Note: Dr Fern Halper and I collaborate on the topic of BI. You’ll find her blog here. Fern’s comments on QlikView are here: Is This the Death of the Data Cube? (continued)
I was stupefied by your comments, so I test-drove the software and also contacted my mentor, Ralph Kimball, and this is what I got from him.
“
While I have no doubt that QlikView employs clever, appealing technology, their claims fall into the category I call “objection removers”. In other words, they sweep away all your troubles in one dramatic step in hopes that you will buy their product before you start thinking about the larger picture.
What stand do they take on:
· Local data control (i.e. staging) both of the original extract as well as the delivered BI payload
· Cleaning
· Deduplicating
· Conforming across multiple original sources
· Establishing durable surrogate keys
· Processing slowly changing dimensions (some technologies are extremely sensitive to Type 1 changes for instance)
· Handling late arriving fact data as well as dimension data
· Drilling down anywhere, not just declared hierarchies
· Multi-valued dimensions
· Ragged hierarchies
Also, do their claims support a multi-vendor environment using other BI tools?
You might want to read an article I wrote on this issue of objection removers. Please see http://www.intelligententerprise.com/showArticle.jhtml?articleID=167100313 .
Finally, I believe there is no way to avoid a specification step somewhere in the use of any BI tool. Somewhere, you have to map the source data into the user interface of the BI tool. Every time I have looked at this step, I have concluded that the work to do this correctly is as much work as the cube or star schema building that they claim to avoid!
Good luck,
Ralph”
So I contacted the vendor, and they tried to dodge the questions and referred us to their documentation on incremental loading, saving incremental files and reusing them. I then asked for a proof of concept covering these points. The amount of programming involved is huge, so we decided to go with our old-fashioned data warehouse design. Before committing to this type of claim, do more research and ask for a proof of concept.
Sam Moayedi
Sam,
It’s hard to argue with Ralph Kimball on the academic virtues of data warehousing, but that appears to be exactly the feedback he gave you. If you asked me how to mow your lawn and I asked you 20 questions about the meaning of grass and how to effectively grow it evenly and slowly such that less mowing would be required, would you go back to the Toro and tell them they sold you a mower you didn’t need?
BI is not easy and NO vendor will solve all of your problems. QlikTech is trying to solve a few of them, and is doing it better than most because their solution breaks the age-old paradigm of needing large databases. Think about it for a minute… how scary is that to all the academics and BI vendors that based their entire careers on database management? It’s very scary and their first reaction is to talk about process maturity, data cleansing and other BI disciplines that QlikTech never claimed to solve for you in the first place. Believe me, I spent 15 years on warehouses trying to deliver on the BI promise before I decided that one tool (or one solution) cannot do it all. Warehouses have their place, big BI has its place, and now data discovery and in-memory analytics finally have their place.
Here’s the real value - try QlikView and find out for yourself WHERE it fits into your BI stack and solution set. Don’t let an author, vendor or consultant whose salary depends on databases tell you that it can’t be done without massive warehouses. It can, and it does not replace the warehouse as a BI deliverable. It just allows you to handle parts of your BI solution in a faster, more intelligent, user-friendly, agile and meaningful way (read: $$$ savings).
Just like every other paradigm shift in the IT industry, most of the existing experts in legacy solutions were late adopters, and this won’t be any different.
Brad
BradP
I’m inclined to agree with you on this. Quite clearly QlikView makes no contribution to the whole gamut of Master Data Management, and as far as I can tell, it was never meant to. It kicks in at the point where you have data that you want to query which is stored in some database with a schema - which is the same place that a data cube product kicks in.
You can thus go through the whole data warehouse exercise and create subject databases from the warehouse and use it on those. Alternatively you can replicate production databases or subsets of them and group them together and use it on those.
Sam’s response to my article doesn’t make much sense to me.
A few things:
1) Reporting on sales totals by time and geography is very easy in a dimensional model, and does not require OLAP. A Sales fact table is simply joined to both the Geography and Date dimension tables and the results are grouped any way you want. You have completely missed the concept of what differentiates OLAP from ROLAP (relational-OLAP, attempting to build a Kimball bus). It’s the density of the data. When joining a bunch of dimension tables to a Sales fact, we only have the rows in the fact table for Sales that actually happened, so the data is what we call “sparse” for the days where no transactions happened. What OLAP enables us to do is calculate ALL dimensional possibilities, even the ones where nothing happened. When reporting on search results on a website (what keywords didn’t come up), or reporting on sales (which items in inventory didn’t sell), OLAP fills in the gaps for us. In fairness, Oracle (and maybe others) has the ability to report on this, but OLAP makes it much easier.
2) I often hear about concepts for replacing a DBMS, and it usually has to do with “in-memory” alternatives. People who espouse these ideas have in mind that every read to a database pulls data from disk. That is not the case at all. All DBMSs have buffer caches where frequently accessed rows stay in memory, so that we don’t have to go to disk to get them every time. So the heavily used rows in almost any DBMS are already in memory, much like your proposed solution.
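The sparse-versus-dense distinction in point 1) can be illustrated with a small Python sketch. It assumes two tiny, invented dimensions and a fact table that only records sales that actually happened; the `densify` helper is hypothetical and is not how any particular OLAP product is implemented. It fills in every combination of dimension members, which is what makes "what did not sell" easy to ask.

```python
from itertools import product

# Hypothetical dimension members and a sparse fact table (only sales that happened).
DATES = ["2008-01-01", "2008-01-02"]
ITEMS = ["Widget", "Gadget"]
FACTS = {("2008-01-01", "Widget"): 3, ("2008-01-02", "Widget"): 1}


def densify(facts, *dimensions):
    """Return one cell per combination of dimension members, 0 where no fact exists."""
    return {cell: facts.get(cell, 0) for cell in product(*dimensions)}


dense = densify(FACTS, DATES, ITEMS)

# With every cell present, "which items never sold" is a simple filter.
unsold = sorted(
    item for item in ITEMS
    if all(dense[(date, item)] == 0 for date in DATES)
)
```

The cost of the dense form is visible even here: two fact rows become four cells, and the gap widens quickly as dimensions are added.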
As regards 2), you are completely wrong. That would be 100% wrong. People who know in-memory database technology know well enough that you can get high cache hits on disk reads. Even in the early days of databases it was usually possible to achieve a 95% hit rate by intelligent cache strategies, and it’s possible with most databases to pin whole tables in memory. The real problem is that the physical table structure is hugely inefficient both for in-memory store and for resolving an in-memory query. That’s why you can build in-memory databases that run 100 times faster than relational databases that are held completely in memory. Take a look at ClearPace for example or, if you like, drill down under QlikView to see how it works at the technical level.
Because you are so wrong about 2) it’s easy to understand why you’re wrong about 1). OLAP structures are inefficient structures when compared to well designed in-memory structures (whether pinned in memory or held on disk.) They outperform relational largely because of the 3rd normalization tendency in relational database - to normalize repeating groups. They are bewilderingly slow compared to good in-memory technology.
The 3NF modeling philosophy comes from Bill Inmon in his Corporate Information Factory methodology. Ralph Kimball has written books and books defending his Bus Architecture which argues against normalization techniques such as 3NF. Even Bill Inmon argues that the presentation area should be denormalized for performance. Your Straw Man argument against DBMS’s won’t hold water. The “tendency” of data modelers, which is a debatable point as well, should have no bearing on whether the DBMS can compete. I’m sure I could develop a radically inefficient memory structure to put up against a DBMS. It is a logical fallacy to argue that your inductive argument can give deductive results.
I’ll have to be honest… I can’t comprehend what a “disk-based” “in-memory” structure is, so I can’t comment on that. But, just as with DBMS’s, they have to be synchronized with disk-based structures sometimes, or else we have no ability to back up and recover these structures. As for the performance, we may just have to agree to disagree. There are Java persistence layers that try to move the work out of the database, and these methodologies start hashing and comparing values in memory before putting the structures back in. My experience finds these methodologies to be “bewilderingly slow”, especially compared with what a fine-tuned database can do with intelligent SQL.
You can read about Kimball’s objections to 3NF and denormalization techniques in data warehouses at http://www.intelligententerprise.com/showArticle.jhtml?articleID=17800088 . I seriously think you should become more familiar with real data warehousing technologies before you dismiss them with the wave of your hand. And there are more polite ways to have a debate with someone than to declare he or she is “100% wrong”.
The mistake you appear to be making is to confuse logical with physical. The physical architecture of all the major RDBMS products is fundamentally “optimizer plus btree” with various variations and sophistication in the caching depending on product. This architecture can be complicated by the physical implementation of database constraints (referential integrity, etc.) and stored procedures, but they tend to get designed out when performance becomes an issue. The implementation of star schemas and snowflake schemas is simply a demonstration of the fact that the optimizer approach rarely works for fully normalised data.
The early OLAP products were usually btree based data stores that deliberately held repeating groups efficiently at the data level so that there was no need for joins. It gave them a physical performance advantage over RDBMS and justified their existence.
The physical structures that make the newer in-memory products so much faster than this earlier physical architecture include: column stores, storing cardinality only and retrieving query solutions by tree walking, tokenization and efficient scale-out parallelism. None of these techniques are in-memory per se, but as with all databases, if the whole structure is in-memory then it runs an awful lot faster. The reality is that these structures make it possible to compress the data down to a fraction of the size of the typical RDBMS. Compression ratios of 20 to 1 or 30 to 1 are proven in usage.
The relatively new database product, Vertica, uses many of these techniques and has proven performance improvements in the 100-to-1-and-better area over typical RDBMS. ClearPace’s NParchive is actually sold as an archiving capability and it markets its disk performance rather than its in-memory speed - although both are far, far ahead of the typical RDBMS. QlikView, I believe, is completely in-memory - in that you cannot split the data store between memory and disk. Neither Vertica nor ClearPace’s NParchive is built that way.
These are physical structures. Logically, Vertica and NParchive look just like a relational database if you look at them through SQL. You can also consider Cache from Intersystems which has an object architecture. It has a SQL interface and it regularly outperforms RDBMS products in OLTP contexts because its physical architecture is a lot more efficient than the “btree/optimizer” approach.
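As a rough illustration of tokenization applied to a column store, the following Python sketch dictionary-encodes a single column: each distinct value is stored once and every row holds only a small integer token. This is a generic textbook form of the technique, not a claim about how Vertica, NParchive or QlikView implement it, and the function names and sample column are invented for the example.

```python
def tokenize_column(values):
    """Dictionary-encode a column: return (distinct values, one integer token per row)."""
    dictionary = []   # each distinct value stored exactly once, in first-seen order
    index = {}        # value -> token, used only while encoding
    tokens = []
    for value in values:
        if value not in index:
            index[value] = len(dictionary)
            dictionary.append(value)
        tokens.append(index[value])
    return dictionary, tokens


def decode_column(dictionary, tokens):
    """Rebuild the original column from its dictionary and tokens."""
    return [dictionary[token] for token in tokens]


# Hypothetical column with low cardinality, the case where tokenization pays off most.
STATE_COLUMN = ["Texas", "California", "Texas", "Texas", "California", "Texas"]
dictionary, tokens = tokenize_column(STATE_COLUMN)
```

With only two distinct values, each row could in principle be stored in a single bit rather than a full string, which gives a feel for where large compression ratios on low-cardinality columns come from.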
Sorry if I offended you with my previous comment. But you did write
“I often hear about concepts for replacing a DBMS, and it usually has to do with “in-memory” alternatives.”
No-one I’ve encountered is talking about replacing a DBMS; they’re talking about delivering one that performs better. The problem is that RDBMS has had performance issues from the get-go and it still has.
You also wrote:
“People who espouse these ideas have in mind that every read to a database pulls data from disk.”
I’ve never met anyone espousing in-memory database who believes that every DBMS read pulls data from disk. I’ve never met anyone who worked with database who believed that. I’ve never met anyone who knows how a computer works who believes that. Sorry for offending you, but you offended me first. There are more polite ways to have a debate with someone than to imply that they are stupid.
I don’t see how confusing the logical with the physical was my mistake, as it was your comment I responded to: “They outperform relational largely because of the 3rd normalization tendency in relational database.” 3NF is a logical modeling approach, not in any way a physical approach.
Your comment about the superiority of QlikView over DBMS’s was the following: “It is in memory, not on disk”. I can only comment about what you posted. I cannot read into what you may know, and what you may not know.