As an analyst, I cut my teeth writing about database. In the 1990s the database market was hugely active with a mass of products; Adabas, Allbase, Datacomm DB, DB2, Empress, IDMS, Informix, Ingres, Interbase, Oracle, Progress, RDB, Sybase, Unify, and quite a few more. Object databases were introduced in 1993 or so, and then the object relational database emerged. Microsoft got into the game by licensing the Sybase code. Database benchmarking contests consumed column inches of IT news and then Very Large Database (VLDB) became a focus and Teradata, Red Brick and Tandem Non-stop SQL got air-time.
It was a very active market and then, all of a sudden, it wasn’t. Some products atrophied, companies ate other companies and eventually the marketing contest was decided. Three very big players: Oracle, IBM (DB2) and Microsoft (MS SQL Server) became dominant and most other products scratched out a living if they could. That’s what happens with lively software markets, eventually they settle down with a few large vendors sharing the spoils. The only development that disturbed the comfortable hegemony of the big three was the gradual success of MySQL, which was Open Source and optimized for serving web pages. It dominated a small market that soon became a big market.
Vertica and Verticality
If you put specialist database applications to one side, you can view the database world as consisting of 3 big markets:
- Commodity Database: For undemanding database applications pretty much any viable rdbms will do. May as well stick with Oracle, DB2 or SQL Server, whichever your company uses most.
- Large OLTP Databases: These are for very heavy transaction workloads. The big 3 are used here, but also some older products, IDMS and Datacomm for example, run big OLTP systems.
- Data Warehouse: These databases hold masses of data (in the terabyte area). Oracle, DB2 and SQL Server are used in such applications but so are more specialist products including Teradata, Green Plum, NeoView, Netezza and one or two others.
As far as Data Warehouse is concerned, there is now a new kid in town, that goes by the name of Vertica. The team that created Vertica is led by the man who can claim to be the most prolific database author in history, Prof Michael Stonebraker. The list of databases he has worked on includes; Ingres, Postgres, Illustra, Cohera, StreamBase and now Vertica. When something new comes from the Stonebraker stable it is usually worth looking at - and Vertica is definitely worth looking at.
The simplest way to explain why Vertica is compelling is to describe the thinking behind it.
- First of all, the vast majority of data warehouses are built using a simple snowflake schema. The important point about such a database is that the vast majority of the data (95% +) is held in a single “fact table”. The fact table will hold the normal transactions of the business. For banking that would be account transactions, for a telco it’s telephone calls, for a retailer its individual items purchased and so on. Nearly all the transactions run against the data warehosue will involve pulling data from the fact table.
- Vertica is a database that focuses almost entirely on pulling data from very very big tables. It optimizes queries that do precisely that. It does so by organizing the data into columns and processing on a column by column basis. I could explain why this is dramatically faster than the way relational databases typically process database tables, but there is not the space here. Take my word for it, it flies like a bullet from a gun. Depending on circumstances, it can be 100 times faster.
- There are products like those I mentioned (Teradata, NeoView, etc.) that do the data warehouse job faster than Oracle/DB2/SQL Server, but not 100 times faster. 100 times faster is “two orders of magnitude” and that is huge. It means that a query that previously took 3 hours, now takes about 100 seconds. For data warehouse that’s game changing.
Vertica Characteristics
In actual fact, there’s nothing that Vertica is doing that is new in terms of database techniques. It uses column processing (not new), it compresses columns (not new), it is highly parallel (not new), it is optimized for a particular workload (not new), it’s shared nothing (not new), it works on commodity hardware (not new), it is entirely self-tuning (not new). However, when you put all of this not-newness together you get data warehouse characteristics that are distinctly new. Here’s a list:
- Requires no DBAs
- Self scaling (to scale up, it kinda scales out)
- Always on (loading is concurrent with processing)
- Near real-time data (indeed so, if compared to anything else)
- Configurable to be fully fault tolerant
- Very much faster
- Considerably reduced cost
- Can work in the cloud
The last of these characteristics is doubly worthy of comment and it is the most exraordinary aspect of Vertica. I have to confess that, although I eventually expected data warehouse to be offered as a cloud-based capability, I never expected it any time soon. Vertica is partnering with Amazon to deliver it now - at a price of $2000 per month for half a terabyte of data.
If you know anything about the costs of a data warehouse then you will think I’ve misquoted the price, but I haven’t. You really can commence building a highly scalable always available data warehouse and have it up and running in a day or two (if you’ve got the data ready) for a few thousand dollars.
If that isn’t revolutionary, then what is?

























I highly doubt Vertica (or any database for that matter) requires no DBAs…that’s stretching it bigtime.
That said, Vertica has some interesting properties. But I wonder how the market will play out with several strong players. Besides Vertica, there’s Greenplum, Datallegro, Aster Data, Infobright, and others that are probably still in stealth.
It’s great to see competition and innovation again…long awaited and very welcome for the large community of database junkies!:)
Of course I’m taking Vertica at its word about no DBAs, but for there to be DBAs there needs to be tuning parameters.
“but for there to be DBAs there needs to be tuning parameters.”
And backups and recovery and security, etc. The only database that doesn’t need a DBA is the database not worth having.
I do however think the architecture of vertica is amazing. They have a white paper on the vertica site that lays it out in nice detail. Very, very slick.
LewisC
So tell me, what will the DBA be doing in the cloud implementation?
For the cloud provider or for the cloud customer?
Provider: Backups, DR planning, tuning, daily maintenance, etc. This stuff all still needs to be done, it just shifts from local to remote.
Customer: Creating users & roles (security), monitoring performance (to decide when to add capacity), schema design, additional indexing, etc.
The cloud changes where things run, day to day still has to get done.
LewisC
““two orders of magnitude”… means that a query that previously took 3 hours, now takes 10 seconds.”
3 hours * 60 min/hour * 60 sec/min = 10800 seconds
10800 seconds / 100 (two orders of magnitude) = 108 seconds.
How do you get 10 seconds?
3 hours (10800 seconds) to 10 seconds is 1080 times (3 orders of magnitude) faster.
Niko
I got there by making a mathematical error. Thanks for the heads up. I’ve now corrected it in the article.
it’s a well known thing that in a columnar database the performance is good while working on one or few columns.
how good are they with fetching multiple columns?
Adi
Technically multiple columnar operations are handled in parallel and then merged together to create rows. I can’t think of why there would be a significant overhead to that. You could ask Vertica this question directly. With any luck someone will be reading the blog and will offer you a reply which is better informed than this technical assumption.
Thanks for the quick response.
Adi
Hi, I work for Vertica, and here are a few comments on the comments…
Regarding the “No DBA” claim…Vertica’s architecture greatly reduces the time DBAs need to spend getting it running and keeping it fast (via all sorts of automatic installation, tuning, physical design, replication, failover, recovery features as Robin describes). DBAs ARE still required (as LewisC mentions) to focus on data security, backup and integrating new data into the database. Vertica spares DBAs from the design, tuning and recovery tedium so they can focus on higher value added work.
Regarding Niko’s orders of magnitude math…Good catch! There are examples of >2 orders of magnitude performance out there. One customer, NetworkIP (a US-based prepaid telecom services provider) described (in a webinar) a query that formerly took 12 hours to run and now takes 10 seconds…depending on the workload, customers will find these exceptional boosts in query performance (for a given query), but for the overall query workload (averaged across tens of different queries a given DBMS is handling), the query performance increase typically ranges from 50x to 200x relative to a row-oriented DBMS.
Lastly, regarding Adi’s comment about whether a column DBMS performs as fast when processing more than just a few columns…A column database reads just the columns referenced by the query, which reduces disk IO and speeds up query handling (versus an index-less row db, which must scan the whole table). But if the query is referencing every column in the table (a rare query for most data warehouses), this benefit was negated in early column dbms implementations. The Vertica architecture has much more than column-orientation going for it…data compression (typically up to 90%, and Vertica can query data in its compressed form) further reduces disk IO, and as Robin mentioned in his comment, it’s shared-nothing architecture and automatic data replication spreads the query and loading work across multiple servers working in parallel. So even when reading a relatively large % of the table, Vertica performance stays fast.
As LewisC mentioned, the Vertica Architecture white paper (found on the vertica home page) describes how all this works in detail.
This is an interesting thread–thanks all.