I recently came across a nice whitepaper from Oracle that describes the Exadata columnar compression scheme. I wrote up a brief overview of Oracle's columnar compression in the past (in my hybrid storage layout post), but I was pleased to see the whitepaper give some additional details. Before discussing the main content of the whitepaper, I want to correct a few technical inaccuracies in the article:
"Storing column data together, with the same data type and similar characteristics, drastically increases the storage savings achieved from compression. However, storing data in this manner can negatively impact database performance when application queries access more than one or two columns, perform even a modest number of updates, or insert small numbers of rows per transaction."
The first part of the above statement is usually true; columns tend to be self-similar since they contain data from the same attribute domain, and therefore laying out columns contiguously on storage typically yields lower entropy and higher compression ratios than storing rows contiguously. However, the second part of the statement is quite misleading and only true for the most naïve of column-oriented layout schemes. True, the benefit of storing data in columns decreases as a query accesses a higher percentage of columns from a table. However, the number of columns that need to be accessed such that storing data in columns impacts performance negatively relative to a storing data in rows is quite a bit larger than one or two columns. There’s a VLDB 2006 paper by Stavros Harizopoulos et. al. that runs a bunch of benchmarks to understand this tradeoff in more detail. There are a bunch of factors that affect whether column-storage or row-storage is best beyond simply the number of columns accessed (e.g. prefetch size, query selectivity, compression types), most notably the fact that column-stores struggle on needle-in-a-haystack queries since the multiple I/Os needed to get data from different columns are not amortized across the retrieval of multiple rows, but the bottom line is that the query space where pure column-oriented layout outperforms pure row-oriented layout is quite a bit larger than what the Oracle whitepaper claims. Regarding updates and inserts, the VLDB 2005 C-Store paper and OSDI 2006 Google Bigtable paper discuss how to alleviate this column-store disadvantage by temporarily storing the updates in a write-optimized store.
"Oracle’s Exadata Hybrid Columnar Compression technology is a new method for organizing data within a database block."
I feel that this is not giving enough credit to the VLDB 2001 paper by Ailamaki et. al. on PAX. Even though the PAX paper didn’t apply compression to this layout, and PAX didn’t include multiple blocks inside a single compression unit, the basic idea is the same.
One of the key benefits of the Exadata Hybrid Columnar approach is that it provides both the compression and performance benefits of columnar storage without sacrificing the robust feature set of the Oracle Database. For example, while optimized for scan-level access, because row data is self-contained within compression units, Oracle is still able to provide efficient row-level access, with entire rows typically being retrieved with a single I/O.
To people who have even a rudimentary knowledge of column-stores, the juxtaposed statements “entire rows being retrieved with a single I/O” and “provides the performance benefits of columnar storage” is clearly inaccurate. The most often cited advantage of column-stores (indeed cited even more often than the compression advantages) is that if you have a query that has to scan two out of twenty columns in a table (i.e. “tell me the sum of revenue per store”), a column-store only has to read in those two columns while a row-store has to scan the entire table (because multiple entire rows are picked up with each I/O). Hence, a column-store is very I/O efficient, an increasingly important bottleneck in database performance (albeit somewhat reduced with good compression algorithms). Again, Oracle’s statement is true for needle-in-a-haystack queries, but false for the more frequent scan-based queries one finds in data warehouse workloads (which is where Exadata is primarily marketed).
Despite the above overstatements, it is clear that Exadata’s Hybrid Columnar scheme is a big win (relative to vanilla Exadata) for most Exadata datasets and workloads, for the following reasons:
- Storage costs are lowered by having a smaller data footprint.
- I/O performance is improved by having to read less data from storage.
- Data is kept compressed as it is scanned in the storage layer. Selections and projections performed in the storage layer can be performed directly on the compressed data. (Those of you who are familiar with my academic work will not be surprised I like this, given that I wrote a paper on integrating compression and execution for column-oriented storage layouts in a SIGMOD 2006 paper). This yields all the storage costs savings and runtime I/O savings of compression without the performance disadvantage of runtime decompression.
The whitepaper is vague when it comes to the question of whether data remains in compressed form after it is scanned in the Exadata Storage Servers as it is shipped over the Infiniband network to Oracle RAC for further processing. It appears from the way I read the paper (I could be wrong about this) that data is decompressed before it is shipped over the network, which means that Oracle is not gaining all of the potential performance from operating directly on compressed data, as data does have to be decompressed eventually, though the amount of data that needs to be decompressed is smaller, since selections and projections have occurred. If Oracle RAC could operate directly on the columnar compressed format, then decompression would never have to occur, which would further increase performance (query results would obviously have to be decompressed, but usually the magnitude of query results is much smaller than the magnitude of data processed to produce them). Not only is Oracle missing out on potential performance by decompressing data before sending it over the network, but also network bottlenecks could be introduced.
- If storage costs are the most important consideration (e.g. for historical data that would otherwise be archived to tape or other offline device), there’s a knob that allows you to further increase compression at the cost of having to decompress data at runtime (and this decompression is heavyweight), thereby reducing runtime performance. Furthermore, the same table can be compressed using the lightweight (high-performance) compression and heavyweight (low-performance) compression using Oracle’s partitioning feature (e.g. you could partition by date and have the historical partitions use the heavyweight compression scheme and the active partitions use the lightweight compression scheme). Note that lightweight still yields good compression --- Oracle claims up to 10X, but this is obviously going to be very dependent on your data.
The most interesting part of the whitepaper was the figure of a compression unit on page four. It is clear that one complication that arises from compressing each column separately is that for a fixed number of rows (e.g. tuples numbered 1-1000), the column sizes are going to vary wildly (depending on how compressible the data from each column is). Figuring out how to fit all columns from a set of rows into a fixed-size block (or set of blocks inside a compression unit) without leaving large chunks of the block empty is a nontrivial optimization problem, and something that Oracle probably had to think about in detail in their implementation.
Anyway, if you’re interested in Oracle’s approach to hybrid columnar storage, the whitepaper is worth a read. With rumors of additional developments in Oracle’s hybrid columnar storage starting to surface, it is likely that this will not be my last blog post on this subject .