- I totally agree that the future of high-end data warehousing lies in moving away from data warehouse appliances and toward software-only solutions running on commodity hardware, served from private or public clouds (appliances might still be an option at the low end).
- I also totally agree that data silos and data marts are an unfortunate reality. So you may as well get them into a centralized infrastructure first, and then worry about data modelling later.
- I also agree with the self-service nature of the vision.
- I wonder what Teradata has to say. This seems to run directly counter to the centralized, "model-first" enterprise data warehouse pitch they've been making for years.
- I wonder how Oliver Ratzesberger feels about Greenplum essentially rebranding eBay's virtual data mart idea and claiming it as their own (though admittedly there is slightly more to the EDC vision than virtual data marts). I agree with Curt Monash that you probably won't want to copy the data for each new self-service data mart. In that case, many marts end up sharing the same underlying data and hardware, so good workload management is a must. Teradata is probably the only data warehouse system that already has the workload management the EDC vision needs. HP's Neoview might also have good enough workload management, but it hasn't been around very long.
- I wonder if Greenplum felt a little burned by their experience with their MapReduce announcement. That time, they implemented it, tested it, and then announced it; unfortunately, they then had to share the spotlight with Aster Data, which announced a nearly identical in-database MapReduce feature on the same day. This time around, they've apparently decided to make the announcement first and do the implementation afterwards.
- It appears that the only part of the EDC initiative that Greenplum's new version (3.3) has implemented is online data warehouse expansion (you can add a new node, and the data warehouse or data mart can incorporate it into parallel storage and processing without taking the system offline). All this means is that Greenplum has finally caught up to Aster Data along this dimension. Since Aster Data also offers a public cloud version and has customers running on it, I'd argue that Aster is actually farther along the EDC path than Greenplum is (Greenplum says public cloud availability is on its road map). If I weren't trying to avoid talking too much about Vertica in this blog (due to a potential bias), I'd go into detail about their virtualized and cloud versions at this point, but I'll stop here.
(Note: I am not associated with Aster Data or Greenplum in any way)