Qubole offers Big Data killer app as a service for Google Cloud Platform

I think it is safe to say that these days very few people are ignoring cloud computing, 2013’s privacy concerns notwithstanding. One of my favorite 2014 predictions for cloud computing is that the cloud will allow everyone to become a media company. This has been true for a while, but let’s see if that statement attains true significance next year. The cloud, nevertheless, is gaining popularity among small firms, and so is Hadoop. The combination of the two makes Big Data analysis more accessible to startups and small firms that can’t afford to build their own Hadoop infrastructure from scratch. Hadoop was created to solve the problem of processing an avalanche of Big Data.

Hadoop has since moved far beyond its humble beginnings in web indexing and is now used in many niche industries for a wide variety of tasks that all share the common theme of variety, volume and velocity of data, both structured and unstructured. Cloud computing combined with Hadoop allows the little guy to use Big Data without having to purchase and manage the hardware. Google actually has a solution for Hadoop lovers: Google Compute Engine (GCE) running the MapR distribution. Compute Engine has been building a reputation as the platform of choice for large-scale data workloads.


The proposition has been further strengthened by startup Qubole’s announcement that its fully elastic Hadoop engine is now available on Google Compute Engine. Qubole is a Big Data/Hadoop as a Service provider. Founders Ashish Thusoo and Joydeep Sen Sarma, creators of Facebook’s Big Data infrastructure and Apache Hive, started Qubole with the aim of making it dead easy to prepare, integrate, and explore Big Data in the cloud. Qubole will offer a beta release of Qubole Data Service (QDS) on GCE. QDS is the first fully elastic Hadoop engine to run on GCE. Qubole is extending GCE support to a select number of invited customers through its ‘early adoption program’, with general availability expected in January 2014.

According to a Gigaom assessment: “Qubole has seen 2-3x faster startup times for virtual servers using Compute Engine over Amazon EC2 and more reliable performance from Google Cloud Storage than from Amazon S3. We’ll also assume that AWS is the “CloudX” against which Qubole engineer Praveen Seluka benchmarked Compute Engine, some results of which he shared on the Google Cloud Platform blog. Qubole did launch as an AWS-based service though and it seems likely many, if not most, users will still choose to run jobs there if only because they already have data stored in S3.”

If you are planning on exploring this option, the immediate benefits are super-fast loading of GCE nodes and auto-scaling that automatically adds or removes GCE compute resources based on actual usage. There is also peace of mind in knowing that dozens of connectors are provided to move data to and from GCE. It really does look like the combination of GCE and QDS makes for the most powerful and affordable Hadoop clusters running in the cloud at the moment. GCE’s fast virtual machine spin-up, consistent performance for virtual machines and storage, and by-the-minute billing, along with QDS’ rapid provisioning and efficient resource utilization, deliver low-cost Big Data processing. This is a factor that IT managers and those planning new startup ventures will find hard to ignore. It also appears to be part of Google’s strategy to make its cloud platform more accessible for third-party integration.
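
To see why by-the-minute billing matters for short-lived Hadoop clusters, here is a rough back-of-the-envelope sketch in Python. The node count, job length and price per node-hour are made-up numbers for illustration, not actual GCE or Qubole pricing, and the calculation ignores any minimum charge a provider might apply.

```python
import math

def billed_cost(runtime_minutes, nodes, price_per_node_hour, granularity_minutes):
    """Cost when each node's runtime is rounded up to the billing granularity."""
    billed_units = math.ceil(runtime_minutes / granularity_minutes)
    billed_minutes = billed_units * granularity_minutes
    return nodes * billed_minutes / 60 * price_per_node_hour

# Hypothetical workload: 10 nodes running a 70-minute job at $0.50 per node-hour.
per_minute = billed_cost(70, 10, 0.50, granularity_minutes=1)
per_hour = billed_cost(70, 10, 0.50, granularity_minutes=60)

print(f"per-minute billing: ${per_minute:.2f}")
print(f"per-hour billing:   ${per_hour:.2f}")
```

With these assumed numbers, the job that runs ten minutes past the hour is charged for 70 minutes per node under per-minute billing, but for two full hours per node under hourly billing. The gap grows for clusters that scale up and down often, which is exactly the usage pattern auto-scaling encourages.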

“To help customers get the most out of our Cloud platform products, we work closely with technology companies such as Qubole who provide powerful complementary solutions integrated with our platform,” said Allan Naim, Global Partner Lead, Google Compute Engine.