Most data today is user-generated and unstructured. Because generating data is now so much easier and cheaper, the rest of the big data lifecycle (storage, analytics, and computation) is under increasing pressure to cope with data that is growing at an accelerating pace. A business with petabytes of data is not unusual today. One petabyte (PB) is 1,000 terabytes (TB), or 1,000,000 gigabytes (GB).
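As a quick check on the unit arithmetic, here is a minimal sketch that converts between the decimal (SI) storage units the text uses, where each step is a factor of 1,000. The `convert` helper and its unit table are illustrative names, not part of any library. Binary units (1 PiB = 1,024 TiB) follow a different scale and are deliberately not covered.

```python
# Illustrative helper (hypothetical name): decimal SI storage units only,
# where each step up is a factor of 1,000.
UNITS = {"GB": 10**9, "TB": 10**12, "PB": 10**15}


def convert(amount: float, from_unit: str, to_unit: str) -> float:
    """Convert a storage amount between decimal units (GB, TB, PB)."""
    return amount * UNITS[from_unit] / UNITS[to_unit]


print(convert(1, "PB", "TB"))  # 1000.0
print(convert(1, "PB", "GB"))  # 1000000.0
```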
The explosion in information being generated has created a gap between the amount of data that can be collected and the amount that can be effectively and efficiently processed.
So much data. So. Much. Data.
Who Has the Iron to Process All This Data?
Performing any kind of meaningful analytics on today's massive amounts of data requires powerful hardware, and who can afford to install and maintain that much iron? Not everyone. That's where the cloud comes in. The elasticity of the cloud, where processing power scales up or down according to the task, is what makes it so appealing, particularly to smaller organizations, which can now rival big companies in analyzing massive quantities of data. As big data keeps growing, so do the cloud resources for handling it.
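The idea of processing power scaling with the task can be illustrated with a toy scaling rule: size a pool of workers to the current backlog, within a floor and a budget cap. This is a sketch under stated assumptions; `workers_needed` and its parameters are hypothetical, and real cloud autoscalers apply their own, more sophisticated policies.

```python
import math


def workers_needed(pending_tasks: int, tasks_per_worker: int,
                   min_workers: int = 1, max_workers: int = 100) -> int:
    """Hypothetical scaling rule: size the worker pool to the current backlog."""
    if tasks_per_worker <= 0:
        raise ValueError("tasks_per_worker must be positive")
    # Enough workers to cover the backlog, rounding up.
    wanted = math.ceil(pending_tasks / tasks_per_worker)
    # Clamp so the pool never drops below a floor or exceeds a budget cap.
    return max(min_workers, min(max_workers, wanted))


for backlog in (0, 450, 50_000):
    print(backlog, workers_needed(backlog, tasks_per_worker=100))
# 0 1
# 450 5
# 50000 100
```

The pool shrinks to the floor when idle and grows with demand until the cap, which is the pay-for-what-you-use behavior that makes elastic capacity attractive to smaller organizations.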
Use Cases for Cloud Analytics
Three main use cases for cloud analytics are pre-packaged cloud decision-making systems, cloud-based solutions for defining and building predictive models, and cloud-based deployments for embedding predictive analytics.
• "Decisions as a service" are pre-packaged, cloud-hosted systems that make decisions based on predictive analysis. Applications include selecting marketing offers, detecting fraud, and making instant credit decisions.
• Cloud-based solutions for defining and building predictive analytic models draw data from cloud and/or on-premises sources. They allocate computing resources as needed, supporting demanding analytic algorithms that most organizations lack the resources to run on-premises.
• Cloud deployments for embedding predictive analytics add predictive capability to existing systems that lack it. An example application might be linking internally developed "propensity to buy" models across multiple customer-facing systems.
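To make "decisions as a service" concrete, here is a minimal sketch of an instant credit decision: a logistic model turns applicant features into a score, and thresholds turn the score into approve, decline, or manual review. The `Applicant` fields, weights, bias, and thresholds are all hypothetical; a real service would learn its model from data. Because the decision lives in one function, the same logic could in principle be called from several customer-facing systems, which is the idea behind embedding predictive analytics.

```python
import math
from dataclasses import dataclass


@dataclass
class Applicant:
    income_k: float      # annual income, in thousands (hypothetical feature)
    debt_ratio: float    # debt divided by income, 0..1 (hypothetical feature)
    late_payments: int   # late payments in the last two years (hypothetical)


# Hypothetical coefficients; a real service would learn these from data.
WEIGHTS = {"income_k": 0.03, "debt_ratio": -4.0, "late_payments": -0.8}
BIAS = -0.5


def approval_probability(a: Applicant) -> float:
    """Logistic model: squash a weighted sum of features into a 0..1 score."""
    z = BIAS + sum(w * getattr(a, name) for name, w in WEIGHTS.items())
    return 1.0 / (1.0 + math.exp(-z))


def decide(a: Applicant, approve_at: float = 0.7,
           decline_below: float = 0.3) -> str:
    """Turn the score into an instant decision; borderline cases go to review."""
    p = approval_probability(a)
    if p >= approve_at:
        return "approve"
    if p < decline_below:
        return "decline"
    return "manual review"


print(decide(Applicant(income_k=90, debt_ratio=0.2, late_payments=0)))  # approve
print(decide(Applicant(income_k=60, debt_ratio=0.3, late_payments=0)))  # manual review
print(decide(Applicant(income_k=30, debt_ratio=0.6, late_payments=3)))  # decline
```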
Example: Google Cloud Dataflow
“Somebody call Steve and tell him his data is being delivered.”
Google Cloud Dataflow lets companies use the cloud to build data pipelines: unstructured data goes in, and analyzed data comes out. It can run in either batch or streaming mode. The goal of Google Cloud Dataflow is to let companies build more powerful analytics that monitor their operations continuously. If a company pushed this task onto its own IT department, simply cleaning up the data would consume enormous resources. The cloud lets organizations use tools already developed by behemoths like Google to get real-time or near-real-time analysis, potentially saving tens of millions of dollars in hardware and software costs.
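Dataflow today runs pipelines written with the Apache Beam SDK, but the batch-versus-streaming idea can be sketched with the Python standard library alone. In the example below, the same cleaning and extraction steps run either over a bounded list (batch) or over an iterator that could be unbounded (streaming). The function names are hypothetical, and the count-based windows are a simplification of the time-based windowing a real streaming engine would provide.

```python
import re
from collections import Counter
from typing import Iterable, Iterator


def clean(lines: Iterable[str]) -> Iterator[str]:
    """Normalize raw, unstructured text: trim, lowercase, drop blank lines."""
    for line in lines:
        line = line.strip().lower()
        if line:
            yield line


def extract_words(lines: Iterable[str]) -> Iterator[str]:
    """Tokenize each cleaned line into words."""
    for line in lines:
        yield from re.findall(r"[a-z']+", line)


def run_batch(lines: list[str]) -> Counter:
    """Batch mode: process a complete, bounded dataset and return final counts."""
    return Counter(extract_words(clean(lines)))


def run_streaming(records: Iterable[str], window_size: int) -> Iterator[Counter]:
    """Streaming mode: emit word counts for each fixed-size window of records."""
    window: Counter = Counter()
    seen = 0
    for record in records:
        window.update(extract_words(clean([record])))
        seen += 1
        if seen == window_size:
            yield window
            window, seen = Counter(), 0
    if seen:  # flush a partial final window if the source ever ends
        yield window


raw = ["Fraud ALERT on card", "", "  card declined ", "fraud alert cleared"]
print(run_batch(raw)["card"])  # 2
for counts in run_streaming(iter(raw), window_size=2):
    print(dict(counts))
```

The design point is that the transformation logic is shared; only the driver differs, returning one final result in batch mode and a sequence of partial results in streaming mode.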
Example: Amazon Kinesis
Amazon Kinesis, a fully managed service for processing large-scale streaming data in real time, can collect and process hundreds of terabytes of data every hour from hundreds of thousands of sources. A company could, for example, write applications that process click-stream, financial, or social media data in real time. Organizations use Amazon Kinesis to crunch data and generate alerts, create recommendations, or make real-time operational decisions. And because it runs in the cloud, they pay only for the resources they use. Amazon Web Services is built entirely on a cloud model and currently offers the broadest selection of products for cloud-based big data processing.
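The kind of real-time alerting described above can be sketched without any AWS dependency. The example below is not the Kinesis API (applications normally reach Kinesis through an AWS SDK or client library); it simulates only the per-record logic a stream consumer might run: flag a user whose clicks exceed a threshold within a sliding time window. `ClickEvent`, `rate_alerts`, and the thresholds are hypothetical.

```python
from collections import deque
from dataclasses import dataclass
from typing import Iterable, Iterator


@dataclass
class ClickEvent:
    user_id: str
    timestamp: float  # seconds; hypothetical click-stream record


def rate_alerts(events: Iterable[ClickEvent], max_clicks: int,
                window_seconds: float) -> Iterator[str]:
    """Yield an alert whenever a user exceeds max_clicks within window_seconds.

    Assumes events arrive in timestamp order.
    """
    recent: dict[str, deque] = {}
    for event in events:
        q = recent.setdefault(event.user_id, deque())
        q.append(event.timestamp)
        # Drop this user's clicks that have fallen out of the sliding window.
        while q and event.timestamp - q[0] > window_seconds:
            q.popleft()
        if len(q) > max_clicks:
            yield f"ALERT: {event.user_id} made {len(q)} clicks in {window_seconds}s"


clicks = [ClickEvent("u1", t) for t in (0.0, 0.5, 1.0, 1.2)] + [ClickEvent("u2", 5.0)]
for alert in rate_alerts(clicks, max_clicks=3, window_seconds=2.0):
    print(alert)  # ALERT: u1 made 4 clicks in 2.0s
```

Keeping only a small per-user queue means memory grows with the number of active users, not with the total length of the stream.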
Big data analytics is one of the major drivers of rapid cloud growth. In the fourth quarter of 2013, revenues for the top 50 public cloud providers increased by 47%, and the overall cloud industry is expected to be worth $107 billion annually by 2017. This growth in cloud resources for big data analytics is great news for smaller businesses that would otherwise lack the computing resources needed to derive meaningful, actionable insight from big data. Nor is it only major players like Google and Amazon offering these services; a growing number of companies specifically aim to put big data processing power into the hands of more organizations.