So What Is ‘Big Data’?

Big Data, like most things you read about on the internet or hear about on TV, can be thought of as both a nebulous buzzword and a real, functional concept with a definition. Like most buzzwords, it comes with grand expectations and proportional misunderstanding, while the reality is utilitarian and somewhat less exotic.

So really, what is Big Data, and what should the term mean to a business? In a technical sense, Big Data generally refers to data sets that are a poor fit for traditional relational databases. They may be too large (in the terabyte or petabyte range), they may be in a format that doesn’t fit the classic table structure (JSON documents, raw unstructured text, and so on), or both. In short, Big Data is distinguished by the fact that it can’t be stored, processed, or manipulated by the usual means.
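To make the “doesn’t fit the classic table structure” point concrete, here is a minimal sketch using only Python’s standard library. The two JSON records are invented for illustration. Because they have different and nested fields, forcing them into one fixed set of columns means flattening the nesting and leaving many cells empty.

```python
import json

# Two hypothetical event records with different, nested fields.
raw_records = [
    '{"user": "a1", "event": "page_view", "page": {"url": "/pricing", "ms_on_page": 5400}}',
    '{"user": "b2", "event": "purchase", "items": [{"sku": "X1", "qty": 2}], "total": 49.98}',
]


def flatten(obj, prefix=""):
    """Flatten nested dicts into dotted column names; lists are kept as JSON text."""
    flat = {}
    for key, value in obj.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=f"{name}."))
        elif isinstance(value, list):
            # A list has no natural single-cell representation, so store it as text.
            flat[name] = json.dumps(value)
        else:
            flat[name] = value
    return flat


rows = [flatten(json.loads(r)) for r in raw_records]

# A fixed table needs the union of every column seen in any record.
columns = sorted({col for row in rows for col in row})
table = [[row.get(col) for col in columns] for row in rows]

empty_cells = sum(cell is None for row in table for cell in row)
print(columns)
print(f"{empty_cells} of {len(columns) * len(rows)} cells are empty")  # 4 of 12 cells are empty
```

With only two record shapes, a third of the cells are already empty. Real event streams can have many more shapes, which is one reason such data is often kept in document stores or data lakes rather than squeezed into a single relational table.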

In everyday conversation, though, Big Data refers to the rapid explosion of information that companies are now collecting, driven by recent technological innovation. As connected digital devices have become more common and people increasingly live and work online (and in-app), the amount of data at our fingertips has grown exponentially. With figures thrown around like ‘90% of the world’s data has been created in the last two years’ or ‘Every day, we create 2.5 quintillion bytes of data,’ it’s easy to see how the subject can quickly become overwhelming. But don’t worry: most companies won’t need to scale that much, that fast, so the journey from ‘small’ data to Big Data will be gradual, and you are likely already on your way.
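For a sense of scale, the “2.5 quintillion bytes” figure quoted above can be converted into the terabyte and petabyte units used earlier. This is plain arithmetic with decimal (SI) prefixes, where 1 TB = 10^12 bytes. It only checks the unit conversion, not the statistic itself.

```python
# Decimal (SI) storage units, in bytes.
TB = 10**12  # terabyte
PB = 10**15  # petabyte
EB = 10**18  # exabyte

# "2.5 quintillion bytes" per day; a quintillion is 10**18 on the short scale.
daily_bytes = 25 * 10**17

print(f"{daily_bytes / EB:g} EB per day")     # 2.5 EB per day
print(f"{daily_bytes / PB:,.0f} PB per day")  # 2,500 PB per day
print(f"{daily_bytes / TB:,.0f} TB per day")  # 2,500,000 TB per day
```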

What’s Your Point?

All of that generalization and background is fine, but how is this all relevant to a small- to medium-sized business in the real world? Think of it this way: Your business has always kept some information on its customers, things like:

  • Contact information (name, address, phone number)
  • Billing information (credit card number, payment type preference)
  • Transaction information (what was purchased, when, and for how much)
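Customer records like these are exactly what the classic table structure handles well. As a minimal sketch, here is how those three categories might map onto relational tables using Python’s built-in `sqlite3` module. The table names, column names, and sample rows are all made up for illustration. The sketch stores only a payment-type preference rather than card numbers, on the assumption that card data is handed off to a payment processor, as is common practice.

```python
import sqlite3

# In-memory database for illustration; table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customers (
        id INTEGER PRIMARY KEY,
        name TEXT NOT NULL,
        address TEXT,
        phone TEXT,
        payment_type TEXT               -- billing preference, e.g. 'card' or 'invoice'
    );
    CREATE TABLE transactions (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customers(id),
        item TEXT NOT NULL,             -- what was purchased
        purchased_at TEXT NOT NULL,     -- when, as an ISO 8601 date string
        amount_cents INTEGER NOT NULL   -- how much, in cents to avoid float rounding
    );
    """
)

# Made-up sample rows.
conn.execute(
    "INSERT INTO customers (id, name, address, phone, payment_type) VALUES (?, ?, ?, ?, ?)",
    (1, "Ada Lopez", "12 Main St", "555-0100", "card"),
)
conn.executemany(
    "INSERT INTO transactions (customer_id, item, purchased_at, amount_cents) VALUES (?, ?, ?, ?)",
    [
        (1, "Widget", "2023-01-15", 1999),
        (1, "Gadget", "2023-02-03", 4500),
    ],
)

# Every row has the same columns, so a simple join answers "who spent what?"
total_by_customer = conn.execute(
    """
    SELECT c.name, SUM(t.amount_cents) AS total_cents
    FROM customers AS c JOIN transactions AS t ON t.customer_id = c.id
    GROUP BY c.id
    """
).fetchall()
print(total_by_customer)  # [('Ada Lopez', 6499)]
```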

Way back when, maybe it was all kept in a ledger by a person who still knew how to write in cursive and sharpen a quill pen. Then it moved into paper files and a Rolodex. Eventually it grew into spreadsheets, then MS Access, then a full-on database. So don’t think of this as a new world; it’s the same continuous evolution that has been going on forever.

It’s just that now you have a website, probably Google Analytics, maybe an app, a CRM, a Facebook page, and a digital product. Every time a person interacts with any part of your business, on any platform, from anywhere in the world, scripts start running, pixels start firing, and servers hum like angels. All of these events generate data about your customers and your business. Whether or not your data meets some arbitrary threshold to be called BIG DATA is beside the point; what matters is that far more of it is available than ever before, and you should do something with it.

Like What, and How?

We’ve already discussed all kinds of ways that your data might be collected, but no amount of information will do your business any good if it simply blinks out of existence, or ends up in a place that is prohibitively hard to get to. If you want to ride the wave of Big Data and get value out of it, you need to think about three main things:

  • Collection/Storage
  • Cleaning/Processing
  • Querying/Analyzing
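The three stages can be sketched end to end in a few lines of standard-library Python. This is a toy, not a production design. The event fields, sample lines, and validation rule are assumptions made up for the example, and a real pipeline would read from durable storage rather than an in-memory list.

```python
import json
from collections import Counter

# 1. Collection/Storage: raw events as they might arrive, one JSON document per line.
#    These sample lines (including the bad ones) are invented for illustration.
raw_lines = [
    '{"source": "web", "event": "Page_View", "user": "a1"}',
    '{"source": "app", "event": "purchase", "user": "b2", "total": 19.99}',
    'not json at all',
    '{"source": "web", "event": "page_view"}',
    '{"source": "crm", "event": "PURCHASE", "user": "a1", "total": 5.00}',
]


def clean(lines):
    """2. Cleaning/Processing: drop unparseable or incomplete records, normalize names."""
    for line in lines:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip malformed input rather than failing the whole batch
        if "user" not in record or "event" not in record:
            continue  # assumed rule: every usable event must identify a user
        record["event"] = record["event"].lower()
        yield record


events = list(clean(raw_lines))

# 3. Querying/Analyzing: simple aggregates over the cleaned events.
events_per_type = Counter(e["event"] for e in events)
revenue = sum(e.get("total", 0) for e in events if e["event"] == "purchase")

print(events_per_type)    # Counter({'purchase': 2, 'page_view': 1})
print(round(revenue, 2))  # 24.99
```

Each stage corresponds to one bullet above. At larger scale the same steps are usually spread across separate systems (storage, a processing engine, a warehouse), but the shape of the work stays the same.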

The good news is that while technology has gotten us into this situation, it also offers solutions. In the past, we had to pick and choose which data to keep because storage was expensive. That’s why the biggest advance in the Big Data ecosystem over the last few years has likely been the rise of enterprise cloud computing.

This is the big point of this entire piece, and if you take nothing else away, let it be that every company now has the ability to leverage data cheaply and efficiently, thanks to the cloud. A few of the biggest full-service cloud platforms available today are:

Services like these give any company access to virtually limitless, inexpensive storage without maintaining its own hardware, and they all let you configure and deploy Hadoop (or Spark) clusters, or build containers and microservices on top of your data using technologies like Docker.
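Hadoop’s original processing model, MapReduce, splits a job into a map step that emits key/value pairs, a shuffle that groups those pairs by key, and a reduce step that combines each group. Spark offers a more general model built around similar operations. The sketch below mimics the MapReduce model in plain Python on a single machine, using a word count, the customary first example. It does not use Hadoop’s or Spark’s APIs. The function names are made up for illustration, and on a real cluster each phase would run in parallel across many nodes.

```python
from collections import defaultdict
from itertools import chain


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one input document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle_phase(pairs):
    """Shuffle: group all emitted values by key, as the framework would between stages."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine every value for one key into a single result."""
    return key, sum(values)


documents = [
    "big data is just data",
    "data keeps growing",
]

# On a cluster, each document would be mapped on a different node.
mapped = chain.from_iterable(map_phase(doc) for doc in documents)
grouped = shuffle_phase(mapped)
counts = dict(reduce_phase(k, v) for k, v in grouped.items())

print(counts["data"])  # 3
```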

There will still be a need for data scientists to work with the data a company collects, but they will rely on IT professionals to build and maintain the increasingly important data pipelines and warehouses, plus the DevOps automation that connects them all. With everyone in a hurry to unlock the profit potential of their data, only companies with highly trained IT teams will have the key to turn Big Data from a buzzword into a reality.

So don’t wait: start learning some of the most in-demand skills in the IT field today!

Not a CBT Nuggets subscriber? Start your free week today.