Big Data and advanced analytics are helping companies obtain greater insight from their business systems. The opportunities for leveraging data are endless. However, companies should be cautious in deciding if and how to set up a Big Data program. In previous posts, we explored “What is Big Data?” and how to get started. In this interview, Geotab asks technology author Phil Simon some key questions about starting up a Big Data project.
GEOTAB: What is your definition of Big Data?
PHIL SIMON: Put simply, the term Big Data represents the vast quantities of largely unstructured information streaming at us faster than ever. It consists of tweets, emails, blog posts, sensor data, satellite images, and much more.
Perhaps it’s best understood against its antithesis: structured data. Spreadsheets, lists of paychecks, sales, employees, and other transactions are neat and orderly, and they integrate easily with Microsoft Excel. Big Data is organized differently and, as such, requires new tools to be stored, analyzed, and understood.
GEOTAB: How can an organization determine if Big Data and advanced analytics are right for them?
PHIL SIMON: There are no magic checklists, but I always start by asking a simple question: Can you tell me about a time when the organization made a key business decision based on data? If a Chief Experience Officer cannot provide a simple answer to that query, then it’s unlikely that the organization’s culture has embraced data-driven thinking. And make no mistake: success with Big Data is as much about people and culture as it is about technology and the number of petabytes of information stored. “Data” and “technology” don’t make decisions. People do.
Look at Amazon, Netflix, Facebook, and Google. These organizations excel at Big Data for many reasons—not the least of which is that they have embraced cultures of analytics. At these über–successful companies, data-based decisions are the rule, not the exception.
Suggested Reading: Converting Big Data into Relevant Fleet Management Information
GEOTAB: What resources are needed to work with Big Data?
PHIL SIMON: Employees can only do so much with Small Data tools such as relational databases, SQL statements, and traditional business-intelligence dashboards. As I write in The Visual Organization, employees need access to better data-visualization tools. They need to embrace a mindset of data discovery and exploration. This means not having to submit report requests to the IT department ad nauseam.
Contrary to what many people believe, hiring a proper data scientist is no elixir. I advise hiring employees who enjoy learning new things and change their minds when presented with information that contradicts their preconceived notions. There’s a reason that a common Google interview question is, “Can you tell me about a time that you changed your mind?”
GEOTAB: What is your top tip for success with analytics and Big Data?
PHIL SIMON: First, realize that no organization goes from “zero to Google” overnight. It takes time to build an organization’s proficiency with Big Data. Google (a.k.a. Alphabet) can do things now that it couldn’t possibly do back in 1998. Second, there’s no one right way to “do” Big Data. In fact, there are different methodologies and philosophies of organizing a Big Data project. The project can be led by a specific business unit, by a team, or by an entirely independent group. Along these lines, you don’t have to do everything in-house. It’s increasingly common to “rent” data scientists via sites such as Kaggle.
GEOTAB: What is the biggest mistake that companies make when starting up a Big-Data project?
PHIL SIMON: They think of them as traditional IT “projects.” This is almost always a recipe for disaster. Rather, it’s best to think of Big Data as a marathon, not a sprint. An organization is never finished with Big Data. That’s one of the most common myths around Big Data: that there is a finish line. There isn’t. Act accordingly.
10 Big Data/Advanced Analytics Terms You Should Know
Algorithm
While there’s much, much more to say about this, an algorithm is a self-contained, step-by-step set of operations to be performed. For instance, Google uses algorithms extensively to rank page results and autocomplete user queries.
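To make the “step-by-step set of operations” idea concrete, here is a minimal sketch of a classic algorithm, binary search, in Python. This is an illustrative example only, not anything specific to Google’s ranking systems.

```python
def binary_search(items, target):
    """Return the index of target in a sorted list, or -1 if absent."""
    low, high = 0, len(items) - 1
    while low <= high:
        mid = (low + high) // 2       # step 1: inspect the middle element
        if items[mid] == target:
            return mid                # step 2: found it, stop
        elif items[mid] < target:
            low = mid + 1             # step 3a: discard the lower half
        else:
            high = mid - 1            # step 3b: discard the upper half
    return -1

# Each pass halves the remaining search space -- a self-contained,
# repeatable procedure that always terminates with an answer.
print(binary_search([2, 5, 8, 12, 16, 23, 38], 23))  # -> 5
```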
Cloud Computing
Cloud computing is data storage and processing over the internet, instead of locally on computer hard drives and servers. The rise of cloud computing has been made possible by the increasing affordability of internet services, along with enhanced security and flexibility. No longer do employees need to be at their offices and computers to access their data and apps. Organizations no longer need to be in the data-storage business; they can outsource this aspect of their operations. (For more on this, read the book The Big Switch: Rewiring the World, from Edison to Google.)
Hadoop
Organizations used to store “data” only in relational databases. At a high level, Hadoop is an open-source framework — or software platform — that allows for storing and analyzing vast quantities of data. This article describes when and when not to use Hadoop.
The Internet of Things (IoT)
Gartner defines the Internet of Things (IoT) as “the network of physical objects that contain embedded technology to communicate and sense or interact with their internal states or the external environment.”
No longer do computers and tablets exclusively generate data. In the near future, cars, refrigerators, wearable devices, and many other things will provide interesting insights.
Machine Learning
Because there’s so much data today, humans can’t possibly understand and interpret it all. Through machine learning, a type of artificial intelligence, computers and algorithms can find patterns that enable better decision making and predictions. Examples of machine learning include the self-driving car, Netflix’s recommendation system, and the Facebook news feed.
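The core idea — finding patterns in past data to predict something new — can be sketched with a toy nearest-neighbor classifier. The data and labels below are invented for illustration; real systems like Netflix’s use far richer features and models.

```python
def nearest_neighbor(train, query):
    """Classify query by the label of the closest training point (1-NN)."""
    def dist(a, b):
        # squared Euclidean distance between two feature vectors
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(train, key=lambda pair: dist(pair[0], query))[1]

# Hypothetical data: (drama scenes watched, action scenes watched) -> genre
train = [((1, 9), "drama"), ((2, 8), "drama"),
         ((9, 1), "action"), ((8, 2), "action")]

# The "learning" is simply generalizing from labeled examples.
print(nearest_neighbor(train, (7, 3)))  # -> action
```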
Metadata
Put simply, this is data about data. If you take a photo with your camera, the photo itself is data. The time, date, location, and other details of that photo represent the metadata.
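The photo example maps neatly onto a small data structure. Everything here (the field names, the camera model) is hypothetical, chosen only to show the data/metadata split:

```python
# The photo's pixels are the data; details about the capture are metadata.
photo = {
    "pixels": b"...",  # the image bytes themselves (the data)
    "metadata": {
        "timestamp": "2016-03-14T09:26:53",
        "gps": (43.6532, -79.3832),    # latitude, longitude
        "camera": "Example Cam 3000",  # hypothetical model name
    },
}

print(sorted(photo["metadata"]))  # -> ['camera', 'gps', 'timestamp']
```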
NoSQL
For decades, users have written Structured Query Language (SQL) statements to create, update, and extract data from structured, related tables.
While still enormously powerful, SQL doesn’t work nearly as well on large, messy, unstructured datasets. This is why NoSQL exists. Note that it stands for “not only SQL.”
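Here is a minimal sketch of the kind of structured query SQL excels at, using Python’s built-in sqlite3 module and an invented sales table:

```python
import sqlite3

# An in-memory relational database: neat rows and columns, ideal for SQL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?)",
                 [("East", 100.0), ("West", 250.0), ("East", 50.0)])

# Aggregating structured data is a one-liner in SQL.
total, = conn.execute(
    "SELECT SUM(amount) FROM sales WHERE region = 'East'").fetchone()
print(total)  # -> 150.0
```

Queries like this break down when the input is a pile of tweets or photos rather than uniform rows, which is the gap NoSQL systems aim to fill.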
Natural Language Processing (NLP)
With structured data, one can easily determine “averages” and “maximums” of sales, employee salaries, etc. The same cannot be said of product reviews, tweets, and other unstructured data (see below). Thanks to Natural Language Processing (NLP), software is starting to make sense out of words written and spoken by humans.
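A crude word-counting heuristic hints at how software can begin to “make sense” of review text. Real NLP systems are far more sophisticated; this toy scorer and its word lists are purely illustrative:

```python
POSITIVE = {"great", "love", "excellent"}
NEGATIVE = {"terrible", "hate", "poor"}

def crude_sentiment(text):
    """Score text by counting positive vs. negative words (a toy heuristic)."""
    words = text.lower().split()
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)

print(crude_sentiment("I love this great camera"))  # -> 2
```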
The Three V’s
As originally framed by Gartner’s Doug Laney, data today is streaming at us with increasing velocity (the speed at which it arrives and is processed), variety (the types of data), and volume (the amount of data). Note that many software vendors and analysts have tried to piggyback on these with additional — and often superfluous — v’s.
Unstructured Data
Unlike its structured counterpart, unstructured data is messier. Think photos, videos, tweets, blog posts, etc. If you can analyze it in Excel, then the data is probably not unstructured.
Did we miss any key terms in Big Data? Leave a comment and let us know what you would include in the top ten.
About Phil Simon
Phil Simon is a recognized technology authority. He is the award-winning author of seven management books, most recently Message Not Received. He advises organizations on matters related to communications, strategy, data, and technology. His work has been featured in Harvard Business Review, CNN, Wired, NBC, CNBC, Inc., BusinessWeek, Quartz, The Huffington Post, The New York Times, Fox News, and many other outlets.
What’s the Big Deal with Big Data?
“What the Internet of Things and Big Data Mean for Car Safety: An Interview with Neil Cawse,” by Phil Simon, Huffpost Business