Unit.1 | Big Data

Our Digital World and Big Data

Chapter 01/07

Close your eyes for a moment and think about everything you did this morning before you left home.

Maybe you sent someone a message, checked the weather, made a doctor’s appointment or paid a bill. Now try to imagine how people in your parents’ generation might have done those things when they were your age. Look any different?

Our Digital World

It’s easy to take for granted how digitized our lives have become. But our computers, laptops, tablets and smartphones are more than just convenient tools for streaming movies and sending text messages. They are the products of a technological revolution that has transformed the way we live, work, learn and play. Just one generation ago, the gadgets and devices that populate the homes of today’s average middle-class family existed only in the imaginations of people who grew up watching The Jetsons. But today’s space-age home is not floating in outer space—it’s the house next door.

Image: Two people look out the window of a base on another planet. Andrei Sokolov, mid-1960s.

Where Human Connection Meets Data

Our digital devices connect us to billions of people around the world and give us access to millions of terabytes of data. At the heart of our digital lives is the internet, which now connects more than half the people on the planet and nearly everyone in North America [1]. The widespread adoption of digital sensors in a vast array of devices installed in homes, businesses and public infrastructure has more recently created a network of interconnected touchpoints known as the internet of things (IoT).

Every minute, we send 13 million text messages, conduct 3.8 million Google searches, send 159 million email messages, make 176,000 Skype calls and receive 103 million spam emails, not to mention our social media activity [2]. We also generate enormous amounts of data just by living our everyday lives: purchasing consumer products by scanning barcodes, using the location sensors in our smart devices to find our way, being registered by geospatial information systems on roads and using electricity measured by digital meters built into the power grid [3].

All told, we generate 2.5 quintillion bytes of data every day [4]. We are creating so much data so quickly that 90 percent of all the data in existence was created in just the last two years [5]. The challenge of storing, distributing and analyzing this data is so immense that it has spawned a new field of study and business sector all its own: big data.
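
To put the daily figure in more familiar units, the arithmetic can be sketched in a few lines of Python. This is only an illustration: the 2.5 quintillion bytes per day comes from the text, while the variable names and the decimal unit sizes (1 petabyte = 10^15 bytes, 1 exabyte = 10^18 bytes) are assumptions made for this example.

```python
# Daily data volume cited in the text: 2.5 quintillion bytes.
BYTES_PER_DAY = 25 * 10**17

# Decimal (SI) unit sizes, assumed for this sketch.
BYTES_PER_PETABYTE = 10**15
BYTES_PER_EXABYTE = 10**18
MINUTES_PER_DAY = 24 * 60

# Express the same quantity per day in exabytes and per minute in petabytes.
exabytes_per_day = BYTES_PER_DAY / BYTES_PER_EXABYTE
petabytes_per_minute = BYTES_PER_DAY / MINUTES_PER_DAY / BYTES_PER_PETABYTE

print(f"{exabytes_per_day:.1f} exabytes per day")
print(f"{petabytes_per_minute:.2f} petabytes per minute")
```

Under these assumptions, the daily total works out to 2.5 exabytes, or a little over 1.7 petabytes every minute.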

Image: A group of people collaborating on a project using iPads and laptops.

The Challenge and Opportunity of Big Data

The mass of data associated with the average consumer is so large and exhaustive that it may someday be possible to piece together a complete timeline of your entire life based solely on your accumulated data. Forget about writing that memoir!

Defining Big Data

Big data is the term that describes this flood of digital data. Specifically, big data refers to data sets that are so large, so complex and growing so rapidly that traditional data-processing applications like spreadsheets cannot make sense of them.

Getting a handle on big data is comparable to drinking from a firehose—if the firehose never turned off, no spills were allowed, you had to tell a detailed story about each drop of water and instead of just one firehose there were four billion of them.

Visual: Big Data’s Scale. Live counters of the Google searches conducted, spam emails sent and text messages sent in the last minute.

Big Data and the Five Vs

Researchers who specialize in working with big data, in a field called big data analytics, summarize the challenges of big data as the five Vs: volume, velocity, variety, veracity and value [6].

It’s easy to see why the volume of digital data we generate is a problem. Researchers at Cisco estimate that by 2021 the internet will carry 1 million minutes of video content every second of the day [7]. At that rate, it would take you about 5 million years to watch all the video crossing global networks in a single month. Just storing all of this data is a huge challenge.
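
The viewing-time estimate can be checked with a short calculation. The sketch below starts from the rate cited above, 1 million minutes of video per second; the 30-day month, the 365-day year of nonstop viewing and all variable names are assumptions made for this example.

```python
# Rate cited in the text: 1 million minutes of video cross the internet each second.
VIDEO_MINUTES_PER_SECOND = 1_000_000

SECONDS_PER_DAY = 24 * 60 * 60
DAYS_PER_MONTH = 30                 # assumption: a 30-day month
MINUTES_PER_YEAR = 365 * 24 * 60    # assumption: watching nonstop for a 365-day year

# Total minutes of video carried in one month.
video_minutes_per_month = VIDEO_MINUTES_PER_SECOND * SECONDS_PER_DAY * DAYS_PER_MONTH

# Years needed to watch one month of traffic, one minute of video per minute.
years_to_watch = video_minutes_per_month / MINUTES_PER_YEAR

print(f"{years_to_watch / 1e6:.1f} million years")  # prints "4.9 million years"
```

With these assumptions the result is about 4.9 million years, in line with the rounded figure in the text.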

Tied to the problem of volume is velocity. To take our previous example, if the internet carried 1 million minutes of video in a month, that amount of data would be easier to analyze; but with a million minutes pouring through the internet every single second, data analysis becomes a much more difficult problem to solve.

The variety of digital data creates another set of challenges. Data falls into two categories: structured and unstructured. Structured data, like a baseball team’s stats for the season or the contacts list on your smartphone, fits neatly into rows and columns. It’s easy to analyze, allowing you to identify the best batter on the team or sort your friends alphabetically.
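
A small Python sketch shows why structured data is easy to work with. The players, their stats and the contact names below are all made up for illustration; only the two tasks, finding the best batter and sorting contacts alphabetically, come from the text.

```python
# Hypothetical season stats: every record has the same fields, like rows in a table.
batters = [
    {"name": "Alvarez", "at_bats": 400, "hits": 124},
    {"name": "Brooks", "at_bats": 380, "hits": 95},
    {"name": "Chen", "at_bats": 410, "hits": 131},
]


def batting_average(player):
    """Return hits divided by at-bats for one player record."""
    return player["hits"] / player["at_bats"]


# Identify the best batter by comparing the same column across all rows.
best_batter = max(batters, key=batting_average)

# Hypothetical contacts list, sorted alphabetically regardless of capitalization.
contacts = ["Maya", "dev", "Aaron", "Lucia"]
sorted_contacts = sorted(contacts, key=str.lower)

print(best_batter["name"])   # prints "Chen"
print(sorted_contacts)       # prints ['Aaron', 'dev', 'Lucia', 'Maya']
```

Because every record shares the same fields, each question takes a single line of code to answer.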

But unstructured data—such as website content, image files, audio files and videos—is enormously complex. Even if you could store every video uploaded to the internet in the past month, how would you select the best ones to watch or separate the comedies from the dramas? Your security camera might capture a high-resolution image of the person who stole a package off your porch, but it can’t tell you that person’s name. This is the challenge posed by the variety inherent in unstructured data.

Data is knowledge, and there is virtually no limit to what we can learn from big data.

Veracity refers to the difficulty of verifying the accuracy or quality of the data we collect. Data may be incomplete or imprecise, or it may come from an unreliable source. When an election prediction is wrong or a new product launches and fails, bad data may be to blame. The more important but less obvious cost of incomplete or inaccurate data is missed opportunity.

We can analyze data to extract useful information from it, but most of the digital data we generate every day is either redundant or meaningless. That’s why determining the value of digital data is one of the most crucial aspects of big data analytics. Collecting, storing and analyzing data requires time, money, resources and skill. For the companies and public agencies that invest in big data analytics, solving these challenges is only worthwhile if it gives them valuable information.

References

  1. “World Internet Usage and Population Statistics.” Internet World Stats.

  2. “Data Never Sleeps 6.0.” Domo. June 2018.

  3. “Data ex Machina: Introduction to Big Data.” Annual Review of Sociology. July 2017.

  4. “Ethics & Big Data.” Technology in Society. May 2017.

  5. “Ethics & Big Data.” Technology in Society. May 2017.

  6. “The V’s of Big Data: Velocity, Volume, Value, Variety, and Veracity.” XSi. March 2014.

  7. “Cisco Visual Networking Index: Forecast and Trends, 2017–2022 White Paper.” Cisco. February 2019.