Understanding data for non-tech

Mathilde Lebegue-Carniato
Nov 19, 2019

The Internet, social networks and connected devices have changed not only our lifestyles but also what we know about each other. Information is now collected everywhere, sometimes without anyone knowing what to do with it. The volume of information circulating at any given moment is so large that it is often difficult to quantify.

According to Data Never Sleeps, every second 60,000 Google searches are performed, 18,000 YouTube videos are viewed, 7,000 tweets are sent and $4,000 in revenue is recorded by Amazon [1]. This information is a source of considerable wealth, and an undeniable marketing asset for those who are aware of it and know how to benefit from it.
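
To get a feel for these per-second rates, they can be scaled up to a full day. The short Python sketch below does only that arithmetic. The figures are the ones quoted above from [1]; the dictionary keys and the function name are illustrative and not part of any source.

```python
# Per-second figures quoted in the paragraph above (source [1]).
# The key names are illustrative labels chosen for this sketch.
PER_SECOND = {
    "google_searches": 60_000,
    "youtube_views": 18_000,
    "tweets": 7_000,
    "amazon_usd": 4_000,
}

SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds


def per_day(per_second):
    """Scale each per-second rate to a 24-hour total."""
    return {name: rate * SECONDS_PER_DAY for name, rate in per_second.items()}


if __name__ == "__main__":
    for name, total in per_day(PER_SECOND).items():
        print(f"{name}: {total:,} per day")
```

At these rates, 60,000 searches per second works out to roughly 5.2 billion searches per day.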

Platforms that provide such large, previously untapped amounts of information push the boundaries of research: they make it possible to combine the depth of qualitative insight with the volume and statistical relevance of quantitative data. These are the players who stand out from the crowd.

What is data?

Data is what we call all the “information” mentioned above that circulates over telephone and Internet networks.

Big Data is a concept popularized in the 1990s by John Mashey, a computer scientist at Silicon Graphics [2]. It refers to a total amount of data so large that it exceeds human intuition and analytical capabilities, and even those of some traditional database and information-management tools.

The expression Big Data covers all the data generated by the Internet, social networks and connected devices, whether they are watches, refrigerators or vacuum cleaners.

According to Peter Gentsch [3], collecting data for marketing purposes is not new: several decades ago, many tools for generating data were already available (points of sale, credit cards…). What has changed with the emergence of Big Data is the amount of data, the speed at which it circulates and the sources that generate it.

The current sources of data creation:

  • The Internet of Things (IoT), which refers to all the everyday objects that have an Internet connection.
  • Online searches and the behavior of Internet users.
  • Data collected by sensors (GPS, heart rate…), which is generally transmitted to another system.
  • Data from digital devices (mobile, tablet, computer, game console…).

Depending on where it is emitted or collected, data can take different forms: numbers, text, images or videos.

The 5 characteristics of Big Data

In 2001, Douglas Laney described the concept of Big Data along three dimensions, which were later extended to five, known as the 5Vs:

  • Volume: the amount of data to analyze, study and store. This is one of the biggest challenges for companies that use data. When the volume becomes large enough to be called Big Data, the company has to invest in advanced technologies capable of managing such quantities. IBM expects that by 2020, 40 zettabytes of data (43 trillion gigabytes) will have been created, 300 times more than in 2005 [4].
  • Velocity: the speed at which data is produced, and also the power of the systems that must absorb and analyze it.
  • Variety: as stated above, data can be of different kinds. Systems that until now only processed structured databases must therefore also handle text or image data, which is considered unstructured or semi-structured.
  • Veracity: the quality and reliability of the data. Not all data is reliable or worth analyzing (stolen data, fake data…). This is one of the biggest challenges: the previous dimensions can be addressed with computing power or more powerful systems, but the reliability of data is difficult to estimate.
  • Value: the potential value of the data and of its analysis. Big data projects as a whole generated only 100 million euros in revenue in 2009. Revenues have grown considerably since 2012, reaching nearly 42 billion euros worldwide in 2018, and some forecasts put them at more than 100 billion by 2030 [5].
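
The size quoted under Volume is easier to grasp with the unit conversion written out. The sketch below is only arithmetic, under the assumption that the figure can be read either with decimal prefixes (factors of 1,000) or binary prefixes (factors of 1,024). Either way, 40 zettabytes lands in the trillions of gigabytes. The constant and function names are illustrative.

```python
# Decimal (SI) prefixes: giga = 10^9 bytes, zetta = 10^21 bytes.
GB_DECIMAL = 10**9
ZB_DECIMAL = 10**21

# Binary prefixes: 2^30 bytes and 2^70 bytes respectively.
GB_BINARY = 2**30
ZB_BINARY = 2**70


def zettabytes_to_gigabytes(zb, binary=False):
    """Convert a whole number of zettabytes to gigabytes."""
    if binary:
        return zb * ZB_BINARY // GB_BINARY
    return zb * ZB_DECIMAL // GB_DECIMAL


if __name__ == "__main__":
    print(f"{zettabytes_to_gigabytes(40):,} GB (decimal prefixes)")
    print(f"{zettabytes_to_gigabytes(40, binary=True):,} GB (binary prefixes)")
```

With decimal prefixes the result is 40 trillion gigabytes; with binary prefixes it is just under 44 trillion.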

Useful typology of data in marketing

Data also comes in several types that have revolutionized research and marketing studies:

  • Active data, or social data, also called “User Generated Content”, refers to all data actively generated by users. For example, a comment posted on a social network is active data. This type of data is powerful because it expresses what users feel in their own words and gives a picture of what matters to them.
  • Passive data, by contrast, is generated without the user being aware of it. Most of the time, users do not know that they are the source of the data, such as the number of times they connect to a platform. (This data is often used for KPIs, called passive KPIs, but you can also build active KPIs from active data. Think about it for your next campaign!)
  • Longitudinal data refers to data collected over time. It makes it possible to analyze how the data studied evolves over time for a given user (or any other unit of reference). Data is also accessible in real time, allowing companies to react and make decisions more quickly.
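
For readers who like to see things concretely, the three types above can be sketched on a tiny, made-up event log. Everything here is hypothetical: the users, the events, and the rule deciding which kinds of event count as active. It is one possible way to label and follow data over time, not a standard method.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Event:
    user: str
    day: date
    kind: str  # e.g. "comment", "login"


# Hypothetical rule: these kinds are content the user deliberately created.
ACTIVE_KINDS = {"comment", "review", "post"}


def data_type(event):
    """Label an event as 'active' (user-generated content) or 'passive'."""
    return "active" if event.kind in ACTIVE_KINDS else "passive"


def longitudinal_counts(events, user):
    """Count one user's events per day and data type, ordered by date."""
    counts = Counter((e.day, data_type(e)) for e in events if e.user == user)
    return sorted(counts.items())


# Made-up sample data for illustration only.
EVENTS = [
    Event("alice", date(2019, 11, 18), "login"),
    Event("alice", date(2019, 11, 18), "comment"),
    Event("alice", date(2019, 11, 19), "login"),
    Event("alice", date(2019, 11, 19), "login"),
    Event("bob", date(2019, 11, 19), "review"),
]

if __name__ == "__main__":
    for (day, kind), n in longitudinal_counts(EVENTS, "alice"):
        print(day, kind, n)
```

Counting logins per day is the kind of passive KPI mentioned above; counting comments or reviews would be its active counterpart.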

Classification of data by source

Data can be classified into three categories according to its origin:

  • First-party data: data that belongs to the company. It may concern products, customers and marketing operations.
  • Second-party data: data generated by activity, such as data collected during marketing campaigns.
  • Third-party data: data that does not belong to the company and was not collected by it, but comes from a third party who may have access to information that complements the first two categories.

In the next article, we will see how Big Data has enabled the emergence of a new branch of marketing. Stay tuned!

Bibliography:

[1] Data Never Sleeps

[2] [5] Thomas Bouran. Les 5V du big data. Regards croisés sur l’économie 2018/2 (n° 23), pages 27 to 31.

[3] Peter Gentsch. AI in Marketing, Sales, and Service: How Marketers without a Data Science Degree can use AI, Big Data and Bots. 2018.

[4] IBM
