Data is everywhere. There are more than 40 zettabytes of data in the world, which is equal to 40 trillion gigabytes. Data is the fuel of every industry, from healthcare to transportation.
However, we need to know how to manage, protect, and clean that vast amount of data in order to use it to its fullest.
How can you make the most of data? The first step, when it comes to handling data, is knowing its type and properties.
Depending on the kind of values it holds, data is either qualitative (categorical) or quantitative (numerical).
Let's look at each of them.
Qualitative data (categorical)
Qualitative data describes an object or a group of elements. It is known as categorical data because, as the name suggests, it labels a group of elements or data points as belonging to a specific category. Examples include colors, plants, and places.
Qualitative data is further divided into two subtypes: “ordinal” and “nominal”.
Ordinal data follows a specific order or rank, as in letter grades, economic status, or military rank.
Nominal data, by contrast, has no inherent order. Consider gender, city, employment status, or color.
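The ordinal/nominal distinction can be made concrete in code. The following is a minimal sketch using Python's standard enum module; the MilitaryRank and Color classes, their members, and the can_order helper are illustrative assumptions rather than part of any particular dataset.

```python
from enum import Enum, IntEnum


class MilitaryRank(IntEnum):
    """Ordinal: members carry a rank, so comparisons are meaningful."""
    PRIVATE = 1
    SERGEANT = 2
    CAPTAIN = 3


class Color(Enum):
    """Nominal: members are labels only, with no inherent order."""
    RED = "red"
    GREEN = "green"
    BLUE = "blue"


def can_order(a, b):
    """Report whether two values support the < comparison."""
    try:
        a < b
        return True
    except TypeError:
        return False


# Ordinal values can be sorted by rank.
ranks = [MilitaryRank.CAPTAIN, MilitaryRank.PRIVATE, MilitaryRank.SERGEANT]
sorted_ranks = sorted(ranks)

# Nominal values support equality checks but not ordering.
colors_are_ordered = can_order(Color.RED, Color.BLUE)
```

Here sorting works for the ordinal type, while trying to order two colors raises a TypeError, which is one way to keep meaningless comparisons out of later analysis code.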
Quantitative data (numerical)
On the other hand, quantitative data deals with numerical values on which we can apply mathematical operations: a person's height, the number of fruits in a basket, the number of children in a school.
Although these examples look similar, there is one more distinction to keep in mind: quantitative data can be continuous or discrete.
The difference is that we can break continuous data into smaller units and still get meaningful values. This is not possible with discrete data, since dividing it into smaller units gives us values that make no sense.
For example, weight is continuous because we can measure it in kilograms, grams, and milligrams and still have a valid weight value. But can we apply the same concept to a discrete value, like the number of children in a school? No, since it is not possible to divide a child in half or into smaller units.
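As a rough illustration of the same idea in code, a continuous quantity can be stored as a float and rescaled freely, while a discrete count is stored as an int and only ever split into whole parts. The function names and the sample numbers below are assumptions made for this sketch.

```python
from typing import List


def kg_to_g(kilograms: float) -> float:
    """Continuous: a weight remains valid when expressed in a finer unit."""
    return kilograms * 1000


def split_into_classes(children: int, classes: int) -> List[int]:
    """Discrete: distribute whole children, never fractions of one."""
    if classes <= 0:
        raise ValueError("classes must be positive")
    base, extra = divmod(children, classes)
    # The first `extra` classes take one more child each.
    return [base + 1 if i < extra else base for i in range(classes)]


weight_g = kg_to_g(72.5)
class_sizes = split_into_classes(25, 2)
```

Splitting 25 children into two classes gives groups of 13 and 12 rather than 12.5 each, which mirrors why discrete values cannot be subdivided.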
Types of data according to sensitivity
Data confidentiality is a controversial issue with many loose ends still to be tied up. However, the repercussions of neglecting it are severe: if you use someone's personal data without their permission, you may face a class action lawsuit. Therefore, being able to classify data according to its sensitivity is a fundamental aspect of working as a data professional. So let’s briefly cover the main sensitivity levels:
Low data sensitivity
Low-sensitivity or public data is the kind of data that almost anyone can access and share without harming people or institutions. Examples include content from public websites, such as blogs and downloadable materials, directory information, and company information.
Medium data sensitivity
Data at this level is for internal use only. Disclosing medium-sensitivity data, such as donor data, emails, and personnel records, can cause minor harm.
High data sensitivity
This is confidential data, and disclosing it for any reason can cause serious damage to both individuals and institutions. Highly sensitive data includes passwords, social security numbers, and financial data.
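One way to put these levels to work is to tag each field with a level and mask anything above the reader's clearance. The sketch below is only an illustration: the Sensitivity enum, the FIELD_SENSITIVITY mapping, the redact function, and the sample record are all hypothetical, and treating unknown fields as highly sensitive is a conservative assumption rather than a rule from any standard.

```python
from enum import IntEnum


class Sensitivity(IntEnum):
    LOW = 1     # public data
    MEDIUM = 2  # internal use only
    HIGH = 3    # confidential


# Hypothetical field-to-level mapping, for illustration only.
FIELD_SENSITIVITY = {
    "blog_post": Sensitivity.LOW,
    "email": Sensitivity.MEDIUM,
    "password": Sensitivity.HIGH,
    "social_security_number": Sensitivity.HIGH,
}


def redact(record: dict, clearance: Sensitivity) -> dict:
    """Return a copy of record with fields above `clearance` masked.

    Fields missing from the mapping are treated as HIGH (an assumption).
    """
    return {
        key: (
            value
            if FIELD_SENSITIVITY.get(key, Sensitivity.HIGH) <= clearance
            else "[REDACTED]"
        )
        for key, value in record.items()
    }


record = {"blog_post": "Hello", "email": "a@example.com", "password": "hunter2"}
internal_view = redact(record, Sensitivity.MEDIUM)
```

With medium clearance, the blog post and email pass through unchanged while the password is masked.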
Why understanding data types is crucial
Knowing the exact format and size of your data helps save time and space.
It reduces the probability of errors in the cleaning and analysis stages.
It ensures that the functions you write later will give you the desired results.
And it helps with instrumentation, which is the process of tracking data and sending it to other systems.
To instrument data correctly and create an effective monitoring plan, you must determine all data types in advance.
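Determining data types in advance can be as simple as writing down an expected schema and checking records against it before they reach other systems. The following is a minimal sketch under that assumption; the SCHEMA contents, the type_errors function, and the sample records are hypothetical.

```python
# Hypothetical schema: field name -> expected Python type.
SCHEMA = {"name": str, "height_cm": float, "children": int}


def type_errors(record, schema=SCHEMA):
    """List the fields whose values are missing or of the wrong type."""
    errors = []
    for field, expected in schema.items():
        if field not in record:
            errors.append(f"{field}: missing")
            continue
        value = record[field]
        # bool is a subclass of int; this schema has no boolean fields,
        # so booleans are rejected outright.
        if isinstance(value, bool) or not isinstance(value, expected):
            errors.append(
                f"{field}: expected {expected.__name__}, "
                f"got {type(value).__name__}"
            )
    return errors


good = {"name": "Ana", "height_cm": 171.5, "children": 2}
bad = {"name": "Ana", "height_cm": "171.5", "children": 2.5}

good_errors = type_errors(good)
bad_errors = type_errors(bad)
```

A check like this catches a height stored as text or a fractional child count early, before such values cause errors in the cleaning and analysis stages.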