Data classification guide: What is data classification?

As data classification capabilities have improved, its use cases have expanded. It is used not only to organize information and make it accessible but also to help users compare and analyze data. Data classification is also part of security initiatives. For instance, it helps protect sensitive information and prevent unauthorized access by enabling controls that trigger the appropriate security response based on the type of data being retrieved, transmitted, or copied.

What is data classification?

Data classification is the process of separating, organizing, and tagging data into relevant groups or classes.

The objective of data classification is to make information easier to locate, access, sort, store, and protect for future use.

Data classification is critical for risk management, compliance, and data security, as it helps sort information based on the level of sensitivity, the risks it presents, handling requirements, and access limitations.

The type of data dictates its classification. While any number of categories can be used, the following are the most common, and most organizations adopt them to ensure consistency and avoid complexity and confusion. Several of these categories are sometimes bundled under the umbrella category of sensitive information.

Confidential or restricted
Confidential data, also referred to as restricted data, may only be accessed by limited individuals or groups. Access to confidential information usually requires special authorization or clearance and requires data protection (e.g., encryption). Examples of confidential data include:
Biometric identifiers (e.g., fingerprints or voice prints)
Certification or license numbers
Credit card numbers and expiration dates
Debit card personal identification numbers
Employee records
Financial records
Insurance provider information
Medical and health records (i.e., protected health information or PHI)
Social Security Numbers
State-issued identification card numbers or driver’s license numbers
Student records
Tax information
Vehicle identification numbers (VINs)

Internal
Internal data is information related to a specific organization and is meant for the exclusive use of individuals associated with that organization (e.g., employees or contractors). Internal data generally has relatively light security protections. Examples of internal data are:
Archived files
Corporate guidelines
Email and messenger platforms
Employee manuals
Internal email messages or memos
Internet protocol (IP) addresses

Private
Private data is primarily personal information. Not all private data is protected by law, but it usually has basic protections, such as passwords or biometric access restrictions. Private data protected by law is personally identifiable information (PII). Examples of private data include:
Cellphone content
Emails
Employee identification numbers
Online browsing history
Personal contact information (e.g., email addresses, home addresses, and phone numbers)
Research data
Student identification numbers

Proprietary
Proprietary data is confidential or restricted data associated with a specific organization. In most cases, proprietary data gives the organization a competitive edge or unique differentiation. It requires data protections in line with those for confidential or restricted data. Examples of proprietary data are:
Budget spreadsheets
Business plans
Revenue projections
Technical specifications of a new product
Trade secrets (e.g., formulas, models, and processes)

Public
Public data is information that is in the public domain. This type of data can be used and distributed without restrictions on its use (i.e., read, research, review, and store) and does not require data protection. Examples of public data include:
Birth and death records
Company executive information
Court records
First and last names
Incorporation dates
License plate numbers
Licensing records
Press releases

There are three main types of data classification: content-based, context-based, and user-based. The use cases and types of data determine which approach fits best.

Content-based
With content-based data classification, software is used to inspect and identify the content of files. A category is assigned based on the type of content in a file, such as confidential, internal, private, proprietary, public, restricted, or sensitive.
Context-based
Context-based data classification uses software to review several factors related to the information, such as application, location, and creator. These variables are evaluated to find indirect indicators of what category the information falls into, such as proprietary or restricted.
User-based
Information is assessed and categorized manually based on the judgment of a knowledgeable user. This type of data classification is often initiated by the creator of a document and sometimes reviewed before the document is released.
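
The three approaches above can be combined in one pipeline. The following Python sketch is illustrative only: the regular expressions, folder paths, department names, and the ranking of categories from least to most restrictive are all assumptions made for the example, not rules taken from any standard or product.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Assumed ranking (least to most restrictive); the article does not define one.
RANK = {"public": 0, "internal": 1, "private": 2, "proprietary": 3, "confidential": 3}

# Hypothetical content rules: a pattern found in the text implies a category.
CONTENT_RULES = [
    (re.compile(r"\b\d{3}-\d{2}-\d{4}\b"), "confidential"),    # looks like a US Social Security number
    (re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"), "private"),  # looks like an email address
]


@dataclass
class Document:
    text: str
    location: str                     # e.g., a folder path
    creator: str                      # e.g., a department name
    user_label: Optional[str] = None  # label chosen manually by a knowledgeable user


def classify_by_content(doc: Document) -> Optional[str]:
    """Content-based: inspect what is inside the file."""
    labels = [label for pattern, label in CONTENT_RULES if pattern.search(doc.text)]
    return max(labels, key=RANK.get) if labels else None


def classify_by_context(doc: Document) -> Optional[str]:
    """Context-based: use indirect indicators such as location and creator."""
    if doc.location.startswith("/hr/") or doc.creator == "legal":
        return "confidential"
    if doc.location.startswith("/intranet/"):
        return "internal"
    return None


def classify(doc: Document) -> str:
    """Take the most restrictive label proposed by the user, content, or context."""
    candidates = [doc.user_label, classify_by_content(doc), classify_by_context(doc)]
    candidates = [c for c in candidates if c is not None]
    # Nothing matched: flag the document for manual review rather than guessing.
    return max(candidates, key=RANK.get) if candidates else "unclassified"
```

Taking the most restrictive of the proposed labels is one possible design choice; an organization could just as reasonably let a reviewed user label override the automated results.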

Organizations should develop and maintain data classification policies, procedures, and guidelines that define categories and criteria.

Policies should also detail the roles and responsibilities of employees with regard to classifying and handling information, such as sharing and storage.

Why the enterprise needs data classification

There are many reasons why the enterprise needs data classification, including the following.

Access to additional data

When implemented systematically, data classification helps organizations manipulate, track, and analyze all the data needed for their strategies, goals, and objectives.

Assurance of confidentiality, availability, and integrity

The CIA triad (confidentiality, integrity, and availability) is a guiding principle for most data security programs. Data classification supports it by making clear what types of information an organization has, so that each type can be handled in a way that meets CIA triad requirements.

Enhanced data security and privacy

Data classification is foundational for effective data privacy and security. It gives organizations visibility into the types of data they have and allows them to quickly sort it and apply the appropriate access controls to meet internal security and external compliance requirements.

Benefits of data classification

Ensures compliance with regulatory requirements
Expedites analysis and discovery of insights
Facilitates data governance
Helps organizations understand:
  What sensitive data they have
  Where sensitive data resides
  Who can access, modify, and delete sensitive data
  The impact of the sensitive data being leaked, destroyed, or improperly modified
Improves data security and privacy
Increases efficacy of access management and control
Minimizes duplications of data
Mitigates risk
Reduces data management costs
Supports cyber resilience

Data classification challenges

Understanding the challenges of data classification helps overcome them and realize the benefits. The most commonly cited challenges of data classification include the following.

Cost control

Budgeting for data classification is notoriously difficult. Data volumes grow, security policies change, and management requirements differ by classification type, so costs vary widely and can spiral quickly.

Data volume

While most data classification systems can handle large volumes of data, issues still arise. Although the data can be classified, it can be costly to store and manage – especially sensitive information, which requires enhanced data protection.

Incorrect data classification

Technologies used for data classification automation can mislabel data, fail to recognize duplicate data, or lack the information needed to correctly classify information that is in unrecognized file formats.

Missing association

Data classification tools can fail to detect indirect associations that change the classification level for a file. For instance, a name and a file of medical study data may not be sensitive on their own, but when combined, they become protected health information, which is considered sensitive data.
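
The association problem can be illustrated with a deliberately simplified Python sketch. The field names and the two sets below are hypothetical, and real rules for what counts as protected health information are far more detailed. The point is only that a classifier must look at fields in combination, not one at a time.

```python
# Hypothetical field names; real inventories will differ.
IDENTIFYING_FIELDS = {"name", "full_name"}
HEALTH_FIELDS = {"diagnosis", "medical_study_result", "prescription"}


def classify_fields(fields):
    """Return 'sensitive' only when identifying and health fields appear together."""
    present = set(fields)
    has_identifier = bool(present & IDENTIFYING_FIELDS)
    has_health = bool(present & HEALTH_FIELDS)
    return "sensitive" if has_identifier and has_health else "not sensitive"


participants = ["name"]                  # who took part in the study
results = ["medical_study_result"]       # study results without names
joined = participants + results          # the two sources combined

print(classify_fields(participants))  # not sensitive
print(classify_fields(results))       # not sensitive
print(classify_fields(joined))        # sensitive
```

A tool that classifies each source in isolation would label both as not sensitive and miss that the joined view needs stronger protection.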

Data classification and the data lifecycle

Data lifecycle management processes control information from creation to destruction. Embedding data classification into the data lifecycle enhances visibility into information types to enable proper handling at every stage, to ensure that requirements for data security, privacy, and compliance are met.

Data classification begins with creation and should continue to be a consideration as data moves through the lifecycle with ongoing evaluations of and adjustments to the classification level.

Data classification naturally fits into each of the six stages of the data lifecycle.

Creation
Data is continuously generated in multiple formats, such as documents, emails, social media, and websites. It should be classified when it is saved.
Use
People and systems use data, usually with access controlled based on a correlation of roles, authorizations, and classification levels.
Storage
Data is stored with access controls and encryption employed according to data classification levels.
Sharing
Rules for sharing data between employees, customers, partners, systems, and applications should be governed according to data classification.
Archiving
The type of archive and the protections applied to it should be based on the data classification level.
Destruction
At some point, most data, regardless of classification, should be destroyed. The destruction schedule should take the data classification level into account.
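
One way to tie the lifecycle stages above to classification is a single policy table that storage, sharing, and destruction logic all consult. The sketch below is a minimal illustration; the `HandlingPolicy` fields, the level names chosen, and especially the retention periods are placeholder assumptions, not recommendations.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class HandlingPolicy:
    encrypt_at_rest: bool           # storage and archiving
    external_sharing_allowed: bool  # sharing
    retention_days: int             # destruction schedule


# Placeholder values for illustration only.
POLICIES = {
    "public": HandlingPolicy(encrypt_at_rest=False, external_sharing_allowed=True, retention_days=3650),
    "internal": HandlingPolicy(encrypt_at_rest=False, external_sharing_allowed=False, retention_days=1825),
    "confidential": HandlingPolicy(encrypt_at_rest=True, external_sharing_allowed=False, retention_days=365),
}


def must_encrypt(level: str) -> bool:
    """Storage and archiving: is encryption required for this level?"""
    return POLICIES[level].encrypt_at_rest


def may_share_externally(level: str) -> bool:
    """Sharing: may data at this level leave the organization?"""
    return POLICIES[level].external_sharing_allowed


def destruction_date(created: date, level: str) -> date:
    """Destruction: the date on which data created on `created` is due for deletion."""
    return created + timedelta(days=POLICIES[level].retention_days)
```

Keeping the rules in one table means that reclassifying a piece of data changes its handling at every stage without touching the code for each stage.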

Data classification and data discovery

Data discovery locates information that is often in far-flung silos; data classification then identifies it and tags it according to its associated category. Combining data discovery and data classification gives organizations the visibility needed to operationalize and protect information effectively.

Data classification and discovery apply to all information in the three data types:

Structured data
Structured data is text-based information (e.g., names, addresses, order details, or medical records) that is collected in predefined data models, such as rows and columns, and stored in systems, such as relational databases or data warehouses.
Unstructured data
Unstructured data has no defined data model (e.g., email messages, videos, or transcripts) and is stored in applications, data warehouses, and data lakes.
Semi-structured data
Semi-structured data is loosely organized and tagged (e.g., server logs and messages organized in files or with hashtags) and is usually stored in applications or relational databases.

Use cases for data classification and discovery include:

Audits
During an audit, organizations can be required to produce many types of information. Data classification ensures that information is quickly and easily accessible. Data discovery helps users find the specific information that is needed.
Cloud migrations
When transferring data from on-premises to the cloud, data discovery and classification ensure that all data types are moved to the right type of storage and made accessible to authorized users (i.e., machines and people).
Data Subject Access Requests (DSARs)
DSARs are provided for under the European Union’s General Data Protection Regulation (GDPR). An individual can submit a DSAR to an organization, which must then disclose what personal data it has collected, how that data is used, how it is intended to be used, and why it was collected. Similar requests can be made under data privacy laws in the United States and other countries. Data discovery and data classification are vital for responding to DSARs in a timely manner.
Mergers and acquisitions
Data classification and discovery play critical roles when integrating data from two or more organizations. These processes help ensure data protection and minimize duplications.
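
As an illustration of the DSAR use case above, the following Python sketch assumes discovery has already produced an inventory in which every entry is tagged with a classification and a purpose. The `InventoryEntry` structure, the system names, and the choice of which labels count as personal data are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass

# Assumption: only these labels mark personal data in this example inventory.
PERSONAL_LABELS = {"private", "confidential"}


@dataclass
class InventoryEntry:
    system: str          # where discovery found the data
    subject_id: str      # the individual the data relates to
    field_name: str
    classification: str  # tag assigned by classification
    purpose: str         # why the data was collected


def build_dsar_report(inventory, subject_id):
    """Group one individual's personal data by the system that holds it."""
    report = defaultdict(list)
    for entry in inventory:
        if entry.subject_id == subject_id and entry.classification in PERSONAL_LABELS:
            report[entry.system].append((entry.field_name, entry.purpose))
    return dict(report)


inventory = [
    InventoryEntry("crm", "u-1", "email", "private", "order notifications"),
    InventoryEntry("billing", "u-1", "card_number", "confidential", "payment processing"),
    InventoryEntry("crm", "u-2", "email", "private", "order notifications"),
]

print(build_dsar_report(inventory, "u-1"))
```

Without the classification and purpose tags, answering the same request would mean searching every system by hand.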

Organizations realize many benefits when using data classification and discovery, including:

Collecting data from databases and silos and consolidating it into a single source
Controlling data ingress and egress through networks, applications, systems, and devices
Detecting misuse of all data
Ensuring data access controls are applied correctly
Identifying data protection gaps faster
Improving data analysis and resulting insights
Increasing visibility into data across the organization
Supporting compliance
Understanding the what, where, and why of data

How data classification works

The main steps in the data classification process are:

Identify and gather the data.
Define classification levels (e.g., sensitive, confidential or restricted, private, proprietary, and public).
Categorize the data according to classification, rating the sensitivity of the information against three key criteria, each at one of three levels of severity (low, moderate, or high) for the implications of unauthorized access:
  Confidentiality
  Integrity
  Availability
Apply security controls and monitoring commensurate with the data classification level assigned to the information.
Implement processes for ongoing data classification reviews and updates to ensure accuracy and relevance, making changes as needed.
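
The categorization step can be sketched in Python as follows. The article does not say how the three ratings combine into one level, so this example assumes the "high-water mark" convention, in which the overall level is the highest of the three ratings. The `CONTROLS` mapping is a hypothetical placeholder for whatever controls an organization assigns to each level.

```python
from enum import IntEnum


class Impact(IntEnum):
    LOW = 1
    MODERATE = 2
    HIGH = 3


def overall_impact(confidentiality: Impact, integrity: Impact, availability: Impact) -> Impact:
    """High-water mark (assumed convention): the highest rating sets the overall level."""
    return max(confidentiality, integrity, availability)


# Hypothetical controls per level, for illustration only.
CONTROLS = {
    Impact.LOW: ["basic access control"],
    Impact.MODERATE: ["basic access control", "encryption at rest"],
    Impact.HIGH: ["basic access control", "encryption at rest", "access monitoring"],
}

# A payroll file: disclosure would be serious, and errors would matter,
# but a short outage would not.
level = overall_impact(Impact.HIGH, Impact.MODERATE, Impact.LOW)
print(level.name, CONTROLS[level])
```

Other combination rules are possible (for example, weighting confidentiality more heavily), so the `max` here should be read as one option rather than a requirement.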

Data classification optimizes ROI and results

An oft-heard gripe about data classification is that it is difficult, but this is one of the easiest challenges to overcome. Data classification is difficult when organizations try to handle it manually.

However, with software, data classification is largely automated, with policies seamlessly embedded into user workflows. In addition, sensitive data “hidden” in silos can be automatically detected and appropriately classified.

Organizations that embrace data classification see a rapid return on their investment through time savings, increased productivity, and optimized security. Implementing the right tools and following best practices allows organizations to take full advantage of data classification, improve access to valuable information, and strengthen data protection.

In addition, data classification ensures that organizations meet stringent and difficult-to-achieve data protection requirements set forth by an increasing number of laws, regulations, and standards.

Date: December 27, 2023
Reading time: 12 minutes
