DLP - RiskInsight

Migrate your work environment to Office 365 with confidence

GEneviEveLardon — Tue, 21 Jul 2020 17:14:42 +0000

Recent events have shown us that teleworking is no longer a luxury for employees, but a real necessity to ensure the continuity of organisations’ activities.

For those who have not yet taken the plunge (mainly mid-sized companies and the public sector), it is essential to start thinking about Cloud collaboration and communication platforms as soon as possible. The aim is to ensure continuity of service in the event of force majeure (cyber attack, natural disaster or even pandemic), or even to prepare for a more substantial migration.

For this Digital Workplace platform, close collaboration between the security and workplace teams will be a prerequisite!

In this article, I will share some feedback on the deployment of Office 365, Microsoft’s solution that is becoming increasingly popular with the companies we support.

There is a lot of interesting documentation on the subject on the Internet (“Top 10 best practices” or “3 good reasons to connect the xxx application to ensure your security”). Microsoft summarizes some of these good practices in these two articles:

Today, I am not going to repeat a non-exhaustive list of these good practices here, but rather remind you of six points of attention when opening such a service.

1st point: Building the security standard, a pillar of the future relationship between the security and workplace teams.

As with any project of this type, the first step is to assess the potential of the service and see how it can meet the initial need, through the development of a business case. The possibilities offered by Office 365 are numerous: office automation, instant messaging or email, data visualization, development of applications without code, etc.

As far as cybersecurity teams are concerned, there are two choices: to oppose this migration because of the risks linked to the American Cloud or to support the reflection to create new secure uses.

In the vast majority of cases, the second choice is preferred. A three-way relationship then begins between the workplace teams, the security teams and the architects, with the aim of building a service for the users. One outcome of this step could be a security standard, derived from a risk analysis, that defines the services used and their associated configuration.

Among the issues to be addressed are generally the following three themes:

What uses should be offered to people in a situation of mobility? With what authentication?
What new services to offer with the possibilities of integration with APIs?
How to share documents with external users?

The current trend is to provide answers with a “Zero Trust” approach. Any deviation from the defined security standard will have to be detected through dashboards and supervision. The adage “Trust does not exclude control” has never made more sense.

This reflection may even be an opportunity to ask fundamental questions in order to lay a coherent foundation for the working environment. For example, why leave email, a 30-year-old system, open to everything while blocking external sharing in Teams and SharePoint? Improving the user experience can only be achieved by standardizing security practices.

2nd point: Data protection, a subject with the wind in its sails

Alongside the construction of the service comes the subject of the data that will be used in the tenant. Two simple questions must be answered here (and the answers are often complex).

How do I protect my data?

Today, unstructured data protection strategies are based on a common basis: the linking of data to a level of sensitivity. This correspondence leads to protection measures to be put in place:

– Encryption with keys controlled by the CSP or the organisation;
– Restriction of rights (or DRM);
– Conditional access with multi-factor authentication;
– Data Leakage Protection (or DLP).

In order not to over-protect data and thus avoid undermining the user experience, encryption and rights restriction can be reserved for the most critical data. Other data will still remain under control using more traditional measures, such as end-to-end encryption and exposure control.
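As a rough illustration of linking a sensitivity level to protection measures, the sketch below keeps baseline measures for all data and reserves the heavier ones for the most critical levels. The level names and measure identifiers are assumptions made for this example; they are not taken from any Microsoft or organisational standard.

```python
from enum import IntEnum


class Sensitivity(IntEnum):
    # Hypothetical classification scale.
    PUBLIC = 0
    INTERNAL = 1
    CONFIDENTIAL = 2
    SECRET = 3


# Measures applied to all data, whatever its level (assumed policy).
BASELINE = {"conditional_access_mfa", "dlp_monitoring"}

# Heavier measures reserved for the most critical data (assumed policy).
EXTRA = {
    Sensitivity.CONFIDENTIAL: {"rights_restriction"},
    Sensitivity.SECRET: {"rights_restriction", "customer_managed_key_encryption"},
}


def measures_for(level: Sensitivity) -> set:
    """Return the protection measures that apply to a sensitivity level."""
    return BASELINE | EXTRA.get(level, set())
```

Keeping the mapping in one table makes it easy to review with the business when the classification scheme changes.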

A key factor for such a project will be to turn it into a real business project, with a comprehensive awareness programme dedicated to classification.

How to remain compliant with the regulations?

Depending on its activities and the countries in which it operates, an organisation may be subject to local and sector-specific regulations.

These regulations and directives in some cases impose real obstacles that need to be removed at the outset of the project: data retention, legal archiving, geolocation, judicial investigation, requests related to personal data.

Let’s take a concrete example: Russia. With the law on personal data of 2015, the national regulatory authority imposes the obligation to keep the source (called primary database) of its citizens’ data on Russian soil. In practice, this means that the Active Directory (primary base of corporate identities) of the Russian entity must remain in Russia. From there, the information can be synchronized with the GAL (Global Address List) and Azure Active Directory.

The thorny issue of stock management

What to do with the data already existing? This is a complex issue, especially if the opening of a Cloud collaboration solution is linked to the decommissioning of existing file servers.

First of all, there is a technical question: will the company’s network be able to support massive migrations of .pst files and documents? Note also that it will not necessarily be useful to migrate data that does not comply with the retention policy.

Secondly, historical data may have heterogeneous levels of sensitivity and be subject to various regulations. A trade-off will be necessary to arbitrate between local data retention, risk acceptance and a broad classification project before or after migration.

3rd point: The Target Operating Model, guaranteeing the preservation of security over time

The operational model of a service such as Office 365 defines the responsibilities of the players (administrators, support staff, etc.) and the principles of object management. It is complementary to the security standard mentioned above, providing an operational vision.

The TOM must be drawn up prior to the opening of the service and updated regularly. It must include at least the following subjects.

A model of administration

Microsoft offers about 50 administration roles by default, not counting the RBAC roles of individual services (e.g. Exchange and Intune). Relevant use of these roles and of custom roles will help to avoid having too many Global Administrators and to follow the principle of least privilege. Implementing Just-in-Time access will also make it possible to monitor the actual use of roles, while reinforcing security.

A semi-architectural / semi-security community

Like any SaaS provider, Microsoft regularly upgrades the functionalities of its collaborative suite. The mission of this community will be to monitor these changes, in order to master new uses and keep control of the tenant as it evolves.

The life cycle of shared identities and spaces

If the creation of shared spaces (Teams, SharePoint) is left unmanaged, this can lead to an explosion in the number of spaces that do not comply with the security standard. The reports published by vendors of Data Discovery solutions are quite striking. To avoid this, it is necessary to establish a life cycle for shared spaces. These rules can include a naming convention, retention policies, a lifespan and principles for rights management.
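To make these life-cycle rules concrete, here is a minimal sketch of a compliance check for a shared space. Everything in it is an assumption chosen for illustration: the naming pattern, the one-year lifespan, the "at least two owners" rule standing in for a rights-management principle, and the `SharedSpace` record itself.

```python
import re
from dataclasses import dataclass
from datetime import date, timedelta

# Hypothetical naming convention: a scope prefix followed by upper-case segments.
NAME_PATTERN = re.compile(r"^(PRJ|DEPT|EXT)-[A-Z0-9]+(-[A-Z0-9]+)*$")
# Hypothetical maximum lifespan before a space must be renewed or archived.
MAX_LIFESPAN = timedelta(days=365)


@dataclass
class SharedSpace:
    name: str
    created: date
    owners: list
    has_retention_policy: bool


def violations(space: SharedSpace, today: date) -> list:
    """List the life-cycle rules that a shared space does not respect."""
    issues = []
    if not NAME_PATTERN.match(space.name):
        issues.append("naming convention")
    if today - space.created > MAX_LIFESPAN:
        issues.append("lifespan exceeded")
    if len(space.owners) < 2:
        issues.append("fewer than two owners")
    if not space.has_retention_policy:
        issues.append("no retention policy")
    return issues
```

A creation portal could run the same checks before provisioning a space, and a scheduled job could run them again over existing spaces.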

The establishment of a single portal for the creation of these spaces will make it possible to implement these good practices, while promoting the user experience.

Similarly, a life cycle for Azure AD objects (including guest users, security groups, Office 365 groups and applications) must be defined and supported by tooling. Two examples deserve attention: the delegation of API permissions is left open, leaving the door open to massive data leaks; and users invited to collaborate are never deleted. For this, two strategies are possible:

#1 – Creation of a Custom Automation Engine decoupled from the IAM, via an in-house application developed in PowerShell;
#2 – Integration of a PowerShell / Graph API connector into the existing IAM solution, in order to manage all objects in one place, regardless of where they are hosted.
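Whichever strategy is chosen, the clean-up rule for guest users can be quite simple. The sketch below is written in Python purely for illustration (the strategies above mention PowerShell); the `GuestAccount` record, its field names and the 90-day threshold are assumptions, and retrieving the accounts from the directory is deliberately left out.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional


@dataclass
class GuestAccount:
    # Hypothetical record; in practice it would be filled from a directory export.
    user_principal_name: str
    created: date
    last_sign_in: Optional[date]  # None if the guest never signed in


def stale_guests(guests, today, max_inactivity=timedelta(days=90)):
    """Return guests with no sign-in (or no activity since creation) for too long."""
    stale = []
    for guest in guests:
        # Fall back to the creation date when the guest never signed in.
        reference = guest.last_sign_in or guest.created
        if today - reference > max_inactivity:
            stale.append(guest)
    return stale
```

The returned accounts would typically go through a review with the inviting user before being disabled or deleted.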

4th point: take a fresh look at the subject of user identity

Indeed, the subject of identity is a pillar of SaaS! So, take the time to consider all the possibilities and risks of SaaS Identity Providers (or IdPs). In particular, it is unthinkable in 2020 to consider Azure Active Directory as a simple Domain Controller in the Cloud.

Three approaches are possible for the source of identities accessing Office 365.

The dissociation of identities, a quick-win but complicated from a user’s point of view

It is possible to dissociate the local and Cloud identities if the local AD is no longer available, or in order to decouple the Cloud workspace from the legacy IS. This scenario obviously does not make for an optimal user experience, but may be a valuable asset in the event of a crisis.

The use of local identity in the Cloud, a classic strategy

In order to reconcile security and user experience, it is necessary to use the same identity between the legacy applications and this new service. For this, three technical scenarios are available:

Identity Federation : This historic solution is widely used by large French companies that are reluctant to host passwords in the Cloud and wish to have SSO;
Password Hash Sync (PHS): This solution, recommended by Microsoft and the British equivalent of ANSSI, is implemented by the vast majority of Microsoft customers. This solution can also be used as a back-up when the federation service is no longer available;
Pass-through Authentication (PTA): This solution provides the best user experience but has the disadvantage of passing the password through Azure AD.

Migrating one’s identity repository to the Cloud, a longer-term vision

Before or after migration, it may be appropriate to consider fully migrating the source of identities into the Cloud (whether Azure AD or a third-party solution), in order to take advantage of the new possibilities. Several obstacles still need to be lifted first, such as printer, GPO and device management.

5th point: Gradually open up services to encourage controlled adoption

It is always easier to open a new service than to roll it back for security reasons. Opening all the services of the collaborative suite at once has the advantage of offering the maximum number of use cases, but can cause several side effects.

First of all, services that are not officially supported and left in the hands of users for testing purposes represent a definite risk. They need to be configured and hardened. In some cases, it may even be preferable to disable the corresponding licenses.

Secondly, a controlled launch of the tools will help control costs during the first months or years of the transition. As Microsoft licences represent a significant cost, unused licences are an obvious target for optimization.

Change management is also a key aspect to consider; to promote the user experience, of course, but also to promote data security. It is essential to have a clearly defined roadmap and user journey. Accompanied adoption will lay the foundations for proper governance of shared spaces and data (both in terms of exposure and protection).

It will be useful to consider creating a community of evangelists and users in order to maintain momentum in the adoption of the new functionalities brought by Microsoft. A uservoice system could be an asset; the ideal would be to listen to the needs of users and prioritise future openings.

6th and last point: Licences, the lifeblood of Office 365 and its security

SaaS solutions are generally subject to a monthly invoiced licensing model. The choice of Microsoft 365 licences must be the result of a global reflection. It cannot remain the prerogative of workplace teams and be determined solely by the need for collaboration and communication.

Indeed, the choice of licensing level will condition the security strategy of the tenant. This choice will have a wider impact on the strategy for securing the work environment. Indeed, Microsoft is increasingly positioning itself as a challenger to security solution providers, being the only one to offer such a complete suite.

The licensing of security options must be dealt with at the start of the project and at each renewal. It will be cheaper to include a licensing package from the outset than to order AAD P1 licences on an emergency basis to cover an unforeseen need for conditional access.

In this strategy to be defined, it may be appropriate to target individuals to adapt the security requirements to their profile (VIP, admin, medical population, etc.).

This approach, presented here for Office 365, can be generalised to any SaaS (Software as a Service) service, or even IaaS (Infrastructure as a Service) or PaaS (Platform as a Service) service.

The article Migrate your work environment to Office 365 with confidence first appeared on RiskInsight.

Boost your cybersecurity thanks to Machine Learning? Part 1 – « Absolutely, here’s how! »

Carole Meyziat — Fri, 03 Jul 2020 12:00:14 +0000

Nowadays, we hear about artificial intelligence (AI) everywhere. It affects all sectors… and cybersecurity is no exception! According to a global benchmark published by Capgemini in the summer of 2019, 69% of organizations consider that they will no longer be able to respond to a cyber-attack without AI. Gartner places AI applied to cybersecurity in the top 10 strategic technological trends for 2020.

Over two articles, we will explore AI’s capabilities, specifically those of Machine Learning for cybersecurity. In this first article, we will go through each stage of a Machine Learning project focused on a cybersecurity use case: the exfiltration of data from the IS, in a very simplified form. We have chosen one case study, but the concepts in this article apply to all Machine Learning projects and can be transposed to any other use case, notably in cybersecurity.

First of all, what are we talking about?

The term Artificial Intelligence (AI) includes all the techniques that allow machines to simulate intelligence. Today, however, when we talk about AI, we very often talk about Machine Learning, one of its sub-domains. These are techniques that enable machines to learn a task, without having been explicitly programmed to do so.

For us cybersecurity professionals, this is a good thing: we often find it difficult to describe explicitly what it is we want to detect! Machine Learning then provides us with new perspectives and already has many applications, the main ones of which are illustrated below:

The example of a use case for ML-enhanced cybersecurity: the DLP

To illustrate the contribution of Machine Learning to cybersecurity, we have chosen to focus on the fraudulent extraction of data from a company’s information system. In other words, the case of DLP (Data Leakage Prevention), an issue encountered by many companies. We want to detect suspicious outbound communications in order to prevent them from happening.

« Very well, but… how do we identify a suspicious communication? »

By the large volumes exchanged? By a strange destination? By an unusual connection time?

In reality, our problem is complex to explain and what we need to assess is likely to change over time. With static detection rules alone, our security teams therefore find it difficult to be exhaustive. They can adjust the thresholds of these rules to refine what is detected, but unfortunately still end up with a large number of false positives to deal with.

We can see that Machine Learning, as defined above, could be useful here. What if we tried it?
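The threshold dilemma can be shown with a very small sketch. The requests, users, sites and volumes below are invented for the example; only the principle (a single static volume threshold per request) comes from the text.

```python
def static_rule_alerts(requests, threshold_bytes):
    """Static rule: alert on every request uploading more than the threshold."""
    return [r for r in requests if r["upload_bytes"] > threshold_bytes]


# Invented sample: one large upload, two small uploads to an unknown site,
# and two ordinary business requests of a similar size.
sample_requests = [
    {"user": "alice", "site": "backup.example", "upload_bytes": 900_000_000},
    {"user": "bob", "site": "unknown.example", "upload_bytes": 40_000},
    {"user": "bob", "site": "unknown.example", "upload_bytes": 45_000},
    {"user": "carol", "site": "intranet.example", "upload_bytes": 60_000},
    {"user": "dave", "site": "mail.example", "upload_bytes": 55_000},
]

# A high threshold only sees the large upload and misses bob entirely.
high = static_rule_alerts(sample_requests, 100_000_000)
# A low threshold catches bob, but also alerts on the ordinary requests.
low = static_rule_alerts(sample_requests, 30_000)
```

No single threshold separates bob's small, repeated uploads from ordinary traffic, which is why volume alone is not a sufficient criterion.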

Step 1: Clarify the need

That is what we just did!

Step 2: Choose the data

When we hear the words Machine Learning, we should usually understand “data” to feed the algorithms. Lots of data, and of good quality!

When we asked our requesting business (which for once is cybersecurity!) where to get useful data for our data exfiltration case, the web proxy stood out as the big winner: it sees almost all the traffic leaving the IS. So we retrieved its logs, which look like this:

« This all seems quite complicated…»

Data scientists indeed have good reason to feel lost: on the one hand, the logs are not easy to understand, and on the other, after consultation with the cybersecurity business, not all fields are really useful for our use case. We therefore selected a subset of them with the cybersecurity business before continuing.

The result is easier for data scientists to use!

Step 3: prepare the data

Data scientists can now “explore the data” in order to ensure optimal learning of the algorithm. Here, they point out something surprising in the distribution of our requests by upload volume. Since we want to detect data exfiltration, this variable is of particular interest to us.

The values of our variable are not evenly distributed: there is even a very large number of requests at 0.

“But still, there are a lot of these requests with a null upload volume; is it really relevant to keep them in our case?”

Indeed, after discussion with the cybersecurity business, it appears that these data add little to our use case. So we decided to remove them. Our sample was then distributed as follows:

After several back and forth exchanges between data scientists challenging the data from a statistical point of view and cybersecurity teams responding with their professional eye, the data is simplified as much as possible. Data is then:

Enriched by creating new variables that are denser in useful information. We introduced a relative upload volume for each site, measuring the difference between the upload volume of a request and the site’s average upload volume over the last 90 days. We could also add the connection time, for example.
Normalized by rescaling each variable to a comparable range, so that no variable is over- or underweighted.
Converted to numerical form, as most algorithms can only interpret numerical variables.
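The three preparation steps above can be sketched in plain Python. This is an illustration under stated assumptions, not the pipeline actually used: the column names (`site`, `upload_bytes`), the `history` structure holding past volumes per site, min-max scaling as the normalization method and integer codes as the numerical encoding are all choices made for the example.

```python
from statistics import mean


def add_relative_upload(rows, history):
    """Enrich: upload volume minus the site's average over the history window.

    `history` maps a site to its past upload volumes (e.g. the last 90 days).
    Sites with no history get a baseline of 0.
    """
    for row in rows:
        past = history.get(row["site"])
        baseline = mean(past) if past else 0.0
        row["relative_upload"] = row["upload_bytes"] - baseline
    return rows


def min_max_normalize(rows, column):
    """Normalize: rescale a numerical column to the [0, 1] range."""
    values = [r[column] for r in rows]
    lo, hi = min(values), max(values)
    span = hi - lo
    for r in rows:
        r[column + "_norm"] = (r[column] - lo) / span if span else 0.0
    return rows


def encode_category(rows, column):
    """Convert a categorical column to integer codes, in order of appearance."""
    codes = {}
    for r in rows:
        r[column + "_code"] = codes.setdefault(r[column], len(codes))
    return rows, codes
```

Integer codes are the simplest encoding; for categories with no natural order, one-hot encoding is a common alternative.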

We can now split our data set in two: one set that will be used to train our model, one set that will allow us to test its performance. Several separation methods exist, enabling us to keep certain characteristics of the data (e.g. seasonality), but the objective remains the same: to guarantee an evaluation measure as close as possible to the model’s real performances, by presenting the model with data that it did not have at its disposal during training.
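One simple separation method that respects the order of events is a chronological split, where the model is tested only on requests that occur after those it was trained on. The sketch below assumes each row carries a sortable `timestamp` field and uses an 80/20 split; both are assumptions for the example.

```python
def chronological_split(rows, train_fraction=0.8, time_key="timestamp"):
    """Split rows into (train, test), keeping the most recent rows for testing."""
    ordered = sorted(rows, key=lambda r: r[time_key])
    cut = int(len(ordered) * train_fraction)
    return ordered[:cut], ordered[cut:]
```

Unlike a random split, this avoids evaluating the model on data that is older than what it has already seen.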

Step 4: Choosing the learning method and training the model

Some algorithms are more efficient than others for a given problem, so it is necessary to make a reasoned choice.

There are two main categories of Machine Learning algorithms:

Supervised, when we have labeled data as a reference to give as an example to our algorithm. These algorithms are for example used in cybersecurity by anti-spam solutions: they can learn via the users’ classification of emails as spam for example.
Unsupervised, when we do not know precisely what we want to detect or when we lack examples to provide the algorithm with for its learning (i.e. we lack labeled data).

As explained above, the context of our use case points us more towards the second option. It is for the same reasons that we initially thought of Machine Learning. We then choose our unsupervised learning algorithm (Isolation Forest here, but we could have chosen another one) and train our model.
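To give an intuition of how an Isolation Forest works, here is a toy pure-Python version. It is a teaching sketch, not the implementation used for this case study: in practice one would rely on a library implementation such as scikit-learn's `IsolationForest`. The idea is that points which are easy to isolate with random splits (short paths in the trees) receive a score close to 1. The sample data at the end is invented.

```python
import math
import random


def c_factor(n):
    """Average path length of an unsuccessful search in a binary tree of n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1) + 0.5772156649) - 2.0 * (n - 1) / n


def build_tree(points, depth, max_depth, rng):
    """Recursively split the points on a random feature at a random value."""
    if depth >= max_depth or len(points) <= 1:
        return {"size": len(points)}
    feature = rng.randrange(len(points[0]))
    lo = min(p[feature] for p in points)
    hi = max(p[feature] for p in points)
    if lo == hi:
        return {"size": len(points)}
    split = rng.uniform(lo, hi)
    left = [p for p in points if p[feature] < split]
    right = [p for p in points if p[feature] >= split]
    return {
        "feature": feature,
        "split": split,
        "left": build_tree(left, depth + 1, max_depth, rng),
        "right": build_tree(right, depth + 1, max_depth, rng),
    }


def path_length(point, node, depth=0):
    """Depth at which the point lands, adjusted for the size of its leaf."""
    if "size" in node:
        return depth + c_factor(node["size"])
    side = "left" if point[node["feature"]] < node["split"] else "right"
    return path_length(point, node[side], depth + 1)


class ToyIsolationForest:
    def __init__(self, n_trees=100, sample_size=64, seed=0):
        self.n_trees = n_trees
        self.sample_size = sample_size
        self.seed = seed

    def fit(self, points):
        if len(points) < 2:
            raise ValueError("at least two points are required")
        rng = random.Random(self.seed)
        self.n = min(self.sample_size, len(points))
        max_depth = math.ceil(math.log2(self.n))
        self.trees = [
            build_tree(rng.sample(points, self.n), 0, max_depth, rng)
            for _ in range(self.n_trees)
        ]
        return self

    def score(self, point):
        """Anomaly score in (0, 1]; the closer to 1, the easier to isolate."""
        mean_path = sum(path_length(point, t) for t in self.trees) / len(self.trees)
        return 2.0 ** (-mean_path / c_factor(self.n))


# Invented data: 50 ordinary requests described by two scaled features,
# plus one request that is far from all the others.
data_rng = random.Random(42)
normal = [(data_rng.gauss(0.0, 1.0), data_rng.gauss(0.0, 1.0)) for _ in range(50)]
outlier = (50.0, 50.0)
forest = ToyIsolationForest().fit(normal + [outlier])
```

No labels are needed at any point, which is what makes this family of algorithms suitable when we cannot describe in advance what an exfiltration looks like.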

Step 5: Analyze results

We use our test data set to evaluate the effectiveness of our model in detecting exfiltration cases.

The designed model detects patterns in the data (queries), then compares the new data (queries) with these patterns and highlights those that deviate from what it considers to be the norm through its learning (anomaly score).

Here are our results:

« Ok, but how should I interpret all this ? »

The graph on the left represents the anomaly scores associated with each query in the test set, sorted in chronological order. To the right are the logs with the highest anomaly scores.

After investigation with the cybersecurity business:

The peak in yellow corresponds to a much larger upload volume than the others, from a user who extracted a large volume of data. This is a genuine anomaly. However, an alert based on a static volume-per-request rule would also have detected this suspicious communication.
More interestingly, the peaks in red correspond to regular low-volume uploads to unknown sites from the same user. These anomalies are harder to detect with conventional means, yet our algorithm has given them the same anomaly score as a large volume. They therefore become just as high a priority for our cybersecurity alert management teams to qualify.

Now, let’s focus on the large cluster in the center of the graph (in orange). On the first day, we observe a high anomaly score: a sudden sending of data by many users to the city’s transit website. After investigation, we realize that this is not a real security incident, but the annual sending of supporting documents for the renewal of transport subscriptions (we are at the beginning of September…). We then observe that the algorithm “understands” that these flows recur across several users and progressively integrates them as a habit. The anomaly score therefore decreases day by day.

The model therefore detects what is out of the norm, whatever that norm may be, and corrects itself with experience. This is where Machine Learning presents a real added value compared to traditional detection methods.

If the performance of the model on this first simplified use case demonstrates the potential value of Machine Learning, it may be time to move on to step 6: deployment at scale!

In a second article we will come back to these steps to highlight the success factors and pitfalls to be avoided when studying the possibilities of Machine Learning in cybersecurity.

The article Boost your cybersecurity thanks to Machine Learning? Part 1 – « Absolutely, here’s how! » first appeared on RiskInsight.

DLP: how to avoid leaks without having to plug any holes

GEneviEveLardon — Mon, 11 Feb 2019 18:40:56 +0000

Today, more than ever, data protection is one of the major challenges facing companies. Pressure in this area is mounting: increasing legislation (such as the GDPR), new requirements from regulators, rising cyber threats, the challenge of user awareness, and more.

Meanwhile, the ecosystem within which data develops is becoming continually more complex. Indeed, information systems, which are in the full throes of transformation, are opening up to the outside world, becoming interconnected with numerous public cloud services, and creating escape routes for the company’s data.

A diversity of events can result in a data leak: employee negligence, internal fraud, third-party hacking etc. and the routes out are just as varied: email, Shadow IT, USB sticks, printers, etc. When an incident occurs, the consequences can be significant. The media take pleasure in persistently highlighting cases of hacking that have resulted in data leakage from major companies, something that permanently damages corporate reputations. The associated financial losses can also be significant, compounded by regulatory penalties and lost confidence on the part of customers and partners.

Today’s IS: a complex ecosystem that can open many doors to data leaks

DLP, an under-used – but eminently feasible – approach

The major challenge that data leaks represent is not, however, insurmountable. Some companies, including banks, are ahead of other sectors in deploying tools to avoid data leaks, which come under the heading of Data Leak Prevention (or Data Loss Prevention, DLP). These tools enable them to track sensitive data and apply rules that control data flows, in line with defined policies. These rules can be applied at terminal level (workstations, servers, etc.), application level (Office 365, etc.) or network level (proxies, etc.).

Implementing such solutions, however, requires a rigorously-designed project involving both the Information Security Department and the company’s business functions. Three main factors can be used to reduce the complexity involved in this approach:

The issues that need to be addressed, and the corresponding technical solutions, in such a project, will depend on corporate objectives aimed at mitigating the risk of data leakage, as well as the level of current practices and classification methods.

It’s also imperative, when implementing DLP solutions, to preserve the user experience: users should not expect to see their activities impacted by new protection mechanisms. Therefore, security objectives must take into account the needs of the business, which may require sensitive information to be exchanged with the outside world.

The recipe for a successful DLP project

Firstly, selection of the DLP tool should be based on the objectives defined at the start of the project, in terms of the structure of the data to be protected and the channels of exchange to be analyzed.

Some market solutions are highly mature when it comes to detecting whether data is sensitive, regardless of the data structure or transmission channel. The detection of structured data is simpler because it’s easier to characterize (for example, a social security number or a credit card number has a defined number of digits). For unstructured data (which comprises 80% of all data, according to Gartner), detection can be based on the analysis of the metadata introduced during classification.
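As an illustration of why structured data is easier to characterize, the sketch below detects candidate payment card numbers with a pattern on the number of digits, then filters them with the Luhn checksum used by card numbers. It is a simplified example, not how any particular DLP product works; the 13-to-19-digit range and the accepted separators are assumptions.

```python
import re

# Candidate: 13 to 19 digits, optionally separated by single spaces or hyphens.
CANDIDATE = re.compile(r"\b(?:\d[ -]?){12,18}\d\b")


def luhn_valid(number: str) -> bool:
    """Check the Luhn checksum, ignoring any non-digit separators."""
    digits = [int(c) for c in number if c.isdigit()]
    total = 0
    for index, digit in enumerate(reversed(digits)):
        if index % 2 == 1:  # double every second digit from the right
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return total % 10 == 0


def find_card_numbers(text: str) -> list:
    """Return the substrings that look like card numbers and pass the checksum."""
    return [m.group() for m in CANDIDATE.finditer(text) if luhn_valid(m.group())]
```

The checksum step matters: matching on digit count alone would flag order numbers, phone numbers and other long identifiers, adding to the false positives.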

Next, the project should be framed to define and formalize the four essential areas of a DLP project, which are the keys to success in deploying the solution:

Mapping sensitive data and defining the associated protection rules

Where a company has already mapped data and processing activities that are considered sensitive—as well as what it deems legitimate flows—this can serve as a basis for the development of the DLP policies and detailed protection rules during the project.

If such mapping has not been carried out, a DLP project cannot succeed without strong involvement from the business functions. The project team will need to connect with the relevant departments and activities to identify the sensitive data and the associated processing activities. This initial analysis will make it possible to identify the legitimate processing, storage, and transmission channels, both internal and external. Doing it successfully will mean working closely with key contacts from the various departments, who will need to be interviewed to gather the necessary information.

Following this, the project team can create the DLP policies to cover scenarios that represent data leaks.

Feedback from major corporates, however, shows that a key success factor in such projects is knowing how to pick your fights; it’s unrealistic—at least at first—to try to implement all potential DLP policies. Implementing good coverage of the company’s most critical data will already demonstrate a satisfactory level of maturity compared with current norms.

The identification of the legal and regulatory requirements associated with the processes being analyzed

The regulations that apply to sensitive data, such as personal data (for example national information processing laws, the EU’s GDPR, etc.) impose specific limits on the extent that such data can be legitimately processed. Moreover, companies operating in an international context have to comply with local regulatory frameworks, of which each has its own particularities. This results in a diversity of rules to be followed concerning data processing.

When it comes to legal compliance, it’s important to take the advice of the company’s own legal and compliance departments, as well as the various international entities who can approve the analyses and protection rules to be applied to the data.

The main points to be addressed during this regulatory due diligence are the processing of personal data, the notification of users about the processing being carried out, the place that the processed data is stored, and the transfer channels used.

Defining the process for managing data leak incidents

The operational implementation of previously considered DLP scenarios then requires the project team to define the resources and processes that will be set in motion when a data leak is detected. These will, of course, need to be tailored to the company’s incident management processes:

Who will receive the alerts related to potential data leaks (the SOC (if there is one), a dedicated team linked to a business function, etc.)?
What resources are to be put in place during the investigation of an impacted area (for example, in the event of a highly sensitive area being affected, will an inquiry need to maintain a certain level of confidentiality)?
Depending on the level of criticality, which hierarchical and operational levels should be made aware?

Unlike technical security incidents, it may be important to integrate relevant business teams, or the security manager of the part of the business in question, into the process in order to define the criticality of a data leak and its scope. In cases involving structured data, criticality can be evaluated simply, using correspondence tables, but the thinking required is of a completely different nature when unstructured data is involved (for example, an email from a company manager or a document related to a confidential project).
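For structured data, a correspondence table can be as simple as the sketch below, which also shows where the approach stops: anything that is not in the table is handed back to the business for assessment. The data types, criticality levels and recipient names are hypothetical.

```python
# Hypothetical correspondence table: structured data type -> criticality.
CRITICALITY_TABLE = {
    "email_address": "low",
    "social_security_number": "medium",
    "credit_card_number": "high",
}

# Hypothetical escalation path for each criticality level.
ESCALATION = {
    "low": ["soc"],
    "medium": ["soc", "business_security_manager"],
    "high": ["soc", "business_security_manager", "ciso"],
}

LEVELS = ["low", "medium", "high"]


def assess(data_types):
    """Return (criticality, recipients) for the data types found in a leak."""
    known = [CRITICALITY_TABLE[t] for t in data_types if t in CRITICALITY_TABLE]
    if not data_types or len(known) < len(data_types):
        # Unstructured or unknown content: no table applies, so the
        # business has to evaluate the criticality itself.
        return "to_be_assessed", ["soc", "business_security_manager"]
    level = max(known, key=LEVELS.index)
    return level, ESCALATION[level]
```

The highest criticality among the detected data types drives the escalation, so a leak mixing email addresses and card numbers is treated as high.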

Strong sponsorship will also be required to ensure that the objectives and methods implemented under DLP are approved by the various business functions, the HR department, and employee representatives.

Implementing a tool tailored to the scenarios defined

Along with the definition of the incident management process, the supervision model and choice of tools must also be fleshed out. In addition to being able to address the detection scenarios defined, the tool selected will need to comply with certain prerequisites specific to the company’s ecosystem, as well as with the results of the regulatory due diligence performed. The criteria for the choice of technical solution should include the ability to:

Integrate it with SOC tools (SIEM, etc.), and ideally with other enterprise security solutions (proxy, encryption tools/DRM, etc.);
Tailor it to the business environment (collaborative platforms, file servers, etc.);
Take into account the diversity of IT assets and of the information system where an add-on or application has to be deployed.

In addition, the effective implementation of a DLP strategy must, as an imperative, cover all channels of exchange and business use cases, in order not to leave any backdoors open (for example, installing a DLP tool at server, mail, and file levels, while leaving USB ports unprotected).

The four pillars of DLP

Implementing the solution doesn’t mark the end of the interest in data leak prevention: the DLP process must be part of a process of continuous improvement. The study of false positives and alerts should lead to regular reviews (at least every six months) to improve the detection scenarios in use. To do this, it’s good practice to anticipate, right from the beginning of the project, the associated resource requirement from Run teams, and to start with the basic scenarios.
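A periodic review of this kind can start from a very simple indicator, such as the false-positive rate per detection scenario. The sketch below assumes each qualified alert is a record with a `scenario` name and a `verdict`; these field names and the 80% review threshold are assumptions for the example.

```python
from collections import Counter


def false_positive_rates(alerts):
    """Compute the share of false positives for each detection scenario."""
    total = Counter()
    false_positives = Counter()
    for alert in alerts:
        total[alert["scenario"]] += 1
        if alert["verdict"] == "false_positive":
            false_positives[alert["scenario"]] += 1
    return {scenario: false_positives[scenario] / total[scenario] for scenario in total}


def scenarios_to_review(alerts, max_rate=0.8):
    """Scenarios whose false-positive rate exceeds the accepted maximum."""
    rates = false_positive_rates(alerts)
    return sorted(scenario for scenario, rate in rates.items() if rate > max_rate)
```

Scenarios flagged this way are candidates for rule tuning at the next review, which also helps size the effort expected from the Run teams.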

It also makes sense to incorporate the DLP project’s objectives within a larger program to address data protection, including the review of file server rights and permissions, authentication with conditional access, the integration of supervision with the SOC, and the encryption of files and applications.

The article DLP: how to avoid leaks without having to plug any holes first appeared on RiskInsight.

Classification: that essential aspect of data protection

GEneviEveLardon — Sat, 12 May 2018 13:31:39 +0000

Data is the 21st century’s black gold: an observation you won’t be particularly surprised to hear. The fact that it is ever-more exposed (through the increasing use of APIs, of SaaS applications such as Office 365 and Salesforce, of Shadow IT, etc.) and therefore at greater risk, won’t surprise anyone either.

The question is no longer whether data can leak (intentionally or not) and be misappropriated, but rather, how to secure it, and limit the impact when it does leak.

Against a backdrop like this, security models need to evolve. The castle model is now largely outdated, and is being replaced by that of the airport. Data-centric protection then becomes an imperative. And such protection also has to meet the daily needs of those same users who worry about being affected.

Two different types of data… and the different approaches they require

The large data-protection projects launched by major players all face the same problem: how to decide how sensitive a given piece of information actually is. The answer to this question is key: it’s this that determines the relevant level of protection needed to avoid data leakage.

Today, there are two broad types of data:

Structured data, which refers to all information that follows a particular format, and is easily identifiable as such: a CRM field, social security number, official certificates, and email addresses, as well as a host of other data that can be expressed in a clearly defined format (1). Typically, this information is found in the databases of applications.
Unstructured data, which can exist in any format (such as MS Office documents, PDFs, images, videos, music, business application files, etc.). It should be noted that data which, at first glance, might be considered structured (for example, the telephone field of a CRM), may not be so if the format in which the data is entered is not followed strictly.

Structured data can be easily identified, and its sensitivity assessed according to predefined norms. Unstructured data presents a problem of a whole different magnitude, and it is mostly this type of data that employees generate day to day. In concrete terms, this translates into an inability of the relevant security tools (such as Data Loss Prevention, or DLP) to identify a leak or the misappropriation of vital information.
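
To make the idea of a "clearly defined format" concrete, here is a minimal Python sketch that recognizes the three French phone number formats given in the article's footnote. The pattern is an illustration only: it assumes ten-digit national numbers whose second digit is never 0, and it deliberately rejects anything written with spaces or dots.

```python
import re

# Accepts 0123456789, +33123456789 and 0033123456789:
# a prefix (0, +33 or 0033) followed by nine digits, the first of which is 1-9.
FRENCH_PHONE = re.compile(r"(?:0|\+33|0033)[1-9]\d{8}")


def is_french_phone(value: str) -> bool:
    """Return True if the whole string is a phone number in one of the three formats."""
    return FRENCH_PHONE.fullmatch(value) is not None
```

A value such as `01 23 45 67 89`, typed freely into a CRM field, fails this check. That is the situation described above, where data that looks structured stops being so once the expected format is not followed strictly.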

The classification of unstructured data, then, represents the cornerstone of any data protection strategy—and it’s something that has to be done manually by end users.

But what is classification?

“Data classification” means the entirety of the technical and organizational processes used to categorize information produced by the employees of an organization. Following the categories defined – according to levels of sensitivity (for example, internal, confidential, secret, etc.) or by relevant organizational functions (such as HR, R&D, Purchasing, etc.) – classification allows data to be placed within the appropriate regulatory, legislative, or security framework.

Historically very basic (for example, a checkbox in a header or on the first page of a document, or the manual addition of metadata), classification consolidates data, and makes users responsible, by placing them at the center of the process, while, at the same time, offering them an improved experience (a simple interface and clear advice).

In practice, classification tools offer a diverse range of functionality:

For new files, either manual or automatically determined classification according to predefined rules (for example, the presence of a certain number of social security numbers);
For existing files, the manual scanning of files stored in local directories or on premises, according to predefined rules;
The addition of metadata (or tagging) to the file: this metadata, which can be interpreted by third-party tools, unlocks visibility for supervisory tools such as Data Loss Prevention;
The addition of visual marking elements (such as headers, footers, and watermarks) to raise awareness among end users.
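
The rule-based classification and tagging described in this list can be sketched as follows. Everything here is a simplified assumption: the social security number pattern (15 digits starting with 1 or 2) ignores the real number's internal structure and check key, and the threshold and label names are hypothetical.

```python
import re

# Simplified, assumed pattern for a French social security number:
# 15 consecutive digits starting with 1 or 2.
SSN_PATTERN = re.compile(r"\b[12]\d{14}\b")

# Hypothetical rule: this many numbers or more makes the content "Confidential".
SSN_THRESHOLD = 3


def classify(text: str) -> dict:
    """Return the metadata a classification tool might attach to a file."""
    ssn_count = len(SSN_PATTERN.findall(text))
    label = "Confidential" if ssn_count >= SSN_THRESHOLD else "Internal"
    return {"classification": label, "ssn_count": ssn_count}
```

The returned dictionary stands in for the tag written to the file. A supervisory tool such as a DLP solution would then read the `classification` value rather than re-inspect the content.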

The results of classification projects have been inconclusive so far

CISO procedures tend to take into account issues of data classification, and the issue is core to most major corporations’ policies. This imperative is reinforced by recent regulations such as the GDPR or the French Military Programming Act (LPM) which require the mapping of data and uses. However, few organizations, other than banks, have successfully implemented effective classification strategies.

There are several reasons for this gap:

End users are generally not aware of the nature of the sensitive data or its impact. The highest classification levels (“C4”, “Secret”, “Confidential”, etc.) are used for documents likely to put companies, or even entire Groups, at risk; these usually represent about 1% of all information, although the proportion is close to 10% in some companies. Conversely, it is not uncommon for a user to share files containing sensitive personal data, or passwords, without any classification or protection.
Thus, any data-classification project requires strong change-management support for end users. This should use clear messages and concrete examples that allow users to classify information easily. Periodic recaps will also be needed to remind users what constitutes good practice. In fact, a user who handles sensitive data day in, day out may no longer be aware of the impact of this data being compromised.
If they fail to provide users with sufficiently ergonomic approaches, companies cannot expect solid results. Experience shows that checkboxes for classification levels on cover pages, headers, or footers are only rarely selected.
The classification of the entirety of a company’s data is a transformation project in its own right and requires strong commitment from functional and corporate teams if it is to be widely delivered. This commitment must be even greater if the classification strategy that has been defined impacts users (through obligations to classify documents, use encryption, etc.).

Classification takes center stage again

The topic is back in force. Large corporates are driven by digital transformation programs that require data protection to be rethought, and the large players in the market are shaping their offerings around the subject. Some analysts, like Gartner, even foresee the consolidation of data-protection solutions into a single, classification-centric solution.

Awareness and ergonomics will need to be combined, if such approaches are to be successful and end users are to buy into the process. The two will need to work together – hand in glove.

In a future article, we’ll be looking at how the market is evolving for historical security players, and how the implementation of an effective classification strategy can provide a springboard for new impetus in data protection.

(1) A regular expression is a string of characters that corresponds to a specific syntax. For example, a French phone number can have one of three formats: 0123456789, +33123456789 or 0033123456789.

The article Classification: that essential aspect of data protection first appeared on RiskInsight.

The paradox of Data Leak Prevention (DLP) projects: a key issue, mature solutions… but implementation that is still daunting

Ali Fawaz — Thu, 28 Mar 2013 13:14:18 +0000

Evolving threats and regulations are pushing companies to pay ever closer attention to their data and to focus their protection on it. Data leak prevention solutions, or DLP, provide part of the answer. Yet although the need seems real and the solutions mature, feedback from actual deployments remains limited compared with what might be expected.

DLP complements intrusion prevention and access control

An information leak can come from three different sources. The external attacker is often the first that comes to mind. However, experience shows that internal users, whether authorized or not, leak the most information.

Depending on the position of whoever leaks the information, three main steps may follow one another: intrusion, access to the information, and dissemination of the information. Whether each step is needed depends on the initial access of the actor behind the leak. At each of these steps, security solutions exist to reduce the risk.

Action should be taken at every step of an information leak, relying on measures that range from physical security to Digital Rights Management (DRM) solutions, by way of traffic encryption, segmentation, and access and entitlement management.

If such measures are already in place, DLP tools then essentially guard against errors or malicious acts by users who have legitimate access to the information. In this sense, they provide protection as close to the data as possible.

Functionally mature solutions

DLP control mechanisms are implemented through centralized rules or policies that analyze the operations performed on data, whatever its nature or medium.

Using agents deployed on the network and/or on workstations, DLP can prevent a file from being copied to an external device, a sensitive document from being sent by email, a document from being printed, or confidential information from being published on social networks.

Once the DLP solution has analyzed and filtered the data, various preventive measures can be taken, with a greater or lesser impact on the user: alerts, requests for justification, blocking, etc.
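
These graded responses (alert, request for justification, blocking) can be pictured as a small centralized policy table. The sketch below is purely illustrative: the classification labels, the mapping and the decision to ignore internal transfers are all assumptions, and real DLP products expose this through their own policy consoles rather than code.

```python
from enum import Enum


class Action(Enum):
    ALERT = "alert"
    JUSTIFY = "ask_for_justification"
    BLOCK = "block"


# Hypothetical centralized policy: the more sensitive the data,
# the greater the impact on the user.
POLICY = {
    "Internal": Action.ALERT,
    "Confidential": Action.JUSTIFY,
    "Secret": Action.BLOCK,
}


def decide(classification: str, destination_is_external: bool):
    """Return the Action to take, or None when no measure applies.

    Assumption: transfers that stay inside the company are not controlled,
    and unknown classifications fall back to a simple alert.
    """
    if not destination_is_external:
        return None
    return POLICY.get(classification, Action.ALERT)
```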

Finally, it should be noted that market players are placing more and more emphasis on the context in which data is used. Some vendors offer governance features within their DLP solution that make it possible, for example, to know exactly where sensitive data resides and who has access to it.

The DLP market is therefore increasingly mature: the functional coverage on offer is broad and evolving, and the management of the impact on employees is increasingly flexible. Nevertheless, feedback from actual deployments remains limited compared with what might be expected.

The reason for this paradox is that business functions are too often insufficiently involved in DLP projects, even though these projects have little chance of succeeding without them, particularly given the HR dimension that necessarily comes with them.

Adopting a results-based approach to mobilize the business

It is unrealistic to try to protect all of one’s data in every conceivable use case. A purely technical approach aiming at exhaustive coverage has little chance of convincing anyone, particularly in the current economic climate.

A results-based approach combining precise targeting, a tool-supported method, support and visibility should therefore be favored from the solution selection stage onwards. Once the objectives have been met on a priority scope, that scope can be widened.

The first and essential step is therefore to define the priority scope of data to protect and the functional use cases to address. Identifying the ten most critical pieces of data, relying on proven functional situations, and starting with a limited number of media to reduce technical uncertainties are all key success factors.

The definition of monitoring processes (policies for interacting with users, processes to follow when an alert is raised, etc.) must not be neglected either. On this front, and from the very start of the project, it is important to involve the company’s HR functions to validate how the DLP approach will be implemented (alerting, blocking, logging, etc.), to build the associated governance processes and, ultimately, to plan a presentation to the employee representative bodies.

Once the overall framing of the functional scope is actually complete, the solution selection phase can begin. A tool-supported approach involving a proof of concept is essential to make sure that the solution fits the functional use cases identified and to assess the results that can be expected.

If the results are satisfactory, a gradual rollout should be planned with one leitmotif: user awareness.

Finally, in steady-state operation, integration with a SOC (Security Operations Center) makes it possible to benefit from the maturity of operational security management, to optimize monitoring on the one hand and the support and visibility provided to the business on the other.

The article The paradox of Data Leak Prevention (DLP) projects: a key issue, mature solutions… but implementation that is still daunting first appeared on RiskInsight.