
Perforce Compliance Lead: Don't You Forget About Non-Production Data
www.forbes.com
02 September 2022, Berlin: Resting loungers at Vabali Spa Berlin are seen before opening. Photo: Christoph Soeder/dpa/picture alliance via Getty Images

Systems go live. We talk about the stage when software application development projects push to live production status as if it were some kind of tangible assembly line in a factory somewhere, which in virtual terms it kind of really is. Enterprise applications running in live production ingest their raw ingredients (data), process the information streams through various stages of manufacture (normalization, analytics, AI engines and so on) and then churn out a finished product, which is often neatly packaged up for the user interface through various levels of abstraction to make consumption easier.

What Is Non-Production Data?

Because there's so much focus on live production systems, we sometimes forget about the data that exists in non-production environments. These information resources may include testing and prototyping datasets; they might encompass unstructured information that sits in the murky waters of the so-called data lake; they may come from source data repositories that are not part of working applications yet still form an essential information aquifer feeding IT services higher up; and they could include data belonging to recently decommissioned applications, or to those that are sunsetting and about to earn their retirement in the annals of legacy software.

From a data security, privacy and information management standpoint, these seemingly unloved piles of information still matter for compliance and governance, so how should we get our house in order?

According to David Wells, head of product for data compliance at enterprise DevOps solutions company Perforce, non-production data may sprawl from place to place with little visibility or control over what's happening. "Imagine a source dataset and a target dataset, where the source contains unaltered data and the target has been transformed into a more secure version of the former," explained Wells. "Data engineers will typically alter actual data using techniques such as anonymization, bucketization (replacing a range of values with a single one), differential privacy or altering data outliers. To some extent, this prevents linkage attacks that might allow someone to connect data in the target set to the source dataset, but the risk is not entirely eliminated."

Aren't there quick fixes for this kind of situation out there by now? Sure, he agrees: the use of synthetic data is one solution, but even though it is randomly generated, it usually still maintains the aggregate patterns of the original dataset. All of which means that a malicious attack can still be mounted by successfully re-identifying some of the source data if it is also publicly available (such as a list of home addresses).

The Developer Challenge

Developers working with non-production data face similar obstacles, said Perforce's Wells. Datasets typically need linkages between similar pieces of data. So when using test data, replacing the last two digits of a five-digit zip code without changing the address could cause an address validation function to fail, or altering a car's vehicle identification number might turn a Subaru into a Cadillac.
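To make the trade-off concrete, here is a minimal Python sketch (purely illustrative, not Perforce or Delphix code, and using made-up field names) of two of the techniques Wells mentions: bucketization of an age value and a deterministic, salted replacement of a zip code, so that records which shared a zip before masking still share one afterwards.

```python
import hashlib

def bucketize_age(age: int, width: int = 10) -> str:
    """Replace an exact age with its bucket, e.g. 37 -> '30-39'."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"

def mask_zip(zip_code: str, salt: str = "per-project-salt") -> str:
    """Deterministically replace a five-digit zip: the same input always
    yields the same masked output, so linkages between records survive."""
    digest = hashlib.sha256((salt + zip_code).encode()).hexdigest()
    return f"{int(digest, 16) % 100000:05d}"

record = {"name": "Jane Doe", "age": 37, "zip": "94105"}
masked = {
    "name": "REDACTED",
    "age": bucketize_age(record["age"]),
    "zip": mask_zip(record["zip"]),
}
print(masked)  # every source record with zip 94105 gets the same masked zip
```

A production-grade masking tool would go further, for instance keeping the masked zip geographically consistent with a masked street address so that validation routines still pass; the point of the sketch is simply that deterministic replacement preserves the linkages that naive random edits destroy.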
Furthermore, the problem grows as more pieces of data are inter-related (such as age and date of birth).

In addition, we must also remember that actual (i.e. live production) data often poses permutations that were not considered by software developers or data engineers at any level, especially when new software version releases are put into play, because actual data has patterns that will stress and strain processing in unexpected ways. When that data moves from a state of production to non-production, there are architectural implications for information compliance and safety that, really, we should have been thinking about in the pre-production stage.

Data patterns also matter to testers, and there is often a need to be far more realistic than many data processing toolsets will allow. All of this means that if and when any of these information streams ends up languishing as non-production data, due care and consideration are still required.

The Bigger They Are

"Furthermore, the greater the scale, the harder the situation becomes," clarified Wells. "Test data needs to be transformed or generated consistently to maintain referential integrity. While managing this may be relatively achievable on a small scale, as soon as larger apps are involved, it becomes increasingly expensive, complex and time-consuming. This gets exacerbated as systems begin having different representations for the same domain of data (for example, system A puts a person's name in full in one field and system B uses the first and last name separately). Yet, for security to be effective, teams must address a minimum of 18-20 sensitive customer fields."

So, what's the answer, he asks? First, we clearly need to take a more proactive rather than reactive or defensive approach to protecting data in non-production environments, so that we move to a mindset that focuses on prevention rather than cure.

"Techniques like static data masking can automatically discover sensitive data and replace it with fictitious, production-like, fully functional data with consistent referentiality across development, testing and analytics teams," highlighted Wells. "This addresses challenges in protecting both test data and analytic data by permanently anonymizing sensitive information in non-production environments. This ensures compliance with privacy regulations while maintaining data usability for testing or analytics."

The Perforce compliance lead offers an example: take a bank that wants to provide anonymized, production-like data for analytics without violating GDPR. Using a masking tool like Delphix, it can:

- Automate sensitive data discovery across Oracle and SQL databases.
- Apply masking policies to anonymize personally identifiable information, ensuring referential integrity between testing and analytics environments.
- Use the product's hyperscale architecture to deliver masked datasets within hours rather than days, reducing development cycle times while ensuring compliance.

Masking can also help tackle scale, covering sensitive data across complex, multi-terabyte datasets through automated discovery. Multi-threaded processing helps mask large datasets rapidly without impacting downstream operations, while support for heterogeneous environments (database, files, cloud and on-premises locations) provides comprehensive, consistent coverage.

"Beyond masking, other data protective measures include data loss prevention, data encryption and strict access control. However, these all have their pros and cons," cautioned Wells.
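To give a flavour of what the discover-then-mask idea looks like in practice, the sketch below is an illustrative Python fragment, not the Delphix API: a crude name-based rule stands in for automated sensitive-data discovery, and the same deterministic mask is applied whether a name arrives as one full-name field (system A) or as separate first and last names recombined (system B), which is what keeps the two environments referentially consistent.

```python
import hashlib
import re

SENSITIVE = re.compile(r"name|email|ssn|phone", re.IGNORECASE)  # naive discovery rule

def pseudonym(value: str, salt: str = "demo-salt") -> str:
    """Deterministic replacement via a salted hash: the same real value
    always maps to the same fictitious token."""
    digest = hashlib.sha256((salt + value.strip().lower()).encode()).hexdigest()
    return "person_" + digest[:8]

def mask_row(row: dict) -> dict:
    """Mask every column whose name matches the discovery rule."""
    return {k: pseudonym(v) if SENSITIVE.search(k) else v for k, v in row.items()}

# System A stores the customer's name in a single field...
system_a = {"customer_name": "Ada Lovelace", "balance": 1200}
# ...while system B splits it, so we mask the recombined value for consistency.
system_b = {"first_name": "Ada", "last_name": "Lovelace", "region": "EU"}
system_b_view = {"customer_name": f"{system_b['first_name']} {system_b['last_name']}",
                 "region": system_b["region"]}

print(mask_row(system_a)["customer_name"] == mask_row(system_b_view)["customer_name"])  # True
```

A real platform would cover the 18-20 sensitive fields Wells cites, run multi-threaded over multi-terabyte datasets and reach into files and cloud stores as well as databases, but the underlying principle is the same: discover, then replace deterministically so that downstream joins and tests keep working.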
Audits, similarly, complement prevention, but vulnerabilities may not be found until after a problem has been created. In reality, non-production data protection strategies may use a combination of techniques and tools. For example, an organisation might use data loss prevention together with strict access control, while adding masking ensures that sensitive data is irreversibly masked and cannot be re-identified.

Cultural Shift To Security-First

Listening to the Perforce compliance team talk about the need to lock down not just data, but all data, other data and, oh, those bits of data too, it becomes clear that we need to move towards a security-first culture where data protection is baked in from the start of non-production processes.

We talk about zero-trust design and we extol the virtues of baked-in security provisioning for today's increasingly mission-critical and life-critical data resources and repositories, but there is obviously a more rounded data lifecycle going on inside enterprises across all verticals, one that demands data management practices that are both holistic and rather more agnostic in their application.