The number of applications businesses develop every year is exploding. It’s no longer enough to provide excellent customer service in person. Your customers expect the same and more from your website, your mobile apps, and the third-party platforms you integrate with. Digital transformation is a challenge many companies try to tackle at scale, with stellar tech stacks, state-of-the-art AI, and offshore development teams. However, the lack of high-quality, privacy-safe test data often undermines both the quality of the results and overall security. Here is what we have found while working with large enterprises and mid-sized companies around the world.
Privacy is only an afterthought, if it even comes up
Time and again, we witness reputable companies using radioactive production data to test software applications or, worse, to train and test AI systems. The implications of such practices are manifold. The Norwegian Data Protection Authority issued a hefty fine to the Norwegian Confederation of Sport for using personal data to test a cloud computing solution. Alongside the fine, the authority recommended using synthetic data for testing instead of violating the General Data Protection Regulation (GDPR) and endangering customers’ privacy. Educating test engineers about the dangers of using production data for testing is a must, and so is providing them with tools to generate meaningful test data.
Enterprises struggle to provide useful test data to their developers
In industries where sensitive data is handled daily, privacy maturity is already at a higher level. However, heavily anonymized datasets (for example, hundreds of transactions all masked down to one cent) do not provide the level of realism necessary for testing. For QA engineers, getting their hands on the data they actually need is a constant and costly struggle. Development cycles and time-to-market lengthen while testers, through no fault of their own, fail to deliver robust systems. To make matters worse, legacy anonymization techniques like pseudonymization and data masking fail to protect against re-identification. New tools, like synthetic data generators, offer the only GDPR-compliant and meaningful way of delivering suitable test data for application and AI development.
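To make this concrete, here is a minimal sketch, using plain pandas on a toy transactions table, of how legacy anonymization can go wrong. The table, column names, and masking rules are hypothetical and chosen purely for illustration: over-masking flattens every amount to one cent and destroys realism, while pseudonymization hides the customer ID but leaves quasi-identifiers that can still be joined against external data.

```python
import pandas as pd

# Toy production-like table (all values invented for illustration).
transactions = pd.DataFrame({
    "customer_id": ["C001", "C002", "C003"],
    "zip_code":    ["1010", "8020", "1010"],
    "birth_year":  [1984, 1991, 1984],
    "amount_eur":  [1250.00, 49.99, 87.30],
})

# Failure mode 1: over-masking. Every amount becomes a placeholder value,
# so QA can no longer test rounding, limits, or fraud thresholds realistically.
over_masked = transactions.assign(amount_eur=0.01)

# Failure mode 2: pseudonymization. The direct identifier is replaced,
# but zip_code and birth_year remain as quasi-identifiers that can be
# joined with external datasets to re-identify individuals.
pseudonymized = transactions.assign(
    customer_id=transactions["customer_id"].map(
        lambda c: f"PSEUDO-{abs(hash(c)) % 10_000:04d}"
    )
)

print(over_masked)
print(pseudonymized)
```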
Synthetic data for testing and beyond
Synthetic data generators are becoming the go-to privacy-enhancing tools for enterprises worldwide. The resulting synthetic datasets are modeled on production data and highly realistic, yet contain none of the original data points. Synthetic data generators offer features that are extremely useful for testing, such as subsetting huge datasets into smaller but still representative, more manageable ones, or generating entire databases with referential integrity intact. Synthetic data can also be flexibly augmented to include data not present in the original and generated on demand for specific test cases. What’s more, the resulting data is no longer personal data and, as such, falls out of scope for GDPR. Sharing highly realistic synthetic test data with offshore development teams might be the missing piece of the puzzle engineers have been waiting for to deliver robust products in record time. Learn more about synthetic test data, and your dev team will thank you!
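As an illustration of the subsetting feature mentioned above, here is a minimal, hypothetical sketch that samples a parent table and keeps only the child rows referencing the sampled parents, so referential integrity is preserved. The tables and column names are invented for this example; a real synthetic data generator would additionally replace the values themselves with synthetic ones rather than reuse originals.

```python
import pandas as pd

# Toy relational dataset: customers (parent) and orders (child, with a
# customer_id foreign key). All values are generated for illustration.
customers = pd.DataFrame({
    "customer_id": range(1, 1001),
    "segment": ["retail" if i % 2 else "business" for i in range(1, 1001)],
})
orders = pd.DataFrame({
    "order_id": range(1, 5001),
    "customer_id": [(i * 7) % 1000 + 1 for i in range(5000)],
    "amount_eur": [round((i * 13.37) % 500, 2) for i in range(5000)],
})

# 1) Draw a 5% sample of the parent table.
customer_subset = customers.sample(frac=0.05, random_state=42)

# 2) Keep only child rows whose foreign key points at a sampled parent,
#    so every order in the subset still references an existing customer.
order_subset = orders[orders["customer_id"].isin(customer_subset["customer_id"])]

# Referential integrity holds in the subset.
assert order_subset["customer_id"].isin(customer_subset["customer_id"]).all()
print(f"{len(customer_subset)} customers, {len(order_subset)} orders in the subset")
```

The key design choice is to sample the parent table first and filter child tables by foreign key, rather than sampling each table independently, which would break foreign-key relationships.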
For more information on MOSTLY AI, you can visit their website.
You can also catch up on our recent podcast: Driving fairness in AI using synthetic data - A Conversation with Alexandra Ebert, Chief Trust Officer, MOSTLY AI