The power of synthetic data

Regulations on (personal) data are becoming increasingly strict, and rightly so. Yet tightening these rules also creates challenges, because for many companies data is essential for responding to the needs of the market. A solution has been developed for precisely this challenge: synthetic data. Discover the power of synthetic data and the model that has been developed in this blog.

Data security and data usage have recently become important considerations, not only for companies but also for the individuals whose data is being protected. Every day, we produce around 2.5 quintillion bytes of data (Forbes, 2018), be it social media posts, tweets, transactions, likes or web searches. All of this data is invaluable to companies: they use it to build and understand customer profiles, spot trends, identify opportunities, tailor products and services, and even anticipate events to capitalise on. However, the same data can also be used to exploit, influence and abuse people. This is why we need regulations such as GDPR, which govern how companies and individuals gather and use data and hold them accountable for doing so.

Right of access

GDPR replaced an earlier EU directive with a regulation, the first of its kind in the European Union. Under this regulation, personal data, or PII (personally identifiable information), is protected by restricting how it may be processed and used. The regulation protects end consumers and empowers them to choose what happens with their data. Individuals have the right to find out how companies are using their data and for what purpose; this is known as the 'right of access'. Individuals can also decide whether companies may use their data for different purposes, and if they withdraw their consent, companies generally have to delete the data they hold about them (the 'right to erasure').

High fines

Another feature of GDPR concerns how data is used: it prohibits companies from using data for anything other than specific purposes inherent to their business models, and companies need to be able to state what data they collect and why. So if a company uses production data for testing, this could amount to unlawful processing, especially if that purpose was not explicitly stated when consent was obtained from the individual. There are, of course, ways to avoid incurring high fines, and one of them is to use pseudonymised or masked data. GDPR encourages pseudonymisation as a safeguard, but pseudonymised data still counts as personal data, and there is still a risk of a data breach. Even better is the use of anonymised data, which falls outside the scope of GDPR, although this data comes with risks as well. Anonymised data should not be traceable to a specific individual, but recent research has shown that individuals can often still be re-identified from anonymised datasets, which leaves this strategy vulnerable to adversarial attacks (Nature Communications, 2019).
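The gap between masking direct identifiers and true anonymisation can be illustrated with a small sketch. Assuming hypothetical customer records and a hypothetical secret key, the Python code below pseudonymises names with a keyed hash (HMAC-SHA256) and then counts how many records are still unique on a few quasi-identifiers (postcode, birth year, gender). Unique records are exactly the ones a linkage attack can single out, even though no name is visible. This is a toy illustration of the general risk, not the method used in the cited study.

```python
import hashlib
import hmac
from collections import Counter

# Hypothetical secret key; in practice it would be stored separately from the data
SECRET_KEY = b"replace-with-a-secret-key"


def pseudonymise(identifier: str, key: bytes = SECRET_KEY) -> str:
    """Replace a direct identifier with a keyed hash (HMAC-SHA256).

    The same input always maps to the same token, so records can still be
    joined, but linking a token back to a name requires the key.
    """
    return hmac.new(key, identifier.encode("utf-8"), hashlib.sha256).hexdigest()[:16]


def unique_on(records, quasi_identifiers):
    """Return the records whose combination of quasi-identifiers occurs only once."""
    keys = [tuple(r[q] for q in quasi_identifiers) for r in records]
    counts = Counter(keys)
    return [r for r, k in zip(records, keys) if counts[k] == 1]


# Hypothetical toy records
customers = [
    {"name": "Alice", "postcode": "1011", "birth_year": 1985, "gender": "F"},
    {"name": "Bob", "postcode": "1011", "birth_year": 1985, "gender": "M"},
    {"name": "Carol", "postcode": "3511", "birth_year": 1990, "gender": "F"},
    {"name": "Dave", "postcode": "1011", "birth_year": 1985, "gender": "M"},
]

# Mask the direct identifier but keep the other columns unchanged
pseudonymised = [{**c, "name": pseudonymise(c["name"])} for c in customers]

# Records that can still be singled out despite the masked names
at_risk = unique_on(pseudonymised, ["postcode", "birth_year", "gender"])
print(f"{len(at_risk)} of {len(pseudonymised)} records are unique on postcode, birth year and gender")
```

Running it prints `2 of 4 records are unique on postcode, birth year and gender`: Alice and Carol can be singled out by anyone who knows those three attributes about them, while Bob and Dave hide behind each other.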

Synthetic data

This is where synthetic data shines. Synthetic data looks and feels just like real data, reproducing the characteristics and relationships present in the real data. Sogeti's Testing^AI team has developed a new solution that creates synthetic data with AI: ADA, the Artificial Data Amplifier. ADA uses advanced neural networks to generate synthetic data that can be used in place of real data. ADA is not a generic data management tool; it is a custom solution that needs to be trained on real data. Typically, ADA extracts a dataset used in an application, environment or report, generates synthetic data from it and pushes that data back into your databases.

The advantages of using synthetic rather than real data are twofold. First, an entire dataset that looks and feels like your real data, but does not expose real individuals if it is breached, is valuable for companies that operate in highly regulated industries. Second, the solution is scalable: from a small sample of real data, we can generate as much data as needed. This means we can create enough test data that is, once again, GDPR compliant because it is purely synthetic. To learn more about ADA or find out how you can implement synthetic data, contact the Testing-AI team!
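To make "trained on real data" and "generate as much data as needed from a small sample" more concrete, here is a deliberately simple sketch. It does not reproduce ADA, which is described above as using neural networks; instead it swaps in a much simpler technique: fitting a multivariate normal distribution to numeric columns and sampling new rows from it. The data, column meanings and function names (`fit_gaussian`, `sample_synthetic`) are hypothetical. The point is that the synthetic rows preserve the relationship between the columns without copying any real record.

```python
import numpy as np


def fit_gaussian(real: np.ndarray):
    """Estimate the mean vector and covariance matrix of numeric real data."""
    return real.mean(axis=0), np.cov(real, rowvar=False)


def sample_synthetic(mean, cov, n_rows: int, seed: int = 0) -> np.ndarray:
    """Draw n_rows synthetic rows from the fitted multivariate normal."""
    rng = np.random.default_rng(seed)
    return rng.multivariate_normal(mean, cov, size=n_rows)


# Hypothetical "real" sample: 200 customers with age and monthly spend,
# where spend rises with age (the relationship we want to preserve)
rng = np.random.default_rng(42)
age = rng.normal(45, 12, size=200)
spend = 20 + 3 * age + rng.normal(0, 15, size=200)
real = np.column_stack([age, spend])

# "Train" on the small sample, then generate 50x more rows than it contains
mean, cov = fit_gaussian(real)
synthetic = sample_synthetic(mean, cov, n_rows=10_000)

print("real correlation:     ", round(np.corrcoef(real, rowvar=False)[0, 1], 2))
print("synthetic correlation:", round(np.corrcoef(synthetic, rowvar=False)[0, 1], 2))
```

Both printed correlations should come out close to each other. A Gaussian model only captures linear relationships and handles categorical columns poorly, which is one reason more capable generators use neural networks. A generator that overfits can also reproduce real records, so it is worth checking synthetic output for leakage before relying on it.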

Want to know more?

Curious about the possibilities of synthetic data? Get in touch with us!