Differential privacy in healthcare: The promise of synthetic data

It is, as its name suggests, artificially generated. It is most often created by funnelling real-world data through a noise-adding algorithm to construct a new data set. The resulting data set captures the statistical features of the original information without being a giveaway replica. Its usefulness hinges on a principle known as differential privacy: that anybody mining synthetic data could make the same statistical inferences as they would from the true data — without being able to identify individual contributions.

(From "The promise of synthetic data", Financial Times)
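The noise-adding idea the excerpt describes can be sketched with the classic Laplace mechanism, the textbook way to make a simple count query differentially private. This is a minimal illustration, not a production implementation; the `dp_count` function, its parameters, and the toy records are my own assumptions for the sake of the example.

```python
import math
import random

def dp_count(records, predicate, epsilon):
    """Return an epsilon-differentially-private count of matching records.

    A count query has sensitivity 1 (adding or removing one person's
    record changes the count by at most 1), so adding Laplace noise with
    scale 1/epsilon satisfies epsilon-differential privacy.
    NOTE: illustrative sketch only -- real deployments use vetted libraries.
    """
    true_count = sum(1 for r in records if predicate(r))
    # Inverse-CDF sampling from a Laplace(0, 1/epsilon) distribution.
    u = random.random() - 0.5
    noise = -(1.0 / epsilon) * math.copysign(1.0, u) * math.log(1 - 2 * abs(u))
    return true_count + noise

# Hypothetical example: count patients over 65 without exposing any one record.
patients = [{"age": a} for a in range(40, 90)]
noisy = dp_count(patients, lambda p: p["age"] > 65, epsilon=1.0)
```

The point of the sketch is the trade-off the article alludes to: a smaller `epsilon` adds more noise (stronger privacy, less accuracy), and an analyst sees only the noisy aggregate, never any individual contribution.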

Can this be used in healthcare? The idea sounds tempting, but I am hard-pressed to find a use-case scenario. We all work on a basic assumption: that the data is likely to be "structured", or would yield some insights if and when coaxed out through analysis. The field is seeing fervent activity, with an "open challenge" thrown in for good measure.

I saw differential privacy mentioned in one of Apple's keynotes, and at the time that was enough for me to dismiss it as marketing snake oil. Still, the idea is extremely appealing: will it stand up to the scrutiny of healthcare regulators? I'll be exploring it in detail later.

The central idea is to preserve the analysis while getting rid of the "discrimination" and bias. However, the field is so fluid that we are as good as drawing lines in the sand. It reflects a utopian "vision" and still requires considerable study before its applicability can be generalised.