Question

What are your Tips and Tricks for Anonymizing Data?


Badge +11

While working with customers and sensitive data, we on the Safe Software Support Team often run into roadblocks when troubleshooting issues because FME Workspaces and source data cannot be shared. (For the sake of argument, assume that neither an NDA nor a screen share can be arranged in a timely manner.)

There is an excellent Knowledge Center article that takes a real-world example and provides a tutorial covering many actions users can take to anonymize sensitive data.

In particular, it uses the following techniques:

- Removing Attributes (AttributeRemover, AttributeManager)
- Renaming Attributes (AttributeRenamer, AttributeManager)
- Replacing Attribute Values (AttributeManager, StringReplacer)
- Geographic Manipulation (CenterPointReplacer, Offsetter)
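
For anyone who finds it easier to think about these four techniques outside a workspace, here is a rough plain-Python sketch. It does not use the FME API; the function names, attribute names, and the random-offset approach are assumptions made purely for illustration.

```python
import random


def anonymize_attributes(attrs, drop=(), rename=None, replace=None):
    """Return a copy of attrs with attributes removed, replaced and renamed."""
    rename = rename or {}
    replace = replace or {}
    out = {}
    for name, value in attrs.items():
        if name in drop:
            continue  # removing attributes
        if name in replace:
            value = replace[name]  # replacing attribute values
        out[rename.get(name, name)] = value  # renaming attributes
    return out


def offset_point(x, y, max_offset, rng):
    """Geographic manipulation: shift a point by a random amount on each axis."""
    dx = rng.uniform(-max_offset, max_offset)
    dy = rng.uniform(-max_offset, max_offset)
    return x + dx, y + dy


# Hypothetical record, invented for this example.
record = {"owner": "Jane Doe", "parcel_id": 42, "zone": "R1"}
cleaned = anonymize_attributes(
    record,
    drop={"owner"},
    rename={"parcel_id": "id"},
    replace={"zone": "X"},
)
moved = offset_point(100.0, 200.0, 5.0, random.Random(0))
```

A small random offset on its own may not be enough if the surrounding features still make the location recognizable, so treat it as one step among several.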

 

What actions do you take, or what do you think is missing from this list? Are there any custom transformers that could help?

And is it possible that a custom transformer could be created to aid in automation of this practice?

Looking forward to any input or comments that you provide! Thank you.

[ keywords : anonymous data random values randomization scramble scrambling ]


10 replies

Badge +11
Another reason source data can't be easily shared is that it is in a database. Here I would use a Sampler followed by a Recorder to create an FFS file that can be shared (after anonymizing the data, of course!).
Badge +16

You can obfuscate geocode locations to block-face centroids for privacy purposes.

For example, let's assume criminal activity occurred here.

You can see the link returns an AddRange attribute, in this case 7343-7599.

We want the geometry to use the range median, 7471, so we take the XY from there.

Lastly, we make sure to get the address label without the house number.

Voilà: coordinates and values that protect privacy.
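
The arithmetic in the steps above can be sketched in a few lines of Python. This is only an illustration: the helper names and the sample street label are made up, the "low-high" format is assumed from the AddRange value shown above, and the geocoding step that turns the median address into an XY is not included.

```python
def range_median(add_range):
    """Midpoint of an address range given as 'low-high', e.g. '7343-7599'."""
    low, high = (int(part) for part in add_range.split("-"))
    return (low + high) // 2


def strip_house_number(label):
    """Drop a leading house number from an address label, if there is one."""
    parts = label.split(maxsplit=1)
    if len(parts) == 2 and parts[0].isdigit():
        return parts[1]
    return label


median = range_median("7343-7599")
# "7400 Main St" is a hypothetical label, not taken from the example above.
street_only = strip_house_number("7400 Main St")
```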

Badge +22

If the issue is with geometry, especially relative to other features, you could use a CommonLocalReprojector and then a CoordinateSystemRemover.
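
The idea is that relative positions are kept while the absolute location is discarded. A rough plain-Python sketch of that idea follows; it assumes planar (projected) coordinates and a simple shift to the bounding-box centre, the function name is hypothetical, and it is not a description of how either transformer works internally.

```python
def to_local_coordinates(points):
    """Shift points so the centre of their bounding box becomes (0, 0)."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    cx = (min(xs) + max(xs)) / 2
    cy = (min(ys) + max(ys)) / 2
    # Distances and directions between points are unchanged by the shift.
    return [(x - cx, y - cy) for x, y in points]


# Hypothetical projected coordinates, invented for this example.
original = [(500000.0, 5400000.0), (500010.0, 5400020.0)]
local = to_local_coordinates(original)
```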

Badge +22

Personally, I create a new workspace whose only purpose is to illustrate the error. This is generally accomplished with Creators and RandomNumberGenerators, though I do have some custom transformers that generate datasets, including random distributions of points and complex lists.

Attached is an example of a workspace that demonstrates a bug in the Tester.

testerlisthasvalue.fmw
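
For comparison, the same "generate fake data that reproduces the problem" idea looks roughly like this in plain Python. This is a sketch only: the function name, the attribute names, and the value range are assumptions, and it does not reproduce what the custom transformers mentioned above do.

```python
import random


def random_points(count, xmin, ymin, xmax, ymax, seed=None):
    """Generate synthetic point records inside a bounding box."""
    rng = random.Random(seed)  # a fixed seed makes the test case repeatable
    return [
        {
            "id": i,
            "x": rng.uniform(xmin, xmax),
            "y": rng.uniform(ymin, ymax),
            "value": rng.randint(0, 100),
        }
        for i in range(count)
    ]


sample = random_points(5, 0.0, 0.0, 10.0, 10.0, seed=1)
```

Using a fixed seed means whoever receives the test case sees exactly the same features as the person reporting the problem.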

Userlevel 5
Badge +25

I try to use the same approach as @jdh: recreate the issue with Creator, 2DGridCreator, etc.

Badge +22


PS. @MarkAtSafe, This bug has a case number C125236 but I don't think I ever got any follow up on it.

 

Badge +2

Here's a nice presentation from Heidi Lee at the RCMP on Anonymizing Property Crime using FME

Userlevel 4
Badge +25

Personally, I recommend the AttributeCompressor. That way you can compress/encrypt values to anonymize them and still get the data back by decrypting. Just make sure you remember the password!

Badge +8

I would simply ask for the log file and screenshots. If the customer does not want to share the workspace, I call that hitting a "support" wall. In some cases, depending on the situation, I would need to visit them at their office to help.

Userlevel 1
Badge +4

There are some custom transformers on the Hub that can help a bit.

* If you are handling personal information, you can get random user data from https://hub.safe.com/transformers/randomusercreator

* In rare cases you could even use fake geometries from https://hub.safe.com/transformers/fakecountrycreator
