I am building a data quality control layer and currently checking data formatting.
As part of the format checking, I would like to ensure that the data contains no uncommon characters, which I take to mean anything outside the standard ASCII character set (code points 0-127).
Efficiency matters here: this is a very small part of a very large processing "layer", and it will likely involve checking every character in a potentially very large dataset.
I'm guessing a regex would be ideal, but I can't figure out how to write one without explicitly listing every single strange character, and I'm unfamiliar with regex as it is...
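To make the requirement concrete, here is a naive character-by-character sketch of the check I have in mind. Python is only an assumption for illustration, and the names `is_plain_ascii` and `find_non_ascii` are hypothetical; I suspect this is too slow for very large inputs, which is why I'm asking about alternatives.

```python
from typing import List, Tuple


def is_plain_ascii(text: str) -> bool:
    """Return True if every character has a code point in 0-127."""
    # Naive approach: look at each character's code point one at a time.
    return all(ord(ch) < 128 for ch in text)


def find_non_ascii(text: str) -> List[Tuple[int, str]]:
    """Return (index, character) pairs for every character outside 0-127."""
    # Useful for reporting which characters failed the check and where.
    return [(i, ch) for i, ch in enumerate(text) if ord(ch) >= 128]
```

The first function answers the yes/no question for a single field; the second would let the quality control layer report the offending characters and their positions.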
Any recommendations?
Thanks!