Hi, I have a huge data table where some cells have codes (e.g. ND, TR, TC, XX, etc.). This table gets updated all the time, and I read this data, and I want to get rid of these codes and just replace with zero. I have no idea how many codes are out there (a code may not be in the current table but may appear in an updated version). So, I just want to select any cell with a string and replace it zero. How can I do this? Maybe a conditional statement similar to the ISTEXT formula in Excel? Or a wildcard (*)? Thank you :-)
You can use the stringReplacer. Set the mode to Regular Expression and the text to replace to '.*'
Thank you, I tried that but it didn't replace the codes (e.g. the TR code here).
I'm just using Replacement Text TEST at the moment for testing purposes, but I actually want to make these cells zero, is that possible (as 0 is not a replacement 'text' but a number) ?
Thank you, I tried that but it didn't replace the codes (e.g. the TR code here).
I'm just using Replacement Text TEST at the moment for testing purposes, but I actually want to make these cells zero, is that possible (as 0 is not a replacement 'text' but a number) ?
The single quotes around the regex are not part of the regex expression. It will also match anything, so if you have numbers they will also be replaced with zeros.
The single quotes around the regex are not part of the regex expression. It will also match anything, so if you have numbers they will also be replaced with zeros.
Hmmmm.... I did try without the single quotes. Didn't work either.
Well, you could use a simple AttributeManager and set everything to 0. Maybe even with a conditional statement (if ColName [Attribute has a Value] then ColName=0).
Or a NullAttributeMapper could be used to do the same sort of thing.
But do you mean if there is a value that is not a code, it needs to be kept?
The single quotes around the regex are not part of the regex expression. It will also match anything, so if you have numbers they will also be replaced with zeros.
@mary If you can attach a small sample dataset and your simplified workspace that will help us narrow down why this isn't working for you. Make sure you have selected "Replace Regular Expression and the .* with no quotes as suggested by @ebygomm should work for you.
Well, you could use a simple AttributeManager and set everything to 0. Maybe even with a conditional statement (if ColName [Attribute has a Value] then ColName=0).
Or a NullAttributeMapper could be used to do the same sort of thing.
But do you mean if there is a value that is not a code, it needs to be kept?
Hi, yes I also have values that are not codes that need to be kept. I have already set all blanks to 0, and removed all < signs. I still have some codes left that I need to remove (set to 0). And then there's the actual values I need to be kept. I'll try and get a small dataset together.
The single quotes around the regex are not part of the regex expression. It will also match anything, so if you have numbers they will also be replaced with zeros.
I want to keep the numbers. I just have codes that I want to remove, so my final dataset will have all numbers.
Hi, yes I also have values that are not codes that need to be kept. I have already set all blanks to 0, and removed all < signs. I still have some codes left that I need to remove (set to 0). And then there's the actual values I need to be kept. I'll try and get a small dataset together.
Or is there a transformer that shows numeric values only which will also help me get rid of the strings.
Hi, yes I also have values that are not codes that need to be kept. I have already set all blanks to 0, and removed all < signs. I still have some codes left that I need to remove (set to 0). And then there's the actual values I need to be kept. I'll try and get a small dataset together.
You can use a Tester with regex '^\\d*$' to find your numeric values.
As it seems like the conditions of what needs to be set to 0 is quite complex, you could potentially look at the AttributeValidator which will enable you to perform a range of tests.
Hi, yes I also have values that are not codes that need to be kept. I have already set all blanks to 0, and removed all < signs. I still have some codes left that I need to remove (set to 0). And then there's the actual values I need to be kept. I'll try and get a small dataset together.
The Tester can search for Type Is "Numeric" if you pick the right operator.
The single quotes around the regex are not part of the regex expression. It will also match anything, so if you have numbers they will also be replaced with zeros.
Ah, so can you use the Tester to test for Type Is Numeric, and whatever is left must be a code and should be replaced with 0? Or you could even encode it all into an AttributeManager with conditional values.
@mary Try the Attribute Validator and the Validation Rule: Type. The failed features you can set to 0
Without seeing the data it's difficult to know if this will cover all scenarios, but a NullAttributeMapper with If Attribute Value Contains Regex of
R^\d*\.*\d]
And choose to map to New Value of 0 should work
The regex should match anything that is not a number (8 or 8.1) so anything that contains not a number is set to 0
Hi, yes I also have values that are not codes that need to be kept. I have already set all blanks to 0, and removed all < signs. I still have some codes left that I need to remove (set to 0). And then there's the actual values I need to be kept. I'll try and get a small dataset together.
Absolutely.
Personally, I prefer using regex since I found that strings which were also hexadecimal by chance pass as Numeric (E.g. We have an asset called '0X1' - Not my idea to name it that).
Thanks everyone, I've attached an example data set. Maybe there's a simpler way to get the results I want and maybe I'm thinking in the wrong direction, so I'm just going to explain what my goal is.
Please see attached example data set (first tab).
So I want to clean up the data with column headings A to AB.
I want to get rid of all the blank cells.
I want to get rid of all < values.
(I have done these 2 things using the NullAttributeMapper).
Then there are still some codes left (e.g. X, TR - unknown how many codes, as this dataset refreshes all the time). So I want to get rid of these codes too. I want to keep only numeric values that don't have a < sign.
THEN I want to keep only the numbers that are greater or equal than the guidelines in the second Excel tab.
So for column A for example I only want to keep all values that are greater or equal than 0.05 for Guideline 1.
For Column A, I also want to keep all values that area greater or equal than 0.0001 for Guideline 2.
And I want to keep these datasets (guidelines) separate.
So I now realise that by comparing to guidelines, the strings most likely will be filtered out anyway as 'failed'.
I know this has become a bigger problem now, just not sure how to compare to guidelines either, I need some sort of lookup - not sure.
Thanks everyone, I'm very new to FME.
@mary Thanks for the more complete explanation of the problem and the sample data. I think you should be able to do everything in either NullAttributeMapper . But first, on the Excel reader parameters, set Read Blank Cells As: Null (default is Missing). Then, use regex in NullAttributeMapper:
A-Za-z]|<
NullAttributeReplacer works better here than StringReplacer as it uses 'contains' so you can use a simpler regex expression, where as StringReplacer would require a more complete expression.
A great site for testing and understanding regex expression is: https://regexr.com/
thank you @markatsafe and everyone else for your help!