Thanks for the help.
So far, AttributeCreator combined with string functions (TrimLeft, TrimRight, etc.) seems to be helping a bit.
But I am now stuck on doing conditions on the first character (alpha or numeric).
I agree with you; I would also like to know how to differentiate between characters and numbers.
Might need a bit of tweaking but I've pretty much done it with AttributeSplitter, AttributeCreators and AttributeClassifier.
To differentiate between numbers and characters, I created a new attribute using the LEFT function in AttributeCreator to get the first character, and then I used AttributeClassifier on this attribute to do a digit test.
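For anyone who wants to check the logic outside FME, the "take the first character, then test it" idea can be sketched in plain Python. This is only an illustration; the function name `first_char_kind` is made up here and is not an FME function.

```python
def first_char_kind(address: str) -> str:
    """Classify the first non-blank character of an address string.

    Hypothetical helper: mirrors "LEFT(...) to get the first character,
    then a digit test", but is not part of FME.
    """
    stripped = address.strip()
    if not stripped:
        return "empty"
    first = stripped[0]          # equivalent of taking the leftmost character
    if first.isdigit():          # the "digit test"
        return "digit"
    if first.isalpha():
        return "letter"
    return "other"


if __name__ == "__main__":
    for text in ["12 Station Street", "Sandpiper House", "  2A Marhill Road"]:
        print(repr(text), "->", first_char_kind(text))
```

Stripping leading blanks first is a design choice made for this sketch; whether you want that depends on how clean the source data is.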
^(\d*|[\da-zA-Z\s]*,)\s([a-zA-Z\s]*|[a-zA-Z\s]*,\s[a-zA-Z\s]*),\s([A-Z\s]*),\s([A-Z0-9\s]*)$
takes care of the first three examples.
You then need to expose matched parts 0-4 (5 reports or "captures") if you use the StringSearcher transformer.
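To see what the five captures look like, here is a small Python sketch that applies the expression above. The two sample addresses are made up for illustration (the original examples are not quoted in this thread), and `capture_parts` is a hypothetical helper name. In Python, index 0 is the whole match and indexes 1-4 are the bracketed groups.

```python
import re

# The expression from the post above, written as a Python raw string.
ADDRESS_RE = re.compile(
    r"^(\d*|[\da-zA-Z\s]*,)\s"
    r"([a-zA-Z\s]*|[a-zA-Z\s]*,\s[a-zA-Z\s]*),\s"
    r"([A-Z\s]*),\s"
    r"([A-Z0-9\s]*)$"
)


def capture_parts(address):
    """Return [whole match, group 1, ..., group 4], or None if no match."""
    match = ADDRESS_RE.match(address)
    if match is None:
        return None
    return [match.group(i) for i in range(5)]


if __name__ == "__main__":
    # Made-up sample inputs, not taken from the original question.
    for text in [
        "12 Station Street, NOTTINGHAM, NG4 3AH",
        "Sandpiper House, Marhill Road, Carlton, NOTTINGHAM, NG4 3AJ",
    ]:
        print(capture_parts(text))
```

Note that when the first element is a house name, the first group keeps its trailing comma, so it would still need trimming afterwards.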
If you use the AttributeCreator, you can do
regexp -inline {^(\d*|[\da-zA-Z\s]*,)\s([a-zA-Z\s]*|[a-zA-Z\s]*,\s[a-zA-Z\s]*),\s([A-Z\s]*),\s([A-Z0-9\s]*)$} yourvariable ..etc. to capture the parts.
Then you could use a ListIndexer to grab every captured part (indexes 0, 2, etc.).
Or use a TclCaller and create FME attributes from the captures.
You can make similar expressions for
for the first 3 examples
The fourth example is all letters and a zip. You can get the parts from the expression I made.
\d = digits, \w = word characters, \D = non-digits, \W = non-word characters, etc.
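A quick way to get a feel for these classes is to run them over a sample string. The sketch below uses Python's `re` module rather than Tcl, and the sample string is made up; the four classes behave the same way for plain ASCII text like this.

```python
import re

sample = "2A Marhill Road, NG4 3AH"  # made-up sample string

digits = re.findall(r"\d", sample)          # every single digit
words = re.findall(r"\w+", sample)          # runs of word characters
non_digits = re.findall(r"\D+", sample)     # runs of anything but digits
non_words = re.findall(r"\W+", sample)      # runs of anything but word characters

if __name__ == "__main__":
    print(digits)      # ['2', '4', '3']
    print(words)       # ['2A', 'Marhill', 'Road', 'NG4', '3AH']
    print(non_digits)  # ['A Marhill Road, NG', ' ', 'AH']
    print(non_words)   # [' ', ' ', ', ', ' ']
```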
There are lots of very good Tcl sites around.
Assuming these are UK addresses
You need to make sure you can also deal with suffixes
2A Marhill Road, Carlton NOTTINGHAM NG4 3AH
Secondary Addressable Objects
2 Sandpiper House, Marhill Road, Carlton, NOTTINGHAM NG4 3AJ
Number ranges
12-14 Station Street
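One possible way to cover suffixes and number ranges is to widen the "house number" part of the pattern. The sketch below is only an assumption about what such a pattern could look like (digits, an optional single letter, and an optional "-number" range); it has not been checked against real address data, and `split_house_number` is a made-up helper name.

```python
import re

# Assumed pattern: digits, optional one-letter suffix, optional "-digits[letter]" range,
# then whitespace and the rest of the element.
HOUSE_NUMBER_RE = re.compile(r"^(\d+[A-Za-z]?(?:-\d+[A-Za-z]?)?)\s+(.+)$")


def split_house_number(first_element):
    """Return (house_number, rest); house_number is None if nothing matched."""
    text = first_element.strip()
    match = HOUSE_NUMBER_RE.match(text)
    if match is None:
        return None, text
    return match.group(1), match.group(2)


if __name__ == "__main__":
    for text in ["2A Marhill Road", "12-14 Station Street", "Sandpiper House"]:
        print(split_house_number(text))
```

A numbered secondary object such as "2 Sandpiper House" would still be read as house number "2", so that case needs a separate rule.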
Hi,
I don't know the exact rules for address representation, but judging from your examples, these rules seem to apply:
1) Address elements are separated by commas.
2) The first element is either "house number (digits) <space> street name" or "house name" (starting with a non-digit).
3) Only if the first element is a "house name" is the second element the "street name".
4) The last element is always the "post code".
5) The "town name" consists of the remaining one or more elements.
If that's correct, these steps might help you:
1) Determine whether the first element is "house number <space> street name" or "house name" with a Tester.
   address  Matches Regex  ^[0-9]+\s.+$
2) If the string starts with digits (i.e. a house number), insert a comma between "house number" and "street name" using a StringReplacer. Otherwise do nothing.
   Text to Find: ^([0-9]+)(.+)$
   Replacement Text: \1,\2
   Use Regular Expressions: yes
3) Move the last element (i.e. the post code) to the head of the string using another StringReplacer.
   Text to Find: ^(.+),([^,]+)$
   Replacement Text: \2,\1
   Use Regular Expressions: yes
4) Split the string into 4 elements with a StringSearcher.
   Regular Expression: ^([^,]+),([^,]+),([^,]+),(.+)$
Every output feature will have a list attribute (named "_matched_parts" by default) which contains these elements:
   _matched_parts{0} = post code
   _matched_parts{1} = house number or house name
   _matched_parts{2} = street name
   _matched_parts{3} = town name
Then rename and trim them, if necessary.
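The four steps can be tried out in plain Python before building the workspace. This is a sketch of the same regular expressions using the `re` module, not FME itself; the function name `parse_address` and the sample addresses are made up for illustration.

```python
import re


def parse_address(address):
    """Follow the four steps above and return the trimmed elements as a dict."""
    text = address
    # Step 1 (Tester): does the string start with digits followed by whitespace?
    if re.match(r"^[0-9]+\s.+$", text):
        # Step 2 (StringReplacer): insert a comma after the house number.
        text = re.sub(r"^([0-9]+)(.+)$", r"\1,\2", text)
    # Step 3 (StringReplacer): move the last element (post code) to the head.
    text = re.sub(r"^(.+),([^,]+)$", r"\2,\1", text)
    # Step 4 (StringSearcher): split into four elements.
    match = re.match(r"^([^,]+),([^,]+),([^,]+),(.+)$", text)
    if match is None:
        return None
    post_code, house, street, town = (part.strip() for part in match.groups())
    return {"post_code": post_code, "house": house, "street": street, "town": town}


if __name__ == "__main__":
    # Made-up sample inputs.
    print(parse_address("12 Station Street, Carlton, NOTTINGHAM, NG4 3AH"))
    print(parse_address("Sandpiper House, Marhill Road, Carlton, NOTTINGHAM, NG4 3AJ"))
```

The `strip()` at the end plays the role of the final "rename and trim" step, since the split parts keep the space that followed each comma.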
Takashi
If you use this regular expression in the StringSearcher (4th step), the 2nd StringReplacer (3rd step) can be removed.
   ^([^,]+),([^,]+),(.+),([^,]+)$
In that case, the elements of the _matched_parts list will be:
   _matched_parts{0} = house number or house name
   _matched_parts{1} = street name
   _matched_parts{2} = town name
   _matched_parts{3} = post code
Anyway, I think the point is how to determine whether the first element is "house number + street name".
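Here is a small Python check of that alternative expression, again only as a sketch with the `re` module. It assumes the comma after the house number has already been inserted; the input strings and the name `split_in_order` are made up.

```python
import re

# Alternative split: the post code stays last, so no reordering step is needed.
SPLIT_RE = re.compile(r"^([^,]+),([^,]+),(.+),([^,]+)$")


def split_in_order(text):
    """Return (house, street, town, post_code) trimmed, or None if no match."""
    match = SPLIT_RE.match(text)
    if match is None:
        return None
    return tuple(part.strip() for part in match.groups())


if __name__ == "__main__":
    print(split_in_order("12, Station Street, Carlton, NOTTINGHAM, NG4 3AH"))
    print(split_in_order("Sandpiper House, Marhill Road, NOTTINGHAM, NG4 3AJ"))
```

Because the third group is greedy, a multi-part town such as "Carlton, NOTTINGHAM" stays together and only the final element is taken as the post code.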
If the house number always consists of digits only (and also house name doesn't start with digit), it's easy.
However, if there are exceptional cases like the ones EGomm mentioned, you will have to modify the first regular expression. How to do that depends on the actual data.