
Hi

I have a large CSV file in which one column holds multiple references. This is one of the records:

2810X000000466,,APQ4LA805C34REUGJU,2845AP;2810X000000467,,4985,2845ER;2810X000000468,,361301760100,2845CT;2810X000000469,,DV361301760100,2845NC;2810X000000470,,APQ4LA805C34REUGJU,2845NA;2810X000189003,,BC,2845PS;2810X000267245,,E00137336,2845OA;2810X000267246,,E01027004,2845LS;2810X000267247,,E02005628,2845MS;2810X000331704,,C,284599;2810X000384460,,01/04/1993,284591

 

Each reference is preceded by a comma and followed by a comma, a code, and then a semicolon. For example:

,361301760100,2845CT;

 

I am specifically looking to extract only the 'CT' cross reference. Can someone give me some advice on how to write a regular expression to extract this reference using the StringSearcher in FME, please?

thanks

 

Simon Hume

 

As I see it, you have a 6-character word (in your example one is purely numeric: 284599) between a comma and a semicolon, so

[,]\w{6}[;]

should do it. Then just trim , and ;

If you know your data always has CT on a specific column

[,]\w{4}[C][T][;]
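To see what these two patterns pick up, here is a minimal sketch in Python, with the `re` module standing in for the StringSearcher. It reads the two patterns as `[,]\w{6}[;]` and `[,]\w{4}[C][T][;]`. The shortened record and the variable names are only for illustration, and FME's regex engine may differ in small details.

```python
import re

# Shortened sample record; the last entry has no trailing semicolon,
# just like the final entry in the record from the question.
record = (
    "2810X000000467,,4985,2845ER;"
    "2810X000000468,,361301760100,2845CT;"
    "2810X000000469,,DV361301760100,2845NC"
)

# Any six-character word between a comma and a semicolon.
six_char_matches = re.findall(r"[,]\w{6}[;]", record)
# Trim the surrounding comma and semicolon from each match.
codes = [m.strip(",;") for m in six_char_matches]

# Only the six-character word that ends in CT.
ct_match = re.search(r"[,]\w{4}[C][T][;]", record)
ct_code = ct_match.group(0).strip(",;") if ct_match else None

print(codes)
print(ct_code)
```

Note that both patterns return the six-character code (for example 2845CT), not the reference in front of it, and that the first pattern skips the last entry of a record because it has no trailing semicolon.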

 


,\w+,\w+CT;

seems to do the trick, provided there is only one 'CT;' substring and everything between the commas meets the constraints of \w (the \w metacharacter matches the characters a-z, A-Z, 0-9 and the underscore _).
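If only the reference itself is wanted, a capture group around the first \w+ avoids trimming afterwards. The sketch below tries this in Python, with the `re` module standing in for the StringSearcher. The capture group, the looser `[^,;]` variant and the record containing a date are my own additions for illustration, not something taken from the data above.

```python
import re

# Part of the record from the question.
record = (
    "2810X000000467,,4985,2845ER;"
    "2810X000000468,,361301760100,2845CT;"
    "2810X000000469,,DV361301760100,2845NC;"
)

# Same pattern as above, with a capture group around the reference.
strict = re.compile(r",(\w+),\w+CT;")
match = strict.search(record)
ct_reference = match.group(1) if match else None
print(ct_reference)

# Hypothetical record whose CT reference contains characters outside \w.
date_record = "2810X000384460,,01/04/1993,2845CT;"

# \w cannot match the slashes, so the strict pattern finds nothing here.
strict_on_date = strict.search(date_record)

# Assumed looser variant: accept anything except a comma or semicolon.
loose = re.compile(r",([^,;]+),[^,;]*CT;")
loose_match = loose.search(date_record)
loose_reference = loose_match.group(1) if loose_match else None
print(loose_reference)
```

The looser variant is only worth considering if CT references can contain characters such as / or -, as the date-style reference in the original record suggests some other reference types do.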

