Question

Extracting text from column


simonhume
Contributor

Hi

I have a large CSV file in which one column contains multiple references. This is one of the records:

2810X000000466,,APQ4LA805C34REUGJU,2845AP;2810X000000467,,4985,2845ER;2810X000000468,,361301760100,2845CT;2810X000000469,,DV361301760100,2845NC;2810X000000470,,APQ4LA805C34REUGJU,2845NA;2810X000189003,,BC,2845PS;2810X000267245,,E00137336,2845OA;2810X000267246,,E01027004,2845LS;2810X000267247,,E02005628,2845MS;2810X000331704,,C,284599;2810X000384460,,01/04/1993,284591

 

Each reference is preceded by a comma and followed by a comma, a code, and then a semicolon. For example:

,361301760100,2845CT;

 

I am specifically looking to extract only the 'CT' cross reference. Can someone give me some advice on how to write a regular expression to extract this reference using the StringSearcher in FME, please?

thanks

 

Simon Hume

 

2 replies

caracadrian
Contributor
  • August 13, 2021

As I see it, you have a six-character word (in your example, one is purely numeric: 284599) between a comma and a semicolon, so

[,]\w{6}[;]

should do it. Then just trim the leading , and the trailing ;

If you know CT always sits in the same position within the code, use

[,]\w{4}[C][T][;]
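
To see what the two patterns above match, here is a minimal sketch using Python's re module as a stand-in for the StringSearcher. The variable names are made up for illustration, the record is copied from the question, and it is an assumption that the StringSearcher's regex engine treats these patterns the same way.

```python
import re

# Sample record copied from the question.
record = (
    "2810X000000466,,APQ4LA805C34REUGJU,2845AP;"
    "2810X000000467,,4985,2845ER;"
    "2810X000000468,,361301760100,2845CT;"
    "2810X000000469,,DV361301760100,2845NC;"
    "2810X000000470,,APQ4LA805C34REUGJU,2845NA;"
    "2810X000189003,,BC,2845PS;"
    "2810X000267245,,E00137336,2845OA;"
    "2810X000267246,,E01027004,2845LS;"
    "2810X000267247,,E02005628,2845MS;"
    "2810X000331704,,C,284599;"
    "2810X000384460,,01/04/1993,284591"
)

# First pattern: every six-character code between a comma and a semicolon.
# strip(",;") trims the delimiters that are part of each match.
all_codes = [m.strip(",;") for m in re.findall(r"[,]\w{6}[;]", record)]

# Second pattern: only the code whose last two characters are CT.
ct_match = re.search(r"[,]\w{4}[C][T][;]", record)
ct_code = ct_match.group(0).strip(",;") if ct_match else None

print(all_codes)
print(ct_code)  # 2845CT
```

Note that the last entry in the record (284591) has no trailing semicolon, so the first pattern does not pick it up. Both patterns return the code (2845CT), not the reference value in front of it.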

 


geomancer
Evangelist
  • August 13, 2021
,\w+,\w+CT;

seems to do the trick, provided there is only one 'CT;' substring and everything between the commas meets the constraints of \w (the \w metacharacter matches the characters a-z, A-Z, 0-9 and the underscore _).
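
If only the reference value itself is wanted, one option is to put a capture group around the first \w+. The following is a rough sketch in Python's re module rather than in FME. The function name extract_ct_reference is hypothetical, and it is an assumption that the StringSearcher handles the capture group equivalently.

```python
import re

# Same pattern as above, with a capture group around the reference value.
CT_PATTERN = re.compile(r",(\w+),\w+CT;")


def extract_ct_reference(record):
    """Return the reference in front of the first code ending in CT, or None."""
    match = CT_PATTERN.search(record)
    return match.group(1) if match else None


# Shortened version of the record from the question.
sample = (
    "2810X000000467,,4985,2845ER;"
    "2810X000000468,,361301760100,2845CT;"
    "2810X000000469,,DV361301760100,2845NC;"
)

print(extract_ct_reference(sample))  # 361301760100
```

Because \w does not match characters such as /, a reference like 01/04/1993 would not be captured by this pattern. That does not matter for the CT entry in the sample record, but it is worth checking against the full file.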

