Question

Regex with Cyrillic words doesn't work in ListSearcher


Hi,

I've run into an issue: a regex with Cyrillic symbols doesn't return any 'found' result in ListSearcher, while it works in StringSearcher and in the Regex Editor test.

Regular expression: (?i:^??\\.?????\\s?\\d+\\/?\\d*\\s?[?-??-?]?$)

 

Test list values:

_list_{0} (encoded: UTF-8): ??????? ?????????? ??
_list_{1} (encoded: UTF-8): ??? 1?
_list_{2} (encoded: UTF-8): ??? "????????"
_list_{3} (encoded: UTF-8): 2 ????

 

 

Expected result: the regex should match the string _list_{1} (encoded: UTF-8): ??? 1? and the index attribute should be set to 1.

 

The ListSearcher parameters are shown in the picture below:

I have checked other regexes without Cyrillic, and they return results in ListSearcher. So I wonder: is there an issue with how ListSearcher handles regexes containing Cyrillic, or is it something else?

Thanks in advance for any answers!


4 replies


@wolejims I have been able to reproduce the issue you reported with the ListSearcher failing to identify cyrillic strings. We'll try and get this fixed. I've attached the example workspace that reproduces the problem (2019.0): listsearcherwithcyrilliccharacters.fmw

@markatsafe Thank you for taking this up.


You should be able to use Python in place of the ListSearcher until this is fixed.

Store the regex in an attribute called regex (without the (?i: ... ) wrapper at the beginning, since the script already passes re.IGNORECASE).

Then use a PythonCaller to search the list and return the index of the first match:

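import fme
import fmeobjects
import re

def listsearch(feature):
    # Pattern stored on the feature in the 'regex' attribute
    regex = feature.getAttribute('regex')
    # All elements of the list attribute; use an empty list if it is missing
    values = feature.getAttribute('_list{}') or []
    matched_elements = []
    for i, value in enumerate(values):
        # re.match only matches at the start of the string
        match = re.match(regex, value, re.IGNORECASE)
        if match:
            matched_elements.append(i)
    if len(matched_elements) == 0:
        # No element matched
        feature.setAttribute('first_match', "none")
    else:
        # Index of the first matching list element
        feature.setAttribute('first_match', matched_elements[0])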

listsearcherwithcyrilliccharacters_python.fmw
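
If you want to check the pattern outside FME first, the same first-match logic can be sketched as a plain Python 3 function. This is only an illustration: first_match_index, sample_values and sample_pattern are made-up names and data (the original Cyrillic values were lost in this thread), not anything from the attached workspace.

```python
import re

def first_match_index(values, pattern):
    """Return the index of the first value matching pattern, or None."""
    # Case-insensitive, like the re.IGNORECASE flag in the PythonCaller script
    compiled = re.compile(pattern, re.IGNORECASE)
    for i, value in enumerate(values):
        # match() only matches at the start of the string
        if compiled.match(value):
            return i
    return None

# Hypothetical sample data, not the values from the question
sample_values = ["Улица Ленина", "дом 1а", "2 этаж"]
sample_pattern = r"^дом\s?\d+[а-я]?$"

print(first_match_index(sample_values, sample_pattern))
```

With this sample data the second element matches, so the function returns 1. It returns None when nothing matches, where the PythonCaller version writes "none" to first_match.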

Many thanks for the script, @egomm. It works for me.
