Question

Regex with Cyrillic words doesn't work in ListSearcher


Hi,

I have run into an issue: a regex containing Cyrillic characters does not return any 'found' result in the ListSearcher, although the same regex works in the StringSearcher and in the Regex Editor test.

Regular expression: (?i:^??\\.?????\\s?\\d+\\/?\\d*\\s?[?-??-?]?$)

 

Test list values:

_list_{0} (encoded: UTF-8): ??????? ?????????? ??
_list_{1} (encoded: UTF-8): ??? 1?
_list_{2} (encoded: UTF-8): ??? "????????"
_list_{3} (encoded: UTF-8): 2 ????

 

 

Expected result: the regex should match the string _list_{1} (encoded: UTF-8): ??? 1?, and the index attribute should be set to 1.

 

The ListSearcher parameters are shown in the picture below:

 

 

I have checked other regexes without Cyrillic characters, and they do return results in the ListSearcher. So I wonder: is there an issue with how the ListSearcher handles regexes containing Cyrillic, or is it something else?

Thanks in advance for answers!

 


4 replies


@wolejims I have been able to reproduce the issue you reported: the ListSearcher fails to match Cyrillic strings. We'll try to get this fixed. I've attached an example workspace that reproduces the problem (2019.0): listsearcherwithcyrilliccharacters.fmw


  • Author
  • May 23, 2019

@markatsafe Thank you for taking this up.


ebygomm
  • Influencer
  • May 23, 2019

You should be able to use Python in place of the ListSearcher until this is fixed.

Store the regex in an attribute called regex (without the (?i: wrapper at the beginning and its closing parenthesis, since the script passes re.IGNORECASE instead).

Then use a PythonCaller to search the list and return the index of the first match:

import fme
import fmeobjects
import re

def listsearch(feature):
    # The pattern is read from the 'regex' attribute, the values from the '_list{}' list attribute
    regex = feature.getAttribute('regex')
    # Fall back to an empty list if the attribute is missing; 'values' avoids shadowing the built-in list
    values = feature.getAttribute('_list{}') or []
    # Result when no element matches
    first_match = "none"
    for i, value in enumerate(values):
        # re.match only matches at the start of the string
        if re.match(regex, value, re.IGNORECASE):
            first_match = i
            break
    feature.setAttribute('first_match', first_match)

listsearcherwithcyrilliccharacters_python.fmw
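
The search logic in the script above can also be tried outside FME, which may help when checking a pattern before putting it into a workspace. The following is only a minimal sketch, assuming Python 3; the helper name first_match_index, the sample values, and the sample pattern are made up for illustration and are not the ones from the question.

```python
import re

def first_match_index(pattern, values):
    """Return the index of the first value matching pattern, or None.

    Hypothetical helper that mirrors the first-match search of the
    PythonCaller script, but works on a plain Python list.
    """
    # In Python 3, str patterns are Unicode-aware, so IGNORECASE
    # also applies to Cyrillic letters.
    compiled = re.compile(pattern, re.IGNORECASE)
    for i, value in enumerate(values):
        # match() only matches at the start of the string
        if compiled.match(value):
            return i
    return None

# Illustrative data only (assumption, not the values from the question)
sample_values = ["Москва", "дом 12", "улица Ленина", "2 этаж"]
sample_pattern = r"^дом\s?\d+$"

index = first_match_index(sample_pattern, sample_values)
print(index)
```

If a check like this finds the expected element while the ListSearcher does not, that suggests the pattern itself is fine and the problem lies in the transformer, which is consistent with the behaviour reported in this thread.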


  • Author
  • June 5, 2019

Many thanks for the script, @ebygomm. It works for me.

