Question

Regex with Cyrillic words doesn't work in ListSearcher

4 years ago
21 May 2019
4 replies
0 views

wolejims
2 replies

Hi,

I ran into an issue: a regex with Cyrillic symbols doesn't return any 'found' result in ListSearcher, while it works in StringSearcher and in the Regex Editor test.

Regular expression: (?i:^??\\.?????\\s?\\d+\\/?\\d*\\s?[?-??-?]?$)

Test list values:

_list_{0} (encoded: UTF-8): ??????? ?????????? ??

_list_{1} (encoded: UTF-8): ??? 1?

_list_{2} (encoded: UTF-8): ??? "????????"

_list_{3} (encoded: UTF-8): 2 ????

Expected result: the regex should match the string _list_{1} (encoded: UTF-8): ??? 1? and the index attribute should be set to 1.

The ListSearcher parameters are shown in the picture below:

I have checked another regex without Cyrillic and it returns a result in ListSearcher. So I wonder whether there is an issue with how ListSearcher handles regexes containing Cyrillic, or whether it is something else?

Thanks in advance for answers!

4 replies

M

+2

@wolejims I have been able to reproduce the issue you reported with the ListSearcher failing to identify Cyrillic strings. We'll try to get this fixed. I've attached the example workspace that reproduces the problem (2019.0): listsearcherwithcyrilliccharacters.fmw

W

wolejims
Author
2 replies
4 years ago
23 May 2019

@markatsafe Thank you for taking this up.

ebygomm
Participant
3078 replies
4 years ago
23 May 2019

You should be able to use Python in place of the ListSearcher until this is fixed.

Store the regex in an attribute called regex (without the (?i: wrapper at the beginning; the script passes re.IGNORECASE instead).

Then use a PythonCaller to search the list and return the index of the first match:

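import re

import fme
import fmeobjects


def listsearch(feature):
    # Pattern stored on the feature, without the inline case-insensitivity flag.
    regex = feature.getAttribute('regex')
    # Reading a list attribute by its '{}' name returns all element values;
    # fall back to an empty list if the attribute is missing.
    values = feature.getAttribute('_list{}') or []
    # re.UNICODE keeps \s, \d and case folding Unicode-aware on Python 2;
    # it is already the default for str patterns on Python 3.
    pattern = re.compile(regex, re.IGNORECASE | re.UNICODE)
    first_match = "none"
    for i, value in enumerate(values):
        # re.match only anchors at the start of the value.
        if pattern.match(value):
            first_match = i
            break
    feature.setAttribute('first_match', first_match)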

listsearcherwithcyrilliccharacters_python.fmw
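
For anyone who wants to check a pattern outside FME first, the same search can be sketched as a plain Python 3 function. This is only a minimal sketch: first_match_index is a hypothetical helper name, and the pattern and list values are made-up Cyrillic samples, not the ones from the original post.

```python
import re


def first_match_index(pattern, values):
    """Return the index of the first value matching pattern, or None."""
    # For str patterns in Python 3, IGNORECASE also folds Cyrillic case.
    compiled = re.compile(pattern, re.IGNORECASE)
    for i, value in enumerate(values):
        # re.match only anchors at the start, as in the PythonCaller script.
        if compiled.match(value):
            return i
    return None


# Hypothetical sample data, not the values from the original post.
sample_pattern = r"^ул\.\s?\d+[а-я]?$"
sample_values = ["Проспект Мира", "УЛ. 12А", "дом 5"]

print(first_match_index(sample_pattern, sample_values))
```

If this finds the expected element but the ListSearcher does not, the pattern itself is probably fine and the problem is on the transformer side.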

W

wolejims
Author
2 replies
4 years ago
5 June 2019

Many thanks for the script @egomm, it works for me.
