Solved

filtering out duplicate numbers

  • 8 September 2016
  • 8 replies
  • 8 views

So I have two main sources:

An sql script.

And an excel sheet.

I want to compare the attribute ADRNR from one source with

the other ADRNR1. If its a match then it shouldn't do anything because i didn't hook a writer to that and if it doesn't then it fails and should give me the values that aren't duplicates. I tried doing this with a simple tester but it doesn't work unfortunately. Here is how my tester looks like:

the Failed output should give me the values that aren't duplicates.

I think its because fme tucks every source nicely under eachother so that it can't compare it with eachother.

Is my assumption correct and how can I make this work? Do I need to combine the data from the sources first? I just want to do a comparison between the these attributes of different sources and filtering out the unique ones.

The workspace looks like this.

Thanks in advance.

icon

Best answer by takashi 13 September 2016, 12:09

View original

8 replies

Userlevel 2
Badge +16

Have a look at the FeatureMerger. Requester is SQL, Supplier is XLS. Join on the ADRNR. The Merged is all records that exist in both. The Not Merged exist only in SQL. The Unreferenced exist only in XLS.

Userlevel 5
Badge +26

Can you give a little bit more background about what you're trying to do? I have a feeling you may want to do a join (so join the data that's coming in from the Excel sheet(s) to the output of the SQLCreator, for example by using a FeatureMerger transformer)

Userlevel 5
Badge +26

Have a look at the FeatureMerger. Requester is SQL, Supplier is XLS. Join on the ADRNR. The Merged is all records that exist in both. The Not Merged exist only in SQL. The Unreferenced exist only in XLS.

Dang, you beat me to it ;)

 

 

Userlevel 4
Badge +26
Yeah, the key point there is that pulling connections together like that doesn't merge the data - it just accumulates it. Each record is still a separate entity, which is why your Tester doesn't work.

 

 

You could use a FeatureMerger as suggested by others here. Otherwise - since you seem to know SQL - you could try an InlineQuerier transformer. That puts all the incoming data into a virtual database table that you can query as you like.

 

Hey all thanks for the reply's. I've been fiddling with it yesterday and still can't get it working. So basically the different attribute needs to be in one table so that a couple of attributes can be compared with a tester. Unfortunately the featuremerger is not giving me any output at the merged sections (except for the not merged en duplicate supplier). This is how I've set up my featuremerger.

I've set up the sql script as the requestor and the excel sheets as a the suppliers. I just want the the attributes ADRNR1 ADRNR2 and ADRNR3 added into the table that comes out the excel sheet. It seems that featurmerger automatically wants everything merged into one field? I just want the other attributes added to the adrnr like it shows in the merged section. Like below. It'll show the fields next to eachother instead of tucking source under source.

I'm also fidling with the inlinesql though that would also need some puzzling still.

Got something like this

SELECT "ADRNR", "ADRNR1", "ADRNR2", "ADRNR3"

FROM RESULT, OUTPUT01, OUTPUT, OUTPUT00

WHERE ADRNR != ADRNR1

AND ADRNR != ADRNR2

AND ADRNR != ADRNR3

Unfortunately its in an endless loop for now. Do I need to specify the source at the select section too?

Like SELECT "OUTPUT01.ADRNR1", "OUTPUT.ADRNR"

Thanks for the help. I'll keep puzzling while I'll wait for some guidance.

Userlevel 3
Badge +17

Hi @adriaan, multiple "Join On" rows will be evaluated with "AND" condition always. In your setting, the FeatureMerger compares "ADRNR" of a requestor with "ADRNR1" AND "ADRNR2" AND "ADRNR3" for each supplier. However, a single supplier has just one of "ADRNR1" OR "ADRNR2" OR "ADRNR3", so every requestor will not match any supplier. I think it's the situation you have observed.

If your goal is to detect requestor features which do NOT match any supplier on "ADRNR", try changing the attribute name of all the suppliers to the same name (e.g. "ADRNR"), and then merge them to the requestor features. I think that your desired feature will be output via the NotMerged port.

I feel like such an idiot guys haha. Again thanks for the valuable input!

So I constantly was hitting a wall with the featuremerger and inlinequery. The featuremerger wasn't giving me the correct output and the inlinequery kept building up data (800000) records.

Today the lightbulb went on and I checked what kind of output my reader gave me and they all gave me missing records. I fiddled with the source now and then and I assumed it would auto update the readers with the new values. It didn't.

I also tested fme by making some changes in the excel source and readding it to fme. I deleted the old reader and added in the new reader and hooked it to the same file again (after I edited it) and it still searched for old attributes that I renamed in the meantime. It seems that FME saves readers in its cache and reuses those.

It did however updated the attributes after I renamed the file and added it as a reader. So when you change attributes in the excelsheet. Make sure to save it under a new name and then readd it in fme as a new reader.

Either the featuremerger of inlinequery worked after fixing this. Thanks!

Badge +16

I feel like such an idiot guys haha. Again thanks for the valuable input!

So I constantly was hitting a wall with the featuremerger and inlinequery. The featuremerger wasn't giving me the correct output and the inlinequery kept building up data (800000) records.

Today the lightbulb went on and I checked what kind of output my reader gave me and they all gave me missing records. I fiddled with the source now and then and I assumed it would auto update the readers with the new values. It didn't.

I also tested fme by making some changes in the excel source and readding it to fme. I deleted the old reader and added in the new reader and hooked it to the same file again (after I edited it) and it still searched for old attributes that I renamed in the meantime. It seems that FME saves readers in its cache and reuses those.

It did however updated the attributes after I renamed the file and added it as a reader. So when you change attributes in the excelsheet. Make sure to save it under a new name and then readd it in fme as a new reader.

Either the featuremerger of inlinequery worked after fixing this. Thanks!

Updating feature types usually helps, but sometimes I have noticed that re reading the xls data is necessary after adaptation of the source data.

 

 

Reply