Solved

comparison of data

  • 2 August 2018
  • 8 replies
  • 16 views

I have two files a csv file and an Excel file with 4 worksheets, I need to compare the two files to see if all records in the Excel file with all sheets are present in the csv file, there is a particular column in the Excel file, it is a string e.g. 'ff_abcd_12345', I need to get everything apart from the numbers from this and compare it to a field in the other file. There are thousands of records which need to be compared, all I need to do is check if the record exists and report it as matched or unmatched. Any ideas on how I deal with this will be much appreciated.

icon

Best answer by takashi 22 August 2018, 17:01

View original

8 replies

Badge +2

Hi @saqibamin,

Since the excel file is over 4 sheets when you read in the data you can check all the sheets are being read in using the Reader parameter settings and then set it to a Single Merged Feature type if the schema is the same - this way you will only have 1 Excel reader to work with rather than 4.

To extract everything apart from the numbers from the string in the excel file e.g. ff_abcd you could either use a StringSearcher containing the regular expression [a-z]{2}_[a-z]{4}. This will only work if every feature has the same expression (if the number of letters varies this regex can be altered). If not you could use an AttributeSplitter with the delimiter = _ and then an AttributeCreator to join list{0} = ff and list{1} =abcd back together again.

Then for the comparison I think in your case the Matcher transformer is the way to go. You can match based on selected attributes and if there are any without a match these will come out of a separate port.

Note: for the matcher to work the attributes to compare should have the same name.

Hi @saqibamin,

Since the excel file is over 4 sheets when you read in the data you can check all the sheets are being read in using the Reader parameter settings and then set it to a Single Merged Feature type if the schema is the same - this way you will only have 1 Excel reader to work with rather than 4.

To extract everything apart from the numbers from the string in the excel file e.g. ff_abcd you could either use a StringSearcher containing the regular expression [a-z]{2}_[a-z]{4}. This will only work if every feature has the same expression (if the number of letters varies this regex can be altered). If not you could use an AttributeSplitter with the delimiter = _ and then an AttributeCreator to join list{0} = ff and list{1} =abcd back together again.

Then for the comparison I think in your case the Matcher transformer is the way to go. You can match based on selected attributes and if there are any without a match these will come out of a separate port.

Note: for the matcher to work the attributes to compare should have the same name.

@hollyatsafe If I was to compare against multiple columns, e.g. I have two spreadsheets, the first spreadsheet contains records which have a “Lane_Area”, these need to be compared to a second spreadsheet which has specific named columns which have “Lane_Area1”, “Lane_Area2” etc.

 

 

In the first spreadsheet there is an attribute called “Area_Id” which corresponds with the number after “Lane_Area” in spreadsheet 2, e.g 1 to 20.

 

 

I need to compare the area for each record in spreadsheet 1 to a record and column in spreadsheet 2.

 

 

Instead of just comparing a row in one spreadsheet to a row in another spreadsheet I am actually comparing the columns as well.

 

 

Is there a way I can do this?

 

 

 

Badge +2
@hollyatsafe If I was to compare against multiple columns, e.g. I have two spreadsheets, the first spreadsheet contains records which have a “Lane_Area”, these need to be compared to a second spreadsheet which has specific named columns which have “Lane_Area1”, “Lane_Area2” etc.

 

 

In the first spreadsheet there is an attribute called “Area_Id” which corresponds with the number after “Lane_Area” in spreadsheet 2, e.g 1 to 20.

 

 

I need to compare the area for each record in spreadsheet 1 to a record and column in spreadsheet 2.

 

 

Instead of just comparing a row in one spreadsheet to a row in another spreadsheet I am actually comparing the columns as well.

 

 

Is there a way I can do this?

 

 

 

Hi @saqibamin,

 

I think this could be done but it won't be neat. Here is the solution I have come up with:

 

1. From sheet 1 change the Area ID to display attributes how they appear in the columns:

2. From sheet 2 use the AttributeExploder to change the column headers into an additional row - you can then look for a match to both the area and the ID.

 

3. Use a FeatureJoiner with sheet 1 as the Left and sheet 2 as the Right input. Leave the Join mode set to Inner and set up the two attributes to join on. Then any records that come out of the Unjoined Left are those that appear in Sheet 1 but are not present in Sheet 2.

 

Please let me know if you have any problems setting this up.

I'm attaching two screenshots samples of what I am trying to compare

 

 

 

 

Book2 is the challenging one, as if I do a transpose, the name becomes a column and I can't then uniquely identify the row.

 

 

Any suggestions will be much appreciated.

 

Hi @saqibamin,

 

I think this could be done but it won't be neat. Here is the solution I have come up with:

 

1. From sheet 1 change the Area ID to display attributes how they appear in the columns:

2. From sheet 2 use the AttributeExploder to change the column headers into an additional row - you can then look for a match to both the area and the ID.

 

3. Use a FeatureJoiner with sheet 1 as the Left and sheet 2 as the Right input. Leave the Join mode set to Inner and set up the two attributes to join on. Then any records that come out of the Unjoined Left are those that appear in Sheet 1 but are not present in Sheet 2.

 

Please let me know if you have any problems setting this up.

Thanks for your input @hollyatsafe, I have provided some more info via screenshots

 

Badge +2
Thanks for your input @hollyatsafe, I have provided some more info via screenshots

 

Hi @saqibamin,

 

Did you have any success using the method I described above? Can you attach the two spreadsheets as files so that I can see if I can find another method to do this?

 

Thanks for your input @hollyatsafe, I have provided some more info via screenshots

 

@hollyatsafe,

 

 

Please find atached the two excel files: book1.xlsx and book2.xlsx

 

 

Userlevel 2
Badge +17

Hi @saqibamin, If you need to reform the Book2 table structure into the same as the Book1, this workflow is a possible way.

0684Q00000ArL80QAF.png

JSON Template Expression (Basic):

[
    {"Item" : "FDER", "sum" : fme:get-attribute("FDER")},
    {"Item" : "HTYU", "sum" : fme:get-attribute("HTYU")},
    {"Item" : "GHJK", "sum" : fme:get-attribute("GHJK")},
    {"Item" : "JILN", "sum" : fme:get-attribute("JILN")},
    {"Item" : "KILJ", "sum" : fme:get-attribute("KILJ")}
]

JSON Template Expression (Advanced): The Tester is not necessary in the workflow above if you use this expression.

[
    for $item in ("FDER", "HTYU", "GHJK", "JILN", "KILJ")
    let $sum := xs:double(fme:get-attribute($item))
    where $sum != 0
    return {"Item" : $item, "sum" : $sum}
]

Reply