Question

Inline Query

  • 13 February 2014
  • 6 replies
  • 14 views

Hi,

I am trying to use the InlineQuerier to join some data together from two input CSVs, but am having some problems writing the query. Can someone help?

 

 

‘Table 1’ consists of the following fields:

1 SurveyDate (yymmdd)
2 X
3 Y
4 Value (INT)

e.g. 140211, 1000, 2000, 15

‘Table 2’ consists of the following fields:

1 SurveyDate (yymmdd)
2 StartTime (INT)
3 EndTime (INT)
4 Road (Text)
5 Carriageway (Text)

e.g. 140211, 10, 500, FME Street, L(eft side)

I am trying to complete a join which is as follows:

SELECT *
FROM Table 1, Table 2
WHERE Table 1.SurveyDate = Table 2.SurveyDate
AND Table 1.Value >= Table 2.StartTime
AND Table 1.Value < Table 2.EndTime
I need to do various FME operations before and after the join, so ideally I want to run the query within FME rather than having to write out the data, use a database to perform the join, and then read the data back into FME.

Thanks in advance.

6 replies

Hi Rob,

I suggest you use a FeatureMerger for joining the CSV files. Afterwards, you can use the Tester/TestFilter to check the start and end time conditions.

Best regards,
Jelle
Hi Jelle,

Thanks very much for the answer. Your suggestion is actually how I am currently completing this work; however, it seemed quite slow given the number of points I have in my files. I thought it might potentially be quicker to complete the join/test within a 'DB' SQL type environment.

Regards,
Rob

This article might interest you: http://blog.safe.com/2011/05/fmeevangelist79/
Hi Rob,

Of course the InlineQuerier can be used, and high performance can be expected.

In the SQL statement, a table or field name has to be surrounded by [ ] if it contains white space. And be careful not to make typos...

Takashi
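As a sanity check of the bracket rule, the statement can be tried outside FME against plain SQLite (the engine the InlineQuerier uses, as far as I know). A minimal sketch, using the sample rows from the question:

```python
# Hypothetical check of the query outside FME, against an in-memory
# SQLite database. Names containing spaces must be bracketed.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE [Table 1] (SurveyDate TEXT, X INT, Y INT, Value INT);
    CREATE TABLE [Table 2] (SurveyDate TEXT, StartTime INT, EndTime INT,
                            Road TEXT, Carriageway TEXT);
    INSERT INTO [Table 1] VALUES ('140211', 1000, 2000, 15);
    INSERT INTO [Table 2] VALUES ('140211', 10, 500, 'FME Street', 'L');
""")

rows = conn.execute("""
    SELECT *
    FROM [Table 1] t1, [Table 2] t2
    WHERE t1.SurveyDate = t2.SurveyDate
      AND t1.Value >= t2.StartTime
      AND t1.Value <  t2.EndTime
""").fetchall()
print(rows)  # the sample point (Value = 15) falls inside [10, 500)
```

If the brackets are dropped, SQLite parses `Table 1` as a table called `Table` with alias `1` and raises a syntax error, which is exactly the failure mode the tip guards against.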
Hi,

You could extract the data from the CSVs and build 2 lists, using SurveyDate as the group-by attribute.

If SurveyDate is not unique, you need to use a combination that is unique to build the list, maybe the X,Y. Then you need to build the list by grouping on X,Y. Basically you need to simulate a unique index (this goes for both tables, of course). SurveyDate would then become a simulated foreign key.

You then have table1_list and table2_list (use the feature type to append to the list name, or enter it manually).

You can simulate an Oracle full join by doing a match on the lists: input both lists into one Matcher, choose SurveyDate as the attribute to match, and no geometry. Use only the single match output.

You can then explode it and perform the boolean expression on the result using a simple ExpressionEvaluator.

It totals only a couple of transformers.

I rebuilt an old MSQL file we had into an FME workbench using the described technique. In your case you are joining on 1 attribute only, so it's less convoluted than my script.
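For illustration only, the group/match/explode recipe above can be sketched in plain Python (this is not FME code, just the logic; the row values are the samples from the question):

```python
# Illustrative sketch of the list/match approach: group Table 1 rows by
# SurveyDate (the simulated foreign key), pair each Table 2 record with
# its group, then apply the boolean range test to each exploded pair.
from collections import defaultdict

table1 = [("140211", 1000, 2000, 15)]                        # SurveyDate, X, Y, Value
table2 = [("140211", 10, 500, "FME Street", "L(eft side)")]  # SurveyDate, Start, End, Road, Cway

t1_by_date = defaultdict(list)
for row in table1:
    t1_by_date[row[0]].append(row)       # build table1_list per SurveyDate

joined = []
for date, start, end, road, cway in table2:
    for row in t1_by_date[date]:         # the "match on SurveyDate" step
        if start <= row[3] < end:        # the exploded boolean expression
            joined.append(row + (road, cway))
print(joined)
```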

 

 

Hi,

Thanks for all the suggestions. I have managed to build the query that I need within the transformer and have tested it with a small example dataset (thanks for the tip about the square brackets, Takashi).

Essentially, I had previously used the FeatureMerger to create a join: by creating a list and then exploding it, I used a Tester to establish whether the survey value was within the required time range. The problem was that, as part of the FeatureMerger step, the number of survey record points was potentially multiplied by the number of records within Table 2 (100000 * 400 = 40,000,000). After completing this step, the test step would evidently return only the required and correct matches within the range. However, I am sure that this is probably very inefficient, as you create huge numbers of records only to discard most of them. I realise that the SQL is effectively completing the same process, but I am hoping that, with steps 1 and 2 being completed within the query/DB environment, it will be more efficient.
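To put rough numbers on that blow-up, here is a small made-up SQLite sketch: joining on SurveyDate alone yields every point-to-record pairing, while the range conditions keep only one match per point (the table names and row counts are invented, scaled down from the 100000 * 400 case):

```python
# Made-up, scaled-down illustration: 1000 points and 40 ranges that
# partition [0, 4000) into 100-wide bins, all on one SurveyDate.
import random
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pts (SurveyDate TEXT, Value INT)")
conn.execute("CREATE TABLE rng (SurveyDate TEXT, StartTime INT, EndTime INT)")
conn.executemany("INSERT INTO pts VALUES ('140211', ?)",
                 [(random.randrange(4000),) for _ in range(1000)])
conn.executemany("INSERT INTO rng VALUES ('140211', ?, ?)",
                 [(i * 100, (i + 1) * 100) for i in range(40)])

# Pairs produced by joining on SurveyDate alone (the FeatureMerger step):
pairs = conn.execute("""SELECT COUNT(*) FROM pts p, rng r
                        WHERE p.SurveyDate = r.SurveyDate""").fetchone()[0]
# Pairs surviving the range test (the Tester step, done inside the query):
matches = conn.execute("""SELECT COUNT(*) FROM pts p, rng r
                          WHERE p.SurveyDate = r.SurveyDate
                            AND p.Value >= r.StartTime
                            AND p.Value <  r.EndTime""").fetchone()[0]
print(pairs, matches)  # 40000 intermediate pairs, exactly 1000 matches
```

Because the bins partition the value space, every point matches exactly one range, so the 40000 intermediate pairs collapse to 1000 results; a SQL engine can do that filtering as it joins rather than materialising all the pairs first.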

 

 

I just need to test the process using the production datasets.

 

 

Thanks again.

 

 

Best wishes,

 

 

Rob
