Question

InlineQuerier table update using substr, LIKE, and SET

  • 9 August 2018
  • 7 replies
  • 3 views

Badge

I am really a neophyte at SQL so could use some help. I have 2 tables, my primary data is a dgn file and the table to update from is an Excel spreadsheet.

dgn file:

0684Q00000Ar9t8QAB.jpg 

excel:

0684Q00000ArFFdQAN.jpg

I want to join the excel to the dgn file when the first 12 characters of igds_level_name contains the NameCode in the Excel file. I will then update the dgn.Category from the excel.Category. My output should only be the dgn file.

The last attempt at making this work:

SELECT * FROM Output
    LEFT JOIN "US36_dgn_Categories00" ON (SUBSTR("igds_level_name",1,12) LIKE "US36_dgn_Categories00"."NameCode")
    SET "Output"."Category" = "US36_dgn_Categories00"."Category"

7 replies

Userlevel 2
Badge +17

Hi @tsurveyor, I think this SQL statement works as "almost" intended. You should remove the AttributeCreator from your workflow.

SELECT a.*, b."Category"
FROM "Output" as a
LEFT JOIN "US36_dgn_Categories00" as b
ON substr(a."igds_level_name", 1, 12) LIKE '%'||replace(b."NameCode", '_', '@_')||'%' ESCAPE '@'

Note: Since the underscore '_' is a special character that indicates any character in the right operand of the LIKE operator, you will have to escape it by a character specified with the ESCAPE clause. I specified '@' to escape the special character in the example above, and replaced '_' in the value of "NameCode" with '@_'. 

Then, why "almost"? 

The statement above outputs two features for each DGN feature which has a string value starting with "ROW_EASE" in the "igds_level_name" attribute, since the string contains both "EASE" (Category = "Easement") and "ROW_" (Category = "Boundary"). Possibly there could be other similar cases.

I don't know if it's your intention, but if you don't like those duplicates, you will have to add a process to remove duplicates after the InlineQuerier.

Badge

Hi @tsurveyor, I think this SQL statement works as "almost" intended. You should remove the AttributeCreator from your workflow.

SELECT a.*, b."Category"
FROM "Output" as a
LEFT JOIN "US36_dgn_Categories00" as b
ON substr(a."igds_level_name", 1, 12) LIKE '%'||replace(b."NameCode", '_', '@_')||'%' ESCAPE '@'

Note: Since the underscore '_' is a special character that indicates any character in the right operand of the LIKE operator, you will have to escape it by a character specified with the ESCAPE clause. I specified '@' to escape the special character in the example above, and replaced '_' in the value of "NameCode" with '@_'. 

Then, why "almost"? 

The statement above outputs two features for each DGN feature which has a string value starting with "ROW_EASE" in the "igds_level_name" attribute, since the string contains both "EASE" (Category = "Easement") and "ROW_" (Category = "Boundary"). Possibly there could be other similar cases.

I don't know if it's your intention, but if you don't like those duplicates, you will have to add a process to remove duplicates after the InlineQuerier.

Takashi, THANKS! The duplication problem is why I was originally trying to do a step iteration (conditional loop) through the excel file, but it seems that would not work. I had created the "Category" attribute in table a and assigned Null to the value. I was then going to use a WHERE to update (SET) the new Category field only if it was Null - stopping additional joins. I just couldn't figure out how to make that work. This does increase features by almost 50% but then running duplicate deletions might be the way to go.

 

Really appreciate your help. Hopefully my SQL will get better. Logic I have, syntax is another matter.

 

 

Userlevel 2
Badge +17

Hi @tsurveyor, I think this SQL statement works as "almost" intended. You should remove the AttributeCreator from your workflow.

SELECT a.*, b."Category"
FROM "Output" as a
LEFT JOIN "US36_dgn_Categories00" as b
ON substr(a."igds_level_name", 1, 12) LIKE '%'||replace(b."NameCode", '_', '@_')||'%' ESCAPE '@'

Note: Since the underscore '_' is a special character that indicates any character in the right operand of the LIKE operator, you will have to escape it by a character specified with the ESCAPE clause. I specified '@' to escape the special character in the example above, and replaced '_' in the value of "NameCode" with '@_'. 

Then, why "almost"? 

The statement above outputs two features for each DGN feature which has a string value starting with "ROW_EASE" in the "igds_level_name" attribute, since the string contains both "EASE" (Category = "Easement") and "ROW_" (Category = "Boundary"). Possibly there could be other similar cases.

I don't know if it's your intention, but if you don't like those duplicates, you will have to add a process to remove duplicates after the InlineQuerier.

This statement selects the "Category" having minimum "ReadOrder" in the matched records, if two or more Excel records matched a DGN feature. Make sure that the Type of "ReadOrder" is set to "integer" in the InlineQuerier Inputs configuration.

 

SELECT
    a.*,
    (
        SELECT "Category" FROM "US36_dgn_Categories00"
        WHERE substr(a."igds_level_name", 1, 12)
            LIKE '%'||replace("NameCode", '_', '@_')||'%' ESCAPE '@'
        ORDER BY "ReadOrder" LIMIT 1
    ) as "Category"
FROM "Output" as a 

 

Badge

Hi @tsurveyor, I think this SQL statement works as "almost" intended. You should remove the AttributeCreator from your workflow.

SELECT a.*, b."Category"
FROM "Output" as a
LEFT JOIN "US36_dgn_Categories00" as b
ON substr(a."igds_level_name", 1, 12) LIKE '%'||replace(b."NameCode", '_', '@_')||'%' ESCAPE '@'

Note: Since the underscore '_' is a special character that indicates any character in the right operand of the LIKE operator, you will have to escape it by a character specified with the ESCAPE clause. I specified '@' to escape the special character in the example above, and replaced '_' in the value of "NameCode" with '@_'. 

Then, why "almost"? 

The statement above outputs two features for each DGN feature which has a string value starting with "ROW_EASE" in the "igds_level_name" attribute, since the string contains both "EASE" (Category = "Easement") and "ROW_" (Category = "Boundary"). Possibly there could be other similar cases.

I don't know if it's your intention, but if you don't like those duplicates, you will have to add a process to remove duplicates after the InlineQuerier.

PERFECT! Again, I thank you.

 

 

Badge

Hi @tsurveyor, I think this SQL statement works as "almost" intended. You should remove the AttributeCreator from your workflow.

SELECT a.*, b."Category"
FROM "Output" as a
LEFT JOIN "US36_dgn_Categories00" as b
ON substr(a."igds_level_name", 1, 12) LIKE '%'||replace(b."NameCode", '_', '@_')||'%' ESCAPE '@'

Note: Since the underscore '_' is a special character that indicates any character in the right operand of the LIKE operator, you will have to escape it by a character specified with the ESCAPE clause. I specified '@' to escape the special character in the example above, and replaced '_' in the value of "NameCode" with '@_'. 

Then, why "almost"? 

The statement above outputs two features for each DGN feature which has a string value starting with "ROW_EASE" in the "igds_level_name" attribute, since the string contains both "EASE" (Category = "Easement") and "ROW_" (Category = "Boundary"). Possibly there could be other similar cases.

I don't know if it's your intention, but if you don't like those duplicates, you will have to add a process to remove duplicates after the InlineQuerier.

In fact, this will work so that substrings are not necessary. I originally wanted to parse the levels by words in the entire field but was using substring to minimize duplicates. This puts my efforts at CADD processing to the next level.

 

 

Userlevel 4
Badge +13
In fact, this will work so that substrings are not necessary. I originally wanted to parse the levels by words in the entire field but was using substring to minimize duplicates. This puts my efforts at CADD processing to the next level.

 

 

One final thought on this -- if there is a very large volume of data going through, you might want to avoid doing the SQL on every single feature. It still feels to me that this could be done once and produce a table from every level name to the category. Then that table coul be used in a new workspace that would actually do the data transformation, because a static lookup (AttributeValueMapper) (or DatabaseJoiner) would be much faster that going through an InlineQuerier.

 

 

If performance is not an issue, please disregard my thoughts here. But if it is, I'd suggest trying to do this once ahead of time, and then apply the results to the actual data later.

 

 

Badge
One final thought on this -- if there is a very large volume of data going through, you might want to avoid doing the SQL on every single feature. It still feels to me that this could be done once and produce a table from every level name to the category. Then that table coul be used in a new workspace that would actually do the data transformation, because a static lookup (AttributeValueMapper) (or DatabaseJoiner) would be much faster that going through an InlineQuerier.

 

 

If performance is not an issue, please disregard my thoughts here. But if it is, I'd suggest trying to do this once ahead of time, and then apply the results to the actual data later.

 

 

Not sure what you are suggesting. Even if I capture all of the unique level names I still have the problem of assigning a category to those items. But on the other hand I am doing it only once if I slightly modify your suggestion. So I see your suggestion more in this way:

 

1. Extract all of the unique level names

 

2. Use the InlineQuerier to assign a category to each of these unique names

 

3. Use this new unique_level_names/category table to then merge with the CAD features

 

I can see where this would be faster because the SQL is only going through one of each instead of thousands, and then the featuremerger (I am using 2017) only has to join.

 

Do I have it right?

 

 

 

 

Reply