Solved

SQL: aggregate table data in subgroups

9 years ago
January 15, 2016
22 replies
67 views

peteralstorp
Contributor
91 replies

Hi, fellow FMEers.

I have a simple yet complex problem that I need to solve with SQL (SQLite) for my FME-script to run efficiantly. This may not be the correct forum, but since there are so many talented people here who enjoy helping out and solving complex problem I thought I'd give it a try here anyway. I have data I want to aggregate but in subgroups - taking z-level into account. It's a geological profile where I want to simplify the incoming data before I create my profile.

Example data

x, y, z_from, z_to, value

1,2,1,2,A

1,2,2,3,A

1,2,3,4,B

1,2,4,5,A

1,2,5,6,A

1,2,6,7,A

1,2,7,8,C

with SQL I want to aggregate on value but only rows that are adjacent (sort of a dissolve but on non-spatial data). And I want to calculate (also in sql) the correct z_from- and z_to-values. So the result from the example data above should be:

x, y, z_from, z_to, value1,2,1,3,A

1,2,3,4,B1,2,4,7,A1,2,7,8,C

Any help on this will be remembered for all eternity.

Peter

Best answer by jeroenstiers

Hi peteralstorp,

I don't know how to solve this using only SQL but I would like to share a solution using Python because I think it is an easy solution.

The idea is that you store the first feature with a different value than the last one. The moment there is a feature with a different value than the one stored, you update the z_to-value in this stored feature with the last imported feature and export this changed feature to the workspace.

Just let me know if anything is unclear.

Good luck!

import fme
import fmeobjects

# Template Class Interface:
class dissolveData(object):  
    
    def __init__(self):
        self.feature = -1
        self.cValue = -1
        self.to_z = -1 

    def input(self,feature):
                
        # Check if this feature has a value that is different than the last value
        if feature.getAttribute('value') != self.cValue or self.cValue == -1:     
            # If that is the case, the feature in self.feature should get the z_to value
            # of the last feature and be exported towards the workspace.
            if self.feature != -1:
                self.outputFeature()
            
            # Store this feature as the base of the next output
            self.feature = feature
            self.cValue = feature.getAttribute('value')    
            
        # Update the z_to-variable
        self.z_to = feature.getAttribute('z_to')
 
    def close(self):
        # Making sure we also export the last 'dissolved' feature.
        self.outputFeature()

    def outputFeature(self):
	# Update the z_to attribute from in the stored feature and export it
        self.feature.setAttribute("z_to", self.z_to)
        self.pyoutput(self.feature)

View original

Did this help you find an answer to your question?

takashi
7723 replies
9 years ago
January 15, 2016

Seems to be hard to do that with SQL statement only. I think that using regular transformers is easier.

Do you have to accomplish it with SQL anyway?

peteralstorp
Author
Contributor
91 replies
9 years ago
January 15, 2016

takashi wrote:

Seems to be hard to do that with SQL statement only. I think that using regular transformers is easier.

Do you have to accomplish it with SQL anyway?

Hi T. And thanks for your concern. I would really like to solve it with SQL since this will be a recurring problem that needs fixing. At least I want to avoid a bunch of transformers that are difficult to understand. Is there an easy way to solve this via a regular transformer that you know of please let me know, and I will try it out.

peteralstorp
Author
Contributor
91 replies
9 years ago
January 15, 2016

takashi wrote:

Seems to be hard to do that with SQL statement only. I think that using regular transformers is easier.

Do you have to accomplish it with SQL anyway?

To clarify: if you see a somewhat simple solution using regular transformers please let me know

takashi
7723 replies
9 years ago
January 15, 2016

takashi wrote:

Seems to be hard to do that with SQL statement only. I think that using regular transformers is easier.

Do you have to accomplish it with SQL anyway?

Assume that the features are sorted by "value" and "z_from" ascending. If the features aren't sorted, sort them with the Sorter beforehand.

1. AttributeCreator: if {the current feature and the previous feature have the same "value"} and {"z_from" of the current record is equal to the "z_to" of the previous record}, replace "z_from" of the current feature with "z_from" of the previous feature.

2. Aggregator: aggregate the features grouping by "value" and "z_from", and save all attributes into a list.

3. ListIndexer: extract tha last element of the list.

peteralstorp
Author
Contributor
91 replies
9 years ago
January 15, 2016

takashi wrote:

Seems to be hard to do that with SQL statement only. I think that using regular transformers is easier.

Do you have to accomplish it with SQL anyway?

Thank you so much, this feels like you set me off on the correct path! I didn't know how to use the Enable Adjacent Feature-function. However, now I still get the same amount of rows as I start up with so something must be wrong, I can't figure it out. Do you have any ideas what might cause this?

takashi
7723 replies
9 years ago
January 15, 2016

I'm unclear what is wrong in the actual situation. Please explain the situation in more detailed.

Another thought. This may also be resolved with geometric operations, I think it's an advantage of FME.

1. Two VertexCreators: translate each feature to a line segment (z_from, 0) - (z_to, 0).

2. LineJoiner: connect the line segments grouping by attributes except z_*.

3. BoundsExtractor: extract {minimum x, maximum x} of the resulting line as {z_from, z_to}.

Addition. If z_from and z_to aren't increasing in the connecting order, consider using two CoordinateExtractors instead of the BoundsExtractor.

+19

erik_jan
Contributor
2181 replies
9 years ago
January 15, 2016

takashi wrote:

I'm unclear what is wrong in the actual situation. Please explain the situation in more detailed.

Another thought. This may also be resolved with geometric operations, I think it's an advantage of FME.

1. Two VertexCreators: translate each feature to a line segment (z_from, 0) - (z_to, 0).

2. LineJoiner: connect the line segments grouping by attributes except z_*.

3. BoundsExtractor: extract {minimum x, maximum x} of the resulting line as {z_from, z_to}.

Addition. If z_from and z_to aren't increasing in the connecting order, consider using two CoordinateExtractors instead of the BoundsExtractor.

@takashi

I think you had the group by in the Aggregator wrong. You wanted to Group on X, Y and Value like in the LineJoiner and not on Z_from.

takashi
7723 replies
9 years ago
January 15, 2016

erik_jan wrote:

@takashi

I think you had the group by in the Aggregator wrong. You wanted to Group on X, Y and Value like in the LineJoiner and not on Z_from.

Hi @erik_jan, since the values of x and y of all features are identical in the data example, I thought they can be omitted in the Aggregator setting. If x and y should be considered as group identifiers, of course they should be set to the "Group By", as you mentioned.

The AttributeCreator modifies "z_from" so that it contains an identical value (i.e. first "z_from" in a group) for each group. It's the key point of the first solution.

+19

erik_jan
Contributor
2181 replies
9 years ago
January 16, 2016

takashi wrote:

The AttributeCreator modifies "z_from" so that it contains an identical value (i.e. first "z_from" in a group) for each group. It's the key point of the first solution.

Yes, I see. You are right!

Sorry about that.

jeroenstiers
178 replies
Best Answer
9 years ago
January 16, 2016

Hi peteralstorp,

I don't know how to solve this using only SQL but I would like to share a solution using Python because I think it is an easy solution.

Just let me know if anything is unclear.

Good luck!

import fme
import fmeobjects

# Template Class Interface:
class dissolveData(object):  
    
    def __init__(self):
        self.feature = -1
        self.cValue = -1
        self.to_z = -1 

    def input(self,feature):
                
        # Check if this feature has a value that is different than the last value
        if feature.getAttribute('value') != self.cValue or self.cValue == -1:     
            # If that is the case, the feature in self.feature should get the z_to value
            # of the last feature and be exported towards the workspace.
            if self.feature != -1:
                self.outputFeature()
            
            # Store this feature as the base of the next output
            self.feature = feature
            self.cValue = feature.getAttribute('value')    
            
        # Update the z_to-variable
        self.z_to = feature.getAttribute('z_to')
 
    def close(self):
        # Making sure we also export the last 'dissolved' feature.
        self.outputFeature()

    def outputFeature(self):
	# Update the z_to attribute from in the stored feature and export it
        self.feature.setAttribute("z_to", self.z_to)
        self.pyoutput(self.feature)

peteralstorp
Author
Contributor
91 replies
9 years ago
January 16, 2016

takashi wrote:

I'm unclear what is wrong in the actual situation. Please explain the situation in more detailed.

Another thought. This may also be resolved with geometric operations, I think it's an advantage of FME.

1. Two VertexCreators: translate each feature to a line segment (z_from, 0) - (z_to, 0).

2. LineJoiner: connect the line segments grouping by attributes except z_*.

3. BoundsExtractor: extract {minimum x, maximum x} of the resulting line as {z_from, z_to}.

Addition. If z_from and z_to aren't increasing in the connecting order, consider using two CoordinateExtractors instead of the BoundsExtractor.

@takashi I think the problem is that the prior feature (when read) already has been altered with a new z_from-value. Could that be the problem? The transformer documentation says "When retrieving the value of a prior feature, any changes made by the AttributeCreator to that feature will take effect."

peteralstorp
Author
Contributor
91 replies
9 years ago
January 16, 2016

takashi wrote:

Seems to be hard to do that with SQL statement only. I think that using regular transformers is easier.

Do you have to accomplish it with SQL anyway?

peteralstorp
Author
Contributor
91 replies
9 years ago
January 16, 2016

takashi wrote:

I'm unclear what is wrong in the actual situation. Please explain the situation in more detailed.

Another thought. This may also be resolved with geometric operations, I think it's an advantage of FME.

1. Two VertexCreators: translate each feature to a line segment (z_from, 0) - (z_to, 0).

2. LineJoiner: connect the line segments grouping by attributes except z_*.

3. BoundsExtractor: extract {minimum x, maximum x} of the resulting line as {z_from, z_to}.

Addition. If z_from and z_to aren't increasing in the connecting order, consider using two CoordinateExtractors instead of the BoundsExtractor.

Again, thank you! This is interesting but it is not good for my current project.

peteralstorp
Author
Contributor
91 replies
9 years ago
January 16, 2016

This did it for me, @jeroenstiers ! Thank you very much. This is so much appreciated!

peteralstorp
Author
Contributor
91 replies
9 years ago
January 16, 2016

takashi wrote:

Seems to be hard to do that with SQL statement only. I think that using regular transformers is easier.

Do you have to accomplish it with SQL anyway?

I would like to mark this as a correct answer as well, but since the Attributecreator-problem (looking at features that have already been altered) doesn't seem to work I await your answer/comments first. Anyway, big thanks again, your efforts are noticed and most welcome!

peteralstorp
Author
Contributor
91 replies
9 years ago
June 20, 2016

jeroenstiers wrote:

Hi peteralstorp,

I don't know how to solve this using only SQL but I would like to share a solution using Python because I think it is an easy solution.

Just let me know if anything is unclear.

Good luck!

import fme
import fmeobjects

# Template Class Interface:
class dissolveData(object):  
    
    def __init__(self):
        self.feature = -1
        self.cValue = -1
        self.to_z = -1 

    def input(self,feature):
                
        # Check if this feature has a value that is different than the last value
        if feature.getAttribute('value') != self.cValue or self.cValue == -1:     
            # If that is the case, the feature in self.feature should get the z_to value
            # of the last feature and be exported towards the workspace.
            if self.feature != -1:
                self.outputFeature()
            
            # Store this feature as the base of the next output
            self.feature = feature
            self.cValue = feature.getAttribute('value')    
            
        # Update the z_to-variable
        self.z_to = feature.getAttribute('z_to')
 
    def close(self):
        # Making sure we also export the last 'dissolved' feature.
        self.outputFeature()

    def outputFeature(self):
	# Update the z_to attribute from in the stored feature and export it
        self.feature.setAttribute("z_to", self.z_to)
        self.pyoutput(self.feature)

jeroenstiers My only problem now is that I need to aggregate on an id (called miid) also, since I have many groups of profiles going through the same script. Do you have a suggestion on how to do this? So, the comparison on 'value' should also incorporate a control that miid is the same... Am I being unclear?

+15

gio
Contributor
2252 replies
9 years ago
June 20, 2016

You can achieve his in SQL by using a

Case when .. lead .. over (partition by ..) AND/OR lag .. over (partition by )

when ..

else

end

Construction

Then group result to get unique record for each set.

peteralstorp
Author
Contributor
91 replies
9 years ago
June 21, 2016

gio wrote:

You can achieve his in SQL by using a

Case when .. lead .. over (partition by ..) AND/OR lag .. over (partition by )

when ..

else

end

Construction

Then group result to get unique record for each set.

Thank you, Gio. I'm afraid my SQL-skills are too poor to make this work, although I'm sure it would! I'm going to have to stick to my Python script.

peteralstorp
Author
Contributor
91 replies
9 years ago
June 21, 2016

gio wrote:

You can achieve his in SQL by using a

Case when .. lead .. over (partition by ..) AND/OR lag .. over (partition by )

when ..

else

end

Construction

Then group result to get unique record for each set.

If you have the time I would be grateful for a more explicit example... I know this is asking too much...

peteralstorp
Author
Contributor
91 replies
9 years ago
June 22, 2016

takashi wrote:

I'm unclear what is wrong in the actual situation. Please explain the situation in more detailed.

Another thought. This may also be resolved with geometric operations, I think it's an advantage of FME.

1. Two VertexCreators: translate each feature to a line segment (z_from, 0) - (z_to, 0).

2. LineJoiner: connect the line segments grouping by attributes except z_*.

3. BoundsExtractor: extract {minimum x, maximum x} of the resulting line as {z_from, z_to}.

Addition. If z_from and z_to aren't increasing in the connecting order, consider using two CoordinateExtractors instead of the BoundsExtractor.

For the record. I just tried this solution and it works fine! Thank you, Takashi. From what I can gather it's not possible to mark two solutions as the correct answer.

jeroenstiers
178 replies
9 years ago
June 23, 2016

peteralstorp wrote:

Hi peteralstorp,

I think I understand what you need. Instead of having 1 attribute to aggregate on, you have two. Is this correct?

If so, the easiest solution is to create a new attribute concatenating both. Doing so will result in the requested behaviour. If you would like to solve this without concatenating, you just have to add three lines of code to (1) store also the second attribute (line 11), (2) check if this second attribute has changed (line 15) and (3) update the stored value (line 24).

Does this answer your question?

.@peteralstorp

peteralstorp
Author
Contributor
91 replies
9 years ago
June 27, 2016

jeroenstiers wrote:

Hi peteralstorp,

I think I understand what you need. Instead of having 1 attribute to aggregate on, you have two. Is this correct?

Does this answer your question?

.@peteralstorp

Of course. As simple as that. Thanks again @jeroenstiers! I am so grateful for FME Knowledge and all its members.

Reply

Rich Text Editor, editor1

Cookie policy

We use cookies to enhance and personalize your experience. If you accept you agree to our full cookie policy. Learn more about our cookies.

Cookie settings

We use 3 different kinds of cookies. You can choose which cookies you want to accept. We need basic cookies to make this site work, therefore these are the minimum you can select. Learn more about our cookies.

Basic
Functional

Normal
Functional + analytics

Complete
Functional + analytics + social media + embedded videos + marketing

SQL: aggregate table data in subgroups

22 replies

Reply

Helpful Members This Week

Recently Solved Questions

RasterExpressionEvaluator Expression to select raster GRAY8 values

FME 2025.1 PythonCaller can't run arcpy?

Tag unknown # features with ID from a previous record

How to set a "reply_to" parameter in flow automation action "email send"

AttributeValidator Pass Nulls

Community Stats

Latest FME

Cookie policy

Cookie settings

Reply

Related Topics

Multiple parameter in PythonCaller scripticon

Fetching a parameter of a Custom Transformer in a PythonCaller (of that CT), results in erroricon

Create Multiple Parameters in Scripted Python Parametersicon

Issues with Python script in FME (working script outside FME)icon

Any update on 'ArcPy "import" statement tries to perform import action?'icon

Helpful Members This Week

Recently Solved Questions

RasterExpressionEvaluator Expression to select raster GRAY8 values

FME 2025.1 PythonCaller can't run arcpy?

Tag unknown # features with ID from a previous record

How to set a "reply_to" parameter in flow automation action "email send"

AttributeValidator Pass Nulls

Popular Tags

Community Stats

Latest FME

Sign up

An FME Account is required to contribute

Login to the community

An FME Account is required to contribute

Scanning file for viruses.

This file cannot be downloaded

Cookie policy

Cookie settings