
I just ran a workspace in Workbench 2019 to analyse software concurrent usage data. Basically the dataset contains a feature for each instance a user fires up a piece of software and includes a start and end time. The dataset only covers 4 or so weeks and has 55,466 features...not huge by any means.

To solve this problem I plotted points for each start time and lines by using the end time as the 2nd coordinate (y=0, X1=Start, X2=End). Essentially this produces a Gantt chart with all lines and points on the X axis. Then I use the PointOnLineOverlayer to find how many lines lie under each start point, which gives the number of users at each point in time. Sorting for that count then gives the maximum concurrent usage.
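For readers who want to see the counting logic outside of Workbench, here is a minimal Python sketch of the same idea: for each start time, count how many sessions cover it, then take the maximum. The sample sessions and the function name `concurrent_at_starts` are made up for illustration, and this brute-force version compares every start against every session, just as the overlay does conceptually.

```python
from datetime import datetime

# Hypothetical sample sessions: (user, start, end)
sessions = [
    ("alice", datetime(2019, 5, 1, 9, 0), datetime(2019, 5, 1, 11, 0)),
    ("bob", datetime(2019, 5, 1, 10, 0), datetime(2019, 5, 1, 12, 0)),
    ("carol", datetime(2019, 5, 1, 10, 30), datetime(2019, 5, 1, 10, 45)),
    ("dave", datetime(2019, 5, 1, 13, 0), datetime(2019, 5, 1, 14, 0)),
]


def concurrent_at_starts(sessions):
    """For each session start, count the sessions whose [start, end] covers it."""
    counts = []
    for _, start, _ in sessions:
        # A session "lies under" the start point if it began at or before
        # that time and had not yet ended.
        n = sum(1 for _, s, e in sessions if s <= start <= e)
        counts.append((start, n))
    return counts


counts = concurrent_at_starts(sessions)
# The maximum concurrent usage is the largest count over all start points.
peak_time, peak = max(counts, key=lambda c: c[1])
print(peak_time, peak)
```

The peak can only occur at a start time, which is why checking the start points alone is enough.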

The problem however is that this transformer splits each line by the point above it FIRST, before it processes the points output. So I ended up with 31,829,560 new line features that I really don't need! As a consequence, it took 5 hours & 14 minutes to run the workspace, with the biggest chunk wasted on things I didn't need. I only needed a list of some (original) line attributes assigned to the intersecting points.

Is there an alternative? Maybe this is a good use case for an idea post on enhancing this transformer to toggle the line-splitting functionality/output if not needed?...thoughts welcome! Thanks.

I'm not sure if I understand the situation exactly, but possibly the SpatialRelator could be a workaround.


Hello @takashi, thanks...will take a look. In the meantime, here's a small graphic from my initial testing with a very small dataset to explain what I'm doing. In this example I'm showing a Y offset between lines, but I'm plotting them all on top of each other. I'm interested in only finding which lines lie below each start point of each line and have no need, nor interest, in splitting the lines. Thanks.



Thanks a lot @takashi! I didn't know about that transformer and it does exactly what I need :) I connected the points to the Requestor input, the lines to the Supplier input and used the spatial predicate "Requestor Intersects Supplier" to get the result.



I have to say though, it's not as fast as I thought it would be. I narrowed down the dataset to 45,514 features but it's going through very slowly. I thought it would be much faster (although I understand it has to go through 45,514 x 45,514 comparisons...2,071,524,196!)


If I understood the requirement correctly, a database operation could also be an alternative, and it might be much faster than the spatial approach.

That is, once you extract the minimum x and the maximum x for each line (e.g. _xmin, _xmax), you can use the InlineQuerier with this SQL Query to get the number of concurrent lines for each starting point (= _xmin). In this example, the resulting "n" would represent the number of concurrent lines for each starting point.

SQL Query

select
    a.*,
    (select count(*) from lines as b
    where a._xmin between b._xmin and b._xmax) as n
from lines as a
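To try the query outside of FME, the following sketch runs it against an in-memory SQLite database from Python. It assumes SQLite semantics for the query, and the `id` column and the four sample rows are invented for illustration; only `_xmin`, `_xmax` and the table name `lines` come from the query above.

```python
import sqlite3

# In-memory database standing in for the table the query reads from.
conn = sqlite3.connect(":memory:")
conn.execute("create table lines (id text, _xmin real, _xmax real)")
conn.executemany(
    "insert into lines values (?, ?, ?)",
    [("a", 0, 10), ("b", 5, 15), ("c", 8, 9), ("d", 20, 30)],  # hypothetical data
)

# Same query as above: for each line, count the lines whose extent
# contains this line's starting x.
query = """
select
    a.*,
    (select count(*) from lines as b
     where a._xmin between b._xmin and b._xmax) as n
from lines as a
"""
rows = conn.execute(query).fetchall()

# Map each line id to its concurrency count n (the last column).
result = {row[0]: row[3] for row in rows}
print(result)
```

Note that `between` is inclusive at both ends, so a line that ends exactly when another starts is counted as concurrent with it.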



If you are just interested in what's "around" something, the NeighborFinder is an amazing transformer as well. As far as performance goes, you will see massive benefits if you can set a Group By in any of the transformers that handle a lot of data.

If possible you might also look at spatial sorting, which allows data in the "same area" to share a Group By parameter if you do not have one already. Also, the TopologyBuilder is an amazing tool that will compute spatial interactions and is extremely fast as well.


Thanks for the tips :) I will do some more testing and post back. So far I found a couple of issues with speed that are worth noting. In this case, feature caching has quite an impact and disabling it speeds things up a bit. Using the SpatialRelator as @takashi suggested seems to be the most promising.

Lastly, I was performing "cleanup" of the data after the "concurrent" data is computed. There are several instances that need to be removed (i.e. multiple instances of the software that I'm tallying running at the same time, or from the same software suite/different version, do not consume an additional license). I was using a ListDuplicateRemover and ListElementCounter to address this, but then figured I need to do things more efficiently and remove duplicates before processing.

So I added an Intersector and LineCombiner before processing the data, grouping by an attribute I added (a concatenation of the username and machine name) so that any overlapping lines are combined into one instead. This actually cut the data to be spatially analysed in half. I then used a CoordinateExtractor and VertexCreator to compare the line start points to the lines themselves using the SpatialRelator, and now I get a result in about 35 minutes, which is a definite improvement.

The problem is that most points are too far away to intersect, and if I can find a way to not bother processing those obvious ones, it would go much, much faster than checking all points against all lines. Oh and lastly...a plugged-in laptop works faster than one on battery power...duh! Thanks again.
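As a rough illustration of the two ideas in this post, here is a Python sketch (not the FME workflow itself). `merge_per_key` plays the role of the Intersector + LineCombiner step grouped by a user/machine key, and `max_concurrent` avoids comparing every point with every line by working on sorted start and end values with binary search. Both function names, the key format and the sample numbers are assumptions made for the example.

```python
from bisect import bisect_left, bisect_right
from collections import defaultdict


def merge_per_key(sessions):
    """Merge overlapping (key, start, end) sessions that share the same key."""
    by_key = defaultdict(list)
    for key, start, end in sessions:
        by_key[key].append((start, end))
    merged = []
    for key, spans in by_key.items():
        spans.sort()
        cur_start, cur_end = spans[0]
        for start, end in spans[1:]:
            if start <= cur_end:
                # Overlaps the current span: extend it instead of adding a new one.
                cur_end = max(cur_end, end)
            else:
                merged.append((key, cur_start, cur_end))
                cur_start, cur_end = start, end
        merged.append((key, cur_start, cur_end))
    return merged


def max_concurrent(intervals):
    """Peak number of intervals active at any start time, via sorted lists."""
    starts = sorted(s for _, s, _ in intervals)
    ends = sorted(e for _, _, e in intervals)
    best = 0
    for t in starts:
        # Active at t = intervals started at or before t, minus those ended before t.
        active = bisect_right(starts, t) - bisect_left(ends, t)
        best = max(best, active)
    return best


# Hypothetical sessions keyed by "username@machine"; times are plain numbers.
sessions = [
    ("alice@pc1", 0, 10),
    ("alice@pc1", 5, 12),   # overlaps alice's first session: same license
    ("bob@pc2", 8, 20),
    ("carol@pc3", 9, 15),
    ("alice@pc1", 30, 40),
]
merged = merge_per_key(sessions)
print(max_concurrent(sessions), max_concurrent(merged))
```

With sorting plus binary search the work grows roughly as n log n rather than n squared, which is the kind of saving the "don't bother with far-away points" remark is after.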


Thanks for this alternative method! For some reason my numbers are not matching, so I'll need to take a closer look. As for speed, it doesn't seem faster than the SpatialRelator. The major drawback is that I cannot create (or don't know how to create!) a list of other important attributes at each computed point. This is something I wanted to have, and it is easily done with spatial transformers. I will do some more testing with this approach though.
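One possible way to carry attributes along in the SQL approach, offered as an untested suggestion for the FME side: join the table to itself and aggregate the matching rows with SQLite's `group_concat`. The sketch below demonstrates it in Python with an in-memory SQLite database. The `username` column and the sample rows are hypothetical, and whether the result can then be turned into an FME list (for instance by splitting the comma-separated string) is an assumption I have not verified.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table lines (username text, _xmin real, _xmax real)")
conn.executemany(
    "insert into lines values (?, ?, ?)",
    [("alice", 0, 10), ("bob", 5, 15), ("carol", 8, 9), ("dave", 20, 30)],
)

# Self-join: pair each starting point (a) with every line (b) that covers it,
# then collapse the pairs back to one row per starting point.
query = """
select
    a.username,
    a._xmin,
    count(*) as n,
    group_concat(b.username) as users
from lines as a
join lines as b
    on a._xmin between b._xmin and b._xmax
group by a.rowid
order by a._xmin
"""
rows = conn.execute(query).fetchall()

# group_concat does not guarantee an order, so compare the names as a set.
by_user = {r[0]: (r[2], set(r[3].split(","))) for r in rows}
for name, (n, users) in by_user.items():
    print(name, n, sorted(users))
```

The `n` column here should match the count from the subquery version, since every starting point is covered by at least its own line.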