Solved

How to read live data streams in text files



I am wondering if anyone has any suggestions on using FME to process live data streams that are funneled into various text files, server log files being one example. Thanks!


Best answer by fmelizard 2 February 2018, 18:18


10 replies


Hi @bo,

You can use WebSockets to capture these messages/data streams. Here are some resources to get you started:

Hopefully that helps; let us know if you need any more information.

-Liz
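
As a rough illustration of what the WebSocket approach could look like on the producing side, a small script might follow the log file and push each appended line to a WebSocket endpoint. This is a sketch only, assuming the Python websockets package; the log path and endpoint URL below are hypothetical placeholders, not a documented FME endpoint:

```python
# Sketch only: follow a log file and push each appended line over a WebSocket.
# Assumes the Python "websockets" package; the log path and endpoint URL are
# hypothetical placeholders, not a documented FME endpoint.
import asyncio
import websockets

LOG_PATH = "/var/log/app/server.log"
WS_URI = "ws://example-server:7078/websocket"

async def tail_and_send():
    async with websockets.connect(WS_URI) as ws:
        with open(LOG_PATH, "r") as f:
            f.seek(0, 2)            # jump to the end: only new lines matter
            while True:
                line = f.readline()
                if line:
                    await ws.send(line.rstrip("\n"))
                else:
                    await asyncio.sleep(0.5)  # wait for more data

asyncio.run(tail_and_send())
```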


Thanks Liz. In this particular case, we have a log file that is constantly being appended to by other processes, and we need to use that log file as the data source. The question then becomes: how can we open the log file for reading as a stream?

Hi @bo,

If you have access to FME Server, you can create a schedule to run your workspace and grab the appended data. You can set the schedule to run until cancelled, so your output dataset will be continually updated from the log file.

Let me know if I'm not on the right track with what you're asking.
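
For repeated scheduled runs to pick up only what was appended since the previous run, one option is to persist the last-read byte offset between runs. A minimal sketch, assuming plain Python (for example inside a PythonCaller) and hypothetical file paths:

```python
# Sketch only: one scheduled run that reads just the bytes appended since the
# previous run, by persisting the last-read offset in a sidecar file. Paths
# are hypothetical; in FME this logic could sit inside a PythonCaller.
import os

LOG_PATH = "/var/log/app/server.log"
OFFSET_PATH = LOG_PATH + ".offset"

def read_new_lines():
    offset = 0
    if os.path.exists(OFFSET_PATH):
        with open(OFFSET_PATH) as f:
            offset = int(f.read().strip() or 0)

    if os.path.getsize(LOG_PATH) < offset:
        offset = 0                   # log was rotated or truncated: start over

    with open(LOG_PATH, "rb") as f:  # binary mode so offsets are real bytes
        f.seek(offset)
        new_lines = [b.decode("utf-8", errors="replace") for b in f]
        new_offset = f.tell()

    with open(OFFSET_PATH, "w") as f:
        f.write(str(new_offset))
    return new_lines

for line in read_new_lines():
    print(line, end="")              # hand each appended line to the workspace
```

The rotation check matters: without it, a truncated log would leave the stored offset pointing past end-of-file and nothing would ever be read again.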

Hi @lizsanderson, this process could work even though it is not a "real-time" stream. I wonder if we could efficiently grab only the newly appended data since the last read?

 


Hi @bo,

This article, Directory Watch Publisher with Idle Time Delay (Advanced) (2017), updates your output file by grabbing only the newly appended data. It works on a Modify trigger, and the file can be local or within FME Server. The example writes to Google Fusion Tables, but ultimately the data can be written to any format.

I'm going to pass this question on to one of our experts to see if they have a better solution for you.
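
As a rough sketch of what the Directory Watch "Modify trigger" pattern does, here is the same idea expressed with the Python watchdog package; this is not the workflow from the article, and the paths are hypothetical:

```python
# Sketch only: the Directory Watch "Modify trigger" pattern, illustrated with
# the Python watchdog package rather than FME Server. Each Modify event reads
# just the bytes appended since the last event. Paths are hypothetical.
import time
from watchdog.observers import Observer
from watchdog.events import FileSystemEventHandler

LOG_DIR = "/var/log/app"
LOG_PATH = LOG_DIR + "/server.log"

class AppendHandler(FileSystemEventHandler):
    def __init__(self, path):
        self.path = path
        self.offset = 0             # byte position of the last read

    def on_modified(self, event):
        if event.src_path != self.path:
            return
        with open(self.path, "rb") as f:
            f.seek(self.offset)
            for raw in f:           # only the newly appended lines
                print(raw.decode("utf-8", errors="replace"), end="")
            self.offset = f.tell()

observer = Observer()
observer.schedule(AppendHandler(LOG_PATH), path=LOG_DIR)
observer.start()
try:
    while True:
        time.sleep(1)               # keep the watcher alive
finally:
    observer.stop()
    observer.join()
```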

 


Hi,

If you read in the latest file on a frequent schedule, and also keep a copy of the log file in its previous state (from the last time you read it, reading it in with a feature for every line), you can use a DuplicateFilter transformer to process only the new, unique lines added to the log file. You'd have to run the workspace via a schedule very frequently, so it wouldn't quite be live streaming, but fairly close; I don't know of a way to truly live stream a text file. A sketch of this idea follows below.
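
A minimal sketch of that snapshot-and-compare approach in plain Python, with hypothetical paths (inside FME, the DuplicateFilter transformer plays the role of the set lookup):

```python
# Sketch only: the snapshot-and-compare idea in plain Python. We diff the
# current file against a saved copy of its previous state and keep only the
# lines not seen before, much as DuplicateFilter would. Paths are hypothetical.
LOG_PATH = "/var/log/app/server.log"
SNAPSHOT_PATH = LOG_PATH + ".snapshot"

try:
    with open(SNAPSHOT_PATH) as f:
        seen = set(f)               # one entry per previously seen line
except FileNotFoundError:
    seen = set()                    # first run: nothing seen yet

with open(LOG_PATH) as f:
    current = f.readlines()

new_lines = [line for line in current if line not in seen]

with open(SNAPSHOT_PATH, "w") as f:
    f.writelines(current)           # current state becomes next run's snapshot

for line in new_lines:
    print(line, end="")             # the lines DuplicateFilter would pass through
```

Note that, keyed on the whole line, this drops genuinely repeated log entries, and it rereads the entire file each run, so it can get slow as the log grows.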
Thank you @jlutherthomas for sharing your approach. I agree that the scheduling-based approach is less than "live." @lizsanderson just suggested that using a directory watch is a better pattern in this case. I'm also concerned about the potential latency of using DuplicateFilter, especially when the event frequency is high and the log file gets big...

@lizsanderson, the directory watcher should help and is better than scheduling! I read the article briefly, but I'm not seeing how the newly appended data is captured and sent to the target stream. Instead, it appears to me that the article is talking about taking a snapshot at a time interval for backup...

 


Hi @bo,

The Directory Watch has a minimum poll interval of 1 minute, whereas a schedule can run every second if needed. Another option is to use a looping custom transformer to re-read the file over and over again in an endless cycle (it would require an engine, or workspace, to run constantly, though). I put one together which you can test out.

textfilestreamreader.fmx
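
For a sense of what the looping transformer does (the attached .fmx is the actual implementation), here is a rough plain-Python sketch of an endless re-read loop; the path and poll interval are hypothetical:

```python
# Sketch only: the endless re-read loop that the looping custom transformer
# performs, expressed in plain Python. It keeps the file open, remembers its
# position, and polls forever. Path and poll interval are hypothetical.
import time

LOG_PATH = "/var/log/app/server.log"
POLL_SECONDS = 1.0

with open(LOG_PATH, "r") as f:
    f.seek(0, 2)                    # start at the end of the file
    while True:
        line = f.readline()
        if line:
            print(line, end="")     # forward the appended line downstream
        else:
            time.sleep(POLL_SECONDS)  # nothing new yet; poll again
```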


@MattAtSafe, thanks for providing the custom transformer. I don't fully understand the workspace at this moment, but it gives us a great start!

 
