First of all, there's an invalid character on lines 59456 and 59473, e.g.:
<notes>IPID - 10762190
SS - 42109, 25614
Actuator Max 30m ? Min 15m</notes>
You might want to fix those first, e.g. with the StringReplacer.
You can then read the XML like this (or similar using the FeatureReader):
Result:
When I try to read it using a FeatureReader I get:
When I check the file in Notepad++ to line 59455 I see some newlines, probably from the source data field SiteNotes.
This corrupts the XML structure which is causing your issue.
You can also check the XML structure using only FME:
- Read the XML as a text file and set the reader parameter to read the whole file at once (entire text in one feature). Make sure you read it with the correct encoding (UTF-8).
- Connect an XMLValidator. The validator will stop at the first error.
Tip: Create two samples of the file for faster iteration while working: one corrupt file and one good file. When it works, just switch to the original big file.
Use the good file to generate the paths from the structure you need with a FeatureReader.
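Outside FME, the same "read everything as UTF-8 text, then stop at the first structural error" check can be sketched in a few lines of Python. This is only an illustration using the standard library parser, not what the XMLValidator does internally; the function name `first_xml_error` is made up for this example.

```python
import xml.etree.ElementTree as ET


def first_xml_error(xml_text):
    """Return None if the text is well-formed XML.

    Otherwise return (line, column, message) for the first error,
    because the parser stops as soon as it hits a problem.
    """
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as err:
        line, column = err.position  # position of the first error
        return line, column, str(err)
    return None


if __name__ == "__main__":
    # Hypothetical sample: the closing </logger> tag is missing.
    broken = "<loggers>\n<logger>\n</loggers>"
    print(first_xml_error(broken))
```

For a real file you would read the text first with an explicit encoding, for example `pathlib.Path(path).read_text(encoding="utf-8")`, and pass the result to the function.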
So you need to replace or remove the incorrect newlines in SiteNotes. One way to do this:
- Remove all correct newlines (not the incorrect ones). StringReplacer, Replace Regular Expression, replace >\n by >.
- Replace all remaining newlines with another, unused character, for example a pipe. This way you are able to restore the newlines from SiteNotes in a later phase of the process. StringReplacer, Replace Regular Expression, replace \n by |.
- Connect an XMLValidator to check if this was the only error.
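The two replacement steps above can be sketched in Python to see what they do to the text. This is a rough equivalent under the assumption that structural newlines always directly follow a `>`; the function name `repair_newlines` and the sample data are hypothetical, and the StringReplacer itself is not involved.

```python
import re
import xml.etree.ElementTree as ET


def repair_newlines(xml_text, placeholder="|"):
    """Drop newlines that follow a tag and mark the rest with a placeholder."""
    # Normalise Windows line endings so only "\n" has to be handled.
    text = xml_text.replace("\r\n", "\n")
    # Step 1: remove the "correct" newlines, i.e. those right after a ">".
    text = re.sub(r">\n", ">", text)
    # Step 2: whatever newline is left sits inside a text value; replace it
    # with a character that does not occur in the data so it can be restored.
    return text.replace("\n", placeholder)


if __name__ == "__main__":
    sample = (
        "<loggers>\n<logger>\n"
        "<notes>IPID - 10762190\nSS - 42109, 25614</notes>\n"
        "</logger>\n</loggers>\n"
    )
    repaired = repair_newlines(sample)
    notes = ET.fromstring(repaired).find("logger/notes").text
    print(notes)                     # value with pipes instead of newlines
    print(notes.replace("|", "\n"))  # restore the newlines later on
```

The placeholder only works if the chosen character never appears in the notes themselves, so it is worth checking the data for pipes before relying on it.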
Now you can write the XML file or just process the corrected file.
- XMLFragmenter. (Create a feature for each logger.) Match loggers/logger
- XMLFlattener. (This can also be done in the XMLFragmenter, but for learning this might be easier to do step by step.) Match logger.
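As a mental model for these two steps, here is a small Python sketch that splits a document into one record per logger and then flattens each record into dotted attribute names. It is only an analogy: the exact attribute names FME produces may differ, and the `id` attribute and `notes` values in the sample are assumptions.

```python
import xml.etree.ElementTree as ET


def flatten(element, prefix=""):
    """Turn an element and its descendants into a flat dict of path -> value."""
    path = f"{prefix}.{element.tag}" if prefix else element.tag
    flat = {}
    for name, value in element.attrib.items():
        flat[f"{path}.{name}"] = value
    text = (element.text or "").strip()
    if text:
        flat[path] = text
    for child in element:
        # Note: repeated child tags overwrite each other in this simple
        # sketch; that is the case where a list is needed instead.
        flat.update(flatten(child, path))
    return flat


def fragment_and_flatten(xml_text):
    """One flat dict per loggers/logger element."""
    root = ET.fromstring(xml_text)  # root is assumed to be <loggers>
    return [flatten(logger) for logger in root.findall("logger")]


if __name__ == "__main__":
    sample = (
        '<loggers><logger id="1"><notes>a</notes></logger>'
        '<logger id="2"><notes>b</notes></logger></loggers>'
    )
    for record in fragment_and_flatten(sample):
        print(record)
```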
Edit: added workspace template
parsexml.fmwt
Thank you - the invalid character was created when I copied the xml into notepad.
That's incredibly helpful - thank you so much for your detailed response!
You're welcome, but the real issue seems to have been the invalid character, as @david_r points out. Reading the file as UTF-8 text solved the issue; it also works when you remove the StringReplacers. Facepalm.
Thank you.
My next question is how do I get nested XML? Using @david_r's method of reading in at the logger level, I now need to get the messages associated with each logger, but only the most recent message (the one with the highest id).
FME doesn't seem to expose all the message ids when I just use the <logger> as the element to match.
I've attached another file.
xml2.txt
The messages are there, but the standard behavior in FME is to either create an attribute or a list in each "logger" feature depending on the number (cardinality) of "message" objects. If there is only one "message", then the object is output as regular attributes, but if there are several "message" objects per "logger", then a list is output. This is a bit cumbersome because you will have to treat the two cases differently, unless you always have multiple messages per logger.
However, it is possible to tell FME to always use a list for a specific element. Here's an example of how to force the "messages" to be a list:
In the dialog "XML Flatten options" you will have to toggle advanced mode and type the following to specify the cardinality of the message objects:
cardinality="*/messages/message{}/+ /+"
This means that all "message" objects inside the "messages" element should be rendered as an FME list regardless of the number. You can then use a ListExploder to get all the messages per logger:
For reference, here's the relevant part of the documentation: https://docs.safe.com/fme/html/FME_Desktop_Documentation/FME_ReadersWriters/xml/structure_element.htm
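To illustrate why always treating the messages as a list simplifies things, here is a small Python sketch that picks the message with the highest id per logger. It is not FME code, and it assumes that both logger and message carry a numeric `id` attribute, which may not match the actual file; the function name is made up for this example.

```python
import xml.etree.ElementTree as ET


def latest_message_per_logger(xml_text):
    """Map each logger id to the highest message id found under it."""
    root = ET.fromstring(xml_text)
    latest = {}
    for logger in root.iter("logger"):
        # findall always returns a list, whether there are 0, 1 or many
        # messages, so there is no special case for a single message.
        messages = logger.findall("messages/message")
        if not messages:
            continue
        # Compare ids as integers: as strings, "3" would sort above "12".
        newest = max(messages, key=lambda m: int(m.get("id")))
        latest[logger.get("id")] = int(newest.get("id"))
    return latest


if __name__ == "__main__":
    # Hypothetical sample data with assumed id attributes.
    sample = (
        '<loggers>'
        '<logger id="A"><messages><message id="3"/><message id="12"/></messages></logger>'
        '<logger id="B"><messages><message id="7"/></messages></logger>'
        '</loggers>'
    )
    print(latest_message_per_logger(sample))
```

In a workspace, the rough counterpart would be exploding the list and then keeping the feature with the highest id per logger.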
That's very helpful - thank you. I may be back with more questions tho (sorry in advance)
Hi @david_r,
Looks like I'm going to have to parameterize my API call in order to fire a list of values into the URL string. I can't get this to work with a FeatureReader - do I have to use an HTTPCaller and then use the XMLFragmenter?
It should work just fine using the FeatureReader. Consider posting a new question (for better visibility) and also post some screenshots / relevant bits from the log. That way you'll get more eyes on your question.