Solved

Extract text from (.json) to one single text / csv file


Badge

Hello Sir / Madam,

I have many files (more than a hundred of files in .json format - text). I would like to extract certain text from each of the .json files and put all the text in a single text / csv file for further study.

I am just wondering if it is possible to do it on FME? Or what transformer I would use?

Here is what I really want to get for my report.

File name; errors number (spacing for "no error"; it could even better if I can output "no error" instead of spacing)

Output sample as follow:

banana_report.json;102;203;204;303;305;306;307;

a_report.json; ;

My .json file sample is uploaded for your reference. Basically, the information that I need is at the "all_errors: []"

Hope somebody would give me a hand on this issue. I am really appreciate that.

Thank you very much!

Zach

icon

Best answer by chrisatsafe 9 June 2022, 18:02

View original

8 replies

Userlevel 4
Badge +30

Hi @zacharylee1204​ 

 

Please, could yu share your file?

Badge +2

Hi @zacharylee1204​ ,

You should be able to do this with the JSONExtractor using the following JSON Query:

json["all_errors"]

This will extract the JSON Array into a single attribute which you can then format as you wish. For example, if you use a StringPairReplacer with the following expression, it will format the array as semicolon delimited:

[ "" ] "" , ;

Here is the workspace I tested with (attached). I exposed fme_dataset in the reader parameters and passed that into the FilenamePartExtractor and concatenated the filename and errors into the text_line_data attribute in the AttributeManager. Hope that helps.image

Badge

Hi @zacharylee1204​ ,

You should be able to do this with the JSONExtractor using the following JSON Query:

json["all_errors"]

This will extract the JSON Array into a single attribute which you can then format as you wish. For example, if you use a StringPairReplacer with the following expression, it will format the array as semicolon delimited:

[ "" ] "" , ;

Here is the workspace I tested with (attached). I exposed fme_dataset in the reader parameters and passed that into the FilenamePartExtractor and concatenated the filename and errors into the text_line_data attribute in the AttributeManager. Hope that helps.image

Hi @chrisatsafe​ , I am really really appreciate your prompt response. Thank you so much. It works fine on your text. However, I am not sure why my .json file doesn't work as planned. It just straightly go to Rejected on JSONExtractor. I have attached my document in here. Would you mind to take a look for me?  Thank you!

Also, I have a hard time to to understand the coding that your wrote.

json["all_errors"]   

Does it means  everything inside the "all_errors": [] will be extracted?

[ "" ] "" , ;

May I ask what exactly all these symbols mean? 

Sorry for all the questions. But it would be great if I can get it work and understand it at the same time.

Again, I am really appreciated your help! :)​

Badge

Hi @chrisatsafe​ , I am really really appreciate your prompt response. Thank you so much. It works fine on your text. However, I am not sure why my .json file doesn't work as planned. It just straightly go to Rejected on JSONExtractor. I have attached my document in here. Would you mind to take a look for me? Thank you!

Also, I have a hard time to to understand the coding that your wrote.

json["all_errors"]

Does it means everything inside the "all_errors": [] will be extracted?

[ "" ] "" , ;

May I ask what exactly all these symbols mean?

Sorry for all the questions. But it would be great if I can get it work and understand it at the same time.

Again, I am really appreciated your help! :)​

Oh, sorry. I have another report that has no error in it. The report shows "all_errors" :[]

Is it still possible to show it in the report like this?

 

a_report.json;0

 

0 means that no error in it. It is just easier for me to see that in excel for analyzing purpose.

 

Sorry for all the questions. I am not good at programming. So it would be great if you can give me a hand.

 

Again, thank you.

 

Zach

Badge +2

Hi @chrisatsafe​ , I am really really appreciate your prompt response. Thank you so much. It works fine on your text. However, I am not sure why my .json file doesn't work as planned. It just straightly go to Rejected on JSONExtractor. I have attached my document in here. Would you mind to take a look for me? Thank you!

Also, I have a hard time to to understand the coding that your wrote.

json["all_errors"]

Does it means everything inside the "all_errors": [] will be extracted?

[ "" ] "" , ;

May I ask what exactly all these symbols mean?

Sorry for all the questions. But it would be great if I can get it work and understand it at the same time.

Again, I am really appreciated your help! :)​

Hi @zacharylee1204​ ,

Good question - I used a text file reader with the parameters set to read whole file at once to read in the JSON file in this case because you only care about that array. I prefer using the Text reader in this case because of how the array is formatted. If you have more needs, you could use the JSON reader as well and concatenate the array. In this example, the JSONExtractor is being used to take all of the array values and put them into a single attribute value (saving you a concatenating step).

 

Regarding your other questions:

json["all_errors"]

Does it means everything inside the "all_errors": [] will be extracted?

Yes, that is a perfect description of what is happening.

[ "" ] "" , ;

May I ask what exactly all these symbols mean?

This is a good question as the StringPairReplacer is likely new to you. The StringPairReplacer works by first stating the character you'd like to find followed by the character you'd like to replace it with (separated by a space). So in this case, we're searching for the square bracket [ and replacing it with nothing "". Similarly, we're replacing the closing square bracket with nothing and replacing commas , with semicolons ;

There are more examples in the linked documentation from the previous post as well.

 

The report shows "all_errors" :[]

Is it still possible to show it in the report like this?

Yes, this is totally possible by using conditional values the AttributeManager. ConditionalValue

Badge

@chrisatsafe​ You are really my life saver! I got it done, finally! Thank you so much for your help. I would like to buy you a dinner if I see you in person! :) And now I understand the concept much better!

 

There is one more thing I would like to add for my program though. And I need some guidance... again

As you can see for my banana_report.json, there are "code" states the error number. What I would like to do now is to extract all "code" and count the total number of the code.

 

For instance, code 102 might happen 10 times, code 203 might happen 5 times.

 

I tried "code" in JSONExtractor and it seems it cannot output all the codes as I wanted.

 

Thank you and sorry for so mannnnnnyyyyy questions.

 

Again, I am really appreciated your help.

 

Thank you! :)

Badge +2

@chrisatsafe​ You are really my life saver! I got it done, finally! Thank you so much for your help. I would like to buy you a dinner if I see you in person! :) And now I understand the concept much better!

 

There is one more thing I would like to add for my program though. And I need some guidance... again

As you can see for my banana_report.json, there are "code" states the error number. What I would like to do now is to extract all "code" and count the total number of the code.

 

For instance,  code 102 might happen 10 times, code 203 might happen 5 times. 

 

I tried "code" in JSONExtractor and it seems it cannot output all the codes as I wanted.

 

Thank you and sorry for so mannnnnnyyyyy questions.

 

Again, I am really appreciated your help.

 

Thank you! :)

Hi @zacharylee1204​ ,

Glad it helped. 

For counting the error codes, you could use a JSONFragmenter and a StatisticsCalculator. In the JSONFragmenter, set the JSON Query to:

json["features"][*]["primitives"][*]["errors"][*]

Set flatten query results into Attributes to Yes and type in "code" in the Attributes to Expose parameter. Next connect the JSONFragmenter to a StatisticsCalculator. Set the group by to code and calculate the total count for the code attribute.

imageYou could also use a JSON Reader for this, but since you've already read the JSON in as a text file, it's easier to fragment it with the JSONFragmenter (IMO). If you want to learn more about constructing the JSON Queries, add a JSON reader to the canvas, open the Reader parameters and set the Schema Scan Mode to JSON Query as it will show the tree structure.

imageThere's also a lot of great JSON specific tips in this article.

Badge

Hi @zacharylee1204​ ,

Glad it helped. 

For counting the error codes, you could use a JSONFragmenter and a StatisticsCalculator. In the JSONFragmenter, set the JSON Query to:

json["features"][*]["primitives"][*]["errors"][*]

Set flatten query results into Attributes to Yes and type in "code" in the Attributes to Expose parameter. Next connect the JSONFragmenter to a StatisticsCalculator. Set the group by to code and calculate the total count for the code attribute.

imageYou could also use a JSON Reader for this, but since you've already read the JSON in as a text file, it's easier to fragment it with the JSONFragmenter (IMO). If you want to learn more about constructing the JSON Queries, add a JSON reader to the canvas, open the Reader parameters and set the Schema Scan Mode to JSON Query as it will show the tree structure.

imageThere's also a lot of great JSON specific tips in this article.

Hi @chrisatsafe​ , it works perfect! Thank you for such a thorough answer. I understand the JSON concept much better now. Again, thank you sooooooooooooooo much! :)

Reply