Solved

Read CSV File and split the content into lines and columns


dataman
Contributor

Hey there,

I have to download several CSV files from a website. I do it with the HTTPCaller transformer, and that works perfectly. However, I get ALL the data (several columns and lines) in one "cell":

I need the information in lines and columns, so that I can manipulate the data.

I have tried the PythonCaller, ListExploder, AttributeCreator, ... but it doesn't work.

Do you have any idea how I can get that? I think the problem is the configuration of the HTTPCaller or of the other transformers, but I have tried a lot of things and nothing works.

 

Many thanks in advance!

 


7 replies

nielsgerrits
VIP
  • Best Answer
  • July 18, 2023

Multiple ways:

  • Write the response to a file and read it back. This costs some I/O, but is probably the fastest, and it is the way I prefer. Use the TempPathnameCreator to write to temp files, so that you do not have to clear the files afterwards.
    • Write as CSV using a FeatureWriter, or
    • write as CSV directly from the HTTPCaller.
    • Read the written file using a FeatureReader.
  • Split the attribute into a list and create attributes from the list elements.
    • Split the file into lines based on newline.
    • Split each line into columns based on semicolon.
    • Create attributes from the list elements.
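
For anyone who wants to try the second route in a script (for example inside a PythonCaller), here is a minimal sketch of the same split logic in plain Python. It assumes the response body is available as a string, that the first line holds the column names, and that the delimiter is a semicolon. The function name `split_csv_text` and the sample data are made up for illustration.

```python
import csv
import io


def split_csv_text(body, delimiter=";"):
    """Split one CSV text blob into a list of row dicts keyed by header name."""
    # newline="" leaves line endings untouched so the csv module can
    # handle both \n and \r\n itself.
    reader = csv.DictReader(io.StringIO(body, newline=""), delimiter=delimiter)
    return [dict(row) for row in reader]


# Hypothetical sample standing in for the HTTP response body.
sample = "id;name;value\r\n1;alpha;10\r\n2;beta;20\r\n"
rows = split_csv_text(sample)

for row in rows:
    print(row)
```

Using the `csv` module rather than a plain `str.split` also copes with quoted fields that contain the delimiter, which a naive split on the semicolon would break apart.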

ebygomm
Influencer
  • Influencer
  • July 18, 2023

No need for the FeatureWriter: in the HTTPCaller you can choose to save the response body to a file instead of to an attribute.
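
Outside FME, the file-based route can be sketched in plain Python as follows. This is only an illustration of the idea (write the body to a temporary file, close it, then read it back as CSV), not how the HTTPCaller or FeatureReader work internally. The function name `read_via_temp_file` and the sample body are assumptions, as is the semicolon delimiter.

```python
import csv
import os
import tempfile


def read_via_temp_file(body, delimiter=";"):
    """Write CSV text to a temp file, read it back as rows, then delete the file."""
    fd, path = tempfile.mkstemp(suffix=".csv")
    try:
        # The file is fully written and closed before it is opened for reading.
        with os.fdopen(fd, "w", encoding="utf-8", newline="") as handle:
            handle.write(body)
        with open(path, encoding="utf-8", newline="") as handle:
            return list(csv.reader(handle, delimiter=delimiter))
    finally:
        # Clean up so no temp file is left behind.
        os.remove(path)


# Hypothetical sample standing in for the downloaded response body.
body = "id;name\n1;alpha\n2;beta\n"
table = read_via_temp_file(body)
print(table)
```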


nielsgerrits
VIP

Attached workspace demonstrating this.

 


dataman
Contributor
  • Author
  • Contributor
  • July 18, 2023
nielsgerrits wrote:

Attached workspace demonstrating this.

 

Hello,

Thanks for the answer. If I save the response as CSV with the FeatureWriter and then read it with a FeatureReader, it runs so fast that the information is not read in as attributes... I suppose the reader is too fast and the file has not been saved yet.


dataman
Contributor
  • Author
  • Contributor
  • July 18, 2023

The second way works perfectly. Awesome, thanks!


nielsgerrits
VIP
nielsgerrits wrote:

Attached workspace demonstrating this.

 

You need to expose the attributes manually, or by using import.


nielsgerrits
VIP
dataman wrote:

The second way works perfect. Awesome! thanks

Cheers :)

