Solved

Parsing a W3C Extended Log File



My goal is to read log files from Amazon S3 and parse them into a database.

The log files look like this:

#Version: 1.0

#Fields: date time x-edge-location sc-bytes c-ip cs-method cs(Host) cs-uri-stem sc-status cs(Referer) cs(User-Agent) cs-uri-query cs(Cookie) x-edge-result-type x-edge-request-id x-host-header cs-protocol cs-bytes time-taken x-forwarded-for ssl-protocol ssl-cipher x-edge-response-result-type cs-protocol-version fle-status fle-encrypted-fields c-port time-to-first-byte x-edge-detailed-result-type sc-content-type sc-content-len sc-range-start sc-range-end

2020-07-03 13:39:54 LHR62-C3 1571 148.00.00.00 GET d36on651kzt577.cloudfront.net / 200 https://URL/2020/07/02/fooo/ Mozilla/5.0%20(Windows%20NT%2010.0;%20Win64;%20x64)%20AppleWebKit/537.36%20(KHTML,%20like%20Gecko)%20Chrome/83.0.4103.116%20Safari/537.36 - - Hit gIanUGtmvquSunAiRJFbhFdPbexwpIV2DbtYUJ7XtVOKZopkUl1uEw== foo.com https 427 0.001 - TLSv1.2 ECDHE-RSA-AES128-GCM-SHA256 Hit HTTP/2.0 - - 58606 0.001 Hit text/html;%20charset=utf-8 1234 - -
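For readers who want to check the header-to-column mapping outside the workspace, here is a minimal Python sketch of how a file in this layout could be parsed. It assumes the `#Fields:` directive lists space-separated names and that data rows are tab-delimited, as described in this post. The function name `parse_w3c_log` is hypothetical and not part of any FME or AWS API.

```python
from typing import Dict, Iterable, List


def parse_w3c_log(lines: Iterable[str]) -> List[Dict[str, str]]:
    """Parse W3C extended log lines into one dict per data row."""
    fields: List[str] = []
    records: List[Dict[str, str]] = []
    for line in lines:
        line = line.rstrip("\r\n")
        if not line:
            continue
        if line.startswith("#"):
            # Directive line, e.g. "#Version: 1.0" or "#Fields: date time ...".
            name, _, value = line[1:].partition(":")
            if name.strip().lower() == "fields":
                fields = value.split()
            continue
        # Assumption: data rows are tab-delimited.
        values = line.split("\t")
        records.append(dict(zip(fields, values)))
    return records
```

Each returned dict is keyed by the names from the `#Fields:` line, so the records can be handed to whatever database loader is in use.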

 

I am downloading the files locally and then reading them with a FeatureReader in CSV format. I have set the Dataset Parameters to tab-delimited, with no field names line, and to read data from line 3 onward.

Is it possible for me to set the field headings manually? I can see the data at the "Generic" output port, and if I inspect it and show all columns I can see that col0 to col32 have been read in correctly. But I cannot figure out how to expose these columns so I can rename them and load them into the database.

 

To add a bit more information here is my flow

 

 

The files downloaded from S3 are in GZIP format.

I could not find a way of getting the full path to the downloaded file to pass to the gzip decompressor, so I had to build it by joining the root path and the file name. As a result, this appears to stop the FeatureReader from exposing the columns it reads. I need a way of telling it these manually.

I have confirmed that the dynamic file name is the problem.

If I select a single GZ file for the FeatureReader, the columns are available.
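As an illustration of the path-building and decompression step only, here is a minimal Python sketch. It assumes a known download folder and a file name reported by the download step. `read_gzip_lines`, `root_dir` and `file_name` are hypothetical names, and the sketch does not reproduce how FME handles the dataset path.

```python
import gzip
import os
from typing import Iterator


def read_gzip_lines(root_dir: str, file_name: str) -> Iterator[str]:
    """Yield text lines from a gzip-compressed log file."""
    # Join the download folder and the file name to get the full path.
    path = os.path.join(root_dir, file_name)
    # "rt" opens the compressed stream in text mode.
    with gzip.open(path, mode="rt", encoding="utf-8") as handle:
        for line in handle:
            yield line.rstrip("\r\n")
```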


3 replies

debbiatsafe
Safer
  • Best Answer
  • July 7, 2020

Hi @davebarter

The FeatureReader should already have exposed the coln attributes (col0, col1, and so on) on features output from the CSV port. You should then be able to use an AttributeManager or AttributeRenamer to rename them.

AttributeManager may be easier as you will not have to add or import column names manually as you would with the AttributeRenamer.
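If it helps to prepare the list of old and new names before entering them in either transformer, the mapping can be derived from the `#Fields:` line. This is a hedged Python sketch: `build_rename_map` is a hypothetical helper, and it assumes the positional attributes are named `col0`, `col1`, and so on, as reported in the question.

```python
from typing import Dict


def build_rename_map(fields_line: str, prefix: str = "col") -> Dict[str, str]:
    """Map positional names (col0, col1, ...) to the names in a #Fields line."""
    # Drop the "#Fields:" label and keep the space-separated field names.
    _, _, names = fields_line.partition(":")
    return {f"{prefix}{i}": name for i, name in enumerate(names.split())}
```

Applied to the header shown in the question, which lists 33 field names, this yields entries for col0 through col32.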


  • Author
  • July 8, 2020
See the additional information above. These attributes are not being exposed.


debbiatsafe
Safer
There are two different methods to try depending on which output port you want features to be output from.

Since you are reading from CSV, and the CSV reader has the option of taking the feature type name from the format name (CSV parameters dialog > Dataset Parameters > Feature Type Name(s)), you can create an output port named CSV (Output > Output Ports = One per Feature Type).

To expose the attributes on this port, select one of your W3C log files in the Generating Output Ports dialog that appears after you select OK in the FeatureReader parameters window.

You should now see a CSV port with coln attributes.

 

Alternatively, you can use the Generic port but manually expose the required attributes using the FeatureReader dialog (Output > Attribute and Geometry Handling > <Generic> Port).

