Solved

HTTPCaller getting data from a form

  • 4 December 2017
  • 6 replies

Hello All,

This is my first time using the Knowledge Center to post a question. I'm a little new to HTTPCaller, though not to FME :). I'm trying to get data from this page (source): http://www1.kaiho.mlit.go.jp/TUHO/keiho/navarea11_en.html

So far I haven't been able to get any data; I'm probably using the wrong method (POST vs. GET). I've done some research and checked similar questions, such as: https://knowledge.safe.com/questions/42386/how-to-get-information-from-website.html

My goal is to download all the records under the "NAVTEXT" section (see attached image) and end up with a result like this: http://www1.kaiho.mlit.go.jp/TUHO/keiho/cgi/disp_warnings.cgi?TYPE=NAVAREA11&TANA=170795&lang=EG

The link above is the "OnClick" response when you click a record in the Title column.
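For anyone reproducing this outside FME: the OnClick link is a plain GET URL with three query parameters, TYPE, TANA and (assuming the last one is `lang`) lang. The Python sketch below assembles it with the standard library; the helper name `warning_url` is hypothetical, and the default values are simply the ones from the example link.

```python
from urllib.parse import urlencode

# Endpoint copied from the example link in the question.
BASE_URL = "http://www1.kaiho.mlit.go.jp/TUHO/keiho/cgi/disp_warnings.cgi"


def warning_url(tana: str, area: str = "NAVAREA11", lang: str = "EG") -> str:
    """Build the detail URL for one warning (hypothetical helper name)."""
    # ASSUMPTION: the third parameter is named "lang".
    # urlencode keeps the dict's insertion order: TYPE, TANA, lang.
    query = urlencode({"TYPE": area, "TANA": tana, "lang": lang})
    return f"{BASE_URL}?{query}"


if __name__ == "__main__":
    print(warning_url("170795"))
```

Letting `urlencode` build the query string avoids hand-concatenating `&` separators, which is exactly where a stray `;` or a mangled `&lang` can creep in.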

I can't pass the values manually because the records change daily. Any ideas on how I can accomplish this?

Thanks in advance for your help

Cesar

Best answer by csuarez 7 December 2017, 14:30

6 replies

Hi @csuarez,

To read the link http://www1.kaiho.mlit.go.jp/TUHO/keiho/cgi/disp_warnings.cgi?TYPE=NAVAREA11&TANA=170795&lang=EG and get its content, I started with a Creator transformer and an HTTPCaller to communicate with the website, using the GET method.

A new attribute _response_body was created:

Then I used the custom transformer HTMLStripper to clean the _response_body attribute.
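Outside FME, the same two steps (a GET request whose body corresponds to _response_body, then removing the markup) can be sketched in Python with the standard library only. This is a stand-in for HTMLStripper, not its actual implementation; the class and function names are hypothetical, and falling back to UTF-8 when the server sends no charset is an assumption.

```python
from html.parser import HTMLParser
from urllib.request import urlopen


class TextExtractor(HTMLParser):
    """Collect the text content of an HTML document, skipping script/style."""

    def __init__(self) -> None:
        super().__init__()  # convert_charrefs=True decodes &amp; etc. for us
        self.parts: list[str] = []
        self._skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.parts.append(data)


def strip_html(html: str) -> str:
    """Drop the tags and return the visible text with whitespace collapsed."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join("".join(parser.parts).split())


def fetch_text(url: str) -> str:
    """GET `url` (the equivalent of _response_body) and strip its markup."""
    with urlopen(url, timeout=30) as response:
        # ASSUMPTION: fall back to UTF-8 if no charset is declared.
        charset = response.headers.get_content_charset() or "utf-8"
        body = response.read().decode(charset, errors="replace")
    return strip_html(body)
```

Joining the text fragments without a separator mimics simply deleting the tags, so `Hello <b>world</b>` stays "Hello world" rather than gaining extra spaces.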

In FME Data Inspector:

Thanks,

Danilo

Hi @danilo_fme, thanks a lot for your answer and the tip on the HTMLStripper. I was actually about to edit my question, as I realized I hadn't explained my issue clearly. My end result is exactly what you have in the Inspector, but to get there (see attached workspace) I have to pass some parameters in the URL:

http://www1.kaiho.mlit.go.jp/TUHO/keiho/cgi/disp_warnings.cgi?TYPE=NAVAREA11&TANA=17 + Counter + &lang=EG

Please note that the "TANA" value starts with 17 (the year) followed by a 4-digit number. I have created a somewhat working version, which creates 10k records, adds a counter and concatenates that attribute with the URL as shown above to extract the value. I'm not happy with this workaround, as it has to create 10k records, process them and then select the ones that actually exist.
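The workaround above (Creator + counter + concatenation) amounts to generating one candidate URL per possible TANA. A Python sketch of that step, assuming the year prefix "17" and the query parameters from the example link (the function name is hypothetical); it also makes the cost visible, since each of the 10,000 candidates means one HTTP request whether or not a warning exists.

```python
from urllib.parse import urlencode

BASE_URL = "http://www1.kaiho.mlit.go.jp/TUHO/keiho/cgi/disp_warnings.cgi"


def candidate_urls(year_prefix: str = "17", count: int = 10_000):
    """Yield one detail URL per candidate TANA: year prefix + 4-digit counter."""
    for counter in range(count):
        # Zero-pad the counter to 4 digits, e.g. "17" + "0795" -> "170795".
        tana = f"{year_prefix}{counter:04d}"
        # ASSUMPTION: TYPE/lang values copied from the example link.
        query = urlencode({"TYPE": "NAVAREA11", "TANA": tana, "lang": "EG"})
        yield f"{BASE_URL}?{query}"
```

Using a generator means the URLs are produced lazily, but it does not reduce the number of requests; only knowing the valid TANA values up front does that.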

The source URL is still http://www1.kaiho.mlit.go.jp/TUHO/keiho/navtex_en.html

Any ideas?

Thanks

Cesar

xi-fme.fmw

Hi @csuarez, it seems that the URL returns valid HTML for any TANA number, so if you want to ignore responses that have no actual content, I think you will have to test the content after parsing the HTML document, unless you know the valid range of TANA.
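One possible way to "test the content" without knowing the page layout is to compare each stripped response against the stripped response for a TANA known not to exist: if the server returns the same empty template, the texts match. This is an assumption about how the site behaves, not something confirmed in this thread, and the function names and sample strings are hypothetical.

```python
def _normalize(text: str) -> str:
    """Collapse runs of whitespace so layout differences don't matter."""
    return " ".join(text.split())


def has_content(stripped_text: str, empty_baseline: str) -> bool:
    """True if a stripped response differs from the 'no warning' template.

    ASSUMPTION: `empty_baseline` is the stripped text the server returns
    for a TANA number known not to exist.
    """
    return _normalize(stripped_text) != _normalize(empty_baseline)


def keep_valid(responses: dict[str, str], empty_baseline: str) -> dict[str, str]:
    """Filter a {TANA: stripped_text} mapping down to responses with content."""
    return {
        tana: text
        for tana, text in responses.items()
        if has_content(text, empty_baseline)
    }
```

If the empty page echoes the requested TANA back, an exact comparison like this will not work, and a check for a specific element of a real warning would be needed instead.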

Also, the StringFormatter or the StringPadder may be useful for creating the 4-digit number with zero padding.

Thanks a lot @danilo_fme and @takashi for your insights. I don't have a way to retrieve the TANA values, which is why I created those 10k records (with the Creator). My main concern is whether it is actually possible to use HTTPCaller to retrieve such data from this link:

http://www1.kaiho.mlit.go.jp/TUHO/keiho/navarea11_en.html

This page shows exactly the records I can (or should) read, so reading it directly would reduce the processing time.

The alternative, going to the backend and requesting the data with the TANA values generated by the Creator transformer, works for sure but takes a long time to process, hence my preference for the initial URL (http://www1.kaiho.mlit.go.jp/TUHO/keiho/navarea11_en.html).

Once again, thank you,

Cesar

Hello All,

After getting a better understanding of how to pass specific values with HTTPCaller, I was able to pass them and retrieve the right values.

The key was to identify the URL of the form that makes the call after the initial URL (http://www1.kaiho.mlit.go.jp/TUHO/keiho/navarea11_en.html) is accessed. Once I had the correct URL, some parameters (Year and Type) had to be uploaded as multipart/form-data. This request retrieved the TANA records, and a second HTTPCaller then calls the URL that contains the TANA records.
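For readers who want the same two-stage flow outside FME, here is a Python sketch: POST the Year and Type fields as multipart/form-data, pull the TANA numbers out of the reply, then build one detail URL per TANA. The thread does not give the form's URL, the exact field names or the reply format, so `FORM_URL`, the field names `Year`/`Type`, and the `TANA=<digits>` pattern are all hypothetical placeholders to adapt after inspecting the real form.

```python
import re
import uuid
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# HYPOTHETICAL: the form's real URL is not given in the thread.
FORM_URL = "http://example.invalid/form-endpoint.cgi"
DETAIL_URL = "http://www1.kaiho.mlit.go.jp/TUHO/keiho/cgi/disp_warnings.cgi"


def encode_multipart(fields: dict[str, str]) -> tuple[bytes, str]:
    """Encode text fields as multipart/form-data; return (body, content type)."""
    boundary = uuid.uuid4().hex
    lines = []
    for name, value in fields.items():
        lines += [
            f"--{boundary}",
            f'Content-Disposition: form-data; name="{name}"',
            "",  # blank line separates part headers from the value
            value,
        ]
    lines += [f"--{boundary}--", ""]  # closing delimiter + final CRLF
    body = "\r\n".join(lines).encode("utf-8")
    return body, f"multipart/form-data; boundary={boundary}"


def extract_tana(html: str) -> list[str]:
    """ASSUMPTION: the reply links to ...TANA=<digits>...; dedupe, keep order."""
    return list(dict.fromkeys(re.findall(r"TANA=(\d+)", html)))


def fetch_tana_list(year: str, warning_type: str) -> list[str]:
    """First call: POST the form fields, then pull TANA numbers from the reply."""
    # HYPOTHETICAL field names, based on "Year and Type" in the post above.
    body, content_type = encode_multipart({"Year": year, "Type": warning_type})
    request = Request(FORM_URL, data=body, method="POST",
                      headers={"Content-Type": content_type})
    with urlopen(request, timeout=30) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        html = response.read().decode(charset, errors="replace")
    return extract_tana(html)


def detail_urls(tanas: list[str]) -> list[str]:
    """Second call's inputs: one detail URL per TANA record."""
    return [
        f"{DETAIL_URL}?" + urlencode({"TYPE": "NAVAREA11", "TANA": t, "lang": "EG"})
        for t in tanas
    ]
```

The point of the design is the same as in the workspace: one form request returns only the TANA values that exist, so the second stage issues a handful of requests instead of 10,000.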

Once that is done (see the attached workspace for reference, for anyone out there), the fun part begins, but that's for another adventure.

Thanks so much @danilo_fme and @takashi for your help on this.

Cesar

xi-fme-v3.fmw

Hi @csuarez

I saw your workspace and the configuration inside the HTTPCaller transformer using MultipartUpload. Great solution!

Thanks,

Danilo