Hi @checcosisani, you can just enter those CSS selector strings in the "CSS Selector" column of the Extract Queries table in the HTMLExtractor parameters, provided that "body" stores an HTML document containing the elements the CSS selectors point to.
For example:
Hi Takashi,
thanks for the quick reply, but unfortunately it doesn't work.
I've sent you the workbench with the URL.
I've tried a lot of combinations of CSS queries without success.
In the end, the goal is also to obtain a list of data.
Thanks,
Francesco
test_css_tmp.fmw
I looked at the value of the "body" attribute (the source HTML text) entering HTMLExtractor_5 in your workspace and found that the target <table> element doesn't contain <tbody>. That is, the <tr> elements are direct children of <table>, so this CSS selector should theoretically work:
#ctl00_mainIndexContent_pnlRisultati > table > tr:nth-child(3) > td:nth-child(1)
However, strangely, HTMLExtractor_5 routed the input feature to the <Rejected> port when I ran the workspace with FME 2019.2. When I ran the same workspace with FME 2020.0, it succeeded and produced this result. I think the value of "ufficio_tmp" is the one you want.
I suspect that FME 2019 has a defect in parsing HTML under some conditions. I would recommend upgrading to FME 2020.
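Whether `table > tr` or `table > tbody > tr` matches depends on the markup that actually reaches the HTMLExtractor. Browsers insert an implicit `<tbody>` when they parse a table, so a selector copied from the developer tools often contains `tbody` even when the source text has none. The sketch below (standard library only; `tr_parents` and `RowParentFinder` are illustrative names) reports the parent tag of each `<tr>` inside a given container. Python's `html.parser` does not insert implicit elements, but the sketch assumes every non-void tag is explicitly closed.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not be pushed on the stack
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class RowParentFinder(HTMLParser):
    """Record the parent tag of every <tr> inside the element with a given id.

    Assumption: the source is well formed enough that every non-void tag is closed.
    """

    def __init__(self, container_id):
        super().__init__()
        self.container_id = container_id
        self.stack = []        # currently open tags as (tag, id) tuples
        self.row_parents = []  # parent tag name for each <tr> found

    def _inside_container(self):
        return any(elem_id == self.container_id for _, elem_id in self.stack)

    def handle_starttag(self, tag, attrs):
        if tag == "tr" and self.stack and self._inside_container():
            self.row_parents.append(self.stack[-1][0])
        if tag not in VOID_TAGS:
            self.stack.append((tag, dict(attrs).get("id")))

    def handle_endtag(self, tag):
        # Pop back to the most recent matching open tag; stray end tags are ignored
        for i in range(len(self.stack) - 1, -1, -1):
            if self.stack[i][0] == tag:
                del self.stack[i:]
                break


def tr_parents(html, container_id):
    finder = RowParentFinder(container_id)
    finder.feed(html)
    finder.close()
    return finder.row_parents


sample = """
<div id="ctl00_mainIndexContent_pnlRisultati">
  <table>
    <tr><td>a</td></tr>
    <tr><td>b</td></tr>
  </table>
</div>
"""
print(tr_parents(sample, "ctl00_mainIndexContent_pnlRisultati"))  # prints ['table', 'table']
```

If the result is `table`, the selector without `tbody` is the one that fits the source; if it is `tbody`, the `tbody` step is needed.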
I suspect that FME 2019 may not support the ":nth-child()" selector.
This workspace is an alternative that doesn't use ":nth-child()". If you cannot upgrade FME for some reason, consider this approach as a workaround.
test-css-tmp-2.fmw (FME 2019.2)
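One way to get the same cell without `:nth-child()`, expressed in plain Python, is to read the table into a list of rows and pick cells by position. This is only a sketch of the general idea, not a reproduction of the attached workspace; `table_rows` and `nth_child_cell` are illustrative names, and the parser assumes a single table with explicitly closed `<tr>`/`<td>` tags and no nested tables. Note that `tr:nth-child(3)` counts every element child of the parent, so list positions match it only when all children of the table are `<tr>` and all children of each row are cells (`<td>` or `<th>`).

```python
from html.parser import HTMLParser


class TableRows(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr>.

    Assumption: one table, explicitly closed <tr>/<td> tags, no nested tables.
    """

    def __init__(self):
        super().__init__()
        self.rows = []     # list of rows, each a list of cell strings
        self._cell = None  # text fragments of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])
        elif tag in ("td", "th") and self.rows:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            # Collapse runs of whitespace, as the cell text is usually indented
            self.rows[-1].append(" ".join("".join(self._cell).split()))
            self._cell = None


def table_rows(html):
    parser = TableRows()
    parser.feed(html)
    parser.close()
    return parser.rows


def nth_child_cell(rows, row_n, col_n):
    """Emulate 'tr:nth-child(row_n) > td:nth-child(col_n)' with 1-based positions.

    Returns None when the position does not exist, like a selector that matches nothing.
    """
    if 1 <= row_n <= len(rows) and 1 <= col_n <= len(rows[row_n - 1]):
        return rows[row_n - 1][col_n - 1]
    return None


html = ("<table><tr><td>Header</td></tr><tr><td>x</td></tr>"
        "<tr><td> Ufficio  A </td><td>1</td></tr></table>")
print(nth_child_cell(table_rows(html), 3, 1))  # prints Ufficio A
```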
Hi Takashi,
thanks for your support.
Yes, FME 2020 supports the CSS selector, but now I have another blocker...
How can I "transpose" the values in the HTML table into separate fields?
Basically, the idea (as in Python) is to have all the values from the first column of the HTML table in the same field, such as:
#ctl00_mainIndexContent_pnlRisultati > table > tbody > tr:nth-child(3) > td:nth-child(1)
#ctl00_mainIndexContent_pnlRisultati > table > tbody > tr:nth-child(8) > td:nth-child(1)
#ctl00_mainIndexContent_pnlRisultati > table > tbody > tr:nth-child(11) > td:nth-child(1)
I hope my explanation is clear.
Thanks,
Francesco
Do you mean that the first-column values in rows 3, 8, and 11 should be concatenated and stored in a single attribute?
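In plain Python terms, the two possible readings of the request look like this: transposing the table so that each column becomes one list (one "field" per column), or joining selected first-column values (such as rows 3, 8 and 11) into a single attribute value. This is a minimal sketch with made-up sample rows; the function names and the `"; "` separator are illustrative choices, not FME behavior.

```python
from itertools import zip_longest


def transpose(rows, fill=None):
    """Turn a list of rows into a list of columns; short rows are padded with `fill`."""
    return [list(col) for col in zip_longest(*rows, fillvalue=fill)]


def join_column(rows, row_numbers, col_n=1, sep="; "):
    """Concatenate the cells at 1-based column `col_n` of the given 1-based rows.

    Rows or columns that do not exist are skipped.
    """
    values = [rows[n - 1][col_n - 1] for n in row_numbers
              if 1 <= n <= len(rows) and 1 <= col_n <= len(rows[n - 1])]
    return sep.join(values)


# Made-up sample data standing in for the scraped table
rows = [["Ufficio", "Indirizzo"], ["Milano 1", "Via A"], ["Milano 2"]]
print(transpose(rows))
# [['Ufficio', 'Milano 1', 'Milano 2'], ['Indirizzo', 'Via A', None]]
print(join_column(rows, [2, 3]))
# Milano 1; Milano 2
```

`zip_longest` is used instead of `zip(*rows)` so that a row with fewer cells does not silently truncate every column.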
Hi Takashi,
attached you can find the workbench I use to extract the data (all of it) from the website, but it contains a lot of workarounds because of my inexperience... My goal is a clean and efficient process for extracting data from the web.
Any suggestions are more than welcome.
Thanks,
Francesco
Milano_testcss_v1.zip
If I understand the structure of the source HTML and your requirement correctly, this example workspace might help you. You just need to transform/rename some attributes to reach the final goal.
milano-testcss-example.fmw (FME 2019.2)
Super!
...I have another question.
Some websites are not "scrapable" because they are dynamic (JavaScript).
If I have a simple Python script written with Selenium, can I run it using the PythonCaller?
Thanks
Unfortunately, I don't think FME has the capability to execute JavaScript to dynamically generate an HTML document.
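A quick way to tell whether a page needs a JavaScript-capable browser is to fetch the raw HTML (what a plain HTTP request, such as one from the HTTPCaller, returns) and check whether the element you want is in it at all. If the id is missing from the raw source but visible in the browser's developer tools, it is most likely added by a script after the page loads. This is a minimal standard-library sketch; the function names are illustrative, the URL and id in the usage lines are placeholders, and sending a browser-like User-Agent is an assumption (some servers reject the default one).

```python
import urllib.request
from html.parser import HTMLParser


class IdFinder(HTMLParser):
    """Set `found` to True when an element with the given id attribute appears."""

    def __init__(self, element_id):
        super().__init__()
        self.element_id = element_id
        self.found = False

    def handle_starttag(self, tag, attrs):
        if dict(attrs).get("id") == self.element_id:
            self.found = True


def has_element_id(html, element_id):
    """True if the raw HTML (no scripts executed) contains an element with this id."""
    finder = IdFinder(element_id)
    finder.feed(html)
    finder.close()
    return finder.found


def fetch_static_html(url, timeout=30):
    """Download the page source as served, without running any JavaScript."""
    # Browser-like User-Agent: an assumption, some servers block the default one
    request = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


# Usage (placeholder URL and id):
# html = fetch_static_html("https://example.com/page")
# print(has_element_id(html, "results"))
```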
Hi Takashi,
sorry for the late reply and for the "bad" request.
What I want to know is whether it's possible to run a Python script from FME (I mean using the PythonCaller).
The script needs the selenium library installed, so I don't know if this is possible... I'm not an expert with the PythonCaller.
If needed, I can share the script.
Thanks again,
Francesco
Hi @checcosisani, in general, you can implement and run a Python script that uses any external module in a PythonCaller, provided you have installed the required modules into your FME Python environment. See this article to learn how to install an external module for FME Desktop:
Installing Python Packages to FME Desktop
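After installing a package, a quick check run from the same Python interpreter that FME uses (for example, temporarily inside a PythonCaller) shows whether the module is importable there. This is a small standard-library sketch; `is_module_available` is an illustrative name.

```python
import importlib.util
import sys


def is_module_available(name):
    """Return True if the module can be found by this interpreter, without importing it."""
    try:
        return importlib.util.find_spec(name) is not None
    except (ImportError, ValueError):
        # find_spec can raise for a missing parent package or a broken module entry
        return False


print(sys.version)                      # Python version of the running interpreter
print(is_module_available("selenium"))  # True only if selenium is installed for it
```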
However, I'm not sure whether the selenium module provides classes and/or functions to get your desired result, since I've never used it.
I'd recommend posting a new question if you want useful suggestions about using the selenium module. Hopefully someone in the Community has experience with it.