Question

Using HTML Extractor

  • December 16, 2019
  • 1 reply
  • 16 views

Hi, I am trying to extract data from an HTML page.

I would appreciate it if any of the experts could explain how to extract the values from the following:

<tr data-group="Batu Pahat" data-group-2="Sekolah Agama Seri Chomel">

<td data-field="Daerah" class="ew-rpt-grp-field-1">

<span data-class="tpx1_1_Maklumat_Bencana_Daerah_Aktif_Banjir_Johor_PusatPemindahan">Sekolah Agama Seri Chomel</span></td>

<td data-field="Keluarga" class="ew-table-alt-row"><span>6</span></td>

 

I would like to extract the bolded values from the HTML page using the HTMLExtractor.

Your help would be greatly appreciated.


1 reply

takashi
Celebrity
  • 7843 replies
  • December 16, 2019

With this setting, the HTMLExtractor populates the values of all span elements into a list attribute, _list{}. This assumes that an attribute called "html" stores the source HTML document. See the help on the transformer to learn more.
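
For readers who want to see the same idea outside the workspace, here is a minimal Python sketch of "collect the text of every span element into a list". It is not how the HTMLExtractor is implemented; it only uses the standard-library html.parser module, and the names SpanTextCollector and extract_span_values are made up for illustration. The sample HTML is the fragment from the question.

```python
from html.parser import HTMLParser


class SpanTextCollector(HTMLParser):
    """Hypothetical helper: gathers the text content of each <span>."""

    def __init__(self):
        super().__init__()
        self._depth = 0      # how many <span> elements are currently open
        self._buffer = []    # text pieces seen inside the outermost open span
        self.values = []     # one entry per outermost <span>, in document order

    def handle_starttag(self, tag, attrs):
        if tag == "span":
            if self._depth == 0:
                self._buffer = []
            self._depth += 1

    def handle_endtag(self, tag):
        if tag == "span" and self._depth > 0:
            self._depth -= 1
            if self._depth == 0:
                self.values.append("".join(self._buffer).strip())

    def handle_data(self, data):
        if self._depth > 0:
            self._buffer.append(data)


def extract_span_values(html):
    """Return the text of all span elements found in the given HTML string."""
    collector = SpanTextCollector()
    collector.feed(html)
    collector.close()
    return collector.values


# The fragment from the question, held in a variable named "html"
# to mirror the attribute name assumed in the reply.
html = """
<tr data-group="Batu Pahat" data-group-2="Sekolah Agama Seri Chomel">
<td data-field="Daerah" class="ew-rpt-grp-field-1">
<span data-class="tpx1_1_Maklumat_Bencana_Daerah_Aktif_Banjir_Johor_PusatPemindahan">Sekolah Agama Seri Chomel</span></td>
<td data-field="Keluarga" class="ew-table-alt-row"><span>6</span></td>
"""

values = extract_span_values(html)
print(values)  # ['Sekolah Agama Seri Chomel', '6']
```

The resulting list plays the same role as _list{} in the reply: the first element is the centre name and the second is the Keluarga count.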