Solved

Extract value from text

  • July 18, 2023
  • 2 replies
  • 158 views

mr_fme
Enthusiast

Hi,

How can I extract the Layer and CODIGO attributes from the file below?

 

<html xmlns:fo="http://www.w3.org/1999/XSL/Format" xmlns:msxsl="urn:schemas-microsoft-com:xslt">
<head>
<META http-equiv="Content-Type" content="text/html">
<meta http-equiv="content-type" content="text/html; charset=UTF-8">
</head>
<body style="margin:0px 0px 0px 0px;overflow:auto;background:#FFFFFF;">
<table style="font-family:Arial.Verdana.Times;font-size:12px;text-align:left;width:100%;border-collapse:collapse;padding:3px 3px 3px 3px">
<tr style="text-align:center;font-weight:bold;background:#9CBCE2">
<td>232400481803401</td>
</tr>
<tr>
<td>
<table style="font-family:Arial.Verdana.Times;font-size:12px;text-align:left;width:100%;border-spacing:0px; padding:3px 3px 3px 3px">
<tr>
<td>Layer</td>
<td>232400481803401</td>
</tr>
<tr bgcolor="#D4E4F3">
<td>CODIGO</td>
<td>48</td>
</tr>
</table>
</td>
</tr>
</table>
</body>
</html>

 

 

Thanks

Best answer by ebygomm

If the format is absolutely consistent, then you could do something like this in the HTMLExtractor:

[image]

But I'm not sure I'd trust it personally.

The HTMLExtractor is based around BeautifulSoup, where you can find a specific value and then look for its next sibling, but I'm not sure if that's possible in the HTMLExtractor or what the syntax would be. In Python, for example:

import fme
import fmeobjects
from bs4 import BeautifulSoup


def FeatureProcessor(feature):
    # 'html_content' is the attribute that holds the HTML text
    html = feature.getAttribute('html_content') or ""
    # Name the parser explicitly to avoid bs4's "no parser specified" warning
    soup = BeautifulSoup(html, "html.parser")
    for name in ("Layer", "CODIGO"):
        # Find the <td> whose text is the label; the value is the next <td>
        label = soup.find("td", string=name)
        value = label.find_next_sibling("td") if label else None
        # Fall back to an empty string when the label or its value is missing
        feature.setAttribute(name, value.get_text(strip=True) if value else "")
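
To try the "find the label, then take the next cell" idea outside FME, here is a minimal sketch that uses only Python's standard-library html.parser instead of BeautifulSoup. The names RowCollector, extract_values and SAMPLE are made up for illustration, and SAMPLE is a trimmed copy of the HTML from the question. It assumes each label and its value are the first two cells of the same table row.

```python
from html.parser import HTMLParser


class RowCollector(HTMLParser):
    """Collect the cell texts of every <tr>, including rows of nested tables."""

    def __init__(self):
        super().__init__()
        self.rows = []        # finished rows, each a list of cell texts
        self._open_rows = []  # stack of rows still being read (tables can nest)
        self._in_cell = False
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._open_rows.append([])
        elif tag == "td":
            # A new <td> restarts the buffer, so only innermost cells are kept
            self._in_cell = True
            self._text = []

    def handle_data(self, data):
        if self._in_cell:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "td" and self._in_cell:
            if self._open_rows:
                self._open_rows[-1].append("".join(self._text).strip())
            self._in_cell = False
        elif tag == "tr" and self._open_rows:
            self.rows.append(self._open_rows.pop())


def extract_values(html, labels):
    """Return {label: value}, with "" for any label that is not found."""
    parser = RowCollector()
    parser.feed(html)
    parser.close()
    # Treat the first cell of a row as the label and the second as its value
    found = {row[0]: row[1] for row in parser.rows if len(row) >= 2}
    return {label: found.get(label, "") for label in labels}


SAMPLE = """
<table>
<tr><td>232400481803401</td></tr>
<tr><td>
<table>
<tr><td>Layer</td><td>232400481803401</td></tr>
<tr bgcolor="#D4E4F3"><td>CODIGO</td><td>48</td></tr>
</table>
</td></tr>
</table>
"""

print(extract_values(SAMPLE, ["Layer", "CODIGO"]))
# {'Layer': '232400481803401', 'CODIGO': '48'}
```

Because the lookup goes by label text and not by position, it keeps working if rows are added or reordered, but it would still break if the label text itself changed.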

 


2 replies

ebygomm
  • Influencer
  • 3427 replies
  • Best Answer
  • July 18, 2023


mr_fme
  • Author
  • Enthusiast
  • 145 replies
  • July 18, 2023


Thanks! That solved my problem.