Solved

Extract value from text


mr_fme
Enthusiast

Hi,

 

How can I extract the Layer and CODIGO values from the file below?

 

<html xmlns:fo="http://www.w3.org/1999/XSL/Format" xmlns:msxsl="urn:schemas-microsoft-com:xslt">
<head>
<META http-equiv="Content-Type" content="text/html">
<meta http-equiv="content-type" content="text/html; charset=UTF-8">
</head>
<body style="margin:0px 0px 0px 0px;overflow:auto;background:#FFFFFF;">
<table style="font-family:Arial.Verdana.Times;font-size:12px;text-align:left;width:100%;border-collapse:collapse;padding:3px 3px 3px 3px">
  <tr style="text-align:center;font-weight:bold;background:#9CBCE2">
    <td>232400481803401</td>
  </tr>
  <tr>
    <td>
      <table style="font-family:Arial.Verdana.Times;font-size:12px;text-align:left;width:100%;border-spacing:0px; padding:3px 3px 3px 3px">
        <tr>
          <td>Layer</td>
          <td>232400481803401</td>
        </tr>
        <tr bgcolor="#D4E4F3">
          <td>CODIGO</td>
          <td>48</td>
        </tr>
      </table>
    </td>
  </tr>
</table>
</body>
</html>

 

 

Thanks

2 replies

ebygomm
Influencer
  • Influencer
  • Best Answer
  • July 18, 2023

If the format is absolutely consistent, then you could do something like this in the HTMLExtractor:

[image]

But I'm not sure I'd trust it personally.

The HTMLExtractor is based around BeautifulSoup, where you can find a specific value and then look for its next sibling, but I'm not sure whether that's possible in the HTMLExtractor or what the syntax would be. In Python it would look like this:

import fme
import fmeobjects
from bs4 import BeautifulSoup


def FeatureProcessor(feature):
    # Read the HTML from the feature; fall back to an empty string if the attribute is missing.
    html = feature.getAttribute('html_content') or ""
    # Name the parser explicitly so bs4 does not have to guess (and warn).
    soup = BeautifulSoup(html, "html.parser")
    for name in ("Layer", "CODIGO"):
        try:
            # Find the label cell, then read the value from the cell next to it.
            value = soup.find("td", string=name).find_next_sibling("td").get_text(strip=True)
        except AttributeError:
            # find() or find_next_sibling() returned None: label or value cell not present.
            value = ""
        feature.setAttribute(name, value)
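To try the same label-then-next-cell idea outside FME, and without BeautifulSoup installed, here is a minimal sketch that uses only Python's standard-library html.parser instead. It collects the text of the <td> cells in each <tr> and treats every two-cell row as a label/value pair. The names CellPairParser and extract_pairs and the trimmed sample string are made up for illustration; it also assumes the values always sit in two-cell rows like the ones in the question.

```python
from html.parser import HTMLParser


class CellPairParser(HTMLParser):
    """Collect <td> text row by row (hypothetical helper, not part of FME)."""

    def __init__(self):
        super().__init__()
        self.pairs = {}    # label -> value for every row with exactly two cells
        self._rows = []    # stack of open rows, so nested tables are handled
        self._cell = None  # text pieces of the <td> being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._rows.append([])
            self._cell = None  # a new row ends any cell that was being read
        elif tag == "td" and self._rows:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag == "td" and self._cell is not None and self._rows:
            self._rows[-1].append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._rows:
            row = self._rows.pop()
            self._cell = None
            if len(row) == 2:
                self.pairs[row[0]] = row[1]


def extract_pairs(html):
    """Return a dict of label -> value found in two-cell table rows."""
    parser = CellPairParser()
    parser.feed(html)
    parser.close()
    return parser.pairs


# Trimmed version of the HTML from the question (styles removed).
sample = """
<table>
<tr><td>232400481803401</td></tr>
<tr><td>
<table>
<tr><td>Layer</td><td>232400481803401</td></tr>
<tr bgcolor="#D4E4F3"><td>CODIGO</td><td>48</td></tr>
</table>
</td></tr>
</table>
"""

pairs = extract_pairs(sample)
print(pairs.get("Layer", ""), pairs.get("CODIGO", ""))  # 232400481803401 48
```

Because it returns every label/value pair, the same function would pick up extra rows if the table grows, and pairs.get(name, "") gives the same empty-string fallback as the code above when a label is missing.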

 


mr_fme
Enthusiast
  • Author
  • Enthusiast
  • July 18, 2023

Thanks, that solved my problem!

