Solved

Extract value from text

July 18, 2023

mr_fme

Hi,

How can I extract the Layer and CODIGO attributes from the file below?

<head>
</head>
</tr>
<tr>
<td>
<tr>
<td>Layer</td>
</tr>
<td>CODIGO</td>
</tr>
</table>
</td>
</tr>
</table>
</body>
</html>

Thanks.

Best answer by ebygomm

If the format is absolutely consistent, then you could do something like this in the HTMLExtractor.

But I'm not sure I'd trust it, personally.
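As a rough illustration of what relying on an "absolutely consistent" format means (this is a regex stand-in, not the HTMLExtractor configuration from the answer), the sketch below assumes each label cell is immediately followed by its value cell with only whitespace in between. `PATTERN`, `extract_fixed_layout` and the sample values are hypothetical. The last call shows why it is hard to trust: a small markup change makes it silently return nothing.

```python
import re

# Hypothetical fixed-layout pattern: <td>LABEL</td> followed directly by <td>VALUE</td>.
PATTERN = r"<td>\s*{label}\s*</td>\s*<td>(.*?)</td>"


def extract_fixed_layout(html, label):
    """Return the text of the cell right after <td>label</td>, or "" if there is no match."""
    match = re.search(PATTERN.format(label=re.escape(label)), html, re.DOTALL)
    return match.group(1).strip() if match else ""


# Hypothetical sample values, for illustration only.
sample = "<tr><td>Layer</td><td>Roads</td></tr><tr><td>CODIGO</td><td>123</td></tr>"
print(extract_fixed_layout(sample, "Layer"))   # Roads
print(extract_fixed_layout(sample, "CODIGO"))  # 123

# An attribute on the label cell is enough to break the match (prints an empty line).
print(extract_fixed_layout('<tr><td class="k">Layer</td><td>Roads</td></tr>', "Layer"))
```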

The HTMLExtractor is based on BeautifulSoup. In BeautifulSoup you find the specific value and then look for its next sibling, but I'm not sure whether that's possible in the HTMLExtractor or what the syntax would be. In Python it would be something like:

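import fme
import fmeobjects
from bs4 import BeautifulSoup


def FeatureProcessor(feature):
    # Raw HTML held in the 'html_content' attribute (empty string if it is missing)
    html = feature.getAttribute('html_content') or ""
    # Name the parser explicitly so BeautifulSoup doesn't have to guess (and warn)
    soup = BeautifulSoup(html, "html.parser")
    # For each label, find the <td> whose text is the label, then take the text of
    # the next <td> in the same row; fall back to "" if either cell is missing.
    for label in ("Layer", "CODIGO"):
        try:
            value = soup.find("td", string=label).find_next_sibling("td").get_text(strip=True)
        except AttributeError:  # find() or find_next_sibling() returned None
            value = ""
        feature.setAttribute(label, value)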
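To try the same "find the label cell, take the next cell" lookup outside FME (where `fmeobjects` isn't available), here is a minimal sketch that swaps BeautifulSoup for Python's standard-library `html.parser`. It is an assumption-laden illustration, not what the HTMLExtractor does: `RowCollector`, `value_after` and the sample table are hypothetical, and it assumes the label and value cells sit directly in the same `<tr>`. It does not track nesting beyond that, so text in an outer cell that wraps a nested table is dropped.

```python
from html.parser import HTMLParser


class RowCollector(HTMLParser):
    """Hypothetical helper: collect the text of every <td>, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []     # list of rows, each a list of stripped cell strings
        self._cell = None  # text pieces of the <td> currently open, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])
        elif tag == "td":
            if not self.rows:  # tolerate a <td> that has no enclosing <tr>
                self.rows.append([])
            self._cell = []

    def handle_endtag(self, tag):
        if tag == "td" and self._cell is not None:
            self.rows[-1].append("".join(self._cell).strip())
            self._cell = None

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)


def value_after(html, label):
    """Text of the cell following the cell equal to `label` in the same row, else ""."""
    parser = RowCollector()
    parser.feed(html)
    parser.close()
    for row in parser.rows:
        for i, text in enumerate(row[:-1]):
            if text == label:
                return row[i + 1]
    return ""


# Hypothetical sample table, for illustration only.
sample = """
<table>
  <tr><td>Layer</td><td>Roads</td></tr>
  <tr><td>CODIGO</td><td>123</td></tr>
</table>
"""
print(value_after(sample, "Layer"))          # Roads
print(value_after(sample, "CODIGO"))         # 123
print(repr(value_after(sample, "Missing")))  # ''
```

Like the BeautifulSoup version, it returns an empty string rather than raising when a label is absent, so every feature still gets both attributes.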

mr_fme
July 18, 2023

Thanks, that solved my problem!
