Question

Decode clob

July 2, 2014

aldursil

Hi

I need to decode a clob that I read from an Oracle database. The text stored in the clob looks like HTML, which is also what our DB admin told me it should be. So I thought I could use the TextDecoder to decode the HTML to plain text. I have tried this both in my main workspace and in a simple one where I just create a feature with one attribute holding one clob. Unfortunately, the TextDecoder doesn't seem to do anything with my clob: the data looks the same before and after the transformer.

Is it possibly a bug in FME, or is there something else I need to consider since I am dealing with a clob? Does anyone have a suggestion?

Best regards

Tobias P.

Sample clob:

<body>

<h3>32594. * (T) Oslofjorden. Oslo. Sjursøya. Lysbøyer. Nye posisjoner ( Light buoys. New positions).</h3>

Slett tidligere Efs (T) 09/441/09 (Delete former Efs (T) 09/441/09) På grunn av utfylling i sjø på nordsiden er følgende sjømerker flyttet: (Due to reclamation north of Sjursøya Mole the following light buoys has been moved): a) Grønn lysbøye fra posisjon (1) til (2): (Green light buoy from position (1) to (2)): WGS84 DATUM (1) 59° 53.223' N, 10° 44.607' E (2) 59° 53.242' N, 10° 44.611' E ED50 DATUM (1) 59° 53.250' N, 10° 44.693' E (2) 59° 53.269' N, 10° 44.697' E NGO DATUM (1) 59° 53.176' N, 10° 44.896' E (2) 59° 53.195' N, 10° 44.900' E b) Midlertidig utlagt gul lysbøye fra posisjon (1) til (2): (Temporary yellow light buoy from position (1) to (2)): WGS84 DATUM (1) 59° 53.222' N, 10° 44.637' E (2) 59° 53.246' N, 10° 44.659' E ED50 DATUM (1) 59° 53.249' N, 10° 44.723' E (2) 59° 53.273' N, 10° 44.745' E NGO DATUM (1) 59° 53.175' N, 10° 44.926' E (2) 59° 53.199' N, 10° 44.948' E c) Midlertidig utlagt gul lysbøye fra posisjon (1) til (2): (Temporary yellow light buoy from position (1) to (2)): WGS84 DATUM (1) 59° 53.252' N, 10° 44.761' E (2) 59° 53.252' N, 10° 44.777' E ED50 DATUM (1) 59° 53.279' N, 10° 44.847' E (2) 59° 53.279' N, 10° 44.863' E NGO DATUM (1) 59° 53.205' N, 10° 45.050' E (2) 59° 53.205' N, 10° 45.066' E Kart (Charts): 4, 401, 452. (KildeID 0). (Oslo Havn KF, 1. desember 2010).

</body>


david_r
July 2, 2014

Hi,

The answer depends a bit on what you need the results for. If you need to extract specific elements, I'd look into using Python (as you did for the RTF blocks), perhaps with the HTMLParser module (https://docs.python.org/2/library/htmlparser.html), which is included in the standard install.
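A minimal sketch of the Python route, assuming Python 3, where the parser lives in `html.parser` (the linked page documents the Python 2 module). The names `TextExtractor` and `extract_text` are hypothetical, and how the clob value reaches the script inside FME is left out; the sample string is a shortened version of the clob from the question.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect the text found directly inside each tag, keyed by tag name."""

    def __init__(self):
        super().__init__()
        self._stack = []        # tags that are currently open
        self.text_by_tag = {}   # tag name -> list of text chunks

    def handle_starttag(self, tag, attrs):
        self._stack.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the matching open tag, tolerating sloppy markup.
        if tag in self._stack:
            while self._stack.pop() != tag:
                pass

    def handle_data(self, data):
        text = " ".join(data.split())  # collapse newlines and repeated spaces
        if text and self._stack:
            self.text_by_tag.setdefault(self._stack[-1], []).append(text)


def extract_text(html_txt):
    """Hypothetical helper: return {tag: [text, ...]} for an HTML string."""
    parser = TextExtractor()
    parser.feed(html_txt)
    parser.close()
    return parser.text_by_tag


sample = (
    "<body>\n\n<h3>32594. * (T) Oslofjorden. Oslo.</h3>\n\n"
    "Slett tidligere Efs (T) 09/441/09\n\n</body>"
)
result = extract_text(sample)
print(result["h3"])    # heading text
print(result["body"])  # text that sits directly in <body>
```

Keeping the text grouped by tag means the heading and the notice body can later be written to separate attributes.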

If you only need to strip off all HTML tags, you can do it with a StringReplacer.

David


aldursil
Author
July 2, 2014

Thanks David!

I was actually just trying out a regex (<[^>]*>) using the StringSearcher, but the StringReplacer seems to be the better choice, since the regex actually worked with that transformer.

I'll definitely look into using Python for this as well, since I'll probably end up wanting to extract specific elements in the future, but for now the regex is sufficient, I think.
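For reference, the same regex can be tried outside FME. This is only a sketch in Python 3, assuming simple markup like the sample clob; the function name `strip_tags` is hypothetical, and a regex like this can misfire on comments or on attribute values that contain ">".

```python
import re

# Same pattern as in the post: "<", anything except ">", then ">".
TAG_RE = re.compile(r"<[^>]*>")


def strip_tags(html_txt):
    """Hypothetical helper: replace every tag with a space, then tidy whitespace."""
    no_tags = TAG_RE.sub(" ", html_txt)
    return " ".join(no_tags.split())


sample = (
    "<body>\n<h3>Lysbøyer. Nye posisjoner</h3>\n"
    "Slett tidligere Efs (T) 09/441/09\n</body>"
)
plain = strip_tags(sample)
print(plain)
```

Replacing tags with a space rather than an empty string keeps the heading from running into the following text.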

Nice to run into more Norwegians in the FME world! :-)

Best regards

Tobias P.


gio
Contributor
July 2, 2014

Hi,

Regular expression support in StringSearchers and the like is very limited.

Much of what can be done in Ruby (or Tcl) cannot be done in those transformers.

In Rubular, <[^>]*> (or <[^>]*[$>]) will catch the tags.

To do this in FME you can use a creator (tester, etc.):

Attribute: Name Tags

Value: @Evaluate([regexp -all -inline {<[^>]*[$>]} {@Value(html_txt)}])

Notice the braces: {@Value(html_txt)}

This is because the HTML text contains a lot of Tcl reserved characters (including parentheses), so "@Value(html_txt)" would yield an error.

A bare @Value(html_txt) will certainly yield an error, as Tcl will try to parse it.

{"@Value(html_txt)"} is also correct btw.

If you use -inline, you can locate the tags by using it in conjunction with:

@Evaluate([regexp -all -indices {<[^>]*[$>]} {@Value(html_txt)}])

@Evaluate([regexp -all {<[^>]*[$>]} {@Value(html_txt)}]) will tell you how many tags there are.
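For readers more at home in Python than Tcl, the three calls above map roughly onto the standard `re` module. This is a sketch rather than a translation of the FME expressions: it uses the simpler pattern <[^>]*>, the sample string is made up from pieces of the clob, and the end index is reduced by one because Tcl's -indices reports the last matched character while Python reports one past it.

```python
import re

TAG_RE = re.compile(r"<[^>]*>")

html_txt = "<body><h3>Lysbøyer (Light buoys)</h3>Kart (Charts): 4, 401, 452.</body>"

# Roughly "regexp -all -inline": the matched tags themselves.
tags = TAG_RE.findall(html_txt)

# Roughly "regexp -all -indices": (first, last) character index of each tag.
spans = [(m.start(), m.end() - 1) for m in TAG_RE.finditer(html_txt)]

# Roughly "regexp -all": the number of matches.
count = len(tags)

for tag, (first, last) in zip(tags, spans):
    print(tag, first, last)
print(count)
```

The parentheses in the text need no special quoting here because the HTML is an ordinary Python string, not something the interpreter re-parses.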


gio
Contributor
July 2, 2014

The StringSearcher transformer can't handle collation (Tcl 8.1 and similar).

Using regexp in creators gives you a vast realm of possibilities.
