Question

Extract JavaScript variables from a script tag in HTML?

  • 6 January 2022
  • 6 replies
  • 62 views

I have two variables within a script tag in the head of an HTML document.

What's the best way to extract those two variables?

Both variables are JSON data.

 

In the end I want the geojsonData, with the metadata added to the GeoJSON data.

 

I can extract the strings for the variables using regex, but is there a better way to read JavaScript?

 


6 replies

Userlevel 2
Badge +17

Hi @Peter Timmers, I think regular expressions would be necessary anyway, but the solution could become more elegant by using a combination of transformers, including HTMLExtractor.

Can you share your current solution?

Badge +1

Regular expression worked a treat.

 

I haven't got back to the problem yet, but when I do, my next task is to take the two JSON datasets, extract the GeoJSON into features, and join the metadata JSON to those features. It's been a while since I've done something like this.
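
For the join step described above, here is a minimal Python sketch (for example inside a PythonCaller, or as a standalone script). It assumes the metadata is one flat JSON object whose keys apply to every feature, and that the GeoJSON is a FeatureCollection; neither is confirmed in the thread. The function name `add_metadata_to_features` is hypothetical.

```python
import copy


def add_metadata_to_features(geojson, metadata):
    """Return a copy of `geojson` with `metadata` merged into each feature's properties.

    Assumption: `metadata` is a flat dict that applies to all features.
    """
    result = copy.deepcopy(geojson)  # leave the caller's data untouched
    for feature in result.get("features", []):
        # GeoJSON allows "properties" to be null, so fall back to an empty dict.
        props = feature.get("properties") or {}
        # Existing feature properties win if a key also appears in the metadata.
        feature["properties"] = {**metadata, **props}
    return result
```

If the metadata instead holds one record per feature, the merge would need a shared key (an ID, for instance) rather than a blanket copy.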

 

Userlevel 2
Badge +17

Sorry, it's still unclear what the problem is. If you can show us the entire content of the <script> and the JSON text(s) you need to extract from it, we might be able to find a solution.

Hello, you could read the HTML file with the text reader (read whole file at once = yes). With AttributeManager, find the position of “<script>” in one attribute and the position of “</script>” in another. In a third attribute, use the @Substring function, e.g.: @Substring(@Value(text_line_data),@Value(Start),@Value(End)-@Value(Start)+9)

 

* +9 is the length of the closing tag “</script>”; add it only if you want the tag included in the result.
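
The same find-and-slice idea can be sketched in Python, without regex. This is only an illustration: it assumes a bare `<script>` tag with no attributes and takes the first script block in the file. The function name `slice_script` is hypothetical.

```python
def slice_script(html_text, include_tags=False):
    """Return the text of the first <script>...</script> block in html_text."""
    open_tag, close_tag = "<script>", "</script>"
    start = html_text.find(open_tag)
    if start == -1:
        raise ValueError("no <script> tag found")
    # Search for the closing tag only after the opening tag.
    end = html_text.find(close_tag, start + len(open_tag))
    if end == -1:
        raise ValueError("no </script> tag found")
    if include_tags:
        # Extend the slice by the length of the closing tag to keep it.
        return html_text[start:end + len(close_tag)]
    return html_text[start + len(open_tag):end]
```

`str.find` is a plain substring search, so it should stay fast even on a very large file.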

Badge +1

I was mistaken: my regex is not working. The amount of JSON is also too big to post here (200,000+ lines).

 

Regex might also be too slow for 200,000 lines. Each debugging run of the workbench takes more than 3 minutes.

 

In the end I want to use a FeatureReader on the GeoJSON text and add the metadata attributes to each feature.

<head><script>
         var metaData = { [150+ lines of json]
                                         }
         var geojsonData = { [200,000 + lines of geojson]
                                        }
     </script>
</head>
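
For a script laid out like the one above, one regex-free option is to locate each `var name =` with a substring search and let Python's `json` module parse the object that follows. This is a sketch under the assumption that both literals are strict JSON (double-quoted keys, no trailing commas, no comments), as the question suggests. The function name `extract_js_object` and the sample values are hypothetical.

```python
import json


def extract_js_object(html_text, var_name):
    """Parse the JSON object assigned to `var <var_name> = {...}` in html_text."""
    marker = "var " + var_name
    start = html_text.find(marker)
    if start == -1:
        raise ValueError(f"variable {var_name!r} not found")
    # Jump to the first '{' after the variable name.
    brace = html_text.find("{", start + len(marker))
    if brace == -1:
        raise ValueError(f"no object literal found after {var_name!r}")
    # raw_decode parses one JSON value starting at `brace` and ignores
    # whatever follows it, so the end of the object need not be located.
    value, _end = json.JSONDecoder().raw_decode(html_text, brace)
    return value


# Tiny stand-in for the real file (assumed content, for illustration only).
sample_html = """<head><script>
    var metaData = {"source": "demo", "year": 2022}
    var geojsonData = {"type": "FeatureCollection", "features": []}
</script>
</head>"""

meta = extract_js_object(sample_html, "metaData")
geo = extract_js_object(sample_html, "geojsonData")
```

Because the parser decides where each object ends, nested braces and braces inside strings are handled without any bracket counting.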

 

Badge +1

Finding the substring without using regex is appealing right now.
