Strip HTML tags in iMacros

Skyla157
« on: Jun 28, 2018, 05:06 PM »
You often need to grab HTML DOM elements with JavaScript, but extracting an element's HTML also brings along every element nested inside it. If you only need the text, without any HTML formatting, you can use the following code to strip the tags and get clean text for your data needs.

To extract the HTML of an element in iMacros, start with a TAG command like this:
Code: [Select]
TAG POS=1 TYPE=DIV ATTR=CLASS:allinfo EXTRACT=HTM

After the extraction, the extracted string is stored in the built-in "{{!EXTRACT}}" variable. You can copy it to a different variable if you need multiple extractions.
Code: [Select]
SET !VAR1 {{!EXTRACT}}
Here !VAR1 will acquire the extracted text. You can now extract additional elements and store them in other variables such as !VAR2.

Now onto cleaning the string of all the HTML tags. This includes opening and closing tags.
Code: [Select]
SET !VAR3 EVAL("var s=\"{{!VAR1}}\"; s = s.replace(/<[ =\":;/0-9a-z]{1,100}>/g, \"\"); s;")
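In the above code, a JavaScript evaluation strips HTML tags from the string using a regex. The pattern matches tags of up to 100 characters made up of lowercase letters, digits, spaces, and the characters = " : ; /, so tags containing anything else (uppercase letters, hyphens, dots) are left in place. The value returned is assigned to an iMacros variable of our choice, here !VAR3, which will then hold the cleaned text. The cleaned text will be readable with paragraph separation.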
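To see what the replace call does outside of iMacros, here is a minimal JavaScript sketch you can run in Node or a browser console. The function names and the sample string are made up for illustration. The first function uses the same pattern as the macro (without the backslash-escaped quotes that the EVAL string requires); the second uses a broader pattern, which is my own assumption and not part of the original macro, that removes anything between "<" and the next ">".

```javascript
// Hypothetical helper names, for illustration only; not part of iMacros.

// Same pattern as in the macro: matches only tags built from lowercase
// letters, digits, spaces and the characters = " : ; /
function stripTagsNarrow(html) {
  return html.replace(/<[ =":;\/0-9a-z]{1,100}>/g, "");
}

// Broader pattern (an assumption, not the original code): removes
// everything from "<" up to the next ">".
function stripTagsBroad(html) {
  return html.replace(/<[^>]+>/g, "");
}

// Made-up sample input with a hyphen and a dot inside one tag.
const sample = '<p class="x">Hello <a href="page-1.html">world</a></p>';

console.log(stripTagsNarrow(sample)); // Hello <a href="page-1.html">world
console.log(stripTagsBroad(sample));  // Hello world
```

If you try the broader pattern inside the macro, the quotes in the EVAL string still need to be escaped in the same way as in the original line. Neither pattern decodes entities such as &amp;amp; or handles a ">" inside an attribute value.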

Now you can use the !VAR3 variable to write the data to a file or pass it on to another location.

THE FULL CODE
Code: [Select]
TAG POS=1 TYPE=DIV ATTR=CLASS:allinfo EXTRACT=HTM
SET !VAR1 {{!EXTRACT}}
SET !VAR3 EVAL("var s=\"{{!VAR1}}\"; s = s.replace(/<[ =\":;/0-9a-z]{1,100}>/g, \"\"); s;")

ENJOY