REGEX for HTML

Hello! This REGEX formula works perfectly:

$tc(reg, "SUMMARY: Desired_text_here DTSTART", "SUMMARY:\s(.*?)\sDTSTART", "$1")$

And I get “Desired text here” as output. But when I use the regex function with the WebGet function, I don’t get the desired text. Maybe it is the HTML formatting, but I can’t figure out how to fix it. See the whole formula and the URL:

$tc(reg, wg("https://tasks.office.com/5b7921be-1ca9-4db8-9d2c-4de4071b1eca/Calendar/User/yzhMWHc7pUeAB40cBlTGDGUAKGdA?t=0_fe113c6a-43de-46bb-9d1a-c74fa5eb25c3_2024-05-05T18%3A25%3A56.9825830%2B00%3A00", txt), "(?s)SUMMARY:\s*(.*?)\s*DTSTART", "$1")$

It outputs the whole HTML converted to a string, minus “SUMMARY:” and “DTSTART”, instead of only the text between them.

Could someone please help me correct it?
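
The output described above is what a replace-all substitution would produce: only the matched span is swapped for the captured group, and everything outside the match is kept. Here is a minimal sketch in Python, where `re.sub` stands in for `tc(reg, ...)` (an assumption about how Kustom applies the pattern, not a statement about its internals) and the sample text is invented. Letting the pattern cover the whole input is one way to leave only the captured group.

```python
import re

# Invented sample standing in for the downloaded calendar text (assumption).
text = "BEGIN:VEVENT\nSUMMARY:Desired text here\nDTSTART;VALUE=DATE\nEND:VEVENT"

# Replace-all: only the matched span is replaced, the rest of the text stays.
partial = re.sub(r"(?s)SUMMARY:\s*(.*?)\s*DTSTART", r"\1", text)

# Pattern spans the whole input, so only the captured group survives.
whole = re.sub(r"(?s)\A.*?SUMMARY:\s*(.*?)\s*DTSTART.*\Z", r"\1", text)

print(repr(partial))
print(repr(whole))
```

Whether the same anchoring carries over unchanged to a Kustom pattern is untested here.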

That link seems to download an .ics file, and the format I’m getting (below) doesn’t look like your working test.

SUMMARY:A validar
com SEGES na próxi
ma reunião
DTSTART;VALUE=DATE

Exactly, I need “A validar com SEGES na próxima reunião”, which is the title of the task from MS Planner. It’s between SUMMARY and DTSTART. I’m not sure whether it’s those line-break characters that the Kustom REGEX doesn’t read…
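
For context, the iCalendar format (RFC 5545) folds long lines by inserting a line break followed by a single space or tab, which would explain a title spread over several lines. A rough Python sketch of unfolding before extracting the title follows. The function names and the sample string are made up for illustration, and the exact whitespace in the real download was not checked.

```python
import re

def unfold(ics_text: str) -> str:
    """Undo RFC 5545 folding: drop each line break that is followed by one space or tab."""
    return re.sub(r"\r?\n[ \t]", "", ics_text)

def first_summary(ics_text: str):
    """Return the value of the first SUMMARY line, or None if there is none."""
    for line in unfold(ics_text).splitlines():
        if line.startswith("SUMMARY:"):
            return line[len("SUMMARY:"):]
    return None

# Hypothetical folded sample; the fold points are chosen for illustration only.
sample = "SUMMARY:A validar com\r\n  SEGES na próxi\r\n ma reunião\r\nDTSTART;VALUE=DATE\r\n"
print(first_summary(sample))
```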

With this formula, I get that output:

$tc(reg, (wg("https://tasks.office.com/5b7921be-1ca9-4db8-9d2c-4de4071b1eca/Calendar/User/yzhMWHc7pUeAB40cBlTGDGUAKGdA?t=0_fe113c6a-43de-46bb-9d1a-c74fa5eb25c3_2024-05-05T18%3A25%3A56.9825830%2B00%3A00", txt)), "SUMMARY:\s(.*?)\sDTSTART", "$1")$

This formula doesn’t seem to work for me at all. May I know which version of the Kustom app you are currently running?

Try this one. It removes the search words. My KWGT version is 3.75b410013.

$tc(
     reg,
     wg("https://tasks.office.com/5b7921be-1ca9-4db8-9d2c-4de4071b1eca/Calendar/User/yzhMWHc7pUeAB40cBlTGDGUAKGdA?t=0_fe113c6a-43de-46bb-9d1a-c74fa5eb25c3_2024-05-05T18%3a25%3a56.9825830%2b00%3a00", txt),
     "(?s)SUMMARY:\s*(.*?)\s*DTSTART",
     "$1")$

Can you set the wg type to raw and match that pattern instead?

Just changing from txt to raw? That did not solve it. Or is there anything here that I’m missing?

Regexp on multi-line text is hard. I haven’t tried it myself, but you could check out flows: with flows you can split this into multiple jobs, so maybe you can first use a regexp to find the right starting position and THEN split the text using another function to get the title.

This seems to work for me, provided there is a fixed number of items that you need to extract from that file. For simplicity, I placed the webget into a global variable.

$tc(split, tc(split, gv(content), "SUMMARY", 1), "DTSTART", 0)$
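
To make the two-step split easier to follow, here is the same idea as a small Python sketch. The helper `between` and the sample `content` are hypothetical, and the sketch splits on "SUMMARY:" (with the colon) rather than "SUMMARY" so the result does not start with a colon.

```python
def between(text: str, start: str, end: str, occurrence: int = 1) -> str:
    """Return the text after the occurrence-th start marker, cut at the next end marker."""
    return text.split(start)[occurrence].split(end)[0]

# Invented two-event sample (assumption), standing in for the webget result.
content = (
    "BEGIN:VEVENT\nSUMMARY:First task\nDTSTART;VALUE=DATE\nEND:VEVENT\n"
    "BEGIN:VEVENT\nSUMMARY:Second task\nDTSTART;VALUE=DATE\nEND:VEVENT"
)

first = between(content, "SUMMARY:", "DTSTART").strip()
second = between(content, "SUMMARY:", "DTSTART", 2).strip()
print(first, "|", second)
```

Raising the occurrence index picks later events, which is why this approach suits a known, fixed number of items.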

Awesome! I tried using wg(link, txt) as a global value, which didn’t work, but putting it inside the formula works!

Split is better than REGEX in this case. Brilliant.

Thank you.

If it isn’t asking too much, how did you download the file from the link that I shared and view its content? In Chrome, when I open the link, I only get an empty page. Saving it with CTRL + S as HTML or TXT also doesn’t give me anything.

EDIT: Forget it. Opening with Opera gives me the ics file.

One last question: this file has some non-breaking spaces. How do I get rid of them?

$tc(reg, "text with non-bre akable spa ces", " ", "")$

Tried this, no success.

How does Kustom read these non-breaking spaces?

Edit: first, sorry for the many edits. Trying something on a Friday night isn’t a good idea. Second, I’m writing this here so that maybe one day it helps someone.

I used tc(URL, “strange text”) to reveal the non-breaking spaces and then replaced them with empty text.
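
As an illustration of why URL-encoding helps, here is a short Python sketch. It assumes the character in question is U+00A0 (the thread does not say which one it was), and the sample string is invented. A non-breaking space looks like a normal space on screen, but its percent-encoded form differs, so it can be spotted and removed.

```python
from urllib.parse import quote

NBSP = "\u00a0"  # assumption: the "strange" character is the no-break space

# Invented sample with two non-breaking spaces hidden inside words.
text = f"text with non-bre{NBSP}akable spa{NBSP}ces"

# Percent-encoding makes the invisible difference visible:
# ordinary spaces become %20, U+00A0 becomes %C2%A0 (its UTF-8 bytes).
encoded = quote(text)

# Once identified, the character can be removed directly.
cleaned = text.replace(NBSP, "")

print(encoded)
print(cleaned)
```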
