
Struggling to find a Python regex that extracts text across several format variations



Rules:

  1. “trr” must appear in the text.
  2. “abc” and “cba” stand for unknown strings: any characters and/or symbols, of any length.
  3. “trr” may have a few unknown characters and/or symbols before and/or after it, up to the first “:” found in either direction.
  4. The text containing “abc” and “cba” has one of two structures: “trr:abc:cba” or “abc:cba:trr”.
  5. The final goal is to extract “abc” and “cba”.
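
One way to read these rules without a regex, as a minimal sketch rather than a confirmed solution: cut the line at its last two colons for the “trr:abc:cba” structure, or at its first two colons for “abc:cba:trr”, and accept whichever cut leaves “trr” in the remaining part. The function name `extract_pair` is hypothetical, and the sketch assumes that “abc” and “cba” never contain “:” or “trr” themselves.

```python
def extract_pair(line, keyword="trr"):
    """Return (abc, cba) if the line follows either structure, else None."""
    # Structure "...trr...:abc:cba": cut at the last two colons.
    parts = line.rsplit(":", 2)
    if len(parts) == 3 and keyword in parts[0] and parts[1] and parts[2]:
        return parts[1], parts[2]
    # Structure "abc:cba:...trr...": cut at the first two colons.
    parts = line.split(":", 2)
    if len(parts) == 3 and keyword in parts[2] and parts[0] and parts[1]:
        return parts[0], parts[1]
    return None


print(extract_pair("dfg326f://trr:S)@_Skasfm.c:js_1"))  # ('S)@_Skasfm.c', 'js_1')
print(extract_pair("S)@_Skasfm.c:js_1:dfg326f://trr"))  # ('S)@_Skasfm.c', 'js_1')
print(extract_pair("xee:js21s:x2491"))                  # None
```

Cutting from the right first matters because a prefix such as “dfg326f://” contains a colon of its own, so counting colons from the left would not work for the first structure.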

Here is what I tried to do:

import re

lines = [
    "ksad29://trr/h4fds1:askgh85k_Sg:J@(SK!)",
    "dfg326f://trr:S)@_Skasfm.c:js_1",
    "trr/(sa-):sj_14!:!lsx.1",
    "trr:js21s:x2491",
    "askgh85k_Sg:J@(SK!):ksad29://trr/h4fds1",
    "S)@_Skasfm.c:js_1:dfg326f://trr",
    "sj_14!:!lsx.1:trr/(sa-)",
    "js21s:x2491:trr",
    "ksad29://xee/h4fds1:askgh85k_Sg:J@(SK!)",
    "dfg326f://xee:S)@_Skasfm.c:js_1",
    "xee/(sa-):sj_14!:!lsx.1",
    "xee:js21s:x2491",
    "askgh85k_Sg:J@(SK!):ksad29://xee/h4fds1",
    "S)@_Skasfm.c:js_1:dfg326f://xee",
    "sj_14!:!lsx.1:xee/(sa-)",
    "js21s:x2491:xee"
]

for line in lines:
    search = "trr"
    match = re.search(fr"{search}.*?:([^:]+):([^:]+)", line)

    if match:
        print(f'"{line}" | Found | "{match.group(1)}", "{match.group(2)}"')
    else:
        print(f'"{line}" | Not Found')

Results:

"ksad29://trr/h4fds1:askgh85k_Sg:J@(SK!)" | Found | "askgh85k_Sg", "J@(SK!)"
"dfg326f://trr:S)@_Skasfm.c:js_1" | Found | "S)@_Skasfm.c", "js_1"
"trr/(sa-):sj_14!:!lsx.1" | Found | "sj_14!", "!lsx.1"
"trr:js21s:x2491" | Found | "js21s", "x2491"
"askgh85k_Sg:J@(SK!):ksad29://trr/h4fds1" | Not Found
"S)@_Skasfm.c:js_1:dfg326f://trr" | Not Found
"sj_14!:!lsx.1:trr/(sa-)" | Not Found
"js21s:x2491:trr" | Not Found
"ksad29://xee/h4fds1:askgh85k_Sg:J@(SK!)" | Not Found
"dfg326f://xee:S)@_Skasfm.c:js_1" | Not Found
"xee/(sa-):sj_14!:!lsx.1" | Not Found
"xee:js21s:x2491" | Not Found
"askgh85k_Sg:J@(SK!):ksad29://xee/h4fds1" | Not Found
"S)@_Skasfm.c:js_1:dfg326f://xee" | Not Found
"sj_14!:!lsx.1:xee/(sa-)" | Not Found
"js21s:x2491:xee" | Not Found

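The four “trr” lines reported as Not Found all have “trr” in the last colon-separated part. A quick check, offered as a sketch, suggests why: the pattern only captures fields to the right of “trr”, and in the “abc:cba:trr” layout no colon remains there. The variable names below are only illustrative.

```python
import re

# Same pattern as in the attempt, with "trr" written out.
pattern = re.compile(r"trr.*?:([^:]+):([^:]+)")

forward = "trr:js21s:x2491"
reverse = "askgh85k_Sg:J@(SK!):ksad29://trr/h4fds1"

forward_match = pattern.search(forward)
reverse_match = pattern.search(reverse)

# Everything that follows "trr" in the reversed layout.
tail = reverse.partition("trr")[2]

print(forward_match.groups())  # ('js21s', 'x2491')
print(reverse_match)           # None
print(repr(tail))              # '/h4fds1'
```
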
I’ve provided a few versions of the text that might appear, along with the expected results:

  1. “ksad29://trr/h4fds1:askgh85k_Sg:J@(SK!)” | Found | “askgh85k_Sg”, “J@(SK!)”

  2. “dfg326f://trr:S)@_Skasfm.c:js_1” | Found | “S)@_Skasfm.c”, “js_1”

  3. “trr/(sa-):sj_14!:!lsx.1” | Found | “sj_14!”, “!lsx.1”

  4. “trr:js21s:x2491” | Found | “js21s”, “x2491”

  5. “askgh85k_Sg:J@(SK!):ksad29://trr/h4fds1” | Found | “askgh85k_Sg”, “J@(SK!)”

  6. “S)@_Skasfm.c:js_1:dfg326f://trr” | Found | “S)@_Skasfm.c”, “js_1”

  7. “sj_14!:!lsx.1:trr/(sa-)” | Found | “sj_14!”, “!lsx.1”

  8. “js21s:x2491:trr” | Found | “js21s”, “x2491”

  1. “ksad29://xee/h4fds1:askgh85k_Sg:J@(SK!)” | Not Found
  2. “dfg326f://xee:S)@_Skasfm.c:js_1” | Not Found
  3. “xee/(sa-):sj_14!:!lsx.1” | Not Found
  4. “xee:js21s:x2491” | Not Found
  5. “askgh85k_Sg:J@(SK!):ksad29://xee/h4fds1” | Not Found
  6. “S)@_Skasfm.c:js_1:dfg326f://xee” | Not Found
  7. “sj_14!:!lsx.1:xee/(sa-)” | Not Found
  8. “js21s:x2491:xee” | Not Found

A more visual version illustrating how the text is constructed, using the placeholders “abc”, “cba”, “bbb” and “txx” (each stands for an unknown string of characters and/or symbols, of any length):

  1. “bbb://trr/txx:abc:cba” | Found | “abc”, “cba”
  2. “bbb://trr:abc:cba” | Found | “abc”, “cba”
  3. “trr/txx:abc:cba” | Found | “abc”, “cba”
  4. “trr:abc:cba” | Found | “abc”, “cba”
  5. “abc:cba:bbb://trr/txx” | Found | “abc”, “cba”
  6. “abc:cba:bbb://trr” | Found | “abc”, “cba”
  7. “abc:cba:trr/txx” | Found | “abc”, “cba”
  8. “abc:cba:trr” | Found | “abc”, “cba”
  9. “bbb://xee/txx:abc:cba” | Not Found
  10. “bbb://xee:abc:cba” | Not Found
  11. “xee/txx:abc:cba” | Not Found
  12. “xee:abc:cba” | Not Found
  13. “abc:cba:bbb://xee/txx” | Not Found
  14. “abc:cba:bbb://xee” | Not Found
  15. “abc:cba:xee/txx” | Not Found
  16. “abc:cba:xee” | Not Found
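
A possible regex for both structures, as a sketch under the assumption that “abc” and “cba” never contain “:”: one pattern is anchored at the end of the line for “...trr...:abc:cba”, and a second is anchored at the start for “abc:cba:...trr...”. The names `FORWARD`, `BACKWARD` and `find_pair` are hypothetical.

```python
import re

SEARCH = re.escape("trr")

# "...trr...:abc:cba": trr, anything up to the next colon, then two fields up to the end.
FORWARD = re.compile(rf"{SEARCH}[^:]*:([^:]+):([^:]+)$")
# "abc:cba:...trr...": two fields from the start, then trr somewhere after the second colon.
BACKWARD = re.compile(rf"^([^:]+):([^:]+):.*{SEARCH}")


def find_pair(line):
    """Return (abc, cba) for either structure, or None when nothing matches."""
    match = FORWARD.search(line) or BACKWARD.search(line)
    return match.groups() if match else None


for line in ["bbb://trr/txx:abc:cba", "abc:cba:bbb://trr", "bbb://xee/txx:abc:cba"]:
    print(line, "|", find_pair(line))
# bbb://trr/txx:abc:cba | ('abc', 'cba')
# abc:cba:bbb://trr | ('abc', 'cba')
# bbb://xee/txx:abc:cba | None
```

Two separate patterns are used because Python's `re` module does not allow the same group name to appear twice in one pattern, and keeping them apart makes each structure easier to read.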


