
Regex/algorithm in Python to extract comments from class attributes

Given the source code of a class definition, I am trying to extract every attribute together with its comment (an empty string "" if the attribute has no comment).

class Player(Schema):
    score = fields.Float()
    """
    Total points from killing zombies and finding treasures
    """

    name = fields.String()
    age = fields.Int()

    backpack = fields.Nested(
    """
    Collection of items that a player can store in their backpack
    """

For the example above, the expected parsed result is:

[
  ("score", "Total points from killing zombies and finding treasures"),
  ("name", ""),
  ("age", ""),
  ("backpack", "Collection of items that a player can store in their backpack")
]

My attempt below fails to extract the comment for the backpack attribute and produces this output:

[
  ('score', 'Total points from killing zombies and finding treasures'),
  ('name', ''),
  ('age', ''),
  ('backpack', '')
]

How can the regular expression (or even the entire parsing logic) be fixed to handle the cases present in the example class?


import re

code_block = '''class Player(Schema):
    score = fields.Float()
    """
    Total points from killing zombies and finding treasures
    """

    name = fields.String()
    age = fields.Int()

    backpack = fields.Nested(
    """
    Collection of items that a player can store in their backpack
    """
'''

def parse_schema_comments(code):
    # Regular expression pattern to match field names and multiline comments
    pattern = r'(\w+)\s*=\s*fields\.\w+\([^\)]*\)(?:\n\s*"""\n(.*?)\n\s*""")?'

    # Find all matches using the pattern
    matches = re.findall(pattern, code, re.DOTALL)

    # Process the matches to format them as required
    result = []
    for match in matches:
        field_name, comment = match
        comment = comment.strip() if comment else ""
        result.append((field_name, comment))

    return result

parsed_comments = parse_schema_comments(code_block)
print(parsed_comments)
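
One possible direction, offered as a sketch rather than a definitive fix. The `[^\)]*\)` part of the pattern stops at the first closing parenthesis, so a field whose arguments contain parentheses of their own would likely end the match before the comment is reached. Assuming the class source is valid Python and each comment is a bare string literal placed directly after its assignment, the standard-library `ast` module can walk the class body without a regex. The `Item` arguments inside `fields.Nested(...)` below are hypothetical and only illustrate nested parentheses.

```python
import ast


def parse_schema_comments(code):
    """Return (attribute, comment) pairs for class-level assignments."""
    result = []
    for node in ast.walk(ast.parse(code)):
        if not isinstance(node, ast.ClassDef):
            continue
        body = node.body
        for i, stmt in enumerate(body):
            # Only handle simple assignments such as `name = fields.String()`.
            if not (
                isinstance(stmt, ast.Assign)
                and len(stmt.targets) == 1
                and isinstance(stmt.targets[0], ast.Name)
            ):
                continue
            comment = ""
            nxt = body[i + 1] if i + 1 < len(body) else None
            # A bare string literal right after the assignment is treated
            # as that attribute's comment.
            if (
                isinstance(nxt, ast.Expr)
                and isinstance(nxt.value, ast.Constant)
                and isinstance(nxt.value.value, str)
            ):
                comment = nxt.value.value.strip()
            result.append((stmt.targets[0].id, comment))
    return result


# Hypothetical input: the arguments to fields.Nested are made up.
example = '''
class Player(Schema):
    score = fields.Float()
    """
    Total points from killing zombies and finding treasures
    """

    name = fields.String()
    age = fields.Int()

    backpack = fields.Nested(
        Item,
        load_default=[Item.make("knife")],
    )
    """
    Collection of items that a player can store in their backpack
    """
'''

print(parse_schema_comments(example))
```

Because `ast.parse` only parses the text and never executes it, `Schema`, `fields`, and `Item` do not need to be importable. The trade-off is that the input has to be syntactically valid Python, which a regex does not require.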
