Each row of my data has one long string that lists several comic series together with their issue numbers.
Two examples:
Example 1
“Batman #323, 325, 335, 340, 368-369, 397-400, Amazing Spider-Man #13-17”
Example 2
“Amazing Spider-Man #nn, Amazing Spider-Man Annual #10, Amazing Spider-Man 174, 185, 213, 245, 326”
Note that “#nn” is not a typo: it should be retained in the output exactly as it appears for that comic. If it makes things easier, I can replace “#nn” with “#00”.
I have been trying to use a regular expression (regex) in Python. For instance, I have tried the pattern:
r"([a-zA-Z\s\'-]+) #(\d+|\d+-\d+|\w+)"
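As a quick check of what this pattern captures on its own, the snippet below runs it against Example 1. The variable names `example_1` and `captured` are only for illustration.

```python
import re

pattern = re.compile(r"([a-zA-Z\s\'-]+) #(\d+|\d+-\d+|\w+)")
example_1 = "Batman #323, 325, 335, 340, 368-369, 397-400, Amazing Spider-Man #13-17"

# findall returns one (series, issue) tuple per place where the pattern matches;
# strip() removes the leading space picked up by \s in the series group.
captured = [(series.strip(), issue) for series, issue in pattern.findall(example_1)]
print(captured)  # [('Batman', '323'), ('Amazing Spider-Man', '13')]
```

Only the number directly after each "#" is captured. The bare issues (325, 335, ...) have no "#" in front of them, and because regex alternatives are tried left to right, `\d+` matches "13" before `\d+-\d+` ever gets a chance at "13-17".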
The code I have written is as follows:
import re

def separate_comic_books(comic_books_str):
    series_issue_dict = {}
    # Define a regular expression pattern to extract series and issue information
    pattern = re.compile(r'([a-zA-Z\s\'-]+) #(\d+|\d+-\d+|\w+)')
    # Split the string into individual comic book entries
    comic_books_list = re.split(r',\s*', comic_books_str)
    # Iterate through the list of comic books
    for comic_book in comic_books_list:
        matches = pattern.findall(comic_book)
        print(matches)
        for match in matches:
            series = match[0].strip()
            issues = match[1].strip()
            # Split the issues if it's a range
            issues_list = [str(i) for i in range(int(issues.split('-')[0]), int(issues.split('-')[-1]) + 1)]
            # Add the comic book to the dictionary based on series
            if series in series_issue_dict:
                series_issue_dict[series].extend(issues_list)
            else:
                series_issue_dict[series] = issues_list
    # Create the final formatted string
    formatted_comic_books = []
    for series, issues in series_issue_dict.items():
        formatted_issues = ", ".join([f"{series} #{issue}" for issue in sorted(issues)])
        formatted_comic_books.append(formatted_issues)
    return ', '.join(formatted_comic_books)

# Provided string of comic books
comic_books_str = "Amazing Spider-Man #nn, Amazing Spider-Man Annual #10, Amazing Spider-Man 174, 185, 213, 245, 326"
result = separate_comic_books(comic_books_str)
print(result)
However, I am getting the following results:
Example 1
"Batman #323, Amazing Spider-Man #13"
Example 2
ValueError: invalid literal for int() with base 10: 'nn'
Instead, I would like to get the following results:
Example 1
Batman #323, Batman #325, Batman #335, Batman #340, Batman #368, Batman #369, Batman #397, Batman #398, Batman #399, Batman #400, Amazing Spider-Man #13, Amazing Spider-Man #14, Amazing Spider-Man #15, Amazing Spider-Man #16, Amazing Spider-Man #17
Example 2
Amazing Spider-Man #nn, Amazing Spider-Man Annual #10, Amazing Spider-Man 174, Amazing Spider-Man 185, Amazing Spider-Man 213, Amazing Spider-Man 245, Amazing Spider-Man 326
Is there a way to write Python code that does this?
Thank you so much!!
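
One possible approach, offered as a minimal sketch rather than a definitive answer: split on commas, remember the most recent series prefix (including its "#" if it had one), and reuse that prefix for the bare issue numbers that follow. The names `expand_issues` and `expand_comic_list` are hypothetical. The sketch assumes that a bare issue token never contains a space or "#", and that anything that is not a numeric range (such as "nn") should be passed through unchanged.

```python
import re

def expand_issues(issue):
    """Turn a numeric range like '368-369' into ['368', '369'].

    Anything else ('323', 'nn', ...) is returned unchanged as a one-item list.
    """
    m = re.fullmatch(r"(\d+)-(\d+)", issue)
    if m:
        start, end = int(m.group(1)), int(m.group(2))
        return [str(n) for n in range(start, end + 1)]
    return [issue]

def expand_comic_list(text):
    """Expand a comma-separated comic list into one 'Series #issue' entry per issue."""
    prefix = ""   # last series prefix seen, e.g. "Batman #" or "Amazing Spider-Man "
    entries = []
    for token in re.split(r",\s*", text.strip()):
        # A token with a space or '#' names a series: everything up to the
        # last space/'#' is the prefix, the remainder is the issue part.
        m = re.fullmatch(r"(.*[ #])(\S+)", token)
        if m:
            prefix, issue = m.group(1), m.group(2)
        else:
            issue = token  # bare issue or range: inherit the previous prefix
        entries.extend(f"{prefix}{n}" for n in expand_issues(issue))
    return ", ".join(entries)

example_1 = "Batman #323, 325, 335, 340, 368-369, 397-400, Amazing Spider-Man #13-17"
example_2 = "Amazing Spider-Man #nn, Amazing Spider-Man Annual #10, Amazing Spider-Man 174, 185, 213, 245, 326"
print(expand_comic_list(example_1))
print(expand_comic_list(example_2))
```

Because "nn" never reaches int(), the ValueError from the original code does not occur, and a series written without "#" (as in "Amazing Spider-Man 174") keeps that style for the issues that follow it. Series names or issue labels in other formats would need the two patterns adjusted.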