
PyMuPDF – Extract pie chart (and similar chart) data from a PDF using Python


This is one approach: render the PDF page to an image with PyMuPDF or pdf2image, then analyze the image.

  1. Convert the PDF pages to images with PyMuPDF or pdf2image.
  2. If you choose pdf2image, install it with: pip install pdf2image (it also needs Poppler installed on the system).
  3. Analyze the pie chart image with an image processing library such as OpenCV or PIL. Once the chart region is isolated, image analysis techniques can estimate the size or percentage of each slice.
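
Step 1 can be sketched with PyMuPDF roughly as follows. The function names, the 200 dpi default, and the file paths are assumptions chosen for illustration. The only fixed fact is that PDF coordinates use 72 points per inch, so the render zoom factor is dpi / 72.

```python
def zoom_for_dpi(dpi):
    """Return the scale factor for a target resolution.

    PDF user space is 72 points per inch, so zoom = dpi / 72.
    """
    return dpi / 72.0


def render_page_to_png(pdf_path, page_number, out_path, dpi=200):
    """Render one page of a PDF to a PNG file (hypothetical helper).

    page_number is zero-based. Requires PyMuPDF (pip install pymupdf).
    """
    # Imported here so zoom_for_dpi stays usable without PyMuPDF installed.
    import fitz  # PyMuPDF

    zoom = zoom_for_dpi(dpi)
    with fitz.open(pdf_path) as doc:
        page = doc.load_page(page_number)
        # A higher zoom gives more pixels per slice, which helps later analysis.
        pix = page.get_pixmap(matrix=fitz.Matrix(zoom, zoom))
        pix.save(out_path)
    return out_path
```

For example, render_page_to_png("report.pdf", 0, "page0.png") would write the first page as an image that OpenCV can then load. Both file names here are placeholders.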

Here’s a small example using OpenCV:

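import cv2

# Load the image
image_path = "path/to/your/image.png"
image = cv2.imread(image_path)
if image is None:
    raise FileNotFoundError(image_path)  # imread returns None on failure
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

# Threshold: pixels darker than 128 become white (foreground)
_, thresh = cv2.threshold(gray, 128, 255, cv2.THRESH_BINARY_INV)

# Find outer contours (OpenCV 4.x returns two values)
contours, _ = cv2.findContours(thresh, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)

# Print the number of contours. This matches the number of slices only when
# the slices are dark and separated by light borders.
print("Number of slices:", len(contours))

# Draw the contours on the image
cv2.drawContours(image, contours, -1, (0, 255, 0), 2)

# Display the image until a key is pressed
cv2.imshow('Image with Contours', image)
cv2.waitKey(0)
cv2.destroyAllWindows()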
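
Counting contours does not by itself give the size of each slice. One possible way to estimate percentages is to count pixels per color, since a slice's area is proportional to its share of the pie. This is a minimal sketch, not a definitive method. It assumes each slice is filled with a flat, distinct color on a white background, and the function name and the min_share noise cutoff are illustrative choices.

```python
from collections import Counter


def slice_percentages(pixels, background=(255, 255, 255), min_share=0.01):
    """Estimate each slice's percentage from per-color pixel counts.

    pixels:     iterable of 3-tuples, one per pixel of the chart area.
    background: color to ignore (assumed to be white).
    min_share:  colors covering less than this fraction of the non-background
                pixels are treated as noise (anti-aliased edges, labels).
    """
    counts = Counter(p for p in pixels if p != background)
    total = sum(counts.values())
    if total == 0:
        return {}

    # Drop rare colors, then normalize the remaining counts to 100%.
    kept = {color: n for color, n in counts.items() if n / total >= min_share}
    kept_total = sum(kept.values())
    return {color: 100.0 * n / kept_total for color, n in kept.items()}
```

With an OpenCV image, the pixels can be passed as map(tuple, image.reshape(-1, 3).tolist()). The color keys are then in BGR order, because that is how cv2.imread stores channels. Gradients, 3D effects, or text drawn over the slices break the flat-color assumption, so crop to the chart area first and treat the output as an estimate.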


