I'm writing a Python function that extracts text from .docx files, using the python-docx library. The function does what it's supposed to do, at least when I create a .docx file in Python and then run my function on that file: it returns the text.
However, for .docx files (Word documents) that I have created or modified myself, it fails with PackageNotFoundError, as if it could not find the file at that path. I read online that I should check whether my file is a zip file. I did this with zipfile, and in fact my saved Word documents are not zip files. What's going on? Here is my Python code for verification:
from zipfile import is_zipfile
import docx
# test_path is the path where the test .docx file is written (defined elsewhere)
doc = docx.Document()
doc.add_paragraph("Hello")
doc.save(test_path)
print(is_zipfile(test_path))
# output: True
If I then open the file at test_path, type a number into it, save it, and run the check again:
print(is_zipfile(test_path))
# output: False
Are modern .docx documents no longer zip files? Or what is wrong here?
Everything I find when googling says that Word documents (.docx files) are zip files. I think this is why the library raises the error and cannot open the file.
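One more check that might narrow this down is to look at the first bytes of the saved file. This is only a rough sketch: describe_container is a name I made up for illustration, and it assumes the only two containers worth telling apart are a ZIP archive (what a normal .docx is) and an OLE compound file (what legacy .doc files and password-protected .docx files use).

```python
from zipfile import is_zipfile

# Leading bytes of the two container formats considered here.
ZIP_MAGIC = b"PK\x03\x04"                        # ZIP archive (a normal .docx)
OLE_MAGIC = b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1"  # OLE compound file (legacy .doc, encrypted .docx)


def describe_container(path):
    """Hypothetical helper: return 'zip', 'ole', or 'unknown' for the file at path."""
    with open(path, "rb") as f:
        header = f.read(8)  # the OLE signature is 8 bytes long
    if header.startswith(ZIP_MAGIC) and is_zipfile(path):
        return "zip"
    if header.startswith(OLE_MAGIC):
        return "ole"
    return "unknown"
```

Calling describe_container(test_path) before and after the manual save would show whether the file changed from a ZIP archive into something else, rather than only that is_zipfile stopped returning True.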
I appreciate everyone trying to help. Thanks