fix read write tmpfile on windows #323
    DocumentLayout by using a model identified by model_name."""
    with tempfile.NamedTemporaryFile() as tmp_file:
        tmp_file.write(data.read())
        tmp_file.flush()  # Make sure the file is written out
Instead of creating a separate file, could you try this smaller change:
    with tempfile.NamedTemporaryFile() as tmp_file:
        tmp_file.write(data.read())
        tmp_file.flush()
        tmp_file.seek(0)
        layout = process_file_with_model(...)
Unfortunately I don't have a Windows machine to check myself.
Nope, that's probably not the issue. I'd try creating the tempfile with these options:
    tempfile.NamedTemporaryFile(delete=True, delete_on_close=False)
according to the Windows recommendations here:
https://docs.python.org/3/library/tempfile.html
> Nope, probably it's not this issue - I'd try to create tempfile with options:
> tempfile.NamedTemporaryFile(delete=True, delete_on_close=False)
> According to windows recommendations from here: https://docs.python.org/3/library/tempfile.html
Note that this only exists in Python 3.12 and later; the docs for Python 3.11 do not list the delete_on_close parameter: https://docs.python.org/3.11/library/tempfile.html#tempfile.NamedTemporaryFile
I'm commenting because I'm also running into this issue on Windows.
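To make the version constraint concrete, here is a minimal sketch of a helper that uses delete_on_close=False on Python 3.12+ and falls back to delete=False with manual cleanup on older versions. The name closed_named_tempfile is hypothetical and not part of this PR; it only illustrates how both code paths could hand a closed file path to a consumer.

```python
import os
import sys
import tempfile
from contextlib import contextmanager
from typing import BinaryIO, Iterator


@contextmanager
def closed_named_tempfile(data: BinaryIO) -> Iterator[str]:
    """Yield the path of a closed temporary file holding the bytes of ``data``.

    Hypothetical helper for illustration; not part of the PR under review.
    """
    if sys.version_info >= (3, 12):
        # With delete_on_close=False the file survives close() and is
        # removed when the context manager exits.
        with tempfile.NamedTemporaryFile(delete=True, delete_on_close=False) as tmp_file:
            tmp_file.write(data.read())
            tmp_file.close()  # release the handle so the path can be reopened
            yield tmp_file.name
    else:
        # Older Pythons lack delete_on_close: keep the file and remove it by hand.
        tmp_file = tempfile.NamedTemporaryFile(delete=False)
        try:
            tmp_file.write(data.read())
            tmp_file.close()
            yield tmp_file.name
        finally:
            tmp_file.close()  # closing twice is harmless
            os.remove(tmp_file.name)
```

A caller would wrap the model call in `with closed_named_tempfile(data) as path:` and pass `path` on; the file is gone once the block exits on either code path.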
And wouldn't it defeat the purpose of a temp file to leave it on the device after things have been done with it?
I guess you could do something like this if you don't want to use a temp folder:
    import os
    import tempfile
    from typing import BinaryIO, Optional

    def process_data_with_model(
        data: BinaryIO,
        model_name: Optional[str],
        **kwargs,
    ) -> DocumentLayout:
        """Processes pdf file in the form of a file handler (supporting a read method) into a
        DocumentLayout by using a model identified by model_name."""
        file_name = ''
        # delete=False keeps the file on disk after the handle is closed
        with tempfile.NamedTemporaryFile(delete=False) as tmp_file:
            tmp_file.write(data.read())
            tmp_file.flush()  # Make sure the file is written out
            file_name = tmp_file.name
        # The handle is closed here, so the file can be reopened by name on Windows
        try:
            layout = process_file_with_model(
                file_name,
                model_name,
                **kwargs,
            )
        finally:
            os.remove(file_name)  # Clean up manually, since delete=False
        return layout
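For comparison, the temp folder variant mentioned above could look roughly like the sketch below. It is an assumption about that approach, not code from this PR: process_data_in_temp_dir, the process_file parameter, and the file name document.pdf are hypothetical, and the real process_file_with_model would be passed in as process_file.

```python
import os
import tempfile
from typing import Any, BinaryIO, Callable, Optional


def process_data_in_temp_dir(
    data: BinaryIO,
    model_name: Optional[str],
    process_file: Callable[..., Any],
    **kwargs: Any,
) -> Any:
    """Write ``data`` into a temporary directory and process the closed file.

    Hypothetical sketch; ``process_file`` stands in for process_file_with_model.
    """
    with tempfile.TemporaryDirectory() as tmp_dir:
        file_name = os.path.join(tmp_dir, "document.pdf")  # illustrative name
        with open(file_name, "wb") as tmp_file:
            tmp_file.write(data.read())
        # The file is closed at this point, so it can be reopened by name.
        # The directory and its contents are removed when the outer block exits.
        return process_file(file_name, model_name, **kwargs)
```

The directory takes care of cleanup, so no explicit os.remove is needed, at the cost of one extra directory per call.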
This is a low-impact fix for reading still-open temporary files on Windows.
#303