file.seek(file.tell()) needed: bad docs or a bug? #93079

Open
Akuli opened this issue May 22, 2022 · 1 comment
Labels
interpreter-core (Objects, Python, Grammar, and Parser dirs) topic-IO type-bug An unexpected behavior, bug, or error

Comments

@Akuli
Contributor

Akuli commented May 22, 2022

The following produces a file containing b'hello\n ', i.e. hello followed by a newline and a space:

with open("asd.txt", "w") as file:
    file.write("hello\n")

with open("asd.txt", "r+") as file:
    file.read(4)
    file.write(" ")

But if I add file.seek(file.tell()), then it instead produces b'hell \n' as I would expect:

with open("asd.txt", "w") as file:
    file.write("hello\n")

with open("asd.txt", "r+") as file:
    file.read(4)
    file.seek(file.tell())
    file.write(" ")

Is this correct behaviour? I can't find any mention of it in the docs. (I can't use file.seek(4) because that "produces undefined behaviour" according to docs.)

@Akuli Akuli added the type-bug An unexpected behavior, bug, or error label May 22, 2022
@duaneg

duaneg commented Apr 27, 2025

This is caused by the TextIOWrapper, which is used because the file is opened in text mode. Since its read method returns characters, not bytes, it has to read ahead and decode the results, which leaves the stream position of the underlying file further along than might be expected. The subsequent write then lands at the wrong position (in this case, the end of the file).

There is a bunch of logic in _io_TextIOWrapper_tell_impl that adjusts the position backwards to account for data that has been read ahead into the decode buffer but not yet decoded and returned to the user, which is why seek(tell()) works as expected. Arguably something equivalent should be done when writing. I note that the _io_TextIOWrapper_write_impl function has a comment "XXX What if we were just reading?", which suggests the original authors were aware of this potential bug but didn't get around to fixing it.
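To make the read-ahead visible, here is a minimal sketch that compares the position reported by the text layer with the position of the underlying binary buffer after a short read. The helper name positions_after_read and the temporary file are only for illustration, and the exact buffer position depends on the implementation's chunk size, so treat the numbers as indicative rather than guaranteed.

```python
import os
import tempfile


def positions_after_read(chars=4):
    """Return (text_pos, buffer_pos) after reading `chars` characters.

    Hypothetical helper for illustration only.
    """
    fd, path = tempfile.mkstemp(suffix=".txt")
    os.close(fd)
    try:
        # newline="" avoids platform newline translation, so byte offsets
        # match character offsets for this ASCII content.
        with open(path, "w", encoding="utf-8", newline="") as file:
            file.write("hello\n")

        with open(path, "r+", encoding="utf-8", newline="") as file:
            file.read(chars)
            # Position as seen by the text layer (adjusted for read-ahead).
            text_pos = file.tell()
            # Position of the underlying binary stream, which may already
            # be further along because a whole chunk was read and buffered.
            buffer_pos = file.buffer.tell()
        return text_pos, buffer_pos
    finally:
        os.remove(path)


text_pos, buffer_pos = positions_after_read()
print("text layer:", text_pos, "underlying buffer:", buffer_pos)
```

If the buffer position is larger than the text position, a write that goes straight to the buffer would land past the point the reader thinks it is at, which matches the behaviour reported above.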

I haven't tried it yet, but this could probably be fixed without too much trouble by discarding the decode buffer and doing an implicit seek(tell()) on write if a decode buffer exists.
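Until something like that exists in TextIOWrapper itself, the same idea can be approximated in user code. The following is a rough sketch, not the proposed C-level fix: write_at_logical_position is a hypothetical helper that resynchronises the stream with seek(tell()) before every write.

```python
import os
import tempfile


def write_at_logical_position(file, text):
    """Hypothetical helper: resync the text stream, then write.

    seek(tell()) discards any read-ahead so the write happens at the
    position the caller has logically read up to.
    """
    file.seek(file.tell())
    return file.write(text)


def demo():
    """Reproduce the example from this issue using the helper."""
    fd, path = tempfile.mkstemp(suffix=".txt")
    os.close(fd)
    try:
        with open(path, "w", encoding="utf-8", newline="") as file:
            file.write("hello\n")

        with open(path, "r+", encoding="utf-8", newline="") as file:
            file.read(4)
            write_at_logical_position(file, " ")

        with open(path, "rb") as file:
            return file.read()
    finally:
        os.remove(path)


result = demo()
print(result)
```

This only works around the position problem. It does nothing about the risk of overwriting part of a multibyte character.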

However, writing to an existing text file with a variable-length encoding is inherently unsafe: it might overwrite the first byte(s) of a multibyte character. Writing anywhere but the end of a text file should arguably be considered undefined behaviour, in the same way that seeking anywhere but the end, the beginning, or a tell() result is. It will work if you know what you are doing and are very careful, but otherwise you will likely end up with a file containing invalid Unicode data:

with open("asd.txt", "r+") as file: # On my machine this uses utf-8 encoding, YMMV
    file.write("āēīōū\n")
    file.seek(0)
    file.write(' ')
    file.seek(0)
    print(file.read()) # UnicodeDecodeError: 'utf-8' codec can't decode byte 0x81 in position 1: invalid start byte
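The same failure can be shown at the byte level without touching a file. This is a small sketch that assumes UTF-8, where "ā" is encoded as the two bytes 0xC4 0x81; replacing only the first of them with a space leaves a stray continuation byte behind.

```python
# "ā" (U+0101) is two bytes in UTF-8: 0xC4 0x81.
original = "āēīōū\n".encode("utf-8")

# Simulate writing a single space over the first byte of the file.
damaged = b" " + original[1:]

try:
    damaged.decode("utf-8")
    error = None
except UnicodeDecodeError as exc:
    # The orphaned 0x81 continuation byte cannot start a character.
    error = exc

print(error)
```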

I'd be interested in hearing what the maintainer (ping @benjaminp) thinks, but personally I'd be inclined to treat this as a documentation issue rather than something to be fixed in code.

@picnixz picnixz added the interpreter-core (Objects, Python, Grammar, and Parser dirs) label Apr 27, 2025

4 participants