Hi @Byron!
Someone found this bug using my tool, and it traces back to GitPython. The problem is in the decoding of a non-ASCII character in a commit's GPG signature.
repo: https://github.com/gentoo/gentoo
commit: 13e644bb36a0b1f3ef0c2091ab648978d18f369d
code:
from git import Repo, Commit
gr = Repo('/tmp/gentoo')
c = gr.commit('13e644bb36a0b1f3ef0c2091ab648978d18f369d')
print(c.authored_date)
This returns:
Traceback (most recent call last):
  File "/Users/dspadini/Documents/pydriller/tmp.py", line 341, in <module>
    print(c.authored_date)
  File "/Users/dspadini/Documents/pydriller/venv/lib/python3.7/site-packages/gitdb/util.py", line 253, in __getattr__
    self._set_cache_(attr)
  File "/Users/dspadini/Documents/pydriller/venv/lib/python3.7/site-packages/git/objects/commit.py", line 144, in _set_cache_
    self._deserialize(BytesIO(stream.read()))
  File "/Users/dspadini/Documents/pydriller/venv/lib/python3.7/site-packages/git/objects/commit.py", line 502, in _deserialize
    self.gpgsig = sig.rstrip(b"\n").decode('ascii')
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 75: ordinal not in range(128)
The problem is in the last frame of the traceback, line 502 of git/objects/commit.py. The bytes being decoded are the following:
b'-----BEGIN PGP SIGNATURE-----\nVersion: GnuPG v2.1\nComment: Signed-off-by: J\xc3\xb6rg Bornkessel <hd_brummy@gentoo.org>\n\n.........\n-----END PGP SIGNATURE-----\n'
As you can see, the Comment header contains J\xc3\xb6rg, where \xc3\xb6 is the UTF-8 encoding of "ö". The ascii codec cannot decode those bytes, so the call fails.
So, I tried changing .decode('ascii') to .decode('UTF-8'), and it works.
Changing .decode('ascii') to .decode('ascii', 'ignore') also works.
However, I am not sure whether I should make this change. Why is it ascii in the first place (instead of UTF-8)?
Would this change break any tests?
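For anyone who wants to compare the three options without cloning the repo, here is a minimal sketch. It does not go through GitPython at all: the `sig` bytes are a shortened copy of the signature shown above (payload left out), and the variable names are only illustrative.

```python
# Shortened copy of the signature block from the failing commit.
# The base64 payload is omitted; only the header lines matter here.
sig = (
    b"-----BEGIN PGP SIGNATURE-----\n"
    b"Version: GnuPG v2.1\n"
    b"Comment: Signed-off-by: J\xc3\xb6rg Bornkessel <hd_brummy@gentoo.org>\n"
    b"\n"
    b"-----END PGP SIGNATURE-----\n"
)

# Same preprocessing as the failing line in the traceback.
raw = sig.rstrip(b"\n")

# 1. Strict ASCII decoding: raises on the first non-ASCII byte (0xc3).
try:
    raw.decode("ascii")
    ascii_error = None
except UnicodeDecodeError as exc:
    ascii_error = exc

# 2. UTF-8 decoding: succeeds and keeps the name intact.
utf8_text = raw.decode("UTF-8")

# 3. ASCII with errors="ignore": succeeds, but silently drops the
#    two bytes that encode the non-ASCII character.
ignored_text = raw.decode("ascii", "ignore")

print("strict ascii error:", ascii_error)
print("utf-8 comment line:", utf8_text.splitlines()[2])
print("ascii/ignore comment line:", ignored_text.splitlines()[2])
```

Both alternatives avoid the exception, but they are not equivalent: with 'ignore' the decoded text no longer matches the original bytes, so the stored signature would differ from what is in the commit object. With UTF-8 the text can be encoded back to the same bytes, at least for input that is valid UTF-8.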