Try to decode attachment filenames when escaped
author Magnus Hagander <magnus@hagander.net>
Mon, 10 Apr 2017 12:16:46 +0000 (14:16 +0200)
committer Magnus Hagander <magnus@hagander.net>
Mon, 10 Apr 2017 12:16:46 +0000 (14:16 +0200)
Some MUAs (Gmail, at least) generate header-escaped filenames for
attachments when the names contain non-ASCII characters. When this
happens, decode the filename and use the result, rather than generating
filenames that still have the escaping in them.
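
As a rough illustration of the decoding described above, here is a minimal sketch written for Python 3 (the patch itself targets Python 2 and uses `unicode()`). The function name `decode_attachment_filename` is hypothetical and not part of the archives code; the fallback for a missing or unknown charset is also an assumption, not something the commit does.

```python
from email.header import decode_header


def decode_attachment_filename(filename):
    """Return a text filename, decoding header-encoded words if present.

    Hypothetical helper for illustration only; not the committed code.
    """
    # Only filenames that look header-encoded ("=?charset?...") are touched.
    if not filename.startswith('=?'):
        return filename

    parts = []
    for decoded, charset in decode_header(filename):
        if isinstance(decoded, bytes):
            try:
                # Assumption: fall back to ASCII when no charset is given.
                parts.append(decoded.decode(charset or 'ascii', errors='ignore'))
            except LookupError:
                # Assumption: an unknown charset name also falls back to ASCII.
                parts.append(decoded.decode('ascii', errors='ignore'))
        else:
            # decode_header returns str unchanged when nothing was encoded.
            parts.append(decoded)
    return ''.join(parts)


if __name__ == '__main__':
    print(decode_attachment_filename('=?UTF-8?Q?r=C3=A9sum=C3=A9.pdf?='))
    print(decode_attachment_filename('report.pdf'))
```

Unlike the patch, which takes only the first element returned by `decode_header`, this sketch joins all returned parts, so a filename split across several encoded words is kept whole.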

loader/lib/parser.py

index 770db6ab59b695cf90eed3feb9f0092d509515ca..d8c8cc3c288ee560cbcb0c8799feffeb9fc000f4 100644
@@ -242,8 +242,12 @@ class ArchivesParser(object):
                self.attachments_found_first_plaintext = False
                self.recursive_get_attachments(self.msg)
 
+       # Clean a filenames encoding and return it as a unicode string
        def _clean_filename_encoding(self, filename):
-               # Clean a filenames encoding and return it as a unicode string
+               # If this is a header-encoded filename, start by decoding that
+               if filename.startswith('=?'):
+                       decoded, encoding = email.header.decode_header(filename)[0]
+                       return unicode(decoded, encoding, errors='ignore')
 
                # If it's already unicode, just return it
                if isinstance(filename, unicode):