path: root/loader/lib/parser.py
Age | Commit message | Author
2023-04-11 | Add JIS encodings to message loading | Célestin Matte
2022-04-01 | Fix bytes/str handling of secondary text parts in messages | Magnus Hagander
This was broken in the Python 2->3 migration, but is apparently an uncommon enough case that it wasn't properly spotted until now. Reported by, and pointers in the right direction from, Andres Freund.
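A minimal Python 3 sketch of the bytes/str distinction involved, assuming the stdlib `email` package; `part_text` is a hypothetical helper, not the code in parser.py. In Python 3, `get_payload(decode=True)` undoes base64 or quoted-printable and hands back bytes, so the charset decode has to be done explicitly for every text part, not just the first one.

```python
from email.message import Message


def part_text(part: Message) -> str:
    """Decode a (possibly secondary) text/* MIME part into str.

    Hypothetical helper: get_payload(decode=True) returns bytes for a
    leaf part (None for a multipart container), never str.
    """
    raw = part.get_payload(decode=True)
    if raw is None:  # multipart container, nothing to decode
        return ""
    charset = part.get_content_charset() or "us-ascii"
    try:
        return raw.decode(charset, errors="replace")
    except LookupError:
        # The charset named in the header is not a codec Python knows.
        return raw.decode("us-ascii", errors="replace")
```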
2020-08-11 | Update regexp escaping syntax to please pep8 | Magnus Hagander
2020-08-11 | Update exception catching syntax to please pep8 | Magnus Hagander
2020-04-01 | Mark all ForeignKeys as on_delete=CASCADE | Magnus Hagander
2019-01-08 | Fix date parsing to be even more forgiving | Magnus Hagander
In particular, if parsing the date either fails or results in a date that's in the future, fall back to parsing the dates out of the Received: headers instead, because at some point there we will find a parsable date for sure (if not before, then when it hit one of our own servers).
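A sketch of that fallback using only the stdlib; `message_date` is a hypothetical name. Two details are assumptions rather than facts from the commit: the date in a Received: header is taken to be the text after its last `;`, and the headers are scanned bottom-up (oldest hop first), so the hop added by our own servers at the top is the last resort.

```python
from datetime import datetime, timezone
from email.message import Message
from email.utils import parsedate_to_datetime
from typing import List, Optional


def _parse(value: str) -> Optional[datetime]:
    try:
        dt = parsedate_to_datetime(value)
    except (TypeError, ValueError, IndexError):
        return None  # unparsable; the exception type differs across Python versions
    if dt.tzinfo is None:
        dt = dt.replace(tzinfo=timezone.utc)  # a "-0000" offset yields a naive datetime
    return dt


def message_date(msg: Message, now: Optional[datetime] = None) -> Optional[datetime]:
    """Date: header if sane, else the first parsable, non-future Received: date."""
    now = now or datetime.now(timezone.utc)
    candidates: List[str] = []
    if msg["Date"] is not None:
        candidates.append(str(msg["Date"]))
    # Assumption: oldest hop is the bottom Received: header, so walk them in reverse.
    for received in reversed(msg.get_all("Received") or []):
        received = str(received)
        if ";" in received:
            candidates.append(received.rsplit(";", 1)[1].strip())
    for value in candidates:
        dt = _parse(value)
        if dt is not None and dt <= now:  # reject failures and future dates
            return dt
    return None
```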
2019-01-04 | Fix comparison operators | Magnus Hagander
2019-01-04 | Fix bad multi-command lines | Magnus Hagander
2019-01-04 | Whitespace fixes | Magnus Hagander
2019-01-04 | Fix indentation | Magnus Hagander
Per pep8 warnings, adjust indentation for consistency.
2019-01-04 | Tabs to 4 spaces | Magnus Hagander
pep8 standard for indentation.
2019-01-04 | Trap internal AssertionError from python libraries | Magnus Hagander
For some really broken messages, we end up in a cannot-happen codepath. Trap this one, just consider that MIME part empty, and try again later. In passing, also change it so we continue loading after parsing failures. Previously we continued in the mode where we just generated diffs, but not when making updates. Now we continue in both cases, but of course don't do the actual update if the parsing failed.
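A minimal sketch of both behaviours, assuming the stdlib `email` package. `safe_payload`, `parse_bodies` and `load_all` are hypothetical names, and the `update` callback stands in for whatever the real loader does when it is not in diff-only mode.

```python
from email import message_from_bytes
from email.message import Message
from typing import Callable, Iterable, List, Optional


def safe_payload(part: Message) -> Optional[bytes]:
    """Decoded payload, or None if the library hits an internal assertion."""
    try:
        return part.get_payload(decode=True)
    except AssertionError:
        # Really broken messages can reach a "cannot happen" path inside the
        # library; treat this MIME part as empty instead of aborting.
        return None


def parse_bodies(raw: bytes) -> List[bytes]:
    msg = message_from_bytes(raw)
    return [safe_payload(p) or b"" for p in msg.walk() if not p.is_multipart()]


def load_all(raws: Iterable[bytes], update: Callable[[List[bytes]], None]) -> int:
    """Parse every message, calling update() only on success; return failure count."""
    failures = 0
    for raw in raws:
        try:
            bodies = parse_bodies(raw)
        except Exception:  # deliberately broad: one bad message must not stop the run
            failures += 1
            continue  # keep loading, but skip the update for this message
        update(bodies)
    return failures
```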
2019-01-03 | Update loader scripts to use python3 syntax | Magnus Hagander
Some minor cleanups as well, but mostly just the output of the 2to3 tool and some manual changes.
2019-01-03 | Use "in" syntax instead of has_key() | Magnus Hagander
has_key() has been deprecated for a while and will be gone in Python 3. The "in" syntax is available in both the old and the new versions.
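An illustration of the change on an `email.message.Message`, which in Python 2 offered `has_key()` just like `dict`; the commit does not say which objects the loader called it on, so the message used here is only an example.

```python
from email import message_from_string

msg = message_from_string("Subject: hi\nIn-Reply-To: <a@example.org>\n\nbody\n")

# Python 2 only (has_key() no longer exists in Python 3):
#     if msg.has_key("In-Reply-To"): ...
# Works in both: Message.__contains__ compares header names case-insensitively.
has_parent = "in-reply-to" in msg
has_refs = "References" in msg
```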
2018-11-30 | Switch to using tidylib rather than tidy | Magnus Hagander
tidylib (http://countergram.github.io/pytidylib/) is maintained; the old tidy one (https://cihar.com/software/utidylib/) is not. In particular, python3 support is in the new one. This generates some minor changes in the existing archives, but they seem to be just whitespace and some actual incorrectness in the old output.
2018-11-20 | Remove spaces in messageids | Magnus Hagander
They shouldn't be there in the first place. Sigh. But if they're there, just pretend they don't exist, so we get a working messageid.
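A small sketch of "pretend the spaces don't exist"; `clean_messageid` is hypothetical, and whether the real loader keeps the angle brackets is not stated in the commit, so this version drops them.

```python
import re
from typing import Optional

_MSGID_RE = re.compile(r"<([^>]*)>")


def clean_messageid(header_value: str) -> Optional[str]:
    """Return the message-id inside <...> with all whitespace removed."""
    m = _MSGID_RE.search(header_value)
    if m is None:
        return None
    cleaned = re.sub(r"\s+", "", m.group(1))  # spaces, tabs and folded newlines
    return cleaned or None
```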
2018-11-20 | One more round of header replacement fixes | Magnus Hagander
2018-11-20 | Add another strange timezone offset format | Magnus Hagander
2017-04-10 | Fix silly oversight in imports | Magnus Hagander
2017-04-10 | Try to decode attachment filenames when escaped | Magnus Hagander
Some MUAs (notably gmail, at least) can generate header-escaped filenames for attachments if non-ASCII characters are included. If this happens, decode them and try to use that rather than generating filenames with escaping in them.
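A sketch using the stdlib's RFC 2047 decoder, assuming "header-escaped" means encoded-words such as `=?UTF-8?B?...?=` inside the filename parameter; `attachment_filename` is a hypothetical helper. RFC 2231-style parameters are already decoded by `get_filename()`, so only the encoded-word case needs extra work.

```python
from email.errors import HeaderParseError
from email.header import decode_header, make_header
from email.message import Message
from typing import Optional


def attachment_filename(part: Message) -> Optional[str]:
    """Attachment filename, decoding RFC 2047 encoded-words if present."""
    name = part.get_filename()  # RFC 2231 parameters are already collapsed here
    if name is None:
        return None
    if "=?" in name:
        try:
            name = str(make_header(decode_header(name)))
        except (HeaderParseError, UnicodeDecodeError, LookupError):
            pass  # keep the escaped form rather than failing the message
    return name
```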
2016-12-17 | Exclude pkcs7 signatures in attachments | Magnus Hagander
Treat them the same way we treat detached PGP signatures, which is simply to not process them as attachments.
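A sketch of the skip rule. The set of MIME types is an assumption: `application/pgp-signature` for detached PGP signatures, plus the two common PKCS#7 (S/MIME) spellings; `attachment_names` is hypothetical.

```python
from email.message import Message
from typing import List

# Assumed list of signature types that are never stored as attachments.
SIGNATURE_TYPES = frozenset({
    "application/pgp-signature",
    "application/pkcs7-signature",
    "application/x-pkcs7-signature",
})


def attachment_names(msg: Message) -> List[str]:
    """Filenames of leaf parts worth storing, skipping signature parts."""
    names = []
    for part in msg.walk():
        if part.is_multipart():
            continue
        if part.get_content_type() in SIGNATURE_TYPES:
            continue  # same treatment as detached PGP signatures
        filename = part.get_filename()
        if filename:
            names.append(filename)
    return names
```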
2016-03-02 | Forcibly remove \0 at the end of a decoded message | Magnus Hagander
This happens fairly commonly with some broken MUAs, it seems.
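A minimal sketch, assuming every trailing NUL is removed (the commit only says "at the end") while interior ones are left alone; `decode_body` is hypothetical. PostgreSQL text values cannot contain the NUL character, which is presumably why it must go before storing.

```python
def decode_body(raw: bytes, charset: str = "utf-8") -> str:
    """Decode a message body and drop trailing NUL characters."""
    text = raw.decode(charset, errors="replace")
    return text.rstrip("\0")  # only trailing NULs; interior ones are kept
```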
2016-02-14 | Actually store the raw data in rawtxt | Magnus Hagander
Previously, we would parse the message and then reconstruct it. This refolds the headers, and also breaks From rows in the body. Oops. Instead, materialize the data into rawtxt and then parse that, rather than the other way around.
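A sketch of the corrected ordering; `materialize` is hypothetical. The bytes that end up in rawtxt (a bytea column, per the 2012-07-10 entry) are exactly the bytes that get parsed, so nothing is regenerated by a generator that might refold headers or escape body lines starting with "From ".

```python
from email import message_from_bytes
from email.message import Message
from typing import Tuple


def materialize(raw: bytes) -> Tuple[bytes, Message]:
    """Keep the original bytes for rawtxt, then parse those same bytes.

    The old order was roughly: parse, re-serialize the Message, store the
    result. That stores generator output instead of what was received.
    """
    rawtxt = bytes(raw)  # stored byte for byte
    msg = message_from_bytes(rawtxt)
    return rawtxt, msg
```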
2013-08-17 | Properly recurse into multipart/signed email parts | Cédric Villemain
Previously we'd only recurse into multipart/mixed, but this would miss PGP-signed attachments sent by some MUAs.
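A sketch of a recursion that descends into an explicit set of containers; `leaf_parts` and `RECURSE_INTO` are hypothetical names. Before this change the set would only have held `multipart/mixed`; adding `multipart/signed` is what exposes the attachments wrapped by a PGP signature.

```python
from email.message import Message
from typing import Iterator

# Assumption: the containers the loader descends into.
RECURSE_INTO = {"multipart/mixed", "multipart/signed"}


def leaf_parts(part: Message) -> Iterator[Message]:
    """Yield non-multipart parts, descending only into RECURSE_INTO containers."""
    if not part.is_multipart():
        yield part
        return
    if part.get_content_type() in RECURSE_INTO:
        for sub in part.get_payload():
            yield from leaf_parts(sub)
    # Other multipart subtypes (e.g. alternative) are skipped in this sketch.
```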
2013-01-09 | Turn any non-first text/plain parts into attachments | Magnus Hagander
Instead of ignoring them because they're text/plain, only ignore the first one, and specifically the one matching our footers. This should deal with the case where there is a text file attached that has no name.
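A sketch of that classification over the text/plain leaf parts of a message, in order. `classify_text_parts` and `_text` are hypothetical, and comparing against the footer after `strip()` is an assumption about how "matching our footers" is decided.

```python
from email.message import Message
from typing import List, Optional, Tuple


def _text(part: Message) -> str:
    raw = part.get_payload(decode=True) or b""
    return raw.decode(part.get_content_charset() or "us-ascii", errors="replace")


def classify_text_parts(
    text_parts: List[Message], footer: Optional[str] = None
) -> Tuple[Optional[str], List[Message]]:
    """Split text/plain parts into (body, parts to keep as attachments)."""
    body: Optional[str] = None
    extra: List[Message] = []
    for part in text_parts:
        text = _text(part)
        if body is None:
            body = text  # the first text/plain part is the message body
        elif footer is not None and text.strip() == footer.strip():
            continue  # the mailing list footer, not a real attachment
        else:
            extra.append(part)  # e.g. an attached text file with no name
    return body, extra
```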
2013-01-05 | Properly parse attachments of type=text/plain, content-disposition=attachment | Magnus Hagander
Previously we'd only parse them if they were given an explicit name, which is not required; instead, they can have a filename...
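In the stdlib `email` package the distinction is the `name=` parameter on Content-Type versus `filename=` on Content-Disposition. A sketch (the helper name is hypothetical) that relies on `get_filename()`, which checks `filename=` first and falls back to `name=`:

```python
from email.message import Message
from typing import Optional


def text_attachment_name(part: Message) -> Optional[str]:
    """Filename of a text/plain attachment, or None.

    Checking only part.get_param("name") would miss parts that carry just
    Content-Disposition: attachment; filename=...; get_filename() covers both.
    """
    if part.get_content_type() != "text/plain":
        return None
    return part.get_filename()
```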
2012-08-12 | Another typo, I think | Magnus Hagander
2012-08-12 | Missing object reference | Magnus Hagander
2012-08-09 | More forgiving parsing of emails with broken header encoding | Magnus Hagander
2012-07-10 | Turn rawtxt into a bytea, since we don't know the encoding | Magnus Hagander
2012-07-09 | Fail date parsing on empty dates | Magnus Hagander
2012-07-09 | Store the raw text of messages. | Magnus Hagander
Also add deferred loading of all large (possibly TOASTable) columns not needed in the Django views.
2012-07-07 | Oops... Missing a reference there, are we... :) | Magnus Hagander
2012-07-07 | Attempt to get rid of the PostgreSQL-specific mail footer on all lists | Magnus Hagander
2012-07-06 | Parsing didn't work, and it's not enough messages to really care about... | Magnus Hagander
2012-07-06 | Specify encoding of file | Magnus Hagander
2012-07-06 | Badly encoded name of encoding :O | Magnus Hagander
2012-07-06 | Support filtering a single email out of mbox/directory for reloading | Magnus Hagander
2012-07-06 | Add parameter to override the date of a message | Magnus Hagander
When they're so badly formatted that we can't figure out a way to clean them up...
2012-07-06 | typo | Magnus Hagander
2012-07-06 | More date stuff | Magnus Hagander
2012-07-06 | Strange spelling of gmt... | Magnus Hagander
2012-07-06 | More timezones | Magnus Hagander
2012-07-06 | Silly - needs to be lowercase :S | Magnus Hagander
2012-07-06 | Remove dead code | Magnus Hagander
2012-07-06 | Handle empty bodies instead of giving an error | Magnus Hagander
Typical case: someone sends an attachment with just a subject. This is not an error, but it also has no body...
2012-07-06 | Work around more broken dates | Magnus Hagander
2012-07-06 | One more round of encodings | Magnus Hagander
2012-07-05 | Don't crash on non-multipart messages that appear to be multipart | Magnus Hagander
2012-07-05 | Clean up UTF surrogate points in unicode data | Magnus Hagander
They shouldn't be there in the first place, but when they do show up, there's a bug in Python 2 (fixed in Python 3) that lets them through, and PostgreSQL barfs on them...
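A Python 3 sketch of the cleanup, assuming lone surrogates are simply dropped rather than replaced; `strip_surrogates` is hypothetical. Python 3's utf-8 codec refuses to encode lone surrogates (it raises UnicodeEncodeError), so the cleanup is still needed before the text reaches the database.

```python
import re

# Lone UTF-16 surrogate code points (U+D800..U+DFFF); they have no valid UTF-8
# encoding, so PostgreSQL rejects them and Python 3 refuses to encode them.
_SURROGATES = re.compile(r"[\ud800-\udfff]")


def strip_surrogates(text: str) -> str:
    return _SURROGATES.sub("", text)
```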