path: root/loader
Age  Commit message  Author
2025-06-16Add subscriber_access field to schema.sqlMagnus Hagander
There is still a weird mix between what's in migrations and what's in schema.sql, but at least they should try to be mostly in sync. Author: Jelte Fennema-Nio <github-tech@jeltef.nl>
2025-06-16Allow passing filenames to --mbox that contain parenthesesJelte Fennema-Nio
Without this you would get an error like this: Failed to parse mbox: b'/bin/sh: 1: Syntax error: "(" unexpected\n' This especially matters when loading files downloaded with a browser, since those often contain (1) or (2) if a file with the same name was downloaded earlier.
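The error above is what a shell produces when an unquoted filename containing parentheses is interpolated into a command line. A minimal sketch of the two usual remedies follows, assuming the loader shells out to a helper program; `cat` is only a stand-in here, not the command the loader actually runs, and both function names are hypothetical.

```python
import shlex
import subprocess


def build_shell_command(filename):
    # shlex.quote() wraps the name in single quotes whenever it contains
    # characters the shell would interpret, such as "(" and ")".
    # "cat" is a placeholder for whatever helper is really invoked.
    return "cat {}".format(shlex.quote(filename))


def read_without_shell(filename):
    # Alternative: pass an argument list so no shell parses the name at all.
    return subprocess.run(["cat", filename], capture_output=True, check=True).stdout


print(build_shell_command("pgsql-hackers (1).mbox"))
```

Either approach keeps a browser-renamed download such as `pgsql-hackers (1).mbox` from being parsed as shell syntax.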
2023-04-11Add JIS encodings to message loadingCélestin Matte
2023-03-23Fix typoCélestin Matte
2022-04-01Fix bytes/str handling of secondary text parts in messagesMagnus Hagander
This was broken in the python 2->3 migration, but is apparently an uncommon enough case that it wasn't properly spotted until now. Reported and pointers in the right direction from Andres Freund
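As an illustration of the kind of bytes/str mismatch described here, the sketch below decodes a text part explicitly. It is not the code from this commit; `part_text` is a hypothetical helper, and the fallback to `us-ascii` is an assumption.

```python
from email.mime.text import MIMEText


def part_text(part):
    # In Python 3, get_payload(decode=True) returns bytes (or None for
    # multipart containers), so it must be decoded before being used as str.
    payload = part.get_payload(decode=True)
    if payload is None:
        return ""
    charset = part.get_content_charset() or "us-ascii"
    try:
        return payload.decode(charset, errors="replace")
    except LookupError:
        # Unknown charset name: fall back rather than fail (an assumption).
        return payload.decode("us-ascii", errors="replace")


example = MIMEText("héllo", "plain", "utf-8")
print(part_text(example))
```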
2022-01-30Ensure pglister_sync includes a value for subscriber_accessMagnus Hagander
This has a default=False set in the django model, but django does not propagate that into the database which would cause the insert of new lists to fail with a not-null-violation. Spotted by Célestin Matte
2021-11-24Add pglister section in archives.ini.sampleCélestin Matte
This section is expected by pglister_sync.py but missing from the sample file
2021-11-04Fix ancient error from the 2to3 conversionMagnus Hagander
Spotted by Célestin Matte
2021-10-23Clean up accidentally committed filesMagnus Hagander
Back in 2018, commit 4d159ca accidentally included unrelated functionality which was not completed. This causes the database to be out of sync with the models defined in the code. This commit reverts those parts that were not supposed to be included and leaves the changes that actually were. The code can be re-added once completed... Spotted by Célestin Matte
2020-08-11Update regexp escaping syntax to please pep8Magnus Hagander
2020-08-11Update exception catching syntax to please pep8Magnus Hagander
2020-06-01Support overwriting messages in load_messages.pyMagnus Hagander
Previously overwriting was only allowed from reparse_messages.py, in which case it would only reparse the existing message. For the use case of overwriting the raw contents and then also reparsing the result, the --overwrite switch can now be passed to load_messages.py.
2020-04-01Mark all ForeignKeys as on_delete=CASCADEMagnus Hagander
2020-03-30Set a timeout for sending Varnish purge requestsMagnus Hagander
A timeout error is better than hanging forever. Set a timeout of 30 seconds, which is longer, by a big margin, than should ever be needed.
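A rough sketch of a purge request with a 30-second timeout, using only the standard library. The header name `X-Purge-Regex`, the function name, and the `opener` parameter are hypothetical and exist only for illustration; the loader's real request may look quite different.

```python
import socket
import urllib.error
import urllib.request

PURGE_TIMEOUT_SECONDS = 30  # the value named in the commit message


def send_purge(url, expression, opener=urllib.request.urlopen):
    # "X-Purge-Regex" is an assumed header name, not taken from the source.
    req = urllib.request.Request(url, headers={"X-Purge-Regex": expression})
    try:
        # Without timeout=..., a stalled cache server would block forever.
        with opener(req, timeout=PURGE_TIMEOUT_SECONDS) as resp:
            return resp.status == 200
    except (urllib.error.URLError, socket.timeout):
        # A failed or timed-out purge is reported, not waited on.
        return False
```

The `opener` argument is there only so the function can be exercised without a network connection.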
2020-01-29Fix logging of load errorsMagnus Hagander
Broken in python3 migration, shows how often we have load errors these days.
2020-01-29Fixes for newer pep8Magnus Hagander
2019-06-19Remove extra : at end of xkeyMagnus Hagander
2019-06-19Fix typoMagnus Hagander
2019-06-19Use xkey instead of regexp when purging threads and listsMagnus Hagander
2019-06-18Implement email resending in the list archivesMagnus Hagander
This allows a logged-in user to get an email delivered to their mailbox, thereby making it easy to reply to even if they haven't got it already (and don't have a MUA capable of handling mbox files). The email body will go out unmodified (including any list headers that are stored in the archives, but this does not include for example the unsubscribe link). Envelope sender is set to one configured in the ini file, and envelope recipient is set to the email address of the user.
2019-06-18Remove settings entry from sample that are not usedMagnus Hagander
2019-05-17Ensure array of usernames are text[]Magnus Hagander
Empty arrays otherwise have no types in PostgreSQL, which would cause an error.
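The fix amounts to an explicit cast in the SQL. A hedged sketch, assuming a psycopg2-style cursor with `%(name)s` placeholders; the query and the function name are invented for illustration and are not the loader's actual statement.

```python
def expand_usernames(cursor, usernames):
    # An empty Python list is sent as an untyped empty array, so the
    # explicit ::text[] cast tells PostgreSQL which element type to use.
    cursor.execute(
        "SELECT u FROM unnest(%(usernames)s::text[]) AS u",
        {"usernames": list(usernames)},
    )
    return [row[0] for row in cursor.fetchall()]
```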
2019-01-08Fix date parsing to be even more forgivingMagnus Hagander
In particular, if parsing the date either fails or if it results in a date that's in the future, fall back to parsing the dates out of the Received: headers instead, because at some point there we will find a parsable date for sure (if not before then when it hit one of our own servers)
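The fallback described above can be sketched as follows. This is an approximation built on the standard `email` package, not the loader's parser; `forgiving_date` and `_parse` are hypothetical names, and treating a missing timezone as UTC is an assumption.

```python
from datetime import datetime, timezone
from email import message_from_string
from email.utils import parsedate_to_datetime


def _parse(value):
    try:
        dt = parsedate_to_datetime(str(value))
    except (TypeError, ValueError):
        return None
    if dt.tzinfo is None:
        # Assumption: treat dates without a usable offset as UTC.
        dt = dt.replace(tzinfo=timezone.utc)
    return dt


def forgiving_date(msg, now=None):
    now = now or datetime.now(timezone.utc)
    dt = _parse(msg.get("Date", ""))
    if dt is not None and dt <= now:
        return dt
    # Date: is unparsable or in the future, so walk the Received: headers.
    for received in msg.get_all("Received", []):
        # The timestamp follows the last semicolon of a Received: header.
        dt = _parse(received.rsplit(";", 1)[-1].strip())
        if dt is not None and dt <= now:
            return dt
    return None


raw = (
    "Received: from a by b; Tue, 08 Jan 2019 10:00:00 +0000\n"
    "Date: Fri, 01 Jan 2100 00:00:00 +0000\n"
    "Subject: example\n\nbody\n"
)
print(forgiving_date(message_from_string(raw)))
```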
2019-01-07Detect and show date changes in reparse messageMagnus Hagander
2019-01-07Remove clean_date.py, because it relied on the old archivesMagnus Hagander
2019-01-04Fix comparison operatorsMagnus Hagander
2019-01-04Fix incorrect importMagnus Hagander
2019-01-04Fix bad multi-command linesMagnus Hagander
2019-01-04Whitespace fixesMagnus Hagander
2019-01-04Fix indentationMagnus Hagander
Per pep8 warnings, adjust indentation for consistency
2019-01-04Tabs to 4 spacesMagnus Hagander
pep8 standard for indentation
2019-01-04Trap internal AssertionError from python librariesMagnus Hagander
For some really broken messages, we end up in a cannot-happen codepath. Trap this one and just consider that MIME part empty, and try again later. In passing, also change it so we continue loading after failures of parsing. We continued in the mode where we just generated diffs, but not when making updates. Now continue in both cases, but of course don't do the actual update if the parsing failed.
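A minimal sketch of both behaviors, under the assumption that the assertion is raised while decoding a MIME part; the function names and the shape of `messages` are hypothetical.

```python
def payload_or_empty(part):
    # Treat a part that trips an internal assertion as if it were empty.
    try:
        return part.get_payload(decode=True)
    except AssertionError:
        return None


def reparse_all(messages, parse):
    # Keep going after a failure, but never produce an update for a
    # message whose parsing failed.
    updates, failed = [], []
    for msgid, raw in messages:
        try:
            updates.append((msgid, parse(raw)))
        except Exception:
            failed.append(msgid)
    return updates, failed
```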
2019-01-03Update loader scripts to use python3 syntaxMagnus Hagander
Some minor cleanups as well, but mostly just the output of the 2to3 tool and some manual changes.
2019-01-03Use "in" syntax instead of has_key()Magnus Hagander
has_key() has been deprecated for a while and will be gone in Python3. The in syntax is available in both the old and the new versions.
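For reference, the two spellings side by side; the dictionary contents are made up for the example.

```python
headers = {"Message-ID": "<abc@example.com>"}

# Python 2 only (removed in Python 3):
#     headers.has_key("Message-ID")

# Works in both Python 2 and Python 3:
has_id = "Message-ID" in headers
print(has_id)
```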
2018-12-03Track load date of messagesMagnus Hagander
2018-11-30Switch to using tidylib rather than tidyMagnus Hagander
tidylib (http://countergram.github.io/pytidylib/) is maintained, the old tidy one (https://cihar.com/software/utidylib/) is not. And in particular, python3 support is in the new one. Generates some minor changes in the existing archives, but it seems to be just whitespace and some actual incorrectness in the old output.
2018-11-29Use advisory lock around load_message.pyMagnus Hagander
Avoid loading two messages at the same time. In particular this can cause issues if it's two copies of the same message on different lists, which can cause a UNIQUE violation in the loader. It could also be a problem if two messages on a new thread arrive in parallel, which could cause two separate threads to be created. This could be made more efficient by properly ordering the operations on storage and using ON CONFLICT, but it's a very rare occasion and it doesn't matter that we have to wait for a second or two for a previous storage to complete.
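One way to serialize loaders with a PostgreSQL advisory lock is sketched below. The lock key, the function name, and the choice of the transaction-scoped `pg_advisory_xact_lock` are assumptions; the commit may use a different key or a session-level lock.

```python
LOADER_LOCK_KEY = 8059944559669076  # arbitrary, hypothetical bigint key


def store_with_lock(cursor, store):
    # Blocks until no other transaction holds the same key; the lock is
    # released automatically when this transaction commits or rolls back.
    cursor.execute("SELECT pg_advisory_xact_lock(%s)", (LOADER_LOCK_KEY,))
    return store()
```

A second loader calling `store_with_lock` simply waits for the first to finish its transaction, which matches the "wait for a second or two" trade-off described above.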
2018-11-20Remove spaces in messageidsMagnus Hagander
They shouldn't be there in the first place. Sigh. But if they're there just pretend they don't exist, so we get a working messageid.
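In sketch form, with a hypothetical helper name and a made-up message-id:

```python
def clean_messageid(messageid):
    # Drop any spaces so a mangled id still yields a usable message-id.
    return messageid.replace(" ", "")


print(clean_messageid("<2018 1120.abc@example.com>"))
```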
2018-11-20Proper attempt at correctly updating header fieldsMagnus Hagander
2018-11-20Revert "Actually update header fields when they have changed"Magnus Hagander
That commit was backwards. Oops.
2018-11-20Fix accidental reversing of printed manual header diffsMagnus Hagander
2018-11-20Actually update header fields when they have changedMagnus Hagander
2018-11-20One more round of header replacement fixesMagnus Hagander
2018-11-20When reparsing, show both header and body changesMagnus Hagander
Previously, only the body changes would show up in the diff, but we'd actually make updates on the headers as well.
2018-11-20Add another strange timezone offset formatMagnus Hagander
2018-11-20Use the tsparser text search parser by defaultMagnus Hagander
2018-07-06Make error message on messageid-refind-failure more helpfulMagnus Hagander
2018-07-06Prompt before committing the reparse transactionMagnus Hagander
2018-07-06Don't change messages if they haven't changedMagnus Hagander
Created a *lot* of unnecessary I/O
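The idea can be sketched as a comparison step before any UPDATE is issued. The field names and the helper are hypothetical; the loader's actual comparison is not shown in this log.

```python
def changed_fields(stored, reparsed):
    # Return only the fields whose reparsed value differs from what is stored.
    return {k: v for k, v in reparsed.items() if stored.get(k) != v}


stored = {"subject": "Re: patch", "bodytxt": "hello"}
reparsed = {"subject": "Re: patch", "bodytxt": "hello"}

# An empty result means the row is untouched and no write is needed.
if not changed_fields(stored, reparsed):
    print("unchanged, skipping update")
```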
2018-06-20Show progress in percent when reparsing large sets of emailsMagnus Hagander