Batch jobs
==========
The system relies on a number of batch jobs that run on the webserver
or on another machine, feeding data into the system. These batch jobs
should generally be run under a user account that does *not* have write
permissions in the web directories - any exceptions should be clearly
noted here. Most of the jobs should run regularly from cron; some
should be run manually when required.

All batch jobs are located in the directory tools/.

docs/docload.py
---------------
This script will load a new set of documentation. Simply specify the
version to load and point the script at the tarball to load from. The
script will automatically decompress the tarball as necessary, and also
perform HTML tidying of the documentation (since the HTML generated by
the PostgreSQL build system isn't particularly standards-conforming or
nice-looking).

ftp/spider_ftp.py
-----------------
This script needs to be run on the *ftp server*, not on the webserver.
It will generate a Python pickle file that is then automatically
uploaded to the webserver, which will write it to disk (thus, this is
the one directory where the webserver does need write permissions). The
IP addresses of the machine(s) allowed to upload the ftp pickle are
defined in settings.FTP_MASTERS.

moderation/moderation_report.py
-------------------------------
This script enumerates all unmoderated objects in the database and, if
any are pending, generates an email to the NOTIFICATION_EMAIL address
to prod the moderators to do their job.

rss/fetch_rss_feeds.py
----------------------
This script will connect to all the RSS feeds registered in the RSS
application and fetch their articles into the database. It is not very
accepting of strange RSS feeds - it requires them to be "nicely
formatted". That is usually not a problem, since we only pull in the
headlines and not the contents.

For a more complete RSS fetcher that stores its data in a PostgreSQL
database, see the "hamn" project that powers planet.postgresql.org,
also available on git.postgresql.org.
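
To make the descriptions above a bit more concrete, here are rough
sketches of three of the jobs. All helper names, URLs and data layouts
in these sketches are illustrative assumptions, not the scripts' actual
code. First, a minimal version of the headline-only fetch loop that
rss/fetch_rss_feeds.py performs, here assuming the feedparser library
(the real script may well parse the feeds differently)::

    # Illustrative sketch only -- the save_article callback and the
    # use of feedparser are assumptions, not fetch_rss_feeds.py itself.
    import feedparser

    def fetch_feed(feed_url, known_guids, save_article):
        """Fetch one feed and store any headlines not already seen."""
        parsed = feedparser.parse(feed_url)
        if parsed.bozo:
            # Not well-formed XML: this is the kind of "strange" feed
            # the script refuses to guess about, so skip it.
            return
        for entry in parsed.entries:
            guid = entry.get('id') or entry.get('link')
            if guid and guid not in known_guids:
                # Only the headline and link, never the contents.
                save_article(guid, entry.get('title', ''),
                             entry.get('link', ''))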
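
Next, the spider-and-upload cycle of ftp/spider_ftp.py. The pickle
layout, the upload endpoint and the use of requests are hypothetical;
only the overall flow (spider the tree, pickle it, push it to the
webserver) comes from the description above::

    # Illustrative sketch only -- the real pickle format and upload
    # URL live in spider_ftp.py and the site settings.
    import os
    import pickle
    import requests

    def spider(root):
        """Walk the ftp tree, recording the file listing per directory."""
        tree = {}
        for dirpath, _dirnames, filenames in os.walk(root):
            tree[os.path.relpath(dirpath, root)] = sorted(filenames)
        return tree

    def upload(tree, url):
        """Push the pickled tree to the webserver, which checks the
        sender against settings.FTP_MASTERS before writing it to disk."""
        requests.put(url, data=pickle.dumps(tree), timeout=60)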
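
Finally, the moderation report boils down to "collect the pending
items, mail a summary if there are any". In this Django-flavoured
sketch the pending list is passed in by a hypothetical caller; only
NOTIFICATION_EMAIL comes from the actual configuration::

    # Illustrative sketch only -- how moderation_report.py enumerates
    # the unmoderated objects is not shown here.
    from django.conf import settings
    from django.core.mail import send_mail

    def send_moderation_report(pending):
        """Mail a summary of pending items to NOTIFICATION_EMAIL."""
        if not pending:
            return  # nothing to moderate, no email
        body = "%d object(s) are waiting for moderation:\n\n" % len(pending)
        body += "\n".join(" - %s" % item for item in pending)
        send_mail(
            "Moderation report",
            body,
            None,  # fall back to Django's DEFAULT_FROM_EMAIL
            [settings.NOTIFICATION_EMAIL],
        )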