Batch jobs
==========
The system relies on a number of batch jobs that run on the webserver
or on another machine, feeding data into the system. These batch jobs
should generally be run under a user account that does *not* have write
permissions in the web directories - any exceptions should be clearly
noted here. Most of the jobs should run regularly from cron; some
should be run manually when required.

All batch jobs are located in the directory tools/.

docs/docload.py
---------------
This script will load a new set of documentation. Simply specify the
version to load and point the script at the tarball to load from. The
script will automatically decompress the tarball as necessary, and also
perform HTML tidying of the documentation (since the HTML generated by
the PostgreSQL build system isn't particularly standards-conforming or
nice-looking).

ftp/spider_ftp.py
-----------------
This script needs to be run on the *ftp server*, not on the webserver.
It will generate a Python pickle file that is then automatically
uploaded to the webserver, which will write it to disk (thus, this is
the one directory where the webserver does need write permissions). The
IP addresses of the machine(s) allowed to upload the ftp pickle are
defined in settings.FTP_MASTERS.

moderation/moderation_report.py
-------------------------------
This script enumerates all unmoderated objects in the database and, if
any are pending, generates an email to the NOTIFICATION_EMAIL address
to prod the moderators to do their job.

rss/fetch_rss_feeds.py
----------------------
This script will connect to all the RSS feeds registered in the RSS
application and fetch their articles into the database. It is not very
accepting of strange RSS feeds - it requires them to be "nicely
formatted". That is usually not a problem, since we only pull in the
headlines and not the contents.

For a more complete RSS fetcher that stores its data in a PostgreSQL
database, see the "hamn" project that powers planet.postgresql.org,
also available on git.postgresql.org.
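
To make the descriptions above a bit more concrete, here are rough
sketches of three of the jobs. All helper names, URLs and data layouts
in these sketches are illustrative assumptions, not the scripts' actual
code. First, a minimal version of the headline-only fetch loop that
rss/fetch_rss_feeds.py performs, here assuming the feedparser library
(the real script may well parse the feeds differently)::

    # Illustrative sketch only -- the save_article callback and the
    # use of feedparser are assumptions, not fetch_rss_feeds.py itself.
    import feedparser

    def fetch_feed(feed_url, known_guids, save_article):
        """Fetch one feed and store any headlines not already seen."""
        parsed = feedparser.parse(feed_url)
        if parsed.bozo:
            # Not well-formed XML: this is the kind of "strange" feed
            # the script refuses to guess about, so skip it.
            return
        for entry in parsed.entries:
            guid = entry.get('id') or entry.get('link')
            if guid and guid not in known_guids:
                # Only the headline and link, never the contents.
                save_article(guid, entry.get('title', ''),
                             entry.get('link', ''))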
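
Next, the spider-and-upload cycle of ftp/spider_ftp.py. The pickle
layout, the upload endpoint and the use of requests are hypothetical;
only the overall flow (spider the tree, pickle it, push it to the
webserver) comes from the description above::

    # Illustrative sketch only -- the real pickle format and upload
    # URL live in spider_ftp.py and the site settings.
    import os
    import pickle
    import requests

    def spider(root):
        """Walk the ftp tree, recording the file listing per directory."""
        tree = {}
        for dirpath, _dirnames, filenames in os.walk(root):
            tree[os.path.relpath(dirpath, root)] = sorted(filenames)
        return tree

    def upload(tree, url):
        """Push the pickled tree to the webserver, which checks the
        sender against settings.FTP_MASTERS before writing it to disk."""
        requests.put(url, data=pickle.dumps(tree), timeout=60)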
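
Finally, the moderation report boils down to "collect the pending
items, mail a summary if there are any". In this Django-flavoured
sketch the pending list is passed in by a hypothetical caller; only
NOTIFICATION_EMAIL comes from the actual configuration::

    # Illustrative sketch only -- how moderation_report.py enumerates
    # the unmoderated objects is not shown here.
    from django.conf import settings
    from django.core.mail import send_mail

    def send_moderation_report(pending):
        """Mail a summary of pending items to NOTIFICATION_EMAIL."""
        if not pending:
            return  # nothing to moderate, no email
        body = "%d object(s) are waiting for moderation:\n\n" % len(pending)
        body += "\n".join(" - %s" % item for item in pending)
        send_mail(
            "Moderation report",
            body,
            None,  # fall back to Django's DEFAULT_FROM_EMAIL
            [settings.NOTIFICATION_EMAIL],
        )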