
Dockerising a Contao website II
In a previous post I showed how to run a Contao website in a Docker infrastructure. That was already a good start. However, after working with that setup for some time I discovered a few issues…
A central idea of Docker is to install the application in an image and mount persistent files into a running container. Thus, you can just throw away an instance and start a new one very quickly. Unfortunately, using Contao that’s not that straight-forward – at least when using
Here I’m describing how I fought the issues:
Issue with Cron
The first issue was Contao's cron mechanism, which is triggered by the visitors' browsers. This cron works as follows:
The browser requests a file which is supposed to contain the timestamp of the last cron run.
If the timestamp is “too” old, the browser will also request a second URL, which then runs overdue jobs.
If a job was run, the timestamp in that file will be updated, so the cron script won’t be run on every request.
Good, but that means the timestamp file will only be written when a cron job was executed. But let’s assume the next run of a cron job will only be next weekend! Then every visitor triggers a 404 error, which is of course ugly and spams the logs.
Especially when using Docker you will hit this scenario every time you start a new container. The last cron run time is stored in the database, but the timestamp file won’t exist by default. So typically you’ll see lots of 404s until the next cron job is executed.
I fixed that by
. The fix is already merged into the official
. In addition, I’m initialising the
in my Docker image with a time stamp of
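A minimal sketch of how such an initialisation could look when the container starts. The function name, the path and the default time stamp of 0 are hypothetical; use whatever file your Contao version actually requests:

```php
<?php
/**
 * Create the cron timestamp file if it does not exist yet.
 * Returns true if the file was created, false if it was already there.
 * Path and initial value are assumptions for illustration.
 */
function ensureCronTimestamp(string $path, int $timestamp = 0): bool
{
    if (file_exists($path)) {
        // keep the value that a previous cron run has written
        return false;
    }

    $dir = dirname($path);
    if (!is_dir($dir) && !mkdir($dir, 0775, true)) {
        throw new RuntimeException("Cannot create directory $dir");
    }

    if (file_put_contents($path, (string) $timestamp) === false) {
        throw new RuntimeException("Cannot write $path");
    }

    return true;
}
```

Called once from the container's start-up routine, this leaves an existing file untouched and only fills the gap in a fresh container, so the first visitors no longer run into a 404.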
Issues with Proxies
A typical Docker infrastructure (at least for me) consists of a bunch of containers orchestrated in various networks etc. Usually, you’ll have at least one proxy, which distributes HTTP requests to the containers in charge. However, I experienced a few issues with my proxy setup:
While the connection between client (user, web browser) and reverse proxy is SSL-encrypted, the proxy and the web server talk plain HTTP. It’s the same machine, so there is no big need to waste time on encryption. Even though the reverse proxy properly sends a header announcing the original protocol, Contao only sees incoming HTTP and uses HTTP URLs in all documents… Even if you ignore the mixed-content issue and/or implement a rewrite of HTTP to HTTPS at the web-server layer, this will produce twice as many connections as necessary!
The solution is, however, not that difficult. Contao does not understand the header sent by the proxy, but it recognises the $_SERVER['HTTPS'] variable. Thus, to fix that issue you just need to add the following to your Contao configuration:
$_SERVER['HTTPS'] = 1;
This will however generate URLs including the port number (e.g.
), but they are perfectly valid. (Not like
or something that I saw during my tests… ;-)
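If not every request arrives through the TLS-terminating proxy, the flag can be set conditionally instead of hard-coding it. The following is only a sketch: it assumes that the proxy sends the de-facto standard X-Forwarded-Proto header (which PHP exposes as $_SERVER['HTTP_X_FORWARDED_PROTO']) and that clients cannot reach the web server directly, since otherwise they could forge the header. The function name is made up for illustration.

```php
<?php
/**
 * Return a copy of the server array with HTTPS set if the proxy
 * reports that the client connection was encrypted.
 * Assumption: the proxy sets (and overwrites) X-Forwarded-Proto.
 */
function markHttpsFromProxy(array $server): array
{
    // with several proxies the header may hold a comma-separated list;
    // the first entry describes the connection of the original client
    $header = $server['HTTP_X_FORWARDED_PROTO'] ?? '';
    $proto  = strtolower(trim(explode(',', $header)[0]));

    if ($proto === 'https') {
        $server['HTTPS'] = 1;
    }

    return $server;
}

// usage, in the same place as the hard-coded line above:
// $_SERVER = markHttpsFromProxy($_SERVER);
```

Requests that reach the container over plain HTTP without the header keep their HTTP URLs, which can be handy for health checks inside the Docker network.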
URL encodings in the Sitemap
The previous fix brought up just another issue: The URL encoding in the sitemap breaks when using the port component (
).. All URLs were
before writing them to the sitemap. However,
encodes quite a lot! Among others, it converts
. Thus, all URLs in my sitemap looked like this:
- which is obviously invalid.
instead to encode the URLs, but it was finally fixed by
and should be working in
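To make the problem tangible: the following sketch assumes the encoder behaves like PHP's rawurlencode, which escapes every reserved character, including the colon in front of the port. The helper encodePathOnly is hypothetical and is not the fix that ended up in Contao; it merely shows one way to encode the path segments while leaving scheme, host and port alone. Query strings and fragments are ignored for brevity.

```php
<?php
/**
 * Encode only the path segments of an absolute URL.
 * Hypothetical helper for illustration; drops query string and fragment.
 */
function encodePathOnly(string $url): string
{
    $parts = parse_url($url);
    if ($parts === false || !isset($parts['scheme'], $parts['host'])) {
        throw new InvalidArgumentException("Not an absolute URL: $url");
    }

    // keep "host:port" untouched, this is the part that broke in the sitemap
    $authority = $parts['host'] . (isset($parts['port']) ? ':' . $parts['port'] : '');

    // encode every path segment on its own, so the slashes survive
    $segments = array_map('rawurlencode', explode('/', $parts['path'] ?? ''));

    return $parts['scheme'] . '://' . $authority . implode('/', $segments);
}

// rawurlencode('https://example.com:443/my page.html')
//   gives 'https%3A%2F%2Fexample.com%3A443%2Fmy%20page.html'
// encodePathOnly('https://example.com:443/my page.html')
//   gives 'https://example.com:443/my%20page.html'
```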
Issues with Cache and Assets etc
A more delicate issue is the handling of cache, assets, sitemaps etc. Contao’s backend comes with convenient buttons to clear/regenerate these files and to create the search index. Yet, you don’t always want to log in to the backend when recreating the Docker container. Sometimes you simply can’t, for example if the container needs to be recreated overnight for some reason.
Basically, that is not a big issue. Assets and cache will be regenerated once they are needed. But the sitemaps, for instance, will only be generated when interacting with the backend.
Thus, we need a solution to create these files as soon as possible, preferably in the background after a container is created. Most of the stuff can be done by the Automator, but I also have some personal scripts developed by a company, which require other mechanisms and are unfortunately not properly integrated into Contao’s hooks landscape. To generate all assets (images and scripts etc.), we need to access every single page at the frontend. This will trigger Contao to create the assets and cache, and subsequent requests from real-life users will be much faster!
The best that I came up with so far looks like the following script, which I stored in the files directory of our Contao instance:
define ('TL_MODE', 'FE');
require __DIR__ . '/../system/initialize.php';
$THISDIR = realpath (dirname (__FILE__));
$auto = new \Automator ();
// purge stuff
$auto->purgeSearchTables ();
$auto->purgeImageCache ();
$auto->purgeXmlFiles ();
// regenerate stuff
$auto->generateXmlFiles ();
// get all fe pages
$pages = \Backend::findSearchablePages();
if (isset($GLOBALS['TL_HOOKS']['getSearchablePages']) && is_array($GLOBALS['TL_HOOKS']['getSearchablePages'])) {
    foreach ($GLOBALS['TL_HOOKS']['getSearchablePages'] as $callback) {
        $classname = $callback[0];
        // assumption: callbacks that do not extend Backend are skipped
        if (!is_subclass_of ($classname, 'Backend'))
            continue;
        // a getSearchablePages hook returns the extended list of pages
        $pages = (new $classname ())->{$callback[1]} ($pages);
    }
}
// request every fe page to generate assets and cache and search index
$ch = curl_init ();
curl_setopt($ch, CURLOPT_USERAGENT, 'conato-cleaner');
// do not echo the fetched pages
curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE);
# maybe useful to speed up:
#curl_setopt($ch, CURLOPT_MAXCONNECTS, 50);
#curl_setopt($ch, CURLOPT_NOBODY, TRUE);
#curl_setopt($ch, CURLOPT_TIMEOUT_MS, 150);
#curl_setopt($ch, CURLOPT_CONNECTTIMEOUT_MS, 150);
foreach ($pages as $page) {
    curl_setopt($ch, CURLOPT_URL, $page);
    curl_exec($ch);
}
curl_close($ch);
The first 3 lines initialise the Contao environment. Here I assume that ../system/initialize.php exists (e.g. save the script in the files directory). The next few lines purge the existing cache using the Automator tool and subsequently regenerate the XML files.
Finally, the script collects all “searchable pages” using Backend::findSearchablePages(), enriches the set with additional pages that may be hooked in through $GLOBALS['TL_HOOKS']['getSearchablePages'], and then uses cURL to iteratively request each page.
The first part should be reasonably fast, so clients may be willing to wait for the recreation of the cache stuff. Accessing every page, however, may require a significant amount of time, especially for larger websites. Thus, I embedded everything in the following skeleton, which advises the browser to close the connection before we start the time-consuming tasks:
/*
 * start capturing output
 */
if (ob_get_level () > 0)
    ob_end_clean ();
// keep running after the client has disconnected
ignore_user_abort (true);
ob_start ();
/*
 * run the tasks that you want your users to wait for
 */
// e.g. purge and regenerate cache/sitemaps/assets
$auto = new \Automator ();
$auto->purgeSearchTables ();
/*
 * flush the output and tell the browser to close the connection as soon as it received the content
 */
$size = ob_get_length ();
header ("Connection: close");
header ("Content-Length: $size");
ob_end_flush ();
flush ();
/*
 * from here you have some free computational time
 */
// e.g. collect pages and request the web sites
// users will already be gone and the output will (probably) never show up in a browser.. (but don't rely on that! it's still sent to the client, it's just outside of content-length)
$pages = \Backend::findSearchablePages();
In addition, I created some rewrite rules
to automatically regenerate missing files. For example, for the sitemaps I added the following to the vhost config (or
RewriteEngine On
RewriteCond %{REQUEST_FILENAME} !-f
RewriteRule ^/share/(.*)\.xml.*$ https://example.com/files/SCRIPT_FROM_ABOVE.php?target=sitemap&sitemap=$1 [R=302,L]
That means that if, for example, a requested sitemap in /share/ does not exist yet, the user gets automagically redirected to the script from above! In addition, I added some request parameters (target and sitemap), so that the script above knows which file was requested. It can then regenerate everything and immediately output the new content! :)
For example, my snippet to regenerate and serve the sitemap looks similar to this:
$auto = new \Automator ();
$auto->generateXmlFiles ();
if (isset ($_GET['target']) && $_GET['target'] == 'sitemap') {
    // with `true` the Automator returns the names of the XML files
    $sitemaps = $auto->purgeXmlFiles (true);
    $found = false;
    foreach ($sitemaps as $sitemap) {
        if ((!isset ($_GET['sitemap']) || empty ($_GET['sitemap'])) || $_GET['sitemap'] == $sitemap) {
            $xmlfile = $THISDIR . "/../share/" . $sitemap . ".xml";
            // if it still does not exist -> we failed...
            if (!file_exists( $xmlfile )) {
                // error handling
            }
            // otherwise, we'll dump the sitemap
            else {
                header ("Content-Type: application/xml");
                readfile ($xmlfile);
                $found = true;
                break;
            }
        }
    }
    if (!$found) {
        // error handling
    }
}
Thus, a request for a sitemap will never fail. If the file does not exist, the client will be redirected, the file will be regenerated, and the new contents will immediately be served.
Please be aware that this script is easily DOS-able! Attackers may produce a lot of load by accessing the file. Thus, I added some simple DOS protection to the beginning of the script, which makes sure the whole script is not run more than once per hour:
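$dryrun = false;
$runcheck = "/tmp/.conato-cleaner-timestamp";
if (file_exists ($runcheck)) {
    // last full run was less than an hour ago -> do not regenerate again
    if (filemtime ($runcheck) > time () - 3600) {
        $dryrun = true;
    }
}
if (!$dryrun)
    touch ($runcheck);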
If $dryrun is set, the script won’t regenerate the cache etc., but it will still serve the sitemap and other files if requested.
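The same check can be wrapped in a small function, which makes it easier to test and to reuse with other intervals. This is just a sketch; the function name and the injectable clock parameter are made up for illustration:

```php
<?php
/**
 * Decide whether the expensive regeneration may run now.
 * Returns true (and refreshes the stamp file) if the last run is at
 * least $interval seconds ago, false otherwise.
 * $now is only injectable to make the function testable.
 */
function shouldRegenerate(string $stampFile, int $interval = 3600, ?int $now = null): bool
{
    $now = $now ?? time();

    // PHP caches stat results, so make sure we see the current mtime
    clearstatcache(true, $stampFile);

    if (file_exists($stampFile) && filemtime($stampFile) > $now - $interval) {
        return false;
    }

    touch($stampFile, $now);
    return true;
}

// usage in the script:
// $dryrun = !shouldRegenerate("/tmp/.conato-cleaner-timestamp");
```

Note that two requests arriving at the very same moment may both pass the check. For a simple load protection that is usually acceptable; a lock file would be needed for a strict guarantee.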
As I said earlier, my version of the script contains plenty of personalised stuff. That’s why I cannot easily share it with you.. :(
However, if you have trouble implementing it yourself just let me know :)


