4.3 Access Gateway Content Settings

One of the major benefits of using an Access Gateway to protect Web resources is that it can cache the requested information and send it directly to the client browser rather than contacting the origin Web resource and waiting for the requested information to be sent. This can significantly accelerate access to the information.

IMPORTANT: For caching to work correctly, the Web servers must be configured to keep accurate time. If possible, configure them to use an NTP server.

The object cache on an Access Gateway is quite different from a browser's cache. A browser serves pages from its own cache, for example when a user clicks the Back button, and that content can be stale: it might not reflect the current content on the origin Web server.

The Access Gateway caching system uses several methods to ensure cache freshness. Webmasters flag most time-sensitive Web content so that it cannot become stale unless a caching system ignores their settings. The Access Gateway honors all RFC 2616 headers that affect cache freshness, such as Cache-Control, If-Modified-Since, and Expires.
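To make the freshness rules concrete, the following Python sketch shows the RFC 2616 freshness calculation (sections 13.2.3 and 13.2.4) that a standards-compliant cache applies. It is not the Access Gateway's internal code: the function names are hypothetical, all times are assumed to be epoch seconds, and a response with no explicit lifetime is simply treated as stale rather than modeling any heuristic.

```python
from typing import Optional


def parse_cache_control(value: str) -> dict:
    """Split a Cache-Control header value into {directive: argument} pairs."""
    directives = {}
    for part in value.split(","):
        name, _, arg = part.strip().partition("=")
        if name:
            directives[name.strip().lower()] = arg.strip().strip('"')
    return directives


def freshness_lifetime(directives: dict, date: Optional[float],
                       expires: Optional[float]) -> Optional[float]:
    """s-maxage (shared caches), then max-age, then Expires minus Date."""
    for key in ("s-maxage", "max-age"):
        if directives.get(key, "").isdigit():
            return float(directives[key])
    if expires is not None and date is not None:
        return max(0.0, expires - date)
    return None  # no explicit lifetime; a real cache may fall back to a default


def current_age(date: float, age_header: float, request_time: float,
                response_time: float, now: float) -> float:
    """RFC 2616 section 13.2.3 age calculation."""
    apparent_age = max(0.0, response_time - date)
    corrected_received_age = max(apparent_age, age_header)
    response_delay = response_time - request_time
    corrected_initial_age = corrected_received_age + response_delay
    resident_time = now - response_time
    return corrected_initial_age + resident_time


def is_fresh(cache_control: str, date: float, expires: Optional[float],
             age_header: float, request_time: float, response_time: float,
             now: float) -> bool:
    directives = parse_cache_control(cache_control)
    if "no-cache" in directives or "no-store" in directives:
        return False  # must be revalidated with the origin before reuse
    lifetime = freshness_lifetime(directives, date, expires)
    if lifetime is None:
        return False
    return lifetime > current_age(date, age_header, request_time, response_time, now)
```

For example, a response received at time 1000 with `max-age=600` is still fresh at time 1300 but stale at time 1700, at which point a cache revalidates it (typically with an If-Modified-Since request).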

The Access Gateway can be fine-tuned for cache freshness in the following ways:

  • Accelerated checking of objects that have longer than desirable Time to Expire headers

  • Delayed checking of objects that have shorter than desirable Time to Expire headers

  • Checking for freshness of objects that do not include Time to Expire headers

The Access Gateway follows RFC directives. In addition, the Access Gateway Service uses the Apache module mod_file_cache.

The following sections describe the features available to fine-tune this process for your network:

4.3.1 Configuring Caching Options

The Cache Options allow you to control how the Access Gateway caches objects.

  1. In the Administration Console, click Devices > Access Gateways > Edit > Cache Options.

  2. To disable caching of all Web server content, select Disable Caching.

    When this option is selected, all other caching options are disabled.

  3. Modify the Cache Freshness settings. Use the Reset button to return these settings to their default values.

    These options govern when the proxy service revalidates requested cached objects against those on their respective origin Web servers. If the objects have changed, the proxy service re-caches them.

    WARNING: Enter whole numbers. Decimal values (for example, 2.5) are not supported and generate an XML validation error.

    HTTP Maximum: Specifies the maximum time the proxy service serves HTTP data from cache before revalidating it against content on the origin Web server. No object is served from cache after this value expires without being revalidated.

    This value overrides any freshness or Time to Expire directive from the Webmaster that specifies a longer time.

    Use this value to reduce the maximum time the proxy service waits before checking whether requested objects need to be refreshed. The default is 6 hours.

    HTTP Default: Specifies the maximum time the proxy service serves HTTP data for which Webmasters have not specified a freshness or Time to Expire directive. The default is 2 hours.

  4. To save your changes to browser cache, click OK.

  5. To apply the changes, click the Access Gateways link, then click Update > OK.
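The interaction between the two timers can be sketched in Python as follows. This is a hypothetical model, not the product's code: it assumes values in seconds, treats HTTP Maximum as a cap on every object (as the description of that setting states), and applies HTTP Default only when the origin sends no freshness directive.

```python
from typing import Optional

HOUR = 3600  # seconds


def effective_lifetime(declared: Optional[int],
                       http_maximum: int = 6 * HOUR,
                       http_default: int = 2 * HOUR) -> int:
    """Seconds an object may be served from cache before revalidation (sketch).

    declared is the lifetime taken from the origin's freshness headers, or None.
    """
    for value in (http_maximum, http_default):
        # Mirrors the console rule: whole numbers only (2.5 is rejected).
        if not isinstance(value, int) or value < 0:
            raise ValueError(f"timer values must be non-negative whole numbers, got {value!r}")
    if declared is None:
        # No Webmaster directive: HTTP Default applies, still capped by HTTP Maximum.
        return min(http_default, http_maximum)
    # HTTP Maximum shortens a longer origin lifetime; shorter lifetimes are kept.
    return min(declared, http_maximum)
```

With the defaults, an object the origin marks as fresh for 24 hours is revalidated after 6 hours, and an object without any freshness directive is revalidated after 2 hours.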

4.3.2 Controlling Browser Caching

Webmasters control how browsers cache information by adding the following cache-control headers to HTTP responses:

Cache-Control: no-store
Cache-Control: no-cache
Cache-Control: private
Cache-Control: public
Pragma: no-cache

You can configure how the proxy service responds to these directives in the HTTP header.

  1. In the Administration Console, click Devices > Access Gateways > Edit > [Name of Reverse Proxy] > [Name of Proxy Service] > HTTP Options.

  2. To mark all pages coming through this host as cacheable on the browser, select Allow Pages to be Cached by the Browser.

    When this option is enabled, the Access Gateway does not inject the no-cache and no-store directives into the HTTP response header.

    Select this option if you have a back-end application that updates the data in the Last-Modified or ETag HTTP headers. These changes are forwarded from the Web server to the browser only when this option is enabled.

    Select this option if you want the Expires HTTP header forwarded from the Web server to the browser.

    If this option is not selected, all pages are marked as non-cacheable on the browser. This forces the browser to request the data again from the Access Gateway when a user returns to a previously viewed page.

  3. For the Access Gateway Service, the X-Forwarded-For option on this page is always enabled. For information about this option, see Section 4.2.8, Configuring X-Forwarded-For Headers.

  4. Click OK.

  5. To apply the changes, click the Access Gateways link, then click Update > OK.
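The following Python sketch models the effect of Allow Pages to be Cached by the Browser on the headers a browser receives. The function name is hypothetical, and the sketch assumes that the injected value replaces any Cache-Control header from the origin; the procedure above does not specify how the Access Gateway merges the two.

```python
def headers_for_browser(origin_headers: dict, allow_browser_cache: bool) -> dict:
    """Approximate the response headers a browser receives (sketch)."""
    if allow_browser_cache:
        # Option selected: nothing is injected; Last-Modified, ETag, and Expires pass through.
        return dict(origin_headers)
    dropped = {"last-modified", "etag", "expires", "cache-control"}
    out = {name: value for name, value in origin_headers.items()
           if name.lower() not in dropped}
    # Option cleared: every page is marked as non-cacheable on the browser.
    out["Cache-Control"] = "no-cache, no-store"
    return out
```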

4.3.3 Configuring Custom Cache Control Headers

In addition to fine-tuning cache freshness by using the HTTP timers, as explained in Section 4.3.1, Configuring Caching Options, you can configure each proxy service to recognize custom headers in HTTP packets. Your Web server can then use these headers for transmitting caching instructions that only the Access Gateway can recognize and follow.

Understanding How Custom Cache Control Headers Work

Only the proxy service containing the custom header definition follows the cache policies specified in the custom headers.

All other proxy services, requesting browsers, and external proxy caches such as transparent caches and client accelerators do not recognize the custom headers. They follow only the cache policies specified by the standard cache control headers.

This means that you have the following options for configuring your Web server:

  • You can specify that browsers and external caches cannot cache the objects, but the proxy service can.

    This lets you off-load request processing from the origin Web server while still requiring that users return to the site each time they request an object.

  • You can also specify separate cache times for browsers, external caches, and the proxy service.

To implement custom cache control headers, you must do the following:

  • Configure a proxy service to use custom cache control headers by enabling the feature and specifying a header string such as MYCACHE (see Enabling Custom Cache Control Headers).

  • Configure the Web servers of the proxy service to send an HTTP header containing the defined string and the time in seconds that the object should be retained in cache (for example, MYCACHE: 60).

    If the number is non-zero, the Access Gateway treats the reply as if it has the following headers:

    Cache-Control: public
    Cache-Control: max-age=number
    

    If the number is zero (0), the Access Gateway treats the reply as if it has the following header:

    Cache-Control: no-cache
    
  • Ensure that the Web server continues to send standard HTTP cache-control headers so that browsers and external caches follow the caching policies you intend them to.
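The translation rules above can be expressed as a short Python sketch. The function name is hypothetical; the sketch assumes case-insensitive header names (as in HTTP generally) and rejects negative values, which this section does not define.

```python
from typing import List, Optional


def translate_custom_header(headers: dict, custom_name: str) -> Optional[List[str]]:
    """Return the standard headers the reply is treated as having, or None."""
    for name, value in headers.items():
        if name.lower() == custom_name.lower():
            seconds = int(value.strip())
            if seconds < 0:
                raise ValueError(f"{name}: expected seconds >= 0, got {value!r}")
            if seconds == 0:
                return ["Cache-Control: no-cache"]
            return ["Cache-Control: public", f"Cache-Control: max-age={seconds}"]
    return None  # no custom header: the standard headers govern caching
```

For example, a reply carrying `MYCACHE: 60` is treated as `Cache-Control: public` plus `Cache-Control: max-age=60`.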

For example, you can configure the following:

  • Use an Expires or Cache-Control: Max-Age header to specify that browsers should cache an object for two minutes.

  • Use a Cache-Control: Private header to prevent external caches from caching the object at all.

  • Use a custom cache control header, such as MYCACHE: 1800, to indicate that the proxy service should cache the object for 30 minutes.
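As a sketch of the Web server side of this example, the following Python handler built on the standard library's http.server module sends all three instructions. It combines the browser and external-cache instructions into the single value `private, max-age=120`, which is standard HTTP syntax; MYCACHE is the example header name from this section, and the port and response body are placeholders.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import List, Tuple


def example_response_headers() -> List[Tuple[str, str]]:
    return [
        # Browsers may cache for 2 minutes; "private" tells external caches not to store it.
        ("Cache-Control", "private, max-age=120"),
        # Honored only by a proxy service configured with MYCACHE: 30 minutes.
        ("MYCACHE", "1800"),
        ("Content-Type", "text/plain; charset=utf-8"),
    ]


class ExampleHandler(BaseHTTPRequestHandler):
    def do_GET(self) -> None:
        body = b"example content\n"
        self.send_response(200)
        for name, value in example_response_headers():
            self.send_header(name, value)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port: int = 8080) -> None:
    """Serve the example on all interfaces until interrupted (test use only)."""
    HTTPServer(("", port), ExampleHandler).serve_forever()
```

To try it, call serve() on a test host and add that host as a Web server for a proxy service that has MYCACHE defined.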

Custom Cache Control Headers override the following standard HTTP cache-control headers on the Access Gateway, but they do not affect how browsers and external caches respond to them:

Cache-Control: no-store
Cache-Control: no-cache
Cache-Control: max-age=number
Cache-Control: private
Cache-Control: public
Pragma: no-cache
Expires: date

Enabling Custom Cache Control Headers

  1. In the Administration Console, click Devices > Access Gateways > Edit > [Name of Reverse Proxy] > [Name of Proxy Service] > HTTP Options.

  2. To enable the use of custom headers, select Enable Custom Cache Control Header.

    With this option selected, the proxy service searches HTTP responses for custom cache control headers and caches the objects according to the policies they specify. The policy contains a timer, which specifies how long the object can be cached before checking with the Web server for updates.

  3. Select one of the following options to specify what happens when the custom cache control time expires.

    • Revalidate the object with a “Get-If-Modified”: Causes the proxy service to update the object in cache only if the object has been modified.

    • Always obtain a fresh copy of the object: Causes the proxy service to update the object in cache, even if the object has not been modified.

  4. In the Cache Control Header List, select New and specify a name for the header, for example MYCACHE.

  5. To save your changes to browser cache, click OK.

  6. To apply the changes, click the Access Gateways link, then click Update > OK.

  7. Modify the pages on the Web server for which you want to set custom caching intervals on the Access Gateway. Add a string similar to the following to the HTTP header:

    MYCACHE:600
    

    The numeric value indicates the number of seconds the Access Gateway can retain the object in cache. A value of zero prevents the Access Gateway from caching the object. This cache interval can be different from the value set for browsers (see Understanding How Custom Cache Control Headers Work).

  8. Ensure that the Web server continues to send the following standard HTTP cache-control headers:

    • Cache-Control: Max-Age headers that cause browsers to cache objects for no longer than two minutes.

    • Cache-Control: Private headers that prevent external caches from caching the objects.
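The two expiration options in Step 3 correspond to two kinds of HTTP request to the Web server. The Python sketch below, using urllib from the standard library, shows a conditional GET with If-Modified-Since (the revalidation option) and an unconditional GET (the fresh-copy option). The function names are hypothetical; this illustrates the HTTP exchange, not the proxy service's implementation.

```python
import urllib.error
import urllib.request
from typing import Optional, Tuple


def build_refresh_request(url: str, revalidate: bool,
                          cached_last_modified: Optional[str]) -> urllib.request.Request:
    request = urllib.request.Request(url, method="GET")
    if revalidate and cached_last_modified:
        # Conditional GET: the origin answers 304 Not Modified if nothing changed.
        request.add_header("If-Modified-Since", cached_last_modified)
    return request


def refresh(url: str, revalidate: bool, cached_body: bytes,
            cached_last_modified: Optional[str]) -> Tuple[bytes, Optional[str]]:
    request = build_refresh_request(url, revalidate, cached_last_modified)
    try:
        with urllib.request.urlopen(request, timeout=10) as response:
            # 200 OK: replace the cached copy with the new body.
            return response.read(), response.headers.get("Last-Modified", cached_last_modified)
    except urllib.error.HTTPError as err:
        if err.code == 304:
            # Not modified: keep serving the cached copy.
            return cached_body, cached_last_modified
        raise
```

With revalidation, an unchanged object costs the Web server only a short 304 response; always obtaining a fresh copy transfers the full object every time the timer expires.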

When your Web server sends an object with the MYCACHE header in response to a request made through the Access Gateway, the proxy service recognizes the custom header and caches the object for 10 minutes. Requesting browsers cache the object for only two minutes, and external caches do not cache the object.

Thus, the Access Gateway off-loads a processing burden from the Web server by caching the frequently requested objects for 10 minutes (the value you specified in Step 7). Browsers, on the other hand, must always access the Access Gateway to get the objects if their previous requests are older than two minutes. And the objects in the cache of the Access Gateway are kept fresh because of their relatively brief time-to-live value.
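A rough calculation shows the effect of the two intervals for a single, frequently requested object. This hypothetical Python helper assumes continuous demand, so that each cached copy is refreshed as soon as it expires.

```python
import math


def max_requests_per_hour(ttl_seconds: int) -> int:
    """Approximate refreshes of one object per hour under continuous demand."""
    if ttl_seconds <= 0:
        raise ValueError("ttl_seconds must be positive")
    return math.ceil(3600 / ttl_seconds)


gateway_to_web_server = max_requests_per_hour(600)  # MYCACHE:600
browser_to_gateway = max_requests_per_hour(120)     # Cache-Control: max-age=120
```

With these values, the Web server sees about 6 requests per hour from the Access Gateway for the object, while each browser contacts the Access Gateway about 30 times per hour.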

4.3.4 Configuring a Pin List

A pin list contains URL patterns that identify objects on the Web. The Access Gateway uses the list to prepopulate the cache before any requests have come in for the content. This accelerates user access to the content because it is retrieved from a local cache rather than from the Web server, which would have to read it from disk.

You can use the pin list to specify the following:

  • Which objects you want to cache

  • Which objects you never want cached

The pin list is global to the Access Gateway and affects all protected resources. The objects remain in cache until their normal cache limits are reached or they are evicted to make room for more recently requested objects.

To configure a pin list:

  1. In the Administration Console, click Devices > Access Gateways > Edit > Pin List.

  2. Select the Enable Pin List option to enable the use of pinned objects. If this option is not selected, the pinned objects in the pin list are not used.

  3. In the Pin List section, click New.

  4. Fill in the following fields.

    URL Mask: Specifies the URL pattern to match. For more information, see URL Mask.

    Pin Type: Specifies how the URL is to be used to cache objects. Select from Normal and Bypass. For more information, see Pin Type.

  5. To save the list item, click OK.

  6. To save your changes to browser cache, click OK.

  7. To apply the changes, click the Access Gateways link, then click Update > OK.

URL Mask

The URL mask can contain complete or partial URL patterns. A single URL mask might apply to a large set of URLs, or it might be so specific that only a single file on the Web matches it.

The Access Gateway processes the masks in the pin list in order of specificity. A mask containing a hostname is more specific than a mask that specifies only a file type. The action taken for an object is the action specified for the first mask that the object matches.

The Access Gateway recognizes four levels of specificity. They are described below in processing order, from most to least specific: hostname, path, filename, and file extension.

hostname

http://www.foo.gov/documents/picture.gif
http://www.foo.gov/documents/*
http://www.foo.gov
foo.gov/documents/*
foo.gov/*

All of these are classified as hostnames, and they are ordered by specificity. The first item in the list is considered the most specific and is processed first. The last item is the most general and is processed last.

path

/documents/picture.gif
/documents/pictures.gif/*
/documents/*

Path entries are processed after hostnames. A leading forward slash must always be used when specifying a path, and the entry that follows must always reference the root directory of the Web server. In these examples, documents is the root directory.

The /* at the end of the path indicates that the entry is a directory. Its absence indicates that the entry is a file. In these examples, picture.gif is a file and pictures.gif/* and documents/* are directories.

If you enter a path without the trailing *, the path matches only the directory. With the trailing *, the path matches everything in the directory and its subdirectories.

These path entry examples are ordered by specificity. The objects in the /documents/pictures.gif directory are processed before the objects in the /documents directory.

filename

/picture.gif
/widget.js
/widget.jp*g
/picture*group.gif
/DailyTask
/DailyTask*

Filenames are processed after paths. A leading forward slash must always be used when specifying a filename.

You can include an asterisk (*) wildcard in a filename.

file extension

/*.gif
/*.js
/*.htm

File extensions are processed last. They consist of a leading forward slash, an asterisk, a period, and a file extension.

NOTE: A URL mask can contain only one wildcard. For example, /*picture.g*f is not valid.

The wildcard must also be in the last part of the path. For example:

Correct: /picture/*.gif

Incorrect: /documents/*/picture.gif

Specific rules have precedence over less specific rules. Thus, objects matched by a more specific rule are always processed according to its conditions. If a less specific rule also matches the object, the less specific rule is ignored for the object. For example, assume the following two entries are in the pin list:

URL Mask                          Pin Type    Pin Links
http://www.foo.gov/documents/*    normal      1
www.foo*                          bypass      N/A

Because the first entry is the most specific, it caches the pages in the documents directory, follows any links on those pages, and caches the linked pages. The second entry does not affect what the first entry caches, but it prevents sites with other domain extensions, such as .com, .net, or .org, whose DNS names begin with www.foo from being cached.
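The level and wildcard rules above can be sketched in Python. The classification follows the examples in this section; the function names are hypothetical. Because this section does not fully define how masks within the same level are ranked, the sketch sorts only by level and keeps the administrator's order within a level (Python's sort is stable). It does not attempt URL matching.

```python
import re
from typing import List

LEVELS = ("hostname", "path", "filename", "file extension")


def classify(mask: str) -> str:
    """Assign a URL mask to one of the four specificity levels."""
    if not mask.startswith("/"):
        return "hostname"
    if re.fullmatch(r"/\*\.[^/*]+", mask):
        return "file extension"  # for example /*.gif
    if mask.count("/") == 1:
        return "filename"        # for example /picture.gif or /DailyTask*
    return "path"                # for example /documents/*


def validate(mask: str) -> None:
    """Reject masks that break the wildcard rules in the NOTE above."""
    if mask.count("*") > 1:
        raise ValueError(f"{mask!r}: only one wildcard is allowed")
    if "*" in mask and "*" not in mask.rsplit("/", 1)[-1]:
        raise ValueError(f"{mask!r}: the wildcard must be in the last part of the path")


def order_by_specificity(masks: List[str]) -> List[str]:
    for mask in masks:
        validate(mask)
    return sorted(masks, key=lambda mask: LEVELS.index(classify(mask)))
```

For example, `["/*.gif", "/picture.gif", "/documents/*", "www.foo.gov/documents/*"]` is ordered hostname first and file extension last.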

Pin Type

The pin type specifies how the Access Gateway caches objects that match the URL mask.

  • Normal: The Access Gateway handles objects matching the mask in the same way it handles any other requested objects. In other words, the objects are cached but not pinned.

    Administrators often use this pin type in combination with a broad URL mask that has a bypass pin type. This allows them to insulate specific objects from the effects of the bypass rule.

    For example, you could specify a URL mask of /*.jpg with a pin type of bypass and a second URL mask of www.foo.gov/graphics/* with a pin type of normal. This causes all files, including .jpg files, in the graphics directory on the foo.gov Web site to be cached as requested. Assuming there are no other URL masks in the pin list, all other JPG graphics are not cached because of the /*.jpg mask.

  • Bypass: The Access Gateway does not cache the objects. In other words, you can use this option to prevent objects from being cached.

4.3.5 Configuring a Purge List

The purge list is global to the Access Gateway and affects all protected resources. This option allows you to specify URL patterns or masks for the pages and sites whose objects you want to purge from cache.

When you specify the URL mask, do not specify a port. Ports are not stored in the cache file that is used to match the URLs that should be purged.

When defining the masks, keep in mind that the Access Gateway interprets everything in the URL mask between the asterisk wildcard (*) and the following delimiter as a wildcard. Delimiters include the forward slash (/), the period (.), and the colon (:) characters. For example:

URL Mask                   Effect
/*.pdf                     Causes all PDF files to be purged from cache.
www.foo.gov/contracts/*    Causes all objects in the contracts directory and its subdirectories to be purged from cache.

This option also allows you to purge cached objects whose URL contains a specified query string or cookie. To define such a mask, start it with a question mark (?) followed by text strings and wildcards as necessary. String comparisons are not case sensitive. For example, ?*=SPORTS purges all objects whose URL contains the text =SPORTS, in any combination of uppercase and lowercase letters, after the question mark.
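The query-string masks described above can be sketched in Python as follows. The sketch reads the delimiter rule as meaning that * matches any run of characters up to the next /, ., or : in the URL, applies case-insensitive matching to the query string only, and does not handle cookies. The function names are hypothetical.

```python
import re
from urllib.parse import urlsplit

DELIMITERS = "/.:"


def mask_to_regex(mask: str) -> "re.Pattern":
    """Translate a purge mask; '*' stops at the next delimiter (assumed reading)."""
    wildcard = "[^" + re.escape(DELIMITERS) + "]*"
    parts = [re.escape(part) for part in mask.split("*")]
    return re.compile(wildcard.join(parts), re.IGNORECASE)


def query_mask_matches(mask: str, url: str) -> bool:
    """True if a '?' mask matches the query string of url (case-insensitive)."""
    if not mask.startswith("?"):
        raise ValueError("query-string masks start with '?'")
    query = urlsplit(url).query
    return mask_to_regex(mask[1:]).search(query) is not None
```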

IMPORTANT: If you also configure a pin list, carefully select the objects that you add to the pin and purge lists. Make sure you don't configure a pin list that adds objects to the cache and a purge list that removes the same objects.

  1. In the Administration Console, click Devices > Access Gateways > Edit > Purge List.

  2. Click New, enter a URL pattern, then click OK.

  3. (Optional) Repeat Step 2 to add additional URL patterns.

  4. To save your changes to browser cache, click OK.

  5. To apply the changes, click the Access Gateways link, then click Update > OK.

4.3.6 Purging Cached Content

You can purge either the objects that match the purge list or all content cached on the server.

  1. In the Administration Console, click Devices > Access Gateways.

  2. Select the name of the server, then click Actions.

  3. Select one of the following actions:

    Purge List Now: Click this action to cause all objects in the current purge list to be purged from the cache.

    Purge All Cache: Click this action to purge the server cache. All cached content, including items cached by the pin list, is purged.

  4. Click OK to confirm the action, or click Cancel to dismiss it.

When you make certain configuration changes such as updating or changing certificates, changing the IP addresses of Web servers, or modifying the rewriter configuration, you are prompted to purge the cache. The cached objects must be updated for users to see the effects of such configuration changes. If your Access Gateways are in a cluster, you need to manage the purge process so your site remains accessible to your users. You should apply the configuration changes to one member of a cluster. When its status returns to healthy and current, issue the command to purge its cache. Then apply the changes to the next cluster member.

IMPORTANT: Do not issue a purge cache command when an Access Gateway has a pending configuration change. Wait until the configuration change completes.
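The rolling procedure for a cluster can be sketched in Python. The three callbacks are hypothetical stand-ins for the Administration Console actions (applying the update, checking that the member is healthy and current, and issuing Purge All Cache); no programmatic API is implied. The loop waits for each member to finish its configuration change before purging, as the note above requires.

```python
import time
from typing import Callable, Sequence


def rolling_apply_and_purge(
    members: Sequence[str],
    apply_config: Callable[[str], None],
    is_healthy_and_current: Callable[[str], bool],
    purge_cache: Callable[[str], None],
    poll_seconds: float = 30.0,
    timeout_seconds: float = 1800.0,
) -> None:
    """Update and purge one cluster member at a time (hypothetical callbacks)."""
    for member in members:
        apply_config(member)
        deadline = time.monotonic() + timeout_seconds
        # Never purge while this member still has a pending configuration change.
        while not is_healthy_and_current(member):
            if time.monotonic() >= deadline:
                raise TimeoutError(f"{member} did not become healthy and current")
            time.sleep(poll_seconds)
        purge_cache(member)
        # Only now move on, so the other members keep serving users meanwhile.
```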

4.3.7 Apache htcacheclean Tool

If the Access Gateway has caching issues such as inode exhaustion, insufficient disk space, or cache corruption, use the Apache htcacheclean tool, which keeps the size of the mod_disk_cache storage within a specified limit. This tool can run either manually or in daemon mode. When running in daemon mode, it sleeps in the background and checks the cache directories at regular intervals for cached content to be removed.

The htcacheclean utility is located at:

On Linux: /opt/novell/apache2/sbin

The default cache location is:

On Linux: /var/cache/novell-apache2

Example: To trim the cache so that it uses no more than 1024 MB, run the following command from the /opt/novell/apache2/sbin directory:

On Linux: ./htcacheclean -v -t -p/var/cache/novell-apache2 -l1024M
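If you script this cleanup, the following Python sketch builds the same command line with the documented paths and runs it with subprocess. The helper names are hypothetical. The optional daemon mode uses htcacheclean's -d option (interval in minutes); because the Apache documentation states that -d cannot be combined with -v, the sketch drops -v in that mode.

```python
import subprocess
from typing import List, Optional

HTCACHECLEAN = "/opt/novell/apache2/sbin/htcacheclean"  # location documented above
CACHE_DIR = "/var/cache/novell-apache2"                 # default cache location


def htcacheclean_args(limit: str = "1024M",
                      daemon_interval_minutes: Optional[int] = None) -> List[str]:
    """Build the htcacheclean command line (sketch)."""
    if daemon_interval_minutes is None:
        # One-shot run: verbose statistics, also remove empty directories.
        return [HTCACHECLEAN, "-v", "-t", f"-p{CACHE_DIR}", f"-l{limit}"]
    # Daemon mode: clean every N minutes; -v is not allowed together with -d.
    return [HTCACHECLEAN, f"-d{daemon_interval_minutes}", "-t",
            f"-p{CACHE_DIR}", f"-l{limit}"]


def run_htcacheclean(limit: str = "1024M") -> None:
    # check=True raises CalledProcessError if htcacheclean exits with an error.
    subprocess.run(htcacheclean_args(limit), check=True)
```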

For more information, see Apache htcacheclean tool.