4.3 Access Gateway Content Settings

One of the major benefits of using an Access Gateway to protect Web resources is that it can cache the requested information and send it directly to the client browser rather than contacting the origin Web resource and waiting for the requested information to be sent. This can significantly accelerate access to the information.

IMPORTANT: For caching to work correctly, the Web servers must maintain accurate time. If possible, configure them to use an NTP server.

The object cache on an Access Gateway is quite different from a browser’s cache. The browser cache is used when a user clicks the Back button, and it can serve stale content that does not accurately reflect the current content on the origin Web server.

The Access Gateway caching system uses a number of methods to ensure cache freshness. Webmasters flag most time-sensitive Web content so that it cannot become stale unless a caching system ignores the Webmaster’s settings. Access Gateway honors all RFC 2616 headers and directives that affect cache freshness, such as Cache-Control, If-Modified-Since, and Expires.
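The following Python sketch illustrates the RFC 2616 freshness rules by computing the lifetime that an origin server's headers request. It follows the precedence in RFC 2616 section 13.2.4 for a shared cache (s-maxage, then max-age, then Expires minus Date). It is an illustration of the standard under simplifying assumptions (header names already lower-cased, no Age or heuristic handling), not Access Gateway's implementation, and the function name is hypothetical.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Mapping, Optional


def origin_freshness_lifetime(headers: Mapping[str, str]) -> Optional[int]:
    """Return the freshness lifetime (seconds) the origin requested, or None.

    Hypothetical helper; assumes header names are already lower-cased.
    """
    directives = {}
    for part in headers.get("cache-control", "").split(","):
        name, _, value = part.partition("=")
        name = name.strip().lower()
        if name:
            directives[name] = value.strip().strip('"')

    # s-maxage applies to shared caches such as a reverse proxy and wins over max-age.
    for key in ("s-maxage", "max-age"):
        if key in directives:
            try:
                return max(0, int(directives[key]))
            except ValueError:
                return 0  # sketch choice: a malformed value is treated as stale

    if "expires" in headers:
        try:
            expires = parsedate_to_datetime(headers["expires"])
            # Without a Date header, fall back to the time of receipt (now).
            date = (parsedate_to_datetime(headers["date"]) if "date" in headers
                    else datetime.now(timezone.utc))
            return max(0, int((expires - date).total_seconds()))
        except (TypeError, ValueError):
            # RFC 2616: an invalid Expires value such as "0" means already expired.
            return 0

    return None  # no explicit lifetime: the cache falls back to its own default
```

A None result is the case the HTTP Default setting described later in this section is meant to cover.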

Access Gateway can be fine-tuned for cache freshness in the following ways:

  • Accelerated checking of objects that have longer than desirable Time to Expire headers

  • Delayed checking of objects that have shorter than desirable Time to Expire headers

  • Checking for freshness of objects that do not include Time to Expire headers

Access Gateway follows these RFC directives. In addition, the Access Gateway Service uses the Apache module mod_file_cache.

The following sections describe the features available to fine-tune this process for your network:

4.3.1 Configuring Caching Options

The Cache Options allow you to control how Access Gateway caches objects.

  1. Click Devices > Access Gateways > Edit > Cache Options.

  2. To disable caching of all Web server content, select Disable Caching.

    When this option is selected, all other caching options are disabled.

  3. Modify the Cache Freshness settings. Use the Reset button to return these settings to their default values.

    These options govern when the proxy service revalidates requested cached objects against those on their respective origin Web servers. If the objects have changed, the proxy service re-caches them.

    WARNING: Enter whole numbers. Decimal values such as 2.5 are not supported and generate an XML validation error.

    HTTP Maximum: Specifies the maximum time the proxy service serves HTTP data from cache before revalidating it against the content on the origin Web server. After this interval elapses, no object is served from cache until it has been revalidated.

    This value overrides any freshness or Time to Expire directive that the Webmaster set to a longer time.

    Use this value to reduce the maximum time the proxy service waits before checking whether requested objects need to be refreshed. The default is 6 hours.

    HTTP Default: Specifies the maximum time the proxy service serves HTTP data for which Webmasters have not specified a freshness or Time to Expire directive. The default is 2 hours.

  4. To save your changes, click OK.

  5. To apply the changes, click the Access Gateways link, then click Update > OK.
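As a rough model of how the two Cache Freshness settings interact, the sketch below treats an origin lifetime of None as "no freshness or Time to Expire directive" and caps every lifetime at HTTP Maximum, which the text says no object may exceed without revalidation. Values are in seconds for simplicity, and the function names are hypothetical; this models the behavior described above rather than reproducing the proxy service's code.

```python
from typing import Optional

# Defaults described above: HTTP Maximum = 6 hours, HTTP Default = 2 hours.
HTTP_MAXIMUM_S = 6 * 60 * 60
HTTP_DEFAULT_S = 2 * 60 * 60


def effective_lifetime(origin_lifetime_s: Optional[int],
                       http_maximum_s: int = HTTP_MAXIMUM_S,
                       http_default_s: int = HTTP_DEFAULT_S) -> int:
    """Seconds an object may be served from cache before it is revalidated."""
    if origin_lifetime_s is None:
        # The Webmaster gave no freshness directive: use HTTP Default.
        lifetime = http_default_s
    else:
        lifetime = origin_lifetime_s
    # HTTP Maximum caps every object, overriding longer origin lifetimes.
    return min(lifetime, http_maximum_s)


def needs_revalidation(age_s: int, origin_lifetime_s: Optional[int], **settings: int) -> bool:
    """True once a cached copy is too old to serve without checking the origin."""
    return age_s >= effective_lifetime(origin_lifetime_s, **settings)
```

For example, an object whose origin allows one day of caching is still revalidated after 6 hours with the default settings.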

4.3.2 Controlling Browser Caching

Webmasters control how browsers cache information by adding the following cache directives to the HTTP response headers:

Cache-Control: no-store
Cache-Control: no-cache
Cache-Control: private
Cache-Control: public
Pragma: no-cache

You can configure how the proxy service responds to these directives in the HTTP header.

  1. Click Devices > Access Gateways > Edit > [Name of Reverse Proxy] > [Name of Proxy Service] > HTTP Options.

  2. To mark all pages coming through this host as cacheable on the browser, select Allow Pages to be Cached by the Browser.

    When this option is enabled, Access Gateway does not inject the no-cache and no-store directives into the HTTP header.

    You need to select this option if you have a back-end application that updates the data in the Last-Modified or ETag HTTP headers. These changes are forwarded from the Web server to the browser only when this option is enabled.

    You need to select this option if you want the Expires HTTP header forwarded from the Web server to the browser.

    If this option is not selected, all pages are marked as non-cacheable on the browser. This forces the browser to request the data again from Access Gateway when a user returns to a previously viewed page.

  3. Click OK.

  4. To apply the changes, click the Access Gateways link, then click Update > OK.
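The sketch below models the effect of Allow Pages to be Cached by the Browser on the response a browser receives, as described in the steps above. The exact Cache-Control value written when the option is off is an assumption (the text only says that no-cache and no-store are injected), header names are assumed to be lower-cased, and the function name is hypothetical.

```python
from typing import Dict

# Headers that, according to the steps above, reach the browser only when
# "Allow Pages to be Cached by the Browser" is selected.
BROWSER_CACHE_HEADERS = ("expires", "last-modified", "etag")


def headers_for_browser(origin: Dict[str, str], allow_browser_cache: bool) -> Dict[str, str]:
    """Model of the response headers a browser receives from the proxy service."""
    if allow_browser_cache:
        return dict(origin)  # forwarded as the Web server sent them
    out = {k: v for k, v in origin.items() if k not in BROWSER_CACHE_HEADERS}
    # Assumed value: the page is marked non-cacheable on the browser.
    out["cache-control"] = "no-cache, no-store"
    return out
```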

4.3.3 Configuring a Pin List

A pin list contains URL patterns for identifying objects on the Web. Access Gateway uses the list to prepopulate the cache before any requests have come in for the content. This accelerates user access to the content because it is served from the local cache rather than retrieved from the Web server, which would have to read it from disk.

You can use the pin list to specify the following:

  • Which objects you want to cache

  • Which objects you never want cached

The pin list is global to Access Gateway and affects all protected resources. The objects remain in the cache until their normal cache limits are reached or until more recently requested objects displace them.

To configure a pin list:

  1. Click Devices > Access Gateways > Edit > Pin List.

  2. Select the Enable Pin List option to enable the use of pinned objects. If this option is not selected, the pinned objects in the pin list are not used.

  3. In the Pin List section, click New.

  4. Fill in the following fields.

    URL Mask: Specifies the URL pattern to match. For more information, see URL Mask.

    Pin Type: Specifies how the URL is to be used to cache objects. Select from Normal and Bypass. For more information, see Pin Type.

  5. To save the list item, click OK.

  6. To save your changes, click OK.

  7. To apply the changes, click the Access Gateways link, then click Update > OK.

URL Mask

The URL mask can contain complete or partial URL patterns. A single URL mask might apply to a large set of URLs, or it might be so specific that only a single file on the Web matches it.

Access Gateway processes the masks in the pin list in order of specificity. A mask containing a hostname is more specific than a mask that specifies only a file type. The action taken for an object is the action specified for the first mask that the object matches.

Access Gateway recognizes four levels of specificity. They are listed below from most specific (hostname) to least specific (file extension), each with examples:

hostname

http://www.foo.gov/documents/picture.gif
http://www.foo.gov/documents/*
http://www.foo.gov
foo.gov/documents/*
foo.gov/*

All of these are classified as hostnames, and they are ordered by specificity. The first item in the list is considered the most specific and is processed first. The last item is the most general and is processed last.

path

/documents/picture.gif
/documents/pictures.gif/*
/documents/*

Path entries are processed after hostnames. A path must always start with a forward slash and must be specified from the root directory of the Web server. In these examples, documents is the top-level directory.

The /* at the end of the path indicates that the entry is a directory. Its absence indicates that the entry is a file. In these examples, picture.gif is a file and pictures.gif/* and documents/* are directories.

If you enter a path without the trailing *, the path matches only the directory. With the trailing *, the path matches everything in the directory and its subdirectories.

These path entry examples are ordered by specificity. The objects in the /documents/pictures.gif directory are processed before the objects in the /documents directory.

filename

/picture.gif
/widget.js
/widget.jp*g
/picture*group.gif
/DailyTask
/DailyTask*

Filenames are processed after paths. A leading forward slash must always be used when specifying a filename.

You can include an asterisk wildcard in a filename.

file extension

/*.gif
/*.js
/*.htm

File extensions are processed last. They consist of a leading forward slash, an asterisk, a period, and a file extension.
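To make the four levels concrete, the following sketch classifies a mask by its shape and sorts a pin list into processing order. The classification rules (no leading slash means hostname, a second slash means path, a leading /*. means file extension) are inferred from the examples above, and the helper names are hypothetical. Masks within the same level keep their entry order, because this section does not define how Access Gateway ranks masks inside one level.

```python
from typing import List

# Index in this tuple = processing rank (0 is the most specific level).
LEVELS = ("hostname", "path", "filename", "file extension")


def mask_level(mask: str) -> str:
    """Classify a pin-list URL mask into one of the four levels."""
    if not mask.startswith("/"):
        return "hostname"        # e.g. http://www.foo.gov/documents/*, foo.gov/*
    if "/" in mask[1:]:
        return "path"            # e.g. /documents/*, /documents/picture.gif
    if mask.startswith("/*."):
        return "file extension"  # e.g. /*.gif
    return "filename"            # e.g. /widget.jp*g, /DailyTask*


def processing_order(masks: List[str]) -> List[str]:
    """Sort masks from most to least specific level (stable within a level)."""
    return sorted(masks, key=lambda m: LEVELS.index(mask_level(m)))
```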

NOTE: A URL mask can contain only one wildcard. For example, /*picture.g*f is not valid.

The wildcard must also appear only in the last segment of the path. For example:

Correct: /picture/*.gif

Incorrect: /documents/*/picture.gif
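A mask can be checked against these two wildcard rules with a few lines of Python. This is a hypothetical pre-check, for example in a script that prepares masks before you enter them; it does not replace the validation Access Gateway performs itself.

```python
def wildcard_is_valid(mask: str) -> bool:
    """Check the pin-list wildcard rules: one '*' at most, only in the last segment."""
    if mask.count("*") > 1:
        return False  # more than one wildcard is not allowed
    star = mask.find("*")
    # No wildcard, or the wildcard comes after the last '/' in the mask.
    return star == -1 or star > mask.rfind("/")
```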

Specific rules have precedence over less specific rules. Thus, objects matched by a more specific rule are always processed according to its conditions. If a less specific rule also matches the object, the less specific rule is ignored for the object. For example, assume the following two entries are in the pin list:

URL Mask                          Pin Type    Pin Links
http://www.foo.gov/documents/*    normal      1
www.foo*                          bypass      N/A

Because the first entry is the most specific, it caches the pages in the documents directory, follows the links on those pages, and caches the linked pages. The second entry does not affect what the first entry caches, but it prevents objects from being cached on any other site whose DNS name begins with www.foo, such as sites with the .com, .net, or .org domain extensions.

Pin Type

The pin type specifies how Access Gateway caches objects that match the URL mask.

  • Normal: Access Gateway handles objects matching the mask in the same way it handles any other requested objects. In other words, the objects are cached but not pinned.

    Administrators often use this pin type in combination with a broad URL mask that has a bypass pin type. This allows them to insulate specific objects from the effects of the bypass rule.

    For example, you could specify a URL mask of /*.jpg with a pin type of bypass and a second URL mask of www.foo.gov/graphics/* with a pin type of normal. This causes all files, including .jpg files, in the graphics directory on the foo.gov Web site to be cached as requested. Assuming there are no other URL masks in the pin list, all other JPG graphics are not cached because of the /*.jpg mask.

  • Bypass: Access Gateway does not cache objects that match the mask. Use this option to prevent objects from being cached.
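The sketch below combines the specificity ordering and the first-match rule to decide the pin type for a URL, and replays both examples from this section. It rests on assumptions that this section does not spell out: hostname masks are compared against the host plus path (ignoring scheme and port), path masks against the full path, and filename and extension masks against the last path segment; a URL that matches no mask is treated as normal; and Pin Links (link following) is not modeled. All names are hypothetical.

```python
from typing import List, Tuple
from urllib.parse import urlsplit


def _level(mask: str) -> int:
    """0 = hostname, 1 = path, 2 = filename, 3 = file extension."""
    if not mask.startswith("/"):
        return 0
    if "/" in mask[1:]:
        return 1
    return 3 if mask.startswith("/*.") else 2


def _glob(pattern: str, text: str) -> bool:
    """Match a pattern containing at most one '*' against the whole text."""
    if "*" not in pattern:
        return pattern == text
    prefix, suffix = pattern.split("*", 1)
    return (len(text) >= len(prefix) + len(suffix)
            and text.startswith(prefix) and text.endswith(suffix))


def pin_action(pin_list: List[Tuple[str, str]], url: str) -> str:
    """Return the pin type of the first mask that matches, most specific level first."""
    parts = urlsplit(url)
    # sorted() is stable, so entry order is kept within a level.
    for mask, pin_type in sorted(pin_list, key=lambda entry: _level(entry[0])):
        level = _level(mask)
        if level == 0:
            pattern = mask.split("://", 1)[-1]  # drop http:// or https://
            target = (parts.hostname or "") + parts.path
        elif level == 1:
            pattern, target = mask, parts.path
        else:
            pattern, target = mask, "/" + parts.path.rsplit("/", 1)[-1]
        if _glob(pattern, target):
            return pin_type
    return "normal"  # no mask matched: cached like any other requested object
```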

4.3.4 Configuring a Purge List

The purge list is global to Access Gateway and affects all protected resources. This option allows you to specify URL patterns or masks for the pages and sites whose objects you want to purge from cache.

When you specify the URL mask, do not specify a port. Ports are not stored in the cache file that is used to match the URLs that must be purged.

When defining the masks, keep in mind that Access Gateway interprets everything in the URL mask between the asterisk wildcard (*) and the following delimiter as a wildcard. Delimiters include the forward slash (/), the period (.), and the colon (:) characters. For example:

URL Mask                   Effect
/*.pdf                     Causes all PDF files to be purged from cache.
www.foo.gov/contracts/*    Causes all objects in the contracts directory and its subdirectories to be purged from cache.

This option also allows you to purge cached objects whose URL contains a specified query string or cookie. Such a mask begins with a question mark (?), followed by text strings and wildcards as necessary. String comparisons are not case sensitive. For example, ?*=SPORTS purges all objects whose URL contains =SPORTS, in any combination of uppercase and lowercase letters, after the question mark.
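The purge-mask rules above can be approximated as follows. The sketch assumes that * matches any run of characters, including slashes (consistent with /*.pdf purging PDF files in every directory); that masks starting with / are compared against the URL path and other masks against the host plus path; and that the delimiter rule does not apply to query-string masks (otherwise ?*=SPORTS could not behave as described). Cookie matching is not modeled, and the function names are hypothetical.

```python
import re
from urllib.parse import urlsplit

DELIMITERS = "/.:"


def normalize_purge_mask(mask: str) -> str:
    """Drop the text between each '*' and the next delimiter, since it is
    treated as part of the wildcard anyway (for example /*abc.pdf -> /*.pdf)."""
    out = []
    skipping = False
    for ch in mask:
        if skipping and ch not in DELIMITERS:
            continue
        skipping = ch == "*"
        out.append(ch)
    return "".join(out)


def _compile(mask: str, flags: int = 0):
    # Each '*' becomes '.*'; everything else is matched literally.
    return re.compile(".*".join(re.escape(piece) for piece in mask.split("*")), flags)


def purge_matches(mask: str, url: str) -> bool:
    parts = urlsplit(url)
    if mask.startswith("?"):
        # Query-string masks: case-insensitive, compared with the text after '?'.
        return _compile(mask[1:], re.IGNORECASE).fullmatch(parts.query) is not None
    mask = normalize_purge_mask(mask)
    if mask.startswith("/"):
        target = parts.path
    else:
        # Ports are never compared, because they are not stored in the cache file.
        mask = mask.split("://", 1)[-1]
        target = (parts.hostname or "") + parts.path
    return _compile(mask).fullmatch(target) is not None
```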

IMPORTANT: If you also configure a pin list, carefully select the objects that you add to the pin and purge lists. Make sure you don’t configure a pin list that adds objects to the cache and a purge list that removes the same objects.

  1. Click Devices > Access Gateways > Edit > Purge List.

  2. Click New, enter a URL pattern, then click OK.

  3. (Optional) Repeat Step 2 to add additional URL patterns.

  4. To save your changes, click OK.

  5. To apply the changes, click the Access Gateways link, then click Update > OK.

4.3.5 Purging Cached Content

You can purge either the objects that match the purge list or all content cached on the server.

  1. Click Devices > Access Gateways.

  2. Select the name of the server, then click Actions.

  3. Select one of the following actions:

    Purge List Now: Click this action to purge all cached objects that match the URL masks in the current purge list.

    Purge All Cache: Click this action to purge the server cache. All cached content, including items cached by the pin list, is purged.

  4. Click either OK or Cancel.

When you make certain configuration changes, such as updating or changing certificates, changing the IP addresses of Web servers, or modifying the rewriter configuration, you are prompted to purge the cache. The cached objects must be refreshed before users can see the effects of these changes. If your Access Gateways are in a cluster, manage the purge process so that your site remains accessible to your users: apply the configuration changes to one cluster member, wait until its status returns to healthy and current, purge its cache, and then apply the changes to the next cluster member.

IMPORTANT: Do not issue a purge cache command while an Access Gateway has a pending configuration change. Wait until the configuration change completes.
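The rolling procedure for clusters can be outlined as code. The three callbacks are hypothetical placeholders for actions that you perform through Devices > Access Gateways (apply the change, read the member's status, purge its cache); this section does not document a programmatic interface for them. The status string "healthy and current" is likewise an assumption standing in for the status the console displays.

```python
import time
from typing import Callable, Iterable


def rolling_update(members: Iterable[str],
                   apply_config: Callable[[str], None],
                   status: Callable[[str], str],
                   purge_cache: Callable[[str], None],
                   poll_s: float = 10.0,
                   timeout_s: float = 900.0,
                   sleep: Callable[[float], None] = time.sleep) -> None:
    """Apply a change and purge the cache on one cluster member at a time."""
    for member in members:
        apply_config(member)
        waited = 0.0
        # Never purge while a configuration change is still pending.
        while status(member) != "healthy and current":
            if waited >= timeout_s:
                raise TimeoutError(f"{member} did not return to healthy and current")
            sleep(poll_s)
            waited += poll_s
        purge_cache(member)
```

Handling one member at a time keeps the remaining members serving users while each one is updated and purged.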

4.3.6 Apache htcacheclean Tool

If you have caching issues related to inodes, disk space, or cache corruption in Access Gateway, use the Apache htcacheclean tool, which keeps the size of mod_disk_cache's storage within a specified limit. The tool can run either manually or in daemon mode. In daemon mode, it sleeps in the background and checks the cache directories at regular intervals for cached content to remove.

The htcacheclean utility tool is located at:

Linux: /opt/novell/apache2/sbin

The default cache location is:

Linux: /var/cache/novell-apache2

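Example: To reduce the disk cache to at most 1024 MB, run the following command from the /opt/novell/apache2/sbin directory. The -v option prints statistics, -t deletes empty directories, -p specifies the cache root, and -l sets the size limit: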

Linux: ./htcacheclean -v -t -p/var/cache/novell-apache2 -l1024M
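If you schedule the tool from a script, a small helper can assemble the command line for either mode. The sketch below is a hypothetical wrapper that uses the paths given above; -d (daemon wake-up interval in minutes) and -n (run with lower priority) are standard htcacheclean options, and -v is left out in daemon mode because htcacheclean does not allow it together with -d.

```python
from typing import List, Optional

HTCACHECLEAN = "/opt/novell/apache2/sbin/htcacheclean"
CACHE_ROOT = "/var/cache/novell-apache2"


def htcacheclean_args(limit: str = "1024M",
                      daemon_interval_min: Optional[int] = None) -> List[str]:
    """Build an htcacheclean command line for a manual run or daemon mode."""
    args = [HTCACHECLEAN]
    if daemon_interval_min is None:
        args.append("-v")  # manual run: print statistics
    else:
        # Daemon mode: wake every N minutes and run niced; -v cannot be used with -d.
        args += [f"-d{daemon_interval_min}", "-n"]
    # Delete empty directories, set the cache root, and set the size limit.
    args += ["-t", f"-p{CACHE_ROOT}", f"-l{limit}"]
    return args
```

To perform a single manual cleanup, pass the returned list to subprocess.run with check=True while running as root.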

For more information about this tool, see htcacheclean - Clean up the disk cache.