16.4 Object Pinning

This section contains the following topics:

16.4.1 The Pin List

The pin list contains URL patterns for identifying objects on the Web. You configure each URL pattern in the list with specific handling instructions as explained in the following sections.

Pinned objects remain in the cache indefinitely unless the cache fills up. Pinning ensures that the objects are available from cache and are not bumped out by more recently requested objects.

URL Mask

The URL mask can contain complete or partial URL patterns. A single URL mask might apply to a large set of URLs, or it might be so specific that only a single file on the Web matches it. For more information, see Pin List Examples.

The appliance processes the masks in the pin list in order of specificity. A mask containing a host name is more specific than a mask that specifies only a file type. The action taken for an object is the action specified for the first mask that the object matches. For more information, see Processing URL Masks.

If the mask contains an asterisk, only the pin type can be specified. The Pin Links, Pin Images, and Refresh Frequency/Time options are not available for URLs containing this wildcard. Objects matching a mask with an asterisk are not automatically downloaded, but are pinned in cache only as individually requested.

Pin Type

The pin type specifies whether and how the appliance caches objects that match the URL mask.

  • Normal: iChain Proxy Services handles objects matching the mask in the same way it handles any other requested objects. In other words, objects are cached but not pinned.

    Administrators often use this pin type in combination with a broad URL mask that has a bypass pin type. This allows them to insulate specific objects from the effects of the bypass rule.

    For example, you could specify a URL mask of /*.jpg with a pin type of bypass and a second URL mask of www.foo.gov/graphics/* with a pin type of normal. This causes all files, including .jpg files, in the graphics directory on the foo.gov Web site to be cached as requested. They are not, however, pinned in cache because of the normal pin type. Assuming there are no other URL masks in the pin list, all other JPG graphics are not cached because of the /*.jpg mask.

  • Cache: iChain Proxy Services keeps the pinned objects in cache as long as possible, although they might be written to the appliance's hard disk.

  • Memory: iChain Proxy Services keeps the pinned objects in memory as long as possible, writes them to disk when memory gets too full, and places them back in memory as soon as they are requested by a user of the cache.

  • Bypass: iChain Proxy Services does not cache the objects. In other words, you can use this option to prevent objects from being cached.

    For more information about the cache bypass list, see TID 10097536.
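
The interaction between a broad bypass mask and a narrower normal mask, as in the Normal example above, can be sketched in Python. This is an illustration only, not the appliance's implementation. The function name pin_type_for is hypothetical, and the two masks are hard-coded as simple checks rather than parsed. The hostname mask is tested first because a mask containing a host name is more specific than a mask that specifies only a file type.

```python
from urllib.parse import urlsplit

def pin_type_for(url):
    """Return the pin type of the first (most specific) matching mask.

    Hypothetical helper: only the two masks from the example are modeled.
    """
    parts = urlsplit(url)
    # Mask www.foo.gov/graphics/* with pin type normal (hostname mask).
    if parts.hostname == "www.foo.gov" and parts.path.startswith("/graphics/"):
        return "normal"
    # Mask /*.jpg with pin type bypass (file type mask, less specific).
    if parts.path.endswith(".jpg"):
        return "bypass"
    # No pin list entry matches this URL.
    return None

print(pin_type_for("http://www.foo.gov/graphics/logo.jpg"))  # normal
print(pin_type_for("http://www.bar.com/images/logo.jpg"))    # bypass
```

Because processing stops at the first match, the .jpg file in the graphics directory never reaches the bypass rule.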

Pin Links

This specifies how many link levels iChain Proxy Services will follow for the pin type rule you've established. Selecting levels 1 or 2 causes all linked objects, including the images on the host, to be downloaded and cached when the pin list is applied to the appliance configuration, and then to be periodically refreshed as specified.

For example, if the requested object is an HTML page and you have specified a pin links level of 1, the HTML page is downloaded and cached when the pin list is applied along with all the items linked from the page. These cached objects are also refreshed at the frequency and time specified.
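
As a rough illustration of link levels, the following Python sketch collects the objects that would be downloaded for a given level. The link map, the URLs in it, and the function name objects_to_pin are all hypothetical, and reading level 2 as "objects linked from the linked objects" is an assumption rather than documented behavior.

```python
def objects_to_pin(start, links, levels):
    """Collect start plus everything reachable within `levels` link hops.

    `links` maps a URL to the URLs it links to (a stand-in for parsing
    the downloaded pages).
    """
    seen = {start}
    frontier = [start]
    for _ in range(levels):
        next_frontier = []
        for url in frontier:
            for target in links.get(url, []):
                if target not in seen:
                    seen.add(target)
                    next_frontier.append(target)
        frontier = next_frontier
    return seen

# Hypothetical site structure.
links = {
    "http://www.foo.gov/documents/": ["http://www.foo.gov/a.html",
                                      "http://www.foo.gov/b.html"],
    "http://www.foo.gov/a.html": ["http://www.foo.gov/c.html"],
}
level_1 = objects_to_pin("http://www.foo.gov/documents/", links, 1)
```

With a level of 1, level_1 holds the starting page and the two pages it links to, but not c.html, which is two hops away.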

To use levels 1 or 2 you must specify an absolute address, including the scheme, host, and path for the URL mask, for example, http://www.foo.gov/documents/. The tool lets you insert masks that do not meet this requirement, but the entries are removed when you click Apply.

If you enter an asterisk wildcard in the URL mask, this option is hidden immediately.
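
The two restrictions described so far (wildcard masks accept only a pin type, and link levels 1 or 2 need an absolute address) could be checked before a list is applied. The sketch below is not part of the product. The PinEntry class, its field names, and the use of 0 to mean "do not follow links" are assumptions made for illustration.

```python
from dataclasses import dataclass
from urllib.parse import urlsplit

@dataclass
class PinEntry:
    mask: str
    pin_type: str
    pin_links: int = 0        # assumed encoding: 0 means no link following
    pin_images: bool = False

def validate(entry):
    """Return a list of problems; an empty list means the entry looks valid."""
    problems = []
    if "*" in entry.mask:
        # Wildcard masks: only the pin type can be specified.
        if entry.pin_links or entry.pin_images:
            problems.append("wildcard masks allow only a pin type")
    elif entry.pin_links in (1, 2):
        # Link levels 1 or 2 need scheme, host, and path in the mask.
        parts = urlsplit(entry.mask)
        if not (parts.scheme and parts.netloc and parts.path):
            problems.append("pin links 1 or 2 needs scheme, host, and path")
    return problems

ok = validate(PinEntry("http://www.foo.gov/documents/", "cache", 1, True))
bad = validate(PinEntry("foo.gov/documents/", "cache", 1))
```

Here ok is empty, while bad reports the missing scheme, which is the kind of entry the tool removes when you click Apply.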

Pin Images

This option is used to pin image files that reside on a different host than the page requested. It works in conjunction with the Pin Links option, which specifies how many levels of links iChain Proxy Services will follow when downloading a page.

For example, if the requested HTML page uses images that reside on another host and you have selected this option, the HTML page is cached along with all the image files associated with the page, including those on the other host. If you have also specified a pin link level, images on the linked pages that reside on another host are also pinned.

On the other hand, if the Pin Images option is not checked, iChain Proxy Services only pins the images that reside on the same host as the requested page.
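
The effect of the Pin Images option can be summarized in a few lines of Python. This is a sketch under stated assumptions: the function name images_to_pin and the sample URLs are hypothetical, and hosts are compared by exact hostname.

```python
from urllib.parse import urlsplit

def images_to_pin(page_url, image_urls, pin_images):
    """Return the images pinned along with the page.

    With pin_images off, only images on the page's own host are kept.
    """
    page_host = urlsplit(page_url).hostname
    return [u for u in image_urls
            if pin_images or urlsplit(u).hostname == page_host]

page = "http://www.foo.gov/documents/index.html"
images = ["http://www.foo.gov/img/seal.gif",
          "http://images.bar.com/banner.gif"]
same_host_only = images_to_pin(page, images, pin_images=False)
all_images = images_to_pin(page, images, pin_images=True)
```

In this example same_host_only contains only seal.gif, and all_images contains both files.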

Refresh Frequency/Time

This lets you specify a refresh frequency and time for the URL that is different from the default values shown above the pin list.

Processing URL Masks

There are four basic types of URL masks you can enter in the pin list. The following table lists each type, gives a few examples, and explains how iChain Proxy Services processes masks of that type.

Each type is listed with URL mask examples, ordered by specificity, followed by notes.

Hostname

http://www.foo.gov/documents/picture.gif

http://www.foo.gov/documents/

http://www.foo.gov

foo.gov/documents/

foo.gov/

*.foo.gov/

Although these entries can include the protocol or scheme, the DNS name, the path, and the filename, only the DNS name or hostname is required in the mask. Every DNS label must be indicated, if only by an asterisk wildcard.

iChain Proxy Services processes hostname entries before it processes other mask types. It also processes the most specific URL mask entries first.

When an object match occurs, iChain Proxy Services applies the pin type rule, and processing of the object is finished.

For example, if the first URL mask in the examples column has a pin type rule of bypass, picture.gif is not cached regardless of the pin type rules for the other URL masks.

Hostname entries can have a dramatic impact on object pinning and cache bypassing.

For example, if the first two URL masks in the examples column were not present, a pin type of Bypass on the third URL mask would prevent caching of all objects delivered through HTTP on the www.foo.gov Web site.

If no scheme (HTTP, FTP, etc.) is indicated, the mask applies to all schemes. The last three masks would apply to objects delivered through any Web protocol.

Finally, Configure interprets hostnames literally. For example, the sixth entry would cover www.foo.gov, ww1.foo.gov, army.foo.gov, etc., but the fourth and fifth entries would not, because a scheme is assumed to immediately precede the hostname.

Path

/documents/picture.gif

/documents/picture.gif/

/documents/

iChain Proxy Services processes path entries after all hostname entries have been considered. It assumes that the first forward slash immediately follows a hostname.

A leading forward slash must always be used when specifying a directory. The leading slash always references the root directory of the Web server.

For example, the first entry applies only to a graphics file named picture.gif that is located in a documents directory at the root of the host.

The trailing forward slash in the second entry causes iChain Proxy Services to treat picture.gif as a directory. The pin type rule for this entry applies to any objects whose URL path starts with a documents directory followed by a subdirectory named picture.gif.

The third entry applies to any objects whose URL path starts with a documents directory at the Web server's root.

Filename

/picture.gif

/widget.js

/default.htm

After the path entries have all been processed, iChain Proxy Services looks for specific filenames.

A leading forward slash is required. Unlike the leading slash in a path-based mask, it does not reference the root directory of the Web server.

For example, if requested files named picture.gif, widget.js, and default.htm have not been covered by one of the hostname or path entries above, the files have the pin type rule for their respective filename mask applied to them.

If the first entry carries a pin type rule of Bypass, all picture.gif files that didn't match previously processed hostname or path masks are not cached.

File Extension

/*.gif

/*.js

/*.htm

File extension entries are processed last.

These are simply filename entries with the root of the filename replaced by an asterisk, which makes them less specific than complete filenames.

A leading forward slash is required. Unlike the leading slash in a path-based mask, it does not reference the root directory of the Web server.

For example, if the examples shown all had pin types of Bypass, then only those .gif, .js, and .htm files that had been cached and pinned because of hostname, path, or filename masks would be stored in cache. All other files with the named extensions would not be cached.
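
The four mask types and their processing order can be sketched in Python as follows. The classification rules here are a simplification inferred from the examples in the table, not the appliance's actual parser, and the names mask_type and processing_order are hypothetical. Ordering within a type (most specific first) is not modeled; masks of the same type keep their listed order.

```python
TYPE_ORDER = ["hostname", "path", "filename", "file extension"]

def mask_type(mask):
    """Classify a URL mask into one of the four types (assumed rules)."""
    if not mask.startswith("/"):
        return "hostname"          # anything that starts with a host or scheme
    rest = mask[1:]
    if "/" in rest:
        return "path"              # more than one segment, or a trailing slash
    if rest.startswith("*."):
        return "file extension"    # filename root replaced by an asterisk
    return "filename"

def processing_order(masks):
    """Sort masks by type: hostname, path, filename, then file extension."""
    return sorted(masks, key=lambda m: TYPE_ORDER.index(mask_type(m)))

ordered = processing_order(["/*.gif", "/picture.gif", "/documents/", "foo.gov/"])
print(ordered)  # ['foo.gov/', '/documents/', '/picture.gif', '/*.gif']
```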

Wildcards in Pin Lists

Only the asterisk (*) wildcard is allowed in pin list entries.

iChain Proxy Services interprets everything between an asterisk and the next delimiter to the right (a forward slash [/], a period [.], or a colon [:]) as a wildcard. This effectively allows only one asterisk between delimiters.
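
One way to picture this rule is to translate a mask into a regular expression. The Python sketch below is an approximation for the hostname portion of a mask only. The function name mask_to_regex is hypothetical, and the choice to let the wildcard match zero or more non-delimiter characters is an assumption.

```python
import re

DELIMITERS = "/.:"

def mask_to_regex(mask):
    """Translate a mask into a compiled regex (hostname portion only)."""
    out = []
    i = 0
    while i < len(mask):
        if mask[i] == "*":
            # Everything from the asterisk up to the next delimiter is
            # treated as part of the wildcard, so skip past it.
            while i < len(mask) and mask[i] not in DELIMITERS:
                i += 1
            out.append("[^/.:]*")
        else:
            out.append(re.escape(mask[i]))
            i += 1
    return re.compile("".join(out))

print(bool(mask_to_regex("w*.f*.com").fullmatch("www.foo.com")))  # True
print(bool(mask_to_regex("*.foo.*").fullmatch("foo.gov")))        # False
```

Because the text after an asterisk is absorbed into the wildcard, a mask such as w*x*.com produces the same expression as w*.com in this sketch, which is why a second asterisk between the same delimiters has no effect.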

Pin List Examples

The following table provides brief examples of sample pin list entries and their effects on appliance caching.

Each entry gives the URL mask, pin type, pin links setting, and pin images setting, followed by the effect on cache.

URL Mask: http://www.foo.gov/documents/
Pin Type: cache
Pin Links: 1
Pin Images: Yes

As a general rule, you should always include fully qualified DNS or hostnames in the pin list. iChain Proxy Services resolves these more quickly than other masks, and you will be able to track the effects on pinning more easily.

For this URL mask, iChain Proxy Services downloads, caches, and pins all objects whose URL starts with the mask. In other words, all objects below the documents directory are downloaded, cached, and pinned. Also, all objects that are linked from one of the pinned objects are downloaded, cached, and pinned. And finally, images that reside on other hosts are downloaded, cached, and pinned.

Objects are refreshed according to the refresh settings (default or specific) as specified in the pin list entry.

URL Mask: www.foo.gov/groups.html
Pin Type: cache
Pin Links: 1
Pin Images: No

iChain Proxy Services downloads, caches, and pins objects (including images) in the groups.html page and in pages linked from that page. Any images referenced from other hosts, however, are not included.

URL Mask: www.foo.gov/groups.html/
Pin Type: normal
Pin Links: 1
Pin Images: Yes

iChain Proxy Services downloads and caches objects in the subdirectory named groups.html and in pages linked from any of those objects.

The forward slash at the end of the path tells iChain Proxy Services that this is a directory rather than a file.

Objects are cached but not pinned in cache, meaning they might be bumped by more frequently accessed objects or objects that are pinned.

Images linked from other hosts are downloaded and cached.

URL Mask: www.foo.*
Pin Type: bypass
Pin Links: n/a
Pin Images: n/a

iChain Proxy Services doesn't cache objects from any URLs whose DNS names begin with www.foo.

All domain extensions (.com, .net, .org, etc.) are covered by the asterisk wildcard.

Link and image pinning is not available for bypass pin types.

If this entry appeared in a pin list with either of the previous two entries, it would not prevent caching of objects covered by them, because it is less specific than they are.

URL Mask: w*.f*.com
Pin Type: bypass
Pin Links: n/a
Pin Images: n/a

iChain Proxy Services doesn't cache objects for any URLs whose first domain label begins with w and second domain label begins with f, provided the domain extension is .com.

This mask doesn't prevent caching of objects on other domains such as .net, .gov, etc.

URL Mask: w*.f*.*
Pin Type: bypass
Pin Links: n/a
Pin Images: n/a

This mask functions like the previous entry, but the domain is not limited to .com.

URL Mask: *.foo.*
Pin Type: cache
Pin Links: n/a
Pin Images: n/a

This causes all objects on any Web server whose second domain label is foo to be pinned in cache.

Link and image pinning are not available because the mask contains asterisks.

This mask does not cover DNS names that don't have a domain label before foo. For example, foo.gov would not normally be covered. However, if foo.gov happens to resolve in DNS to the same IP address as www.foo.gov, the iChain Proxy Server applies the pinning rules specified for www.foo.gov to foo.gov. To understand more about IP addresses and URL masks, see Section 16.5, Using the Proxy Server to Record IP Addresses When Resolving URL Masks.