7.6 Troubleshooting Crashes and Hangs

7.6.1 Enable the Access Gateway Monitor Service for the Core Dump Logic to Work Correctly

When the Access Gateway monitor service is running on the Access Gateway appliance, it checks the available disk space in the root partition (/) before a core is dumped. When the free disk space falls below 3 GB, the Access Gateway does not write core files.

In the Novell Access Manager 3.1 SP2 release, the monitor service is disabled by default. To start the service, enter the following command:

/etc/init.d/lagmonitor start

For more information about this service, see Section 7.1.2, Using the Linux Access Gateway Monitor Service.
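The disk-space condition described above can also be checked by hand. The following is a minimal sketch and not part of the product: the function names free_kb and has_core_dump_space are hypothetical, and the sketch assumes that the 3 GB threshold corresponds to 3 x 1024 x 1024 KB.

```shell
#!/bin/sh
# Threshold from the text above: 3 GB, taken here as 3 * 1024 * 1024 KB (assumption).
MIN_FREE_KB=$((3 * 1024 * 1024))

# Print the free space, in KB, of the file system that holds the given path.
# Uses POSIX "df -Pk"; column 4 of the second output line is the available space.
free_kb() {
    df -Pk "$1" | awk 'NR == 2 { print $4 }'
}

# Succeed (exit status 0) when the given number of free KB meets the threshold.
has_core_dump_space() {
    [ "$1" -ge "$MIN_FREE_KB" ]
}
```

For example, has_core_dump_space "$(free_kb /)" returns a zero exit status when the root partition has at least 3 GB free.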

7.6.2 The Access Gateway Hangs When the Audit Server Comes Back Online

When the Platform Agent loses its connection to the audit server, it enters caching mode. The default size of the audit cache file is unlimited, so if the connection is broken for a long time and traffic is high, the cache file can become quite large.

When the connection to the audit server is re-established, the Platform Agent becomes very busy because it tries to upload the cached events to the audit server while still processing new events. While it comes out of caching mode, the Platform Agent appears unresponsive, both because it is so busy and because it holds the application threads that are logging new events for a long time. If it holds too many threads, the system can appear to hang.

You can minimize the effects of this scenario by configuring two parameters in the logevent file.

Table 7-4 Parameters for the logevent File

Parameter            Description

LogMaxCacheSize      Sets a limit to the amount of cache the Platform Agent can consume to log events when the audit server is unreachable. The default is Unlimited.

LogCacheLimitAction  Specifies what the Platform Agent should do with incoming events when the maximum cache size limit is reached. You can select one of the following actions:

                       • Delete the current cache file and start logging events in a new cache file.

                       • Stop logging, which preserves all entries in the cache and stops the collecting of new events.

When you set a finite cache file size, it limits the number of events that must be uploaded to the audit server when caching mode is terminated and keeps the Platform Agent responsive to new audit events that are registered.

For more information about the logevent file and these parameters, see Logevent.
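Before you change these parameters, it can help to see whether they are already set. The following is a minimal sketch: the function name show_cache_settings is hypothetical, the path to the logevent file is passed as an argument because its location is not given here, and the sketch assumes that each parameter starts a line of its own in the file.

```shell
#!/bin/sh
# Print the cache-related lines of a logevent configuration file.
# $1: path to the file (supply the location used on your system).
# Matches lines that start with either parameter name from Table 7-4,
# ignoring leading white space; commented-out lines are not matched.
show_cache_settings() {
    grep -E '^[[:space:]]*(LogMaxCacheSize|LogCacheLimitAction)' "$1"
}
```

If the function prints nothing, neither parameter is set explicitly in the file.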

7.6.3 The Access Gateway Crashes When Log Files Are Removed

If you have enabled the debug level of logging for the laghttpheaders and the lagsoapmessages log files and these files grow to be over 200 MB, manually deleting these files can cause the Access Gateway to crash.

To solve the problem, restart the Access Gateway after manually deleting the files.
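To find out whether any log file has passed the 200 MB mark before you delete it, you can list the large files in the log directory. The following is a minimal sketch: the function name logs_over_mb is hypothetical, and the directory is passed as an argument because the location of the laghttpheaders and lagsoapmessages files is not given here.

```shell
#!/bin/sh
# List regular files under a directory that are larger than a size in MB.
# $1: directory to search; $2: size in MB (defaults to 200, the figure above).
# The "M" size suffix is a GNU find extension, available on Linux.
logs_over_mb() {
    find "$1" -type f -size +"${2:-200}"M
}
```

If the function lists one of the two log files, plan to restart the Access Gateway after you delete it.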

7.6.4 Troubleshooting a Failed Access Gateway Configuration

If the IP address and other network settings are not applied to the installed Access Gateway, log in as the root user and run the following commands:

rm /opt/novell/legacy/etc/proxy/.novell_lag_lock
/etc/init.d/novell-vmc stop
/etc/init.d/novell-vmc start 

7.6.5 Troubleshooting an Access Gateway Crash

The Access Gateway might crash for the following reasons:

  • SIGSEGV

  • ASSERT (for a debug build only)

The following sections explain how to gather the files that need to be sent to Novell for a resolution of the problem.

Access Gateway Logs

  1. Enter the following command from the bash shell to collect the debug log files that are generated:

    /chroot/lag/opt/novell/bin/getlaglogs.sh
    
  2. Locate the laglogs.tgz tar file in the /var/log directory.

  3. Send this tar file to Novell Support.

Event Log

By default, the event log size is 15 MB. To change the size, specify the required event log size, in MB, in the eventlogsize.cfg file, located in the /chroot/lag/etc/opt/novell directory. For example, if you specify 350 in the file, the Access Gateway can have an event log of 350 MB. The file must contain only the size value, without any other characters or newlines.
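Because the file must not contain a newline, writing it with echo or with most text editors can leave an unwanted trailing newline behind. The following is a minimal sketch that avoids this; the function name write_event_log_size is hypothetical.

```shell
#!/bin/sh
# Write an event log size, in MB, to a file with no trailing newline.
# $1: size in MB (digits only); $2: target file.
# printf '%s' is used because echo would append a newline.
write_event_log_size() {
    case "$1" in
        ''|*[!0-9]*) echo "size must be a whole number of MB" >&2; return 1 ;;
    esac
    printf '%s' "$1" > "$2"
}
```

For example, write_event_log_size 350 /chroot/lag/etc/opt/novell/eventlogsize.cfg writes exactly the three characters 350 to the file.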

The procedure for obtaining the event log depends upon the build type:

Event Log for a Production Build

To get the event log for the production build:

  1. Log in as the root user.

  2. To disconnect all instances of Access Gateway, enter the following command:

    /etc/init.d/novell-vmc stop

  3. Enter the following command to change the root environment:

    chroot /chroot/lag

  4. To start the process, enter the following command:

    gdb /opt/novell/bin/ics_dyn 2>/var/log/ics_dyn.log

  5. At the GDB prompt, run the following command:

    run -m <memory>

    Replace <memory> with the percentage of total memory to be used for the ics_dyn process. Set this value between 20 and 30 percent.

  6. Repeat the scenarios to reproduce the issue:

    1. If you are trying to reproduce the proxy crash, you see the GDB prompt as soon as the crash is reproduced.

    2. If you are trying to reproduce a functionality issue, press Ctrl+C to enter the GDB prompt as soon as the issue is reproduced.

      For a list of commands that can be entered in the debugger, see Useful Debugger Commands.

  7. To save event logs to a file, enter the following command:

    d ,save 1
    

    This stores all the events in the /chroot/lag/opt/novell/debug/<pid>all_events.0.txt file.

  8. Tar or zip this file and send it to Novell Support.
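The packaging step can be sketched as follows. This is an illustration, not a Novell-supplied tool: the function name package_for_support is hypothetical, and the sketch assumes that a gzip-compressed tar archive placed next to the original file is acceptable.

```shell
#!/bin/sh
# Create a gzip-compressed tar archive of one file, next to the original.
# $1: file to package. Prints the name of the archive that was written.
# -C changes into the file's directory so the archive holds only the base name.
package_for_support() {
    dir=$(dirname "$1")
    base=$(basename "$1")
    tar -czf "$1.tgz" -C "$dir" "$base" && echo "$1.tgz"
}
```

Pass the full path of the saved event file as the argument. The same function can be used for the other files that this section asks you to send.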

Event Log for a Debug Build

To get the event log:

  1. Log in as the root user.

  2. To stop all instances of Access Gateway, enter the following command:

    /etc/init.d/novell-vmc stop

  3. To start the Novell Access Gateway in debugging mode, enter the following command:

    /etc/init.d/novell-vmc gdb

  4. To run the Access Gateway process, enter the following command at the GDB prompt:

    run -m <memory> 2>/var/log/ics_dyn.log

    Replace <memory> with the percentage of total memory to be used for the ics_dyn process. Set this value between 20 and 30 percent.

  5. Repeat the scenarios to reproduce the issue.

    1. If you are trying to reproduce the proxy crash, you will enter the GDB prompt as soon as the crash is reproduced.

    2. If you are trying to reproduce a functionality issue, press the following key combination to enter the GDB prompt as soon as the issue is reproduced:

      Ctrl+C

      For a list of commands that can be entered in the debugger, see Useful Debugger Commands.

  6. To save all event logs to a file, enter the following command:

    d ,save 1
    

    This stores all the events in the /chroot/lag-debug/opt/novell/debug/<pid>all_events.0.txt file.

  7. Tar or zip this file and send it to Novell Support.

Useful Debugger Commands

Table 7-5 GDB Commands

Command    Function

gcore      Generate core file

k          Kill process

q          Quit GDB prompt

bt         Print the back trace

Core Dump

Before you begin, make sure that the free space in the root partition (/) is at least equal to the RAM size, so that the partition can hold the core file.
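This precondition can be checked from the bash prompt. The following is a minimal sketch, assuming a Linux system where /proc/meminfo reports MemTotal in KB; the function names ram_kb, free_kb_at, and fits_core are hypothetical.

```shell
#!/bin/sh
# Print the installed RAM in KB, read from the MemTotal line of /proc/meminfo (Linux).
ram_kb() {
    awk '/^MemTotal:/ { print $2 }' /proc/meminfo
}

# Print the free space, in KB, of the file system that holds the given path
# (defaults to the root partition).
free_kb_at() {
    df -Pk "${1:-/}" | awk 'NR == 2 { print $4 }'
}

# Succeed when the free space ($2, in KB) is at least the RAM size ($1, in KB).
fits_core() {
    [ "$2" -ge "$1" ]
}
```

For example, fits_core "$(ram_kb)" "$(free_kb_at /)" returns a nonzero exit status when the root partition is too small to hold a full core file.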

To collect a core dump:

  1. Log in as the root user.

  2. To disconnect all instances of the Access Gateway, enter the following command:

    /etc/init.d/novell-vmc stop

  3. At the bash prompt, specify the following command:

    touch /tmp/.dumpcore

  4. Enter the following command to start the Access Gateway:

    /etc/init.d/novell-vmc start

  5. Repeat the scenarios to reproduce the issue.

    The core is dumped to the /chroot/lag directory as the core.<pid> file.

    <pid> is the process ID of the ics_dyn process.

    After the core is dumped, the Access Gateway restarts.

  6. Tar or zip the core dump and send it to Novell Support.

Proxy Hang Core

To analyze the proxy hang and create a core file:

  1. Enter the following command to change the root environment:

    chroot /chroot/lag

  2. Enter the following command to attach the ics_dyn process to the debugger:

    gdb /opt/novell/bin/ics_dyn <pid>

    Replace <pid> with the process ID of the ics_dyn process. You can get the process ID by entering the following command:

    pgrep ics_dyn

  3. At the GDB prompt, enter the following commands to name the log file and start logging:

    set logging file <filename>
    set logging on

    Replace <filename> with the name of the file that will store the output of the executed debugger commands.

  4. Enter the following command to collect a stack trace of all threads:

    thread apply all bt

  5. Enter the following command to turn off logging:

    set logging off

  6. Enter the following command to save the core dump in the /chroot/lag directory:

    gcore

    The core dump is saved as core.<pid>.

  7. Tar or zip this file and send it to Novell Support.

Packet Capture

The tcpdump utility allows you to capture network trace packets.

  1. Log in as the root user.

  2. Enter the following command:

    tcpdump -s0 -n -t -p -i 'any' -w filename.cap

  3. After the issue is reproduced, press Ctrl+C to stop the capture. Tar or zip the filename.cap file and send it to Novell Support.

7.6.6 Access Gateway Not Responding

  1. Enter the following command to change the root environment:

    chroot /chroot/lag

  2. Enter the following command to attach the ics_dyn process to the debugger:

    gdb /opt/novell/bin/ics_dyn <pid>

    Replace <pid> with the process ID of the ics_dyn process. You can get the process ID by entering the following command:

    pgrep ics_dyn

  3. At the GDB prompt, enter the following command:

    set logging file <filename>

    Replace <filename> with the name of the file that will store the output of the executed debugger commands.

  4. Enter the following command to start logging:

    set logging on

  5. Enter the following command to collect a stack trace of all threads:

    thread apply all bt full

  6. Enter the following command to turn off logging:

    set logging off

  7. Enter the following command to save the core dump in the /chroot/lag directory:

    gcore

    The core dump is saved as core.<pid>.

  8. Tar or zip this file and send it to Novell Support.
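When the Access Gateway is not responding, typing each debugger command by hand can be slow. The following is a minimal sketch that prints the commands from steps 3 through 7 above as a GDB command file; the function name gdb_hang_commands is hypothetical, and the trailing detach and quit commands are additions that release the process and exit the debugger.

```shell
#!/bin/sh
# Print a GDB command file that repeats steps 3 through 7 above.
# $1: file that receives the debugger output.
gdb_hang_commands() {
    cat <<EOF
set logging file $1
set logging on
thread apply all bt full
set logging off
gcore
detach
quit
EOF
}
```

Redirect the output to a file inside the chroot environment, then pass that file to GDB with the standard -batch and -x options in place of the interactive session in step 2, for example gdb -batch -x <commandfile> /opt/novell/bin/ics_dyn <pid>.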

7.6.7 Access Gateway Dumps Core After 10 Minutes When Non-Redirected Login Is Enabled

In a clustered Novell Access Manager deployment, if non-redirected login is enabled, the load might not be balanced equally across the Identity Servers. This can cause the Access Gateway to dump core after approximately 10 minutes.

This happens because browsers connect to the Identity Server through an L4 switch and not directly.

For example:

  • If you have four Access Gateways in a cluster, then the L4 switch in front of the Identity Server receives connections from four Access Gateways.

  • If you have configured the sticky bit for the L4 switch based on the client IP address, then all the connections from one Access Gateway go to one Identity Server. You must configure the sticky bit based on your requirement.
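The effect of source-IP stickiness can be illustrated with a small model. This is an illustration only and does not reproduce how any particular L4 switch works: the function name pick_server is hypothetical, and a checksum of the source address stands in for the switch's persistence logic.

```shell
#!/bin/sh
# Map a source IP address to one of N servers (numbered 0 to N-1).
# Illustration only: real L4 switches use their own persistence tables.
# $1: source IP address; $2: number of Identity Servers.
pick_server() {
    sum=$(printf '%s' "$1" | cksum | cut -d ' ' -f 1)
    echo $((sum % $2))
}
```

Because the result depends only on the source address, every connection from the same Access Gateway lands on the same Identity Server. With four Access Gateways, the switch sees at most four distinct source addresses, regardless of how many browsers are behind them.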

7.6.8 Linux Access Gateway Crashes When a Change Is Applied to the Server

Sometimes, after you upgrade from 3.1 SP1 to 3.1 SP2, a SLES 9-based Linux Access Gateway might crash when you apply changes to the server. This happens because of an open issue in the underlying SLES 9 operating system. To work around this issue, download and install the following patch file from the SLES 9 channel:

glibc-2.3.3-98.111.i586.rpm (patch 12527)

For more information on downloading and updating the patch, see Installing or Updating the Security Patches on the SLES 9 Linux Access Gateway Appliance in the NetIQ Access Manager 3.1 SP5 Installation Guide.