Interesting eDirectory Driver Startup Problem

By: geoffc

December 8, 2009 4:19 pm

Novell Identity Manager has many drivers available for use. Some are interesting and complex systems like PeopleSoft or SAP HR. Others are more standard, like Active Directory. Some are very generic like the JDBC driver.

Intriguingly enough, the eDirectory driver, used to connect two eDirectory trees, is actually one of the more complex drivers.

If you were starting with Identity Manager from Novell, you might think that the easiest driver, and the best one to start learning with, is the eDirectory driver. After all, eDirectory is one of Novell’s core product lines, so it must be easy. Well, in this case, you would be wrong.

The eDirectory driver is not that hard, but it is quite different from most other drivers. With other drivers there is eDirectory, where the engine is running, and a connected application (be that SAP, Active Directory, Lotus Notes/Domino, or something else). When you are working with the eDirectory driver, the connected application is another instance of eDirectory.

Thus the driver is actually split in half, with one half (the Publisher channel) residing in each of the two trees involved. This can be very confusing, because both drivers look like full drivers. It just turns out that all the work should be done on the Publisher channel in each tree. That is, all events should look like they are inbound to the Identity Vault tree.

Once you get past that bit of mental gymnastics, it is basically the same as other drivers, but you ignore the Subscriber channels for the most part.

The other difficulty is that you have to look at trace in two different places, because there are still two drivers involved.

If you are not aware of how to read DSTrace, then you should definitely stop now and read the article on the topic by Fernando Frietas of Novell Tech Support. It is the best guide to reading and understanding DSTrace in Identity Manager I have seen to date! Well done Fernando!

If the discussion of Publisher and Subscriber channels is not clear to you, then I highly recommend you read David Gersic’s very well done series on event flows in Identity Manager:

David does a truly excellent job of walking through what happens, where, when, and why. A must read for all people learning Novell Identity Manager!

I have been doing a series of articles on error messages that can occur in the various drivers I have worked with, and have done some for Active Directory:

I also did a series for the JDBC driver:

One for SAP HR (for now, more in the works):
Error Codes of the SAP HR driver for Identity Manager – Part 1

I also have one up with common eDirectory driver error codes:
Error Codes of the eDirectory Driver for Identity Manager – Part 1

This article is sort of a continuation of the error code series, except that there really seems to be no error code to talk about. In this case, I had restarted eDirectory (the ndsd process on SLES 10 SP2 Linux), and the driver would start and then immediately shut down.

This is a hard one to troubleshoot, since as you look through the trace output, you end up seeing no errors.

The trace of the driver start follows. Yes, it is a bit long, but since there is no specific error to cut out and highlight, the entire process needs to be looked at to try to find the problem.

Usually the Subscriber channel starts up first, and once it is done, that process is used to start the Publisher thread. That is why all the events at first are tagged with a DriverName ST: tag. Once the Publisher thread starts sending documents as part of the process, the tag switches to DriverName PT:

[11/19/09 08:35:45.568]:From ACME-META ST:Creating publisher.
[11/19/09 08:35:45.568]:From ACME-META ST:Loading Publisher input transformation policies.
[11/19/09 08:35:45.569]:From ACME-META ST:Reading XML attribute vnd.nds.stream://ACME-IDV/Acme/Drivers/IDM/From+ACME-META/pub-itp-EmailOnFailedPasswordSub#XmlData.
[11/19/09 08:35:45.571]:From ACME-META ST:Found DirXMLScript policy.
[11/19/09 08:35:45.573]:From ACME-META ST:Loading Publisher output transformation policies.
[11/19/09 08:35:45.573]:From ACME-META ST:Reading XML attribute vnd.nds.stream://ACME-IDV/Acme/Drivers/IDM/From+ACME-META/sub-otp-EmailOnFailedPasswordPub#XmlData.
[11/19/09 08:35:45.575]:From ACME-META ST:Found DirXMLScript policy.
[11/19/09 08:35:45.576]:From ACME-META ST:Loading Publisher schema mapping policies.
[11/19/09 08:35:45.577]:From ACME-META ST:Reading XML attribute vnd.nds.stream://ACME-IDV/Acme/Drivers/IDM/From+ACME-META/%5BAcme%7D+Special+mapping+for+SID#XmlData.
[11/19/09 08:35:45.578]:From ACME-META ST:Found schema map.
[11/19/09 08:35:45.578]:From ACME-META ST:Reading XML attribute vnd.nds.stream://ACME-IDV/Acme/Drivers/IDM/From+ACME-META/%5BAcme%7D+Special+mapping+for+SID#XmlData.
[11/19/09 08:35:45.579]:From ACME-META ST:Found schema map.
[11/19/09 08:35:45.580]:From ACME-META ST:Loading Publisher event transformation policies.
[11/19/09 08:35:45.581]:From ACME-META ST:Policy not found.
[11/19/09 08:35:45.581]:From ACME-META ST:Loading Publisher object matching policies.
[11/19/09 08:35:45.582]:From ACME-META ST:Reading XML attribute vnd.nds.stream://ACME-IDV/Acme/Drivers/IDM/From+ACME-META/Publisher/pub-mp-MatchingRule#XmlData.
[11/19/09 08:35:45.583]:From ACME-META ST:Global Configuration Value replacements made in vnd.nds.stream://ACME-IDV/Acme/Drivers/IDM/From+ACME-META/Publisher/pub-mp-MatchingRule#XmlData:
[11/19/09 08:35:45.584]:From ACME-META ST:Found DirXMLScript policy.
[11/19/09 08:35:45.584]:From ACME-META ST:Loading Publisher object creation policies.
[11/19/09 08:35:45.584]:From ACME-META ST:Reading XML attribute vnd.nds.stream://ACME-IDV/Acme/Drivers/IDM/From+ACME-META/Publisher/%5BACME%5D+Create+Rule#XmlData.
[11/19/09 08:35:45.586]:From ACME-META ST:Global Configuration Value replacements made in vnd.nds.stream://ACME-IDV/Acme/Drivers/IDM/From+ACME-META/Publisher/%5BACME%5D+Create+Rule#XmlData:
[11/19/09 08:35:45.587]:From ACME-META ST:Found DirXMLScript policy.
[11/19/09 08:35:45.587]:From ACME-META ST:Loading Publisher object placement policies.
[11/19/09 08:35:45.588]:From ACME-META ST:Reading XML attribute vnd.nds.stream://ACME-IDV/Acme/Drivers/IDM/From+ACME-META/Publisher/pub-pp-PlacementRule#XmlData.
[11/19/09 08:35:45.590]:From ACME-META ST:Global Configuration Value replacements made in vnd.nds.stream://ACME-IDV/Acme/Drivers/IDM/From+ACME-META/Publisher/pub-pp-PlacementRule#XmlData:
[11/19/09 08:35:45.590]:From ACME-META ST:Found DirXMLScript policy.
[11/19/09 08:35:45.592]:From ACME-META ST:Loading Publisher command transformation policies.
[11/19/09 08:35:45.592]:From ACME-META ST:Reading XML attribute vnd.nds.stream://ACME-IDV/Acme/Drivers/IDM/From+ACME-META/Publisher/pub-ctp-CheckPasswordGCV#XmlData.
[11/19/09 08:35:45.594]:From ACME-META ST:Found DirXMLScript policy.
[11/19/09 08:35:45.595]:From ACME-META ST:Reading XML attribute vnd.nds.stream://ACME-IDV/Acme/Drivers/IDM/From+ACME-META/Publisher/pub-ctp-PublishDistributionPassword#XmlData.
[11/19/09 08:35:45.596]:From ACME-META ST:Found DirXMLScript policy.
[11/19/09 08:35:45.597]:From ACME-META ST:Reading XML attribute vnd.nds.stream://ACME-IDV/Acme/Drivers/IDM/From+ACME-META/Publisher/pub-ctp-PublishNDSPassword#XmlData.
[11/19/09 08:35:45.598]:From ACME-META ST:Found DirXMLScript policy.
[11/19/09 08:35:45.599]:From ACME-META ST:Reading XML attribute vnd.nds.stream://ACME-IDV/Acme/Drivers/IDM/From+ACME-META/Publisher/pub-ctp-AddPasswordPayload#XmlData.
[11/19/09 08:35:45.600]:From ACME-META ST:Found DirXMLScript policy.
[11/19/09 08:35:45.601]:From ACME-META ST:Reading XML attribute vnd.nds.stream://ACME-IDV/Acme/Drivers/IDM/From+ACME-META/Publisher/pub-ctp-PasswordExpirationTime#XmlData.
[11/19/09 08:35:45.603]:From ACME-META ST:Found DirXMLScript policy.
[11/19/09 08:35:45.603]:From ACME-META ST:Reading XML attribute vnd.nds.stream://ACME-IDV/Acme/Drivers/IDM/From+ACME-META/Publisher/%5BAcme%5D+Pub-Command#XmlData.
[11/19/09 08:35:45.605]:From ACME-META ST:Found DirXMLScript policy.

You can see that it has successfully loaded all the policy objects and done some Global Configuration Value (GCV) text string replacements for the Publisher channel objects. When you use the format ~GCVName~, then as the driver loads, the token is replaced with the value stored in the GCV named GCVName. (If it is set on the driver set, then that value is used. If the same GCV is also defined on the driver, then it overrides the value set on the driver set. Thus you can set a value for all drivers in one place and handle only the exceptions where needed.) A funny error can occur here: if you use the literal string ~GCVName~ anywhere, even in the comments of a driver, and GCVName does not exist, the driver will not start. I wrote about that here: Discussing GCV’s in Comments field in Identity Manager

So far, so good. If you had referenced a GCV that did not exist, the driver would have stopped with a fatal error at this point.
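
The replacement and override behavior described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the engine's actual code: the function name resolve_gcvs, the MissingGCVError exception, and the GCV names and values are all made up for the example.

```python
import re


class MissingGCVError(Exception):
    """Raised when a ~name~ token has no matching GCV (hypothetical error type)."""


def resolve_gcvs(text, driver_set_gcvs, driver_gcvs):
    """Replace ~name~ tokens; values on the driver override those on the driver set."""
    merged = {**driver_set_gcvs, **driver_gcvs}

    def substitute(match):
        name = match.group(1)
        if name not in merged:
            # Mirrors the behavior described above: an unknown GCV is fatal.
            raise MissingGCVError(f"GCV '{name}' is not defined")
        return merged[name]

    return re.sub(r"~([^~\s]+)~", substitute, text)


# Hypothetical GCV names and values, for illustration only.
driver_set = {"UserContainer": "Acme.Users", "MailDomain": "acme.example"}
driver = {"UserContainer": "Acme.Staff"}

policy_text = "Place new users in ~UserContainer~ with mail in ~MailDomain~"
resolved = resolve_gcvs(policy_text, driver_set, driver)
print(resolved)
```

Because the scan is purely textual, a token inside a comment is treated the same as one inside a rule, which is consistent with the comment-field problem mentioned above.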

[11/19/09 08:35:45.606]:From ACME-META ST:Creating publisher thread.
[11/19/09 08:35:45.606]:From ACME-META ST:Publisher thread created.

OK, so the Publisher thread looks like it started fine. The two channels are handled as different threads, and each starts separately. Once they start sending documents there is a delay waiting for responses, so their trace lines are often close enough in time that they appear interleaved, which makes reading the trace hard at times.
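
When the interleaving gets hard to follow, one option is to split a saved trace by thread tag. The following is a minimal sketch, assuming the line layout seen in this trace (timestamp in brackets, driver name, then ST: or PT:). The function name split_by_thread is hypothetical, and multi-line XML bodies are simply skipped.

```python
import re
from collections import defaultdict

# Matches lines such as:
# [11/19/09 08:35:45.606]:From ACME-META ST:Publisher thread created.
LINE_RE = re.compile(
    r"^\[(?P<stamp>[^\]]+)\]:(?P<driver>.+?) (?P<thread>ST|PT):(?P<msg>.*)$"
)


def split_by_thread(lines):
    """Group trace messages by thread tag (ST or PT), keeping their timestamps."""
    grouped = defaultdict(list)
    for line in lines:
        match = LINE_RE.match(line)
        if match:  # lines without a tag (for example XML bodies) are ignored here
            grouped[match["thread"]].append((match["stamp"], match["msg"]))
    return dict(grouped)


sample = [
    "[11/19/09 08:35:45.606]:From ACME-META ST:Publisher thread created.",
    "[11/19/09 08:35:45.632]:From ACME-META PT:Initializing publisher shim.",
    "[11/19/09 08:35:45.630]:From ACME-META ST:Leaving event loop.",
]

by_thread = split_by_thread(sample)
for tag, entries in by_thread.items():
    for stamp, msg in entries:
        print(tag, stamp, msg)
```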

[11/19/09 08:35:45.610]:From ACME-META ST:Starting event loop.
[11/19/09 08:35:45.612]:From ACME-META ST:Received state change event.
[11/19/09 08:35:45.613]:From ACME-META ST:Transitioned from state '%+C%14CStopped%-C' to state '%+C%14CStarting%-C'.
[11/19/09 08:35:45.616]:From ACME-META ST:Successfully processed state change event.
[11/19/09 08:35:45.617]:From ACME-META ST:Submitting identification query to subscriber shim:
[11/19/09 08:35:45.617]:From ACME-META ST:
<nds dtdversion="3.5" ndsversion="8.x">
  <source>
    <product version="3.6.1.4427">DirXML</product>
    <contact>Novell, Inc.</contact>
  </source>
  <input>
    <query event-id="query-driver-ident" scope="entry">
      <search-class class-name="__driver_identification_class__"/>
      <read-attr/>
    </query>
  </input>
</nds>
[11/19/09 08:35:45.619]:From ACME-META ST:SubscriptionShim.execute() returned:
[11/19/09 08:35:45.620]:From ACME-META ST:
<nds dtdversion="3.5">
  <source>
    <product instance="From ACME-META" version="3.6.0.4294">DirXML Driver for eDirectory</product>
    <contact>Novell, Inc.</contact>
  </source>
  <output>
    <instance class-name="__driver_identification_class__">
      <attr attr-name="driver-id">
        <value type="string">EDIR</value>
      </attr>
      <attr attr-name="driver-version">
        <value type="string">3.6.0.4294</value>
      </attr>
      <attr attr-name="min-activation-version">
        <value type="int">4</value>
      </attr>
      <attr attr-name="query-ex-supported">
        <value type="state">true</value>
      </attr>
    </instance>
  </output>
</nds>

This so far is great news. The Subscriber channel sends its startup query, and the shim correctly returns the driver ID (EDIR), the version (3.6.0.4294), and the minimum activation version (4). Each version of Identity Manager and the various drivers needs an activation credential. In order to allow some upgrades to proceed without needing a new license, and to share driver shims across some Identity Manager versions (like the Active Directory driver shim, addriver.dll, which can be used with Identity Manager 3.5 and 3.6 with the same activation credential), the shim specifies the minimum activation credential it can use.

Query-ex is a pretty cool feature that, if supported, makes a big performance difference. It allows you to page query results. For example, if you query for all objects in the tree and wish to work with the results, you would normally need to store all that data in memory, and it can eat up all the Java heap space. With query-ex, a query token that specifies a maximum number of results (say 50) will load the results in pages of fifty, and you can loop through them in a For Each structure to process them. This way you only need enough Java heap to store fifty nodes at a time.
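
The paging idea itself is independent of Identity Manager and can be shown with a small Python sketch. Nothing here is the DirXML API: query_page is a stand-in that slices an in-memory list, and the token is just an offset. The point is only that the caller holds one page at a time.

```python
from typing import Iterator, List, Optional, Tuple


def query_page(
    all_objects: List[str], token: Optional[int], max_results: int
) -> Tuple[List[str], Optional[int]]:
    """Stand-in for a paged query: return one page plus a token, or None when done."""
    start = token or 0
    end = start + max_results
    page = all_objects[start:end]
    next_token = end if end < len(all_objects) else None
    return page, next_token


def iterate_in_pages(all_objects: List[str], max_results: int = 50) -> Iterator[List[str]]:
    """Yield pages until the query reports that there is nothing left."""
    token = None
    while True:
        page, token = query_page(all_objects, token, max_results)
        if page:
            yield page
        if token is None:
            break


objects = [f"user{i}" for i in range(120)]  # pretend directory contents
page_sizes = [len(page) for page in iterate_in_pages(objects, 50)]
print(page_sizes)
```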

You can read more about query-ex and the types of tasks it is useful for, in this series of articles:

This and some other query events that are part of the driver startup process are the kind of thing that an indiscriminate Veto token in a rule will cancel, causing you all sorts of trouble. Some examples of Veto events that cause this sort of issue are in these articles:

But in this case, we got past it, and here comes the only hint that we have of the coming troubles.

[11/19/09 08:35:45.628]:From ACME-META ST:Received state change event.
[11/19/09 08:35:45.629]:From ACME-META ST:Transitioned from state '%+C%14CStarting%-C' to state '%+C%14CShutdown Pending%-C'.
[11/19/09 08:35:45.629]:From ACME-META ST:Successfully processed state change event.

The driver started, everything looked good, and yet it gets shut down right at the end.
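
Since the state change is the only hint in a long trace, it can help to pull out just the transitions. This is a small sketch, assuming the "Transitioned from state" wording shown above and treating the %+C, %14C, and %-C sequences as display markup to strip. The function names are hypothetical.

```python
import re

# Strip sequences such as %+C, %14C and %-C that wrap the state names in the trace.
MARKUP_RE = re.compile(r"%[+-]C|%\d+C")
TRANSITION_RE = re.compile(r"Transitioned from state '([^']*)' to state '([^']*)'")


def find_transitions(lines):
    """Return (old_state, new_state) pairs in the order they appear."""
    transitions = []
    for line in lines:
        match = TRANSITION_RE.search(MARKUP_RE.sub("", line))
        if match:
            transitions.append((match.group(1), match.group(2)))
    return transitions


def shut_down_during_start(transitions):
    """True when the driver went straight from Starting to Shutdown Pending."""
    return ("Starting", "Shutdown Pending") in transitions


sample = [
    "[11/19/09 08:35:45.613]:From ACME-META ST:Transitioned from state '%+C%14CStopped%-C' to state '%+C%14CStarting%-C'.",
    "[11/19/09 08:35:45.616]:From ACME-META ST:Successfully processed state change event.",
    "[11/19/09 08:35:45.629]:From ACME-META ST:Transitioned from state '%+C%14CStarting%-C' to state '%+C%14CShutdown Pending%-C'.",
]

found = find_transitions(sample)
print(found)
print(shut_down_during_start(found))
```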

[11/19/09 08:35:45.630]:From ACME-META ST:Leaving event loop.
[11/19/09 08:35:45.630]:From ACME-META ST:Waiting for driver to fully initialize before shutting down...

You can see that the Subscriber shim is waiting for the Publisher thread to finish starting up, before shutting it down.

[11/19/09 08:35:45.632]:From ACME-META PT:Initializing publisher shim.
[11/19/09 08:35:45.633]:From ACME-META PT:
<nds dtdversion="3.5" ndsversion="8.x">
  <source>
    <product version="3.6.1.4427">DirXML</product>
    <contact>Novell, Inc.</contact>
  </source>
  <input>
    <init-params src-dn="\ACME-IDV\Acme\Drivers\IDM\From ACME-META">
      <authentication-info>
        <server>10.123.123.123:8198</server>
        <user>From ACME-META</user>
      </authentication-info>
      <driver-filter>
        <allow-class class-name="Group">
          <allow-attr attr-name="accessCardNumber"/>
          <allow-attr attr-name="ACL"/>
          <allow-attr attr-name="assistant"/>
          <allow-attr attr-name="assistantPhone"/>
          <allow-attr attr-name="businessCategory"/>
          <allow-attr attr-name="city"/>
          <allow-attr attr-name="CN"/>
          <allow-attr attr-name="co"/>
          <allow-attr attr-name="company"/>
          <allow-attr attr-name="costCenter"/>
          <allow-attr attr-name="costCenterDescription"/>
          <allow-attr attr-name="departmentNumber"/>
          <allow-attr attr-name="Description"/>
          <allow-attr attr-name="destinationIndicator"/>
          <allow-attr attr-name="directReports"/>
          <allow-attr attr-name="EMail Address"/>
          <allow-attr attr-name="employeeStatus"/>
          <allow-attr attr-name="employeeType"/>
          <allow-attr attr-name="Equivalent To Me"/>
          <allow-attr attr-name="Facsimile Telephone Number"/>
          <allow-attr attr-name="Full Name"/>
          <allow-attr attr-name="gecos"/>
          <allow-attr attr-name="Generational Qualifier"/>
          <allow-attr attr-name="gidNumber"/>
          <allow-attr attr-name="Given Name"/>
          <allow-attr attr-name="Group Membership"/>
          <allow-attr attr-name="Higher Privileges"/>
          <allow-attr attr-name="homeDirectory"/>
          <allow-attr attr-name="Initials"/>
          <allow-attr attr-name="instantMessagingID"/>
          <allow-attr attr-name="internationaliSDNNumber"/>
          <allow-attr attr-name="Internet EMail Address"/>
          <allow-attr attr-name="jackNumber"/>
          <allow-attr attr-name="jobCode"/>
          <allow-attr attr-name="L"/>
          <allow-attr attr-name="Language"/>
          <allow-attr attr-name="loginShell"/>
          <allow-attr attr-name="Mailbox ID"/>
          <allow-attr attr-name="Mailbox Location"/>
          <allow-attr attr-name="mailstop"/>
          <allow-attr attr-name="manager"/>
          <allow-attr attr-name="managerWorkforceID"/>
          <allow-attr attr-name="Member"/>
          <allow-attr attr-name="mobile"/>
          <allow-attr attr-name="NSCP:employeeNumber"/>
          <allow-attr attr-name="O"/>
          <allow-attr attr-name="otherPhoneNumber"/>
          <allow-attr attr-name="OU"/>
          <allow-attr attr-name="pager"/>
          <allow-attr attr-name="personalTitle"/>
          <allow-attr attr-name="photo"/>
          <allow-attr attr-name="Physical Delivery Office Name"/>
          <allow-attr attr-name="platformSetName"/>
          <allow-attr attr-name="Postal Address"/>
          <allow-attr attr-name="Postal Code"/>
          <allow-attr attr-name="Postal Office Box"/>
          <allow-attr attr-name="preferredDeliveryMethod"/>
          <allow-attr attr-name="preferredName"/>
          <allow-attr attr-name="Private Key" is-sensitive="true"/>
          <allow-attr attr-name="Public Key"/>
          <allow-attr attr-name="registeredAddress"/>
          <allow-attr attr-name="roomNumber"/>
          <allow-attr attr-name="S"/>
          <allow-attr attr-name="SA"/>
          <allow-attr attr-name="DirXML-ADAliasName"/>
          <allow-attr attr-name="Security Equals"/>
          <allow-attr attr-name="Security Flags"/>
          <allow-attr attr-name="See Also"/>
          <allow-attr attr-name="siteLocation"/>
          <allow-attr attr-name="Surname"/>
          <allow-attr attr-name="Telephone Number"/>
          <allow-attr attr-name="teletexTerminalIdentifier"/>
          <allow-attr attr-name="telexNumber"/>
          <allow-attr attr-name="Timezone"/>
          <allow-attr attr-name="Title"/>
          <allow-attr attr-name="tollFreePhoneNumber"/>
          <allow-attr attr-name="UID"/>
          <allow-attr attr-name="uidNumber"/>
          <allow-attr attr-name="uniqueID"/>
          <allow-attr attr-name="vehicleInformation"/>
          <allow-attr attr-name="DirXML-ADContext"/>
          <allow-attr attr-name="workforceID"/>
          <allow-attr attr-name="x121Address"/>
          <allow-attr attr-name="x500UniqueIdentifier"/>
        </allow-class>
        <allow-class class-name="User">
          <allow-attr attr-name="accessCardNumber"/>
          <allow-attr attr-name="assistant"/>
          <allow-attr attr-name="assistantPhone"/>
          <allow-attr attr-name="businessCategory"/>
          <allow-attr attr-name="city"/>
          <allow-attr attr-name="CN"/>
          <allow-attr attr-name="co"/>
          <allow-attr attr-name="company"/>
          <allow-attr attr-name="costCenter"/>
          <allow-attr attr-name="costCenterDescription"/>
          <allow-attr attr-name="departmentNumber"/>
          <allow-attr attr-name="Description"/>
          <allow-attr attr-name="destinationIndicator"/>
          <allow-attr attr-name="directReports"/>
          <allow-attr attr-name="EMail Address"/>
          <allow-attr attr-name="employeeStatus"/>
          <allow-attr attr-name="employeeType"/>
          <allow-attr attr-name="Equivalent To Me"/>
          <allow-attr attr-name="Facsimile Telephone Number"/>
          <allow-attr attr-name="Full Name"/>
          <allow-attr attr-name="gecos"/>
          <allow-attr attr-name="Generational Qualifier"/>
          <allow-attr attr-name="gidNumber"/>
          <allow-attr attr-name="Given Name"/>
          <allow-attr attr-name="Group Membership"/>
          <allow-attr attr-name="Higher Privileges"/>
          <allow-attr attr-name="Home Directory"/>
          <allow-attr attr-name="homeDirectory"/>
          <allow-attr attr-name="Initials"/>
          <allow-attr attr-name="instantMessagingID"/>
          <allow-attr attr-name="internationaliSDNNumber"/>
          <allow-attr attr-name="Internet EMail Address"/>
          <allow-attr attr-name="jackNumber"/>
          <allow-attr attr-name="jobCode"/>
          <allow-attr attr-name="L"/>
          <allow-attr attr-name="Language"/>
          <allow-attr attr-name="Last Login Time"/>
          <allow-attr attr-name="Locked By Intruder"/>
          <allow-attr attr-name="Login Disabled"/>
          <allow-attr attr-name="Login Grace Limit"/>
          <allow-attr attr-name="Login Grace Remaining"/>
          <allow-attr attr-name="Login Time"/>
          <allow-attr attr-name="loginShell"/>
          <allow-attr attr-name="Mailbox ID"/>
          <allow-attr attr-name="Mailbox Location"/>
          <allow-attr attr-name="mailstop"/>
          <allow-attr attr-name="manager"/>
          <allow-attr attr-name="managerWorkforceID"/>
          <allow-attr attr-name="memberUid"/>
          <allow-attr attr-name="mobile"/>
          <allow-attr attr-name="NGW: GroupWise ID"/>
          <allow-attr attr-name="NSCP:employeeNumber"/>
          <allow-attr attr-name="O"/>
          <allow-attr attr-name="otherPhoneNumber"/>
          <allow-attr attr-name="OU"/>
          <allow-attr attr-name="pager"/>
          <allow-attr attr-name="Password Allow Change"/>
          <allow-attr attr-name="Password Expiration Interval"/>
          <allow-attr attr-name="Password Expiration Time"/>
          <allow-attr attr-name="Password Minimum Length"/>
          <allow-attr attr-name="Password Required"/>
          <allow-attr attr-name="Password Unique Required"/>
          <allow-attr attr-name="Passwords Used" is-sensitive="true"/>
          <allow-attr attr-name="personalTitle"/>
          <allow-attr attr-name="photo"/>
          <allow-attr attr-name="Physical Delivery Office Name"/>
          <allow-attr attr-name="Postal Address"/>
          <allow-attr attr-name="Postal Code"/>
          <allow-attr attr-name="Postal Office Box"/>
          <allow-attr attr-name="preferredDeliveryMethod"/>
          <allow-attr attr-name="preferredName"/>
          <allow-attr attr-name="registeredAddress"/>
          <allow-attr attr-name="roomNumber"/>
          <allow-attr attr-name="S"/>
          <allow-attr attr-name="SA"/>
          <allow-attr attr-name="DirXML-ADAliasName"/>
          <allow-attr attr-name="Security Equals"/>
          <allow-attr attr-name="Security Flags"/>
          <allow-attr attr-name="See Also"/>
          <allow-attr attr-name="objectSID"/>
          <allow-attr attr-name="siteLocation"/>
          <allow-attr attr-name="Surname"/>
          <allow-attr attr-name="Telephone Number"/>
          <allow-attr attr-name="teletexTerminalIdentifier"/>
          <allow-attr attr-name="telexNumber"/>
          <allow-attr attr-name="Timezone"/>
          <allow-attr attr-name="Title"/>
          <allow-attr attr-name="tollFreePhoneNumber"/>
          <allow-attr attr-name="UID"/>
          <allow-attr attr-name="uidNumber"/>
          <allow-attr attr-name="uniqueID"/>
          <allow-attr attr-name="vehicleInformation"/>
          <allow-attr attr-name="DirectDial"/>
          <allow-attr attr-name="DirXML-ADContext"/>
          <allow-attr attr-name="workforceID"/>
          <allow-attr attr-name="x121Address"/>
          <allow-attr attr-name="x500UniqueIdentifier"/>
        </allow-class>
      </driver-filter>
      <publisher-options>
        <pub-heartbeat-interval display-name="Publisher heartbeat interval">1</pub-heartbeat-interval>
      </publisher-options>
    </init-params>
  </input>
</nds>
[11/19/09 08:35:45.690]:From ACME-META PT:: Connection parameters: port = 8198 KMO = 'From ACME-META'

You can see the Publisher channel connect and begin to start up. All is looking good, but alas, we cannot get any further. Since the Subscriber channel has already decided to shut down, it is all for naught.

You can see that it is connecting on a specific port and key material object (KMO) which should be what is set in the configuration of the driver, so you get a chance to confirm all is well here. Well that looks good too.
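
If you want to automate that confirmation, the connection line can be checked against the values you expect. This is a sketch only: the regular expression assumes the exact wording of the trace line above, and the expected values would come from your own driver configuration.

```python
import re

CONN_RE = re.compile(r"Connection parameters: port = (\d+) KMO = '([^']*)'")


def check_connection(trace_line, expected_port, expected_kmo):
    """Return a list of mismatches (empty when the line matches expectations)."""
    match = CONN_RE.search(trace_line)
    if not match:
        return ["no connection parameters found in line"]
    problems = []
    port, kmo = int(match.group(1)), match.group(2)
    if port != expected_port:
        problems.append(f"port is {port}, expected {expected_port}")
    if kmo != expected_kmo:
        problems.append(f"KMO is '{kmo}', expected '{expected_kmo}'")
    return problems


line = (
    "[11/19/09 08:35:45.690]:From ACME-META PT:: Connection parameters: "
    "port = 8198 KMO = 'From ACME-META'"
)
print(check_connection(line, 8198, "From ACME-META"))  # prints []
```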

[11/19/09 08:35:45.691]:From ACME-META PT:PublicationShim.init() returned:
[11/19/09 08:35:45.691]:From ACME-META PT:
<nds dtdversion="3.5">
  <source>
    <product instance="From ACME-META" version="3.6.0.4294">DirXML Driver for eDirectory</product>
    <contact>Novell, Inc.</contact>
  </source>
  <output>
    <status level="success"/>
  </output>
</nds>

Even though the Publisher thread looks like it has started, the shutdown in the Subscriber thread is going to cancel this success.

So what went wrong? Well, we do not get a lot of hints. There is no error code to research and troubleshoot. That makes this a tough one.

The good news is, we have at least one minor hint. The Publisher thread seems to start ok, which means it is probably not an issue on the other side of the driver. It is probably a problem on this side, the Subscriber channel side.

For no real reason I can explain, I tried restarting the driver one more time via dxcmd, the cross-platform Java command line application (well, it has menus as well, so it is not exactly command line).

You can read more about dxcmd in these articles:

Anyway, when I looked at the driver’s status in dxcmd, it reported it as "Overflow cache".

Now this starts to make sense, and starts to be something we can troubleshoot. What had also happened around the same time is that the disk ran out of space, since we have DSTrace running for a bunch of drivers and really need to allocate more disk space for it. Our plan is to stand up a spare server in a VM, with a bunch more disk allocated to it, and mount this server’s space into all our Identity Manager servers at the directory we are using for DSTrace logging.

This way, we could look at all the driver logs in one place, even though there are four or more engine servers involved running drivers.

But until we get there, we occasionally run into an issue where we fill the volume. Not the greatest situation, but tolerable. Well we missed it that time, and the disk filled, so we had to shut down ndsd and clear some space.
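
Until the extra disk arrives, a simple scheduled check on the trace volume would at least give some warning. The sketch below uses Python's standard shutil.disk_usage. The 10% threshold is an arbitrary choice, and TRACE_DIR is a placeholder (the temp directory, so the example runs anywhere) that you would point at your own DSTrace log directory.

```python
import shutil
import tempfile


def has_enough_space(free_bytes, total_bytes, min_free_fraction=0.10):
    """Return True when the free share of the volume is at or above the threshold."""
    return total_bytes > 0 and (free_bytes / total_bytes) >= min_free_fraction


def check_trace_volume(path, min_free_fraction=0.10):
    """Read real disk usage for the volume holding `path` and apply the threshold."""
    usage = shutil.disk_usage(path)
    return has_enough_space(usage.free, usage.total, min_free_fraction)


# Placeholder: replace with the directory your DSTrace files are written to.
TRACE_DIR = tempfile.gettempdir()

if not check_trace_volume(TRACE_DIR):
    print(f"WARNING: less than 10% free on the volume holding {TRACE_DIR}")
```

Run from cron every few minutes, something like this would have flagged the problem before ndsd had to be shut down to clear space.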

It looks like the TAO file, which holds the cached events queued for each driver to send to the application (does that sound familiar? It should, since it describes the Subscriber channel’s functionality), somehow got corrupted as the disk ran out of space. That actually makes a lot of sense, since every event that occurs in eDirectory and meets the requirements of the filter gets written to disk in the TAO file. If the disk ran out of space, it makes sense that a TAO file might get corrupted.

The fix was pretty easy after all that troubleshooting, since I did not hugely care about any missed events but did want to avoid a full resync of the driver. Basically, disable the driver, which clears the current cache file (that is, throws away any cached events), and then when you re-enable it, be sure to select the tick box "Do not resync on driver restart".

The logic behind the resync is that the two systems have clearly fallen out of sync while a driver is disabled. This is because events must have been missed, since while the driver is disabled, no events are being cached, and events are always happening. Usually we would prefer to be perfectly in sync so the default behavior of the driver is to force a resync after a driver returns from a disabled state.

However, when you have thousands of accounts, a resync can take hours to process, since every object in both systems needs to be read and compared, and the differences sent to each system. As you can imagine, this takes measurable time.

In this case, once I cleared the cache file, via the Disable driver approach, the driver started perfectly fine and life went on. I thought this was a good troubleshooting exercise to try and work through, since the only real clue was pretty subtle and buried in trace. Hope this helps someone who finds themselves in a similarly inexplicable troubleshooting incident.

Categories: Identity Manager, Technical Solutions

Disclaimer: As with everything else at NetIQ Cool Solutions, this content is definitely not supported by NetIQ, so Customer Support will not be able to help you if it has any adverse effect on your environment.  It just worked for at least one person, and perhaps it will be useful for you too.  Be sure to test in a non-production environment.
