Today I was once again amazed by the power of XPath while working on an SAP HR driver. The SAP system publishes data as IDocs, which are basically text files in which every line holds one record, with the value fields defined by their position in the line.

Funnily enough, every line can have a different structure, which only works because the first part of each line tells you how to decode the rest. I think I’ve read that even SAP has started using XML for data exchange nowadays; nevertheless, the SAP HR driver still has to deal with traditional IDocs.

I was looking for a way to obtain an employee’s exit date, which is not directly available as an IDoc field but has to be calculated as the day before the beginning of the validity period of the action record of type “10” with the highest validity end date. Know what I mean?

Let’s look at the first lines of an example IDoc for workforceID 1234:

EDI_DC40                      1000000000000123456700 3012  HRMD_A07...
E2PLOGI001                    10000000000001234560716140000000201P 00001234 U
E2PITYP001                    10000000000001234560716150716140301P 000012340000    1800010199991231
E2P0000001                    100000000000012345607161607161504000012340000       200112312000011700020000125SOMEONE                     01  031
E2P0000001                    100000000000012345607161707161504000012340000       200211302002010100020020121SOMEONE                     02  031
E2P0000001                    100000000000012345607161807161504000012340000       200312312002120100020021122SOMEONE                     02  031
E2P0000001                    100000000000012345607161707161504000012340000       200411302004010100020020121SOMEONE                     10  000
E2P0000001                    100000000000012345607161807161504000012340000       200412312004120100020021122SOMEONE                     12  031
E2P0000001                    100000000000012345607161907161504000012340000       200712312005010100020040719SOMEONE                     02  031
E2P0000001                    100000000000012345607162007161504000012340000       999912312008010100020080123SOMEONE                     1017031

The lines starting with “E2P0000001” are all action records, and the last line here is the record of the relevant exit action. The first two character blocks in that line are mostly header information; the third one is a concatenation of validity end date, validity begin date, sequence number, modification date and modifier’s name; and the first two characters of the last block give the action type “10”. (If you want your brain twisted a bit, take a look at Geoffrey’s excellent article about IDoc structures: Decoding iDOCs with the IDM SAP Driver.)
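If you want to check the decoding outside IDM, here is a small Python sketch that splits an action-record line into the fields described above. It rests on an assumption that only holds for this sample: the blocks are separated by blanks. Real IDocs are fixed-width (and names may contain blanks), so a proper decoder would slice by field offsets instead. `ActionRecord` and `parse_action_line` are made-up names for illustration.

```python
from typing import NamedTuple


class ActionRecord(NamedTuple):
    end: str       # validity end date, yyyyMMdd
    begin: str     # validity begin date, yyyyMMdd
    seqnr: str     # sequence number
    modified: str  # modification date, yyyyMMdd
    modifier: str  # modifier's name
    action: str    # action type (MASSN)


def parse_action_line(line: str) -> ActionRecord:
    # ASSUMPTION: blank-separated blocks, as in the sample IDoc only
    blocks = line.split()
    if blocks[0] != "E2P0000001":
        raise ValueError("not an action record: " + blocks[0])
    third = blocks[2]  # end + begin + seqnr + modification date + name
    return ActionRecord(
        end=third[0:8],
        begin=third[8:16],
        seqnr=third[16:19],
        modified=third[19:27],
        modifier=third[27:],
        action=blocks[3][:2],  # first two characters of the following block
    )


line = ("E2P0000001                    100000000000012345607162007161504000012340000"
        "       999912312008010100020080123SOMEONE                     1017031")
rec = parse_action_line(line)
print(rec.action, rec.begin + "-" + rec.end)  # prints: 10 20080101-99991231
```

The begin/end pair printed here is exactly what shows up as @timestamp in the driver trace.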

When I map the driver’s action type attribute (MASSN at position 74 with field length 2 in InfoType 0000, for the initiated 😉) to our custom string attribute auxEmployeeExitDate and add it to the publisher filter, it shows up in a trace like this:

<modify-attr attr-name="P0000:MASSN:none:74:2">
	<remove-all-values/>
	<add-value>
		<value seqnr="000" timestamp="19990117-20001231" type="string">01</value>
		<value seqnr="000" timestamp="20020101-20021130" type="string">02</value>
		<value seqnr="000" timestamp="20021201-20031231" type="string">02</value>
		<value seqnr="000" timestamp="20040101-20041130" type="string">10</value>
		<value seqnr="000" timestamp="20041201-20041231" type="string">12</value>
		<value seqnr="000" timestamp="20050101-20071231" type="string">02</value>
		<value seqnr="000" timestamp="20080101-99991231" type="string">10</value>
	</add-value>
</modify-attr>

Again: the exit date I am looking for is the day before the beginning of the validity period (the part before the hyphen in @timestamp) of the value “10”. If there is more than one such value (in case someone got hired and fired multiple times), I want the one with the highest validity end date (the part after the hyphen in @timestamp). In the example above, the exit date would therefore be Dec. 31st, 2007, right?
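To make the rule concrete, here is a minimal Python sketch (plain Python, not IDM code) that applies it to the timestamp/value pairs from the trace above. The function name `exit_date` is hypothetical, and the sketch assumes every timestamp has the `yyyyMMdd-yyyyMMdd` form seen in the trace.

```python
from datetime import datetime, timedelta


def exit_date(values):
    """values: iterable of (timestamp, action), timestamp 'yyyyMMdd-yyyyMMdd'."""
    exits = [ts for ts, action in values if action == "10"]
    if not exits:
        return None
    # Equal-length yyyyMMdd strings sort like the dates they encode
    latest = max(exits, key=lambda ts: ts.split("-")[1])
    begin = datetime.strptime(latest.split("-")[0], "%Y%m%d")
    return (begin - timedelta(days=1)).strftime("%Y%m%d")


trace = [
    ("19990117-20001231", "01"),
    ("20020101-20021130", "02"),
    ("20021201-20031231", "02"),
    ("20040101-20041130", "10"),
    ("20041201-20041231", "12"),
    ("20050101-20071231", "02"),
    ("20080101-99991231", "10"),
]
print(exit_date(trace))  # prints: 20071231
```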

Finding the values with the correct action code in a first step is easy:

<token-xpath expression='modify-attr[@attr-name="P0000:MASSN:none:74:2"]//value[text()="10"]'/>
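For comparison, here is the same first step with Python’s standard ElementTree module, whose limited XPath subset supports `[@name='value']` and `[.='text']` predicates (the latter since Python 3.7) but no sibling axes. The trace fragment is wrapped in a bare `<modify>` element purely for illustration.

```python
import xml.etree.ElementTree as ET

# The trace fragment from above, wrapped in a bare <modify> element
op = ET.fromstring("""<modify>
  <modify-attr attr-name="P0000:MASSN:none:74:2">
    <remove-all-values/>
    <add-value>
      <value seqnr="000" timestamp="19990117-20001231" type="string">01</value>
      <value seqnr="000" timestamp="20020101-20021130" type="string">02</value>
      <value seqnr="000" timestamp="20021201-20031231" type="string">02</value>
      <value seqnr="000" timestamp="20040101-20041130" type="string">10</value>
      <value seqnr="000" timestamp="20041201-20041231" type="string">12</value>
      <value seqnr="000" timestamp="20050101-20071231" type="string">02</value>
      <value seqnr="000" timestamp="20080101-99991231" type="string">10</value>
    </add-value>
  </modify-attr>
</modify>""")

# Values of that attribute whose text is "10", like the token-xpath above
path = ".//modify-attr[@attr-name='P0000:MASSN:none:74:2']//value[.='10']"
exits = op.findall(path)
print([v.get("timestamp") for v in exits])
# prints: ['20040101-20041130', '20080101-99991231']
```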

But how do I select the value with the highest validity end date? My first thought was some complicated for-each looping with lots of XPath and variables holding temporary values, or even writing an ECMAScript extension function. But after a lot of trial and error and re-reading the XPath 1.0 specification, I found that it’s basically a one-liner with the help of the preceding-sibling and following-sibling axes.

As an exercise, I first tried to grab the highest value of the attribute itself (leaving the additional XPath needed to get at the date stored in @timestamp for later) and found it could be done like this:

<token-xpath expression='modify-attr[@attr-name="P0000:MASSN:none:74:2"]//value[not(. &lt; preceding-sibling::value or . &lt; following-sibling::value)][1]'/>

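I wouldn’t have thought that emulating a max() function (which XPath 1.0 doesn’t have) would be so easy after all 😉 The trick: when a node is compared with a node-set, the comparison is true if it holds for at least one node in the set, and < or > converts both string-values to numbers. So the predicate keeps only the values that are not less than any sibling, and [1] picks the first one if several tie. Similarly, the minimum value can be found: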

<token-xpath expression='modify-attr[@attr-name="P0000:MASSN:none:74:2"]//value[not(. > preceding-sibling::value or . > following-sibling::value)][1]'/>
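A rough Python model of these two predicates, following the XPath 1.0 rule for comparing a node with a node-set, may help to see why they work. `select_extreme` and `xpath_number` are hypothetical helpers, and `xpath_number` is only a simplified stand-in for XPath’s number() conversion.

```python
def xpath_number(s):
    # Simplified number(): non-numeric strings become NaN, and every
    # comparison involving NaN is false, as in XPath
    try:
        return float(s.strip())
    except ValueError:
        return float("nan")


def select_extreme(texts, beaten):
    """Index of the first value that no sibling 'beats' (models the [1])."""
    for i, t in enumerate(texts):
        siblings = texts[:i] + texts[i + 1:]  # preceding + following siblings
        # Node vs node-set: true if it holds for at least one sibling
        if not any(beaten(xpath_number(t), xpath_number(s)) for s in siblings):
            return i
    return None


values = ["01", "02", "02", "10", "12", "02", "10"]
i_max = select_extreme(values, lambda a, b: a < b)  # not(. < sibling)
i_min = select_extreme(values, lambda a, b: a > b)  # not(. > sibling)
print(values[i_max], values[i_min])  # prints: 12 01
```

The model checks every pair of siblings, so its cost grows quadratically with the number of values, just like the XPath predicate it mirrors.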

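Now, how do we modify this to compare the validity end dates instead of the attribute values? We need to compare substrings of the timestamp XML attributes here. This comes with a catch, though: substring() converts a node-set argument to the string-value of its first node in document order, so each value is only compared with the first value element and with its immediate following sibling, not with all siblings as in the plain comparison above. That is reliable with up to two values, and it happens to work for the ascending end dates in this example, but with three or more values in arbitrary order it can pick one that is not the maximum: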

<token-xpath expression='modify-attr[@attr-name="P0000:MASSN:none:74:2"]//value[not(substring(@timestamp,10,8) &lt; substring(preceding-sibling::value/@timestamp,10,8) or substring(@timestamp,10,8) &lt; substring(following-sibling::value/@timestamp,10,8))][1]'/>
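The difference between comparing against a whole node-set and comparing against a substring() result is easy to see in a small Python model. It encodes the XPath 1.0 rule that substring(), like string(), reduces a node-set to the string-value of its first node in document order. The helper names are hypothetical and the end dates are made-up test data.

```python
def xpath_number(s):
    try:
        return float(s)
    except ValueError:
        return float("nan")  # e.g. substring() of an empty node-set gives ''


def kept_first_node(ends):
    """Values kept by the substring() variant (each sibling set -> first node)."""
    kept = []
    for i, end in enumerate(ends):
        first_preceding = ends[0] if i > 0 else ""
        next_following = ends[i + 1] if i + 1 < len(ends) else ""
        if not (xpath_number(end) < xpath_number(first_preceding)
                or xpath_number(end) < xpath_number(next_following)):
            kept.append(i)
    return kept


def kept_all_nodes(ends):
    """Values kept when every sibling is checked (node-set semantics)."""
    return [i for i, end in enumerate(ends)
            if not any(xpath_number(end) < xpath_number(o) for o in ends)]


ends = ["20071231", "20051231", "20101231"]
print(kept_first_node(ends), kept_all_nodes(ends))  # prints: [0, 2] [2]
```

With the trailing [1], the substring() variant would return index 0 here, although index 2 has the highest end date. With only two values, as after stripping everything but the two exit actions of the example, both rules agree.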

In my case I found it best to strip the values I do not need instead of selecting the one I want, leaving only the relevant value in the operation for further processing:

<do-strip-xpath expression='modify-attr[@attr-name="P0000:MASSN:none:74:2"]//value[not(text()="10")]'/>
<do-strip-xpath expression='modify-attr[@attr-name="P0000:MASSN:none:74:2"]//value[substring(@timestamp,10,8) &lt; substring(following-sibling::value/@timestamp,10,8) or substring(@timestamp,10,8) &lt; substring(preceding-sibling::value/@timestamp,10,8)]'/>
<do-strip-xpath expression='modify-attr[@attr-name="P0000:MASSN:none:74:2"]//value[position() > 1]'/>
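Here is a Python sketch of the same strip-instead-of-select idea applied to the trace with ElementTree. Since ElementTree has no sibling axes, step 2 compares each end date with the highest end date, which is the same as being lower than at least one sibling and checks all siblings in a single pass. `strip_to_latest_exit` and `end_date` are hypothetical names; this illustrates the logic and is not how IDM evaluates the policy.

```python
import xml.etree.ElementTree as ET

TRACE = """<modify-attr attr-name="P0000:MASSN:none:74:2">
  <remove-all-values/>
  <add-value>
    <value seqnr="000" timestamp="19990117-20001231" type="string">01</value>
    <value seqnr="000" timestamp="20020101-20021130" type="string">02</value>
    <value seqnr="000" timestamp="20021201-20031231" type="string">02</value>
    <value seqnr="000" timestamp="20040101-20041130" type="string">10</value>
    <value seqnr="000" timestamp="20041201-20041231" type="string">12</value>
    <value seqnr="000" timestamp="20050101-20071231" type="string">02</value>
    <value seqnr="000" timestamp="20080101-99991231" type="string">10</value>
  </add-value>
</modify-attr>"""


def end_date(value):
    """substring(@timestamp,10,8): the validity end date as yyyyMMdd."""
    return value.get("timestamp")[9:17]


def strip_to_latest_exit(modify_attr):
    add_value = modify_attr.find("add-value")
    # Step 1: strip every value whose text is not "10"
    for v in [v for v in add_value.findall("value") if v.text != "10"]:
        add_value.remove(v)
    # Step 2: strip every value whose end date is lower than any sibling's,
    # i.e. lower than the highest one (yyyyMMdd strings compare like dates)
    remaining = add_value.findall("value")
    if remaining:
        latest = max(end_date(v) for v in remaining)
        for v in remaining:
            if end_date(v) < latest:
                add_value.remove(v)
    # Step 3: strip everything but the first remaining value
    for v in add_value.findall("value")[1:]:
        add_value.remove(v)
    return modify_attr


result = strip_to_latest_exit(ET.fromstring(TRACE))
print([(v.get("timestamp"), v.text) for v in result.iter("value")])
# prints: [('20080101-99991231', '10')]
```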

The last do-strip-xpath makes sure we end up with a single value and should not be needed as long as the data in the SAP system is perfect (no two exit actions should have the same end date). But then, you never know, and I have learned the hard way to make my rules as fault-tolerant as possible.
The example would now look like this, much closer to what I need:

<modify-attr attr-name="P0000:MASSN:none:74:2">
	<remove-all-values/>
	<add-value>
		<value seqnr="000" timestamp="20080101-99991231" type="string">10</value>
	</add-value>
</modify-attr>

To replace the action type “10” with the day before the validity begin date, a simple Reformat Operation Attribute action will do:

<do-reformat-op-attr name="P0000:MASSN:none:74:2">
	<arg-value>
		<token-convert-time dest-format="yyyyMMdd" offset="-1" offset-unit="day" src-format="yyyyMMdd">
			<token-xpath expression="substring($current-value/@timestamp,1,8)"/>
		</token-convert-time>
	</arg-value>
</do-reformat-op-attr>

The offset parameter of the Convert Time token requires IDM 3.6; in older versions I would have had to convert to !CTIME format, subtract 86400 seconds and convert back to yyyyMMdd format. Finally, I get what I want:
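The pre-3.6 workaround is plain arithmetic on seconds since the epoch, which is what !CTIME represents. Here is a Python sketch of it, assuming the conversion is done in UTC; `day_before_via_ctime` is a hypothetical name.

```python
import calendar
import time


def day_before_via_ctime(yyyymmdd):
    """yyyyMMdd -> seconds since the epoch -> minus 86400 -> yyyyMMdd (UTC)."""
    seconds = calendar.timegm(time.strptime(yyyymmdd, "%Y%m%d"))  # like !CTIME
    seconds -= 86400                                               # one day
    return time.strftime("%Y%m%d", time.gmtime(seconds))


print(day_before_via_ctime("20080101"))  # prints: 20071231
```

Using UTC is a deliberate choice in this sketch: in a time zone with daylight saving time, midnight minus 86400 seconds lands on the wrong date whenever the previous day was only 23 hours long.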

<modify-attr attr-name="P0000:MASSN:none:74:2">
	<remove-all-values/>
	<add-value>
		<value>20071231</value>
	</add-value>
</modify-attr>

Note that do-reformat-op-attr rebuilds the value node from scratch, so you lose all its XML attributes at this point. If you need to evaluate them, always do so before reformatting!

The SAP HR driver is somewhat special: all changes on the publisher side come in as modify operations, and you cannot query the SAP system directly (only the currently processed IDoc). Other drivers may see add and instance operations as well. Of course you can copy and adapt the XPath above to handle those cases, but it would be much nicer to have it all in one set of do-strip-xpath actions, wouldn’t it?

Two more axes come into the mix at this stage: self and ancestor. Replace the

modify-attr[@attr-name="P0000:MASSN:none:74:2"]//value

with

self::*//value[ancestor::*/@attr-name="P0000:MASSN:none:74:2"]

and there we are! According to the NDS DTD (http://www.novell.com/documentation/idm36/policy_dtd/data/dtdndsvalue.html), the operations that can contain value elements matching self::*//value[ancestor::*/@attr-name="…"] are add, modify, instance and query, so the do-strip-xpath actions operate on exactly the same value elements as the do-reformat-op-attr action here. The final result is a short rule that solves a fairly complex task:
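To see why one path can serve several operation types, here is a Python/ElementTree sketch. ElementTree has no ancestor axis, so it uses the path `.//*[@attr-name='P0000:MASSN:none:74:2']//value` instead, which for these operations selects the same values: any value below an element carrying that attribute name. The operations are simplified, hypothetical examples without class-name, association or other attributes.

```python
import xml.etree.ElementTree as ET

NAME = "P0000:MASSN:none:74:2"
operations = [
    '<add><add-attr attr-name="%s"><value>10</value></add-attr></add>' % NAME,
    '<modify><modify-attr attr-name="%s"><add-value><value>10</value>'
    '</add-value></modify-attr></modify>' % NAME,
    '<instance><attr attr-name="%s"><value>10</value></attr></instance>' % NAME,
    '<modify><modify-attr attr-name="Surname"><add-value><value>X</value>'
    '</add-value></modify-attr></modify>',
]

for snippet in operations:
    op = ET.fromstring(snippet)
    # Stand-in for self::*//value[ancestor::*/@attr-name="..."]
    matches = op.findall(".//*[@attr-name='%s']//value" % NAME)
    print(op.tag, len(matches))
# prints: add 1, modify 1, instance 1, modify 0 (one per line)
```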

<rule>
	<description>Reformat auxEmployeeExitDate (P0000:MASSN:none:74:2)</description>
	<comment xml:space="preserve">Determine the day before the begin of the validity period of the action record of type "10" with the highest validity end date</comment>
	<conditions/>
	<actions>
		<do-strip-xpath expression='self::*//value[ancestor::*/@attr-name="P0000:MASSN:none:74:2"][not(text()="10")]'/>
		<do-strip-xpath expression='self::*//value[ancestor::*/@attr-name="P0000:MASSN:none:74:2"][substring(@timestamp,10,8) &lt; substring(following-sibling::value/@timestamp,10,8) or substring(@timestamp,10,8) &lt; substring(preceding-sibling::value/@timestamp,10,8)]'/>
		<do-strip-xpath expression='self::*//value[ancestor::*/@attr-name="P0000:MASSN:none:74:2"][position() > 1]'/>
		<do-reformat-op-attr name="P0000:MASSN:none:74:2">
			<arg-value>
				<token-convert-time dest-format="yyyyMMdd" offset="-1" offset-unit="day" src-format="yyyyMMdd">
					<token-xpath expression="substring($current-value/@timestamp,1,8)"/>
				</token-convert-time>
			</arg-value>
		</do-reformat-op-attr>
	</actions>
</rule>

XPath rules!


Disclaimer: As with everything else at NetIQ Cool Solutions, this content is definitely not supported by NetIQ, so Customer Support will not be able to help you if it has any adverse effect on your environment.  It just worked for at least one person, and perhaps it will be useful for you too.  Be sure to test in a non-production environment.

2 Comments

  • geoffc says:

    Lothar,

    Great stuff! We need more of these excellent articles on XPATH!

    Will definitely be using this one soon enough!

    • Alexander McHugh says:

      Whilst this is very useful for small data sets, the type of XPath expression you used has O(N²) performance, where N is the number of nodes, so it doesn’t scale very well.

      My testing has shown that, where possible, a carefully constructed for-each that iterates over each node only once is far faster than this.

      I tested a sample that looped over all timestamps in a large (10k nodes) document.
      – for-each 0.227 seconds
      – xpath: 70.148 seconds

      Another alternative is an ecmascript function.

By: lhaeger
Dec 4, 2008
4:28 pm