A.4 Search Query Syntax

Change Guardian uses the Lucene query language for searching events. This section provides an overview of how to use the Lucene query language to perform searches in Change Guardian. For more advanced features, see Apache Lucene - Query Parser Syntax.

For information on the event fields in Change Guardian, click Tips on the top right corner in the Administration Console. A table is displayed that lists the event names and their IDs.

A.4.1 Basic Search Query

A basic query is a search for a value on a field. The syntax is as follows:

msg:<value>

The field name (msg) is separated from the value by a colon.

For example, to search for a phrase that includes the word “authentication,” you can specify the search query as follows:

msg:authentication

Or, to search for events of severity 5, you can specify the search query as follows:

sev:5

If the value has spaces or other delimiters in it, you should use quotation marks. For example:

msg:"value with spaces"

Change Guardian classifies event fields as either tokenized fields or non-tokenized fields. A tokenized field is indexed and is searched differently than a non-tokenized field.

Case Insensitivity

Indexing and searching in Change Guardian is not case-sensitive. For example, the following queries are all equivalent:

msg:AdMin
msg:admin
msg:ADMIN

Special Characters

If you include special characters as part of a search, the special characters must be escaped. These characters are as follows:

+ - && || ! ( ) { } [ ] ^ " ~ * ? : \ /

Use “ \” before the character you want to escape. For example, to search for ISO/IEC_27002:2005 in the rv145 (Tag) field, use the following query:

rv145:ISO\/IEC_27002\:2005

You can also use quotation marks around the query:

rv145:"ISO/IEC_27002:2005"

If the value contains quotation marks, you must escape it by using the “\” character instead of quotation marks. For example, to search for “system “mail” service” in the initiatorservicename field, you must specify the query as follows:

sp:"system \"mail\" service"

For more information on quoting wildcard characters, see Quoted Wildcards.

Operators

Lucene supports AND, OR, and NOT Boolean operators, which allow words to be combined. Boolean operators must be always capitalized.

OR Operator

The OR operator is the default conjunction operator. If there is no Boolean operator between two clauses, the OR operator is used. The OR operator links two clauses and finds a matching event if either of the clauses is satisfied. The symbol || can be used in place of the word OR. For example, consider the following query:

sun:admin OR dun:admin

This query finds events whose initiator username or target username is “admin.” The following query produces the same result because OR is used by default:

sun:admin dun:admin

AND Operator

The AND operator links two clauses and finds a matching event only if both clauses are satisfied. The symbol && can be used in place of the word AND. For example, consider the following query:

sun:admin AND dun:tester

This query finds events whose initiator username is admin and the target username is tester.

NOT Operator

The NOT operator excludes events that match the clause after the NOT. The symbol ! can be used in place of the word NOT. For example, consider the following query:

sev:[0 TO 5] NOT st:I NOT st:A NOT st:P

This query matches all events whose severity is between 0 and 5, but excludes those whose sensor type is I (internal), A (audit), or P (performance); that is, it excludes Change Guardian internal events.

The NOT operator cannot be used by itself because it is a way to exclude events from a set that has been found by other search terms. For example, consider the following query:

NOT st:I NOT st:A NOT st:P

This query might seem like it should return all events where the sensor type is not I, A, or P. However, it is an invalid query because a query cannot begin with the NOT operator.

Operator Precedence

Parentheses can be used in the usual way to change operator precedence. They can be nested to any depth, as shown in the following examples:

(sun:admin OR dun:admin) AND (sip:10.0.0.1 OR sip:10.0.0.2)

((sun:admin OR dun:admin) AND (sip:10.0.0.1 OR sip:10.0.0.2)) OR (msg:user AND evt:authentication)

The Default Search Field

Lucene uses a default search field, which is the field that is searched if no field is specified. In Change Guardian, _data is the default search field. By default, the default search field is a concatenation of the following event fields:

evt,msg,sun,iuid,dun,tuid,sip,sp,dip,dp,rv42,shn,rv35,rv41,dhn,rv45,obsip,sn,obsdom,obssvcname,ttd,ttn,rv36,fn,ei,rt1,rv43,rv40,isvcc

The default search field is indexed and searched as a tokenized field. The result is that you can search for words that might appear in any event field.

You can also customize the set of event fields that are concatenated in the default search field by adding the indexedlog.datafield.ids property in the configuration.properties file.

For example, suppose you have two non-tokenized fields in an event, sun (initiatorusername) and dun (targetusername). The sun field has the following value:

report-administrator

The dun field has the following value:

system-tester

The _data field contains the concatenation of these fields separated by a single space character:

report-administrator system-tester

Because the _data field is a tokenized field, the words “report,” “administrator,” “system,” and “tester” are indexed and searchable. The following queries would find this event:

report

_data:report

report-administrator

_data:report-administrator

report tester

In addition, the following queries also find this event:

sun:report-administrator

dun:system-tester

Tokenized Fields

Fields that are classified as tokenized fields are parsed into individual words for indexing. Therefore, a search occurs only on words within the field value. Characters that are considered to be word delimiters are not searchable, nor are words that are considered to be stop words. Lucene removes extremely common words to save disk space and speed up searching. These words are ignored in search filters. Currently, the following stop words are removed:

a
an
and
are
as
at
be
but
by
for
if
in
into
is
it
no
not
of
on
or
such
that
the
their
then
there
these
they
this
to
was
will
with

When it does a search, Lucene examines all of the words in a field and tries to match words in the search value. For example, suppose that you specify a search for messages containing the following value:

msg:"user-authentication failed on the server"

The words that are parsed within this value are “user,” “authentication,” “failed,” and “server.” These are the only search words that would match this value. “On” and “the” are omitted because they are stop words.

The value has the hyphen character (-) between some words. Hyphens are treated as word delimiters, so Lucene does not search for hyphens. Consider, the following query:

msg:"user-authentication"

The results might not be exactly what you expect. The query search value matches the value, but not because it is matching the hyphen. It matches because Lucene first parses the words in the search value and identifies the words “user” and “authentication.” Lucene then matches those words against values that have the words “user” and “authentication” with no intervening words in between. This query would also match the following value, even though there is no hyphen between “user” and “authentication”:

user authentication has failed on the server

Consider the following query:

msg:"failed on server"

This query has the stop word, "on," which is ignored. However, the stop word does affect the relative positioning that is expected to be between words when evaluating a value to see if it matches. The “failed on server” search matches any phrase where the words failed and server are separated by exactly one word. It does not matter what the word is because the separating word is a stop word and is ignored. Thus, the above query would match all of the following:

failed on server

failed-on server

failed a server

failed-a-server

Proximity indicators created by using the ~ character followed by a value, make this more complicated. The query dictates an expected distance between words. In the “failed on server” query, the expected distance between “failed” and “server” is one word. The proximity indicator specifies how much variance there can be from the expected distance. For example, consider the following query, where a proximity indicator of one (~1) is specified:

msg:"failed on server"~1

This query indicates that the distance between “failed” and “server” could be plus or minus one from the expected distance, which is one because of the stop word “on.” Thus, the distance could be 1, 1-1 (0), or 1+1 (2). Thus, all of the following would match:

failed on server

failed on the server

failed finance server

As of Lucene version 3.1, word parsing is done according to word break rules outlined in the Unicode Text Segmentation algorithm. For more information, see Unicode Text Segmentation.

For information on tokenized fields in Change Guardian, in the Administration Console click Tips on the top right corner of the Administration Console. A table is displayed that lists all the event fields and whether an event field is searchable or not.

Non-Tokenized Fields

Fields that are classified as non-tokenized fields are parsed fully for indexing. Thus, a search occurs on full field values. For example, to search events whose initiatoruserfullname (iufname) field has the value “Bob White”, you must specify the query as follows:

iufname:"Bob White"

A.4.2 Wildcards in Search Queries

Change Guardian supports wildcards in search values but not in regular expressions:

The asterisk (*) matches zero or more characters.
The questions mark (?) matches any one character.

For example:

adm*test: Matches admtest, ADMTEST, admintest, adMINtEst (note the lack of case sensitivity).
adm?test: Matches adm1test and AdMatest. Does not match admtest or ADMINTEST because it must have exactly one character between "adm" and "test."

Wildcards in Tokenized Fields

Wildcards are applied differently to tokenized fields and non-tokenized fields. Wildcards for tokenized fields match only words that were parsed from the value and not the entire value. For example, if you specify the search query msg:authentication*failed to search for the message The user authentication has failed on the server, it does not return the events with this message. This is because “*” does not match anything between “authentication” and “failed.” However, it matches any words that begin with “authentication” and end with “failed.” For example, it returns results if any of the following words are used: “authenticationhasfailed,” “authenticationuserfailed,” and “authenticationserverfailed.” For tokenized fields, all matching that uses wildcard searches is done on the words within the value and not on the full value.

Quoted Wildcards

Tokenized Fields

When wildcards are quoted, they are not treated as wildcards, but as word delimiters. For example, consider the following query:

msg:"user* fail*"

The search value "user* fail*" is parsed into two words, “user” and “fail.” The semantic is "find any event where the msg field contains “user” AND “fail” words in that order, and there are no intervening words between them.” Thus, it does not match the following value:

The user authentication has failed on the server.

This is because the wildcard is not treated as a wildcard but as a word delimiter.

Non-Tokenized Fields

When wildcards are quoted, they are treated as literal characters to search. For example, if the query is: sun:"adm*," it returns the following values:

adm*

ADM* (case-insensitive)

The query does not return the following values:

admin

ADMIN

Leading Wildcards

Leading wildcards are not valid in searches because Lucene does not allow the * or ? characters to be the first character of a search value. For example, the following queries are invalid:

sun:*adm* The semantic is “find any event whose initiator username value contains the letters a, d, and m in sequence.“
sun:*tester The semantic is “find any event whose initiatorusername value ends with “tester.”
sun:* The semantic is “find any event whose initiator username field is non-empty.”

Because this is an important type of query, Change Guardian provides an alternative way to accomplish this. For more information, see The notnull Query.

A.4.3 The notnull Query

You might need to find events where some field is present, or non-empty. For example, to find all events that have a value in the sun field, you can specify the query as sun:*

The query does not return the expected results because Lucene does not support wildcards to be the first character of a search value. However, Change Guardian provides an alternate solution. For every event, Change Guardian creates a special field called notnull. The notnull field is a list of all fields in the event that are not null (not empty). For example, if there is an event that has values in the evt, msg, sun, and xdasid fields, the notnull field contains the following value:

evt msg sun xdasid

The notnull field is a tokenized field, so the following kinds of queries are possible:

notnull:sun Finds all events whose sun field has a value.
notnull:xdas* Finds all events where any field beginning with the name "xdas" has a value.

When a notnull field is added in Lucene, creating, indexing, and storing this field adds a cost to processing each event as CPU needs to create and index the field and it also requires additional storage space. If you want to disable storing the list of non-empty fields in the notnull field, set the following property in the /etc/opt/novell/sentinel/config/configuration.properties file:

indexedlog.storenotnull=false

Save the file and restart the Change Guardian server. All events received after this property was set do not have a notnullfield associated.

NOTE:If you disable the notnull field, do not use the notnull field in search filters, rule filters, or policy filters because the results might be incorrect and unpredictable.

A.4.4 Tags in Search Queries

The Tag field (rv145) is a tokenized field that has special parsing rules for words. The parsing rules enable you to search on tags that include non-alphanumeric characters. However, the only word delimiters are white space characters such as the blank and the tab. This is because tags do not include white space in their names. For example, the following queries find the event if the event is tagged with the ISO/IEC_27002:2005 tag and the NIST_800-53 tag:

rv145:"ISO/IEC_27002:2005"

rv145:"iso/iec_27002:2005"

rv145:"ISO/IEC_27002*"

rv145:nist_*

The slash (/), hyphen (-), and colon (:) characters are significant in the search value because, unlike other tokenized fields, the parsing rules for rv145 do not treat them as a word delimiter. Also, the search is not case sensitive.

The following queries would not find the event:

rv145:"ISO IEC_27002 2005"

rv145:"iso *"

A.4.5 Regular Expression Queries

Regular expression queries allow you to search events that match a pattern. These queries must be enclosed in quotation marks (“ “) and forward slash (/). For example, to search for an initiator user name that ends with the character "a”, you can specify the search query as follows:

sun:"/.*a/"

If you need to include special characters in your query, you must escape special characters by preceding them with the backslash (\) character. For example, to search for an initiator user name that ends with the character “$”, you can specify the search query as follows:

sun:"/.*\$/"

For more information about using special characters, see Special Characters.

NOTE:Regular expression queries utilize significantly more system resources than other kinds of queries because they are unable to leverage the more efficient data structures available in the index. Executing regular expression queries take longer than other kinds of queries and potentially pull system resources from other components of the system. Therefore, use regular expression queries carefully and narrow the breadth of the search as much as possible by using time range and non-regular expression criteria terms.

A.4.6 Range Queries

Range queries allow you to find events where a field value is between a lower bound and an upper bound. Range queries can be inclusive or exclusive of the upper and lower bounds. Whether a particular value falls in the specified range is based on lexicographic character sorting. Inclusive ranges are denoted by square brackets []. Exclusive ranges are denoted by curly brackets {}.

For example, consider the following query:

sun:[admin TO tester]

This query finds events whose sun field has values between admin and tester, inclusive. Note that "TO" is capitalized.

However, if you change the query as follows:

sun:{admin TO tester}

The query now finds all events whose sun field is between admin and tester, not including admin and tester.

Some event fields such as sev and xdasid are numeric. In Change Guardian, range queries on numeric fields are based on numeric sorting and not on lexicographic character sorting. For example, consider the following query:

xdasid:[1 TO 7]

This query returns events whose xdasid value is 1, 2, 3, 4, 5, 6, or 7. If the range evaluation was based on lexicographic sorting, it would incorrectly match 10, 101, 100001, 200, and so on.

A.4.7 IP Addresses Query

There are several extensions that Change Guardian has implemented for searching on IP addresses. Specifically, there are a number of convenient ways to specify IP address ranges. These are explained in the following sections:

CIDR Notation

Change Guardian supports the Classless Inter-Domain Routing (CIDR) notation as a search value for IP address fields such as sip (initiator IP) and dip (target IP) for specifying an IP address range. The notation uses a combination of an IP address and a mask, as follows:

"xxx.xxx.xxx.xxx/n"

In this notation, n is the number of high order bits in the value to match. For example, consider the following query:

sip:"10.0.0.0/24"

This query returns events whose sip field is an IPv4 address ranging from 10.0.0.0 to 10.0.0.255.

The same notation works for IPv6 addresses. For example, consider the following query:

sip:"2001:DB8::/48"

This query returns events whose sip field is an IPv6 address ranging from 2001:DB8:: to 2001:DB8:0:FFFF:FFFF:FFFF:FFFF:FFFF.

Wildcards in IP Addresses

You can use only the asterisk character (*) in the IP address search values to specify ranges of IP addresses. You cannot use the question mark (?) character.

In IPv4 addresses, an asterisk (*) can be used at any of the positions in the quad format. In IPv6 addresses, an asterisk (*) can be used between colons to specify a 16-bit segment. For example, all of the following queries are valid on the sip field:

sip:10.*.80.16

sip:10.02.*.*

sip:10.*.80.*

sip:"CAFE:*::FEED"

sip:"CAFE:*:FADE:*::FEED"

If an asterisk (*) is used in one of the quad positions in an IPv4 address or between colons in an IPv6 address, it cannot be combined with other digits. For example, all of the following queries are invalid:

sip:10.*7.80.16

sip:10.10*.80.16

sip:"CAFE:FA*::FEED"

sip:"CAFE:*DE::FEED"

Because the question mark (?) is not allowed, the following queries are invalid:

sip:10.10?.80.16

sip:10.?.80.16

sip:"CAFE:FA??::FEED"

sip:"CAFE:??DE::FEED"