The `sd_pattern` IPS option provides detection and filtering of Personally
Identifiable Information (PII).  This information includes credit card
numbers, U.S. Social Security numbers, and email addresses.  A rich regular
expression syntax is available for defining your own PII patterns.

==== Hyperscan

The `sd_pattern` rule option is powered by the open source Hyperscan
library from Intel.  It provides a regex grammar which is mostly PCRE
compatible. To learn more about Hyperscan see
https://intel.github.io/hyperscan/dev-reference/

==== Syntax 

Snort provides `sd_pattern` as an IPS rule option with no additional inspector
overhead.  The rule option takes the following syntax:

    sd_pattern: "<pattern>"[, threshold <count>];

===== Pattern

The pattern is the most important parameter and the only one that
`sd_pattern` requires. Three built-in patterns are available and are selected
by name: "credit_card", "us_social" and "us_social_nodashes". Any other value
is treated as a user defined regular expression in the Hyperscan dialect (see
https://intel.github.io/hyperscan/dev-reference/compilation.html#pattern-support).

    sd_pattern:"credit_card";

When configured, Snort replaces the name 'credit_card' with the built-in
pattern. In addition to pattern matching, Snort validates that the matched
digits pass the Luhn-check algorithm.  This is currently the only pattern that
performs extra verification.
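
The Luhn check itself is a public checksum algorithm and can be sketched in a
few lines of Python. This is only an illustration of the algorithm, not Snort's
implementation; the function name `luhn_valid` is hypothetical.

```python
def luhn_valid(number: str) -> bool:
    """Return True if the digits in `number` pass the Luhn checksum."""
    # Ignore separators such as spaces or dashes.
    digits = [int(ch) for ch in number if ch.isdigit()]
    if not digits:
        return False

    total = 0
    # Walk from the rightmost digit and double every second digit.
    for index, digit in enumerate(reversed(digits)):
        if index % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9  # same as adding the two digits of the product
        total += digit
    return total % 10 == 0


if __name__ == "__main__":
    print(luhn_valid("4111 1111 1111 1111"))  # well-known test number
    print(luhn_valid("4111 1111 1111 1112"))  # last digit altered
```

A check like this lets a detector discard 16-digit strings that merely look
like card numbers, which reduces false positives.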

    sd_pattern:"us_social";
    sd_pattern:"us_social_nodashes";

These names are also replaced with built-in patterns.  "us_social" matches
9 digits separated by `-` characters in the canonical form (for example,
123-45-6789), while "us_social_nodashes" matches the 9 digits without the
separators.
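
To make the two shapes concrete, here is a rough Python approximation. The
expressions below are assumptions written for illustration only; they are not
the expressions Snort compiles for "us_social" and "us_social_nodashes", and
the names `US_SOCIAL_LIKE` and `US_SOCIAL_NODASHES_LIKE` are hypothetical.

```python
import re

# Assumed approximation: three, two and four digits separated by dashes.
US_SOCIAL_LIKE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")

# Assumed approximation: exactly nine consecutive digits, no separators.
US_SOCIAL_NODASHES_LIKE = re.compile(r"\b\d{9}\b")

if __name__ == "__main__":
    for text in ("id 123-45-6789 on file", "id 123456789 on file"):
        print(text,
              bool(US_SOCIAL_LIKE.search(text)),
              bool(US_SOCIAL_NODASHES_LIKE.search(text)))
```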

    sd_pattern:"\b\w+@ourdomain\.com\b";

This is a user defined pattern which matches strings that are most likely
email addresses for the site "ourdomain.com". The pattern is a PCRE compatible
regex: '\b' matches a word boundary (whitespace, end of line, non-word
characters), '\w+' matches one or more word characters, and '\.' matches
a literal '.'.

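The above pattern would match "a@ourdomain.com" and "aa@ourdomain.com". Because
'\w' also covers digits and underscores, it would match `1@ourdomain.com` and
`ab12@ourdomain.com` as well. It would not match `@ourdomain.com`, since '\w+'
requires at least one word character before the '@'.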

Note: This is just an example. The pattern is too simple to detect many
correctly formatted email addresses.
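
One way to experiment with such an expression before putting it in a rule is
Python's `re` module. This is only a sketch: `re` is not Hyperscan, so the
result is an approximation of what Snort would match, and the name
`EMAIL_AT_OURDOMAIN` is hypothetical.

```python
import re

# re.ASCII keeps \w and \b restricted to ASCII word characters.
EMAIL_AT_OURDOMAIN = re.compile(r"\b\w+@ourdomain\.com\b", re.ASCII)

samples = [
    "mail a@ourdomain.com today",  # word characters before '@'
    "@ourdomain.com",              # nothing before '@', \w+ cannot match
    "a@ourdomain.company",         # no word boundary after "com"
    "a@ourdomainXcom",             # '\.' requires a literal dot
]

if __name__ == "__main__":
    for sample in samples:
        print(repr(sample), bool(EMAIL_AT_OURDOMAIN.search(sample)))
```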

===== Threshold

Threshold is an optional parameter that lets you change the built-in default
value, which is '1'.  The following two declarations are identical: the first
assumes the default value of '1', and the second explicitly sets the
threshold to '1'.

    sd_pattern:"This rule requires 1 match";
    sd_pattern:"This rule requires 1 match", threshold 1;

That's pretty easy, but here is one more example anyway.

    sd_pattern:"This is a string literal", threshold 300;

This example requires 300 matches of the pattern "This is a string literal"
to qualify as a positive match. That is, if the string only occurred 299 times
in a packet, you will not see an event.
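
The threshold logic amounts to counting matches and comparing the count with
the configured value. The Python sketch below illustrates that idea only. The
function `meets_threshold` is hypothetical, and counting non-overlapping
matches in a single buffer is an assumption, not a description of how Snort
counts.

```python
import re


def meets_threshold(payload: str, pattern: str, threshold: int = 1) -> bool:
    """Return True if `pattern` occurs at least `threshold` times in `payload`."""
    # Assumption: non-overlapping matches, counted over one payload.
    count = sum(1 for _ in re.finditer(pattern, payload))
    return count >= threshold


if __name__ == "__main__":
    literal = "This is a string literal"
    print(meets_threshold(literal * 299, literal, threshold=300))
    print(meets_threshold(literal * 300, literal, threshold=300))
    print(meets_threshold(literal, literal))  # default threshold of 1
```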

===== Obfuscating Credit Cards and Social Security Numbers

Snort provides discreet logging for the built in patterns "credit_card",
"us_social" and "us_social_nodashes". Enabling `output.obfuscate_pii` makes
Snort obfuscate the suspect packet payload which was matched by the
patterns. This configuration is disabled by default.

    output =
    { 
        obfuscate_pii = true
    }

==== Example

A complete Snort IPS rule:

    alert tcp ( sid:1; msg:"Credit Card"; sd_pattern:"credit_card"; )

Logged output when running Snort with the "cmg" alert format:

    02/25-21:19:05.125553 [**] [1:1:0] "Credit Card" [**] [Priority: 0] {TCP} 10.1.2.3:48620 -> 10.9.8.7:8
    02:01:02:03:04:05 -> 02:09:08:07:06:05 type:0x800 len:0x46
    10.1.2.3:48620 -> 10.9.8.7:8 TCP TTL:64 TOS:0x0 ID:14 IpLen:20 DgmLen:56
    ***A**** Seq: 0xB2  Ack: 0x2  Win: 0x2000  TcpLen: 20
    - - - raw[16] - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
    58 58 58 58 58 58 58 58 58 58 58 58 39 32 39 34              XXXXXXXXXXXX9294
    - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
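
In the sample above, the 16 payload bytes appear as twelve `X` characters
(0x58) followed by the last four digits. A minimal Python sketch of that kind
of masking follows. The helper `obfuscate_match`, its `keep` parameter, and the
input digits are assumptions inferred from this one sample; they are not taken
from Snort's code.

```python
def obfuscate_match(matched: bytes, keep: int = 4) -> bytes:
    """Replace all but the last `keep` bytes of `matched` with b'X'."""
    # Index of the first byte left visible; clamp for short inputs.
    visible_from = max(len(matched) - keep, 0)
    return b"X" * visible_from + matched[visible_from:]


if __name__ == "__main__":
    masked = obfuscate_match(b"1234567890129294")  # hypothetical digits
    print(masked.decode("ascii"))
    print(masked.hex(" "))  # hex bytes, comparable to the raw dump above
```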

==== Caveats

1. Snort currently requires the fast pattern engine to be set to
"hyperscan" for the `sd_pattern` IPS option to function correctly.

    search_engine = { search_method = 'hyperscan' }

2. Log obfuscation is only applicable to CMG and Unified2 logging formats.

3. Log obfuscation doesn't support user defined PII patterns. It is
currently only supported for the built in patterns for Credit Cards and US
Social Security numbers.

4. Log obfuscation doesn't work with stream rebuilt packet payloads.  (This
is a known bug).

