The `sd_pattern` IPS option provides detection and filtering of Personally
Identifiable Information (PII).  This information includes credit card
numbers, U.S. Social Security numbers, phone numbers, and email addresses.
A rich regular expression syntax is available for defining your own PII. 

==== Hyperscan

The `sd_pattern` rule option is powered by the open source Hyperscan
library from Intel.  It provides a regex grammar which is mostly PCRE
compatible. To learn more about Hyperscan see
https://intel.github.io/hyperscan/dev-reference/

==== Syntax 

Snort provides `sd_pattern` as IPS rule option with no additional inspector
overhead.  The Rule option takes the following syntax.

    sd_pattern: "<pattern>"[, threshold <count>];

===== Pattern

Pattern is the most important and is the only required parameter to
`sd_pattern`. It supports 5 built-in patterns which are configured by name:
"credit_card", "us_social", "us_social_nodashes", "email", and "us_phone" as
well as user defined regular expressions of the Hyperscan dialect (see
https://intel.github.io/hyperscan/dev-reference/compilation.html#pattern-support).

    sd_pattern:"credit_card";

When configured, Snort will replace the pattern 'credit_card' with the built-in
pattern. In addition to pattern matching, Snort will validate that the matched
digits will pass the Luhn-check algorithm.

    sd_pattern:"us_social";
    sd_pattern:"us_social_nodashes";

These special patterns will also be replaced with a built-in pattern.
Naturally, "us_social" is a pattern of 9 digits separated by `-`'s in the
canonical form. For this pattern, some validation of compliance with the
Social Security Numbers randomization rules is also performed.

    sd_pattern:"email";

This pattern will be replaced with a built-in pattern created to match email.
The regex implements the “preferred” syntax from RFC 1035 which is one of the
recommendations in RFC 5322.

    sd_pattern:"us_phone";

This pattern will match U.S. phone numbers in different formats with or without
country code.

    sd_pattern:"\b\w+@ourdomain\.com\b"

This is a user defined pattern which matches what is most likely email
addresses for the site "ourdomain.com". The pattern is a PCRE compatible
regex, '\b' matches a word boundary (whitespace, end of line, non-word
characters)  and '\w+' matches one or more word characters. '\.' matches
a literal '.'.

The above pattern would match "a@ourdomain.com", "aa@ourdomain.com" but would
not match `1@ourdomain.com` `ab12@ourdomain.com` or `@ourdomain.com`.

Note: This is just an example, this pattern is not suitable to detect many
correctly formatted emails.

===== Threshold

Threshold is an optional parameter allowing you to change built-in default
value (default value is '1').  The following two instances are identical.
The first will assume the default value of '1' the second declaration
explicitly sets the threshold to '1'.

    sd_pattern:"This rule requires 1 match";
    sd_pattern:"This rule requires 1 match", threshold 1;

That's pretty easy, but here is one more example anyway.

    sd_pattern:"This is a string literal", threshold 300;

This example requires 300 matches of the pattern "This is a string literal"
to qualify as a positive match. That is, if the string only occurred 299 times
in a packet, you will not see an event.

===== Obfuscating built-in patterns

Snort provides discreet logging for the built-in patterns "credit_card",
"us_social", "us_social_nodashes", "us_phone" and "email". Enabling
`ips.obfuscate_pii` makes Snort obfuscate the suspect packet payload which
was matched by the patterns. This configuration is enabled by default.

    ips =
    { 
        obfuscate_pii = true
    }

==== Example

A complete Snort IPS rule

    alert tcp ( sid:1; msg:"Credit Card"; sd_pattern:"credit_card"; )

Logged output when running Snort in "cmg" alert format. 

    02/25-21:19:05.125553 [**] [1:1:0] "Credit Card" [**] [Priority: 0] {TCP} 10.1.2.3:48620 -> 10.9.8.7:8
    02:01:02:03:04:05 -> 02:09:08:07:06:05 type:0x800 len:0x46
    10.1.2.3:48620 -> 10.9.8.7:8 TCP TTL:64 TOS:0x0 ID:14 IpLen:20 DgmLen:56
    ***A**** Seq: 0xB2  Ack: 0x2  Win: 0x2000  TcpLen: 20
    - - - raw[16] - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
    58 58 58 58 58 58 58 58 58 58 58 58 39 32 39 34              XXXXXXXXXXXX9294
    - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

==== Caveats

1. Snort currently requires setting the fast pattern engine to use
"hyperscan" in order for `sd_pattern` ips option to function correctly.

    search_engine = { search_method = 'hyperscan' }

2. Log obfuscation is only applicable to CMG and Unified2 logging formats.

3. Log obfuscation doesn't support user defined PII patterns. It is
currently only supported for the built-in patterns for Credit Cards and U.S.
Social Security numbers.

4. Log obfuscation doesn't work with stream rebuilt packet payloads.  (This
is a known bug).

