The `sd_pattern` IPS option provides detection and filtering of Personally
Identifiable Information (PII). This information includes credit card
numbers, U.S. Social Security numbers, phone numbers, and email addresses.
A rich regular expression syntax is available for defining your own PII.

==== Hyperscan

The `sd_pattern` rule option is powered by the open source Hyperscan
library from Intel. It provides a regex grammar which is mostly PCRE
compatible. To learn more about Hyperscan see
https://intel.github.io/hyperscan/dev-reference/

==== Syntax

Snort provides `sd_pattern` as IPS rule option with no additional inspector
overhead. The Rule option takes the following syntax.

    sd_pattern: "<pattern>"[, threshold <count>];

===== Pattern

Pattern is the most important and is the only required parameter to
`sd_pattern`. It supports 5 built-in patterns which are configured by name:
"credit_card", "us_social", "us_social_nodashes", "email", and "us_phone" as
well as user defined regular expressions of the Hyperscan dialect (see
https://intel.github.io/hyperscan/dev-reference/compilation.html#pattern-support).

    sd_pattern:"credit_card";

When configured, Snort will replace the pattern 'credit_card' with the built-in
pattern. In addition to pattern matching, Snort will validate that the matched
digits will pass the Luhn-check algorithm.

    sd_pattern:"us_social";
    sd_pattern:"us_social_nodashes";

These special patterns will also be replaced with a built-in pattern.
Naturally, "us_social" is a pattern of 9 digits separated by `-`'s in the
canonical form. For this pattern, some validation of compliance with the
Social Security Numbers randomization rules is also performed.

    sd_pattern:"email";

This pattern will be replaced with a built-in pattern created to match email.
The regex implements the “preferred” syntax from RFC 1035 which is one of the
recommendations in RFC 5322.

    sd_pattern:"us_phone";

This pattern will match U.S. phone numbers in different formats with or without
country code.

    sd_pattern:"\b\w+@ourdomain\.com\b"

This is a user defined pattern which matches what is most likely email
addresses for the site "ourdomain.com". The pattern is a PCRE compatible
regex, '\b' matches a word boundary (whitespace, end of line, non-word
characters)  and '\w+' matches one or more word characters. '\.' matches
a literal '.'.

The above pattern would match "a@ourdomain.com", "aa@ourdomain.com" but would
not match `1@ourdomain.com` `ab12@ourdomain.com` or `@ourdomain.com`.

Note: This is just an example, this pattern is not suitable to detect many
correctly formatted emails.

===== Threshold

Threshold is an optional parameter allowing you to change built-in default
value (default value is '1'). The following two instances are identical.
The first will assume the default value of '1' the second declaration
explicitly sets the threshold to '1'.

    sd_pattern:"This rule requires 1 match";
    sd_pattern:"This rule requires 1 match", threshold 1;

That's pretty easy, but here is one more example anyway.

    sd_pattern:"This is a string literal", threshold 300;

This example requires 300 matches of the pattern "This is a string literal"
to qualify as a positive match. That is, if the string only occurred 299 times
in a packet, you will not see an event.

===== Obfuscating built-in patterns

Snort provides discreet logging for the built-in patterns "credit_card",
"us_social", "us_social_nodashes", "us_phone", and "email". Enabling
`ips.obfuscate_pii` makes Snort obfuscate the suspect packet payload which
was matched by the patterns. This configuration is enabled by default.

    ips =
    {
        obfuscate_pii = true
    }

==== Examples

Complete Snort IPS rules with built-in sensitive data patterns.

    alert tcp ( sid:1; msg:"Credit Card"; sd_pattern:"credit_card"; )
    alert tcp ( sid:2; msg:"US Social Number"; sd_pattern:"us_social"; )
    alert tcp ( sid:3; msg:"US Social Number No Dashes"; sd_pattern:"us_social_nodashes"; )
    alert tcp ( sid:4; msg:"US Phone Number"; sd_pattern:"us_phone"; )
    alert tcp ( sid:5; msg:"Email"; sd_pattern:"email"; )

Let's try them on the next traffic.

    33 34 38 30 31 32 37 34 33 35  37 34 35 38 30 20 20 20 20 20  348012743574580     
    34 30 34 2D 35 30 2D 32 31 38  33 20 20 20 20 20 20 20 20 20  404-50-2183         
    34 30 34 35 30 32 31 38 33 20  20 20 20 20 20 20 20 20 20 20  404502183           
    31 2D 39 31 39 2D 36 36 33 2D  32 35 32 34 20 20 20 20 20 20  1-919-663-2524      
    74 75 72 2E 63 61 6C 6C 69 65  40 67 6D 61 69 6C 2E 63 6F 6D  tur.callie@gmail.com

Printout of alert_cmg logger for this would be obfuscated.

    snort.raw[100]:
    - - - - - - - - - - - - - - -  - - - - - - - - - - - - - - -  - - - - -  - - - - -
    58 58 58 58 58 58 58 58 58 58  58 34 35 38 30 20 20 20 20 20  XXXXXXXXXXX4580     
    58 58 58 58 58 58 58 32 31 38  33 20 20 20 20 20 20 20 20 20  XXXXXXX2183         
    58 58 58 58 58 32 31 38 33 20  20 20 20 20 20 20 20 20 20 20  XXXXX2183           
    58 58 58 58 58 58 58 58 58 58  32 35 32 34 20 20 20 20 20 20  XXXXXXXXXX2524      
    58 58 58 58 58 58 58 58 58 58  58 58 58 58 58 58 2E 63 6F 6D  XXXXXXXXXXXXXXXX.com
    - - - - - - - - - - - - - - -  - - - - - - - - - - - - - - -  - - - - -  - - - - -

But obfuscation doesn't work for custom patterns.

Example of a rule with a custom pattern.

    alert tcp (sid: 6; sd_pattern:"\b\w+@ourdomain\.com\b"; msg: "Custom email")

Traffic.

    61 40 6F 75 72 64 6F 6D 61 69  6E 2E 63 6F 6D 20 20 20 20 20  a@ourdomain.com     
    61 61 40 6F 75 72 64 6F 6D 61  69 6E 2E 63 6F 6D              aa@ourdomain.com

Printout of alert_cmg logger for this would not be obfuscated.

    01/01-02:00:00.000004 [**] [1:6:0] "Custom email" [**] [Priority: 0] {TCP} 10.1.2.3:48620 -> 10.9.8.7:80
    02:01:02:03:04:05 -> 02:09:08:07:06:05 type:0x800 len:0x5A
    10.1.2.3:48620 -> 10.9.8.7:80 TCP TTL:64 TOS:0x0 ID:3 IpLen:20 DgmLen:76
    ******** Seq: 0x2  Ack: 0x0  Win: 0x2000  TcpLen: 20

    snort.raw[36]:
    - - - - - - - - - - - - - - -  - - - - - - - - - - - - - - -  - - - - -  - - - - -
    61 40 6F 75 72 64 6F 6D 61 69  6E 2E 63 6F 6D 20 20 20 20 20  a@ourdomain.com     
    61 61 40 6F 75 72 64 6F 6D 61  69 6E 2E 63 6F 6D              aa@ourdomain.com
    - - - - - - - - - - - - - - -  - - - - - - - - - - - - - - -  - - - - -  - - - - -

Threshold values are applied per packet.

So, traffic like this.

    Packet 1 payload:"a@ourdomain.com"
    Packet 2 payload:"aa@ourdomain.com"

Doesn't match a rule like this.

    alert tcp (sid: 7; sd_pattern:"\b\w+@ourdomain\.com\b", threshold 2; msg: "Custom email")

==== Caveats

1. sd_pattern implementation relies on Hyperscan, regardless of the search engine specified
in the config. So, Snort must be built and run with Hyperscan to have sd_pattern
IPS option available.

2. Log obfuscation is only applicable to the buffer that the pattern was found in.

3. Log obfuscation is only applicable to CMG and Unified2 logging formats.
Unified2 logger only supports obfuscation of http extra data.

4. Log obfuscation doesn't support user defined PII patterns. It is
currently only supported for the built-in patterns for Credit Cards and U.S.
Social Security numbers, emails and U.S. phone numbers.

