
		Filtering

------------------------------------------------------------
1.1	Filtering by fml
1.2	customize
1.3	examples fml cannot reject

2	examples
2.2	reject Subject: with FREE, SEX, ADULT, XXX.
2.3	reject Subject: with FREE, SEX, ADULT, XXX (case insensitive).
2.4	reject if received: contains spam.co.jp
2.5	reject if the mail body contains http-equiv=3DContent-Type
2.6	reject if the domain of From: conflicts with Message-ID:
2.7	aggregate the previous 3 rules
2.8	rule combined with sendmail PICKY_HELO_CHECK
2.9	size limit for the mail header
2.10	To: or Cc: should have $MAIL_LIST !
2.11	hook without using EnvelopeFilter
2.12	Duplicated Message-ID for mails from Notes
2.13	Reject Mails From Specific Domains
2.14	ISP forces a signature onto your mail

3	Q & A
3.1	Does fml misjudge a "one line body" mail?

4	Internals
4.1	Filtering
4.3	Reject if a field matches reject patterns

5	ContentHandler
5.1	Filtering Rule Examples
5.2	description by the original author 
6.2	MH slocal Interface

7	hacks
7.1	Cut off except the first multipart block
7.2	Against replied mails without In-Reply-To: nor References:
Appendix A	filtering of Postfix
Appendix A.1	body_checks directive
Appendix A.2	header_checks directive
Appendix A.3	Content Filter
Appendix B	Sendmail specific topics
Appendix B.1	PerlMX
------------------------------------------------------------


Filtering can take place at several points.
First, at the user level (=> 6).
Second, at the mail server, which rejects mail while receiving it (=> 4.1).

1.1	Filtering by fml

	$USE_DISTRIBUTE_FILTER = 1;

   MENU -> SECURITY -> USE_DISTRIBUTE_FILTER -> ON

1.2	customize

	&DEFINE_FIELD_PAT_TO_REJECT('subject', 'FREE|SEX|ADULT|XXX');
	&DEFINE_FIELD_PAT_TO_REJECT('subject', '/FREE|SEX|ADULT|XXX/');
	&DEFINE_FIELD_PAT_TO_REJECT('from', 'ADULT');

1.3	examples fml cannot reject

	a paragraph that looks like a command

	a last paragraph that looks like a signature

	a paragraph that contains Japanese

If it is ambiguous how a paragraph should be judged, fml may misjudge it.

2	examples

	reject null Message-Id:

	&DEFINE_FIELD_PAT_TO_REJECT('message-id', '^\s*$');

2.2	reject Subject: with FREE, SEX, ADULT, XXX.

	reject Subject: with FREE, SEX, ADULT, XXX.
	(but being case sensitive, this config does not match "free software" ;-).

	&DEFINE_FIELD_PAT_TO_REJECT('subject', 'FREE|SEX|ADULT|XXX');
	&DEFINE_FIELD_PAT_TO_REJECT('subject', '/FREE|SEX|ADULT|XXX/');
	&DEFINE_FIELD_PAT_TO_REJECT('from', 'ADULT');

2.3	reject Subject: with FREE, SEX, ADULT, XXX (case insensitive).

	reject Subject: with FREE, SEX, ADULT, XXX (case insensitive).

	&DEFINE_FIELD_PAT_TO_REJECT('subject', '/free|sex|adult|xxx/i');
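
The difference between 2.2 and 2.3 can be tried outside fml with plain
Perl. The sketch below only mimics the matching. It assumes that a bare
pattern is applied case sensitively and that the /pattern/i form is
applied with the i modifier. The helper subject_is_rejected and the
sample subjects are hypothetical and not part of fml.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not an fml function): 1 if $subject matches $regexp.
sub subject_is_rejected {
    my ($subject, $regexp) = @_;
    return $subject =~ $regexp ? 1 : 0;
}

my $case_sensitive   = qr/FREE|SEX|ADULT|XXX/;     # pattern of 2.2
my $case_insensitive = qr/free|sex|adult|xxx/i;    # pattern of 2.3

# Print one line per sample subject with the verdict of each pattern.
for my $subject ('FREE OFFER', 'free software release') {
    printf "%-24s  2.2:%d  2.3:%d\n", $subject,
        subject_is_rejected($subject, $case_sensitive),
        subject_is_rejected($subject, $case_insensitive);
}
```

Note that the case insensitive pattern also catches a harmless subject
such as "free software release", so it may be too aggressive for some
lists.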

2.4	reject if received: contains spam.co.jp

$DISTRIBUTE_FILTER_HOOK = q#
    if ($e{'h:received:'} =~ /from spam.co.jp/) {
	return 'from a host in spam blacklist';
    }
#;

2.5	reject if the mail body contains http-equiv=3DContent-Type

$DISTRIBUTE_FILTER_HOOK = q#
    if ($e{'Body'} =~ /http-equiv=3DContent-Type/) {
	return 'mail with appended HTML documents';
    }
#;

2.6	reject if the domain of From: conflicts with Message-ID:

$DISTRIBUTE_FILTER_HOOK = q#
   local($domain) = (split(/@/, $From_address))[1];
   if ($e{'h:message-id:'} !~ /$domain/i) {
	return 'Message-Id conflicts your From: address';
   }
#;
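
In the hook above $domain is interpolated into the pattern as a regular
expression, so each "." in the domain matches any character. A slightly
stricter variant, offered only as a sketch, quotes the domain with
Perl's \Q...\E and skips the check when From: has no domain part. It
uses q{...} instead of q#...# so that # comments can appear inside the
hook. $From_address and %e are used exactly as in the example above.

```perl
$DISTRIBUTE_FILTER_HOOK = q{
    # domain part of the sender address, e.g. "example.org"
    my ($domain) = (split(/@/, $From_address))[1];

    # \Q...\E quotes regexp metacharacters such as "."
    if (defined $domain && $domain ne '' &&
        $e{'h:message-id:'} !~ /\Q$domain\E/i) {
        return 'Message-Id conflicts your From: address';
    }

    # nothing wrong found
    return '';
};
```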

2.7	aggregate the previous 3 rules

Example:
	* reject if a Received: line has "from spam.co.jp".
	* reject if http-equiv=3DContent-Type exists in the body.
	* reject if $From_address conflicts with Message-Id:'s domain.

$DISTRIBUTE_FILTER_HOOK = q#
    if ($e{'h:received:'} =~ /from spam.co.jp/) {
	return 'from a host in spam blacklist';
    }

    if ($e{'Body'} =~ /http-equiv=3DContent-Type/) {
	return 'mail with appended HTML documents';
    }

   local($domain) = (split(/@/, $From_address))[1];
   if ($e{'h:message-id:'} !~ /$domain/i) {
	return 'Message-Id conflicts your From: address';
   }

#;

2.8	rule combined with sendmail PICKY_HELO_CHECK

Example 2: When PICKY_HELO_CHECK is enabled in sendmail's config.h,
sendmail adds X-Authentication-Warning: to suspicious mail, and this
rule rejects mail carrying that field. However, this filter does not
work well, since mail from virtual domain users also matches even when
they are not spammers ;D

    # PICKY_HELO_CHECK
    if ($e{'h:x-authentication-warning:'} =~ /Host \S+ claimed to be \S+/) {
	$r = "Your SMTP session or your host config is invalid";
    }

2.9	size limit for the mail header

fml checks the size of the whole mail; it does not check the header
and the body separately.
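
If a limit on the header alone is wanted, a filter hook can check it.
The following is a minimal sketch, not a built-in fml feature. It
assumes that the raw header is available as $e{'Header'} (the
$Envelope{'Header'} used in 2.11), and the limit of 8192 bytes is a
hypothetical value chosen for illustration.

```perl
$DISTRIBUTE_FILTER_HOOK = q{
    # hypothetical limit in bytes; choose a value that suits your list
    my $max_header_size = 8192;

    # assumption: $e{'Header'} holds the whole raw header
    if (length($e{'Header'}) > $max_header_size) {
        return 'mail header is too large';
    }

    return '';
};
```

This assignment replaces any hook defined earlier in config.ph, so
merge the test into an existing hook if you already have one.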

2.10	To: or Cc: should have $MAIL_LIST !

fml-support: 06389
fml-support: 07286

If neither To: nor Cc: contains $MAIL_LIST, the mail is treated as spam!

$USE_DISTRIBUTE_FILTER = 1;
$DISTRIBUTE_FILTER_HOOK = q#
    if (($e{'h:to:'} !~ /$MAIL_LIST/i) && ($e{'h:cc:'} !~ /$MAIL_LIST/i)){
        return 'Not addressed to mailing list';
    }
#;

2.11	hook without using EnvelopeFilter

    # check attachment of VB script and others :-)
    $START_HOOK .= q#
        if ($Envelope{'Body'} =~ /Content.*\.(vbs|js|jse|exe)/i) {
    	my($savefile) = "$FP_TMP_DIR/badarticle.$PCurrentTime";
    	if (open(SAVE, "> $savefile")) {
    	    print SAVE $Envelope{'Header'};
    	    print SAVE "\n";
    	    print SAVE $Envelope{'Body'};
    	    close(SAVE);
    	    $Envelope{'Body'}  = 'WARNING: incoming mail is ignored ';
    	    $Envelope{'Body'} .= 'since it may be with virus.';
    	    $Envelope{'Body'} .= "\n\n";
    	    $Envelope{'Body'} .= "This article is saved in\n";
    	    $Envelope{'Body'} .= $savefile;
    	    $Envelope{'Body'} .= "\n\n";
    	    $Envelope{'Body'} .= "-- $MAIL_LIST maintainer\n";
    	}
    	else {
    	    &Log("system error: pass this article through");
    	}
        }
    #;

2.12	Duplicated Message-ID for mails from Notes

It is impossible to solve this. Fix your MTA.

2.13	Reject Mails From Specific Domains

Before the main routine runs, fml.pl can check for and reject mail from
the addresses and domains listed in $REJECT_ADDR_LIST ($DIR/spamlist by
default). Write entries such as

   \S+@spam.org (reject spammers?, mails from spam.org)
   manager@\S+  (reject not personal addresses)

fml also applies $REJECT_ADDR for rejection, irrespective of this file.
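
To see what such patterns match before putting them in the list, they
can be tried in plain Perl. This sketch assumes that each entry is
applied as a case insensitive regular expression anchored to the whole
address, which may differ from what fml actually does. The helper
is_rejected and the sample addresses are hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Patterns as they would appear in the list (annotations removed).
my @reject_patterns = ('\S+@spam.org', 'manager@\S+');

# Hypothetical helper (not part of fml): 1 if $address matches any pattern.
sub is_rejected {
    my ($address) = @_;
    for my $pattern (@reject_patterns) {
        # assumption: whole-address, case insensitive match
        return 1 if $address =~ /^(?:$pattern)$/i;
    }
    return 0;
}

# Print a verdict for each sample address.
for my $address ('someone@spam.org', 'manager@example.com', 'alice@example.com') {
    printf "%-22s %s\n", $address, is_rejected($address) ? 'reject' : 'pass';
}
```

Because "." is not escaped in "\S+@spam.org", the pattern matches any
character at that position; write "spam\.org" if only the literal
domain should match.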

2.14	ISP forces a signature onto your mail

Such an ISP is terrible, since it implies YOU HAVE NO SECRET: someone
can rewrite your mail.

The following hook cuts off the appended signature.

$SMTP_OPEN_HOOK = q#
	local($uja);
	for (split(/\n/, $e{'Body'})) {
	   next if /=+/ .. /\-+/;
	   $uja .= "$_\n";
	}

	$e{'Body'} = $uja;
#;
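
The range operator used in the hook above is easy to misread, so here
is the same loop as a standalone sketch. The helper cut_signature and
the sample body are hypothetical. The sample assumes the ISP signature
is framed by a line of "=" and a line of "-".

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper mirroring the loop in the hook above.
sub cut_signature {
    my ($body) = @_;
    my $kept = '';
    for (split(/\n/, $body)) {
        # true from a line containing "=" through the next line containing "-"
        next if /=+/ .. /\-+/;
        $kept .= "$_\n";
    }
    return $kept;
}

# Sample body: one line of text followed by a framed signature.
my $body = join("\n",
    'Hello world',
    '==========',
    'Sent via ExampleNet',
    '----------',
) . "\n";

print cut_signature($body);    # prints "Hello world"
```

Any body line containing "=" starts the cut, not only a signature
frame, so this rule can remove ordinary text as well.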

3	Q & A

3.1	Does fml misjudge a "one line body" mail?

In some cases fml takes a mail to have a one-line body when it does not.
This happens only with Japanese paragraphs.

References: fml-support: 08062

4	Internals

4.1	Filtering

	$USE_DISTRIBUTE_FILTER

enables filtering of mail to be distributed. This filter checks
incoming mail based on %Envelope data. You can set up your own hook
using %Envelope.

	$USE_DISTRIBUTE_FILTER

loads the following filter rules by default.

	* reject null body
	* reject mail whose body is one line of English words
	  e.g. "help", "unsubscribe"
	* reject invalid Message-Id mails (may be SPAM mails)
	* other strange syntaxes

   [options]

	$FILTER_ATTR_REJECT_COMMAND (default 0)

	reject '#unsubscribe' like commands

	$FILTER_ATTR_REJECT_2BYTES_COMMAND (default 0)

	* reject a line beginning with Japanese 2-byte English characters
	  e.g. 2-byte "unsubscribe"

You can use a hook to write your own more complicated filtering rules.
Note: in this hook you can refer to %Envelope as %e.

	$DISTRIBUTE_FILTER_HOOK (for post)

	$COMMAND_FILTER_HOOK (for command)

Write the hook as follows.

	* to reject, write
		return 'the reason of rejection';
	  where the reason is logged in $LOGFILE.
	* to accept, write
		return '';
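
Put together, a hook following this convention might look like the
sketch below. The Subject: test and the reason string are placeholders
chosen for illustration. q{...} is used so that # comments can be
written inside the hook.

```perl
$DISTRIBUTE_FILTER_HOOK = q{
    # reject: return a non-empty reason, which is logged in $LOGFILE
    if ($e{'h:subject:'} =~ /^\s*$/) {
        return 'empty subject';
    }

    # accept: return an empty string
    return '';
};
```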

By default fml.pl does not tell the sender (From: address) the reason
for rejection, since giving out less information is safer in one
sense. If you want to notify the sender of the rejection, set

	$FILTER_NOTIFY_REJECTION

fml-support: 08182

4.3	Reject if a field matches reject patterns

	&DEFINE_FIELD_PAT_TO_REJECT(field-name, regexp, reason)

Define a regular expression for a field. Please see the example below.
XXX "reason" is not implemented yet.

$START_HOOK = q#
   if ($USE_DISTRIBUTE_FILTER) {
        &EnvelopeFilter(*e, 'distribute');
        undef $DO_NOTHING;
    }
#;

5	ContentHandler

5.1	Filtering Rule Examples

* delete the html part of text/plain and text/html

  &ADD_CONTENT_HANDLER('multipart/.*', 'text/html', 'strip+notice');

* pass text/plain through,
  delete the html part of text/plain + text/html,
  delete everything else

  &ADD_CONTENT_HANDLER('multipart/.*', 'text/plain', 'allow');
  &ADD_CONTENT_HANDLER('multipart/.*', 'text/html', 'strip+notice');
  &ADD_CONTENT_HANDLER('multipart/.*', '.*/.*', 'strip');

5.2	description by the original author 

Author: t-nakano@marimo.org

# [Example]
# add them at the last of config.ph (but before the last "1;").
#

&ADD_CONTENT_HANDLER('multipart/.*', 'text/plain',   'allow');
&ADD_CONTENT_HANDLER('multipart/.*', '.*/.*',        'reject');
&ADD_CONTENT_HANDLER('text/plain',   '.*/.*',        'allow');
&ADD_CONTENT_HANDLER('!MIME',        '.*/.*',        'allow');

&ADD_CONTENT_HANDLER(type, subtype, action);
	type		MIME type of whole mail
	subtype		content type of each block
	action		action if the type matches

  &ADD_CONTENT_HANDLER('multipart/.*', 'text/plain',   'allow');
  pass only text/plain block in MIME multipart mail

  &ADD_CONTENT_HANDLER('multipart/.*', '.*/.*',        'reject');
  reject any mail with MIME multipart format

Filtering is based on the Content-Type: field of each MIME entity
block in the mail.

allow			permit distribution of this post
allow+multipart		permit the mail and leave the entity as it is

	allow+multipart passes the block through as it is, whereas allow
	disassembles the multipart

strip			strip this type entity block and distribute it
strip+notice		strip this type entity block (same as "strip")
			and also tell the sender "we strip the entity off".

reject			reject the whole mail if even one block matches
			the type

Filtering FAQ is

	http://www.ii.com/internet/faqs/launchers/mail/filtering-faq/

The plain text version is 

	ftp://rtfm.mit.edu/pub/usenet/news.answers/mail/filtering-faq

[procmail faq]

    Html and text version of pm-tips

	http://www.procmail.org/jari/pm-tips.html
	http://www.procmail.org/jari/pm-tips.txt

    Other procmail documents

        Era's excellent procmail pages (including procmail faq) are at:

	http://www.iki.fi/~era/procmail/links.html
	http://www.iki.fi/~era/procmail/mini-faq.html

6.2	MH slocal Interface

MH slocal is used by writing the following in ~/.forward:

	"|/usr/local/lib/mh/slocal -user username || exit 75"

Example (.maildelivery):
Mail with "To: username@domain (uja)" is injected into fml.pl.
All other mail is saved in /var/mail/fukachan.

#field   pattern   action  result  string
To       uja       |       R       "/fml-DIR/fml.pl /fml-DIR /fml-DIR"

# drop to the personal mail-spool
default  -         >       ?       /var/mail/fukachan

7	hacks

7.1	Cut off except the first multipart block

	$HTML_MAIL_DEFAULT_HANDLER = 'strip'; (default "")

Everything except the first multipart block is cut off if
1.	Content-Type: mime/multipart

	$HTML_MAIL_DEFAULT_HANDLER = 'strip'; (default strip)

where the value is 'strip' or 'reject'. If "strip", fml cuts off the
second and subsequent multipart blocks and distributes the mail to the
ML. If "reject", fml does not distribute the mail but tells the sender
that it "denies your html mail".

7.2	Against replied mails without In-Reply-To: nor References:

	$AGAINST_MAIL_WITHOUT_REFERENCE = 1; (default 0)

0. fml must be configured to put a subject tag.
1. add an ML specific Message-ID:
2. analyze the subject tag e.g. Subject: Re: [elena 00100]
In this mode, fml always emulates Message-ID: based on rule 2.
Hence fml can keep ML threads consistent even with
MUAs that omit these fields, e.g. Eudora...
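
Step 2 amounts to pulling the list name and the article number out of
the subject tag. The sketch below shows one way to do that in plain
Perl, assuming the tag always has the form [name number] as in the
example. parse_subject_tag is a hypothetical helper, not an fml
function, and fml's own tag formats may differ.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: returns (list name, article number), or an
# empty list if the subject carries no [name number] tag.
sub parse_subject_tag {
    my ($subject) = @_;
    if ($subject =~ /\[(\S+)\s+(\d+)\]/) {
        return ($1, $2);
    }
    return ();
}

my ($ml_name, $article_id) = parse_subject_tag('Re: [elena 00100] hello');
print "list=$ml_name id=$article_id\n" if defined $ml_name;
```

The article number is kept as a string so that leading zeros such as
"00100" are preserved.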

Appendix A	filtering of Postfix

Appendix A.1	body_checks directive

XXX: This feature is also available in Postfix snapshot 20000528.

Appendix A.2	header_checks directive

    [/etc/postfix/main.cf]
    header_checks = regexp:/etc/postfix/header_checks
    
    [/etc/postfix/header_checks]
    /^Subject.*ILOVEYOU/  REJECT

Appendix A.3	Content Filter

    Subject: Postfix Full Content Filtering Support
    To: postfix-users@postfix.org (Postfix users)
    Message-Id: <20000530222608.0EEB345659@spike.porcupine.org>
    Date: Tue, 30 May 2000 18:26:08 -0400 (EDT)
    From: wietse@porcupine.org (Wietse Venema)
    
          ..................................
          .            Postfix             .
       ------smtpd \                /local-----
          .         -cleanup->queue-       .
       -----pickup /    ^       |   \smtp------
          .             |       v          .
          .           smtpd    smtp        .
          .           10026     |          .
          ......................|...........
                        ^       |
                        |       v
                    ....|............
                    .   |     10025 .
                    .   inspector   .
                    .               .
                    .................
    
Appendix B	Sendmail specific topics

Appendix B.1	PerlMX

	http://www.ActiveState.com/Products/PerlMx/
    ActiveState releases PerlMx 1.0  
    
    PerlMx is a mail filter engine designed to operate under sendmail. 
    Beginning with v8.10, sendmail provides scriptable hooks into all 
    stages of an e-mail transaction, allowing custom actions on any 
    aspect of the transaction, including the content of the message 
    itself. PerlMx allows these hooks to be written in Perl, making for 
    rapid prototyping, debugging, and deployment of custom e-mail 
    filtering solutions.


		INDEX

.maildelivery                              ...   6.2 
$COMMAND_FILTER_HOOK                       ...   4.1 
&DEFINE_FIELD_PAT_TO_REJECT                ...   4.3 
$DISTRIBUTE_FILTER_HOOK                    ...   4.1 
$FILTER_ATTR_REJECT_2BYTES_COMMAND         ...   4.1 
$FILTER_ATTR_REJECT_COMMAND                ...   4.1 
$FILTER_NOTIFY_REJECTION                   ...   4.1 
$REJECT_ADDR                               ...   2.13 
$REJECT_ADDR_LIST                          ...   2.13 
$REJECT_DISTRIBUTE_FILTER_HOOK             ...   4.1 
signature                                  ...   2.14 
slocal                                     ...   6.2 
spam mails                                 ...   2.13 
$USE_DISTRIBUTE_FILTER                     ...   4.1 
