Yahoo Groups archive

Milter-greylist

My ultimate anti-spam setup (for now...)

2006-12-21 by reschauzier

After starting the anti-spam war a couple of months ago, I can now
safely say that I won the first battle (with the war of course still
going on...). Since my setup, which revolves largely around
milter-greylist, is both simple and effective, I wanted to share it
with my brothers in arms around the world.

First, the results. On my modest-scale mail server, I used to receive
about 400 - 500 spam messages per day (!). SpamAssassin alone, with
proper Bayes training, still let 5 - 10 of these through. This
may not sound like much, but it can be very annoying, especially for
quiet accounts, which quickly fill up with unwanted messages.

The full setup with milter-greylist has yet to let a single spam
message through in the past week. What's more, none of the legitimate
email messages have been delayed or otherwise touched during this
time, which was a prerequisite: delays on the order of hours on normal
messages are not acceptable to my users.

The setup:

Sendmail:
dnsrbl based on zen.spamhaus.org
milter-greylist (dynamic IPs only)

Content filtering:
MailScanner w/ Spamassassin

Very simple, as you can see. The real trick is to greylist _only_
messages from dynamic IP addresses, which account for >90% of the spam
delivered to my server. There is no need to include static IPs in the
greylisting, as it turns out that spam from static IPs is usually caught
quite effectively by the zen.spamhaus.org dnsrbl anyway.

Note that most mailers on the internet use static IP addresses,
so 99.9% of legitimate email passes without delay.

The configuration detects dynamic IP addresses by inspecting the
reverse DNS entry of the mailer, in combination with the DUHL-SORBS
blackhole list of dynamic IP addresses. This combination is needed, as
DUHL-SORBS by itself seems to miss quite a few IPs.

To examine the name of the connecting mailer, I use the
client name that Sendmail passes to milter-greylist. This is the
reverse DNS resolution of the connecting IP address. It turns out that if
the IP address is from a dynamic pool (cable, dial-up or DSL), this
will usually show clearly in the name. Telltale signs are the
words cable, dial, dsl, etc. in the rDNS name, but also IP number
combinations such as 39-185-34-2 or 45.67.231.51 and twelve-digit
numbers (the decimal IP address without separators).
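
As a rough illustration of these heuristics, here is a minimal Python sketch. It is not how milter-greylist itself evaluates the list: Python's re module only stands in for the libc regex engine, the keyword set mirrors the "dynamic" list in the config further down, and the hostnames and the function name looks_dynamic are made up for the example.

```python
import re

# Patterns modelled on the "dynamic" list from the config in this message.
# Python's re is only a stand-in for the POSIX regexes milter-greylist uses.
DYNAMIC_PATTERNS = [
    # keywords that typically appear in names of dynamic pools
    re.compile(r"dsl|dhcp|cable|dial|pool|dyn|ppp|catv", re.IGNORECASE),
    # four digit groups joined by the same non-digit separator
    re.compile(r"[0-9]+([^0-9])[0-9]+\1[0-9]+\1[0-9]+"),
    # twelve or more consecutive digits (IP address without separators)
    re.compile(r"[0-9]{12,}"),
]


def looks_dynamic(rdns_name: str) -> bool:
    """Return True if any pattern matches somewhere in the rDNS name."""
    return any(p.search(rdns_name) for p in DYNAMIC_PATTERNS)


# Hypothetical hostnames, for illustration only.
for name in [
    "39-185-34-2.cable.example.net",
    "host045067231051.example.org",
    "mx1.example.com",
]:
    print(name, looks_dynamic(name))
```

The first two names are flagged and the third is not, so a plainly named mail exchanger passes without delay.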

The combination of DUHL-SORBS and rDNS matching is amazingly
effective. Almost no dynamic addresses go unidentified, with very
few false positives. And even if a false positive occurs, the
corresponding message still comes through, albeit with a 1 hour delay.

Needless to say, my users are very happy, as am I, being able to take
some shore leave ;)

See below for my /etc/grey.conf file:

######################################################################
#
# Greylisting config file
#

# Do not tell spammer how long they have to wait
quiet

# Greylisting your own MTA is a very bad idea: never
# comment this line, except for testing purposes.
acl whitelist addr 127.0.0.0/8

# If you use IPv6, uncomment this.
#acl whitelist addr ::1/128

# You will want to avoid greylisting your own clients
# as well, by filtering out your IP address blocks.
acl whitelist addr 192.168.1.0/24

# Use extended regular expressions instead of basic
# regular expressions.
extendedregex

# Greylist sender machines whose reverse DNS name
# suggests a dynamic address pool.
list "dynamic" domain {		\
	/dsl/			\
	/DSL/			\
	/dhcp/			\
	/DHCP/			\
	/cable/			\
	/CABLE/			\
	/dial/			\
	/DIAL/			\
	/pool/			\
	/POOL/			\
	/dyn/			\
	/DYN/			\
	/ppp/			\
	/PPP/			\
	/catv/			\
	/CATV/			\
	/[0-9]+([^0-9])[0-9]+\1[0-9]+\1[0-9]+/	\
	/[0-9]{12,}/		\
}
acl greylist list "dynamic"

# Use dnsrbl for greylisting
dnsrbl "SORBS DUN" dnsbl.sorbs.net 127.0.0.10
acl greylist dnsrbl "SORBS DUN"

# How often should we dump to the dumpfile (0: on each change, -1: never).
dumpfreq 1d

#
# All of the following options have command-line equivalents.
# See greylist.conf(5) for the exact equivalences.
#

# How long a client has to wait before we accept
# the messages it retries to send. Here, 1 hour.
# May be overridden by the "-w greylist_delay" command line argument.
greylist 1h

# How long does auto-whitelisting last (set it to 0
# to disable auto-whitelisting). Here, 30 days.
# May be overridden by the "-a autowhite_delay" command line argument.
autowhite 30d

# Specify the netmask to be used when checking IPv4 addresses
# in the greylist.
# May be overridden by the "-L cidrmask" command line argument.
subnetmatch /24

# Specify the netmask to be used when checking IPv6 addresses
# in the greylist.
# May be overridden by the "-M prefixlen" command line argument.
#subnetmatch6 /64

# You can specify a file where milter-greylist will
# store its PID.
# May be overridden by the "-P pidfile" command line argument.
#pidfile "/var/run/milter-greylist.pid"

# You can specify the socket file used to communicate
# with sendmail.
# May be overridden by the "-p socket" command line argument.
socket "/var/run/milter-greylist/milter-greylist.sock"

# The dumpfile location.
# May be overridden by the "-d dumpfile" command line argument.
#dumpfile "/var/lib/milter-greylist/db/greylist.db"

# The user the milter should run as.
# May be overridden by the "-u username" command line argument.
user "grmilter"

# Make sure we allow anybody else
acl whitelist default
######################################################################
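
For readers unfamiliar with the dnsrbl line in the config above: a DNSBL check is conventionally done by reversing the octets of the client's IPv4 address, appending the list's zone, and looking up the resulting name. The sketch below only builds that name and does no network lookup. The function name and the sample address are assumptions for illustration; the zone and the 127.0.0.10 answer come from the config.

```python
import ipaddress


def dnsbl_query_name(ip: str, zone: str) -> str:
    """Build the DNS name a DNSBL client would look up for an IPv4 address."""
    # Validate the address, then reverse its four octets.
    octets = str(ipaddress.IPv4Address(ip)).split(".")
    return ".".join(reversed(octets)) + "." + zone


# 192.0.2.10 is a documentation address, used purely as an example.
print(dnsbl_query_name("192.0.2.10", "dnsbl.sorbs.net"))
```

With the config above, a client would be greylisted when the A record lookup for that name returns 127.0.0.10.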

Re: My ultimate anti-spam setup (for now...)

2006-12-21 by robert_schmidli

> 	/CATV/			\
> 	/[0-9]+([^0-9])[0-9]+\1[0-9]+\1[0-9]+/	\
> 	/[0-9]{12,}/		\
> }
> acl greylist list "dynamic"

Thanks.  I'll be interested to hear if you change anything.  One
problem - I get the following error message:

bad regular expression "[0-9]+([^0-9])[0-9]+\1[0-9]+\1[0-9]+": Invalid
back reference.

I've had to delete the offending line.

Re: [milter-greylist] Re: My ultimate anti-spam setup (for now...)

2006-12-21 by Matt Kettler

robert_schmidli wrote:
>> 	/CATV/			\
>> 	/[0-9]+([^0-9])[0-9]+\1[0-9]+\1[0-9]+/	\
>> 	/[0-9]{12,}/		\
>> }
>> acl greylist list "dynamic"
> 
> Thanks.  I'll be interested to hear if you change anything.  One
> problem - I get the following error message:
> 
> bad regular expression "[0-9]+([^0-9])[0-9]+\1[0-9]+\1[0-9]+": Invalid
> back reference.
> 
> I've had to delete the offending line.

Hmm, at casual glance it should be valid.

That said, the characters matched are really likely to be . or -, so it
seems inefficient to use back references there anyway.

Personally I'd use one of these instead:

/[0-9]{1,3}[-._][0-9]{1,3}[-._][0-9]{1,3}[-._][0-9]{1,3}[-._]/

/[0-9]+[-._][0-9]+[-._][0-9]+[-._][0-9]+[-._]/

The primary difference is that the first one won't match sequences involving more
than three digits at a time, but the second one will
(i.e., 123-123-123-1234.example.com will not match the first one, but will match
the second).
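
A quick way to check that claim is the Python sketch below. Python's re is used as a stand-in for the libc regex engine, and the names BOUNDED, UNBOUNDED and hits are made up for the example.

```python
import re

# The two alternatives from this message, unchanged.
BOUNDED = re.compile(
    r"[0-9]{1,3}[-._][0-9]{1,3}[-._][0-9]{1,3}[-._][0-9]{1,3}[-._]"
)
UNBOUNDED = re.compile(r"[0-9]+[-._][0-9]+[-._][0-9]+[-._][0-9]+[-._]")


def hits(name: str) -> tuple:
    """Return (bounded_matches, unbounded_matches) for a host name."""
    return (bool(BOUNDED.search(name)), bool(UNBOUNDED.search(name)))


print(hits("123-123-123-1234.example.com"))
print(hits("123-123-123-123.example.com"))
```

Note that both patterns end in a separator, so they only match when something follows the fourth group, as is the case in a full host name.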

Re: My ultimate anti-spam setup (for now...)

2006-12-22 by Jake Di Toro

On Thu, Dec 21, 2006 at 03:35:25PM -0500, Matt Kettler wrote:
> That said, the the characters matched are really likely to be . or -, so it
> seems inefficient to use back references there anyway.
> 
> Personally I'd use one of these instead:
> 
> /[0-9]{1,3}[-._][0-9]{1,3}[-._][0-9]{1,3}[-._][0-9]{1,3}[-._]/
> 
> /[0-9]+[-._][0-9]+[-._][0-9]+[-._][0-9]+[-._]/
> 
> The primary difference being the first one won't match sequences involving more
> than three numbers at a time, but the second one will.
> (ie: 123-123-123-1234.example.com will not match the first one, but will match
> the second.)

I think the theory behind the use of backreferences is that the
backreference will only pick up the following:

123.132.123.123
123-123-123-123
123_123_123_123

where yours would pick up

123.123-123_123

which is unlikely to occur, and if it did, matching it probably
wouldn't be desired.  Then again, you might want to match it as well.

-- 
Till Later,
Jake <karrde@...>
http://www.viluppo.net/

Re: My ultimate anti-spam setup (for now...)

2006-12-22 by reschauzier

--- In milter-greylist@yahoogroups.com, "robert_schmidli"
<robert_s@...> wrote:

> problem - I get the following error message:
> 
> bad regular expression "[0-9]+([^0-9])[0-9]+\1[0-9]+\1[0-9]+": Invalid
> back reference.
> 
> I've had to delete the offending line.
>

What version of milter-greylist are you running?

Re: My ultimate anti-spam setup (for now...)

2006-12-22 by reschauzier

--- In milter-greylist@yahoogroups.com, Matt Kettler <mkettler@...> wrote:
>
> robert_schmidli wrote:
> >> 	/CATV/			\
> >> 	/[0-9]+([^0-9])[0-9]+\1[0-9]+\1[0-9]+/	\
> >> 	/[0-9]{12,}/		\
> >> }
> >> acl greylist list "dynamic"
> > 
> > Thanks.  I'll be interested to hear if you change anything.  One
> > problem - I get the following error message:
> > 
> > bad regular expression "[0-9]+([^0-9])[0-9]+\1[0-9]+\1[0-9]+": Invalid
> > back reference.
> > 
> > I've had to delete the offending line.
> 
> Hmm, at casual glance it should be valid.
> 
> That said, the the characters matched are really likely to be . or -, so it
> seems inefficient to use back references there anyway.
> 
> Personally I'd use one of these instead:
> 
> /[0-9]{1,3}[-._][0-9]{1,3}[-._][0-9]{1,3}[-._][0-9]{1,3}[-._]/
> 
> /[0-9]+[-._][0-9]+[-._][0-9]+[-._][0-9]+[-._]/
> 
> The primary difference being the first one won't match sequences involving more
> than three numbers at a time, but the second one will.
> (ie: 123-123-123-1234.example.com will not match the first one, but will match
> the second.)
>

The reason for the backreferences is names like 123x45x67x89, which do
occur. At the same time, I don't want to match 123a45b67c89, for
example. The regex as is seems to be very good at finding dynamic IPs,
while giving very few false positives.
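
The trade-off can be seen side by side in the Python sketch below. Python's re stands in for the libc regex engine, the host names are invented, and BACKREF and CHARCLASS are just labels for the original pattern and the second character-class alternative suggested earlier in the thread.

```python
import re

# Original pattern: any non-digit separator, but it must repeat unchanged.
BACKREF = re.compile(r"[0-9]+([^0-9])[0-9]+\1[0-9]+\1[0-9]+")
# Character-class alternative: only - . _ as separators, mixing allowed.
CHARCLASS = re.compile(r"[0-9]+[-._][0-9]+[-._][0-9]+[-._][0-9]+[-._]")

# Hypothetical host names, for illustration only.
NAMES = [
    "123x45x67x89.example.net",     # repeated 'x' separator
    "123a45b67c89.example.net",     # three different letters
    "123.123-123_123.example.net",  # mixed punctuation
]

for name in NAMES:
    print(name, bool(BACKREF.search(name)), bool(CHARCLASS.search(name)))
```

Only the backreference version catches the x-separated name, and only the character-class version catches the mixed-punctuation one. Neither matches 123a45b67c89.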

Re: My ultimate anti-spam setup (for now...)

2006-12-22 by robert_schmidli

> > bad regular expression "[0-9]+([^0-9])[0-9]+\1[0-9]+\1[0-9]+": Invalid
> > back reference.
> 
> What version of milter-greylist are you running?
>

I'm running 3.0 on gentoo linux.

Re: [milter-greylist] Re: My ultimate anti-spam setup (for now...)

2006-12-22 by Emmanuel Dreyfus

On Fri, Dec 22, 2006 at 08:31:56AM -0000, reschauzier wrote:
> > bad regular expression "[0-9]+([^0-9])[0-9]+\1[0-9]+\1[0-9]+": Invalid
> > back reference.
> > I've had to delete the offending line.
> What version of milter-greylist are you running?

regex code has not changed since 2.0 beta3: 21 months ago.

-- 
Emmanuel Dreyfus
manu@...

Re: [milter-greylist] Re: My ultimate anti-spam setup (for now...)

2006-12-22 by Oliver Fromme

reschauzier wrote:
 > robert_schmidli wrote:
 > > problem - I get the following error message:
 > >  
 > > bad regular expression "[0-9]+([^0-9])[0-9]+\1[0-9]+\1[0-9]+":
 > > Invalid back reference.
 > >  
 > > I've had to delete the offending line.
 > 
 > What version of milter-greylist are you running?

I think the correct question to ask would be: What OS are
you running?  :-)

Milter-greylist uses the regex functions from the libc of
the OS.  So if those functions are broken (or use a syntax
different from other systems), then that will also affect
milter-greylist's usage of regular expressions.

To the OP, I suggest that you have a look at the manual
page of the regex functions of your OS (regcomp(3) etc.),
and look for hints regarding back references.  For example,
back-references aren't supported at all on Solaris, as far
as I can tell.  However, your error message indicates that
they _are_ supported for you, but the syntax seems to be
different.

One thing you could try is to enclose the parenthesized
sub-expression in backslashed parentheses instead, i.e.
"[0-9]+\([^0-9]\)[0-9]...".  Some regular expression
parsers require it that way.

By the way, someone suggested [-._] for the separator.
Note that underscores are illegal in domain names, so
simply [-.] will suffice.  Mail from domains containing
underscores should be rejected or blacklisted right away;
there's no need to greylist them, because they certainly
don't come from legal mail servers.

It might also be an option to make the whole separator
optional, i.e. "[-.]?" so it will match in those cases
where the IP is encoded as a single stream of digits.
(Of course you can use a second regular expression for
that, but the less of them the better.  The "?" matching
costs nearly nothing in this case, performance-wise.)

Furthermore, a lot of internet access providers use the
hexadecimal IP (instead of decimal) for the reverse look-
up of their pools.  So it's probably a good idea to add
a similar regular expression that matches IP addresses
expressed in hexadecimal, for example:

[0-9a-f]{2,2}[-.]?[0-9a-f]{2,2}[-.]?[0-9a-f]{2,2}[-.]?[0-9a-f]{2,2}

(Note that milter-greylist uses case-insensitive matches,
so it's not necessary to say "[0-9A-Fa-f]".)

That will match things like 5f-8b-23-cd.dsl.example.com,
5f8b23cd.cable.foo.net, and even 5f-8b.23-cd.dyn.bar.org
or 5f8b-23cd.pool.baz.biz.
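
The four separator layouts listed above can be checked with a short Python sketch. Python's re with IGNORECASE is only a stand-in for the case-insensitive libc matching described in this message, and the name HEX_IP is made up for the example.

```python
import re

# The hexadecimal pattern from this message; IGNORECASE mimics the
# case-insensitive matching described above.
HEX_IP = re.compile(
    r"[0-9a-f]{2,2}[-.]?[0-9a-f]{2,2}[-.]?[0-9a-f]{2,2}[-.]?[0-9a-f]{2,2}",
    re.IGNORECASE,
)

EXAMPLES = [
    "5f-8b-23-cd.dsl.example.com",  # separator between every pair
    "5f8b23cd.cable.foo.net",       # no separators at all
    "5f-8b.23-cd.dyn.bar.org",      # mixed separators
    "5f8b-23cd.pool.baz.biz",       # a single separator in the middle
]

for name in EXAMPLES:
    print(name, bool(HEX_IP.search(name)))
```

Because each separator is independently optional, all four layouts match, as does an upper-case variant such as 5F-8B-23-CD.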

Best regards
   Oliver

-- 
Oliver Fromme,  secnetix GmbH & Co. KG, Marktplatz 29, 85567 Grafing
Dienstleistungen mit Schwerpunkt FreeBSD: http://www.secnetix.de/bsd
Any opinions expressed in this message may be personal to the author
and may not necessarily reflect the opinions of secnetix in any way.

"With sufficient thrust, pigs fly just fine.  However, this
is not necessarily a good idea.  It is hard to be sure where
they are going to land, and it could be dangerous sitting
under them as they fly overhead." -- RFC 1925

RE: [milter-greylist] Re: My ultimate anti-spam setup (for now...)

2006-12-22 by attila.bruncsak@itu.int

> > 	/CATV/			\
> > 	/[0-9]+([^0-9])[0-9]+\1[0-9]+\1[0-9]+/	\
> > 	/[0-9]{12,}/		\
> > }
> > acl greylist list "dynamic"
> 
> Thanks.  I'll be interested to hear if you change anything.  One
> problem - I get the following error message:
> 
> bad regular expression "[0-9]+([^0-9])[0-9]+\1[0-9]+\1[0-9]+": Invalid
> back reference.
> 
> I've had to delete the offending line.
> 
That might be something to do with the local regexp library, since
I am sure you did not forget to switch on the extendedregex option.

Bests,
Attila

Re: My ultimate anti-spam setup (for now...)

2006-12-22 by reschauzier

--- In milter-greylist@yahoogroups.com, <attila.bruncsak@...> wrote:

> That might be something to do with the local regexp library, since
> I am sure you did not forget the switch on the extendedregex option.

Make sure you put the extendedregex line _before_ the lines defining
the regular expressions. It turns out the extendedregex keyword is
location sensitive, as I found out the hard way :)

Re: My ultimate anti-spam setup (for now...)

2006-12-22 by robert_schmidli

> That might be something to do with the local regexp library, since
> I am sure you did not forget the switch on the extendedregex option.

Duh.  I didn't have extendedregex on.  I have got these regexes now:

/[0-9]+([^0-9])[0-9]+\1[0-9]+\1[0-9]+/
/[0-9a-f]{2,2}([-.])?[0-9a-f]{2,2}\1?[0-9a-f]{2,2}\1?[0-9a-f]{2,2}/
/[0-9]{12,}/

I don't quite know how to test these, except that milter-greylist now
starts with no problems.  Perhaps somebody could eyeball them for me
and let me know if there's an obvious problem.  From the command line,
the following works:

$ egrep "[0-9a-f]{2,2}([-.])?[0-9a-f]{2,2}\1?[0-9a-f]{2,2}\1?[0-9a-f]{2,2}" testregex.txt
5f-8b-23-cd
253928028762
25392802876
2539280287


..where testregex.txt contains a number of strings.  It also picks up
9 and 10-digit strings with no separators - which may not be correct.
 I suspect that the third regex listed above is redundant.  I've
modified Oliver's suggestion - from what people are saying, it
sounds like backreferences are the way to go.
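
The suspicion about redundancy can be checked with a small Python sketch. Python's re is used here instead of egrep and may differ from the libc engine in corner cases. The 9 and 7 digit sample strings are added for illustration and are not from testregex.txt.

```python
import re

# The hex pattern with an optional, back-referenced separator.
HEX_BACKREF = re.compile(
    r"[0-9a-f]{2,2}([-.])?[0-9a-f]{2,2}\1?[0-9a-f]{2,2}\1?[0-9a-f]{2,2}",
    re.IGNORECASE,
)
# The separate "twelve or more digits" pattern.
TWELVE_DIGITS = re.compile(r"[0-9]{12,}")

SAMPLES = [
    "5f-8b-23-cd",
    "253928028762",  # 12 digits
    "25392802876",   # 11 digits
    "2539280287",    # 10 digits
    "253928028",     # 9 digits
    "2539280",       # 7 digits
]

for s in SAMPLES:
    print(s, bool(HEX_BACKREF.search(s)), bool(TWELVE_DIGITS.search(s)))
```

Any run of eight or more decimal digits already satisfies the hex pattern with no separators, so everything matched by /[0-9]{12,}/ is matched by the hex pattern too. The flip side is that the hex pattern also fires on shorter digit runs, such as serial numbers in host names.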

Re: My ultimate anti-spam setup (for now...)

2006-12-22 by reschauzier

--- In milter-greylist@yahoogroups.com, Oliver Fromme <olli@...> wrote:

>  > 
>  > What version of milter-greylist are you running?
> 
> I think the correct question to ask would be: What OS are
> you running?  :-)
> 
> Milter-greylist uses the regex functions from the libc of
> the OS.  So if those functions are broken (or use a syntax
> different from other systems), then that will also affect
> milter-greylist's usage of regular expressions.

Good point, thank you!

> 
> Furthermore, a lot of internet access providers use the
> hexadecimal IP (instead of decimal) for the reverse look-
> up of their pools.  So it's probably a good idea to add
> a similar regular expression that matches IP addresses
> expressed in hexadecimal, for example:
> 
> [0-9a-f]{2,2}[-.]?[0-9a-f]{2,2}[-.]?[0-9a-f]{2,2}[-.]?[0-9a-f]{2,2}
> 
> (Note that milter-greylist uses case-insensitive matches,
> so it's not necessary to say "[0-9A-Za-z]".)
> 
> That will match things like 5f-8b-23-cd.dsl.example.com,
> 5f8b23cd.cable.foo.net, and even 5f-8b.23-cd.dyn.bar.org
> or 5f8b-23cd.pool.baz.biz.

Unfortunately, it will also match web307045.mail.mud.yahoo.com, which
is a very valid mailer. In order to reliably detect hex addresses
without separators, you'd need two passes of regexes: the first to
identify a string of 8 hex numbers, and then a second one to make sure
there is at least one non-decimal number in that string. I don't think
this is possible with milter-greylist at this time.

The good news is that this combination is not very common, and leaving
it out of the regex does not significantly affect the hit rate.
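
To see where the false positive comes from, the Python sketch below prints the part of the name that the hex pattern actually matches. Python's re stands in for the libc engine, and matched_part is a made-up helper name.

```python
import re

HEX_IP = re.compile(
    r"[0-9a-f]{2,2}[-.]?[0-9a-f]{2,2}[-.]?[0-9a-f]{2,2}[-.]?[0-9a-f]{2,2}",
    re.IGNORECASE,
)


def matched_part(name: str):
    """Return the substring the hex pattern matches, or None."""
    m = HEX_IP.search(name)
    return m.group(0) if m else None


print(matched_part("web307045.mail.mud.yahoo.com"))
print(matched_part("5f8b23cd.cable.foo.net"))
```

For the Yahoo name the match is eb307045: the pattern is unanchored, so it picks up the trailing "eb" of "web" as two hex digits and continues into the serial number.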

Re: [milter-greylist] Re: My ultimate anti-spam setup (for now...)

2006-12-22 by Oliver Fromme

reschauzier wrote:
 > Oliver Fromme wrote:
 > > Furthermore, a lot of internet access providers use the
 > > hexadecimal IP (instead of decimal) for the reverse look-
 > > up of their pools.  So it's probably a good idea to add
 > > a similar regular expression that matches IP addresses
 > > expressed in hexadecimal, for example:
 > > 
 > > [0-9a-f]{2,2}[-.]?[0-9a-f]{2,2}[-.]?[0-9a-f]{2,2}[-.]?[0-9a-f]{2,2}
 > > 
 > > (Note that milter-greylist uses case-insensitive matches,
 > > so it's not necessary to say "[0-9A-Za-z]".)
 > > 
 > > That will match things like 5f-8b-23-cd.dsl.example.com,
 > > 5f8b23cd.cable.foo.net, and even 5f-8b.23-cd.dyn.bar.org
 > > or 5f8b-23cd.pool.baz.biz.
 > 
 > Unfortunately, it will also match web307045.mail.mud.yahoo.com, which
 > is a very valid mailer.

Is it?

$ host web307045.mail.mud.yahoo.com
Host web307045.mail.mud.yahoo.com not found: 3(NXDOMAIN)

I'm also not too worried about greylisting Yahoo, but
that's another story.  :-)

 > In order to reliably detect hex addresses
 > without separators, you'd need two passes of regexes: the first to
 > identify a string of 8 hex numbers, and then a second one to make sure
 > there is at least one non-decimal number in that string.

I don't quite understand what you mean, could you please
explain?  What do you mean, "at least one non-decimal
number", and how does it apply to your example "web307045"?

I think it would make sense to whitelist hosts that contain
the word "mail" somewhere in the name, e.g. /mail.*\..*/.
That whitelist entry should be placed before the greylist
entry for decimal/hexadecimal matching of dynamic address
pools, so it is checked first for a match.  It would help
in the case of "web307045.mail.mud.yahoo.com".  Usually
the names of dynamic address pools don't contain the word
"mail".

 > I don't think
 > this is possible with milter-greylist at this time.

I think a lot of things are possible with milter-greylist.
:-)

Best regards
   Oliver

-- 
Oliver Fromme,  secnetix GmbH & Co. KG, Marktplatz 29, 85567 Grafing
Dienstleistungen mit Schwerpunkt FreeBSD: http://www.secnetix.de/bsd
Any opinions expressed in this message may be personal to the author
and may not necessarily reflect the opinions of secnetix in any way.

PI:
int f[9814],b,c=9814,g,i;long a=1e4,d,e,h;
main(){for(;b=c,c-=14;i=printf("%04d",e+d/a),e=d%a)
while(g=--b*2)d=h*b+a*(i?f[b]:a/5),h=d/--g,f[b]=d%g;}

Re: [milter-greylist] Re: My ultimate anti-spam setup (for now...)

2006-12-22 by Matt Kettler

Jake Di Toro wrote:
> On Thu, Dec 21, 2006 at 03:35:25PM -0500, Matt Kettler wrote:
>> That said, the the characters matched are really likely to be . or -, so it
>> seems inefficient to use back references there anyway.
>>
>> Personally I'd use one of these instead:
>>
>> /[0-9]{1,3}[-._][0-9]{1,3}[-._][0-9]{1,3}[-._][0-9]{1,3}[-._]/
>>
>> /[0-9]+[-._][0-9]+[-._][0-9]+[-._][0-9]+[-._]/
>>
>> The primary difference being the first one won't match sequences involving more
>> than three numbers at a time, but the second one will.
>> (ie: 123-123-123-1234.example.com will not match the first one, but will match
>> the second.)
> 
> I think the theroy behind the use of backreferences is, the
> backrefrence will only pick up the following:
> 
> 123.132.123.123
> 123-123-123-123
> 123_123_123_123
> 
> where yours would pickup
> 
> 123.123-123_123
> 
> which, while unlikely to occur, if it did, probably wouldn't be
> desired.  Though then again you might want to as well.
> 

True, the back reference will also pick up:

1a4443K23_1

Which may or may not be desirable.

IMHO, the extra overhead of using the back-references isn't worth it in this
application. But that's just my subjective opinion based on my own experiences
in spam analysis.

Re: My ultimate anti-spam setup (for now...)

2006-12-22 by reschauzier

--- In milter-greylist@yahoogroups.com, Oliver Fromme <olli@...> wrote:

> 
>  > In order to reliably detect hex addresses
>  > without separators, you'd need two passes of regexes: the first to
>  > identify a string of 8 hex numbers, and then a second one to make sure
>  > there is at least one non-decimal number in that string.
> 
> I don't quite understand what you mean, could you please
> explain?  What do you mean, "at least one non-decimal
> number", and how does it apply to your example "web307045"?

Let's clarify with an example. 123456 is a valid hex number. From the
looks of it, however, there is no way to tell whether this is just a
6-digit decimal number (these are quite common in mailer names,
especially for big mailer farms) or a true hex number. The only way
to tell a 6-digit hex number from a 6-digit decimal number is when it
includes at least one non-decimal hex digit ([a-f]), e.g. 123d56 or 1f3d12.
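
The two-pass idea can be sketched in Python as below. This is only an illustration of the logic, not something milter-greylist's config can express as written. The function name is_clearly_hex and its width parameter are assumptions, and the token is assumed to have been isolated from the host name already.

```python
import re


def is_clearly_hex(token: str, width: int = 8) -> bool:
    """Two-pass check on an isolated token.

    Pass 1: the token is exactly `width` hex digits.
    Pass 2: at least one of them is a-f, so it cannot be plain decimal.
    """
    hex_run = re.fullmatch(r"[0-9a-f]{%d}" % width, token, re.IGNORECASE)
    has_letter = re.search(r"[a-f]", token, re.IGNORECASE)
    return bool(hex_run and has_letter)


# The 6-digit examples from the explanation above.
for token in ["123456", "123d56", "1f3d12"]:
    print(token, is_clearly_hex(token, width=6))
```

Only 123d56 and 1f3d12 pass; 123456 is rejected because nothing distinguishes it from a decimal serial number.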

> I think it would make sense to whitelist hosts that contain
> the word "mail" somewhere in the name, e.g. /mail.*\..*/.
> That whitelist entry should be placed before the greylist
> entry for decimal/hexadecimal matching of dynamic address
> pools, so it is checked first for a match.  It would help
> in the case of "web307045.mail.mud.yahoo.com".  Usually
> the names of dynamic address pools don't contain the word
> "mail".

Ah, that is a very good idea; I think this will prevent some false
positives, even though I haven't seen many so far.
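
The ordering idea quoted above (try the "mail" whitelist entry before the dynamic-pool patterns) can be sketched in Python as a first-match-wins rule list. This is not milter-greylist syntax: the RULES table, the decide function and the "default" outcome are invented for the illustration, and Python's re stands in for the libc regex engine.

```python
import re

# Ordered rules: the first pattern that matches decides the outcome.
RULES = [
    # whitelist names that contain "mail" followed by at least one dot
    ("whitelist", re.compile(r"mail.*\..*", re.IGNORECASE)),
    # greylist names that look like a hex-encoded IP address
    ("greylist", re.compile(
        r"[0-9a-f]{2}[-.]?[0-9a-f]{2}[-.]?[0-9a-f]{2}[-.]?[0-9a-f]{2}",
        re.IGNORECASE,
    )),
]


def decide(rdns_name: str, default: str = "default") -> str:
    """Return the action of the first matching rule, else the default."""
    for action, pattern in RULES:
        if pattern.search(rdns_name):
            return action
    return default


for name in [
    "web307045.mail.mud.yahoo.com",
    "5f8b23cd.cable.foo.net",
    "mx.example.org",
]:
    print(name, decide(name))
```

With the whitelist rule first, the Yahoo name is let through even though the hex pattern would also match it. Swapping the two rules would greylist it again.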

Re: My ultimate anti-spam setup (for now...)

2006-12-22 by reschauzier

--- In milter-greylist@yahoogroups.com, Matt Kettler <mkettler@...> wrote:

> 
> True, the back reference will also pick up:
> 
> 1a4443K23_1
> 
> Which may or may not be desirable.

I must be missing something here; how does

[0-9]+([^0-9])[0-9]+\1[0-9]+\1[0-9]+

pick up

1a4443K23_1

?

(See also http://www.fileformat.info/tool/regex.htm:

Test            1
Target String   1a4443K23_1
matches()       No
replaceFirst()  1a4443K23_1
replaceAll()    1a4443K23_1
lookingAt()     No
find()          No)


>

Re: [milter-greylist] Re: My ultimate anti-spam setup (for now...)

2006-12-22 by Matt Kettler

reschauzier wrote:
> --- In milter-greylist@yahoogroups.com, Matt Kettler <mkettler@...> wrote:
> 
>> True, the back reference will also pick up:
>>
>> 1a4443K23_1
>>
>> Which may or may not be desirable.
> 
> I must be missing something here; how does
> 
> [0-9]+([^0-9])[0-9]+\1[0-9]+\1[0-9]+
> 
> pick up
> 
> 1a4443K23_1

Sorry, you're right. Because of the back-reference, the non-digits must all be
the same. That said, they can be _anything_ that isn't a digit, and the numeric
sections can be _any_ length.

Better examples are:
1a4443a23a1
111111111111p2p3p4
2Q345245234Q2342313Q4

Re: [milter-greylist] Re: My ultimate anti-spam setup (for now...)

2006-12-22 by Fabien Tassin

According to reschauzier:
> > 
> > True, the back reference will also pick up:
> > 
> > 1a4443K23_1
> > 
> > Which may or may not be desirable.
> 
> I must be missing something here; how does
> 
> [0-9]+([^0-9])[0-9]+\1[0-9]+\1[0-9]+
> 
> pick up
> 
> 1a4443K23_1
> 
> ?

it cannot.
\1 refers to the pattern matched (so to 'a'), not to the pattern to match (not to '[^0-9]' so neither 'K' nor '_' could match).

e.g. with perl:

$ perl -e 'print "1a4443K23_1" =~ m/[0-9]+([^0-9])[0-9]+\1[0-9]+\1[0-9]+/ ? "ok\n" : "nok\n";'
nok
$ perl -e 'print "1a4443a2a31" =~ m/[0-9]+([^0-9])[0-9]+\1[0-9]+\1[0-9]+/ ? "ok\n" : "nok\n";'
ok

explained also by this :

$ perl -e 'use re "debug"; print "1a4443K23_1" =~ m/[0-9]+([^0-9])[0-9]+\1[0-9]+\1[0-9]+/ ? "ok\n" : "nok\n";'
Compiling REx `[0-9]+([^0-9])[0-9]+\1[0-9]+\1[0-9]+'
size 68 Got 548 bytes for offset annotations.
first at 2
   1: PLUS(13)
   2:   ANYOF[0-9](0)
  13: OPEN1(15)
  15:   ANYOF[\0-/:-\377{unicode_all}](26)
  26: CLOSE1(28)
  28: PLUS(40)
  29:   ANYOF[0-9](0)
  40: REF1(42)
  42: PLUS(54)
  43:   ANYOF[0-9](0)
  54: REF1(56)
  56: PLUS(68)
  57:   ANYOF[0-9](0)
  68: END(0)
stclass "ANYOF[0-9]" plus minlen 5 
Offsets: [68]
        6[1] 1[5] 0[0] 0[0] 0[0] 0[0] 0[0] 0[0] 0[0] 0[0] 0[0] 0[0] 7[1] 0[0] 8[6] 0[0] 0[0] 0[0] 0[0] 0[0] 0[0] 0[0] 0[0] 0[0] 0[0] 14[1] 0[0] 20[1] 15[5] 0[0] 0[0] 0[0] 0[0] 0[0] 0[0] 0[0] 0[0] 0[0] 0[0] 21[2] 0[0] 28[1] 23[5] 0[0] 0[0] 0[0] 0[0] 0[0] 0[0] 0[0] 0[0] 0[0] 0[0] 29[2] 0[0] 36[1] 31[5] 0[0] 0[0] 0[0] 0[0] 0[0] 0[0] 0[0] 0[0] 0[0] 0[0] 37[0] 
Matching REx "[0-9]+([^0-9])[0-9]+\1[0-9]+\1[0-9]+" against "1a4443K23_1"
Matching stclass "ANYOF[0-9]" against "1a4443K"
  Setting an EVAL scope, savestack=3
   0 <> <1a4443K23_1>     |  1:  PLUS
                           ANYOF[0-9] can match 1 times out of 2147483647...
  Setting an EVAL scope, savestack=3
   1 <1> <a4443K23_1>     | 13:    OPEN1
   1 <1> <a4443K23_1>     | 15:    ANYOF[\0-/:-\377{unicode_all}]
   2 <1a> <4443K23_1>     | 26:    CLOSE1
   2 <1a> <4443K23_1>     | 28:    PLUS
                           ANYOF[0-9] can match 4 times out of 2147483647...
  Setting an EVAL scope, savestack=3
   6 <1a4443> <K23_1>     | 40:      REF1
                                failed...
   5 <1a444> <3K23_1>     | 40:      REF1
                                failed...
   4 <1a44> <43K23_1>     | 40:      REF1
                                failed...
   3 <1a4> <443K23_1>     | 40:      REF1
                                failed...
                              failed...
                            failed...
  Setting an EVAL scope, savestack=3
   2 <1a> <4443K23_1>     |  1:  PLUS
                           ANYOF[0-9] can match 4 times out of 2147483647...
  Setting an EVAL scope, savestack=3
   6 <1a4443> <K23_1>     | 13:    OPEN1
   6 <1a4443> <K23_1>     | 15:    ANYOF[\0-/:-\377{unicode_all}]
   7 <1a4443K> <23_1>     | 26:    CLOSE1
   7 <1a4443K> <23_1>     | 28:    PLUS
                           ANYOF[0-9] can match 2 times out of 2147483647...
  Setting an EVAL scope, savestack=3
   9 <1a4443K23> <_1>     | 40:      REF1
                                failed...
   8 <1a4443K2> <3_1>     | 40:      REF1
                                failed...
                              failed...
   5 <1a444> <3K23_1>     | 13:    OPEN1
   5 <1a444> <3K23_1>     | 15:    ANYOF[\0-/:-\377{unicode_all}]
                              failed...
   4 <1a44> <43K23_1>     | 13:    OPEN1
   4 <1a44> <43K23_1>     | 15:    ANYOF[\0-/:-\377{unicode_all}]
                              failed...
   3 <1a4> <443K23_1>     | 13:    OPEN1
   3 <1a4> <443K23_1>     | 15:    ANYOF[\0-/:-\377{unicode_all}]
                              failed...
                            failed...
Contradicts stclass...
Match failed
nok
Freeing REx: `"[0-9]+([^0-9])[0-9]+\\1[0-9]+\\1[0-9]+"'



while :


$ perl -e 'use re "debug"; print "1a4443a23a1" =~ m/[0-9]+([^0-9])[0-9]+\1[0-9]+\1[0-9]+/ ? "ok\n" : "nok\n";'
Compiling REx `[0-9]+([^0-9])[0-9]+\1[0-9]+\1[0-9]+'
size 68 Got 548 bytes for offset annotations.
first at 2
   1: PLUS(13)
   2:   ANYOF[0-9](0)
  13: OPEN1(15)
  15:   ANYOF[\0-/:-\377{unicode_all}](26)
  26: CLOSE1(28)
  28: PLUS(40)
  29:   ANYOF[0-9](0)
  40: REF1(42)
  42: PLUS(54)
  43:   ANYOF[0-9](0)
  54: REF1(56)
  56: PLUS(68)
  57:   ANYOF[0-9](0)
  68: END(0)
stclass "ANYOF[0-9]" plus minlen 5 
Offsets: [68]
        6[1] 1[5] 0[0] 0[0] 0[0] 0[0] 0[0] 0[0] 0[0] 0[0] 0[0] 0[0] 7[1] 0[0] 8[6] 0[0] 0[0] 0[0] 0[0] 0[0] 0[0] 0[0] 0[0] 0[0] 0[0] 14[1] 0[0] 20[1] 15[5] 0[0] 0[0] 0[0] 0[0] 0[0] 0[0] 0[0] 0[0] 0[0] 0[0] 21[2] 0[0] 28[1] 23[5] 0[0] 0[0] 0[0] 0[0] 0[0] 0[0] 0[0] 0[0] 0[0] 0[0] 29[2] 0[0] 36[1] 31[5] 0[0] 0[0] 0[0] 0[0] 0[0] 0[0] 0[0] 0[0] 0[0] 0[0] 37[0] 
Matching REx "[0-9]+([^0-9])[0-9]+\1[0-9]+\1[0-9]+" against "1a4443a23a1"
Matching stclass "ANYOF[0-9]" against "1a4443a"
  Setting an EVAL scope, savestack=3
   0 <> <1a4443a23a1>     |  1:  PLUS
                           ANYOF[0-9] can match 1 times out of 2147483647...
  Setting an EVAL scope, savestack=3
   1 <1> <a4443a23a1>     | 13:    OPEN1
   1 <1> <a4443a23a1>     | 15:    ANYOF[\0-/:-\377{unicode_all}]
   2 <1a> <4443a23a1>     | 26:    CLOSE1
   2 <1a> <4443a23a1>     | 28:    PLUS
                           ANYOF[0-9] can match 4 times out of 2147483647...
  Setting an EVAL scope, savestack=3
   6 <1a4443> <a23a1>     | 40:      REF1
   7 <1a4443a> <23a1>     | 42:      PLUS
                           ANYOF[0-9] can match 2 times out of 2147483647...
  Setting an EVAL scope, savestack=3
   9 <1a4443a23> <a1>     | 54:        REF1
  10 <1a4443a23a> <1>     | 56:        PLUS
                           ANYOF[0-9] can match 1 times out of 2147483647...
  Setting an EVAL scope, savestack=3
  11 <1a4443a23a1> <>     | 68:          END
Match successful!
ok
Freeing REx: `"[0-9]+([^0-9])[0-9]+\\1[0-9]+\\1[0-9]+"'



/Fabien
