Over the past two weeks we have had reports from several companies that they can't deliver email to our employees or that delivery gets delayed. The common denominator seems to be that their email gets relayed through messagelabs.com, and our Postfix MTA servers keep logging messages indicating that connections from that domain are being closed by the remote peer. (We've enabled debugging for their subnets.)
Apr 8 13:21:11 dmz-spamwall-01 postfix/smtpd[7610]: connect from mail1.bemta3.messagelabs.com[195.245.230.171]
Apr 8 13:21:11 dmz-spamwall-01 postfix/smtpd[7610]: match_hostname: mail1.bemta3.messagelabs.com ~? hash:/etc/postfix/network_table(0,lock|fold_fix)
Apr 8 13:21:11 dmz-spamwall-01 postfix/smtpd[7610]: match_hostname: lookup hash:/etc/postfix/network_table.db mail1.bemta3.messagelabs.com: notfound
Apr 8 13:21:11 dmz-spamwall-01 postfix/smtpd[7610]: match_hostname: lookup hash:/etc/postfix/network_table.db .bemta3.messagelabs.com: notfound
Apr 8 13:21:11 dmz-spamwall-01 postfix/smtpd[7610]: match_hostname: lookup hash:/etc/postfix/network_table.db .messagelabs.com: notfound
Apr 8 13:21:11 dmz-spamwall-01 postfix/smtpd[7610]: match_hostname: lookup hash:/etc/postfix/network_table.db .com: notfound
Apr 8 13:21:11 dmz-spamwall-01 postfix/smtpd[7610]: match_hostaddr: 195.245.230.171 ~? hash:/etc/postfix/network_table(0,lock|fold_fix)
Apr 8 13:21:11 dmz-spamwall-01 postfix/smtpd[7610]: dict_lookup: 195.245.230.171 = (notfound)
Apr 8 13:21:11 dmz-spamwall-01 postfix/smtpd[7610]: match_list_match: mail1.bemta3.messagelabs.com: no match
Apr 8 13:21:11 dmz-spamwall-01 postfix/smtpd[7610]: match_list_match: 195.245.230.171: no match
Apr 8 13:21:11 dmz-spamwall-01 postfix/smtpd[7610]: send attr request = connect
Apr 8 13:21:11 dmz-spamwall-01 postfix/smtpd[7610]: send attr ident = smtp:195.245.230.171
Apr 8 13:21:11 dmz-spamwall-01 postfix/smtpd[7610]: vstream_fflush_some: fd 21 flush 44
Apr 8 13:21:11 dmz-spamwall-01 postfix/smtpd[7610]: private/anvil: wanted attribute: status
Apr 8 13:21:11 dmz-spamwall-01 postfix/smtpd[7610]: vstream_buf_get_ready: fd 21 got 25
Apr 8 13:21:11 dmz-spamwall-01 postfix/smtpd[7610]: input attribute name: status
Apr 8 13:21:11 dmz-spamwall-01 postfix/smtpd[7610]: input attribute value: 0
Apr 8 13:21:11 dmz-spamwall-01 postfix/smtpd[7610]: private/anvil: wanted attribute: count
Apr 8 13:21:11 dmz-spamwall-01 postfix/smtpd[7610]: input attribute name: count
Apr 8 13:21:11 dmz-spamwall-01 postfix/smtpd[7610]: input attribute value: 1
Apr 8 13:21:11 dmz-spamwall-01 postfix/smtpd[7610]: private/anvil: wanted attribute: rate
Apr 8 13:21:11 dmz-spamwall-01 postfix/smtpd[7610]: input attribute name: rate
Apr 8 13:21:11 dmz-spamwall-01 postfix/smtpd[7610]: input attribute value: 1
Apr 8 13:21:11 dmz-spamwall-01 postfix/smtpd[7610]: private/anvil: wanted attribute: (list terminator)
Apr 8 13:21:11 dmz-spamwall-01 postfix/smtpd[7610]: input attribute name: (end)
Apr 8 13:21:11 dmz-spamwall-01 postfix/smtpd[7610]: > mail1.bemta3.messagelabs.com[195.245.230.171]: 220 dmz-spamwall-01.dmz.oikt.net ESMTP Postfix
Apr 8 13:21:11 dmz-spamwall-01 postfix/smtpd[7610]: vstream_fflush_some: fd 9 flush 48
Apr 8 13:21:11 dmz-spamwall-01 postfix/smtpd[7610]: watchdog_pat: 0x83d8ba0
Apr 8 13:21:11 dmz-spamwall-01 postfix/smtpd[7610]: smtp_get: EOF
Apr 8 13:21:11 dmz-spamwall-01 postfix/smtpd[7610]: match_hostname: mail1.bemta3.messagelabs.com ~? hash:/etc/postfix/network_table(0,lock|fold_fix)
Apr 8 13:21:11 dmz-spamwall-01 postfix/smtpd[7610]: match_hostname: lookup hash:/etc/postfix/network_table.db mail1.bemta3.messagelabs.com: notfound
Apr 8 13:21:11 dmz-spamwall-01 postfix/smtpd[7610]: match_hostname: lookup hash:/etc/postfix/network_table.db .bemta3.messagelabs.com: notfound
Apr 8 13:21:11 dmz-spamwall-01 postfix/smtpd[7610]: match_hostname: lookup hash:/etc/postfix/network_table.db .messagelabs.com: notfound
Apr 8 13:21:11 dmz-spamwall-01 postfix/smtpd[7610]: match_hostname: lookup hash:/etc/postfix/network_table.db .com: notfound
Apr 8 13:21:11 dmz-spamwall-01 postfix/smtpd[7610]: match_hostaddr: 195.245.230.171 ~? hash:/etc/postfix/network_table(0,lock|fold_fix)
Apr 8 13:21:11 dmz-spamwall-01 postfix/smtpd[7610]: dict_lookup: 195.245.230.171 = (notfound)
Apr 8 13:21:11 dmz-spamwall-01 postfix/smtpd[7610]: match_list_match: mail1.bemta3.messagelabs.com: no match
Apr 8 13:21:11 dmz-spamwall-01 postfix/smtpd[7610]: match_list_match: 195.245.230.171: no match
Apr 8 13:21:11 dmz-spamwall-01 postfix/smtpd[7610]: send attr request = disconnect
Apr 8 13:21:11 dmz-spamwall-01 postfix/smtpd[7610]: send attr ident = smtp:195.245.230.171
Apr 8 13:21:11 dmz-spamwall-01 postfix/smtpd[7610]: vstream_fflush_some: fd 21 flush 47
Apr 8 13:21:11 dmz-spamwall-01 postfix/smtpd[7610]: private/anvil: wanted attribute: status
Apr 8 13:21:11 dmz-spamwall-01 postfix/smtpd[7610]: vstream_buf_get_ready: fd 21 got 10
Apr 8 13:21:11 dmz-spamwall-01 postfix/smtpd[7610]: input attribute name: status
Apr 8 13:21:11 dmz-spamwall-01 postfix/smtpd[7610]: input attribute value: 0
Apr 8 13:21:11 dmz-spamwall-01 postfix/smtpd[7610]: private/anvil: wanted attribute: (list terminator)
Apr 8 13:21:11 dmz-spamwall-01 postfix/smtpd[7610]: input attribute name: (end)
Apr 8 13:21:11 dmz-spamwall-01 postfix/smtpd[7610]: lost connection after CONNECT from mail1.bemta3.messagelabs.com[195.245.230.171]
Apr 8 13:21:11 dmz-spamwall-01 postfix/smtpd[7610]: disconnect from mail1.bemta3.messagelabs.com[195.245.230.171]
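With debugging enabled the interesting line is easy to lose in the noise. As a rough sketch, a few lines of Python can pull the "lost connection" events out of a log and count them per client. The regular expression and the function name are my own, and the sample lines are copied from the log above; the pattern is only known to fit lines shaped like these.

```python
import re
from collections import Counter

# Matches e.g. "lost connection after CONNECT from host.example[192.0.2.1]"
LOST = re.compile(r"lost connection after (\S+) from (\S+)\[([^\]]+)\]")

def count_lost_connections(lines):
    """Count 'lost connection' events per (SMTP stage, client IP)."""
    counts = Counter()
    for line in lines:
        m = LOST.search(line)
        if m:
            stage, _host, ip = m.groups()
            counts[(stage, ip)] += 1
    return counts

sample = [
    "Apr 8 13:21:11 dmz-spamwall-01 postfix/smtpd[7610]: lost connection after CONNECT from mail1.bemta3.messagelabs.com[195.245.230.171]",
    "Apr 8 13:21:11 dmz-spamwall-01 postfix/smtpd[7610]: disconnect from mail1.bemta3.messagelabs.com[195.245.230.171]",
]

for (stage, ip), n in count_lost_connections(sample).items():
    print(f"{ip}: {n} lost after {stage}")
```

Feeding it the lines of the real mail log instead of `sample` would show whether the problem is limited to the messagelabs.com addresses.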
But wait... some packet sniffing reveals that the issue isn't as simple as a straight connect/disconnect (it never is):
13:08:35.444849 IP (tos 0x0, ttl 52, id 55359, offset 0, flags [DF], proto: TCP (6), length: 60) 195.245.230.171.34072 > 109.199.194.61.25: S, cksum 0x61ce (correct), 998480089:998480089(0) win 5840 <mss 1460,sackOK,timestamp 1514619110 0,nop,wscale 7>
    0x0000: 0050 56b1 092d 0010 dbff 2000 0800 4500 .PV..-........E.
    0x0010: 003c d83f 4000 3406 93d6 c3f5 e6ab 6dc7 .<.?@.4.......m.
    0x0020: c23d 8518 0019 3b83 98d9 0000 0000 a002 .=....;.........
    0x0030: 16d0 61ce 0000 0204 05b4 0402 080a 5a47 ..a...........ZG
    0x0040: 40e6 0000 0000 0103 0307 @.........
13:08:35.444892 IP (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto: TCP (6), length: 60) 109.199.194.61.25 > 195.245.230.171.34072: S, cksum 0x69b0 (correct), 2031622044:2031622044(0) ack 998480090 win 5792 <mss 1460,sackOK,timestamp 10249964 1514619110,nop,wscale 7>
    0x0000: 0010 dbff 2000 0050 56b1 092d 0800 4500 .......PV..-..E.
    0x0010: 003c 0000 4000 4006 6016 6dc7 c23d c3f5 .<..@.@.`.m..=..
    0x0020: e6ab 0019 8518 7918 179c 3b83 98da a012 ......y...;.....
    0x0030: 16a0 69b0 0000 0204 05b4 0402 080a 009c ..i.............
    0x0040: 66ec 5a47 40e6 0103 0307 f.ZG@.....
13:08:35.482475 IP (tos 0x0, ttl 52, id 55360, offset 0, flags [DF], proto: TCP (6), length: 52) 195.245.230.171.34072 > 109.199.194.61.25: ., cksum 0xaec9 (correct), 1:1(0) ack 1 win 46 <nop,nop,timestamp 1514619147 10249964>
    0x0000: 0050 56b1 092d 0010 dbff 2000 0800 4500 .PV..-........E.
    0x0010: 0034 d840 4000 3406 93dd c3f5 e6ab 6dc7 .4.@@.4.......m.
    0x0020: c23d 8518 0019 3b83 98da 7918 179d 8010 .=....;...y.....
    0x0030: 002e aec9 0000 0101 080a 5a47 410b 009c ..........ZGA...
    0x0040: 66ec f.
13:13:35.476381 IP (tos 0x0, ttl 52, id 55361, offset 0, flags [DF], proto: TCP (6), length: 52) 195.245.230.171.34072 > 109.199.194.61.25: F, cksum 0x1ae6 (correct), 1:1(0) ack 1 win 46 <nop,nop,timestamp 1514919145 10249964>
    0x0000: 0050 56b1 092d 0010 dbff 2000 0800 4500 .PV..-........E.
    0x0010: 0034 d841 4000 3406 93dc c3f5 e6ab 6dc7 .4.A@.4.......m.
    0x0020: c23d 8518 0019 3b83 98da 7918 179d 8011 .=....;...y.....
    0x0030: 002e 1ae6 0000 0101 080a 5a4b d4e9 009c ..........ZK....
    0x0040: 66ec f.
13:13:35.476747 IP (tos 0x0, ttl 64, id 28812, offset 0, flags [DF], proto: TCP (6), length: 52) 109.199.194.61.25 > 195.245.230.171.34072: ., cksum 0x86e1 (correct), 1:1(0) ack 2 win 46 <nop,nop,timestamp 10549996 1514919145>
    0x0000: 0010 dbff 2000 0050 56b1 092d 0800 4500 .......PV..-..E.
    0x0010: 0034 708c 4000 4006 ef91 6dc7 c23d c3f5 .4p.@.@...m..=..
    0x0020: e6ab 0019 8518 7918 179d 3b83 98db 8010 ......y...;.....
    0x0030: 002e 86e1 0000 0101 080a 00a0 faec 5a4b ..............ZK
    0x0040: d4e9 ..
13:21:11.528108 IP (tos 0x0, ttl 64, id 28813, offset 0, flags [DF], proto: TCP (6), length: 100) 109.199.194.61.25 > 195.245.230.171.34072: P, cksum 0xa94c (correct), 1:49(48) ack 2 win 46 <nop,nop,timestamp 11006047 1514919145>
    0x0000: 0010 dbff 2000 0050 56b1 092d 0800 4500 .......PV..-..E.
    0x0010: 0064 708d 4000 4006 ef60 6dc7 c23d c3f5 .dp.@.@..`m..=..
    0x0020: e6ab 0019 8518 7918 179d 3b83 98db 8018 ......y...;.....
    0x0030: 002e a94c 0000 0101 080a 00a7 f05f 5a4b ...L........._ZK
    0x0040: d4e9 3232 3020 646d 7a2d 7370 616d 7761 ..220.dmz-spamwa
    0x0050: 6c6c 2d30 312e 646d 7a2e 6f69 6b74 2e6e ll-01.dmz.oikt.n
    0x0060: 6574 2045 534d 5450 2050 6f73 7466 6978 et.ESMTP.Postfix
    0x0070: 0d0a ..
13:21:11.528973 IP (tos 0x0, ttl 64, id 28814, offset 0, flags [DF], proto: TCP (6), length: 52) 109.199.194.61.25 > 195.245.230.171.34072: F, cksum 0x9136 (correct), 49:49(0) ack 2 win 46 <nop,nop,timestamp 11006047 1514919145>
    0x0000: 0010 dbff 2000 0050 56b1 092d 0800 4500 .......PV..-..E.
    0x0010: 0034 708e 4000 4006 ef8f 6dc7 c23d c3f5 .4p.@.@...m..=..
    0x0020: e6ab 0019 8518 7918 17cd 3b83 98db 8011 ......y...;.....
    0x0030: 002e 9136 0000 0101 080a 00a7 f05f 5a4b ...6........._ZK
    0x0040: d4e9 ..
13:21:11.565986 IP (tos 0x0, ttl 52, id 0, offset 0, flags [DF], proto: TCP (6), length: 40) 195.245.230.171.34072 > 109.199.194.61.25: R, cksum 0x7baa (correct), 998480091:998480091(0) win 0
    0x0000: 0050 56b1 092d 0010 dbff 2000 0800 4500 .PV..-........E.
    0x0010: 0028 0000 4000 3406 6c2a c3f5 e6ab 6dc7 .(..@.4.l*....m.
    0x0020: c23d 8518 0019 3b83 98db 0000 0000 5004 .=....;.......P.
    0x0030: 0000 7baa 0000 0000 0000 0000 ..{.........
13:21:11.566290 IP (tos 0x0, ttl 244, id 27369, offset 0, flags [DF], proto: TCP (6), length: 40) 195.245.230.171.34072 > 109.199.194.61.25: R, cksum 0xeab3 (correct), 2:2(0) ack 50 win 0
    0x0000: 0050 56b1 092d 0010 dbff 2000 0800 4500 .PV..-........E.
    0x0010: 0028 6ae9 4000 f406 4140 c3f5 e6ab 6dc7 .(j.@...A@....m.
    0x0020: c23d 8518 0019 3b83 98db 7918 17ce 5014 .=....;...y...P.
    0x0030: 0000 eab3 0000 0000 0000 0000 ............
13:21:11.566324 IP (tos 0xc0, ttl 64, id 54773, offset 0, flags [none], proto: ICMP (1), length: 68) 109.199.194.61 > 195.245.230.171: ICMP host 109.199.194.61 unreachable - admin prohibited, length 48
    IP (tos 0x0, ttl 244, id 27369, offset 0, flags [DF], proto: TCP (6), length: 40) 195.245.230.171.34072 > 109.199.194.61.25: R, cksum 0xeab3 (correct), 2:2(0) ack 50 win 0
    0x0000: 0010 dbff 2000 0050 56b1 092d 0800 45c0 .......PV..-..E.
    0x0010: 0044 d5f5 0000 4001 c95d 6dc7 c23d c3f5 .D....@..]m..=..
    0x0020: e6ab 030a d7b6 0000 0000 4500 0028 6ae9 ..........E..(j.
    0x0030: 4000 f406 4140 c3f5 e6ab 6dc7 c23d 8518 @...A@....m..=..
    0x0040: 0019 3b83 98db 7918 17ce 5014 0000 eab3 ..;...y...P.....
    0x0050: 0000 ..
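Wait, what? It seems that Postfix waits about 13 minutes from the time the TCP connection is established until it sends the SMTP greeting. By then the remote server has long since given up: it sent its FIN after five minutes of silence, which is why Postfix logs "lost connection after CONNECT" the moment it finally tries to talk. Why would Postfix stall like this? One common culprit is DNS lookups.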
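For the record, the gaps can be worked out from the capture timestamps. This is only a small sketch: the three timestamps are copied from the tcpdump output above, the variable names are mine, and it assumes all packets fall on the same day.

```python
from datetime import datetime

FMT = "%H:%M:%S.%f"  # tcpdump's default time-of-day format

def gap(start, end):
    """Return the time between two same-day tcpdump timestamps."""
    return datetime.strptime(end, FMT) - datetime.strptime(start, FMT)

handshake_done = "13:08:35.482475"  # client ACK completes the TCP handshake
client_fin = "13:13:35.476381"      # client gives up and sends FIN
greeting_sent = "13:21:11.528108"   # Postfix finally sends its 220 greeting

client_wait = gap(handshake_done, client_fin)
greeting_delay = gap(handshake_done, greeting_sent)

print("client waited:    ", client_wait)
print("greeting delayed: ", greeting_delay)
```

The client's wait comes out at just under five minutes, which would fit the five-minute timeout that RFC 5321 suggests for the initial 220 greeting, while the greeting itself was more than twelve and a half minutes late.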
$ dig -x 195.245.230.171
; <<>> DiG 9.3.6-P1-RedHat-9.3.6-20.P1.el5_8.6 <<>> -x 195.245.230.171
;; global options: printcmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 20151
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 6, ADDITIONAL: 0
;; QUESTION SECTION:
;171.230.245.195.in-addr.arpa. IN PTR
;; ANSWER SECTION:
171.230.245.195.in-addr.arpa. 5886 IN PTR mail1.bemta3.messagelabs.com.
;; AUTHORITY SECTION:
in-addr.arpa. 161402 IN NS b.in-addr-servers.arpa.
in-addr.arpa. 161402 IN NS c.in-addr-servers.arpa.
in-addr.arpa. 161402 IN NS d.in-addr-servers.arpa.
in-addr.arpa. 161402 IN NS e.in-addr-servers.arpa.
in-addr.arpa. 161402 IN NS f.in-addr-servers.arpa.
in-addr.arpa. 161402 IN NS a.in-addr-servers.arpa.
;; Query time: 0 msec
;; SERVER: 109.199.194.66#53(109.199.194.66)
;; WHEN: Tue Apr 8 13:30:29 2014
;; MSG SIZE rcvd: 200
...then it's usually a good idea to do a forward lookup to see if it matches, but this is where it gets interesting:
$ dig mail1.bemta3.messagelabs.com.
;; Truncated, retrying in TCP mode.
; <<>> DiG 9.3.6-P1-RedHat-9.3.6-20.P1.el5_8.6 <<>> mail1.bemta3.messagelabs.com.
;; global options: printcmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 51251
;; flags: qr rd ra; QUERY: 1, ANSWER: 21, AUTHORITY: 0, ADDITIONAL: 0
;; QUESTION SECTION:
;mail1.bemta3.messagelabs.com. IN A
;; ANSWER SECTION:
mail1.bemta3.messagelabs.com. 13074 IN A 195.245.230.169
mail1.bemta3.messagelabs.com. 13074 IN A 195.245.230.170
mail1.bemta3.messagelabs.com. 13074 IN A 195.245.230.171
mail1.bemta3.messagelabs.com. 13074 IN A 195.245.230.172
mail1.bemta3.messagelabs.com. 13074 IN A 195.245.230.173
mail1.bemta3.messagelabs.com. 13074 IN A 195.245.230.174
mail1.bemta3.messagelabs.com. 13074 IN A 195.245.230.175
mail1.bemta3.messagelabs.com. 13074 IN A 195.245.230.176
mail1.bemta3.messagelabs.com. 13074 IN A 195.245.230.177
mail1.bemta3.messagelabs.com. 13074 IN A 195.245.230.178
mail1.bemta3.messagelabs.com. 13074 IN A 195.245.230.179
mail1.bemta3.messagelabs.com. 13074 IN A 195.245.230.180
mail1.bemta3.messagelabs.com. 13074 IN A 195.245.230.34
mail1.bemta3.messagelabs.com. 13074 IN A 195.245.230.161
mail1.bemta3.messagelabs.com. 13074 IN A 195.245.230.162
mail1.bemta3.messagelabs.com. 13074 IN A 195.245.230.163
mail1.bemta3.messagelabs.com. 13074 IN A 195.245.230.164
mail1.bemta3.messagelabs.com. 13074 IN A 195.245.230.165
mail1.bemta3.messagelabs.com. 13074 IN A 195.245.230.166
mail1.bemta3.messagelabs.com. 13074 IN A 195.245.230.167
mail1.bemta3.messagelabs.com. 13074 IN A 195.245.230.168
;; Query time: 1 msec
;; SERVER: 81.175.0.66#53(81.175.0.66)
;; WHEN: Tue Apr 8 13:31:31 2014
;; MSG SIZE rcvd: 382
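Whether the forward lookup "matches" can also be checked mechanically: the connecting address has to appear among the A records of the name its PTR record points to. Below is a minimal Python sketch of that check. The function names are made up for illustration, and the address list is rebuilt from the 21 A records in the dig output above. The `fcrdns` helper goes through the system resolver and is deliberately not called here, since it needs working DNS.

```python
import ipaddress
import socket

def forward_confirms(ip, addresses):
    """Return True if ip appears among the A records of its PTR name."""
    wanted = ipaddress.ip_address(ip)
    return any(ipaddress.ip_address(a) == wanted for a in addresses)

def fcrdns(ip):
    """Live check via the system resolver (not called below; needs DNS)."""
    try:
        name, _aliases, _addrs = socket.gethostbyaddr(ip)
        _name, _aliases, addresses = socket.gethostbyname_ex(name)
    except OSError:
        # Covers socket.herror and socket.gaierror (lookup failures).
        return False
    return forward_confirms(ip, addresses)

# The 21 A records from the dig output: .34 and .161 through .180.
a_records = [f"195.245.230.{n}" for n in [34, *range(161, 181)]]

print(forward_confirms("195.245.230.171", a_records))
```

So the records themselves are consistent; the odd part is only how the answer had to be fetched.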
The UDP response gets truncated? This reminds me of when we were setting up our Active Directory environment to work with our Juniper SRX firewalls. As it turned out, Juniper SRX has a problem with large DNS responses (something to do with EDNS, if I'm not mistaken), and we had to work around it by defining our own application definitions to replace the built-in "junos-dns-tcp" and "junos-dns-udp".
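For anyone wondering what "truncated" means on the wire: dig prints ";; Truncated, retrying in TCP mode." when the UDP reply comes back with the TC bit set in the DNS header, and it then repeats the query over TCP. The following Python sketch shows how that bit can be read from a raw DNS message. The two sample headers are made up for illustration and are not taken from our capture.

```python
import struct

TC_FLAG = 0x0200  # "truncated" bit in the DNS header flags word (RFC 1035)

def is_truncated(message):
    """Return True if the DNS message header has the TC bit set."""
    if len(message) < 12:
        raise ValueError("DNS header is 12 bytes; message too short")
    # Header: ID, flags, QDCOUNT, ANCOUNT, NSCOUNT, ARCOUNT (16 bits each).
    _id, flags, _qd, _an, _ns, _ar = struct.unpack("!6H", message[:12])
    return bool(flags & TC_FLAG)

# Made-up headers: QR, RD and RA set, first with TC and then without.
truncated = struct.pack("!6H", 0x1234, 0x8380, 1, 0, 0, 0)
complete = struct.pack("!6H", 0x1234, 0x8180, 1, 21, 0, 0)

print(is_truncated(truncated), is_truncated(complete))
```

A resolver that sees the bit set needs both DNS over UDP and DNS over TCP to pass through the firewall cleanly, which is why both application definitions matter.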
Editing the relevant firewall policies to use these homegrown application definitions made communication with messagelabs.com kick into overdrive, and over the next few minutes we received several hundred emails that had apparently been queued up. These are the definitions:
set applications application dns-tcp application-protocol ignore
set applications application dns-tcp protocol tcp
set applications application dns-tcp destination-port 53
set applications application dns-udp application-protocol ignore
set applications application dns-udp protocol udp
set applications application dns-udp destination-port 53
I'm sharing this in case someone else bumps into the same issue.