iamcal.com


RFC (2)822 & 3696 Email Address Parser in PHP

Warning! Releases 4 and older contained a bug in the RFC 2822 atext rule, which allowed periods in the wrong places. Upgrade to release 5 or later to fix this issue.
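
To illustrate what "the wrong places" means: in RFC 2822, atext does not include the period, and periods may only appear between atoms (the dot-atom rule). The sketch below shows that rule for an unquoted local part. It is written for illustration and is not taken from the library; the function name is hypothetical.

```php
<?php
// Hypothetical illustration of the RFC 2822 dot-atom rule; not the library's code.
function is_dot_atom(string $text): bool
{
    // atext: letters, digits and the printable specials RFC 2822 allows.
    // The period is deliberately absent from this character class.
    $atext = '[A-Za-z0-9!#$%&\'*+\/=?^_`{|}~-]';

    // dot-atom: one or more atext runs, separated by single periods.
    $pattern = '/\A' . $atext . '+(?:\.' . $atext . '+)*\z/';

    return preg_match($pattern, $text) === 1;
}
```

Under this rule "first.last" is accepted, while ".first", "first." and "first..last" are rejected because each has a period that is not sitting between two atoms.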

Source code

Tests

The test suite shows results for each parser, based on these test definitions. The definitions are borrowed from Dominic Sayers, who has a similar parser. We are still arguing over certain tests ;)

Download

You can download the latest version (release 10) of the functions here.

The very latest versions are available from the SVN repository.

The RFCs

Email address formats are covered by several RFCs:

RFC 821 - Simple Mail Transfer Protocol (Errata)
The original SMTP RFC
RFC 822 - Standard for the Format of ARPA Internet Text Messages (Errata)
The original 'email' RFC
RFC 1035 - Domain names - implementation and specification (Errata)
The old domains RFC
RFC 1123 - Requirements for Internet Hosts - Application and Support (Errata)
An update to RFC 1035
RFC 2821 - Simple Mail Transfer Protocol (Errata)
SMTP contains some address limits not in RFC 2822
RFC 2822 - Internet Message Format (Errata)
Supersedes RFC 822
RFC 3696 - Application Techniques for Checking and Transformation of Names (Errata)
An informative RFC that clarifies some rules (and muddies others)
RFC 4291 - IP Version 6 Addressing Architecture (Errata)
Some useful details about the horrors of IPv6
RFC 5321 - Simple Mail Transfer Protocol (Errata)
Supersedes RFC 2821 (this is the latest SMTP RFC)
RFC 5322 - Internet Message Format (Errata)
Supersedes RFC 2822 (this is the latest email RFC)

Reading the errata is pretty important, since some of the examples and even the EBNF are wrong in the original RFCs.

Copyright

By Cal Henderson <cal@iamcal.com>

This code is dual licensed:
Creative Commons Attribution-ShareAlike 2.5 License - http://creativecommons.org/licenses/by-sa/2.5/
GNU General Public License v3 - http://www.gnu.org/copyleft/gpl.html

If you require the code to be released under a different license, please contact the author.

Limitations

The code only verifies that the email address matches the RFC spec. This does not mean it's a valid Internet email address! For an email address to be valid on the Internet, the domain part must be a valid domain name, be resolvable and have an MX record. The code will identify the address "foo@bar.baz" as valid, even though we know that there's no such domain as bar.baz. If you want to check that an address is deliverable, fetching the MX record for the domain is a good start. Connecting to the MX to verify that it's a mail server is even better.
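
As a rough sketch of the "fetch the MX" step, something like the following could be run on an address that has already passed the parser. The function name and the optional $mx_lookup parameter are hypothetical (the parameter exists only so the DNS query can be stubbed out); checkdnsrr() is PHP's built-in DNS check. The sketch assumes the domain part contains no RFC 822 comments, and it does not connect to the mail server.

```php
<?php
// Hypothetical helper for illustration; it is not part of the parser above.
// $mx_lookup is an assumed injection point so the DNS query can be stubbed.
function domain_has_mx(string $address, ?callable $mx_lookup = null): bool
{
    // Split on the last "@", since a quoted local part may itself contain "@".
    $at = strrpos($address, '@');
    if ($at === false || $at === strlen($address) - 1) {
        return false;
    }
    $domain = substr($address, $at + 1);

    // An address literal such as [192.0.2.1] has no domain name to resolve.
    if ($domain[0] === '[') {
        return false;
    }

    if ($mx_lookup === null) {
        // Default: ask DNS whether the domain has any MX records.
        $mx_lookup = function (string $d): bool {
            return checkdnsrr($d, 'MX');
        };
    }

    return (bool) $mx_lookup($domain);
}

// Usage: performs a live DNS query when no stub is passed.
// var_dump(domain_has_mx('foo@bar.baz'));
```

A failed lookup can also mean a temporary DNS problem, so it may be safer to treat a negative result as "unverified" than as proof that the address is bad.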

Extras

Tim Fletcher has translated the function to Ruby and Python: http://tfletcher.com/lib/.

A full rolled-up version of the RFC 2822 regexp can be seen here.