iamcal.com

RFC 822 Email Address Parser in PHP

Source code

Download

There is an online interactive demo if you'd just like to give it a try.

You can download the latest stable version (release 11) of the functions here.

The very latest versions are available from the GitHub repository.

Tests

The test suite shows the parser's results, based on these test definitions. These are borrowed from Dominic Sayers, who has written a similar parser. We are still arguing over certain tests ;)

The RFCs

This library was named back when there was only one RFC for email addresses; there are now lots, so it would be better named RFC 822/2822/5322 at the least. These are the most relevant ones:

RFC 821 - Simple Mail Transfer Protocol (Errata)
The original SMTP RFC
RFC 822 - Standard for the Format of ARPA Internet Text Messages (Errata)
The original 'email' RFC
RFC 1035 - Domain names - implementation and specification (Errata)
The old domains RFC
RFC 1123 - Requirements for Internet Hosts - Application and Support (Errata)
An update to RFC 1035
RFC 2821 - Simple Mail Transfer Protocol (Errata)
Defines some SMTP address limits (such as maximum lengths) that are not in RFC 2822
RFC 2822 - Internet Message Format (Errata)
Supersedes RFC 822
RFC 3696 - Application Techniques for Checking and Transformation of Names (Errata)
An informative RFC that clarifies some rules (and muddies others)
RFC 4291 - IP Version 6 Addressing Architecture (Errata)
Some useful details about the horrors of IPv6
RFC 5321 - Simple Mail Transfer Protocol (Errata)
Supersedes RFC 2821 (this is the latest SMTP RFC)
RFC 5322 - Internet Message Format (Errata)
Supersedes RFC 2822 (this is the latest email RFC)
RFC 5952 - A Recommendation for IPv6 Address Text Representation (Errata)
Updates RFC 4291 with a recommended canonical text form for IPv6 addresses
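Some of the SMTP limits mentioned above are easier to apply after the grammar has matched than to build into the regexp. RFC 5321 caps the local part at 64 octets and the domain at 255, and RFC 1035 caps each domain label at 63. IPv6 address literals are another case where a dedicated parser beats a regexp. Below is a rough Python sketch of such post-match checks, using the standard `ipaddress` module for IPv6. The function name `post_match_ok` is hypothetical, not this library's API. It also assumes that the local part and domain arrive already split out by the regexp match.

```python
import ipaddress

# Length limits in octets: RFC 5321 section 4.5.3.1 (local part, domain)
# and RFC 1035 (individual domain labels).
MAX_LOCAL_PART = 64
MAX_DOMAIN = 255
MAX_LABEL = 63


def octets(s: str) -> int:
    """Length of a string in octets (UTF-8 encoded)."""
    return len(s.encode("utf-8"))


def post_match_ok(local: str, domain: str) -> bool:
    """Checks run after a regexp has already matched the address syntax.

    Hypothetical interface: `local` and `domain` are assumed to be the two
    halves captured by that earlier match.
    """
    if octets(local) > MAX_LOCAL_PART or octets(domain) > MAX_DOMAIN:
        return False

    if domain.startswith("[") and domain.endswith("]"):
        literal = domain[1:-1]
        if literal[:5].lower() == "ipv6:":
            addr = literal[5:]
            # ipaddress (Python 3.9+) accepts zone IDs like "%eth0";
            # the RFC 5321 IPv6 literal grammar does not.
            if "%" in addr:
                return False
            try:
                # Handles "::" compression and embedded IPv4 forms.
                ipaddress.IPv6Address(addr)
            except ValueError:
                return False
        # Other literals (e.g. [192.0.2.1]) are left to the regexp.
        return True

    # Hostname: every dot-separated label must be 1..63 octets long.
    return all(0 < octets(label) <= MAX_LABEL for label in domain.split("."))
```

With this sketch, `post_match_ok("foo", "[IPv6:2001:db8::1]")` returns True and `post_match_ok("foo", "[IPv6:2001:db8:::1]")` returns False. No IPv6 permutations need to be spelled out in a regexp.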

Reading the errata is pretty important, since some of the examples and even the ABNF grammar are wrong in the original RFCs.

Copyright

By Cal Henderson <cal@iamcal.com>

This code is dual licensed:
Creative Commons Attribution-ShareAlike 2.5 License - http://creativecommons.org/licenses/by-sa/2.5/
GNU General Public License v3 - http://www.gnu.org/copyleft/gpl.html

If you require the code to be released under a different license, please contact the author.

Limitations

The code only verifies that the email address matches the various RFC specs. This does not mean it's a valid Internet email address! For an email address to be valid on the Internet, the domain part must be a valid domain name that resolves and can receive mail. That normally means it has an MX record, though RFC 5321 falls back to the domain's address record when no MX exists. The code will identify the address "foo@bar.baz" as valid, even though we know that there's no such domain as bar.baz. If you want to check that an address is actually usable, fetching the MX for the domain is a good start. Connecting to the MX to verify it's a mail server is even better.
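Here is a minimal Python sketch of the "fetch the MX" step. A real lookup needs a DNS library and network access, so the resolver is passed in as a function. The names `has_mx`, `domain_part` and the resolver signature are all hypothetical, for illustration only. The subtle part is finding the domain: a quoted local part may itself contain an "@", so the split happens at the last one.

```python
from typing import Callable, List

# Hypothetical resolver type: takes a domain name and returns its MX
# hostnames, or an empty list if it has none.
MxResolver = Callable[[str], List[str]]


def domain_part(address: str) -> str:
    """Return the domain of an address that already passed the parser."""
    # A quoted local part such as "a@b" can contain "@", so split at the LAST one.
    local, sep, domain = address.rpartition("@")
    if not sep or not local or not domain:
        raise ValueError(f"not an addr-spec: {address!r}")
    return domain


def has_mx(address: str, resolve_mx: MxResolver) -> bool:
    """True if the address's domain has at least one MX record."""
    domain = domain_part(address)
    if domain.startswith("["):
        # Domain literals like [192.0.2.1] name a host directly; no MX to fetch.
        return False
    return len(resolve_mx(domain)) > 0


# Usage with a stand-in resolver instead of real DNS:
fake_dns = {"example.com": ["mx1.example.com"]}


def fake_resolver(domain: str) -> List[str]:
    return fake_dns.get(domain.lower(), [])


print(has_mx("foo@example.com", fake_resolver))  # True
print(has_mx("foo@bar.baz", fake_resolver))      # False
```

In practice `resolve_mx` could be a small adapter around a DNS library such as dnspython's `dns.resolver.resolve(domain, "MX")`. The adapter would catch that library's NXDOMAIN and no-answer exceptions and return an empty list. This sketch only looks for an MX record. It does not fall back to an address record, and it does not connect to the server.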

Extras

Tim Fletcher has translated the function to Ruby and Python: http://tfletcher.com/lib/.

A fully unpacked version of the underlying regular expression can be seen here. It's huge.

It's been said that it's impossible to parse email addresses using regular expressions alone. This is partly true. If you allow comments in email addresses, they can be nested to any depth. Nested comments cannot be matched with a single regexp, so you first need a simple loop that applies a reducing regexp. Beyond that, this library uses some post-match checks instead of rolling everything into one regexp. This is not because it would be impossible, but because it would make the regexp huge; the number of IPv6 permutations alone would probably double its size. Practicality aside, it seems entirely possible to boil it down to a single regexp. The one used for HTML5, however, is not even close :(
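Here is a rough Python sketch of the reducing-loop idea. An innermost comment is one that contains no other parentheses; backslash-escaped characters (quoted-pairs) are allowed inside it. Each pass deletes every innermost comment, and the loop stops when a pass changes nothing. This illustrates the technique rather than reproducing this library's code. Deleting comments outright is the sketch's own choice. It also assumes parentheses never appear inside quoted strings, which a real parser would have to handle.

```python
import re

# One innermost comment: "(" then any run of characters that are not
# parentheses or backslashes, or backslash-escaped pairs (quoted-pairs),
# then ")". Since it cannot contain a bare "(" or ")", it never spans
# a nested comment.
INNERMOST_COMMENT = re.compile(r"\((?:[^()\\]|\\.)*\)", re.DOTALL)


def strip_comments(text: str) -> str:
    """Remove (possibly nested) comments by deleting innermost ones repeatedly.

    Assumes no parentheses occur inside quoted strings.
    """
    while True:
        reduced = INNERMOST_COMMENT.sub("", text)
        if reduced == text:  # a pass changed nothing: fully reduced
            return reduced
        text = reduced


print(strip_comments("foo(a(b)c)@example.com"))  # foo@example.com
```

Any parenthesis left over after the loop means the comments were unbalanced. For example, `strip_comments("foo((bar)@example.com")` leaves `foo(@example.com`, which the main regexp can then reject.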