Validating emails looks like an easy task, but it turns out to be quite difficult.
Early in your career as a Rails developer you'll find yourself needing to validate email addresses on one of your models. Which is simple, right? It only takes one line.
validates :email, :format => /EMAIL_REGEX_GOES_HERE/
Now all you need to do is come up with a regex that allows all valid emails and rejects all invalid emails. Should be simple, right?
The first time I needed to validate an email address, I spent way too much time working on a regular expression. Eventually I came up with one that covered all the cases I thought were valid and rejected the invalid ones I tested with. Feeling happy with my sexy regex, I deployed it, moved on to something else and never gave it another thought.
Then a few weeks later, I got an email from the client saying that one of their users couldn't create an account on the system. Turns out that emails can have all sorts of funky variations, some of which are part of the defined standard, some of which are not.
And so I learned something - there's something in the developer mindset that is bothered by the idea of people submitting invalid email addresses. The idea that my regex could allow some invalid cases through bothered me, and I enjoyed the puzzle of coming up with a regular expression that would validate my example email addresses.
But let's take a step back and think about the purpose of validating an email. The main thing you want to catch is the case when a user accidentally mistypes their email or doesn't realize that they need to provide an email (for example, they enter a name or username instead).
Your app is not the email police, sent out into the world to punish users who have invalid or unusual characters in their email address. You just want to gently nudge the user if they enter something that is obviously not an email address, and otherwise let them get on with it.
Rejecting a single user's real email address is many times worse than allowing multiple invalid emails into your database. You want to minimize false positives (valid addresses flagged as invalid); false negatives (invalid addresses that slip through) matter far less.
And to that end, you don't really need a complicated regular expression. You just need one that recognizes something vaguely along the lines of something@somewhere.someplace. And that's enough. If ensuring that your user enters a valid email is important to your app, then send them a confirmation email. That's the only way to be sure they've entered a valid email address.
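As a rough sketch of that approach in plain Ruby (outside of Rails), here is a loose shape check paired with a confirmation token. The names `LOOSE_EMAIL`, `plausible_email?` and `new_confirmation_token` are hypothetical, made up for illustration, and actually sending the confirmation email is left out.

```ruby
require 'securerandom'

# Loose shape check: something@somewhere.someplace, nothing stricter.
LOOSE_EMAIL = /\A.+@.+\..+\z/

# Returns true when the input at least looks like an email address.
# (Hypothetical helper, not part of Rails.)
def plausible_email?(input)
  LOOSE_EMAIL.match?(input.to_s.strip)
end

# Hypothetical helper: a random token to embed in a confirmation link.
# Only a user who receives the email can send the token back.
def new_confirmation_token
  SecureRandom.urlsafe_base64(24)
end
```

The regex only nudges users who typed something that is obviously not an email. The token round-trip is what actually confirms the address.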
So, getting back to the technical details. What regex should we use to validate our emails? Let's look at a few common examples that I collected (sources are in the table below). We'll evaluate these regexes against this list of valid and invalid emails.
Here's a quick 'n' dirty script to check a few email validation regular expressions.
```ruby
# email list sourced from http://codefool.tumblr.com/post/15288874550/list-of-valid-and-invalid-email-addresses
VALID_EMAILS = [
  'email@example.com',
  'firstname.lastname@example.com',
  'email@subdomain.example.com',
  'firstname+lastname@example.com',
  'email@123.123.123.123',
  'email@[123.123.123.123]',
  '"email"@example.com',
  '1234567890@example.com',
  'email@example-one.com',
  '_______@example.com',
  'email@example.name',
  'email@example.museum',
  'email@example.co.jp',
  'firstname-lastname@example.com',
  'much."more\ unusual"@example.com',
  'very.unusual.”@”.unusual.com@example.com',
  'very.”(),:;<>[]”.VERY.”very@\\ "very”.unusual@strange.example.com'
]

INVALID_EMAILS = [
  'plainaddress',
  '#@%^%#$@#$@#.com',
  '@example.com',
  'Joe Smith <email@example.com>',
  'email.example.com',
  'email@example@example.com',
  '.email@example.com',
  'email.@example.com',
  'email..email@example.com',
  'あいうえお@example.com',
  'email@example.com (Joe Smith)',
  'email@example',
  'email@-example.com',
  'email@example.web',
  'email@111.222.333.44444',
  'email@example..com',
  'Abc..123@example.com',
  '”(),:;<>[\]@example.com',
  'just”not”right@example.com',
  'this\ is"really"not\allowed@example.com'
]

REGEXES = [
  /@/,
  /.+@.+\..+/i, # http://davidcel.is/blog/2012/09/06/stop-validating-email-addresses-with-regex/
  /^([^@\s]+)@((?:[-a-z0-9]+\.)+[a-z]{2,})$/i, # http://www.aidanf.net/posts/rails-authentication-tutorial
  /^[-a-z0-9~!$%^&*_=+}{\'?]+(\.[-a-z0-9~!$%^&*_=+}{\'?]+)*@([a-z0-9_][-a-z0-9_]*(\.[-a-z0-9_]+)*\.(aero|arpa|biz|com|coop|edu|gov|info|int|mil|museum|name|net|org|pro|travel|mobi|[a-z][a-z])|([0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}))(:[0-9]{1,5})?$/i, # https://fightingforalostcause.net/content/misc/2006/compare-email-regex.php
  /^([\w\.%\+\-]+)@([\w\-]+\.)+([\w]{2,})$/i, # http://awesoham.wordpress.com/2013/10/02/a-simple-regex-for-rails-email-validation/
  /\A[\w+\-.]+@[a-z\d\-.]+\.[a-z]+\z/i, # https://www.railstutorial.org/book/modeling_users
  /\A[^@\s]+@([^@.\s]+\.)+[^@.\s]+\z/, # http://stackoverflow.com/questions/4770133/rails-regex-for-email-validation
  /\A[^@\s]+@([^@.\s]+\.)*[^@.\s]+\z/, # http://stackoverflow.com/questions/4770133/rails-regex-for-email-validation
  /^[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}$/i, # http://stackoverflow.com/questions/4770133/rails-regex-for-email-validation
  /\A(\S+)@(.+)\.(\S+)\z/, # http://stackoverflow.com/questions/4770133/rails-regex-for-email-validation
]

if __FILE__ == $0
  spacer = "-" * 100
  REGEXES.each do |rex|
    puts spacer
    puts spacer
    print "Testing #{rex.to_s}\n\n"
    puts spacer
    print "Valid Emails\n"
    VALID_EMAILS.each do |email|
      print "%50s -> %5s\n" % [email, (email =~ rex ? "YAY" : "NAY")]
    end
    puts spacer
    print "Invalid Emails\n"
    (INVALID_EMAILS).each do |email|
      print "%50s -> %5s\n" % [email, (email =~ rex ? "NAY" : "YAY")]
    end
  end
end
```
And here are the results:
| Regex | Valid emails rejected (out of 17) | Invalid emails accepted (out of 20) | Source |
|---|---|---|---|
| `/@/` | 0 | 18 | |
| `/.+@.+\..+/i` | 0 | 16 | davidcel.is |
| `/^([^@\s]+)@((?:[-a-z0-9]+\.)+[a-z]{2,})$/i` | 5 | 9 | aidanf.net |
| `/^[-a-z0-9~!$%^&*_=+}{\'?]+(\.[-a-z0-9~!$%^&*_=+}{\'?]+)*@([a-z0-9_][-a-z0-9_]*(\.[-a-z0-9_]+)*\.(aero\|arpa\|biz\|com\|coop\|edu\|gov\|info\|int\|mil\|museum\|name\|net\|org\|pro\|travel\|mobi\|[a-z][a-z])\|([0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}))(:[0-9]{1,5})?$/i` | 5 | 0 | fightingforalostcause.net |
| `/^([\w\.%\+\-]+)@([\w\-]+\.)+([\w]{2,})$/i` | 5 | 7 | awesoham.wordpress.com |
| `/\A[\w+\-.]+@[a-z\d\-.]+\.[a-z]+\z/i` | 6 | 7 | railstutorial.org |
| `/\A[^@\s]+@([^@.\s]+\.)+[^@.\s]+\z/` | 3 | 10 | stackoverflow.com |
| `/\A[^@\s]+@([^@.\s]+\.)*[^@.\s]+\z/` | 3 | 11 | stackoverflow.com |
| `/^[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}$/i` | 7 | 7 | stackoverflow.com |
| `/\A(\S+)@(.+)\.(\S+)\z/` | 1 | 13 | stackoverflow.com |
The important thing to take away from these results is that none of these regular expressions is very good at both accepting valid emails and rejecting invalid ones. And since I think it's much worse to reject a valid email than to accept an invalid one, that just leaves us with one of the first two, i.e. the simplest ones on the list.
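If you want to tally this kind of count for your own candidate regex, a helper along these lines would do it. This is only a sketch: `misclassified` is a hypothetical name, and the two sample lists are a small illustrative subset, not the full lists used for the results in this post.

```ruby
# Counts how a regex misclassifies two lists of sample addresses.
# Returns [valid_rejected, invalid_accepted].
def misclassified(regex, valid, invalid)
  valid_rejected   = valid.count   { |email| email !~ regex }
  invalid_accepted = invalid.count { |email| email =~ regex }
  [valid_rejected, invalid_accepted]
end

# A small illustrative sample, not the full lists from the script.
sample_valid   = ['email@example.com', '"email"@example.com', 'email@[123.123.123.123]']
sample_invalid = ['plainaddress', '@example.com', 'email@example']

p misclassified(/@/, sample_valid, sample_invalid)
# prints [0, 2]
p misclassified(/\A[\w+\-.]+@[a-z\d\-.]+\.[a-z]+\z/i, sample_valid, sample_invalid)
# prints [2, 0]
```

Even on this tiny sample the trade-off shows up: the loose regex rejects no valid addresses but lets junk through, while the stricter one catches the junk at the cost of turning away real addresses.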
In fact, it's probably better to validate email on the client side, but still allow the user to submit the email even if your validation thinks it's invalid (maybe with minimal server-side validation such as /@/). That way you can give them a helpful hint if it looks like they've entered an invalid email, but still allow the people with unusual emails to submit them.
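Here's a minimal sketch of that "hint, don't block" idea, written in plain Ruby for illustration. The names `HARD_MINIMUM`, `LOOKS_NORMAL` and `check_email` are hypothetical, and the "looks normal" pattern is just one reasonable choice, not a recommendation from any of the sources above. Only the `:reject` case would stop a submission; `:warn` is meant to drive a hint in the UI.

```ruby
# Reject outright only if this fails.
HARD_MINIMUM = /@/
# Otherwise just hint when the address looks unusual.
LOOKS_NORMAL = /\A[^@\s]+@[^@\s]+\.[^@\s]+\z/

# Returns :reject, :warn, or :ok for a submitted address.
# (Hypothetical helper for illustration.)
def check_email(input)
  email = input.to_s.strip
  return :reject unless email.match?(HARD_MINIMUM)
  return :warn unless email.match?(LOOKS_NORMAL)
  :ok
end
```

With this, `check_email('plainaddress')` is rejected, `check_email('email@example')` gets a warning but can still be submitted, and `check_email('email@example.com')` passes quietly.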
Here are some related links:
- Comparing E-mail Address Validating Regular Expressions
- An RFC compliant regular expression for validating emails
- Stop validating emails with complicated regular expressions
- A list of valid and invalid email addresses
- StackOverflow discussion on validating emails in rails