How to Extract Addresses from Email

From MKRD.info Wiki

Keywords: How to Extract Addresses from Email, Windows Live Mail, Email Address Extractor, extract addresses from windows live mail.


If you are a spammer, this page is not for you. If you are a web admin, you may read on.


Although I strongly urge you against this, you may want to extract all addresses from your email account so that you can send an email to that list.


Commercial software to do this exists. However, if you have a legitimate need for this, and have more brains than the typical scammer, then the following will help:


Export e-mails

First, export all of your emails to a standard format. This is usually done through File>Export... Before doing this, I recommend going through your email and deleting any attachments that you no longer need, to reduce the size of the exported file. I also recommend going through your e-mail and removing any messages from people you do not want the mass email sent to: your mistresses, public email lists that are posted online, and your relatives are all examples.


Note that exporting your mail to safe storage is a good backup strategy for your e-mail.

Prepare for processing

You should now have a subfolder tree with *.eml files inside. Get all of those files into one folder. You will now need to run the Linux command grep on those files. You may do this on your Linux machine, in Cygwin, or on a server that you have access to. If it is your server, compress those files into one archive, upload it, and then extract it there (this saves time over FTPing the files one by one).
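Gathering the files into one folder can also be done from the command line. The following is only a sketch, not a tested recipe for every mail client: the function name flatten_eml and the folder names are made up for illustration, and it assumes the destination folder lies outside the exported tree and that no file name contains a newline.

```shell
# Copy every .eml file under SRC into the single folder DEST.
# Copies are numbered (1.eml, 2.eml, ...) so that two messages with the
# same name in different subfolders do not overwrite each other.
# Assumption: DEST is not inside SRC.
flatten_eml() {
    src=$1
    dest=$2
    mkdir -p "$dest"
    n=0
    find "$src" -type f -name '*.eml' | while IFS= read -r f; do
        n=$((n + 1))
        cp "$f" "$dest/$n.eml"
    done
}
```

For example, flatten_eml ./export ./flat would copy everything from a hypothetical ./export tree into ./flat, leaving the original export untouched.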


Grep Method1 - character range match

In the directory which contains only the list of those *.eml files, run the following grep command:

grep -h '[A-Za-z0-9.-]*@[A-Za-z0-9.-]*' *.eml >>results.txt

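Note: the following command, which would save a lot of time and processing down the road, does not work in plain grep, because + is an extended regular expression operator and plain grep treats it as a literal character. Adding -E enables it: grep -hE '[A-Za-z0-9.]+@[A-Za-z0-9.]+' *.eml >>results.txt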


Grep Method2 - blank stop

Or, if you want grep to trigger on whitespace, use

grep -h '[^[:space:]]*@[^[:space:]]*' *.eml >>results.txt


Note that neither method is ideal. Linux gurus are welcome to suggest a better way to match an email address.


The resulting file will be large and will contain a lot of things that are not real e-mail addresses. A much better grep command is needed for better results. Primarily, the file will contain the whole line of text on which each address was found, and it will include invalid addresses that servers use to pass email around.
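One possible improvement is to have grep print only the matched text instead of the whole line. The sketch below relies on the -o (only matching) and -E (extended regex) options, which GNU and BSD grep both provide. The function name extract_addresses and the tighter pattern are my own assumptions, not a complete address matcher: the pattern requires a dot and a final part of two or more letters after the @, so it will miss some unusual but valid addresses, and it can still pick up Message-ID values that happen to look like addresses.

```shell
# Print only the matched addresses, one per line, from all .eml files in DIR.
# -h hides file names, -o prints just the match, -E enables + and {2,}.
# The pattern is an illustrative assumption, not a full address grammar.
extract_addresses() {
    grep -hoE '[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}' "$1"/*.eml
}
```

Something like extract_addresses . >>results.txt would then give a file that already has one candidate address per line.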


Clean up file in OpenOffice

Remove lines that are server-to-server communication. For example, search for:

.*Message-ID.*$

Replace with a blank (delete)
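The same cleanup can be sketched with grep itself, without opening the file in OpenOffice. The function name drop_message_ids is hypothetical; -v inverts the match so that only lines without the text are kept, and -i is my addition so that the "Message-Id" spelling is caught too. Unlike the search and replace above, this removes the lines entirely rather than leaving them blank.

```shell
# Write to stdout every line of FILE that does not mention Message-ID.
# -v keeps the non-matching lines; -i ignores upper/lower case.
drop_message_ids() {
    grep -vi 'Message-ID' "$1"
}
```

A call such as drop_message_ids results.txt >cleaned.txt (cleaned.txt being an example name) leaves the original file untouched.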


Look for any other ways to clean up the file.


Run the following search and replace on that file in OO (with the "Regular expressions" option enabled in the Find & Replace dialog):

Search for

[A-Za-z0-9.]*@[A-Za-z0-9.]*

replace with

\n&\n

The file should now contain email addresses on separate lines. Run grep on that file again, and you should now have just a long list of lines, with only email addresses on each line. The list will still contain some invalid entries, duplicates, etc.
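The second grep pass is not spelled out above, so here is one way it might look. This is a sketch under assumptions: the function name keep_address_lines is made up, the pattern is the same character class as in the search above but with + (hence -E), and -x tells grep to accept a line only if the whole line matches.

```shell
# Print only the lines of FILE that consist of nothing but an address.
# -x requires the pattern to match the entire line; -E enables +.
keep_address_lines() {
    grep -xE '[A-Za-z0-9.]+@[A-Za-z0-9.]+' "$1"
}
```

Lines that are empty, that hold leftover header text, or that contain only a stray @ are dropped by this pass.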


Remove duplicates from the listing. I usually do this by copy-pasting into Calc (the OpenOffice spreadsheet), highlighting the whole column, and going to Data>Filter>Standard filter. Under "Value" choose -not empty-. Click "More Options". Select "No Duplicates", then click OK. Paste the resulting list WITHOUT FORMATTING into a new OO file. That file will contain the email addresses that you are after.

Note: go through the list manually, and remove the following:


  • Invalid e-mail addresses
  • Partial e-mail addresses (easier to find now that the list is sorted)
  • Addresses of people you do not want to offend or send to.
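The duplicate removal described above can also be sketched on the command line. The function name unique_addresses is hypothetical, and lowercasing everything first is my assumption: it treats Bob@Example.com and bob@example.com as the same address, which is how most mail systems behave in practice. The output comes out sorted, which makes partial addresses easier to spot during the manual pass.

```shell
# Lowercase every line of FILE, drop empty lines, then sort and
# remove duplicates (sort -u keeps one copy of each line).
unique_addresses() {
    tr '[:upper:]' '[:lower:]' < "$1" | grep -v '^$' | sort -u
}
```

This only removes exact duplicates; the invalid and unwanted entries listed above still have to be removed by hand.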


BE CAREFUL not to send to any email address whose messages will be reposted publicly (email lists, etc.).