How to download some labeled messages from Gmail to a Unix computer

This blog post explains how to download some Gmail messages (distinguished by a label) to a computer running Unix. The download method shown works in headless mode, so it can be run from cron etc.

Preparation instructions

  • You will need a computer running Unix which can run Fetchmail. Your mail will be downloaded to a file (in mbox format) on that computer.
  • If Fetchmail is not installed on that computer, you need to install it (which needs root privileges) or get the admin to install it for you.
  • You will have to add your Gmail e-mail address and password to a config file which will be stored on the Unix computer running fetchmail. If this is not secure enough for you, then please don't continue.
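
To check quickly whether Fetchmail is already installed, a small shell sketch like the one below may help. It only looks for a fetchmail binary on $PATH; the variable name fetchmail_status is made up for this illustration.

```shell
# Report whether a fetchmail binary is reachable through $PATH.
# fetchmail_status is a hypothetical variable name used only in this sketch.
if command -v fetchmail >/dev/null 2>&1; then
  fetchmail_status="installed: $(command -v fetchmail)"
else
  fetchmail_status="not installed"
fi
echo "$fetchmail_status"
```

If it prints "not installed", follow the installation step in the Unix computer setup instructions.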

Gmail setup instructions

  • Create a Gmail account if you don't already have one.
  • Log in to Gmail in a web browser (can be on any computer).
  • In Settings (the gear button) / Forwarding and POP/IMAP, select Enable IMAP, and click on the Save button to save the changes.
  • Create two labels. One of them (let's call it foo auto) will be used by Gmail to track which messages have been downloaded already: after a download, Gmail will automatically remove that label. The other label (let's call it foo manual) is for your reference only.
  • If you already have some e-mail in your inbox to be downloaded, apply both labels to them, manually (or by searching).
  • If needed, set up a filter (in Settings / Filters / Create new filter) which will apply both labels to incoming messages you want to get downloaded.
  • Go to https://www.google.com/settings/security/lesssecureapps in your browser, and enable less secure apps (such as Fetchmail).

Unix computer setup instructions

  • Log in to the Unix computer you want to download the messages to. Typically such login is done using SSH.
  • Install fetchmail, and make sure it is on $PATH (usually /usr/bin/fetchmail when installed from a package). Version 6.3.9 works, but older versions probably work too. On Debian and Ubuntu, it is as easy as running (without the $):
    $ sudo apt-get install fetchmail
  • Create a directory which will hold your downloaded mail. For example:
    $ mkdir downloaded.mail
    $ cd    downloaded.mail
  • Create a plain text config file with the contents below, and copy it to the Unix computer, to the downloaded.mail directory. Typically the copying is done using scp or rsync. Don't forget to change the USERNAME, PASSWORD and foo auto settings to reflect your Gmail account. (If it's inconvenient for you to edit files on the Unix computer, change it first locally, and then copy it again).
    mda "exec >>download.mbox && echo From MAILER-DAEMON Thu Mar 29 23:43:41 2007 && cat"
    poll imap.gmail.com
    proto IMAP
    user "USERNAME@gmail.com"
    pass "PASSWORD"
    # Gmail will auto-remove this label as soon as the message has been downloaded,
    # so it won't get downloaded again at the next run.
    folder "foo auto"
    # Also download read messages.
    fetchall
  • Make sure the config file on the Unix computer has the filename download.fetchmailrc. Here is an example of how to rename it:
    $ mv download.fetchmailrc.txt download.fetchmailrc
  • Revoke other users' access to the config file to protect your password from being stolen. Please note the star at the end:
    $ chmod 700 download.fetchmailrc*
  • Create the output mbox file and protect it:
    $ : >>download.mbox
    $ chmod 700 download.mbox
  • In the downloaded.mail directory, run:
    $ fetchmail -f download.fetchmailrc

    This will download all your Gmail messages with the label foo auto, append them to the file download.mbox in the downloaded.mail directory on the Unix computer, and remove the label foo auto, so when you run the command again, messages already downloaded won't be downloaded again. (Gmail labels are global, so you have to define additional labels if you want to download mail to several computers.)

  • If you get this error:
    .../.fetchmail.pid: Permission denied
    Then add --pidfile fetchmail.pid to your fetchmail command line.
  • If you get this error:
    fetchmail: Authorization failure on ...@gmail.com@...
    fetchmail: Query status=3 (AUTHFAIL)

    Then visit http://www.google.com/accounts/DisplayUnlockCaptcha from any web browser, unlock it, and run the fetchmail command again.

  • If needed, set up a cron job which will download automatically for you. Typically you can download once per minute, once per hour or once per day using cron jobs.
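
As a rough sketch of such a cron job, the snippet below builds a crontab line which runs the download at the start of every hour. It assumes that the downloaded.mail directory is directly in your home directory, and it uses a made-up log file name (cron.log); adjust both as needed. It only prints the line, it doesn't install it.

```shell
# Assumption: the mail directory created earlier is in your home directory.
mail_dir="$HOME/downloaded.mail"

# Crontab line: at minute 0 of every hour, change to the mail directory and
# run the same fetchmail command as above. --pidfile keeps the PID file in
# the mail directory. cron.log is a hypothetical log file name; it will be
# created in the mail directory.
cron_line="0 * * * * cd $mail_dir && fetchmail -f download.fetchmailrc --pidfile fetchmail.pid >>cron.log 2>&1"

echo "$cron_line"

# To install it, you could append the line to your existing crontab:
#   (crontab -l 2>/dev/null; echo "$cron_line") | crontab -
```

Change the first five fields of the line if you want a different schedule, for example `* * * * *` for once per minute.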

Incremental download instructions

  • Log in to the Unix computer you want to download the messages to. Typically such login is done using SSH.
  • Change to the downloaded.mail directory:
    $ cd downloaded.mail
  • In the downloaded.mail directory, run:
    $ fetchmail -f download.fetchmailrc

    Messages already downloaded won't be downloaded again, because they don't have the foo auto label anymore. (Gmail labels are global, so you have to define additional labels if you want to download mail to several computers.)
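
To see how many messages have been downloaded so far, you can count the separator lines in the mbox file. The sketch below assumes that the mda line in the config file is unchanged, so each message is preceded by the same fixed From line. It matches the whole line, because message bodies can also contain lines starting with From. The function name count_messages and the demo mbox contents are made up for this illustration.

```shell
# Count downloaded messages by counting the fixed separator line which the
# mda setting writes before each message.
count_messages() {
  # $1: path to the mbox file
  grep -c '^From MAILER-DAEMON Thu Mar 29 23:43:41 2007$' "$1"
}

# Self-contained demo on a throwaway mbox file containing two messages.
demo_mbox=$(mktemp)
printf '%s\n' \
  'From MAILER-DAEMON Thu Mar 29 23:43:41 2007' \
  'Subject: first' \
  '' \
  'From the body, not a separator' \
  'From MAILER-DAEMON Thu Mar 29 23:43:41 2007' \
  'Subject: second' \
  '' \
  'bye' >"$demo_mbox"
demo_count=$(count_messages "$demo_mbox")
echo "$demo_count"
rm -f "$demo_mbox"
```

On the real file, run count_messages download.mbox in the downloaded.mail directory before and after a download to see how many new messages have arrived.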


On the speed of memset

This blog post presents some of the speed measurements I've done with alternative implementations of memset.

I filled a 1 GB memory area 20 times (each fill equivalent to memset(a, 0, 1 << 30)) with various implementations, and measured the speed. I compiled the program for i386 (gcc -m32) and amd64 (gcc -m64), and ran it on a desktop PC running 64-bit Linux 3.13.0 on a Xeon CPU (Intel(R) Xeon(R) CPU E5-1650 v2 @ 3.50GHz) with 32 GB of RAM. I compiled the code with GCC 4.6.3 at optimization level gcc -O2, because gcc -O3 optimized the 1-byte-at-a-time loop to a 4-bytes-at-a-time loop.

It turned out that there was no measurable difference between the i386 and amd64 versions of the program. Here are the relative speeds (higher numbers are proportionally faster), based on user times:

  • 1.480: memset, doing rep stosd, 4 bytes at a time
  • 1.000: 16 writes of 4 bytes each in the loop body (like Duff's device)
  • 1.000: 1 write of 8 bytes in the loop body
  • 1.000: 1 write of 4 bytes in the loop body
  • 0.675: 1 write of 2 bytes in the loop body
  • 0.329: 1 write of 1 byte in the loop body (char *cp = a, *cpend = a + sizeof(a) / sizeof(a[0]); for (; cp != cpend; ++cp) *cp = 0;)

Except for memset, I can interpret the numbers the following way: the cache doesn't help at this size; the CPU does correct branch prediction (or taking the branch is fast enough), or taking a branch is faster than writing to memory; the data bus between the CPU and the memory can take 4 bytes at a time.

But why is the assembly instruction rep stosd that much faster than any of the loops? What's the magic behind it? It looks like my CPU has an optimized rep stosd and rep stosb built in, a feature called ERMSB (Enhanced REP MOVSB and STOSB). More details are in the PDF available from here (search for memset within the downloaded PDF).