Rsync failing randomly: writefd_unbuffered failed to write: Broken pipe (32)

27 April 2015

My rsync backup script broke randomly after literally years without a problem.  I haven’t tinkered with the script OR my server for months, and suddenly it stopped working.  I went through a lot of voodoo trying to track down the cause with very little help.  All I really did is rule out a ton of problems:

  • My MacBook Pro was on a much older rsync version (2.6.9) than my Rackspace server (3.0.9), so I updated it so the versions matched
  • Ran detailed debug logs on both client and server
  • Checked which file it failed on to see if there was any consistency – none
  • Timed the failures to see if there was any consistency – none
  • Tried different timeout/keepalive settings, no change

In all this debugging, my first big hint was the same thing that caught someone else’s eye on this rsync bug thread.  He took it a step further and worked out the cause and a workaround.  Want to try guessing too?  Here is the symptom:

  • Checked the last *successful* file on the recipient vs. sender.  The recipient’s last file received was dozens or even hundreds of files ago in the sender’s log.

It seems pretty obvious in hindsight, but it’s just not the sort of thing you expect rock-solid software like rsync to have a problem with: the sender was sending data too fast for the receiver.  I’m still unclear on whose fault this is, but the rsync thread claims it’s out of their hands, so maybe it’s the Linux network I/O layer that’s failing?  The other group of people hitting this a lot is users rsyncing to USB sticks, another case where the recipient is much slower than the sender.

The fix was simple.  The --bwlimit option limits the bandwidth rsync uses (the value is in kilobytes per second), and with it set the backup immediately worked reliably again, every time.  I’m going to keep upping the limit to get an idea of where things break, but my last run at --bwlimit=10000 worked fine.
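For anyone scripting their backups, here is a minimal sketch of how the bandwidth cap could be wired into a small Python wrapper. The function names, the `-az` flags, and the example paths are my own assumptions for illustration and not my actual backup script; only `--bwlimit` itself comes from the fix above.

```python
import subprocess


def build_rsync_command(source, destination, bwlimit_kbps=10000, extra_args=("-az",)):
    """Build an rsync argument list with a bandwidth cap.

    bwlimit_kbps is passed straight to rsync's --bwlimit option, which
    reads it as kilobytes per second. The -az default (archive mode plus
    compression) is an assumption; swap in whatever flags your script uses.
    """
    if bwlimit_kbps <= 0:
        raise ValueError("bwlimit_kbps must be positive")
    return ["rsync", *extra_args, f"--bwlimit={bwlimit_kbps}", source, destination]


def run_backup(source, destination, bwlimit_kbps=10000):
    """Run rsync with the cap and return its exit code (0 means success)."""
    cmd = build_rsync_command(source, destination, bwlimit_kbps)
    return subprocess.run(cmd).returncode


if __name__ == "__main__":
    # Hypothetical paths: print the command instead of running it.
    print(" ".join(build_rsync_command("/Users/me/Documents/", "backup@example.com:/backups/")))
```

Keeping the command builder separate from the runner makes it easy to raise the limit step by step and see which value starts producing the broken pipe again.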

Next question: why did it stop working suddenly?  I upgraded my internet to AT&T U-verse.  I haven’t noticed it being slower than my old Time Warner connection, so maybe it sped up enough to push the buffering required into breakage territory.

Hopefully this post is helpful to someone, as this one took me quite a bit of hunting!