Tracking down rogue bandwidth: a story of Comcast data caps and offsite backup

tl;dr: if you use BackBlaze and are subject to a data cap, keep an eye on it to make sure it doesn’t quietly use up your entire data allowance. I’ve switched to CrashPlan, which has better retention, encryption, and backup policies.

Beyond sharing this warning, I wanted to do a longer write-up of what I experienced because I found some pretty interesting things along the way – how to get true visibility into what’s happening on your local network, what’s up with Comcast’s usage meter, and what happens when something goes wrong with BackBlaze offsite backup.

Earlier in the year Comcast announced they were introducing 1 terabyte data caps in 27 markets across the US (18 of which previously did not have enforced caps). This is a pretty consumer-unfriendly move, and being in California meant it was happening to me – but I had never come close to exceeding the cap, so I wasn’t overly worried (and won’t be until 4K video becomes more common). In the second week of November 2016 I got a notification in my browser telling me I’d used 90% of my cap. This was surprising in two ways – one, I didn’t expect Comcast to hijack an HTTP request to do this, and two, it said I’d used over 900GB of data in just over a week. I was about to head overseas for 8 days, so I decided to shut everything down while I was away and figure out what was using the data once I got back.

While I was away, my usage continued to increase by approximately 80GB a day. I had left my Mac Mini server running, a Dropcam, and a Nest, and that was it. I checked remotely to ensure there were no other wireless clients on my network, but couldn’t see anything. I used Activity Monitor to keep track of how much bandwidth each process was using on the server, and saw BackBlaze (bztransmit) had transmitted ~50GB – but my uptime was 3 weeks at that point, so that seemed about right. I was completely baffled – nothing else was being generated by the Mac Mini, and the amount of data uploaded by the Dropcam was also minimal, yet each day the usage kept increasing.

With no obvious culprits on my network, it seemed suspicious that I would start exceeding my cap just as the caps were introduced, so my attention turned to whether Comcast’s usage reporting was accurate.

Comcast gets a lot of heat for their data caps, and in particular for the accuracy of their usage meter. Comcast itself says the meter should not be relied on for diagnosing bandwidth issues: there’s a disclaimer on it saying it is “delayed by up to 24 hours” (and on the phone I was told it could lag for weeks). By the end of figuring all this out I actually found their usage meter to be accurate in real time, though others online report usage changing after the fact. I also found numerous people complaining that the meter did not match their self-measured usage (some using tools such as Gargoyle firmware on their router). However, most of these discrepancies were 10% or so, not the order-of-magnitude difference I was seeing. I was also unable to find well-proven cases of the meter being wildly off except for one instance where a MAC address had been entered with a single-character typo. There was only one other example of Comcast rolling back fees for overages – but it only happened after the media became involved, and nobody technically proficient actually checked the network.

Given that the meter was likely accurate, I was keen to find an inexpensive (and preferably software-based, so I could do it remotely) way to measure what was actually going through the cable modem. Most of the advice I found suggested buying a router that supports Gargoyle, but I also discovered that the Motorola SB6121 cable modem I use reports the number of codewords per channel, aka bytes.

On the modem’s status page, the aggregate codeword count across the 4 channels was 275,455,328,290, i.e. 275.45 gigabytes. This was exactly what I would’ve expected. I remotely reset the cable modem using the mobile app (another quirk – a firmware update pushed by Comcast removed my ability to reset the modem through its own interface, even though I own the modem), and after doing so the counters showed only megabytes of traffic per day, which was what I expected. At this point I was baffled: advice I’d seen repeated a few times online is to use the codewords as the source of truth, but they do NOT include uploads, which Comcast counts towards your cap.
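
If you want to repeat that check on your own modem, the arithmetic is just a sum and a unit conversion. Here’s a minimal Python sketch; the per-channel values are illustrative placeholders (chosen to sum to the total quoted above), and it follows the reading above of treating one codeword as one byte:

# Sum the per-channel downstream codeword counters from the SB6121's status
# page and convert to decimal gigabytes (the unit Comcast's meter uses).
# The per-channel numbers below are illustrative placeholders, not real data.
channel_codewords = [70_120_000_000, 68_990_000_000, 69_555_000_000, 66_790_328_290]

total_bytes = sum(channel_codewords)          # treating 1 codeword as 1 byte
total_gb = total_bytes / 1_000_000_000

print(f"Aggregate: {total_bytes:,} codewords ~= {total_gb:.2f} GB downstream")
# Note: this counter only covers downstream traffic; uploads are not included,
# even though Comcast counts them against the cap.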

At the time I didn’t know that the codeword counters exclude uploads, and since I had no idea how Comcast was measuring so much traffic, I kept escalating with them, eventually being told a level 3 support technician would contact me. That was a couple of weeks ago, and I never heard anything more from Comcast.

To be completely sure it wasn’t something on my end, my next step was to use SNMP logging to verify the amount of data leaving my network via the Ethernet connection to the cable modem. SNMP (Simple Network Management Protocol) is a standardized way of reporting and collating information about managed devices on a network. A router with SNMP support can (amongst other things) report exactly how much traffic has passed over each of its many interfaces (which are unhelpfully named, but a good description of each on an Airport Extreme is available here). Helpfully, mgi1 is the interface for the WAN connection – i.e. I could measure every bit of traffic going to and from the cable modem. Unfortunately, Apple removed this feature from the 802.11ac Airport Extreme, which I was using as my main router, so I reconfigured my entire network around my older router and an old Airport Express (which also supports SNMP logging).
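
For the curious, the SNMP query involved is simple enough to script yourself. Below is a rough Python sketch that shells out to the standard net-snmp snmpget tool and reads the IF-MIB octet counters for one interface; the router address, community string, and the interface index for mgi1 are all assumptions you’d need to adjust (walking IF-MIB::ifDescr on your own router will tell you which index is which):

# Rough sketch: read a router's SNMP byte counters for one interface via the
# net-snmp "snmpget" command-line tool and report the traffic over a minute.
# ROUTER, COMMUNITY and WAN_IFINDEX are assumptions for illustration only.
import re
import subprocess
import time

ROUTER = "10.0.1.1"    # hypothetical router address
COMMUNITY = "public"   # hypothetical (read-only) SNMP community string
WAN_IFINDEX = 2        # hypothetical index of the mgi1 (WAN) interface

def get_octets(counter):
    out = subprocess.check_output(
        ["snmpget", "-v", "2c", "-c", COMMUNITY, ROUTER,
         f"IF-MIB::{counter}.{WAN_IFINDEX}"],
        text=True)
    # Typical output: "IF-MIB::ifOutOctets.2 = Counter32: 123456789"
    return int(re.search(r"Counter\d+:\s*(\d+)", out).group(1))

out_before, in_before = get_octets("ifOutOctets"), get_octets("ifInOctets")
time.sleep(60)
delta_out = get_octets("ifOutOctets") - out_before
delta_in = get_octets("ifInOctets") - in_before
print(f"WAN upload over the last minute:   {delta_out / 1_000_000:.2f} MB")
print(f"WAN download over the last minute: {delta_in / 1_000_000:.2f} MB")
# Note: ifInOctets/ifOutOctets are 32-bit and wrap around ~4GB, so for longer
# intervals use the 64-bit ifHCInOctets/ifHCOutOctets counters where available.

This kind of periodic polling is essentially what the visualization tool in the next paragraph automates.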

With SNMP now logging all my internal network bandwidth, I used PeakHour 3 to visualize it, which made it very easy to see what was happening. I finally had proof that my Mac Mini really was uploading a LOT of data, in amounts that matched the Comcast usage meter. But Activity Monitor still didn’t show anything out of sorts, so I still didn’t know WHAT was causing all the usage.

I researched what other tools I could use to monitor network traffic by process, and found Little Snitch. Little Snitch lets you police all network requests and approve or deny them, which is pretty nifty, but all I needed was its monitoring tool. This let me see that bztransmit was happily uploading at 5-10mbps in bursts every few minutes, and the cumulative total from this process exactly matched the upload traffic seen in the SNMP logging. After leaving it running for even just a few hours, it was clear this was the culprit.

I throttled BackBlaze to 128kbps using their preference pane (which claimed it would upload “approximately 1gb a day”, compared to the 4GB/hour it was doing at the time) and contacted their customer support. While I waited to hear back, I started reading through the BackBlaze log files and saw it was uploading roughly the same amount each day, plus a little more each time, e.g.:

server:bzreports_eventlog admin$ cat 10.log
 2016-11-10 04:55:40 - END_BACKUP: Backed up 2801 FILES / 40617 MB

server:bzreports_eventlog admin$ cat 17.log
 2016-11-17 05:02:54 - END_BACKUP: Backed up 2933 FILES / 51489 MB

They helpfully have a log of the last files uploaded, located in /Library/Backblaze/bzdata/bzlogs/bzreports_lastfilestransmitted. The same files were being uploaded each day.
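
If you want to verify that kind of repetition yourself, a few lines of Python are enough to intersect the file paths from two days’ logs. This is only a sketch: it assumes each log line ends with the path of the transmitted file, so adjust the parsing (and watch out for paths containing spaces) if your log format differs:

# Rough sketch: report file paths that appear in two different days' BackBlaze
# "last files transmitted" logs, to spot files being re-uploaded every day.
# Assumes the transmitted file's path is the last whitespace-separated field
# on each line; tweak the parsing if your log format differs.
import sys

def paths_in(log_file):
    paths = set()
    with open(log_file, errors="replace") as f:
        for line in f:
            fields = line.split()
            if fields and fields[-1].startswith("/"):
                paths.add(fields[-1])
    return paths

day_a = paths_in(sys.argv[1])
day_b = paths_in(sys.argv[2])
repeats = day_a & day_b
print(f"{len(repeats)} of {len(day_b)} files appear in both logs")
for path in sorted(repeats)[:20]:  # print a sample of the repeats
    print(" ", path)

Run it with two of the dated logs from that directory as its arguments.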

Another thing I noticed in the log files is that BackBlaze appeared to be downloading updates and/or reinstalling itself basically every day. As far as I can tell, this resets the per-process totals in Activity Monitor, which is why it was not a reliable measure.

While I was investigating these logs, I continued monitoring my network, and noticed that even while BackBlaze was “throttled” it uploaded nearly 4GB of data in 24 hours. The bztransmit process was using just under 1mbps of bandwidth, approximately 8x the promised throttle limit. (To be clear, I have been using megabits for all my bandwidth measures in this post.) I have to assume there is a bits-versus-bytes mistake in a conversion somewhere, which would explain it perfectly: it was uploading at 128 kilobytes per second rather than 128 kilobits. The annotation in their UI and documentation is ambiguous, since it’s all lowercase and abbreviated, but their own estimate implies the number should be kilobits (128 kbps = 16 KB/s = 1,382,400 KB/day ≈ 1.3GB/day).
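
Spelling out that conversion (a back-of-the-envelope check, not anything from BackBlaze’s actual code):

# The bits-versus-bytes hypothesis, spelled out.
SECONDS_PER_DAY = 24 * 60 * 60

# If "128" means kilobits per second, as their "~1GB a day" estimate implies:
gb_per_day_kilobits = 128 * 1000 / 8 * SECONDS_PER_DAY / 1e9
print(f"128 kbps -> {gb_per_day_kilobits:.2f} GB/day")   # ~1.38 GB/day

# If the throttling code instead treats it as kilobytes per second:
gb_per_day_kilobytes = 128 * 1000 * SECONDS_PER_DAY / 1e9
print(f"128 KB/s -> {gb_per_day_kilobytes:.2f} GB/day")  # ~11 GB/day (~1mbps sustained)

The just-under-1mbps bursts I observed line up with the kilobytes interpretation; the daily total presumably only stayed near 4GB because the uploads came in bursts rather than running flat out.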

BackBlaze got back to me the following day and asked for a copy of my logs – all of them. They gave me an unsigned tool which gathers all these logs, as well as a full system snapshot – approximately 50MB of log files in my case. I wasn’t very comfortable with this. BackBlaze encrypts your files with your private key before uploading them, and according to an employee on Reddit, they can’t even see the filenames. I really liked this feature – having all my private documents in the cloud is a scary proposition from a security perspective, and even if the contents are encrypted, the filenames themselves leak information (e.g. financial documents, photo folder names, etc). So I wasn’t particularly keen to send the logs over, as they are very chatty about your system and the files being backed up.

At this point things with BackBlaze broke down. The customer support person I was communicating with ignored my request to provide specific logs rather than sending all of them – and by ignored, I mean they stopped updating the ticket and ignored my updates to it. I was only able to re-engage them by pinging the BackBlaze Twitter account. They then refused to escalate further without the logs, and ignored my report of the throttling bug.

With BackBlaze making no good-faith effort to remedy the situation (I should also note they were never apologetic about any of this, including just ignoring the ticket), I investigated alternatives and have switched to CrashPlan. CrashPlan offers a number of better features: file versioning, deleted files that stay backed up (BackBlaze deletes them after 30 days), 448-bit file encryption (versus 128-bit for BackBlaze), and support for backing up NAS drives. They are $60/year compared to $50/year for BackBlaze.

I had been using BackBlaze for 6 years. I had been an evangelist, recommending it to family and friends, and likely referred at least a dozen new customers to them. To say the least, I was very disappointed by this experience. Afterwards, I had a quick look around to see if this had happened to others. What I found was a pattern of issues with customer support, slow restores, other bugs (reports of missing files came up several times), and backups being unexpectedly deleted from the server (e.g. someone goes on vacation, leaves their external drive at home, and the 30-day deletion trigger hits while they’re away – which also means that if you do have a catastrophic data failure, you have 30 days to get your computer set up again to do a restore). It was interesting to see the different channels people used to vent their issues with the company, including Amazon reviews, Facebook, the BBB and even CNet.

Most tellingly, I found someone on Reddit reporting the same issue I encountered – a year earlier – with other users in the thread reporting the same problems.

At the end of all this, I’ve spent somewhere in the vicinity of 20-30 hours diagnosing the problem and talking with customer support, and I’ve gone over my quota for two months (October and November – with the latter hitting 2TB in the end), leaving me a single courtesy month with Comcast (after which I never get any courtesy months again). On the plus side, my network is now more secure, I learnt some interesting things along the way, and I diagnosed the cause literally an hour before midnight on November 30th, preventing me from going over again. The bummer is that two companies have shown no interest in fixing issues for the consumer, and put all the onus on them: BackBlaze needs to fix these bugs, and Comcast needs to provide customers with better tools for diagnosing and monitoring network traffic if they’re going to institute caps for everyone.

Fixing a corrupted/deleted partition table

About a month ago, while trying to upgrade to Windows 7, I managed to wipe my partition table and then, in trying to fix it, corrupted it.

(Incidentally, if you can’t update Vista with the latest service pack, you won’t be able to upgrade to Windows 7, so don’t bother trying until you’ve fixed your boot configuration. It turns out my problem was having a dual-boot configuration with XP.)

I had backed up my key files, but I wasn’t keen on losing my nice Vista configuration. I posted the whole sordid tale on Superuser.

Happily, I managed to figure out what had happened, understand what I was actually doing at a low level (sometimes I am a little too lazy and just blindly run commands, something that Raymond Chen despises), and completely recover. I figured I’d post a link to the solution in case anyone else runs into the same trouble.

Completeness versus permanence

I realised something interesting tonight. For the first time, for a huge number of people, all of their personal correspondence will be neatly catalogued and saved. This is thanks to the pervasiveness of email — among both people and businesses.

And unlike old boxes of letters, this correspondence is easily disseminated. Gmail has made me pretty fastidious about keeping my inbox clear — however, with great search, most people don’t even need to organise their email. What this means is that all aspects of my life — travel bookings, concert tickets, bills, short notes, long catch-ups, letters back home from holidays, photos sent, job applications, arguments with my brother, and so on — are being neatly stored and catalogued for the future. No lost filing cabinets, nothing thrown away, no mould or water damage.

It’s a stunning thought really — with Gmail specifically, people are now far more likely to have a permanent email account with enough storage to keep using indefinitely without deleting anything. Imagine how useful this information will be in hundreds of years for researching history.

But there’s the rub. I won’t cover this too much, as data ownership and safety is a much-discussed issue, but what happens when/if Google is acquired or goes out of business? What if they decide to close the account, or they have a catastrophic server failure? I have Gmail offline and Thunderbird to back up my mail (which most Gmail users won’t do), but even in this case, who knows what web browsers and mail readers will be like in the future — even if I own a local copy, will I be able to read it?

Essentially we’ve traded simplicity of archiving for the difficulty of maintaining the archive into the future. Previously you could just take a letter and throw it in an archive box and put it in an attic. You never had to touch it again and it would maintain its state. Now you need to find different ways to keep your data available and safe. In fact it wouldn’t surprise me if we actually end up with fewer sources of archived data, but they will be more complete. It’s certainly interesting wondering how this is all going to shake out…

People power versus algorithms

Very interesting article from Wired. This is something I have struggled with personally: is it worth investing the time to automate a process, or is it just cheaper to outsource the work to smart people in up-and-coming countries? Ultimately I’ve found that not only is it cheaper to outsource the work, but the quality of the results is much higher. The main problems I’ve found are scalability and training; overcome these for the task at hand and the benefits are immense.

The vogue for human curation reflects the growing frustration Net users have with the limits of algorithms. Unhelpful detritus often clutters search results, thanks to online publishers who have learned how to game the system. Users have tired of clicking through to adware-laden splogs posing as legitimate resources. And unless you get your keywords just right, services like Google Alerts spew out either too much relevant content — or not enough.

Again, I have to say, the quality of the work just blows any automated approach I’ve tried out of the water. However, you do have to manage your sources – something like Amazon’s Mechanical Turk is a bit hit and miss, whereas something like Elance allows for a feedback system and a more personal relationship – just not the sheer bulk of work.

What the article doesn’t cover is that most of this type of work is outsourced. It would be very interesting to see the demographics of the workers on something like Mechanical Turk. If this continues to grow in popularity, what are the long-term effects going to be? Will it improve the skills of the contributors or just burn them out with mindless work? I personally think the former – most of the projects I have seen are actually very interesting. I know people who use Mechanical Turk for fun and as a timewaster – certainly not as an income source.

This Internet thing – it ain’t gonna last.

A hilariously off-the-mark article by Clifford Stoll, published in Newsweek in 1995. Some excerpts:

After two decades online, I’m perplexed. It’s not that I haven’t had a gas of a good time on the Internet. I’ve met great people and even caught a hacker or two. But today, I’m uneasy about this most trendy and oversold community. Visionaries see a future of telecommuting workers, interactive libraries and multimedia classrooms. They speak of electronic town meetings and virtual communities. Commerce and business will shift from offices and malls to networks and modems. And the freedom of digital networks will make government more democratic.

I always love a good future prediction, especially when it comes true. The best, though, is when someone makes an “anti-prediction” that then comes to pass.

Nicholas Negroponte, director of the MIT Media Lab, predicts that we’ll soon buy books and newspapers straight over the Internet. Uh, sure.

Wait, wait, it gets better! It reminds me of someone reviewing Wikipedia in 2001 – expecting so much from a nascent product, without any foresight into its possibilities.

What the Internet hucksters won’t tell you is that the Internet is one big ocean of unedited data, without any pretense of completeness. Lacking editors, reviewers or critics, the Internet has become a wasteland of unfiltered data. You don’t know what to ignore and what’s worth reading. Logged onto the World Wide Web, I hunt for the date of the Battle of Trafalgar. Hundreds of files show up, and it takes 15 minutes to unravel them–one’s a biography written by an eighth grader, the second is a computer game that doesn’t work and the third is an image of a London monument. None answers my question, and my search is periodically interrupted by messages like, “Too many connections, try again later.”

It’s at this point I’m starting to wonder if this is an elaborate hoax. The comments read like parody – the equivalent of the fake “what women were thought of in the 1950s” articles you occasionally see floating around.

Then there’s cyberbusiness. We’re promised instant catalog shopping–just point and click for great deals. We’ll order airline tickets over the network, make restaurant reservations and negotiate sales contracts. Stores will become obsolete. So how come my local mall does more business in an afternoon than the entire Internet handles in a month? Even if there were a trustworthy way to send money over the Internet–which there isn’t–the network is missing a most essential ingredient of capitalism: salespeople.

I have enjoyed a virtually salesperson-free existence since 2003 and love it. Being able to research a decision across a wide variety of products and then choose the best deal is how things should be done.

When geeks bite back

Very interesting article on news.com about Sarah Lacy’s interview of Mark Zuckerberg of Facebook. I’ve seen some pretty atrocious interviews, keynotes, panels and presentations in my time, but this is right up there when you consider the scope: interviewing the world’s youngest self-made billionaire (is that title accurate?) at a conference like SXSW… well, I can’t help but think she was a little blasé about it all, even by her own admission in a post-interview discussion with Valleywag.

Edit: Found the interview in question. Suspicions confirmed! Sometimes the arrogance in Silicon Valley can get a bit much… Haha, and Mark’s rising inflection at the end of almost every sentence is kind of irritating.

Edit 2: Jeff Jarvis has a great post going into detail about what went wrong, in a far more insightful manner than I could muster.