Pages

Sunday, December 27, 2015

Bookstore sells some data centre capacity, becomes Microsoft, Oracle's nemesis

Sysadmin's 2015 review part 1 With 2015 drawing to a close and 2016 about to begin, it is time to reflect on the fact that the world never stops changing. The tech industry certainly changes, and so here's one sysadmin's view of the industry's movers and shakers.

In part one we're going to look at Amazon, Oracle and Microsoft. As I see it, the strategies of these three companies are broken reflections of one another. Oracle is trying to become Amazon. Microsoft is trying to become Oracle. Amazon's plans are completely unrelated to either of them.

Amazon

Amazon started life as an online book store. It quickly expanded to become an online store of, well... everything! This "everything" included the spare capacity of its own data centre. Amazon could have become "just another hosting provider", but to dismiss it as such – and the exceptionally ignorant among us enjoy loudly doing so – is to fail to understand the very first thing about Amazon.

No matter in which area of endeavour it chooses to participate, Amazon is corporately obsessed with efficiency. Amazon commoditises everything, from books to labour to computing to logistics. Amazon automates and orchestrates. It lives and breathes metrics and analytics.

It is more than a business strategy; it is a religion. The idea that everything can be improved through instrumentation, metrics and analytics is the religion of the Seattle tech scene. Microsoft is infected by it, as is virtually every other tech business in the area.

Humans are inefficient and their judgment not as pure as that of an algorithm. Imagine an entire metropolis where everyone has perpetual Google envy, but they try to address it by figuring out a way to ship you a box of bananas that costs half a cent less than the previous method.

Recently, Amazon has decided that owning the online world isn't enough. It wants to be Walmart. It wants physical stores and even more warehouses. It wants sub-warehouses everywhere delivering you goods automatically.

Above all else, Amazon wants those pesky humans out of the picture. Robots in the warehouses, robots to transfer goods from the primary storage facilities to the local sub-warehouses and drones to deliver goods directly to your doorstep.

And Amazon wants to deliver everything you need. Compute resource, physical goods, you name it. If someone buys something – anything – Amazon wants its 30 per cent.

Oracle

Oracle, meanwhile, is obsessed with what Amazon was five years ago, and in chasing that it is missing the bigger picture. Oracle wants to move to a subscription revenue model in which it not only has an absolute lock on licensing, but the workloads in question run on Oracle's cloud, too.

Oracle completely misses the point of what Amazon is about, and in doing so guarantees it will never be the success Amazon is becoming.

As discussed above, Amazon's creation of AWS was essentially an accident: an outgrowth of Amazon's obsession with efficiency. Amazon simply can't have idle server capacity sitting around, especially once the nerds have created a really neat layer of automation and orchestration for using those servers that makes the idle capacity easy to sell!

But in doing so, Amazon turned self-service automated hosting of compute workloads into a commodity with relatively low prices and ease of use. Both of these things are anathema to Oracle, but necessary for success as a public cloud provider.

Oracle views becoming a cloud provider as a means of seeing who is using how much of what and how often. This is important because Oracle could then automate its licensing changes to squeeze the maximum amount of dollars out of its customers, adapting in near real time to any attempts to use licensing loopholes.

If Oracle can move enough customers over to its cloud with its new sales policies, then the cloud should help Oracle see increased short term revenue. Keeping to its existing licensing strategies, however, seems doomed to failure when the commodity approach of Amazon is just a click away.

Microsoft

Microsoft has a very successful public cloud. It also has some of the most advanced technologies in any number of markets; assembled and analysed as a whole, this is probably the single most impressive technological product portfolio of any company on the planet. For Microsoft, this isn't even close to enough.

In your correspondent's opinion, it seems as if Microsoft operates on the belief that for every computer in use, Microsoft is owed a tithe. Every desktop ever sold should bring in a minimum amount. Every server in use should bring in a much – much – larger amount. Microsoft is seemingly so tied to this model that it has apparently been blinded to rational or useful licensing overhauls for almost two decades.

Today, however, Microsoft no longer has a monopoly on the endpoint. Mobile is huge and Microsoft is a non-entity there. Millions of people have never owned a Microsoft product but manage to access the internet every single day. Soon that number will reach a billion. What's Microsoft to do?

The answer for desktop users, apparently, is to alienate the existing installed base with intrusive Windows 10 advertisements that the average user can't make go away, to download a copy of the operating system to their devices unwanted, and to plan to trigger an install of Windows 10 even when the user hasn't requested it.

I'm personally bitter about the "download a copy of the operating system to their devices unwanted" because this happened to me while I had a device connected to a MiFi device while in another country. It ended up costing me hundreds of dollars, and there's absolutely nothing I can do about it.

Microsoft wants its endpoint dominance back because that dominance allowed the firm to keep end users addicted to a huge number of Microsoft products (such as Office). Those products reinforced Microsoft's position and drove the uptake of Microsoft Server products, which in turn made using Microsoft endpoints easier.

On the Server side of the house, more than anything, Microsoft wants customers to stop running their own IT. For the very same reasons as Oracle, it wants customers to be using its public cloud for everything.

In order to help push customers to the cloud, Microsoft is making running workloads on your own tin as miserable as possible, and appears to be adopting some of Oracle's most reviled licensing strategies. One of these strategies is per core licensing.

Microsoft isn't, as the term might imply, charging some rational amount per actual core, perhaps with varying tiers based on the power and capability of the core. No, Microsoft is packaging cores up in minimum bundles, apparently ensuring that the average user can't possibly license things optimally and that ultimately the cost of on-premises workload hosting will be above that of hosting it on Azure.
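To see how minimum bundles bite, here's an illustrative Python sketch. The figures mirror Microsoft's announced Windows Server 2016 model (licences sold in two-core packs, minimum eight cores per processor and sixteen per server), but treat them as assumptions rather than a pricing guide:

```python
# Illustrative "minimum bundle" per-core licensing calculation.
# The constants match Microsoft's announced Windows Server 2016
# model at the time of writing -- treat them as assumptions.
import math

def licensed_cores(sockets, cores_per_socket,
                   min_per_proc=8, min_per_server=16, pack_size=2):
    per_proc = max(cores_per_socket, min_per_proc)
    total = max(sockets * per_proc, min_per_server)
    # Licences are only sold in fixed-size packs, so round up.
    return math.ceil(total / pack_size) * pack_size

# A small single-socket, 4-core box still pays for 16 cores:
print(licensed_cores(1, 4))    # 16
# A dense 2-socket, 24-core-per-socket box pays for all 48:
print(licensed_cores(2, 24))   # 48
```

The point of the sketch: below the minimums, you pay for cores you don't have, and above them, the bill scales linearly with core count rather than with sockets.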

Like Oracle, Microsoft is also massively incentivising its salesforce and what's left of its partners to sell Azure instead of on-premises licensing. Facing competition on various fronts, Microsoft will clearly try anything to get customers locked in.

If Microsoft succeeds, it will be brilliant: once locked back into its ecosystem, end users will be with it for another decade, maybe two. But if it fails, this will go down in history as a textbook example of corporate hubris.

Microsoft stands upon the razor's edge. Will the superiority of its technology and its cross-integration win out? Or will customers say "Hold, enough"?

In part 2, I'll take a shot at decoding Cisco, Dell and HPE. ®

Friday, December 18, 2015

rsync.net: ZFS Replication to the cloud is finally here—and it’s fast

Even an rsync-lifer admits ZFS replication and rsync.net are making data transfers better.

by Jim Salter - Dec 17, 2015 2:00pm CET

In mid-August, the first commercially available ZFS cloud replication target became available at rsync.net. Who cares, right? As the service itself states, "If you're not sure what this means, our product is Not For You."

Of course, this product is for someone—and to those would-be users, this really will matter. Fully appreciating the new rsync.net (spoiler alert: it's pretty impressive!) means first having a grasp on basic data transfer technologies. And while ZFS replication techniques are burgeoning today, you must actually begin by examining the technology that ZFS is slowly supplanting.

A love affair with rsync

Revisiting a first love of any kind makes for a romantic trip down memory lane, and that's what revisiting rsync—as in "rsync.net"—feels like for me. It's hard to write an article that's inevitably going to end up trashing the tool, because I've been wildly in love with it for more than 15 years. Andrew Tridgell (of Samba fame) first announced rsync publicly in June of 1996. He used it for three chapters of his PhD thesis three years later, about the time that I discovered and began enthusiastically using it. For what it's worth, the earliest record of my professional involvement with major open source tools—at least that I've discovered—is my activity on the rsync mailing list in the early 2000s.

Rsync is a tool for synchronizing folders and/or files from one location to another. Adhering to true Unix design philosophy, it's a simple tool to use. There is no GUI, no wizard, and you can use it for the most basic of tasks without being hindered by its interface. But somewhat rare for any tool, in my experience, rsync is also very elegant. It makes a task which is humanly intuitive seem simple despite being objectively complex. In common use, rsync looks like this:

root@test:~# rsync -ha --progress /source/folder /target/

Invoking this command will make sure that once it's finished, there will be a /target/folder, and it will contain all of the same files that the original /source/folder contains. Simple, right? Since we invoked the argument -a (for archive), the sync is recursive, and the timestamps, ownership, permissions, and all other attributes of the files and folders involved remain unchanged in the target, just as they are on the source. Since we invoked -h, we'll get human-readable units (like G, M, and K rather than raw bytes, as appropriate). --progress means we'll get a nice per-file progress bar showing how fast the transfer is going.

So far, this isn't much more than a kinda-nice version of copy. But where it gets interesting is when /target/folder already exists. In that case, rsync will compare each of those files in /source/folder with its counterpart in /target/folder, and it will only update the latter if the source has changed. This keeps everything in the target updated with the least amount of thrashing necessary. This is much cleaner than doing a brute-force copy of everything, changed or not!

It gets even better when you rsync to a remote machine:

root@test:~# rsync -ha --progress /source/folder user@targetmachine:/target/

When rsyncing remotely, rsync still looks over the list of files in the source and target locations, and the tool only messes with files that have changed. It gets even better still—rsync also tokenizes the changed files on each end and then exchanges the tokens to figure out which blocks in the files have changed. Rsync then only moves those individual blocks across the network. (Holy saved bandwidth, Batman!)
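A grossly simplified sketch of that delta idea in Python: hash fixed-size blocks on each side and ship only the blocks whose hashes differ. (Real rsync is smarter, using a rolling weak checksum plus a strong hash so blocks can match at any byte offset, not just on aligned boundaries.)

```python
# Toy version of delta transfer: hash fixed-size blocks on each
# side, then transfer only the blocks whose hashes differ.
import hashlib

BLOCK = 4096

def block_hashes(data: bytes):
    return [hashlib.md5(data[i:i + BLOCK]).hexdigest()
            for i in range(0, len(data), BLOCK)]

def changed_blocks(old: bytes, new: bytes):
    old_h, new_h = block_hashes(old), block_hashes(new)
    return [i for i, h in enumerate(new_h)
            if i >= len(old_h) or old_h[i] != h]

old = b"a" * 16384
new = old[:4096] + b"b" * 4096 + old[8192:]
print(changed_blocks(old, new))   # only block 1 differs -> [1]
```

Exchanging block hashes costs a few bytes per block; shipping only the changed blocks is what produces the bandwidth savings described above.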

You can go further and further down this rabbit hole of "what can rsync do." Inline compression to save even more bandwidth? Check. A daemon on the server end to expose only certain directories or files, require authentication, only allow certain IPs access, or allow read-only access to one group but write access to another? You got it. Running "rsync" without any arguments gets you a "cheat sheet" of valid command line arguments several pages long.
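That daemon-side configuration lives in rsyncd.conf. A hedged sketch of what the restrictions described above look like in practice (module names, users, and addresses are all invented for illustration):

```
# /etc/rsyncd.conf -- illustrative fragment; module and user names invented
[projects]
    path = /srv/projects
    comment = read-only export of the project tree
    # only the LAN may connect
    hosts allow = 192.168.1.0/24
    # clients must authenticate against the secrets file
    auth users = alice, bob
    secrets file = /etc/rsyncd.secrets
    read only = true

# a second module over the same path for the writers
[projects-rw]
    path = /srv/projects
    auth users = bob
    secrets file = /etc/rsyncd.secrets
    read only = false
```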

To Windows-only admins whose eyes are glazing over by now: rsync is "kinda like robocopy" in the same way that you might look at a light saber and think it's "kinda like a sword."

If rsync's so great, why is ZFS replication even a thing?

This really is the million dollar question. I hate to admit it, but I'd been using ZFS myself for something like four years before I realized the answer. In order to demonstrate how effective each technology is, let's go to the numbers. I'm using rsync.net's new ZFS replication service on the target end and a Linode VM on the source end. I'm also going to be using my own open source orchestration tool syncoid to greatly simplify the otherwise-tedious process of ZFS replication.

First test: what if we copy 1GB of raw data from Linode to rsync.net? First, let's try it with the old tried and true rsync:

root@rsyncnettest:~# time rsync -ha --progress /test/linodetest/ root@myzfs.rsync.net:/mnt/test/linodetest/
sending incremental file list
./
1G.bin
          1.07G 100%    6.60MB/s    0:02:35 (xfr#1, to-chk=0/2)

real	2m36.636s
user	0m22.744s
sys	0m3.616s

And now, with ZFS send/receive, as orchestrated by syncoid:

root@rsyncnettest:~# time syncoid --compress=none test/linodetest root@myzfs.rsync.net:test/linodetest
INFO: Sending oldest full snapshot test/linodetest@1G-clean (~ 1.0 GB) to new target filesystem:
   1GB 0:02:32 [6.54MB/s] [=================================================>] 100%
INFO: Updating new target filesystem with incremental test/linodetest@1G-clean ... syncoid_rsyncnettest_2015-09-18:17:15:53 (~ 4 KB):
1.52kB 0:00:00 [67.1kB/s] [===================>                              ] 38%

real	2m36.685s
user	0m0.244s
sys	0m2.548s

Time-wise, there's really not much to look at. Either way, we transfer 1GB of data in two minutes, 36 seconds and change. It is a little interesting to note that rsync ate up 26 seconds of CPU time while ZFS replication used less than three seconds, but still, this race is kind of a snoozefest.

So let's make things more interesting. Now that we have our 1GB of data actually there, what happens if we change it just enough to force a re-synchronization? In order to do so, we'll touch the file, which doesn't do anything but change its timestamp to the current time.

Just like before, we'll start out with rsync:

root@rsyncnettest:/test# touch /test/linodetest/1G.bin
root@rsyncnettest:/test# time rsync -ha --progress /test/linodetest/ root@myzfs.rsync.net:/mnt/test/linodetest
sending incremental file list
1G.bin
          1.07G 100%  160.47MB/s    0:00:06 (xfr#1, to-chk=0/2)

real	0m13.248s
user	0m6.100s
sys	0m0.296s

And now let's try ZFS:

root@rsyncnettest:/test# touch /test/linodetest/1G.bin
root@rsyncnettest:/test# time syncoid --compress=none test/linodetest root@myzfs.rsync.net:test/linodetest
INFO: Sending incremental test/linodetest@syncoid_rsyncnettest_2015-09-18:16:07:06 ... syncoid_rsyncnettest_2015-09-18:16:07:10 (~ 4 KB):
6.73kB 0:00:00 [ 277kB/s] [==================================================] 149%

real	0m1.740s
user	0m0.068s
sys	0m0.076s

Now things start to get real. Rsync needed 13 seconds to get the job done, while ZFS needed less than two. This problem scales, too. For a touched 8GB file, rsync will take 111.9 seconds to re-synchronize, while ZFS still needs only 1.7.

Touching is not even the worst-case scenario. What if, instead, we move a file from one place to another—or even just rename the folder it's in? For this test, we have synchronized folders containing 8GB of data in /test/linodetest/1. Once we've got that done, we rename /test/linodetest/1 to /test/linodetest/2 and resynchronize. Rsync is up first:

root@rsyncnettest:/test# mv /test/linodetest/1 /test/linodetest/2
root@rsyncnettest:/test# time rsync -ha --progress --delete /test/linodetest/ root@myzfs.rsync.net:/mnt/test/linodetest/
sending incremental file list
deleting 1/8G.bin
deleting 1/
./
2/
2/8G.bin
          8.59G 100%    5.56MB/s    0:24:34 (xfr#1, to-chk=0/3)

real	24m39.267s
user	3m15.944s
sys	0m30.056s

Ouch. What's essentially a subtle change requires nearly half an hour of real time and nearly four minutes of CPU time. But with ZFS...

root@rsyncnettest:/test# mv /test/linodetest/1 /test/linodetest/2
root@rsyncnettest:/test# time syncoid --compress=none test/linodetest root@myzfs.rsync.net:test/linodetest
INFO: Sending incremental test/linodetest@syncoid_rsyncnettest_2015-09-18:16:17:29 ... syncoid_rsyncnettest_2015-09-18:16:19:06 (~ 4 KB):
9.41kB 0:00:00 [ 407kB/s] [==================================================] 209%

real	0m1.707s
user	0m0.072s
sys	0m0.024s

Yep—it took the same old 1.7 seconds for ZFS to re-sync, no matter whether we touched a 1GB file, touched an 8GB file, or even moved an 8GB file from one place to another. In the last test, that's almost three full orders of magnitude faster than rsync: 1.7 seconds versus 1,479.3 seconds. Poor rsync never stood a chance.


OK, ZFS is faster sometimes. Does it matter?

I have to be honest—I feel a little like a monster. Most casual users' experience of rsync will be "it rocks!" and "how could anything be better than this?" But after 15 years of daily use, I knew exactly what rsync's weaknesses were, and I targeted them ruthlessly.

As for ZFS replication's weaknesses, well, it really only has one: you need to be using ZFS on both ends. On the one hand, I think you should already want ZFS on both ends. There's a giant laundry list of features you can only get with a next-generation filesystem. But you could easily find yourself stuck with a lesser filesystem—and if you're stuck, you're stuck. No ZFS, no ZFS replication.

Aside from that, ZFS replication ranges from "just as fast as anything else" to "noticeably faster than anything else" to "sit down, shut up, and hold on." The particular use case that drove me to finally exploring replication—which was much, much more daunting before tools like syncoid automated it—was the replication of VM images.
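For context, what syncoid is automating here is the stock ZFS snapshot/send/receive workflow. A rough sketch of doing one incremental replication by hand (pool, dataset, and host names are invented):

```
# Manual equivalent of what syncoid orchestrates (pool, dataset and
# host names are invented).  First, a one-time full copy:
zfs snapshot tank/vm@base
zfs send tank/vm@base | ssh backup zfs receive tank/vm

# From then on, each run ships only the blocks that changed
# between the two snapshots:
zfs snapshot tank/vm@daily1
zfs send -i tank/vm@base tank/vm@daily1 | ssh backup zfs receive tank/vm
```

Because the filesystem already knows which blocks changed between two snapshots, the send side never has to scan file contents at all, which is exactly what the timing tests above demonstrate.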

Virtualization keeps getting more and more prevalent, and VMs mean gigantic single files. rsync has a lot of trouble with these. The tool can save you network bandwidth when synchronizing a huge file with only a few changes, but it can't save you disk bandwidth, since rsync needs to read through and tokenize the entire file on both ends before it can even begin moving data across the wire. This was enough to be painful, even on our little 8GB test file. On a two terabyte VM image, it turns into a complete non-starter. I can (and do!) sync a two terabyte VM image daily (across a 5mbps Internet connection) usually in well under an hour. Rsync would need about seven hours just to tokenize those files before it even began actually synchronizing them... and it would render the entire system practically unusable while it did, since it would be greedily reading from the disks at maximum speed in order to do so.
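That "about seven hours" figure is easy to sanity-check. Assuming roughly 80MB/sec of sustained sequential reads from spinning disks (an assumed figure; your hardware will vary):

```python
# Back-of-the-envelope check on the "about seven hours" claim:
# rsync must read the whole image on (at least) one end before
# moving any data.  The 80 MB/s read rate is an assumption.
image_bytes = 2 * 10**12          # a 2 TB VM image
read_rate = 80 * 10**6            # bytes/second, assumed
hours = image_bytes / read_rate / 3600
print(round(hours, 1))            # ~6.9 hours just to read it once
```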

The moral of the story? Replication definitely matters.

rsync is great, but when you feel the need for (data transfer) speed...

What about rsync.net?

Now that we know what ZFS replication is and why it matters, let's talk about rsync.net. I can't resist poking a little fun, but I like these folks. The no-nonsense "if you don't know what it is, it isn't for you" approach is a little blunt, but it makes a realistic assessment of what they're there for. This is a basic service offering extremely high-quality infrastructure to admins who know what they're doing and want to use standard system tools without getting hamstrung by "friendly" interfaces aimed at Joe and Jane Six-Pack. They've been around since 2001, and they are sporting some pretty big names in their "these are our customers" list, including Disney, ESPN, and 3M.

What you're actually getting with your rsync.net subscription is a big, honking VM with as much space on it as you paid for. You can use that VM as a target for rsync—the basic service they've been offering to the world for fourteen years now—or, now, for ZFS replication. It's kind of a sysadmin's dream. You can install what you want, however you want, without any "helpful" management interface getting in your way. Despite the fact that they'd never heard of my helper application Syncoid, I was able to get its dependencies installed immediately and get right to syncing without any trouble. As a veteran sysadmin... niiiiice.

I had a little more trouble testing their bandwidth, simply because it's hard to get enough bandwidth to really challenge their setup. I spun up a ridiculously expensive Linode instance ($960/mo, thank Ghu for hourly billing!) that claimed to offer 10Gbps outbound bandwidth, but it turned out to be... less impressive. Whether I sent to rsync.net or did a speed test with any speedtest.net provider within 200 miles of the exit point of Linode's network, the results turned out the same—about 57Mbps. It's possible that Linode really is offering 10Gbps outbound in aggregate but is using traffic-shaping to limit single pipes to what I saw. But I frankly didn't have the time, or the inclination, to test.

Did I mention speedtest.net? Did I mention that rsync.net offers you full root access to your VM and doesn't get in your way? I'm back to my happy place now. A couple of git clones later, I had a working copy of a command-line-only interface to speedtest.net's testing infrastructure on my rsync.net VM, and I could test it that way:

root@3730:/usr/local/bin # python ./speedtest_cli.py
Retrieving speedtest.net configuration...
Retrieving speedtest.net server list...
Testing from Castle Access (69.43.165.28)...
Selecting best server based on latency...
Hosted by I2B Networks Inc (San Diego, CA) [57.35 km]: 4.423 ms
Testing download speed........................................
Download: 501.63 Mbit/s
Testing upload speed..................................................
Upload: 241.71 Mbit/s

502Mbps down, 242Mbps up. Am I limited by rsync.net there, or by the speedtest.net infrastructure? I honestly don't know. Results were pretty similar with several other speedtest.net servers, so these are some pretty reasonable numbers as far as I can tell. The TL;DR is "almost certainly more bandwidth than you can use," and that's good enough for me... particularly considering that my 1TB VM with rsync.net is only $60/mo.

For a backup target, though, network bandwidth isn't the only concern. What about disk bandwidth? Can you write those incoming ZFS snapshots as fast as the network will carry them? In order to find out, I got a little crafty. Since ZFS supports inline compression and I have no way to be sure rsync.net isn't using it where I can't see it, I wrote a few lines of Perl to generate 1GB of incompressible (pseudo-random) data in memory and then write it repeatedly to the disk.

#!/usr/local/bin/perl

print "Obtaining 1G of pseudorandom data:\n";
my $G1 = `dd if=/dev/urandom bs=1024M count=1 2>/dev/null`;

print "Beginning write test:\n";
my $loop;
open FH, "| pv -s 10G > /mnt/test/linodetest/test.bin";
while ($loop < 10) { print FH $G1; $loop++; }
close FH;

Looks good. So let's see what happens when we hammer the disks with a 10G stream of random data:

root@3730:/mnt/test/linodetest # perl ~/test.pl
Obtaining 1G of pseudorandom data:
Beginning write test:
  10GiB 0:00:49 [ 208MiB/s] [==================================================>] 100%

Ultimately, I can receive data from the network at about 60MB/sec, which won't stress my storage since it can write at >200 MB/sec. We're solid... seriously solid. From our perspective as the user, it looks just like we have our own beefy machine with 8GB of RAM, a good CPU, and several high-quality hard drives all to ourselves. With a nice fat several-hundred-mbps pipe. In a datacenter. With support. For $60/mo.

Good

  • That price is pretty amazing for what you get—it's about on par with Amazon S3, while offering you any number of things S3 won't and can't give you.
  • rsync.net is so simple. If you know what you're doing, it's going to be extremely easy to work with and extremely difficult to compromise—there's no big complex Internet-facing Web interface to get compromised, it's just you, ssh, and the stuff you put in place.
  • When vulnerabilities do pop up, they're going to be extremely easy to address: FreeBSD is going to patch them, and rsync.net is going to apply the patches (making the vulnerability window extremely small).
  • With no tier 1 support, anybody you talk to is going to be a serious *nix engineer, with serious security policies they understand. The kind of social engineering that owned Mat Honan's iCloud account will be extremely difficult to pull off.

Bad

  • Some of the above strengths are also weaknesses. Again, there is no tier 1 support for rsync.net—if you need support, you're going to be talking to a real, no-kidding *nix engineer.
  • If you have to use that support, well, it can get frustrating. I did have some back and forth with support while writing this review, and I learned some things. (I wasn't aware of the High Performance Networking fork of SSH until I reached out to rsync.net support during the course of reviewing the service.) Despite the fact that the folks at rsync.net knew I was writing a review and that it might end up on the front page of Ars Technica, it generally took anywhere from several hours to a day to get an e-mail response. 

Ugly

  • There's nothing else like rsync.net commercially available right now, but this is a pretty specialized service. Neither I nor rsync.net are likely to advocate it as a replacement for things like Dropbox any time soon.

Jim Salter (@jrssnet) is an author, public speaker, small business owner, mercenary sysadmin, and father of three—not necessarily in that order. He got his first real taste of open source by running Apache on his very own dedicated FreeBSD 3.1 server back in 1999, and he's been a fierce advocate of FOSS ever since. He also created and maintains http://freebsdwiki.net and http://ubuntuwiki.net.

In the name of full disclosure, the author developed and maintained the ZFS replication tool referenced above (Syncoid). And for this piece, the shell sessions would become a lot more cumbersome to follow if replication was instead done manually. While there are commercial options, Syncoid is a fully open source, GPL v3.0 licensed tool. The links above lead directly to the Github repo where it can be freely downloaded and used by anyone.

Tuesday, November 24, 2015

Kill the Password: A String of Characters Won’t Protect You

You have a secret that can ruin your life.

It's not a well-kept secret, either. Just a simple string of characters—maybe six of them if you're careless, 16 if you're cautious—that can reveal everything about you.


Your email. Your bank account. Your address and credit card number. Photos of your kids or, worse, of yourself, naked. The precise location where you're sitting right now as you read these words. Since the dawn of the information age, we've bought into the idea that a password, so long as it's elaborate enough, is an adequate means of protecting all this precious data. But in 2012 that's a fallacy, a fantasy, an outdated sales pitch. And anyone who still mouths it is a sucker—or someone who takes you for one.

No matter how complex, no matter how unique, your passwords can no longer protect you.

Look around. Leaks and dumps—hackers breaking into computer systems and releasing lists of usernames and passwords on the open web—are now regular occurrences. The way we daisy-chain accounts, with our email address doubling as a universal username, creates a single point of failure that can be exploited with devastating results. Thanks to an explosion of personal information being stored in the cloud, tricking customer service agents into resetting passwords has never been easier. All a hacker has to do is use personal information that's publicly available on one service to gain entry into another.

This summer, hackers destroyed my entire digital life in the span of an hour. My Apple, Twitter, and Gmail passwords were all robust—seven, 10, and 19 characters, respectively, all alphanumeric, some with symbols thrown in as well—but the three accounts were linked, so once the hackers had conned their way into one, they had them all. They really just wanted my Twitter handle: @mat. As a three-letter username, it's considered prestigious. And to delay me from getting it back, they used my Apple account to wipe every one of my devices, my iPhone and iPad and MacBook, deleting all my messages and documents and every picture I'd ever taken of my 18-month-old daughter.

The age of the password is over. We just haven't realized it yet.

Since that awful day, I've devoted myself to researching the world of online security. And what I have found is utterly terrifying. Our digital lives are simply too easy to crack. Imagine that I want to get into your email. Let's say you're on AOL. All I need to do is go to the website and supply your name plus maybe the city you were born in, info that's easy to find in the age of Google. With that, AOL gives me a password reset, and I can log in as you.

First thing I do? Search for the word "bank" to figure out where you do your online banking. I go there and click on the Forgot Password? link. I get the password reset and log in to your account, which I control. Now I own your checking account as well as your email.

This summer I learned how to get into, well, everything. With two minutes and $4 to spend at a sketchy foreign website, I could report back with your credit card, phone, and Social Security numbers and your home address. Allow me five minutes more and I could be inside your accounts for, say, Amazon, Best Buy, Hulu, Microsoft, and Netflix. With yet 10 more, I could take over your AT&T, Comcast, and Verizon. Give me 20—total—and I own your PayPal. Some of those security holes are plugged now. But not all, and new ones are discovered every day.

The common weakness in these hacks is the password. It's an artifact from a time when our computers were not hyper-connected. Today, nothing you do, no precaution you take, no long or random string of characters can stop a truly dedicated and devious individual from cracking your account. The age of the password has come to an end; we just haven't realized it yet.

Passwords are as old as civilization. And for as long as they've existed, people have been breaking them.

In 413 BC, at the height of the Peloponnesian War, the Athenian general Demosthenes landed in Sicily with 5,000 soldiers to assist in the attack on Syracusae. Things were looking good for the Greeks. Syracusae, a key ally of Sparta, seemed sure to fall.

But during a chaotic nighttime battle at Epipole, Demosthenes' forces were scattered, and while attempting to regroup they began calling out their watchword, a prearranged term that would identify soldiers as friendly. The Syracusans picked up on the code and passed it quietly through their ranks. At times when the Greeks looked too formidable, the watchword allowed their opponents to pose as allies. Employing this ruse, the undermatched Syracusans decimated the invaders, and when the sun rose, their cavalry mopped up the rest. It was a turning point in the war.

The first computers to use passwords were likely those in MIT's Compatible Time-Sharing System, developed in 1961. To limit the time any one user could spend on the system, CTSS used a login to ration access. It took only until 1962 for a PhD student named Allan Scherr, wanting more than his four-hour allotment, to defeat the login with a simple hack: he located the file containing the passwords and printed them all out. After that, he got as much time as he wanted.

During the formative years of the web, as we all went online, passwords worked pretty well. This was due largely to how little data they actually needed to protect. Our passwords were limited to a handful of applications: an ISP for email and maybe an ecommerce site or two. Because almost no personal information was in the cloud—the cloud was barely a wisp at that point—there was little payoff for breaking into an individual's accounts; the serious hackers were still going after big corporate systems.

So we were lulled into complacency. Email addresses morphed into a sort of universal login, serving as our username just about everywhere. This practice persisted even as the number of accounts—the number of failure points—grew exponentially. Web-based email was the gateway to a new slate of cloud apps. We began banking in the cloud, tracking our finances in the cloud, and doing our taxes in the cloud. We stashed our photos, our documents, our data in the cloud.

Eventually, as the number of epic hacks increased, we started to lean on a curious psychological crutch: the notion of the "strong" password. It's the compromise that growing web companies came up with to keep people signing up and entrusting data to their sites. It's the Band-Aid that's now being washed away in a river of blood.

Every security framework needs to make two major trade-offs to function in the real world. The first is convenience: The most secure system isn't any good if it's a total pain to access. Requiring you to remember a 256-character hexadecimal password might keep your data safe, but you're no more likely to get into your account than anyone else. Better security is easy if you're willing to greatly inconvenience users, but that's not a workable compromise.

A Password Hacker in Action

The following is from a January 2012 live chat between Apple online support and a hacker posing as Brian—a real Apple customer. The hacker's goal: resetting the password and taking over the account.

Apple: Can you answer a question from the account? Name of your best friend?

Hacker: I think that is "Kevin" or "Austin" or "Max."

Apple: None of those answers are correct. Do you think you may have entered last names with the answer?

Hacker: I might have, but I don't think so. I've provided the last 4, is that not enough?

Apple: The last four of the card are incorrect. Do you have another card?

Hacker: Can you check again? I'm looking at my Visa here, the last 4 is "5555."

Apple: Yes, I have checked again. 5555 is not what is on the account. Did you try to reset online and choose email authentication?

Hacker: Yes, but my email has been hacked. I think the hacker added a credit card to the account, as many of my accounts had the same thing happen to them.

Apple: You want to try the first and last name for the best friend question?

Hacker: Be right back. The chicken is burning, sorry. One second.

Apple: OK.

Hacker: Here, I'm back. I think the answer might be Chris? He's a good friend.

Apple: I am sorry, Brian, but that answer is incorrect.

Hacker: Christopher A********h is the full name. Another possibility is Raymond M*******r.

Apple: Both of those are incorrect as well.

Hacker: I'm just gonna list off some friends that might be haha. Brian C**a. Bryan Y***t. Steven M***y.

Apple: How about this. Give me the name of one of your custom mail folders.

Hacker: "Google" "Gmail" "Apple" I think. I'm a programmer at Google.

Apple: OK, "Apple" is correct. Can I have an alternate email address for you?

Hacker: The alternate email I used when I made the account?

Apple: I will need an email address to send you the password reset.

Hacker: Can you send it to "toe@aol.com"?

Apple: The email has been sent.

Hacker: Thanks!

The second trade-off is privacy. If the whole system is designed to keep data secret, users will hardly stand for a security regime that shreds their privacy in the process. Imagine a miracle safe for your bedroom: It doesn't need a key or a password. That's because security techs are in the room, watching it 24/7, and they unlock the safe whenever they see that it's you. Not exactly ideal. Without privacy, we could have perfect security, but no one would accept a system like that.

For decades now, web companies have been terrified by both trade-offs. They have wanted the act of signing up and using their service to seem both totally private and perfectly simple—the very state of affairs that makes adequate security impossible. So they've settled on the strong password as the cure. Make it long enough, throw in some caps and numbers, tack on an exclamation point, and everything will be fine.

But for years it hasn't been fine. In the age of the algorithm, when our laptops pack more processing power than a high-end workstation did a decade ago, cracking a long password with brute force computation takes just a few million extra cycles. That's not even counting the new hacking techniques that simply steal our passwords or bypass them entirely—techniques that no password length or complexity can ever prevent. The number of data breaches in the US increased by 67 percent in 2011, and each major breach is enormously expensive: After Sony's PlayStation account database was hacked in 2011, the company had to shell out $171 million to rebuild its network and protect users from identity theft. Add up the total cost, including lost business, and a single hack can become a billion-dollar catastrophe.
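The arithmetic behind that claim is easy to check. Here is a short sketch of how the brute-force search space grows with password length; the one-billion-guesses-per-second figure is a hypothetical cracking rig, not a measurement, and real speeds vary enormously with the hash in use.

```python
CHARSET = 95           # printable ASCII characters available per position
GUESSES_PER_SEC = 1e9  # hypothetical cracking rig; real rigs vary widely

for length in (6, 8, 10, 12):
    keyspace = CHARSET ** length
    days = keyspace / GUESSES_PER_SEC / 86400
    print(f"{length} chars: {keyspace:.2e} combinations, ~{days:,.0f} days to exhaust")
```

Each added character multiplies the work by 95, which is why length beats cleverness: a 6-character password falls in minutes under these assumptions, while a 12-character one takes geological time to exhaust. The catch, as the article argues, is that none of this helps against theft, phishing, or password resets.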

How do our online passwords fall? In every imaginable way: They're guessed, lifted from a password dump, cracked by brute force, stolen with a keylogger, or reset completely by conning a company's customer support department.

Let's start with the simplest hack: guessing. Carelessness, it turns out, is the biggest security risk of all. Despite years of being told not to, people still use lousy, predictable passwords. When security consultant Mark Burnett compiled a list of the 10,000 most common passwords based on easily available sources (like passwords dumped online by hackers and simple Google searches), he found the number one password people used was, yes, "password." The second most popular? The number 123456. If you use a dumb password like that, getting into your account is trivial. Free software tools with names like Cain and Abel or John the Ripper automate password-cracking to such an extent that, very literally, any idiot can do it. All you need is an Internet connection and a list of common passwords—which, not coincidentally, are readily available online, often in database-friendly formats.

What's shocking isn't that people still use such terrible passwords. It's that some companies continue to allow it. The same lists that can be used to crack passwords can also be used to make sure no one is able to choose those passwords in the first place. But saving us from our bad habits isn't nearly enough to salvage the password as a security mechanism.

Our other common mistake is password reuse. During the past two years, more than 280 million "hashes" (i.e., encrypted but readily crackable passwords) have been dumped online for everyone to see. LinkedIn, Yahoo, Gawker, and eHarmony all had security breaches in which the usernames and passwords of millions of people were stolen and then dropped on the open web. A comparison of two dumps found that 49 percent of people had reused usernames and passwords between the hacked sites.

"Password reuse is what really kills you," says Diana Smetters, a software engineer at Google who works on authentication systems. "There is a very efficient economy for exchanging that information." Often the hackers who dump the lists on the web are, relatively speaking, the good guys. The bad guys are stealing the passwords and selling them quietly on the black market. Your login may have already been compromised, and you might not know it—until that account, or another that you use the same credentials for, is destroyed.

Hackers also get our passwords through trickery. The most well-known technique is phishing, which involves mimicking a familiar site and asking users to enter their login information. Steven Downey, CTO of Shipley Energy in Pennsylvania, described how this technique compromised the online account of one of his company's board members this past spring. The executive had used a complex alphanumeric password to protect her AOL email. But you don't need to crack a password if you can persuade its owner to give it to you freely.

The hacker phished his way in: He sent her an email that linked to a bogus AOL page, which asked for her password. She entered it. After that he did nothing. At first, that is. The hacker just lurked, reading all her messages and getting to know her. He learned where she banked and that she had an accountant who handled her finances. He even learned her electronic mannerisms, the phrases and salutations she used. Only then did he pose as her and send an email to her accountant, ordering three separate wire transfers totaling roughly $120,000 to a bank in Australia. Her bank at home sent $89,000 before the scam was detected.

An even more sinister means of stealing passwords is to use malware: hidden programs that burrow into your computer and secretly send your data to other people. According to a Verizon report, malware attacks accounted for 69 percent of data breaches in 2011. They are epidemic on Windows and, increasingly, Android. Malware works most commonly by installing a keylogger or some other form of spyware that watches what you type or see. Its targets are often large organizations, where the goal is not to steal one password or a thousand passwords but to access an entire system.

One devastating example is ZeuS, a piece of malware that first appeared in 2007. Clicking a rogue link, usually from a phishing email, installs it on your computer. Then, like a good human hacker, it sits and waits for you to log in to an online banking account somewhere. As soon as you do, ZeuS grabs your password and sends it back to a server accessible to the hacker. In a single case in 2010, the FBI helped apprehend five individuals in Ukraine who had employed ZeuS to steal $70 million from 390 victims, primarily small businesses in the US.

Targeting such companies is actually typical. "Hackers are increasingly going after small businesses," says Jeremy Grant, who runs the Department of Commerce's National Strategy for Trusted Identities in Cyberspace. Essentially, he's the guy in charge of figuring out how to get us past the current password regime. "They have more money than individuals and less protection than large corporations."

How to Survive the Password Apocalypse

Until we figure out a better system for protecting our stuff online, here are four mistakes you should never make—and four moves that will make your accounts harder (but not impossible) to crack.—M.H.

DON'T

  • Reuse passwords. If you do, a hacker who gets just one of your accounts will own them all.
  • Use a dictionary word as your password. If you must, then string several together into a pass phrase.
  • Use standard number substitutions. Think "P455w0rd" is a good password? N0p3! Cracking tools now have those built in.
  • Use a short password—no matter how weird. Today's processing speeds mean that even passwords like "h6!r$q" are quickly crackable. Your best defense is the longest possible password.

DO

  • Enable two-factor authentication when offered. When you log in from a strange location, a system like this will send you a text message with a code to confirm. Yes, that can be cracked, but it's better than nothing.
  • Give bogus answers to security questions. Think of them as a secondary password. Just keep your answers memorable. My first car? Why, it was a "Camper Van Beethoven Freaking Rules."
  • Scrub your online presence. One of the easiest ways to hack into an account is through your email and billing address information. Sites like Spokeo and WhitePages.com offer opt-out mechanisms to get your information removed from their databases.
  • Use a unique, secure email address for password recoveries. If a hacker knows where your password reset goes, that's a line of attack. So create a special account you never use for communications. And make sure to choose a username that isn't tied to your name—like m****n@wired.com—so it can't be easily guessed.

If our problems with passwords ended there, we could probably save the system. We could ban dumb passwords and discourage reuse. We could train people to outsmart phishing attempts. (Just look closely at the URL of any site that asks for a password.) We could use antivirus software to root out malware.

But we'd be left with the weakest link of all: human memory. Passwords need to be hard in order not to be routinely cracked or guessed. So if your password is any good at all, there's a very good chance you'll forget it—especially if you follow the prevailing wisdom and don't write it down. Because of that, every password-based system needs a mechanism to reset your account. And the inevitable trade-offs (security versus privacy versus convenience) mean that recovering a forgotten password can't be too onerous. That's precisely what opens your account to being easily overtaken via social engineering. Although "socialing" was responsible for just 7 percent of the hacking cases that government agencies tracked last year, it raked in 37 percent of the total data stolen.

Socialing is how my Apple ID was stolen this past summer. The hackers persuaded Apple to reset my password by calling with details about my address and the last four digits of my credit card. Because I had designated my Apple mailbox as a backup address for my Gmail account, the hackers could reset that too, deleting my entire account—eight years' worth of email and documents—in the process. They also posed as me on Twitter and posted racist and antigay diatribes there.

After my story set off a wave of publicity, Apple changed its practices: It temporarily quit issuing password resets over the phone. But you could still get one online. And so a month later, a different exploit was used against New York Times technology columnist David Pogue. This time the hackers were able to reset his password online by getting past his "security questions."

You know the drill. To reset a lost login, you need to supply answers to questions that (supposedly) only you know. For his Apple ID, Pogue had picked (1) What was your first car? (2) What is your favorite model of car? and (3) Where were you on January 1, 2000? Answers to the first two were available on Google: He had written that a Corolla had been his first car, and had recently sung the praises of his Toyota Prius. The hackers just took a wild guess on the third question. It turns out that at the dawn of the new millennium, David Pogue, like the rest of the world, was at a "party."

With that, the hackers were in. They dove into his address book (he's pals with magician David Blaine!) and locked him out of his kitchen iMac.

OK, you might think, but that could never happen to me: David Pogue is Internet-famous, a prolific writer for the major media whose every brain wave goes online. But have you thought about your LinkedIn account? Your Facebook page? Your kids' pages or your friends' or family's? If you have a serious web presence, your answers to the standard questions—still often the only options available—are trivial to root out. Your mother's maiden name is on Ancestry.com, your high school mascot is on Classmates, your birthday is on Facebook, and so is your best friend's name—even if it takes a few tries.

The ultimate problem with the password is that it's a single point of failure, open to many avenues of attack. We can't possibly have a password-based security system that's memorable enough to allow mobile logins, nimble enough to vary from site to site, convenient enough to be easily reset, and yet also secure against brute-force hacking. But today that's exactly what we're banking on—literally.

Who is doing this? Who wants to work that hard to destroy your life? The answer tends to break down into two groups, both of them equally scary: overseas syndicates and bored kids.

The syndicates are scary because they're efficient and wildly prolific. Malware and virus-writing used to be something hobbyist hackers did for fun, as proofs of concept. Not anymore. Sometime around the mid-2000s, organized crime took over. Today's virus writer is more likely to be a member of the professional criminal class operating out of the former Soviet Union than some kid in a Boston dorm room. There's a good reason for that: money.

Given the sums at stake—in 2011 Russian-speaking hackers alone took in roughly $4.5 billion from cybercrime—it's no wonder that the practice has become organized, industrialized, and even violent. Moreover, they are targeting not just businesses and financial institutions but individuals too. Russian cybercriminals, many of whom have ties to the traditional Russian mafia, took in tens of millions of dollars from individuals last year, largely by harvesting online banking passwords through phishing and malware schemes. In other words, when someone steals your Citibank password, there's a good chance it's the mob.

But teenagers are, if anything, scarier, because they're so innovative. The groups that hacked David Pogue and me shared a common member: a 14-year-old kid who goes by the handle "Dictate." He isn't a hacker in the traditional sense. He's just calling companies or chatting with them online and asking for password resets. But that does not make him any less effective. He and others like him start by looking for information about you that's publicly available: your name, email, and home address, for example, which are easy to get from sites like Spokeo and WhitePages.com. Then he uses that data to reset your password in places like Hulu and Netflix, where billing information, including the last four digits of your credit card number, is kept visibly on file. Once he has those four digits, he can get into AOL, Microsoft, and other crucial sites. Soon, through patience and trial and error, he'll have your email, your photos, your files—just as he had mine.

Matthew Prince protected his Google Apps account with a second code that would be sent to his phone—so the hackers got his cell account. Photo: Ethan Hill

Why do kids like Dictate do it? Mostly just for lulz: to fuck shit up and watch it burn. One favorite goal is merely to piss off people by posting racist or otherwise offensive messages on their personal accounts. As Dictate explains, "Racism invokes a funnier reaction in people. Hacking, people don't care too much. When we jacked @jennarose3xo"—aka Jenna Rose, an unfortunate teen singer whose videos got widely hate-watched in 2010—"I got no reaction from just tweeting that I jacked her stuff. We got a reaction when we uploaded a video of some black guys and pretended to be them." Apparently, sociopathy sells.

A lot of these kids came out of the Xbox hacking scene, where the networked competition of gamers encouraged kids to learn cheats to get what they wanted. In particular they developed techniques to steal so-called OG (original gamer) tags—the simple ones, like Dictate instead of Dictate27098—from the people who'd claimed them first. One hacker to come out of that universe was "Cosmo," who was one of the first to discover many of the most brilliant socialing exploits out there, including those used on Amazon and PayPal. ("It just came to me," he said with pride when I met him a few months ago at his grandmother's house in southern California.) In early 2012, Cosmo's group, UGNazi, took down sites ranging from Nasdaq to the CIA to 4chan. It obtained personal information about Michael Bloomberg, Barack Obama, and Oprah Winfrey. When the FBI finally arrested this shadowy figure in June, they found that he was just 15 years old; when he and I met a few months later, I had to drive.

It's precisely because of the relentless dedication of kids like Dictate and Cosmo that the password system cannot be salvaged. You can't arrest them all, and even if you did, new ones would keep growing up. Think of the dilemma this way: Any password-reset system that will be acceptable to a 65-year-old user will fall in seconds to a 14-year-old hacker.

For the same reason, many of the silver bullets that people imagine will supplement—and save—passwords are vulnerable as well. For example, last spring hackers broke into the security company RSA and stole data relating to its SecurID tokens, supposedly hack-proof devices that provide secondary codes to accompany passwords. RSA never divulged just what was taken, but it's widely believed that the hackers got enough data to duplicate the numbers the tokens generate. If they also learned the tokens' device IDs, they'd be able to penetrate the most secure systems in corporate America.

On the consumer side, we hear a lot about the magic of Google's two-factor authentication for Gmail. It works like this: First you confirm a mobile phone number with Google. After that, whenever you try to log in from an unfamiliar IP address, the company sends an additional code to your phone: the second factor. Does this keep your account safer? Absolutely, and if you're a Gmail user, you should enable it this very minute. Will a two-factor system like Gmail's save passwords from obsolescence? Let me tell you about what happened to Matthew Prince.

This past summer UGNazi decided to go after Prince, CEO of a web performance and security company called CloudFlare. They wanted to get into his Google Apps account, but it was protected by two-factor. What to do? The hackers hit his AT&T cell phone account. As it turns out, AT&T uses Social Security numbers essentially as an over-the-phone password. Give the carrier those nine digits—or even just the last four—along with the name, phone number, and billing address on an account and it lets anyone add a forwarding number to any account in its system. And getting a Social Security number these days is simple: They're sold openly online, in shockingly complete databases.

Prince's hackers used the SSN to add a forwarding number to his AT&T service and then made a password-reset request with Google. So when the automated call came in, it was forwarded to them. Voilà—the account was theirs. Two-factor just added a second step and a little expense. The longer we stay on this outdated system—the more Social Security numbers that get passed around in databases, the more login combinations that get dumped, the more we put our entire lives online for all to see—the faster these hacks will get.

The age of the password has come to an end; we just haven't realized it yet. And no one has figured out what will take its place. What we can say for sure is this: Access to our data can no longer hinge on secrets—a string of characters, 10 strings of characters, the answers to 50 questions—that only we're supposed to know. The Internet doesn't do secrets. Everyone is a few clicks away from knowing everything.

Instead, our new system will need to hinge on who we are and what we do: where we go and when, what we have with us, how we act when we're there. And each vital account will need to cue off many such pieces of information—not just two, and definitely not just one.

This last point is crucial. It's what's so brilliant about Google's two-factor authentication, but the company simply hasn't pushed the insight far enough. Two factors should be a bare minimum. Think about it: When you see a man on the street and think it might be your friend, you don't ask for his ID. Instead, you look at a combination of signals. He has a new haircut, but does that look like his jacket? Does his voice sound the same? Is he in a place he's likely to be? If many points don't match, you wouldn't believe his ID; even if the photo seemed right, you'd just assume it had been faked.

And that, in essence, will be the future of online identity verification. It may very well include passwords, much like the IDs in our example. But it will no longer be a password-based system, any more than our system of personal identification is based on photo IDs. The password will be just one token in a multifaceted process. Jeremy Grant of the Department of Commerce calls this an identity ecosystem.

"Cosmo," a teenage hacker in Long Beach, California, used social-engineering exploits to crack accounts at Amazon, AOL, AT&T, Microsoft, Netflix, PayPal, and more. Photo: Sandra Garcia

What about biometrics? After watching lots of movies, many of us would like to think that a fingerprint reader or iris scanner could be what passwords used to be: a single-factor solution, an instant verification. But they both have two inherent problems. First, the infrastructure to support them doesn't exist, a chicken-or-egg issue that almost always spells death for a new technology. Because fingerprint readers and iris scanners are expensive and buggy, no one uses them, and because no one uses them, they never become cheaper or better.

The second, bigger problem is also the Achilles' heel of any one-factor system: A fingerprint or iris scan is a single piece of data, and single pieces of data will be stolen. Dirk Balfanz, a software engineer on Google's security team, points out that passcodes and keys can be replaced, but biometrics are forever: "It's hard for me to get a new finger if my print gets lifted off a glass," he jokes. While iris scans look groovy in the movies, in the age of high-definition photography, using your face or your eye or even your fingerprint as a one-stop verification just means that anyone who can copy it can also get in.

Does that sound far-fetched? It's not. Kevin Mitnick, the fabled social engineer who spent five years in prison for his hacking heroics, now runs his own security company, which gets paid to break into systems and then tell the owners how it was done. In one recent exploit, the client was using voice authentication. To get in, you had to recite a series of randomly generated numbers, and both the sequence and the speaker's voice had to match. Mitnick called his client and recorded their conversation, tricking him into using the numbers zero through nine in conversation. He then split up the audio, played the numbers back in the right sequence, and—presto.

None of this is to say that biometrics won't play a crucial role in future security systems. Devices might require a biometric confirmation just to use them. (Android phones can already pull this off, and given Apple's recent purchase of mobile-biometrics firm AuthenTec, it seems a safe bet that this is coming to iOS as well.) Those devices will then help to identify you: Your computer or a remote website you're trying to access will confirm a particular device. Already, then, you've verified something you are and something you have. But if you're logging in to your bank account from an entirely unlikely place—say, Lagos, Nigeria—then you may have to go through a few more steps. Maybe you'll have to speak a phrase into the microphone and match your voiceprint. Maybe your phone's camera snaps a picture of your face and sends it to three friends, one of whom has to confirm your identity before you can proceed.

In many ways, our data providers will learn to think somewhat like credit card companies do today: monitoring patterns to flag anomalies, then shutting down activity if it seems like fraud. "A lot of what you'll see is that sort of risk analytics," Grant says. "Providers will be able to see where you're logging in from, what kind of operating system you're using."

Google is already pushing in this direction, going beyond two-factor to examine each login and see how it relates to the previous one in terms of location, device, and other signals the company won't disclose. If it sees something aberrant, it will force a user to answer questions about the account. "If you can't pass those questions," Smetters says, "we'll send you a notification and tell you to change your password—because you've been owned."
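The shape of that kind of risk analysis can be sketched in a few lines. Everything here is invented for illustration: the signals, the weights, and the challenge threshold are hypothetical, and Google does not disclose what it actually scores.

```python
# Toy login risk score: compare the current login's signals against the
# previous one and count how much looks unfamiliar. Weights are made up.
def risk_score(login: dict, previous: dict) -> int:
    score = 0
    if login["country"] != previous["country"]:
        score += 2  # geography changed
    if login["device_id"] != previous["device_id"]:
        score += 2  # unrecognized device
    if login["os"] != previous["os"]:
        score += 1  # different operating system
    return score

previous = {"country": "US", "device_id": "laptop-1", "os": "Windows"}
login = {"country": "NG", "device_id": "unknown", "os": "Linux"}

if risk_score(login, previous) >= 3:
    print("challenge the user with additional verification")
```

The point is not the particular weights but the architecture: no single signal decides anything, and an aberrant combination triggers an extra step rather than a hard lockout.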

The other thing that's clear about our future password system is which trade-off—convenience or privacy—we'll need to make. It's true that a multifactor system will involve some minor sacrifices in convenience as we jump through various hoops to access our accounts. But it will involve far more significant sacrifices in privacy. The security system will need to draw upon your location and habits, perhaps even your patterns of speech or your very DNA.

We need to make that trade-off, and eventually we will. The only way forward is real identity verification: to allow our movements and metrics to be tracked in all sorts of ways and to have those movements and metrics tied to our actual identity. We are not going to retreat from the cloud—to bring our photos and email back onto our hard drives. We live there now. So we need a system that makes use of what the cloud already knows: who we are and who we talk to, where we go and what we do there, what we own and what we look like, what we say and how we sound, and maybe even what we think.

That shift will involve significant investment and inconvenience, and it will likely make privacy advocates deeply wary. It sounds creepy. But the alternative is chaos and theft and yet more pleas from "friends" in London who have just been mugged. Times have changed. We've entrusted everything we have to a fundamentally broken system. The first step is to acknowledge that fact. The second is to fix it.

Mat Honan (@mat) is a senior writer for Wired and Wired.com's Gadget Lab.

Saturday, November 21, 2015

How to map OneDrive as a network drive letter

One nice feature of Microsoft OneDrive over many other folder-sync services such as Dropbox is that it can be mapped as a network drive letter, much like a NAS drive on a home network. The free OneDrive service provides 15GB of space, doubled to 30GB for those who use the mobile app to sync their phone's photos.

The main advantage of accessing OneDrive as a network drive is that no files are stored on the computer. This is particularly useful for laptop or Windows tablet users with a small SSD, or for someone with several hundred gigabytes of data on OneDrive (e.g. with the Office 365 Home subscription). The obvious catch is that files take longer to open and save, and that access depends entirely on an Internet connection, unlike the sync app, where synced files can be accessed and edited offline. Of course, a workaround is to copy and paste the files you need to a local folder for use in a location without connectivity, then copy the changed files back later on.

Microsoft has a support article showing how to map OneDrive as a network drive letter using a loophole in the 'Save to Web' feature in Word 2010, but for those looking for a quicker way, or who don't have Word 2010, the following guide should work. The Microsoft guide also talks about setting up a Windows Live ID online provider in Windows, but so far I haven't had any issues across several computers without doing that.

1. Go to the OneDrive website www.onedrive.com and sign in.

2. Right-click on 'Files' at the top-left and copy the link ("Copy shortcut" in Internet Explorer, "Copy Link Location" in Firefox or "Copy Link Address" in Chrome):

[Screenshot: OneDrive_Copy_Link.png]

3. Open up Notepad and paste the link. Copy the CID code similar to as shown below:

[Screenshot: OneDrive_cid.png]

4. Go into Windows Explorer and click on 'Map Network Drive':

[Screenshot: Map_network_drive.png]

5. Choose a drive letter to use, then type in the address "https://d.docs.live.net/", followed by the CID code you got in step 3, so it looks like the following:

[Screenshot: Map_Network_Drive_Details.png]

6. Tick the option "Connect using different credentials".

7. If you would like this network drive to be remembered, tick "Reconnect at logon". Note that doing so can make Windows Explorer (including 'Save As', 'Open', etc. screens) take several seconds to appear.

8. Click on Finish. After a few seconds, it will ask you to log in, so type in your Windows Live e-mail address and password. If you chose "Reconnect at logon" in step 7 and don't wish to keep typing in your logon details each time the PC boots, tick "Remember my credentials".

9. Click on 'OK'. If all goes well, the network drive should appear:

[Screenshot: OneDrive_as_network_drive_letter.png]
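For those comfortable with the command line, the same mapping can in principle be done in one step with the net use command. This is a sketch only: the drive letter, the CID placeholder, and the e-mail address below are examples to replace with your own values from the steps above, and Windows' WebDAV 'WebClient' service must be running:

```
net use O: https://d.docs.live.net/<your-CID> /user:your-email@example.com /persistent:yes
```

/persistent:yes corresponds to ticking "Reconnect at logon" in step 7, with the same caveat about slower Explorer start-up.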

Sunday, October 11, 2015

Anatomy of a hack: How crackers ransack passwords like “qeadzcwrsfxv1331”

In March, readers followed along as Nate Anderson, Ars deputy editor and a self-admitted newbie to password cracking, downloaded a list of more than 16,000 cryptographically hashed passcodes. Within a few hours, he deciphered almost half of them. The moral of the story: if a reporter with zero training in the ancient art of password cracking can achieve such results, imagine what more seasoned attackers can do.

Imagine no more. We asked three cracking experts to attack the same list Anderson targeted and recount the results in all their color and technical detail, Iron Chef style. The results, to say the least, were eye-opening, because they show how quickly even long passwords with letters, numbers, and symbols can be discovered.

The list contained 16,449 passwords converted into hashes using the MD5 cryptographic hash function. Security-conscious websites never store passwords in plaintext. Instead, they work only with these so-called one-way hashes, which are incapable of being mathematically converted back into the letters, numbers, and symbols originally chosen by the user. In the event of a security breach that exposes the password data, an attacker still must painstakingly guess the plaintext for each hash—for instance, they must guess that "5f4dcc3b5aa765d61d8327deb882cf99" and "7c6a180b36896a0a8c02787eeafb0e4c" are the MD5 hashes for "password" and "password1" respectively. (For more details on password hashing, see the earlier Ars feature "Why passwords have never been weaker—and crackers have never been stronger.")
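As a concrete illustration, here is all the "one-way" property amounts to in practice (a minimal Python sketch using the standard hashlib module; the two digests are the ones quoted above):

```python
import hashlib

# Hashing is one-way: computing a digest is trivial, but recovering the
# input requires guessing candidates and hashing each one forward.
print(hashlib.md5(b"password").hexdigest())   # 5f4dcc3b5aa765d61d8327deb882cf99
print(hashlib.md5(b"password1").hexdigest())  # 7c6a180b36896a0a8c02787eeafb0e4c

# A crack is just this guessing loop run at enormous scale: hash each
# candidate and check it against the leaked digests.
leaked = {"5f4dcc3b5aa765d61d8327deb882cf99"}
for guess in ("letmein", "password", "123456"):
    if hashlib.md5(guess.encode()).hexdigest() in leaked:
        print("cracked:", guess)
```

Real cracking software does exactly this, only on a GPU at billions of guesses per second.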

While Anderson's 47-percent success rate is impressive, it's minuscule when compared to what real crackers can do, as Anderson himself made clear. To prove the point, we gave them the same list and watched over their shoulders as they tore it to shreds. To put it mildly, they didn't disappoint. Even the least successful cracker of our trio—who used the least amount of hardware, devoted only one hour, used a tiny word list, and conducted an interview throughout the process—was able to decipher 62 percent of the passwords. Our top cracker snagged 90 percent of them.

The Ars password team included a developer of cracking software, a security consultant, and an anonymous cracker. The most thorough of the three cracks was carried out by Jeremi Gosney, a password expert with Stricture Consulting Group. Using a commodity computer with a single AMD Radeon 7970 graphics card, it took him 20 hours to crack 14,734 of the hashes, a 90-percent success rate. Jens Steube, the lead developer behind oclHashcat-plus, achieved impressive results as well. (oclHashcat-plus is the freely available password-cracking software both Anderson and all crackers in this article used.) Steube unscrambled 13,486 hashes (82 percent) in a little more than one hour, using a slightly more powerful machine that contained two AMD Radeon 6990 graphics cards. A third cracker who goes by the moniker radix deciphered 62 percent of the hashes using a computer with a single 7970 card—also in about one hour. And he probably would have cracked more had he not been peppered with questions throughout the exercise.

The list of "plains," as many crackers refer to deciphered hashes, contains the usual list of commonly used passcodes that are found in virtually every breach involving consumer websites. "123456," "1234567," and "password" are there, as is "letmein," "Destiny21," and "pizzapizza." Passwords of this ilk are hopelessly weak. Despite the additional tweaking, "p@$$word," "123456789j," "letmein1!," and "LETMEin3" are equally awful. But sprinkled among the overused and easily cracked passcodes in the leaked list are some that many readers might assume are relatively secure. ":LOL1313le" is in there, as are "Coneyisland9/," "momof3g8kids," "1368555av," "n3xtb1gth1ng," "qeadzcwrsfxv1331," "m27bufford," "J21.redskin," "Garrett1993*," and "Oscar+emmy2."

A screenshot showing a small sampling of cracked passwords.

As big as the word lists that all three crackers in this article wielded—close to 1 billion strong in the case of Gosney and Steube—none of them contained "Coneyisland9/," "momof3g8kids," or the more than 10,000 other plains that were revealed with just a few hours of effort. So how did they do it? The short answer boils down to two variables: the website's unfortunate and irresponsible use of MD5 and the use of non-randomized passwords by the account holders.

Life in the fast lane

"These are terrible passwords," radix, who declined to give his real name, told Ars just a few minutes into the first run of his hour-long cracking session. "There's probably not a complexity requirement for them. The hashing alone being MD5 tells me that they really don't care about their passwords too much, so it's probably some pre-generated site."

Like SHA1, SHA3, and most other algorithms, MD5 was designed to convert plaintext into hashes, also known as "message digests," quickly and with a minimal amount of computation. That works in the favor of crackers. Armed with a single graphics processor, they can cycle through more than eight billion password combinations each second when attacking "fast" hashes. By contrast, algorithms specifically designed to protect passwords require significantly more time and computation. For instance, the SHA512crypt function included by default in Mac OS X and most Unix-based operating systems passes text through 5,000 hashing iterations. This hurdle would limit the same one-GPU cracking system to slightly less than 2,000 guesses per second. Examples of other similarly "slow" hashing algorithms include bcrypt, scrypt, and PBKDF2.
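The cost gap between fast and slow hashes can be sketched in a few lines of Python. This is a toy illustration only: real sha512crypt also mixes in a salt and varies its input between rounds, and hashlib here stands in for whatever an attacker actually runs on a GPU.

```python
import hashlib

def fast_hash(pw: bytes) -> str:
    # One MD5 invocation per guess: a single GPU manages billions per second.
    return hashlib.md5(pw).hexdigest()

def slow_hash(pw: bytes, rounds: int = 5000) -> str:
    # Iterated hashing: each guess now costs ~5,000 hash invocations,
    # cutting the attacker's guess rate by roughly the same factor.
    digest = pw
    for _ in range(rounds):
        digest = hashlib.sha512(digest).digest()
    return digest.hex()
```

The defender pays the 5,000x cost once per login; the attacker pays it on every one of billions of guesses, which is why slow hashes shift the economics so sharply.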

The other variable was the account holders' decision to use memorable words. The characteristics that made "momof3g8kids" and "Oscar+emmy2" easy to remember are precisely the things that allowed them to be cracked. Their basic components—"mom," "kids," "oscar," "emmy," and numbers—are a core part of even basic password-cracking lists. The increasing power of hardware and specialized software makes it trivial for crackers to combine these ingredients in literally billions of slightly different permutations. Unless the user takes great care, passwords that are easy to remember are sitting ducks in the hands of crackers.

What's more, like the other two crackers profiled in this article, radix didn't know where the password list was taken from, eliminating one of the key techniques crackers use when deciphering leaked hashes. "If I knew the site, I would go there and find out what the requirements are," he said. The information would have allowed radix to craft custom rule sets targeted at the specific hashes he was trying to crack.

Anatomy of a crack

The longer answer to how these relatively stronger passwords were revealed requires comparing and contrasting the approaches of the three crackers. Because their equipment and the amount of time they devoted to the exercise differed, readers shouldn't assume one cracker's technique was superior to those of the others. That said, all three cracks resembled video games where each successive level is considerably harder than the last. The first stage of each attack typically cracked in excess of 50 percent of the hashes, with each stage that came later cracking smaller and smaller percentages. By the time they got to the final rounds, they considered themselves lucky to get more than a few hundred plains.

True to that pattern, Gosney's first stage cracked 10,233 hashes, or 62 percent of the leaked list, in just 16 minutes. It started with a brute-force crack for all passwords containing one to six characters, meaning his computer tried every possible combination starting with "a" and ending with "//////." Because guesses have a maximum length of six and are drawn from a set of 95 characters—that's 26 lower-case letters, 26 upper-case letters, 10 digits, and 33 symbols—there is a manageable number of total guesses, calculated by adding 95^6 + 95^5 + 95^4 + 95^3 + 95^2 + 95. It took him just two minutes and 32 seconds to complete the round, and it yielded the first 1,316 plains of the exercise.
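The keyspace arithmetic can be checked in a couple of lines (the 8-billion-guesses-per-second figure is the single-GPU rate quoted earlier in the article):

```python
# Keyspace for an exhaustive search of all printable-ASCII passwords of
# length one through six (95 possible characters per position).
keyspace = sum(95**n for n in range(1, 7))
print(f"{keyspace:,} candidates")  # 742,912,017,120

# At roughly 8 billion guesses per second on one GPU, the whole range
# falls in well under two minutes.
seconds = keyspace / 8e9
print(f"~{seconds:.0f} seconds")
```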

Beyond a length of six, however, Gosney was highly selective about the types of brute-force attacks he tried. That's because of the exponentially increasing number of guesses each additional character creates. While it took only minutes to brute-force all passwords from one to six characters, it would have taken Gosney days, weeks, or even years to brute-force longer passwords. Robert Graham, the CEO of Errata Security who has calculated the requirements, refers to this limitation as the "exponential wall of brute-force cracking."

Brute-force cracks work well against shorter passwords. The technique can take days or months for longer passcodes, even when using Amazon's cloud-based EC2 service.

Recognizing these limits, Gosney next brute-force cracked all passwords of length seven or eight that contained only lower-case letters. That significantly reduced the time required and still cracked 1,618 hashes. He then tried all passwords of length seven or eight that contained only upper-case letters, revealing another 708 plains. Because their "keyspace" was the sum of 26^8 + 26^7, each of these steps was completed in 41 seconds. Next, he brute-forced all passwords made up solely of numbers, from one to 12 digits long. That cracked 312 passcodes and took him three minutes and 21 seconds.
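A quick calculation shows why restricting the character set pays off so dramatically (a Python sketch, using lengths seven and eight as in Gosney's runs):

```python
full_78 = 95**8 + 95**7    # every printable-ASCII password of length 7-8
lower_78 = 26**8 + 26**7   # lower-case letters only, length 7-8
print(f"full charset:    {full_78:.2e} guesses")
print(f"lower-case only: {lower_78:.2e} guesses")
print(f"reduction:       ~{full_78 // lower_78:,}x")
```

Dropping from 95 symbols to 26 shrinks the search by a factor of roughly thirty thousand, which is the difference between seconds and days on one GPU.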

It was only then that Gosney turned to his word lists, which he has spent years fine tuning. Augmenting the lists with the "best64" rule set built into Hashcat, he was able to crack 6,228 hashes in just nine minutes and four seconds. To complete stage one, he ran all the plains he had just captured in the previous rounds through a different rule set known as "d3ad0ne" (named after its creator who is a recognized password expert). It took one second to complete and revealed 51 more plains.

"Normally I start by brute-forcing all characters from length one to length six because even on a single GPU, this attack completes nearly instantly with fast hashes," Gosney explained in an e-mail. He continued:

And because I can brute-force this really quickly, I have all of my wordlists filtered to only include words that are at least six chars long. This helps to save disk space and also speeds up wordlist-based attacks. Same thing with digits. I can just brute-force numerical passwords very quickly, so there are no digits in any of my wordlists. Then I go straight to my wordlists + best64.rule since those are the most probable patterns, and larger rule sets take much longer to run. Our goal is to find the most plains in the least amount of time, so we want to find as much low-hanging fruit as possible first.

Cracking the weakest passwords first is especially helpful when hashes contain cryptographic salt. Originally devised to thwart rainbow tables and other types of precomputed techniques, salting appends random characters to each password before it is hashed. Besides defeating rainbow tables, salting slows down brute-force and dictionary attacks because hashes must be cracked one at a time rather than all of them at once.

But the thing about salting is this: it slows down cracking only by a multiple of the number of unique salts in a given list. That means the benefit of salting diminishes with each cracked hash. By cracking the weakest passwords as quickly as possible first (an optimization offered by Hashcat) crackers can greatly diminish the minimal amount of protection salting might provide against cracking. Of course, none of this applies in this exercise since the leaked MD5 wasn't salted.
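A toy Python sketch of salting (the scheme and salt length here are illustrative, not what any particular site used):

```python
import hashlib
import os

def salted_md5(password: bytes, salt: bytes) -> str:
    # Toy scheme: a random salt is prepended before hashing and stored
    # alongside the digest so logins can be verified later.
    return hashlib.md5(salt + password).hexdigest()

salt_a, salt_b = os.urandom(8), os.urandom(8)
# Identical passwords no longer share a digest, so a single guess can
# no longer be tested against every account's hash at once.
assert salted_md5(b"password", salt_a) != salted_md5(b"password", salt_b)
```

Each candidate must be re-hashed once per unique salt, which is exactly the "multiple of the number of unique salts" slowdown described above.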

With 10,233 hashes cracked in stage one, it was time for stage two, which consisted of a series of hybrid attacks. True to the video game analogy mentioned earlier, this second stage of attacks took considerably longer than the first one and recovered considerably fewer plains—to be exact, five hours and 12 minutes produced 2,702 passwords.

As the name implies, a hybrid attack marries a dictionary attack with a brute-force attack, a combination that greatly expands the reach of a well-honed word list while keeping the keyspace to a manageable length. The first round of this stage appended all possible two-characters strings containing digits or symbols to the end of each word in his dictionary. It recovered 585 plains and took 11 minutes and 25 seconds to run. Round two appended all possible three-character strings containing digits or symbols. It cracked 527 hashes and required 58 minutes to complete. The third round, which appended all four-digit number strings, took 25 minutes and recovered 435 plains. Round four appended all possible strings containing three lower-case letters and digits and acquired 451 more passwords.
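A hybrid-attack generator can be sketched in a few lines of Python (the word list is illustrative; note that string.punctuation covers 32 of the 33 symbols counted earlier, since it omits the space):

```python
from itertools import product
import string

def hybrid_candidates(words, charset, suffix_len):
    # Append every possible suffix of `suffix_len` characters drawn from
    # `charset` to each dictionary word: a dictionary + brute-force hybrid.
    for word in words:
        for tail in product(charset, repeat=suffix_len):
            yield word + "".join(tail)

# Roughly round one of stage two: every word plus every two-character
# digit/symbol tail.
tails = string.digits + string.punctuation
guesses = hybrid_candidates(["monkey", "dragon"], tails, 2)
```

The keyspace is the dictionary size times charset^suffix_len, so a well-honed word list stretches much further than raw brute force of the same total length.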

As fruitful as these attacks were, Gosney said they were handicapped by his use of a single graphics card for this exercise.

"For example, you'll notice that when I was doing hybrid attacks, I appended 2-3 digits/special but then only did digits with length 4," he explained. "This is because doing digits/special for length 4 would have taken a really long time with just one GPU, so I skipped it. Same with when I started appending lower alpha/digits, I only did length 3 because length 4 would have taken too long with just one GPU."

No doubt, Gosney could have attacked much larger keyspaces had he used the monster 25-GPU cluster he unveiled in December. Because the graphics cards in the five-server system scale almost linearly, it's able to harness almost all of their combined power. As a result, it can achieve 350 billion guesses per second when cracking password hashes generated by Microsoft's NTLM algorithm. And it could generate similar results when going up against MD5 and other fast hash functions.

The remaining hybrid attacks in stage two continued in the same vein. By the time it was completed, he had cracked a total of 12,935 hashes, or 78.6 percent of the list, and had spent a total of just 5 hours and 28 minutes doing it.

One of the things Gosney and other crackers have found is that passwords for a particular site are remarkably similar, despite being generated by users who have never met each other. After cracking such a large percentage of hashes from this unknown site, the next step was to analyze the plains and mimic the patterns when attempting to guess the remaining passwords. The result is a series of statistically generated brute-force attacks based on a mathematical system known as Markov chains. Hashcat makes it simple to implement this method. By looking at the list of passwords that already have been cracked, it performs probabilistically ordered, per-position brute-force attacks. Gosney thinks of it as an "intelligent brute-force" that uses statistics to drastically limit the keyspace.

Where a classic brute-force tries "aaa," "aab," "aac," and so on, a Markov attack makes highly educated guesses. It analyzes plains to determine where certain types of characters are likely to appear in a password. A Markov attack with a length of seven and a threshold of 65 tries all possible seven-character passwords with the 65 most likely characters for each position. It drops the keyspace of a classic brute-force from 95^7 to 65^7, a benefit that saves an attacker about four hours. And since passwords show surprising uniformity when it comes to the types of characters used in each position—in general, capital letters come at the beginning, lower-case letters come in the middle, and symbols and numbers come at the end—Markov attacks are able to crack almost as many passwords as a straight brute-force.
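A stripped-down version of the idea in Python (per-position frequency only; Hashcat's real Markov mode also conditions each character on the one before it):

```python
from collections import Counter
from itertools import product

def per_position_top(plains, length, threshold):
    # For each position, keep only the `threshold` most common characters
    # observed in already-cracked passwords of that length.
    counters = [Counter() for _ in range(length)]
    for p in plains:
        if len(p) == length:
            for i, ch in enumerate(p):
                counters[i][ch] += 1
    return [[c for c, _ in ctr.most_common(threshold)] for ctr in counters]

def markov_candidates(plains, length, threshold):
    # Per-position brute force over the reduced character sets.
    for combo in product(*per_position_top(plains, length, threshold)):
        yield "".join(combo)
```

Feeding in the plains already cracked from the same site is what makes this a "site-specific" attack: the statistics mirror that user population's habits.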

"This is where your attack plan deviates from the standard and becomes unique, because now you're doing site-specific attacks," Gosney said. "From there, if you start hitting upon any interesting patterns, you just start chasing those patterns down the rabbit hole. Once you've fully exploited one pattern you move on to the next."

In all, it took Gosney 14 hours and 59 minutes to complete this third stage, which besides Markov attacks included several other custom wordlists combined with rules. Providing further evidence of the law of diminishing returns that dictates password cracking, it yielded 1,699 more passwords. It's interesting to note that the increasing difficulty is experienced even within this last step itself. It took about three hours to cover the first 962 plains in this stage and 12 hours to get the remaining 737.

The other two password experts who cracked this list used many of the same techniques and methods, although not in the same sequence and with vastly different tools. The only wordlist used by radix, for example, came directly from the 2009 breach of online games service RockYou. Because the SQL-injection hack exposed more than 14 million unique passwords in plaintext, the list represents the largest corpus of real-world passwords ever to be made public. radix has a much bigger custom-compiled dictionary, but like a magician who doesn't want to reveal the secret behind a trick, he kept it under wraps during this exercise.

Killing hashes

Echoing Nate Anderson's foray into password cracking, radix was able to crack 4,900 of the passwords, nearly 30 percent of the haul, solely by using the RockYou list. He then took the same list, cut the last four characters off each of the words, and appended every possible four-digit number to the end. Hashcat told him it would take two hours to complete, which was longer than he wanted to spend. Even after terminating run two after 20 minutes, he had cracked 2,136 more passcodes. radix then tried brute-forcing all numbers, starting with a single digit, then two digits, then three digits, and so on (259 additional plains recovered).

He seemed to choose techniques for his additional runs almost at random. But in reality, it was a combination of experience, intuition, and possibly a little luck.

"It's all about analysis, gut feelings, and maybe a little magic," he said. "Identify a pattern, run a mask, put recovered passes in a new dict, run again with rules, identify a new pattern, etc. If you know the source of the hashes, you scrape the company website to make a list of words that pertain to that specific field of business and then manipulate it until you are happy with your results."

He then ran the 7,295 plains he recovered so far through PACK, short for the Password Analysis and Cracking Toolkit (developed by password expert Peter Kacherginsky), and noticed some distinct patterns. A third of them contained eight characters, 19 percent contained nine characters, and 16 percent contained six characters. PACK also reported that 69 percent of the plains were "stringdigit" meaning a string of letters or symbols that ended with numbers. He also noticed that 62 percent of the recovered passwords were classified as "loweralphanum," meaning they consisted solely of lower-case letters and numbers.

This information gave him fodder for his next series of attacks. In run 4, he ran a mask attack. This is similar to the hybrid attack mentioned earlier, and it brings much of the benefit of a brute-force attack while drastically reducing the time it takes to run. The first mask tried all possible combinations of lower-case letters and numbers, from one to six characters long (341 more plains recovered). The next step would have been to try all combinations of lower-case letters and numbers with a length of eight, but that would have required more time than radix was willing to spend. He then considered trying all passwords of length eight that contained only lower-case letters. Because the attack excludes upper-case letters, the search space was a manageable 26^8 instead of 52^8. With radix's machine, that was the difference between spending a little more than one minute and spending six hours. The smaller run was still more time than he wanted to spend, so he skipped that step too.

So radix then shifted his strategy and used some of the rule sets built into Hashcat. One of them allows Hashcat to try a random combination of 5,120 rules, which can be anything from swapping each "e" with a "3," pulling the first character off each word, or adding a digit between each character. In just 38 seconds the technique recovered 1,940 more passwords.
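A miniature rule engine conveys the flavor (these three rules are illustrative inventions modeled on the examples in the text, not Hashcat's actual rule files):

```python
def apply_rules(word):
    # Hand-rolled mangling rules in the spirit of Hashcat's rule engine:
    # leetspeak substitution, character removal, and digit insertion.
    yield word.replace("e", "3")           # swap each 'e' with a '3'
    yield word[1:]                         # pull the first character off
    for i in range(len(word) + 1):         # insert a digit at each position
        for d in "0123456789":
            yield word[:i] + d + word[i:]

candidates = set(apply_rules("letmein"))
```

Each rule is cheap to apply, so even 5,120 of them multiplied across a large word list still runs in seconds on a GPU.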

"That's the thrill of it," he said. "It's kind of like hunting, but you're not killing animals. You're killing hashes. It's like the ultimate hide and seek." Then acknowledging the dark side of password cracking, he added: "If you're on the slightly less moral side of it, it has huge implications."

Steube also cracked the list of leaked hashes with aplomb. While the total number of words in his custom dictionaries is much larger, he prefers to work with a "dict" of just 111 million words and pull out the additional ammunition only when a specific job calls for it. The words are ordered from most to least commonly used. That way, a particular run will crack the majority of the hashes early on and then slowly taper off. "I wanted it to behave like that so I can stop when things get slower," he explained.

Early in the process, Steube couldn't help remarking when he noticed one of the plains he had recovered was "momof3g8kids."

"This was some logic that the user had," Steube observed. "But we didn't know about the logic. By doing hybrid attacks, I'm getting new ideas about how people build new [password] patterns. This is why I'm always watching outputs."

The specific type of hybrid attack that cracked that password is known as a combinator attack. It combines each word in a dictionary with every other word in the dictionary. Because these attacks are capable of generating a huge number of guesses—the square of the number of words in the dict—crackers often work with smaller word lists or simply terminate a run in progress once things start slowing down. Other times, they combine words from one big dictionary with words from a smaller one. Steube was able to crack "momof3g8kids" because he had "momof3g" in his 111 million dict and "8kids" in a smaller dict.
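The combinator attack itself is almost trivially simple (a Python sketch; these tiny word lists stand in for Steube's real dictionaries):

```python
def combinator(dict_a, dict_b):
    # Concatenate every word in dict_a with every word in dict_b,
    # generating len(dict_a) * len(dict_b) guesses.
    for a in dict_a:
        for b in dict_b:
            yield a + b

big = ["momof3g", "ilovemy", "gonefishing"]  # stand-in for the 111M-word dict
small = ["8kids", "sister", "1125"]
assert "momof3g8kids" in set(combinator(big, small))
```

The quadratic blow-up is why crackers pair a big dictionary with a small one, or kill the run once the hit rate tapers off.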

"The combinator attack got it! It's cool," he said. Then referring to the oft-cited xkcd comic, he added: "This is an answer to the batteryhorsestaple thing."

What was remarkable about all three cracking sessions were the types of plains that got revealed. They included passcodes such as "k1araj0hns0n," "Sh1a-labe0uf," "Apr!l221973," "Qbesancon321," "DG091101%," "@Yourmom69," "ilovetofunot," "windermere2313," "tmdmmj17," and "BandGeek2014." Also included in the list: "all of the lights" (yes, spaces are allowed on many sites), "i hate hackers," "allineedislove," "ilovemySister31," "iloveyousomuch," "Philippians4:13," "Philippians4:6-7," and "qeadzcwrsfxv1331." "gonefishing1125" was another password Steube saw appear on his computer screen. Seconds after it was cracked, he noted, "You won't ever find it using brute force."

The ease these three crackers had converting hashes into their underlying plaintext contrasts sharply with the assurances many websites issue when their password databases are breached. Last month, when daily coupons site LivingSocial disclosed a hack that exposed names, addresses, and password hashes for 50 million users, company executives downplayed the risk.

"Although your LivingSocial password would be difficult to decode, we want to take every precaution to ensure that your account is secure, so we are expiring your old password and requesting that you create a new one," CEO Tim O'Shaughnessy told customers.

In fact, there's almost nothing preventing crackers from deciphering the hashes. LivingSocial used the SHA1 algorithm, which as mentioned earlier is woefully inadequate for password hashing. He also mentioned that the hashes had been "salted," meaning a unique set of bits had been added to each user's plaintext password before it was hashed. It turns out that this measure did little to mitigate the potential threat. That's because salt is largely a protection against rainbow tables and other types of precomputed attacks, which almost no one ever uses in real-world cracks. The file sizes involved in rainbow attacks are so unwieldy that they fell out of vogue once GPU-based cracking became viable. (LivingSocial later said it's in the process of transitioning to the much more secure bcrypt function.)

Officials with Reputation.com, a service that helps people and companies manage negative search results, borrowed liberally from the same script when disclosing their own password breach a few days later. "Although it was highly unlikely that these passwords could ever be decrypted, we immediately changed the password of every user to prevent any possible unauthorized account access," a company e-mail told customers.

Both companies should have said that, with the hashes exposed, users should presume their passwords are already known to the attackers. After all, cracks against consumer websites typically recover 60 percent to 90 percent of passcodes. Company officials also should have warned customers who used the same password on other sites to change them immediately.

To be fair, since both sites salted their hashes, the cracking process would have taken longer to complete against large numbers of hashes. But salting does nothing to slow down the cracking of a single hash and does little to slow down attacks on small numbers of hashes. This means that certain targeted individuals who used the hacked sites—for example, bank executives, celebrities, or other people of particular interest to the attackers—weren't protected at all by salting.

The prowess of these three crackers also underscores the need for end users to come up with better password hygiene. Many Fortune 500 companies tightly control the types of passwords employees are allowed to use to access e-mail and company networks, and they go a long way to dampen crackers' success.

"On the corporate side, it's so different," radix said. "When I'm doing a password audit for a firm to make sure password policies are properly enforced, it's madness. You could go three days finding absolutely nothing."

Websites could go a long way to protect their customers if they enforced similar policies. In the coming days, Ars will publish a detailed primer on password managers. It will show how to use them to generate long, random passcodes that are unique to each site. Because these types of passwords can only be cracked by brute force, they are the hardest to recover. In the meantime, readers should take pains to make sure their passwords are a minimum of 11 characters, contain upper- and lower-case letters and numbers, and aren't part of a pattern.
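Generating such a password takes only a few lines (a Python sketch using the standard secrets module; the length and alphabet here are choices, not requirements):

```python
import secrets
import string

ALPHABET = string.ascii_letters + string.digits + string.punctuation  # 94 symbols

def random_password(length: int = 16) -> str:
    # Uniformly random, so none of the pattern-based attacks above apply:
    # an attacker is left facing the full 94**length keyspace.
    return "".join(secrets.choice(ALPHABET) for _ in range(length))

print(random_password())
```

A password manager does the same thing and also remembers the result, which is what makes a unique random password per site practical.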

The ease these crackers had in recovering as many as 90 percent of the hashes they targeted from a real-world breach also exposes how poorly many services measure the relative strength or weakness of passwords. A recently launched site from chipmaker Intel asks users "How strong is your password?", and it estimated it would take six years to crack the passcode "BandGeek2014". That estimate is laughable given that it was one of the first to fall at the hands of all three real-world crackers.

As Ars explained recently, the problem with password strength meters found on many websites is they use the total number of combinations required in a brute-force crack to gauge a password's strength. What the meters fail to account for is that the patterns people employ to make their passwords memorable frequently lead to passcodes that are highly susceptible to much more efficient types of attacks.

"You can see here that we have cracked 82 percent [of the passwords] in one hour," Steube said. "That means we have 13,000 humans who did not choose a good password." When academics and some websites gauge susceptibility to cracking, "they always assume the best possible passwords, when it's exactly the opposite. They choose the worst."