
How to download those expensive website-based courses for offline viewing and archiving

This article is inspired by three concrete problems I had this weekend, where I had premium content locked in a paid membership site and wanted a local copy of it. That content was often a rather large, complicated site consisting of 200-300 files and 500MB-1GB in size. While it’s possible to get it all manually, that would have been time consuming and tedious, regardless of whether I did it or one of my staff did. So in this article we are going to review the 3 automated solutions I tried for getting a copy and keeping it safe. So, like the pharaohs of old, you can take it with you.

Like the pharaohs of old, you can take it with you, provided you know these tricks...

The problem with online courses:

1) Your course is only available online in some password-protected site. Lose the password or downgrade your membership and you lose access to it all.

2) You can’t take the course with you on the road or to places with poor internet connectivity, which is often a perfect time to learn and review the material.

3) Courses get outdated and taken offline, e.g. gurus retire, or retire their courses. If you spend $2k on the information, it’s only fair that you get a copy of it, regardless of what the guru does with it in the future.

If you could get a copy for your “Learning Library”, you could keep it even if you never use it. I still keep copies of my university books just to occasionally refresh my memory on certain subjects. Can’t say that I’ve gotten much use out of them (e.g. Linear Algebra and Chemistry), but I spent like $200 on some of them when that was a LOT of money to me (as in a full month of rent) and years later…still… can’t… quite let them go in the trash 🙂

My Inspirations

First, I own Jeff Walker’s Product Launch Formula 2.0, which was $2k or so when I bought it about a year ago. Jeff is retiring PLF2.0 in favor of PLF3.0 in just a couple of days. That means he’s deleting ALL of the excellent content and sites surrounding that product, partly to push people to buy the upgraded, “redone from the ground up” version. Much of that pricey but good content I just don’t remember, OR worse, I have paid lots of money for it but haven’t gotten a chance to view it yet. Come to think of it, that is true for many of the courses I have bought, even though I have been working my butt off the entire year!!! While I could manually download the files one by one, I wanted an easier way, and one that would make sure I didn’t lose anything. Being so busy, I put it off to the last few days…and just realized I only had a few days left. I had originally planned to download just the core module files, but getting into the site after almost a year of not seeing it, I realized there was a huge amount I hadn’t seen, most of it available in several formats like PDF transcripts, which would benefit me more than just the video. It would be rather insane to download the entire site by hand, so I went with the automated approach.

Second, I am canceling one of my $300/mo mastermind groups. Over the 4 months I’ve gotten most of the value I need out of it, and I would rather have that money going someplace else; that’s about what I pay for a month of some of my Filipino workers! As soon as I cancel, though, I lose access to ALL the information in the premium mastermind section, which kinda sucks. A much better way to go would be to just stop giving me access to things I haven’t paid for, but their membership software doesn’t do that.

Third, for one of the other membership sites, I wanted local copies of the videos, because the way they organize the site you can’t watch a video and actually see the slides and take notes at the same time unless you use a spare PC, which makes it hard for me to learn from the site.

In every one of these cases downloading the videos is perfectly permitted and encouraged, as it’s for personal, not pirated, use. You almost never have permission to re-upload the site to make it “yours”, but in some cases (like employee training) it’s borderline; just make sure you only grant access to authorized parties.

How to get a copy of that website for safe keeping.

The good news is there are a few different ways to go about this important if somewhat daunting task.

Manual: This is great if you only need a few files, pick-and-choose style. You can just use the browser to right-click a link and hit Save As, or from the main menu choose File > Save As > Webpage, Complete. This, however, won’t download video files hosted on, say, S3 or YouTube to keep them safe. The reason is that it saves the HTML file verbatim and keeps the links intact, still pointing to the web.

If it’s a video on, say, YouTube, there are a few different website services for this, like KeepVid, and Firefox plugins like Orbit Downloader.

In a pinch, when people make it hard to find the video (e.g. EZS3), you can often use the Firefox plugin Firebug to see where the files are coming from, open the file in a new tab, then save it. Firebug is meant as a web development tool (and it kicks butt at this). For how to use it, basically, look at this 9-step picture.

9 Steps to Using Firebug to download a video
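
What Firebug shows you by hand can sometimes be approximated with a small script that scans the page source for links ending in a video extension. Here is a minimal sketch in Python, assuming the media URL appears literally in the HTML (it often won’t when a player script builds the address at runtime, which is exactly when you need Firebug). The extension list and the function name are my own choices for illustration, not something taken from any of the sites above.

```python
import re

# Hypothetical list of extensions worth looking for; adjust to taste.
MEDIA_EXTENSIONS = ("mp4", "flv", "mov", "m4v", "mp3")

# Matches http(s) URLs that end in one of the extensions above, optionally
# followed by a query string (S3 links often carry one). The lookahead makes
# sure the URL really ends there, so "clip.mp4.html" is not reported.
MEDIA_URL = re.compile(
    r"https?://[^\s\"'<>]+?\.(?:" + "|".join(MEDIA_EXTENSIONS) + r")"
    r"(?:\?[^\s\"'<>]*)?(?=[\s\"'<>]|$)",
    re.IGNORECASE,
)


def find_media_urls(html):
    """Return the media URLs found in the page source, in order, without duplicates."""
    found = []
    for url in MEDIA_URL.findall(html):
        if url not in found:
            found.append(url)
    return found
```

You would paste the page source (View Source in the browser) into a string and pass it to `find_media_urls`; each URL it returns can be opened in a new tab and saved, same as in the Firebug steps.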

Last, I should say there are times when people stream video and that video cannot be saved via any of these approaches, as the video never hits your hard drive. In this last case you can use any screen capture program (ScreenFlow on a Mac, Camtasia on a PC) to capture everything. This is particularly great for webinars on GoToMeeting.

Automated: If you have a lot of files, the special software for the job is a website ripper. These basically behave as you would when viewing the site, except that instead of saving it to a temporary local cache they save it permanently to a folder of your choice.

So there are many tools to help you automatically download an entire website for later. Their basic job is to follow each and every link from a given start page and download each and every file, much like Google’s search engine bots crawl your site, and once everything is downloaded, rewrite the links to point to the local copies rather than the web. This can actually be a relatively complicated job, given that most sites link out to many other resources (like Amazon S3) to host media files, or to cross promotions. You sometimes have to set up boundaries on what you do and don’t follow; you don’t want to download the entire web!
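
To make the “follow links, but within boundaries” idea concrete, here is a rough sketch in Python of the two decisions a ripper makes for every page: which links are inside the boundary, and what local path each downloaded file should get so links can be rewritten to it. This is my own simplified illustration, not how any of the tools below actually work; the function names and the example hosts are made up, and it does no downloading at all.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collects the href/src attribute values found on a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("href", "src") and value:
                self.links.append(value)


def links_to_follow(page_url, html, allowed_hosts):
    """Resolve every link on the page and keep only those inside the boundary."""
    collector = LinkCollector()
    collector.feed(html)
    keep = []
    for raw in collector.links:
        absolute = urljoin(page_url, raw)  # turn relative links into full URLs
        parts = urlparse(absolute)
        # Skip mailto: and friends, and any host we did not explicitly allow.
        if parts.scheme in ("http", "https") and parts.hostname in allowed_hosts:
            keep.append(absolute)
    return keep


def local_path(url):
    """Map a URL to a relative path on disk, so links can be rewritten to it."""
    parts = urlparse(url)
    path = parts.path or "/"
    if path.endswith("/"):
        path += "index.html"  # a folder URL needs a file name locally
    return parts.hostname + path
```

The `allowed_hosts` set is the boundary: you would list the membership site itself plus wherever it hosts its media (an S3 bucket, say), and everything else, like links out to cross promotions, gets ignored.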

Generally you first start a “project” that will contain all the files (e.g. 1 project might be 1 website), so in my case I needed 3 projects, one for each site. These projects can be stored in various formats: the raw files as they exist on the web, or various compressed archives like zip folders or “chm” help files, which save space and make them easy to share with, say, a mastermind group or employees. In all cases these tools try to download everything at a fast rate, so they will both hog the net connection and consume significant CPU. I recommend doing this when you aren’t trying to do something super important at the same time, or better yet, start it when you go to sleep and check it in the morning. Also, it’s generally a good idea to virus check these files in case the site got hacked.

There are MANY programs that do this. I tried the first 3 that caught my eye searching Google for “download website offline viewing”; an alternate search term to try is “download entire website”.

3 Software Tools to Download a Website for Offline Viewing

  • HTTrack: this is open source and free, and near the top of the search results. It appears to be a command line utility with a minimal UI. It worked fine for normal sites, but I couldn’t figure out how to get it to work with the password-protected websites requiring a non-popup login. Though, looking at its logs, I was able to see that one of the sites was using the WP membership site plugin Aweber, which I couldn’t have known beforehand. It was the cheapest, as it was open source/no cost.
  • Spidersoft’s Webzip: this is not free, but it has a fully functional 30-day trial, which for my purposes was good enough as I didn’t need it for more than a week. It seems to have been out for a couple of years, judging by the various copyright dates on the software. It is based on Internet Explorer, so its workflow is different, and it worked better than HTTrack for the password sites. Basically, with its built-in browser, you browse like you normally would to the part of the site you want to copy, log in if needed, and then hit go. If I needed to purchase, it was also as cheap as any of the paid options, at $39.95 for the normal version. The UI is reasonably polished and fancy feeling, with some questionable graphics showing all the bytes in and bytes out. Running on Windows XP, I found that it crashed a few times but was easy to resume from where it left off. Viewing the results, it added an extra note at the top about where the files came from, and the date. There is also a resync feature, so if, for example, the contents had changed, you could download only the new ones. By default it stores the site in a conventional file/folder structure.
  • Bimesoft’s SurfOffline: this is not free but has a 200-link trial, which for my purposes wasn’t good enough on 2 of the sites. It had a polished UI and good reporting on what it’s doing, and it also had a visible browser so I could log into the sites before downloading them. It did not crash like Webzip, but that’s not a fair comparison, since it also didn’t get as far: the trial version stopped at 200 links. I found it a bit more limited and more marketing/nagware. After you complete the download, it pops up a message to upgrade to the full version. It had four different versions: $39.95 for the standard ‘non-commercial’ version and $69.95 for the non-commercial professional version, and the commercial versions of both would run *significantly* more. I couldn’t really figure out what the difference was between the standard and pro versions. By default it stores files in a compressed archive, which is a great space saver, but I didn’t like it, as I prefer to be able to jump straight to the movie files I want.
Screenshot of Webzip, after downloading the PLF website for offline viewing
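
Both paid tools got past the login because I signed in through their built-in browser first, and the download then reused that logged-in session. For the curious, a script can do roughly the same thing by sending the login form itself and keeping the cookies it gets back. The sketch below uses only Python’s standard library; the form field names (`username`, `password`) and the URLs in the comments are placeholders I made up, and a real membership site will have its own field names and possibly extra hidden fields, so treat this as an illustration of the idea rather than something that works on any particular site.

```python
import http.cookiejar
import urllib.parse
import urllib.request


def make_session():
    """Build an opener that remembers cookies between requests, like a browser does."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    return opener, jar


def build_login_request(login_url, username, password):
    """Prepare the form POST a browser would send when you press the login button."""
    # "username" and "password" are hypothetical field names; check the site's login form.
    form = urllib.parse.urlencode({"username": username, "password": password})
    # Supplying data makes urllib send this as a POST instead of a GET.
    return urllib.request.Request(login_url, data=form.encode("utf-8"))


# Usage sketch (not run here; the URLs are placeholders):
# opener, jar = make_session()
# opener.open(build_login_request("https://members.example.com/login", "me", "secret"))
# page = opener.open("https://members.example.com/course/").read()
```

Once the login request has gone through the opener, every later request made through that same opener carries the session cookie, which is the same trick the built-in browsers in Webzip and SurfOffline rely on.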

In the end I was pretty happy with the trial version of Webzip; it did the job I needed without costing anything. If I needed this on a regular basis I would probably investigate some of the others out there that don’t require restarting. In any case, I imagine that most tools like this run about $40, and a 1TB hard drive is about $50 as of this writing, which is nothing compared to the price of one of these pricey courses and the peace of mind of being able to consume it however you want, whenever you want, for however long you want, without being dependent on those pesky gurus.

ADVANCED:
I have what I call a Learning Library, currently about 1.5TB in size. It contains ALL the courses I have bought to date, some $10K or so. In particular, the hard drive holds copies of the DVD collections and OCR’d printed materials I get in those “big box courses”, as hard drives are cheap and DVDs can easily get damaged or lost in the hustle of life and piles of stuff. OCRing the swipe files makes them actually easy to copy and paste from. Even better, once ripped and converted, most are compatible with web browsers, so I can tag, organize and even search the files and create hyperlinks to relevant courses on relevant subjects, which I can’t easily do with my DVD collection collecting dust on the shelves. Not to mention most IM courses are kinda ugly, not exactly something I like having a showcase of.