Create a mirror of a website with Wget _ FOSSwire

Mar 27, 2016


12/7/11 Create a mirror of a website with Wget | FOSSwire

1/8fosswire.com/post/2008/04/create-a-mirror-of-a-website-with-wget/

Create a mirror of a website with Wget

GNU's wget command-line program for downloading is very popular, and not without reason. While you can use it simply to retrieve a single file from a server, it is much more powerful than that and offers many more features.

One of the more advanced features in wget is the mirror feature. This allows you to create a complete local copy of a website, including any stylesheets, supporting images and other support files. All the (internal) links will be followed and downloaded as well (and their resources), until you have a complete copy of the site on your local machine.

In its most basic form, you use the mirror functionality like so:

$ wget -m http://www.example.com/

There are several issues you might have with this approach, however.

First of all, it's not very useful for local browsing, as the links in the pages themselves still point to the real URLs and not your local downloads. That means that if, say, you downloaded http://www.example.com/, the link on that page to http://www.example.com/page2.html would still point to example.com's server, which would be a right pain if you're trying to browse your local copy of the site while offline.

To fix this, you can use the -k option in conjunction with the mirror option:

$ wget -mk http://www.example.com/

Now, that link I talked about earlier will point to the relative page2.html. The same happens with all images, stylesheets and resources, so you should now be able to get an authentic offline browsing experience.

There's one other major issue I haven't covered here yet: bandwidth. Disregarding the bandwidth you'll be using on your connection to pull down a whole site, you're going to be putting some strain on the remote server. You should think about being kind and reducing the load on them (and you), especially if the site is small and bandwidth comes at a premium. Play nice.

One of the ways in which you can do this is to deliberately slow down the download by placing a delay between requests to the server.

$ wget -mk -w 20 http://www.example.com/

This places a delay of 20 seconds between requests. Replace that number with whatever delay suits you; optionally, you can add a suffix of m for minutes, h for hours, and d for ... yes, days, if you want to slow down the mirror even further.

Now if you want to make a backup of something, or download your favourite website for viewing when you're offline, you can do so with wget's mirror feature. To delve even further into this, check out wget's man page (man wget), where there are further options, such as random delays, setting a custom user agent, sending cookies to the site and lots more.
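As a rough sketch of how the options above combine, the snippet below wraps them in a small shell function. The function name mirror_cmd and the RUN_MIRROR variable are made up for illustration; --random-wait is the man page's random-delay option, which varies the pause around the -w value. By default the function only prints the command, so nothing is downloaded until you opt in.

```shell
#!/bin/sh
# Hypothetical helper (not part of wget): print, or optionally run,
# a polite mirror command.
# Usage: mirror_cmd URL [WAIT_SECONDS]
mirror_cmd() {
    url=$1
    wait_secs=${2:-20}   # default delay of 20 seconds, as in the article
    # -m: mirror, -k: convert links for local browsing,
    # -w: delay between requests, --random-wait: vary that delay.
    set -- wget -m -k -w "$wait_secs" --random-wait "$url"
    if [ "${RUN_MIRROR:-0}" = "1" ]; then
        "$@"             # really run wget (assumed to be installed)
    else
        echo "$*"        # dry run: just show the command
    fi
}

mirror_cmd http://www.example.com/
```

Setting RUN_MIRROR=1 in the environment before calling the function would run the mirror for real instead of printing it.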


April 21, 2008

Peter Upfold


Tips & Tutorials CLI GNU Linux tutorials command line client beginner

download wget

rsync (guest) POSTED ON 22 April 2008 AT 02:38 PM

rsync is a far more reasonable and well-suited tool for this purpose. Using the right tool for the right job is a key to being a better admin.

fsdaily.com POSTED ON 22 April 2008 AT 02:48 PM

Story added...

This story has been submitted to fsdaily.com! If you think this story should be read by the free software community, come vote it up and discuss it here:

http://www.fsdaily.com/EndUser/Create_a_mirror_of_a_website_with_Wget...

Kyle (guest) POSTED ON 22 April 2008 AT 03:07 PM

rsync is used for backing up a file system when you have ssh access to it. wget, on the other hand, can be used on any public website even if you don't have ssh/ftp access.

Shiv (guest) POSTED ON 22 April 2008 AT 03:19 PM

Awesome tip! It is very good for web developers who want to develop a similar kind of website.

Thanks for the post.

Stuart (guest) POSTED ON 22 April 2008 AT 03:48 PM

@Shiv: remember, the author(s) of the website you're downloading hold copyright over the design, including whatever code or markup powers it. Copyright does NOT just cover content!

So changing all the content but keeping the exact same layout and code may, in some cases, be an infringement that leads to somebody getting angry and asking you to remove your all-too-similar website.

Of course, if the design is very common, this probably doesn't apply, nor if the site design is open-sourced, e.g. Wordpress.

Peter Upfold - http://peter.upfold.org.uk/

Peter Upfold is a technology enthusiast from the UK. Peter's interest in Linux stems back to 2003, when curiosity got the better of him and he began using SUSE 9.0. Now he runs Linux Mint 9 on the desktop, runs a CentOS-based web server from home for his personal website and dabbles in all sorts of technology things across the Windows, Mac and open source worlds.


Discussion: Create a mirror of a website with Wget


IKTeroak :: Make a backup copy of your website POSTED ON 22 April 2008 AT 05:11 PM

[...] On the FOSSwire site they explain how to use the wget command to bring a copy of a website you have on a server down to your own machine. With this short tutorial you will be able to save your web page's CSS files, images and other files locally, with all internal links respected. Several details to bear in mind during the process are also set out very clearly. [...]

E (guest) POSTED ON 22 April 2008 AT 05:24 PM

This is only good if you want to make the end result static. If your site is a true dynamic site running something like PHP and you run these commands, you only end up with a static representation of the site as it was at that time. It's not a true mirror. This is only good if you want to mirror static content like downloads or pictures to another site.

Simon Hibbs (guest) POSTED ON 22 April 2008 AT 06:18 PM

httrack is more feature-complete for website mirroring, but also more complex.

Todd (guest) POSTED ON 22 April 2008 AT 08:44 PM

Another valuable option is -np, for "no parent". Say you just want to mirror http://example.com/subfolder/. By default, wget will mirror http://example.com/subfolder/ and also go up to the parent folder (example.com in this case) and grab everything there. So the final command that I usually use on sites is:

wget -mk -w 20 -np http://example.com/subfolder/

Also look into the screen command so you can "background" this and check on the status every so often.
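To make the effect of -np concrete, here is an illustrative approximation of the rule it applies: a link is only followed if it sits at or below the directory the mirror started in. The helper would_follow is hypothetical and exists only for this sketch; wget's real check has more cases than a simple prefix match.

```shell
#!/bin/sh
# Illustrative only: approximates which links -np (--no-parent) allows.
# would_follow is a made-up helper, not something wget provides.
# Usage: would_follow START_DIR_URL LINK_URL
would_follow() {
    start_dir=$1   # directory the mirror started in, with trailing slash
    link=$2        # absolute URL found in a downloaded page
    case $link in
        "$start_dir"*) echo yes ;;  # at or below the starting directory
        *)             echo no ;;   # parent or sibling directory: skipped
    esac
}

would_follow http://example.com/subfolder/ http://example.com/subfolder/page2.html
would_follow http://example.com/subfolder/ http://example.com/index.html
```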

Create a Local Website Mirror with Wget [LinuxTip POSTED ON 22 April 2008 AT 10:14 PM

[...] is both considerate and wise. Hit the link for details on using wget for offline website access. Create a mirror of a website with Wget [...]

Paul William Tenny (guest) POSTED ON 22 April 2008 AT 10:58 PM

You probably want -N if you intend on doing subsequent mirrors; it'll only refresh local copies of files if the remote version is newer.

Create a mirror of a website with wget - 冰古blog POSTED ON 23 April 2008 AT 05:28 AM

[...] For more detail, visit FOSSwire Tags: linux, shell, SSH, wget You can follow any responses to this entry through the RSS 2.0 [...]

Zhenyi (guest) POSTED ON 23 April 2008 AT 11:59 AM

... Mirror the whole internet

FOSSwire » More advanced wget usage POSTED ON 23 April 2008 AT 05:25 PM

[...] recently covered how to make a mirror of a website with GNU's wget command line program and in the comments of that post there were several [...]

Sharjeel Sayed (guest) POSTED ON 24 April 2008 AT 05:26 AM

Any idea how we can use this to mirror del.icio.us?

Create a Local Website Mirror with Wget [LinuxTip POSTED ON 25 April 2008 AT 03:18 PM

[...] is both considerate and wise. Hit the link for details on using wget for offline website access. Create a mirror of a website with Wget [...]

Mirror sites with wget « 0ddn1x: tricks with POSTED ON 25 April 2008 AT 05:35 PM

[...] Mirror sites with wget Filed under: Linux - 0ddn1x @ 2008-04-25 17:35:03 +0000 http://fosswire.com/2008/04/21/create-a-mirror-of-a-website-with-wget/ [...]

xajckop (guest) POSTED ON 30 April 2008 AT 01:25 PM

Serbian version of that tip added to my blog.

links from dupola's bookmarks. » BlogA POSTED ON 08 May 2008 AT 04:36 PM

[...] FOSSwire » Create a mirror of a website with Wget (tags: wget putty ssh) [...]

links for 2008-05-08 « dupola's weblog POSTED ON 08 May 2008 AT 04:37 PM

[...] FOSSwire » Create a mirror of a website with Wget (tags: wget putty ssh) Possibly related posts: (automatically generated) links for 2008-04-05 links for 2008-03-07 Posted by dupola Filed in bookmarks [...]

Jan from fish forum (guest) POSTED ON 06 June 2008 AT 04:06 PM

E is right: if your site is dynamic, you will get files such as something.php?this=that. These won't work unless your mirror server serves PHP files as HTML ones.

So in order to mirror a dynamic website it is necessary to move the databases. I am not sure whether wget is suitable for this purpose, since databases are accessed differently from the public folders holding the website's content.

r3d3ye (guest) POSTED ON 26 June 2008 AT 12:16 PM

https://:8043

Has anyone tried using wget to mirror this kind of URL? That is, sites with a web certificate and a different web port (in this case 8043).

Local copy of a website with wget POSTED ON 10 July 2008 AT 03:24 PM

[...] So this variant is suitable for storing a website locally in order to browse it offline. [via fosswire] Tags: browse, browsen, get, holen, komplett, Kopie, lokal, Mirror, sichern, speichern, Tip, [...]

Homolibere » Blog Archive » Создание з POSTED ON 27 August 2008 AT 08:20 AM

[...] translated from FOSSwire [...]

touranaga (guest) POSTED ON 16 September 2008 AT 08:32 AM

Where does wget save mirrors? Note that I'm new to Linux, so don't be mad at me.

Peter (guest) POSTED ON 16 September 2008 AT 08:34 AM

@touranaga - It should save them in whatever folder you ran wget from. So if you just opened up a terminal and did it straight from there, they should be in your home folder, under a directory named after the website address (for example /home/yourname/fosswire.com).

If you moved into a different directory with cd, then the mirrors will be placed there.
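The rule described above can be sketched in a few lines of shell. The helper mirror_dir is hypothetical and only mimics the default layout for a plain http://host/ URL (no port or login in the address). If you want the files somewhere else, wget's -P option sets the base directory and -nH skips the host-named folder.

```shell
#!/bin/sh
# Hypothetical helper: guess the directory a recursive wget run creates,
# assuming default options and a URL without a port or user name.
# Usage: mirror_dir URL [BASE_DIR]
mirror_dir() {
    host=${1#*://}      # drop the scheme, e.g. "http://"
    host=${host%%/*}    # drop the path, leaving only the host name
    base=${2:-$PWD}     # wget saves under the current directory by default
    echo "$base/$host"
}

mirror_dir http://fosswire.com/ /home/yourname
```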


touranaga (guest) POSTED ON 16 September 2008 AT 08:49 AM

I found them; they are saved locally. wget makes a new folder for each mirror you save, but there is an option not to make the folder, in man wget.

touranaga (guest) POSTED ON 16 September 2008 AT 08:50 AM

Thank you Peter

Carstens Blog » links for 2009-01-17 POSTED ON 17 January 2009 AT 03:04 PM

[...] FOSSwire » Create a mirror of a website with Wget (tags: free tutorial programming web tips howto linux download tools wget mirror website internet utilities shell commandline ubuntu commands backup) [...]

spiny norman (guest) POSTED ON 22 May 2009 AT 02:21 PM

"if your site is dynamic, then you will get files such as something.php?this=that . This won't work unless your mirror server serves php files as HTML ones. So in order to mirror a dynamic website it is necessary to move databases."

Wrong. Wget will convert all the "index.php?x=34" or whatever to HTML files if you use the right options. You get a static snapshot of the site at that moment, as was mentioned, but it works. RTFM.
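The comment does not say which options it means. One likely candidate, and this is an assumption, is -E, which appends .html to saved pages that are served as HTML but whose names do not already end in .html or .htm. The helper adjusted_name below is a made-up approximation of that renaming, assuming the page really is HTML.

```shell
#!/bin/sh
# Rough approximation of how -E renames a saved page, assuming the
# server reports it as HTML. adjusted_name is hypothetical, not wget code.
# Usage: adjusted_name LOCAL_FILE_NAME
adjusted_name() {
    case $1 in
        *.html|*.htm|*.HTML|*.HTM) printf '%s\n' "$1" ;;       # left alone
        *)                         printf '%s\n' "$1.html" ;;  # suffix added
    esac
}

adjusted_name 'index.php?x=34'
adjusted_name page2.html
```

Under that assumption, a command such as wget -mk -E http://www.example.com/ would produce a static snapshot that opens in a browser without a PHP-capable server.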

Tim Jeffries (guest) POSTED ON 07 June 2009 AT 12:13 PM

Is there any way to use this command from Mac OS X? I'm being told that the command isn't found ... :-(

Peter Upfold POSTED ON 07 June 2009 AT 12:21 PM

Tim Jeffries said: Is there any way to use this command from Mac OS X? I'm being told that the command isn't found ... :-(

curl ships with Mac OS X, but wget unfortunately does not. You can either compile it yourself or there is a pre-built version in a zip archive available at the Status-Q blog. In the latter case, you can simply copy the wget binary in that zip archive to /usr/local/bin, or anywhere else in your PATH.

Tim Jeffries (guest) POSTED ON 07 June 2009 AT 01:12 PM

Thanks Peter. I realise I'm probably asking super stupid questions. It's been a long time since I've done any serious work at the command line and even when I did it was always wise to get someone to look over my shoulder.

I managed to install it and get it to work on OS X. It's such a great tool.

I'm wondering if you know why it wouldn't work on a Blogger blog. The command works fine on my work website (http://www.urbanseed.org/) but I can't seem to get it to work with an old blog of mine I'm trying to archive so I can happily remove it.

http://www.afootinbothplaces.blogspot.com/

Thanks again. Tim.

Morgel (guest) POSTED ON 22 June 2009 AT 11:05 AM

I use this method (it is taken from this post: http://www.sysadmin.md/how-to-retrieve-entire-site-via-command-line-using-wget.html):

wget -rkpNl5 www.sysadmin.md

-r: Retrieve recursively
-k: Convert the links in the document to make them suitable for local viewing
-p: Download everything (inlined images, sounds, and referenced stylesheets)
-N: Turn on time-stamping
-l5: Specify recursion maximum depth level 5

Morgel (guest) POSTED ON 22 June 2009 AT 11:06 AM

Strange formatting. Please edit above post :(

StefanLasiewski POSTED ON 08 September 2009 AT 09:02 PM

Morgel: Some of those options are redundant, as they are already included in the --mirror option.
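For reference, the wget manual describes -m (--mirror) as shorthand for -r -N -l inf --no-remove-listing, so -r and -N are the redundant ones, while -k and -p still have to be given separately and -l5 changes the depth rather than repeating it. The sketch below spells that expansion out; expand_mirror is a hypothetical helper that only handles -m or --mirror written as a separate argument, not bundled forms such as -mk.

```shell
#!/bin/sh
# Hypothetical helper: rewrite a wget command line so that -m/--mirror
# is replaced by the options the manual says it stands for.
# Usage: expand_mirror wget [OPTIONS...] URL
expand_mirror() {
    out=
    for arg in "$@"; do
        case $arg in
            -m|--mirror) arg='-r -N -l inf --no-remove-listing' ;;
        esac
        out="${out:+$out }$arg"   # join arguments with single spaces
    done
    echo "$out"
}

expand_mirror wget --mirror -k -p http://www.example.com/
```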

phloating_man (guest) POSTED ON 03 June 2011 AT 07:40 AM

Thank you! This worked perfectly.

Angel (guest) POSTED ON 09 July 2011 AT 07:16 PM

Thank you it worked.


FOSSwire is an Oratos Media property. Content is made available under the CC-BY-SA 3.0 license. © 2006 - 2010 Oratos Media.