Create a mirror of a website with Wget _ FOSSwire

12/7/11

Create a mirror of a website with Wget FOSSwire

Create a mirror of a website with Wget

April 21, 2008

Peter Upfold

GNU's wget command line program for downloading is very popular, and not without reason. While you can use it simply to retrieve a single file from a server, it is much more powerful than that and offers many more features.

One of the more advanced features in wget is the mirror feature. This allows you to create a complete local copy of a website, including any stylesheets, supporting images and other support files. All the (internal) links will be followed and downloaded as well (and their resources), until you have a complete copy of the site on your local machine.

In its most basic form, you use the mirror functionality like so:

$ wget -m http://www.example.com/

There are several issues you might have with this approach, however. First of all, it's not very useful for local browsing, as the links in the pages themselves still point to the real URLs and not your local downloads. What that means is that, if, say, you downloaded http://www.example.com/, the link on that page to http://www.example.com/page2.html would still point to example.com's server and so would be a right pain if you're trying to browse your local copy of the site while being offline for some reason.

To fix this, you can use the -k option in conjunction with the mirror option:

$ wget -mk http://www.example.com/

Now, that link I talked about earlier will point to the relative page2.html. The same happens with all images, stylesheets and resources, so you should be able to now get an authentic offline browsing experience.
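One way to check what -k did is to search the local copy for links that still point at the live site. The sketch below is only an illustration: the function name leftover_links is made up, and it assumes the mirror sits in a directory named after the host. wget rewrites links to pages it did not download as absolute URLs, so a few matches are normal.

```shell
#!/bin/sh
# leftover_links DIR HOST
# Hypothetical helper: list the files under DIR that still contain
# absolute links to HOST after a "wget -mk" run.
leftover_links() {
    dir=$1
    host=$2
    # -r: recurse into DIR, -l: print file names only,
    # -F: treat the pattern as a fixed string, not a regular expression
    grep -rlF "://$host/" "$dir" 2>/dev/null | sort
}

# Example (not run here):
#   leftover_links www.example.com www.example.com
```

An empty result suggests every link in the copy now points at a local file.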

There's one other major issue I haven't covered here yet - bandwidth. Disregarding the bandwidth you'll be using on your connection to pull down a whole site, you're going to be putting some strain on the remote server. You should think about being kind and reducing the load on them (and you), especially if the site is small and bandwidth comes at a premium. Play nice.

One of the ways in which you can do this is to deliberately slow down the download by placing a delay between requests to the server.

$ wget -mk -w 20 http://www.example.com/

This places a delay of 20 seconds between requests. Replace that number, and optionally you can add a suffix of m for minutes, h for hours, and d for ... yes, days, if you want to slow down the mirror even further.

Now if you want to make a backup of something, or download your favourite website for viewing when you're offline, you can do so with wget's mirror feature. To delve even further into this, check out wget's man page (man wget) where there are further options, such as random delays, setting a custom user agent, sending cookies to the site and lots more.
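As a rough sketch of how some of those man page options might be combined with the delay, the wrapper below adds a random delay, a speed cap and a custom user agent. The function name polite_mirror, the numbers and the agent string are placeholders chosen for illustration, not recommendations from the article.

```shell
#!/bin/sh
# polite_mirror URL
# Hypothetical wrapper: mirror URL with converted links while going easy
# on the remote server. All values below are placeholders.
polite_mirror() {
    url=$1
    # --wait=20          pause 20 seconds between requests
    # --random-wait      vary each pause around the --wait value
    # --limit-rate=200k  cap the download speed at about 200 KB/s
    # --user-agent=...   send a custom User-Agent header
    wget --mirror --convert-links \
         --wait=20 --random-wait \
         --limit-rate=200k \
         --user-agent="example-mirror/0.1" \
         "$url"
}

# Example (not run here):
#   polite_mirror http://www.example.com/
```

The long option names are used here only because they are easier to read in a script; --mirror and --convert-links are the long forms of -m and -k.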

fosswire.com/post/2008/04/create-a-mirror-of-a-website-with-wget/

Peter Upfold - http://peter.upfold.org.uk/

Peter Upfold is a technology enthusiast from the UK. Peter's interest in Linux stems back to 2003, when curiosity got the better of him and he began using SUSE 9.0. Now he runs Linux Mint 9 on the desktop, runs a CentOS-based web server from home for his personal web site and dabbles in all sorts of technology things across the Windows, Mac and open source worlds.

Discussion: Create a mirror of a website with Wget

rsync (guest)

POSTED ON 22 April 2008 AT 02:38 PM

rsync is a far more reasonable and well-suited tool for this purpose. Using the right tool for the right job is a key to being a better admin.

fsdaily.com

POSTED ON 22 April 2008 AT 02:48 PM

Story added... This story has been submitted to fsdaily.com! If you think this story should be read by the free software community, come vote it up and discuss it here: http://www.fsdaily.com/EndUser/Create_a_mirror_of_a_website_with_Wget ...

Kyle (guest)

POSTED ON 22 April 2008 AT 03:07 PM

rsync is suited for backing up a file system when you have ssh access to it. wget on the other hand can be used on any public website even if you don't have ssh/ftp access.

Shiv (guest)

POSTED ON 22 April 2008 AT 03:19 PM

Awesome Tips! It is very good for web developers who want to develop a similar kind of website. Thanks for the post.

Stuart (guest)

POSTED ON 22 April 2008 AT 03:48 PM

@Shiv: remember, the author(s) of the website you're downloading has copyright over the design, including whatever code or markup powers it. Copyright does NOT just cover content! So changing all the content but keeping the exact same layout and code may, in some cases, be an infringement that leads to somebody getting angry and asking you to remove your all-too-similar website.

Of course, if the design is very common, this probably doesn't apply, or if the site design is open-sourced, e.g. Wordpress.


IKTeroak :: Egi u ure webgunearen segurtasun kopi

POSTED ON 22 April 2008 AT 05:11 PM

[...]

E (guest)

POSTED ON 22 April 2008 AT 05:24 PM

[...]

Simon Hibbs (guest)

POSTED ON 22 April 2008 AT 06:18 PM

[...]

Todd (guest)

POSTED ON 22 April 2008 AT 08:44 PM

[...]

Create a Local Website Mirror with Wget [Linux Tip

POSTED ON 22 April 2008 AT 10:14 PM

[...]

Paul William Tenn (guest)

POSTED ON 22 April 2008 AT 10:58 PM

[...]
用wget创建网站的镜像 (Create a mirror of a website with wget)

POSTED ON 23 April 2008 AT 05:28 AM

[...] For more details, please visit FOSSwire. Tags: linux, shell, SSH, wget. You can follow any responses to this entry through the RSS 2.0 [...]

Zhen i (guest)

POSTED ON 23 April 2008 AT 11:59 AM

... Mirror the whole internet

FOSSwire » More advanced wget usage

POSTED ON 23 April 2008 AT 05:25 PM

[...] recently covered how to make a mirror of a website with GNU's wget command line program and in the comments of that post there were several [...]

Sharjeel Sayed (guest)

POSTED ON 24 April 2008 AT 05:26 AM

Any idea how we can use this to mirror del.icio.us ?

Create a Local Website Mirror with Wget [Linux Tip

POSTED ON 25 April 2008 AT 03:18 PM

[...] is both considerate and wise. Hit the link for details on using wget for offline website access. Create a mirror of a website with Wget [...]

Mirror sites with wget « 0ddn1x

POSTED ON 25 April 2008 AT 05:35 PM

[...] Mirror sites with wget. Filed under: Linux - 0ddn1x @ 2008-04-25 17:35:03 +0000 http://fosswire.com/2008/04/21/create-a-mirror-of-a-website-with-wget/ [...]

ajckop (guest)

POSTED ON 30 April 2008 AT 01:25 PM

Serbian version of that tip added to my blog.

[...]

POSTED ON 08 May 2008 AT 04:36 PM

[...] FOSSwire » Create a mirror of a website with Wget (tags: wget putty ssh) [...]

links for 2008-05-08

POSTED ON 08 May 2008 AT 04:37 PM

[...] FOSSwire » Create a mirror of a website with Wget (tags: wget putty ssh) Possibly related posts: (automatically generated) links for 2008-04-05, links for 2008-03-07. Posted by dupola. Filed in bookmarks [...]

Jan from fish forum (guest)

POSTED ON 06 June 2008 AT 04:06 PM

E is right, if your site is dynamic, then you will get files such as something.php?this=that . This won't work unless your mirror server serves php files as HTML ones. So in order to mirror a dynamic website it is necessary to move databases. I am not sure if wget is suitable for this purpose since there is different access to databases than to public folders with website's content.

r3d3 e (guest)

POSTED ON 26 June 2008 AT 12:16 PM

https://:8043 Does anyone tried using "wget" in mirroring this kind of URL? Sites with web certificate and different web port (in this case 8043).

Lokale Kopie einer Webseite mit wget (Local copy of a website with wget)

POSTED ON 10 July 2008 AT 03:24 PM

[...] This variant is therefore suitable for storing a website locally for offline browsing. [via fosswire] Tags: browse, browsing, get, fetch, complete, copy, local, mirror, back up, save, tip, [...]

Homolibere » Blog Archive » [...]

POSTED ON 27 August 2008 AT 08:20 AM

[...]

touranaga (guest)

POSTED ON 16 September 2008 AT 08:32 AM

where does wget saves mirrors note that i'm new in linux so dont mad at me

Peter (guest)

POSTED ON 16 September 2008 AT 08:34 AM

@touranaga - It should save them in whatever folder you ran wget from. So if you just opened up a terminal and did it straight from there, they should be in your home folder, under a directory of the website address (for example /home/yourname/fosswire.com). If you moved into a different directory with cd, then the mirrors will be placed there.
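To make the location explicit instead of relying on the current directory, wget's -P option sets the directory the mirror is saved under. The sketch below is an illustration only: mirror_dir and mirror_into are made-up names, and mirror_dir assumes wget's default layout of one directory per host name and a URL without an explicit port.

```shell
#!/bin/sh
# mirror_dir PREFIX URL
# Hypothetical helper: print the directory a mirror of URL is expected to
# land in when wget is run with -P PREFIX.
mirror_dir() {
    prefix=$1
    url=$2
    host=${url#*://}     # drop the scheme, e.g. "http://"
    host=${host%%/*}     # drop the path after the host name
    printf '%s/%s\n' "$prefix" "$host"
}

# mirror_into PREFIX URL
# Hypothetical wrapper: run the mirror with an explicit target directory.
mirror_into() {
    wget -m -k -P "$1" "$2"
}

# Example (not run here):
#   mirror_into "$HOME/mirrors" http://fosswire.com/
#   mirror_dir  "$HOME/mirrors" http://fosswire.com/
```

If the host-name directory is not wanted, the wget manual also documents -nH (--no-host-directories), which saves the files directly under the prefix.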

touranaga (guest)

POSTED ON 16 September 2008 AT 08:49 AM

a found them its localy save wget make new folder for each mirror you save, but there is an option to no to make folder, in man wget

touranaga (guest)

POSTED ON 16 September 2008 AT 08:50 AM

Thank you Peter

Ca en Blog » links for 2009-01-17

POSTED ON 17 January 2009 AT 03:04 PM

[...] FOSSwire » Create a mirror of a website with Wget (tags: free tutorial programming web tips howto linux download tools wget mirror website internet utilities shell commandline ubuntu commands backup) [...]

spin norman (guest)

POSTED ON 22 May 2009 AT 02:21 PM

"if your site is dynamic, then you will get files such as something.php?this=that . This won't work unless your mirror server serves php files as HTML ones. So in order to mirror a dynamic website it is necessary to move databases."

Wrong. Wget will convert all the "index.php?x=34" or whatever to HTML files if you use the right options. You get a static snapshot of the site at that moment, as was mentioned, but it works. RTFM.
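The comment does not name the options it has in mind. One that fits the description is -E, which the wget manual describes as appending .html to downloaded HTML pages whose names do not already end that way. A minimal sketch, with snapshot_dynamic as a made-up function name:

```shell
#!/bin/sh
# snapshot_dynamic URL
# Hypothetical wrapper: take a static snapshot of a dynamically generated site.
snapshot_dynamic() {
    # -m  mirror (recursion plus time-stamping)
    # -k  convert links for local viewing
    # -E  append .html to downloaded HTML pages whose names lack it,
    #     so "index.php?x=34" is stored as "index.php?x=34.html"
    # -p  also fetch the images and stylesheets each page needs
    wget -m -k -E -p "$1"
}

# Example (not run here):
#   snapshot_dynamic http://www.example.com/
```

The result is still only the generated pages as they looked at download time; the database behind the site is not copied.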

Tim Jeffries (guest)

POSTED ON 07 June 2009 AT 12:13 PM

Is there any way to use this command from Mac OS X? I'm being told that the command isn't found ... :-(

Peter Upfold

POSTED ON 07 June 2009 AT 12:21 PM

Tim Jeffries said: "Is there any way to use this command from Mac OS X? I'm being told that the command isn't found ... :-("

curl ships with Mac OS X, but wget unfortunately does not. You can either compile it yourself or there is a pre-built version in a zip archive available at the Status-Q blog. In the latter case, you can simply copy the wget binary in that zip archive to /usr/local/bin, or anywhere else in your PATH.
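A quick way to see which of the two tools a machine has is to ask the shell where it finds them. This is only a sketch; pick_downloader is a made-up name, and note that curl fetches individual URLs and has no equivalent of wget's mirror mode.

```shell
#!/bin/sh
# pick_downloader
# Hypothetical helper: print "wget" or "curl", whichever is found first on
# the PATH, or "none" if neither is installed.
pick_downloader() {
    for tool in wget curl; do
        if command -v "$tool" >/dev/null 2>&1; then
            printf '%s\n' "$tool"
            return 0
        fi
    done
    printf 'none\n'
    return 1
}

pick_downloader || echo "install wget or copy its binary into a directory in your PATH"
```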

Tim Jeffries (guest)

POSTED ON 07 June 2009 AT 01:12 PM

Thanks Peter. I realise I'm probably asking super stupid questions. It's been a long time since I've done any serious work at the command line and even when I did it was always wise to get someone to look over my shoulder. I managed to install it and get it to work on OS X. It's such a great tool. I'm wondering if you know why it wouldn't work on a Blogger blog. The command works fine on my work website (http://www.urbanseed.org/) but I can't seem to get it to work with an old blog of mine I'm trying to archive so I can happily remove it. http://www.afootinbothplaces.blogspot.com/ Thanks again. Tim.

Morgel (guest)

POSTED ON 22 June 2009 AT 11:05 AM

I use this method (It is taken from this post http://www.sysadmin.md/how-to-retrieve-entire-site-via-command-line-using-wget.html):

wget -rkpNl5 www.sysadmin.md

-r    Retrieve recursively
-k    Convert the links in the document to make them suitable for local viewing
-p    Download everything (inlined images, sounds, and referenced stylesheets)
-N    Turn on time-stamping
-l5   Specify recursion maximum depth level 5

Morgel (guest)

POSTED ON 22 June 2009 AT 11:06 AM

Strange formatting. Please edit above post :(

StefanLasiewski

POSTED ON 08 September 2009 AT 09:02 PM

Morgel: Some of those options are redundant, and are already included as the --mirror option.
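To spell out the overlap: the wget manual describes --mirror (-m) as currently equivalent to -r -N -l inf --no-remove-listing. The two wrappers below are a sketch with made-up function names, showing Morgel's command next to a --mirror version.

```shell
#!/bin/sh
# Per the wget manual, --mirror (-m) is currently equivalent to
#   -r -N -l inf --no-remove-listing
# so the two hypothetical wrappers below differ mainly in recursion depth
# (and in --no-remove-listing, which only matters for FTP downloads).

# Morgel's command: recursion limited to 5 levels.
mirror_depth5() {
    wget -r -k -p -N -l 5 "$1"
}

# The same with --mirror: -r and -N are implied, depth is unlimited.
mirror_full() {
    wget -m -k -p "$1"
}

# Example (not run here):
#   mirror_depth5 www.sysadmin.md
#   mirror_full   www.sysadmin.md
```

Keeping an explicit -l 5 can still be useful on large sites where an unlimited crawl is not wanted.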

phloating_man (guest)

POSTED ON 03 June 2011 AT 07:40 AM

Thank you! This worked perfectly..

Angel (guest)

POSTED ON 09 July 2011 AT 07:16 PM

Thank you it worked.

© 2006 - 2010 Oratos Media.

FOSSwire is an Oratos Media property. Content is made available under the CC-BY-SA 3.0 license.
