BLU Discuss list archive

[Discuss] What's the best site-crawler utility?

Subject: [Discuss] What's the best site-crawler utility?
From: me at mattgillen.net (Matthew Gillen)
Date: Tue, 07 Jan 2014 20:12:25 -0500
In-reply-to: <52CC9B92.4030104@mattgillen.net>
References: <52CC9277.4010309@horne.net> <52CC9B92.4030104@mattgillen.net>

On 1/7/2014 7:28 PM, Matthew Gillen wrote:
> On 1/7/2014 6:49 PM, Bill Horne wrote:
>> I need to copy the contents of a wiki into static pages, so please
>> recommend a good web-crawler that can download an existing site into
>> static content pages. It needs to run on Debian 6.0.
> 
>   wget -k -m -np http://mysite
> 
> is what I used to use.  -k converts links to point to the local copy of
> the page, -m turns on the options for recursive mirroring, and -np
> ensures that only URLs "below" the initial one are downloaded.  (The
> recursive option by itself is pretty dangerous, since most sites have a
> banner or something similar that links to a top-level page, which then
> pulls in the whole rest of the site.)

Now that I've read more of the other thread you posted before asking this
question: depending on your intentions, you might actually want to skip
'-k'.  I used -k because I was taking a wiki offline and didn't want to
have to figure out how to set up TWiki again two years later when I
needed to look something up in the old wiki.  So I wanted a raw HTML
version for archival purposes, suitable for browsing from a local
filesystem with just a browser.  '-k' is awesome for that.

However, it may not produce what you want if you intend to actually
replace the old site and serve the copy through a web server.
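
To make the two cases concrete, here is a rough sketch, not a tested
recipe.  The function name mirror_cmd and its "offline"/"serve" modes are
made up for illustration; only the wget flags come from the command
quoted above.  Dropping -k for the served copy is an assumption based on
the caveat above (-k rewrites links to suit local viewing), so check the
result against your own site before relying on it.

```sh
#!/bin/sh
# Hypothetical helper: prints the wget command line for one of two goals.
#   offline - browse the copy straight from the local filesystem (keeps -k)
#   serve   - re-host the copy behind a web server (drops -k; an assumption)
mirror_cmd() {
    mode=$1
    url=$2
    case "$mode" in
        offline) echo "wget -k -m -np $url" ;;
        serve)   echo "wget -m -np $url" ;;
        *)       echo "usage: mirror_cmd offline|serve URL" >&2
                 return 1 ;;
    esac
}

# Print the commands instead of running them, so nothing is fetched
# by accident.  http://mysite is the placeholder from the quoted message.
mirror_cmd offline http://mysite
mirror_cmd serve http://mysite
```

Once the printed command looks right, run it by hand from an empty
directory; in both modes -np keeps wget from climbing above the starting
URL.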

Matt
