
The "Ant" of the Linux Environment: A Brief Introduction to Using wget

wget is invoked as follows:
wget [options] URL
First, here are wget's main options:
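・ -b: run wget in the background; the log is written to the file "wget-log"
in the current directory;
・ -t [number of times]: number of tries, that is, how many times wget attempts
to connect when it cannot establish a connection to the server. For example,
"-t 120" means 120 tries. A value of "0" means retrying without limit until
the connection succeeds. This setting is very useful: if the remote server
suddenly shuts down or the network drops, wget can continue downloading the
unfinished file once things are back to normal;
・ -c: resume an interrupted download. This is also a very useful setting,
especially for large files: if the transfer is interrupted, wget picks up
from where it stopped once the connection is restored instead of starting
over. The remote server must support resuming as well; generally speaking,
UNIX/Linux-based Web/FTP servers do;
・ -T [number of seconds]: timeout, that is, how long the remote server may stay
silent before wget drops the connection and starts the next try. For example,
"-T 120" means that if the remote server has sent no data after 120 seconds,
wget tries to connect again. On a fast network this value can be shorter; on
a slow one it can be longer. It is generally no more than 900 and usually no
less than 60; about 120 is a reasonable setting;
・ -w [number of seconds]: how many seconds to wait between successive
retrievals; for example, "-w 100" waits 100 seconds between them;
・ -Y on/off: connect through (on) or not through (off) a proxy server;
・ -Q [bytes]: limit the total size of the downloaded files. For example,
"-Q2k" means no more than 2 KB and "-Q3m" means no more than 3 MB. A number
with no suffix is taken as bytes, so "-Q200" means no more than 200 bytes;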
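・ -nd: do not create a directory structure; every file downloaded from the
specified server directories is placed in the current directory;
・ -x: the opposite of "-nd": create the full directory structure. For example,
"wget -x http://www.gnu.org" creates a "www.gnu.org" subdirectory under the
current directory and then rebuilds the server's actual directory structure
level by level until all files have been transferred;
・ -nH: do not create a directory named after the target host; the target
host's directory structure is placed directly under the current directory;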
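・ --http-user=username
・ --http-passwd=password: if the Web server requires a user name and password,
set them with these two options;
・ --proxy-user=username
・ --proxy-passwd=password: if the proxy server requires a user name and
password, set them with these two options;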

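・ -r: retrieve recursively, recreating the server's directory structure on the
local machine;
・ -l [depth]: how deep into the remote directory structure to download; for
example, "-l 5" downloads directories and files at depth 5 or less;
・ -m: the option for mirroring a site. If you want to mirror a site, use this
option; it automatically sets the other options appropriate for mirroring;
・ -np: download only the specified directory of the target site and its
subdirectories. This is also a very useful option. Suppose someone's personal
home page contains a link to another person's home page on the same site, and
we only want the first person's pages. Without this option wget might even
fetch the entire site, which is usually not what we want;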
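As a rough sketch of how the options above combine, the following shell fragment assembles two command lines and prints them instead of running them, so it works without network access. The URLs under example.com are placeholders assumed for illustration; they do not come from the article.

```shell
#!/bin/sh
# Placeholder URLs, assumed for illustration only.
FILE_URL="http://example.com/pub/big-file.iso"
SITE_URL="http://example.com/~alice/"

# Resumable download of one large file:
#   -c      resume a partially downloaded file
#   -t 0    retry without limit
#   -T 120  treat 120 s of silence from the server as a timeout
RESUME_CMD="wget -c -t 0 -T 120 $FILE_URL"

# Recursive download confined to one directory tree:
#   -r      recursive retrieval
#   -l 5    descend at most 5 levels
#   -np     never ascend to the parent directory
#   -nH     do not create a host-named top-level directory
#   -w 2    wait 2 s between retrievals
TREE_CMD="wget -r -l 5 -np -nH -w 2 $SITE_URL"

# Print the commands rather than executing them.
echo "$RESUME_CMD"
echo "$TREE_CMD"
```

To run either one for real, paste the printed line into a shell with a URL of your own.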
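How to set the proxy server used by wget
wget can read many settings from the user configuration file ".wgetrc"; here
we mainly use this file to set the proxy server. The ".wgetrc" file that takes
effect is the one in the home directory of the user who is logged in. For
example, if the "root" user wants to set the proxy server through ".wgetrc",
the file that takes effect is "/root/.wgetrc". Below is the content of a sample
".wgetrc" file; readers can use it as a model for writing their own: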
http_proxy = 111.111.111.111:8080
ftp_proxy = 111.111.111.111:8080
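These two lines mean that the proxy server's IP address is 111.111.111.111 and
its port number is 8080. The first line specifies the proxy server used for
HTTP, and the second line specifies the proxy server used for FTP.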
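A minimal sketch of creating such a file from the shell is shown here. It writes to a scratch file made with mktemp rather than to the real ~/.wgetrc, and it gives the proxy in URL form (with "http://"), as Wget's sample startup file does; treating the article's bare host:port value that way is an assumption.

```shell
#!/bin/sh
# Scratch file so the real ~/.wgetrc is left untouched.
RC_FILE="$(mktemp)"

cat > "$RC_FILE" <<'EOF'
# Proxy for HTTP requests
http_proxy = http://111.111.111.111:8080/
# Proxy for FTP requests
ftp_proxy = http://111.111.111.111:8080/
EOF

# Show the result; copy it to ~/.wgetrc once it looks right.
cat "$RC_FILE"
```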

wgetrc initialization file commands

Startup File





Once you know how to change default settings of Wget through command
line arguments, you may wish to make some of those settings permanent.
You can do that in a convenient way by creating the Wget startup
file—`.wgetrc'.

Although `.wgetrc' is the "main" initialization file, it is
convenient to have a special facility for storing passwords. Thus Wget
reads and interprets the contents of `$HOME/.netrc', if it finds
it. You can find the `.netrc' format in your system manuals.

Wget reads `.wgetrc' upon startup, recognizing a limited set of
commands.

Wgetrc Location


When initializing, Wget will look for a global startup file,
`/usr/local/etc/wgetrc' by default (or some prefix other than
`/usr/local', if Wget was not installed there) and read commands
from there, if it exists.

Then it will look for the user’s file. If the environmental variable
WGETRC is set, Wget will try to load that file. Failing that, no
further attempts will be made.

If WGETRC is not set, Wget will try to load `$HOME/.wgetrc'.

The fact that user’s settings are loaded after the system-wide ones
means that in case of collision user’s wgetrc overrides the
system-wide wgetrc (in `/usr/local/etc/wgetrc' by default).
Fascist admins, away!
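The lookup rule for the user's file can be sketched as a small shell function. The function name user_wgetrc is made up for this illustration; it only prints the path Wget would try and does not read any file.

```shell
#!/bin/sh
# Print the user startup file Wget would try, following the rules above:
# the file named by WGETRC if that variable is set, otherwise $HOME/.wgetrc.
user_wgetrc() {
    if [ -n "${WGETRC+set}" ]; then
        printf '%s\n' "$WGETRC"
    else
        printf '%s\n' "$HOME/.wgetrc"
    fi
}

user_wgetrc
```

Because WGETRC replaces the default rather than adding to it, a missing file named by WGETRC means no user file is loaded at all.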

Wgetrc Syntax


The syntax of a wgetrc command is simple:

variable = value

The variable will also be called command. Valid
values are different for different commands.

The commands are case-insensitive and underscore-insensitive. Thus
`DIr__PrefiX' is the same as `dirprefix'. Empty lines, lines
beginning with `#' and lines containing white-space only are
discarded.

Commands that expect a comma-separated list will clear the list on an
empty command. So, if you wish to reset the rejection list specified in
global `wgetrc', you can do it with:

reject =

Wgetrc Commands

The complete set of commands is listed below, the letter after `='
denoting the value the command takes. It is `on/off' for `on'
or `off' (which can also be `1' or `0'), string for
any non-empty string or N for a positive integer. For example,
you may specify `use_proxy = off' to disable use of PROXY
servers by default. You may use `inf' for infinite values, where
appropriate.

Most of the commands have their equivalent command-line option
(See section Invoking), except some more obscure or rarely used ones.

accept/reject = string
     Same as `-A'/`-R' (See section Types of Files).

add_hostdir = on/off
     Enable/disable host-prefixed file names. `-nH' disables it.

always_rest = on/off
     Enable/disable continuation of the retrieval, the same as `-c'.

base = string
     Set base for relative URLs, the same as `-B'.

convert_links = on/off
     Convert non-relative links locally. The same as `-k'.

debug = on/off
     Debug mode, same as `-d'.

delete_after = on/off
     Delete after download, the same as `--delete-after'.

dir_mode = N
     Set permission modes of created subdirectories (default is 0755).

dir_prefix = string
     Top of directory tree, the same as `-P'.

dirstruct = on/off
     Turning dirstruct on or off, the same as `-x' or `-nd',
     respectively.

domains = string
     Same as `-D' (See section Domain Acceptance).

dot_bytes = N
     Specify the number of bytes "contained" in a dot, as seen throughout
     the retrieval (1024 by default). You can postfix the value with
     `k' or `m', representing kilobytes and megabytes,
     respectively. With dot settings you can tailor the dot retrieval to
     suit your needs, or you can use the predefined styles
     (See section Advanced Options).

dots_in_line = N
     Specify the number of dots that will be printed in each line throughout
     the retrieval (50 by default).

dot_spacing = N
     Specify the number of dots in a single cluster (10 by default).

dot_style = string
     Specify the dot retrieval style, as with `--dot-style'.

exclude_directories = string
     Specify a comma-separated list of directories you wish to exclude from
     download, the same as `-X' (See section Directory-Based Limits).

exclude_domains = string
     Same as `--exclude-domains' (See section Domain Acceptance).

follow_ftp = on/off
     Follow FTP links from HTML documents, the same as `-f'.

force_html = on/off
     If set to on, force the input filename to be regarded as an HTML
     document, the same as `-F'.

ftp_proxy = string
     Use string as FTP proxy, instead of the one specified in
     environment.

glob = on/off
     Turn globbing on/off, the same as `-g'.

header = string
     Define an additional header, like `--header'.

http_passwd = string
     Set HTTP password.

http_proxy = string
     Use string as HTTP proxy, instead of the one specified in
     environment.

http_user = string
     Set HTTP user to string.

ignore_length = on/off
     When set to on, ignore Content-Length header; the same as
     `--ignore-length'.

include_directories = string
     Specify a comma-separated list of directories you wish to follow when
     downloading, the same as `-I'.

input = string
     Read the URLs from string, like `-i'.

kill_longer = on/off
     Consider data longer than specified in content-length header
     as invalid (and retry getting it). The default behaviour is to save
     as much data as there is, provided there is more than or equal
     to the value in Content-Length.

logfile = string
     Set logfile, the same as `-o'.

login = string
     Your user name on the remote machine, for FTP. Defaults to
     `anonymous'.

mirror = on/off
     Turn mirroring on/off. The same as `-m'.

noclobber = on/off
     Same as `-nc'.

no_proxy = string
     Use string as the comma-separated list of domains to avoid in
     PROXY loading, instead of the one specified in environment.

no_parent = on/off
     Disallow retrieving outside the directory hierarchy, like
     `--no-parent' (See section Directory-Based Limits).

output_document = string
     Set the output filename, the same as `-O'.

passive_ftp = on/off
     Set passive FTP, the same as `--passive-ftp'.

passwd = string
     Set your FTP password to password. Without this setting, the
     password defaults to `username@hostname.domainname'.

proxy_user = string
     Set PROXY authentication user name to string, like
     `--proxy-user'.

proxy_passwd = string
     Set PROXY authentication password to string, like
     `--proxy-passwd'.

quiet = on/off
     Quiet mode, the same as `-q'.

quota = quota
     Specify the download quota, which is useful to put in global
     wgetrc. When download quota is specified, Wget will stop retrieving
     after the download sum has become greater than quota. The quota can be
     specified in bytes (default), kbytes (`k' appended) or mbytes
     (`m' appended). Thus `quota = 5m' will set the quota to 5
     mbytes. Note that the user's startup file overrides system settings.

reclevel = N
     Recursion level, the same as `-l'.

recursive = on/off
     Recursive on/off, the same as `-r'.

relative_only = on/off
     Follow only relative links, the same as `-L' (See section Relative Links).

remove_listing = on/off
     If set to on, remove FTP listings downloaded by Wget. Setting it
     to off is the same as `-nr'.

retr_symlinks = on/off
     When set to on, retrieve symbolic links as if they were plain files; the
     same as `--retr-symlinks'.

robots = on/off
     Use (or not) `/robots.txt' file (See section Robots). Be sure to know
     what you are doing before changing the default (which is `on').

server_response = on/off
     Choose whether or not to print the HTTP and FTP server
     responses, the same as `-S'.

simple_host_check = on/off
     Same as `-nh' (See section Host Checking).

span_hosts = on/off
     Same as `-H'.

timeout = N
     Set timeout value, the same as `-T'.

timestamping = on/off
     Turn timestamping on/off. The same as `-N' (See section Time-Stamping).

tries = N
     Set number of retries per URL, the same as `-t'.

use_proxy = on/off
     Turn PROXY support on/off. The same as `-Y'.

verbose = on/off
     Turn verbose on/off, the same as `-v'/`-nv'.

wait = N
     Wait N seconds between retrievals, the same as `-w'.
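To show how a few of these commands fit together, here is a hedged sketch that writes a small startup file to a scratch location and counts its active lines. The values are arbitrary choices made for illustration, not recommendations from the manual.

```shell
#!/bin/sh
# Scratch file so the real ~/.wgetrc is left untouched.
RC_FILE="$(mktemp)"

cat > "$RC_FILE" <<'EOF'
# Retry each URL up to 5 times, timing out after 120 seconds
tries = 5
timeout = 120
# Pause 2 seconds between retrievals
wait = 2
# Recursive downloads go at most 3 levels deep and never climb upward
reclevel = 3
no_parent = on
EOF

# Count the active commands (lines that are neither comments nor blank).
ACTIVE=$(grep -c -v -e '^#' -e '^[[:space:]]*$' "$RC_FILE")
echo "$ACTIVE active commands written to $RC_FILE"
```

Setting the WGETRC environment variable to the path of such a file makes Wget load it in place of `$HOME/.wgetrc', following the lookup rules in the Wgetrc Location section.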

Sample Wgetrc

This is the sample initialization file, as given in the distribution.
It is divided into two sections: one for global usage (suitable for the
global startup file), and one for local usage (suitable for
`$HOME/.wgetrc'). Be careful about the things you change.

Note that all the lines are commented out. For any line to have effect,
you must remove the `#' prefix at the beginning of the line.

###
### Sample Wget initialization file .wgetrc
###

## You can use this file to change the default behaviour of wget or to
## avoid having to type many many command-line options. This file does
## not contain a comprehensive list of commands -- look at the manual
## to find out what you can put into this file.
##
## Wget initialization file can reside in /usr/local/etc/wgetrc
## (global, for all users) or $HOME/.wgetrc (for a single user).
##
## To use any of the settings in this file, you will have to uncomment
## them (and probably change them).

##
## Global settings (useful for setting up in /usr/local/etc/wgetrc).
## Think well before you change them, since they may reduce wget's
## functionality, and make it behave contrary to the documentation:
##

# You can set retrieve quota for beginners by specifying a value
# optionally followed by 'K' (kilobytes) or 'M' (megabytes). The
# default quota is unlimited.
#quota = inf

# You can lower (or raise) the default number of retries when
# downloading a file (default is 20).
#tries = 20

# Lowering the maximum depth of the recursive retrieval is handy to
# prevent newbies from going too "deep" when they unwittingly start
# the recursive retrieval. The default is 5.
#reclevel = 5

# Many sites are behind firewalls that do not allow initiation of
# connections from the outside. On these sites you have to use the
# `passive' feature of FTP. If you are behind such a firewall, you
# can turn this on to make Wget use passive FTP by default.
#passive_ftp = off

##
## Local settings (for a user to set in his $HOME/.wgetrc). It is
## *highly* undesirable to put these settings in the global file, since
## they are potentially dangerous to "normal" users.
##
## Even when setting up your own ~/.wgetrc, you should know what you
## are doing before doing so.
##

# Set this to on to use timestamping by default:
#timestamping = off

# It is a good idea to make Wget send your email address in a `From:'
# header with your request (so that server administrators can contact
# you in case of errors). Wget does *not* send `From:' by default.
#header = From: Your Name <username@site.domain>

# You can set up other headers, like Accept-Language. Accept-Language
# is *not* sent by default.
#header = Accept-Language: en

# You can set the default proxy for Wget to use. It will override the
# value in the environment.
#http_proxy = http://proxy.yoyodyne.com:18023/

# If you do not want to use proxy at all, set this to off.
#use_proxy = on

# You can customize the retrieval outlook. Valid options are default,
# binary, mega and micro.
#dot_style = default

# Setting this to off makes Wget not download /robots.txt. Be sure to
# know *exactly* what /robots.txt is and how it is used before changing
# the default!
#robots = on

# It can be useful to make Wget wait between connections. Set this to
# the number of seconds you want Wget to wait.
#wait = 0

# You can force creating directory structure, even if a single file is
# being retrieved, by setting this to on.
#dirstruct = off

# You can turn on recursive retrieving by default (don't do this if
# you are not sure you know what it means) by setting this to on.
#recursive = off

# To have Wget follow FTP links from HTML files by default, set this
# to on:
#follow_ftp = off
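Activating a setting from the sample only requires deleting its leading `#'. One possible way to do that from the shell is sketched here with sed on two lines copied from the sample; the variable names are invented for the example.

```shell
#!/bin/sh
# Two lines copied from the sample file above.
SAMPLE='#tries = 20
#reclevel = 5'

# Strip the leading "#" from the tries line only; other lines pass through.
RESULT=$(printf '%s\n' "$SAMPLE" | sed 's/^#\(tries = \)/\1/')
printf '%s\n' "$RESULT"
```

Matching on the command name keeps the explanatory comment lines, which also start with `#', from being uncommented by accident.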
