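Crawling websites on Linux: mirroring site pages with httrack on Kali
Introduction:
HTTrack is a free, open-source offline website browser. It downloads an entire website into a local directory, including HTML, images, scripts and style sheets, and rewrites the links inside so the copy can be browsed locally.
Official site: http://www.httrack.com/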
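1. Installation
# yum install httrack -y
(On Kali and other Debian-based systems, which use apt rather than yum, the equivalent is: # apt install httrack -y)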
# httrack
2. Usage
2.1) You can crawl directly from the command line.
usage: httrack <URLs> [-option] [+<URL_FILTER>] [-<URL_FILTER>] [+<mime:MIME_FILTER>] [-<mime:MIME_FILTER>]
# httrack "http://www.linuxea.com" -O "/web/www.linuxea.com" "+*.linuxea.com*" -v
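The one-line command above can be wrapped in a small helper so that the URL, the output directory and the domain filter stay consistent. This is only a sketch: the function name mirror_site and the DRY_RUN variable are illustrative and not part of httrack, and the filter is derived by simply dropping the first label of the host name. By default the helper only prints the command it would run.

```shell
# Sketch: build the httrack command from section 2.1 out of two arguments.
# mirror_site and DRY_RUN are illustrative names, not httrack features.
mirror_site() {
    site="$1"       # host name, e.g. www.linuxea.com
    out_dir="$2"    # directory passed to -O

    # Drop the first label so the "+" filter covers the whole domain
    # (www.linuxea.com -> linuxea.com).
    domain="${site#*.}"

    # Same arguments as the one-line command above.
    set -- httrack "http://$site" -O "$out_dir" "+*.$domain*" -v

    if [ "${DRY_RUN:-1}" = "1" ]; then
        # Dry run: print the command without touching the network.
        echo "$*"
    else
        "$@"
    fi
}

mirror_site www.linuxea.com /web/www.linuxea.com
```

To actually start the mirror, call it with DRY_RUN=0, for example: DRY_RUN=0 mirror_site www.linuxea.com /web/www.linuxea.com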
2.2) You can also crawl through the interactive prompt
# httrack
Welcome to HTTrack Website Copier (Offline Browser) 3.48-21
Copyright (C) 1998-2015 Xavier Roche and other contributors
To see the option list, enter a blank line or try httrack --help
Enter project name :linuxea
Base path (return=/root/websites/) :/linuxea
Enter URLs (separated by commas or blank spaces) :www.linuxea.com
Action:
(enter) 1 Mirror Web Site(s)
2 Mirror Web Site(s) with Wizard
3 Just Get Files Indicated
4 Mirror ALL links in URLs (Multiple Mirror)
5 Test Links In URLs (Bookmark Test)
0 Quit
: 2
Proxy (return=none) :
You can define wildcards, like: -*.gif +www.*.com/*.zip -*img_*.zip
Wildcards (return=none) :
You can define additional options, such as recurse level (-r), separed by blank spaces
To see the option list, type help
Additional options (return=none) :
---> Wizard command line: httrack www.linuxea.com -W -O "/linuxea/linuxea" -%v
Ready to launch the mirror? (Y/n) :yes
WARNING! You are running this program as root!
It might be a good idea to run as a different user
Mirror launched on Fri, 07 Oct 2016 03:22:51 by HTTrack Website Copier/3.48-21 [XR&CO'2014]
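The session above warns about running as root. One way to act on that warning is to check the user id before launching the wizard command line that the session printed. The sketch below assumes that approach; is_root and safe_mirror are hypothetical helper names, and the httrack arguments are copied from the "Wizard command line" shown in the transcript.

```shell
# Sketch: refuse to start a mirror as root, following HTTrack's own warning.
# is_root and safe_mirror are illustrative helper names, not part of httrack.
is_root() {
    # Accept a uid as an argument (handy for testing);
    # default to the uid of the current user.
    [ "${1:-$(id -u)}" -eq 0 ]
}

safe_mirror() {
    if is_root; then
        echo "running as root; switch to an unprivileged user first" >&2
        return 1
    fi
    # Wizard command line reported by the interactive session.
    httrack www.linuxea.com -W -O "/linuxea/linuxea" -%v
}

# Usage (as a non-root user):
#   safe_mirror
```

Note that the base path (/linuxea in the session above) has to be writable by whichever unprivileged user runs the mirror.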
3. Checking the result after crawling
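A quick way to inspect a finished mirror is to count the files under the project directory and look at the log that HTTrack writes there (hts-log.txt). The helper below is a minimal sketch of such a check; check_mirror is an illustrative name, and counting lines that contain the word "error" is only a rough heuristic, not an official report format.

```shell
# Sketch: summarise a mirror directory produced by httrack.
# check_mirror is an illustrative helper name, not part of httrack.
check_mirror() {
    dir="$1"
    if [ ! -d "$dir" ]; then
        echo "no such directory: $dir"
        return 1
    fi

    # Count every regular file saved under the project directory.
    count=$(find "$dir" -type f | wc -l)
    echo "files: $((count))"

    # hts-log.txt is the log file HTTrack writes into the project directory.
    if [ -f "$dir/hts-log.txt" ]; then
        echo "log lines mentioning errors: $(grep -ci 'error' "$dir/hts-log.txt")"
    fi
}

# Usage, with the project path from the interactive session:
#   check_mirror /linuxea/linuxea
```

The mirrored pages can then be opened locally by pointing a browser at the index.html inside the project directory.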