Crawling a Novel with Java
This article, collected by 生活随笔, shows how to crawl a novel with Java.
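This post uses Java to download the text of a novel from the website http://www.shicimingju.com. The program fetches chapters 1 to 120 of Romance of the Three Kingdoms (the sanguoyanyi book on that site) and appends them to a local text file.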
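Before the walkthrough, here is a minimal sketch of fetching a single chapter page. It assumes Java 11+ and swaps the program's URL.openStream() for the java.net.http.HttpClient API; the class name ChapterFetcher is hypothetical. The URL pattern is copied from the program in this post, and the live site may no longer serve pages at that address.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper: fetches one chapter page with java.net.http (Java 11+).
class ChapterFetcher {
    // URL pattern copied from the program in this post; the live site may differ today.
    static final String BASE = "http://www.shicimingju.com/book/sanguoyanyi/";

    /** Builds the page address for chapter n, e.g. .../sanguoyanyi/1.html. */
    static URI chapterUri(int n) {
        return URI.create(BASE + n + ".html");
    }

    /** Downloads chapter n, decodes the body as UTF-8 and splits it into lines. */
    static List<String> fetchLines(HttpClient client, int n)
            throws IOException, InterruptedException {
        HttpRequest request = HttpRequest.newBuilder(chapterUri(n)).GET().build();
        HttpResponse<String> response =
                client.send(request, HttpResponse.BodyHandlers.ofString(StandardCharsets.UTF_8));
        if (response.statusCode() != 200) {
            // Treat anything other than 200 OK as a failed download.
            throw new IOException("HTTP " + response.statusCode() + " for " + request.uri());
        }
        return response.body().lines().collect(Collectors.toList());
    }

    public static void main(String[] args) throws Exception {
        // Follow redirects, in case the site now redirects http:// to https://.
        HttpClient client = HttpClient.newBuilder()
                .followRedirects(HttpClient.Redirect.NORMAL)
                .build();
        List<String> lines = fetchLines(client, 1);
        System.out.println("Fetched " + lines.size() + " lines from " + chapterUri(1));
    }
}
```

Returning a list of lines keeps the same line-by-line shape that the program's BufferedReader loop works with, so the regex matching can stay unchanged.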
Code walkthrough
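1. Create the storage location on the local disk
2. Write the regular expressions for the title and the paragraphs
3. Fetch each chapter page in a loop
4. Write the extracted content to the output file
5. Report whether each chapter succeeded or failed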
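Steps 2 and 3 boil down to running two regular expressions over each line of HTML. The sketch below (hypothetical class ChapterParser, made-up sample HTML) reads the text from capture group 1 instead of stripping the tags by hand, and uses a slightly stricter paragraph pattern that accepts `<p>` with attributes but not tags such as `<pre>`. It assumes, like the original program, that each `<p>` element starts and ends on the same line.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch of steps 2 and 3: pull the title and the paragraph text out of HTML lines.
class ChapterParser {
    // Group 1 holds the text between the tags, so no replace("<title>", "") is needed.
    static final Pattern TITLE = Pattern.compile("<title>(.*?)</title>");
    // Matches <p> or <p ...attributes...>, but not <pre> or <param>.
    static final Pattern PARAGRAPH = Pattern.compile("<p(?:\\s[^>]*)?>(.*?)</p>");

    /** Returns the first title found, or null if no line contains a <title> element. */
    static String title(List<String> lines) {
        for (String line : lines) {
            Matcher m = TITLE.matcher(line);
            if (m.find()) {
                return m.group(1);
            }
        }
        return null;
    }

    /** Returns the trimmed text of every <p> element, in document order. */
    static List<String> paragraphs(List<String> lines) {
        List<String> result = new ArrayList<>();
        for (String line : lines) {
            Matcher m = PARAGRAPH.matcher(line);
            while (m.find()) { // a single line may contain several paragraphs
                result.add(m.group(1).trim());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> html = List.of(
                "<title>Chapter 1</title>",
                "<div><p class=\"text\">First paragraph.</p><p>Second paragraph.</p></div>");
        System.out.println(title(html));      // prints Chapter 1
        System.out.println(paragraphs(html)); // prints [First paragraph., Second paragraph.]
    }
}
```

Using group(1) matters because the original `.replace("<p>", "")` leaves the opening tag in place whenever the `<p>` element carries attributes.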
Full code
package text;

import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.File;
import java.io.FileOutputStream;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.net.URI;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class text {
    public static void main(String[] args) {
        // Step 1: output file; create the D:\Text directory if it does not exist yet
        File file = new File("D:\\Text\\text.txt");
        file.getParentFile().mkdirs();

        // Step 2: regular expressions; group 1 captures the text between the tags
        Pattern pTitle = Pattern.compile("<title>(.*?)</title>");
        Pattern pContent = Pattern.compile("<p(?:\\s[^>]*)?>(.*?)</p>");

        // Step 3: loop over chapters 1 to 120
        for (int i = 1; i <= 120; i++) {
            System.out.println("Downloading chapter " + i + "...");
            try {
                URL url = URI.create("http://www.shicimingju.com/book/sanguoyanyi/" + i + ".html").toURL();
                // try-with-resources closes both streams even if an exception is thrown
                try (BufferedReader reader = new BufferedReader(
                             new InputStreamReader(url.openStream(), StandardCharsets.UTF_8));
                     BufferedWriter writer = new BufferedWriter(new OutputStreamWriter(
                             new FileOutputStream(file, true), StandardCharsets.UTF_8))) {
                    String line;
                    while ((line = reader.readLine()) != null) {
                        Matcher mTitle = pTitle.matcher(line);
                        if (mTitle.find()) {
                            String title = mTitle.group(1);
                            System.out.println(title);
                            // Step 4: write the chapter heading ("Chapter i: title") to the file
                            writer.write("第" + i + "章:" + title);
                            writer.newLine();
                        }
                        Matcher mContent = pContent.matcher(line);
                        while (mContent.find()) {
                            // Remove spaces and "?" characters, as the original post did
                            String content = mContent.group(1).replace(" ", "").replace("?", "");
                            writer.write(content);
                            writer.newLine();
                        }
                    }
                    // Blank lines between chapters
                    writer.newLine();
                    writer.newLine();
                }
                // Step 5: report success only after the chapter has been written and closed
                System.out.println("Chapter " + i + " downloaded.");
            } catch (Exception e) {
                System.out.println("Sorry, downloading chapter " + i + " failed!");
                e.printStackTrace();
            }
        }
    }
}
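Steps 1, 4 and 5 can also be written with java.nio.file, which creates missing directories and handles the charset in one place. The sketch below is an alternative under stated assumptions, not the author's code: the class ChapterWriter and its method appendChapter are hypothetical, and the heading is written as "Chapter n: title" in ASCII rather than the program's "第n章:" so the example source stays encoding-neutral.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.List;

// Sketch of steps 1, 4 and 5 using java.nio.file instead of FileOutputStream.
class ChapterWriter {
    /**
     * Appends one chapter to the output file, creating the parent directory first.
     * Returns true on success and false if any I/O error occurred.
     */
    static boolean appendChapter(Path file, int number, String title, List<String> paragraphs) {
        try {
            Path parent = file.toAbsolutePath().getParent();
            if (parent != null) {
                Files.createDirectories(parent); // no-op if the directory already exists
            }
            // CREATE + APPEND: start a new file or add to the end of an existing one.
            try (BufferedWriter writer = Files.newBufferedWriter(file, StandardCharsets.UTF_8,
                    StandardOpenOption.CREATE, StandardOpenOption.APPEND)) {
                writer.write("Chapter " + number + ": " + title);
                writer.newLine();
                for (String p : paragraphs) {
                    writer.write(p);
                    writer.newLine();
                }
                writer.newLine(); // blank line between chapters
            }
            return true;
        } catch (IOException e) {
            System.err.println("Failed to save chapter " + number + ": " + e.getMessage());
            return false;
        }
    }

    public static void main(String[] args) throws IOException {
        Path out = Files.createTempDirectory("novel").resolve("books").resolve("text.txt");
        boolean ok = appendChapter(out, 1, "Sample title", List.of("First.", "Second."));
        System.out.println(ok ? "Saved to " + out : "Download failed");
    }
}
```

Returning a boolean keeps the success/failure report of step 5 with the caller, which can print a message per chapter as the original loop does.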