HtmlUnit: Setting a Proxy and Parsing IFrame Pages
Collected and compiled by 生活随笔. This article explains how to configure a proxy in HtmlUnit and how to parse the page loaded inside an IFrame. It is shared here for reference.
1. Requirement: access the target site through a proxy, and scrape the content of the page embedded in an iframe on that site.
2. Reference code:
```java
package com;

import java.net.URL;

import com.gargoylesoftware.htmlunit.BrowserVersion;
import com.gargoylesoftware.htmlunit.NicelyResynchronizingAjaxController;
import com.gargoylesoftware.htmlunit.ProxyConfig;
import com.gargoylesoftware.htmlunit.WebClient;
import com.gargoylesoftware.htmlunit.html.DomNodeList;
import com.gargoylesoftware.htmlunit.html.HtmlElement;
import com.gargoylesoftware.htmlunit.html.HtmlPage;

public class EBayHU {

    // Placeholders: replace with your proxy host/port and the item page URL.
    private static final String PROXY_HOST = "IP";
    private static final int PROXY_PORT = 8080; // placeholder port
    private static final String PAGE_URL = "url";

    public static void main(String[] args) {
        // TODO: write the scraped fields into an Excel sheet (not implemented here)
        // Create a WebClient that emulates Chrome; try-with-resources closes it
        try (WebClient webClient = new WebClient(BrowserVersion.CHROME)) {
            // Enable JavaScript
            webClient.getOptions().setJavaScriptEnabled(true);
            // Accept any SSL certificate (skip certificate validation)
            webClient.getOptions().setUseInsecureSSL(true);
            // Disable CSS so stylesheets are not requested again for rendering
            webClient.getOptions().setCssEnabled(false);
            // Do not throw an exception when a page script fails
            webClient.getOptions().setThrowExceptionOnScriptError(false);
            // Resynchronize Ajax calls started by user actions
            webClient.setAjaxController(new NicelyResynchronizingAjaxController());

            // Configure the proxy
            ProxyConfig proxyConfig = webClient.getOptions().getProxyConfig();
            proxyConfig.setProxyHost(PROXY_HOST);
            proxyConfig.setProxyPort(PROXY_PORT);

            // Load the page and give background JavaScript up to 10 seconds
            HtmlPage page = webClient.getPage(PAGE_URL);
            webClient.waitForBackgroundJavaScript(10000);

            // Item title
            HtmlElement itemTitle = page.getHtmlElementById("itemTitle");
            System.out.println(itemTitle.asText());

            // Item images: click each thumbnail, then read the enlarged image URL
            HtmlElement propic = page.getHtmlElementById("vi_main_img_fs");
            DomNodeList<HtmlElement> picnodes = propic.getElementsByTagName("img");
            for (HtmlElement pic : picnodes) {
                page = pic.click();
                webClient.waitForBackgroundJavaScript(10000);
                HtmlElement bigpic = page.getHtmlElementById("icImg");
                System.out.println(bigpic.getAttribute("src"));
            }

            // Seller information
            HtmlElement seller = page.getHtmlElementById("mbgLink");
            System.out.println(seller.getAttribute("href"));
            System.out.println(seller.asText());

            // Item description: switch to the iframe by loading its src as a page
            HtmlElement descifr = page.getHtmlElementById("desc_ifr");
            String src = descifr.getAttribute("src");
            // Resolve relative or protocol-relative src values against the page URL
            URL ifrUrl = page.getFullyQualifiedUrl(src);
            HtmlPage ifrpage = webClient.getPage(ifrUrl);
            webClient.waitForBackgroundJavaScript(10000);
            HtmlElement desc = ifrpage.getHtmlElementById("desc");
            System.out.println(desc.asText());
        } catch (Exception e) {
            System.err.println("Exception: " + e);
        }
    }
}
```
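The iframe step reads the `src` attribute of `desc_ifr` and requests it as a separate page. That value may be relative or protocol-relative (for example `//host/path`), in which case it has to be resolved against the URL of the page containing the iframe before it can be requested. The following JDK-only sketch shows that resolution with `java.net.URI.resolve`; the class name `IframeSrcResolver` and all URLs are hypothetical, and HtmlUnit's own resolution (such as `HtmlPage.getFullyQualifiedUrl`) may differ in edge cases.

```java
import java.net.URI;

// Hypothetical helper: resolves an iframe's src attribute against its page URL.
public class IframeSrcResolver {

    /**
     * Absolute src values come back unchanged; relative ("desc.html", "/desc/1")
     * and protocol-relative ("//host/path") values take the missing scheme,
     * host, or directory from pageUrl.
     */
    public static String resolve(String pageUrl, String src) {
        return URI.create(pageUrl).resolve(src).toString();
    }

    public static void main(String[] args) {
        String page = "https://www.example.com/itm/123"; // hypothetical item page
        System.out.println(resolve(page, "//desc.example.com/ws/item?id=123"));
        // prints https://desc.example.com/ws/item?id=123
        System.out.println(resolve(page, "/desc/123"));
        // prints https://www.example.com/desc/123
        System.out.println(resolve(page, "desc.html"));
        // prints https://www.example.com/itm/desc.html
    }
}
```

A protocol-relative `src` keeps the scheme of the containing page, which is why requesting the raw attribute value directly can fail while the resolved URL loads normally.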