c++ curl timeout: writing a web crawler in C/C++, using curl and gumbo together
Yes, you read that right: we're going to write a crawler in C++, or rather in C.
It's really not hard. It isn't as effortless as writing one in Python, but it isn't all that complicated either; plenty of experts have already written the libraries we need, and all we have to do is learn to use them.
Target page: https://acm.sjtu.edu.cn/OnlineJudge/status
We'll crawl everything in the judging status list on that page.
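The overall idea is simple: libcurl downloads the HTML (with a connect timeout and a total transfer timeout, which is the "curl timeout" part of the title), and Gumbo, through a C++ selector wrapper, parses the page so the rows of the status table can be picked out with a CSS-style selector. Just to illustrate the download half on its own, a bare-bones fetch with timeouts looks roughly like the sketch below. It is only an illustration: the helper name FetchPage, the callback name WriteToString and the 10-second defaults are placeholders of mine, not part of the project shared further down.

// Minimal sketch: fetch a URL into a std::string with connect/total timeouts.
// FetchPage and WriteToString are placeholder names, not code from the project.
#include <string>
#include <curl/curl.h>

// libcurl write callback: append each received chunk to the caller's string.
static size_t WriteToString(void* buffer, size_t size, size_t nmemb, void* userp)
{
    static_cast<std::string*>(userp)->append(static_cast<char*>(buffer), size * nmemb);
    return size * nmemb;
}

bool FetchPage(const char* url, std::string& out,
               long connectTimeoutSec = 10, long totalTimeoutSec = 10)
{
    CURL* curl = curl_easy_init();
    if (curl == NULL)
        return false;
    curl_easy_setopt(curl, CURLOPT_URL, url);
    curl_easy_setopt(curl, CURLOPT_WRITEFUNCTION, WriteToString);
    curl_easy_setopt(curl, CURLOPT_WRITEDATA, &out);
    // CURLOPT_CONNECTTIMEOUT caps connection setup only;
    // CURLOPT_TIMEOUT caps the whole transfer.
    curl_easy_setopt(curl, CURLOPT_CONNECTTIMEOUT, connectTimeoutSec);
    curl_easy_setopt(curl, CURLOPT_TIMEOUT, totalTimeoutSec);
    CURLcode res = curl_easy_perform(curl);
    curl_easy_cleanup(curl);
    return res == CURLE_OK;  // res is CURLE_OPERATION_TIMEDOUT when a timeout fires
}

The full program below sets both timeout options from the same timeout parameter; if either limit is hit, curl_easy_perform returns an error instead of hanging forever.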
The complete code is as follows:
#include <iostream>
#include <fstream>
#include <string>                 // assumed third header; std::string is used throughout
#include "gumbo/Document.h"
#include "gumbo/Node.h"
#include "MyStringFormat.h"
#include "curl/curl.h"

using namespace std;

#define URL_REFERER "https://acm.sjtu.edu.cn/OnlineJudge/"

// Parse the downloaded page and print every cell of the #status table.
void printFunc(string page)
{
    CDocument doc;
    doc.parse(page.c_str());
    CSelection c = doc.find("#status tr");
    for (int i = 0; i < c.nodeNum(); i++)
    {
        for (int j = 0; j < c.nodeAt(i).childNum(); j++)
        {
            CNode nd = c.nodeAt(i).childAt(j);
            cout << MyStringFormat::UTF_82ASCII(nd.text()).c_str() << " ";
        }
        cout << endl;
    }
}

// libcurl write callback: append each chunk of the response body to a std::string.
static size_t OnWriteData(void* buffer, size_t size, size_t nmemb, void* lpVoid)
{
    string* str = (string*)lpVoid;
    if (NULL == str || NULL == buffer)
    {
        return -1; // returning anything other than size * nmemb makes libcurl abort the transfer
    }
    char* pData = (char*)buffer;
    str->append(pData, size * nmemb);
    return size * nmemb;
}

// Download `url` into strResponse. `get` selects GET vs POST, `headers` adds one extra
// request header, `postdata` is the POST body, `bReserveHeaders` keeps the response
// headers in the body, and `timeout` is used for both the connect and total timeouts.
bool HttpRequest(const char* url, string& strResponse, bool get/* = true*/,
                 const char* headers/* = NULL*/, const char* postdata/* = NULL*/,
                 bool bReserveHeaders/* = false*/, int timeout/* = 10*/)
{
    CURLcode res;
    CURL* curl = curl_easy_init();
    if (NULL == curl)
    {
        return false;
    }
    curl_easy_setopt(curl, CURLOPT_URL, url);
    // Keep the response headers in the returned body if requested
    if (bReserveHeaders)
        curl_easy_setopt(curl, CURLOPT_HEADER, 1);
    curl_easy_setopt(curl, CURLOPT_COOKIEFILE, "");
    curl_easy_setopt(curl, CURLOPT_READFUNCTION, NULL);
    curl_easy_setopt(curl, CURLOPT_WRITEFUNCTION, OnWriteData);
    curl_easy_setopt(curl, CURLOPT_WRITEDATA, (void *)&strResponse);
    curl_easy_setopt(curl, CURLOPT_NOSIGNAL, 1);
    // Do not verify the certificate or the host
    //curl_easy_setopt(curl, CURLOPT_PROXY, "127.0.0.1:8888"); // set a proxy
    //curl_easy_setopt(curl, CURLOPT_PROXYPORT, 9999);         // proxy server port
    curl_easy_setopt(curl, CURLOPT_SSL_VERIFYPEER, false);
    curl_easy_setopt(curl, CURLOPT_SSL_VERIFYHOST, false);
    // Set the connect timeout and the total transfer timeout (both in seconds)
    curl_easy_setopt(curl, CURLOPT_CONNECTTIMEOUT, timeout);
    curl_easy_setopt(curl, CURLOPT_TIMEOUT, timeout);
    curl_easy_setopt(curl, CURLOPT_REFERER, URL_REFERER);
    curl_easy_setopt(curl, CURLOPT_USERAGENT, "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/71.0.3578.98 Safari/537.36");
    // If no Accept-Encoding is set (or it is set to an empty string), libcurl
    // automatically decompresses compressed responses such as gzip
    //curl_easy_setopt(curl, CURLOPT_ACCEPT_ENCODING, "gzip, deflate, br");
    // Set the Host and Connection: Keep-Alive headers
    struct curl_slist *chunk = NULL;
    chunk = curl_slist_append(chunk, "Host: acm.sjtu.edu.cn");
    chunk = curl_slist_append(chunk, "Connection: Keep-Alive");
    curl_easy_setopt(curl, CURLOPT_HTTPHEADER, chunk);
    // Append a caller-supplied custom header
    if (headers != NULL)
    {
        chunk = curl_slist_append(chunk, headers);
        curl_easy_setopt(curl, CURLOPT_HTTPHEADER, chunk);
    }
    if (!get && postdata != NULL)
    {
        curl_easy_setopt(curl, CURLOPT_POSTFIELDS, postdata);
    }
    res = curl_easy_perform(curl);
    bool bError = false;
    if (res == CURLE_OK)
    {
        long code; // CURLINFO_RESPONSE_CODE expects a long
        res = curl_easy_getinfo(curl, CURLINFO_RESPONSE_CODE, &code);
        if (code != 200 && code != 302)
        {
            bError = true;
        }
    }
    else
    {
        bError = true;
    }
    curl_slist_free_all(chunk); // free the header list built above
    curl_easy_cleanup(curl);
    return !bError;
}

int main(int argc, char * argv[])
{
    string response;
    HttpRequest("https://acm.sjtu.edu.cn/OnlineJudge/status", response, true, NULL, NULL, false, 10);
    printFunc(response);
    system("pause");
    return 0;
}

I know that just pasting the code here won't let you run it, so I'm also sharing the project files. And so that nobody accuses me of fishing for points, everything is posted via a Baidu Cloud link.
Link: https://pan.baidu.com/s/1jBZ-6tT-4ne0uTMw4jFvKA
Extraction code: pmg6
If you like this, feel free to follow my WeChat public account, and follow my CSDN: wu_lian_nan
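One more note on the parsing side. CDocument, CSelection and CNode are not part of Gumbo itself; they come from a C++ selector wrapper around Google's Gumbo parser (the gumbo/Document.h and gumbo/Node.h headers in the includes look like the gumbo-query project), which is what lets us write doc.find("#status tr"). If you would rather depend on Gumbo alone, you can walk the parse tree with its C API directly. The sketch below only illustrates that idea and is not code from the project: PrintCells and PrintStatusTable are placeholder names, and instead of selecting #status tr it simply prints the text of every <td> cell it finds.

// Sketch only: walking the DOM with the raw Gumbo C API (no selector wrapper).
// Assumes gumbo.h is on the include path; PrintCells/PrintStatusTable are placeholders.
#include <iostream>
#include <string>
#include <gumbo.h>

// Recursively print the text content of every <td> element under `node`.
static void PrintCells(const GumboNode* node)
{
    if (node->type != GUMBO_NODE_ELEMENT)
        return;
    if (node->v.element.tag == GUMBO_TAG_TD)
    {
        const GumboVector* kids = &node->v.element.children;
        for (unsigned int i = 0; i < kids->length; i++)
        {
            const GumboNode* kid = (const GumboNode*)kids->data[i];
            if (kid->type == GUMBO_NODE_TEXT)
                std::cout << kid->v.text.text << " ";
        }
        return;
    }
    const GumboVector* children = &node->v.element.children;
    for (unsigned int i = 0; i < children->length; i++)
        PrintCells((const GumboNode*)children->data[i]);
}

void PrintStatusTable(const std::string& html)
{
    GumboOutput* output = gumbo_parse(html.c_str());
    PrintCells(output->root);   // output->root is the <html> element
    std::cout << std::endl;
    gumbo_destroy_output(&kGumboDefaultOptions, output);
}

The wrapper is clearly more convenient, since one CSS-style selector replaces the whole recursive walk, which is why the project uses it.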