Parsing a POJ page to extract the problem
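The page looks like this: http://poj.org/problem?id=3334
The goal is to extract the problem title, time limit, memory limit, description, input, output, sample input, sample output, hint, source and so on from a page like this, and to collect the images the problem needs.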
#!/usr/bin/env python
# coding=utf-8
# Python 2 script; requires the BeautifulSoup 3 package.
from BeautifulSoup import BeautifulSoup
import urllib
import re


def getpojhtml(pid):
    url = "http://poj.org/problem?id=" + str(pid)
    # Read the page into a string: it is parsed by BeautifulSoup and
    # also searched with a regular expression further down.
    html = urllib.urlopen(url).read()
    soup = BeautifulSoup(html)
    # Skip the first 7 characters of the page title (the problem-id prefix).
    title = soup.title.string[7:]
    # Each value is the node that follows the matching label text.
    time_limit = soup.findAll(text=re.compile("Time Limit"))[0].next
    mem_limit = soup.findAll(text=re.compile("Memory Limit"))[0].next
    description = soup.findAll(text=re.compile("Description"))[0].next.contents
    input_spec = soup.findAll(text=re.compile("Input"))[0].next.contents
    output_spec = soup.findAll(text=re.compile("Output"))[0].next.contents
    sim_input = soup.findAll(text=re.compile("Sample Input"))[0].next.contents
    sim_output = soup.findAll(text=re.compile("Sample Output"))[0].next.contents
    # Some pages have no Hint or Source section; fall back to an empty list.
    try:
        hint = soup.findAll(text=re.compile("Hint"))[0].next.contents
    except (IndexError, AttributeError):
        hint = []
    try:
        source = soup.findAll(text=re.compile("Source"))[0].next.contents
    except (IndexError, AttributeError):
        source = []
    # Problem images live under images/<4-digit id>..., e.g. images/3344_1.GIF.
    pattern = re.compile(r'images/\d{4}[.\w]*')
    pic = pattern.findall(html)
    pic_url = []
    for item in pic:
        pic_url.append('http://poj.org/' + str(item))
    return (title, time_limit, mem_limit, description, input_spec, output_spec,
            sim_input, sim_output, hint, source, pic_url)


if __name__ == '__main__':
    ret = getpojhtml(3344)
    for item in ret:
        print item
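Implementation
The urllib module first fetches the whole page, and BeautifulSoup then parses it. Some pages have no hint or source, so those two lookups are wrapped in try blocks to keep the script from exiting with an error.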
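The handling of optional sections can be sketched without any third-party package. The following is a minimal Python 3 sketch that swaps BeautifulSoup for the standard library's `html.parser`; it is not the script above. It assumes that each section heading is a `<p class="pt">` element followed by the section body, which is an assumption about POJ's markup, and it runs on a made-up fragment rather than a real page. `SectionParser` and `parse_sections` are hypothetical names.

```python
from html.parser import HTMLParser


class SectionParser(HTMLParser):
    """Collect the text that follows each <p class="pt">Heading</p> marker."""

    def __init__(self):
        super().__init__()
        self.sections = {}        # heading text -> accumulated body text
        self._in_heading = False  # True while inside a <p class="pt"> element
        self._current = None      # heading whose body is being collected

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs.
        if tag == 'p' and ('class', 'pt') in attrs:
            self._in_heading = True

    def handle_endtag(self, tag):
        if tag == 'p':
            self._in_heading = False

    def handle_data(self, data):
        if self._in_heading:
            # Text inside a heading starts a new section.
            self._current = data.strip()
            self.sections[self._current] = ''
        elif self._current is not None:
            # Any other text belongs to the most recent heading.
            self.sections[self._current] += data


def parse_sections(html):
    """Return a dict mapping section headings to their text."""
    parser = SectionParser()
    parser.feed(html)
    parser.close()
    return parser.sections


# Hypothetical fragment with no Hint section.
sample = (
    '<p class="pt">Description</p><div class="ptx">Move the beetle.</div>'
    '<p class="pt">Input</p><div class="ptx">Eight lines.</div>'
)

sections = parse_sections(sample)
print(sections.get('Description', ''))
print(repr(sections.get('Hint', '')))  # missing section falls back to ''
```

Here `sections.get('Hint', '')` plays the role of the try blocks: a missing section yields a default value instead of raising an error.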
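The images could also be extracted with BeautifulSoup, but I chose a regular expression instead: the regex pinpoints exactly the images in the problem description, whereas BeautifulSoup returns every image on the page, some of which I don't need.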
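The effect of the pattern can be checked on a small example. This is a Python 3 sketch on a made-up HTML fragment; the fragment and the `problem_image_urls` helper are illustrative assumptions, not POJ's actual markup. The pattern itself is the one used in the script: it only matches paths of the form images/ followed by four digits, so a site-wide image such as a logo is skipped.

```python
import re

# Same pattern as in the script: "images/" followed by a four-digit problem id
# and any trailing word characters or dots (e.g. "images/3344_1.GIF").
PROBLEM_IMAGE = re.compile(r'images/\d{4}[.\w]*')


def problem_image_urls(html, base='http://poj.org/'):
    """Return absolute URLs for every images/<4 digits>... path found in html."""
    return [base + path for path in PROBLEM_IMAGE.findall(html)]


# Hypothetical fragment: one site-wide logo and two problem figures.
sample = (
    '<img src="images/logo.gif" />'
    '<p><img src="images/3344_1.GIF" /></p>'
    '<p><img src="images/3344_2.GIF" /></p>'
)

urls = problem_image_urls(sample)
print(urls)
```

Note that the pattern is applied to the raw page text, so it would also match such a path if it appeared outside an `src` attribute, and it depends on problem ids having four digits.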
Output
 Chessboard Dance
 2000MS
 65536K
[<div><p>Another boring Friday afternoon, Betty the Beetle thinks how to amuse herself. She goes out of her hiding place to take a walk around the living room in Bennett's house. Mr. and Mrs. Bennett are out to the theatre and there is a chessboard on the table! "The best time to practice my chessboard dance," Betty thinks! She gets so excited that she does not note that there are some pieces left on the board and starts the practice session! She has a script showing her how to move on the chessboard. The script is a sequence like the following example:</p><p><center><img src="images/3344_1.GIF" /></center></p><p>At each instant of time Betty, stands on a square of the chessboard, facing one of the four directions (up, down, left, right) when the board is viewed from the above. Performing a "move <i>n</i>" instruction, she moves <i>n</i> squares forward in her current direction. If moving <i>n</i> squares goes outside the board, she stays at the last square on the board and does not go out. There are three types of turns: turn right, turn left, and turn back, which change the direction of Betty. Note that turning does not change the position of Betty.</p><p>If Betty faces a chess piece when moving, she pushes that piece, together with all other pieces behind (a tough beetle she is!). This may cause some pieces fall of the edge of the chessboard, but she doesn't care! For example, in the following figure, the left board shows the initial state and the right board shows the state after performing the script in the above example. Upper-case and lower-case letters indicate the white and black pieces respectively. The arrow shows the position of Betty along with her direction. Note that during the first move, the black king (r) falls off the right edge of the board!</p><p><center><img src="images/3344_2.GIF" /></center></p><p>You are to write a program that reads the initial state of the board as well as the practice dance script, and writes the final state of the board 
after the practice.</p></div>]
[<div><p>There are multiple test cases in the input. Each test case has two parts: the initial state of the board and the script. The board comes in eight lines of eight characters. The letters r, d, t, a, c, p indicate black pieces, R, D, T, A, C, P indicate the white pieces and the period (dot) character indicates an empty square. The square from which Betty starts dancing is specified by one of the four characters <, >, ^, and v which also indicates her initial direction (left, right, up, and down respectively). Note that the input is not necessarily a valid chess game status.</p><p>The script comes immediately after the board. It consists of several lines (between 0 and 1000). In each line, there is one instruction in one of the following formats (<i>n</i> is a non-negative integer number):</p><p>move <i>n</i><br />turn left<br />turn right<br />turn back</p><p>At the end of each test case, there is a line containing a single # character. The last line of the input contains two dash characters.</p></div>]
[<p>The output for each test case should show the state of the board in the same format as the input. Write an empty line in the output after each board.</p>]
[u'.....c..\r\n.p..A..t\r\nD..>T.Pr\r\n....aP.P\r\np.d.C...\r\n.....p.R\r\n........\r\n........\r\nmove 2\r\nturn right\r\nmove 3\r\nturn left\r\nturn left\r\nmove 1\r\n#\r\n--\r\n']
[u'.....c..\r\n.p..A..t\r\nD.....TP\r\n....a..P\r\np.d.C^..\r\n.......R\r\n.....P..\r\n.....p..\r\n']
[]
[<a href="searchproblem?field=source&key=Tehran+2006">Tehran 2006</a>]
['http://poj.org/images/3344_1.GIF', 'http://poj.org/images/3344_2.GIF']
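The blogger ma6174 holds the copyright on the articles of this blog (except reposts); they may not be used commercially without permission. When reposting, please credit the source: http://www.cnblogs.com/ma6174/
Comments or suggestions on this article are welcome in the comments or by email to ma6174@163.com
This article is reposted from ma6174's cnblogs blog; original link: http://www.cnblogs.com/ma6174/archive/2012/08/04/2623159.html. To repost it, please contact the original author.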