re.search实例

提取csdn帖子地址

foundLastListPageUrl = re.search('<a\s+?href="(?P<lastListPageUrl>/\w+?/article/list/\d+)">尾页</a>', homeRespHtml, re.I)
logging.debug("foundLastListPageUrl=%s", foundLastListPageUrl)

if(foundLastListPageUrl):
    lastListPageUrl = foundLastListPageUrl.group("lastListPageUrl")

详见:

https://github.com/crifan/BlogsToWordpress/blob/master/libs/crifan/blogModules/BlogCsdn.py

从内容中

<a href="/chenglinhust/article/list/22">尾页</a>

提取出

/chenglinhust/article/list/22

提取csdn帖子的标题

foundTitle = re.search('<span class="link_title"><a href="[\w/]+?">\s*(<font color="red">\[置顶\]</font>)?\s*(?P<titleHtml>.+?)\s*</a>\s*</span>', html, re.S)
logging.debug("foundTitle=%s", foundTitle)

if(foundTitle):
    titleHtml = foundTitle.group("titleHtml")
    logging.debug("titleHtml=%s", titleHtml)

详见:

https://github.com/crifan/BlogsToWordpress/blob/master/libs/crifan/blogModules/BlogCsdn.py

从内容中

<span class="link_title"><a href="/v_july_v/article/details/6543438">
<font color="red">[置顶]</font>
程序员面试、算法研究、编程艺术、红黑树4大系列集锦与总结
</a></span>

<span class="link_title"><a href="/chdhust/article/details/7252155">
         windows编程中wParam和lParam消息        

         </a>
         </span>

提取出

程序员面试、算法研究、编程艺术、红黑树4大系列集锦与总结

windows编程中wParam和lParam消息

results matching ""

    No results matching ""