XPath in Practice: A Handy Tool for Web Scraping

Article link: https://blog.csdn.net/csdn_leidada/article/details/121694384

xpath

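. denotes the current node; // selects descendants at any depth.
Note: use the two together (.//) to search only below the current node; a bare // at the start of an expression searches the whole document.
/@ reads an attribute value of a tag: /@attribute-name
Get the following (younger) sibling nodes: following-sibling::
Get the preceding (older) sibling nodes: preceding-sibling::
Get the parent node: parent::
Exclude tags whose class attribute contains a given value: [not(contains(@class, "disabled"))]
Exclude a particular tag: [name(.)!="a"]
Skip the first and the last element: [position() > 1] and [position() < last()]
Find text that child tags split into several text nodes: [text()=" 楼" and text()="栋" and text()="数： "]
Exclude tags that have a class attribute: //tbody/tr[not(@class)]
Strip leading and trailing whitespace and line breaks (inner runs collapse to one space): normalize-space()
Fuzzy-match elements whose text contains the character "层": //div[@class="tr-line clearfix"]/div/div[contains(text(),"层")]/text()
Match a tag on several attribute conditions at once: and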
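
The notes above are easier to remember once you run them against a small page. Below is a minimal sketch in Python, assuming the third-party lxml package is installed; the pager markup, the URLs and the variable names are all made up for illustration and are not from any real site.

```python
from lxml import etree  # third-party package, assumed installed (pip install lxml)

# Hypothetical pager markup, invented for this example.
PAGE = """
<ul id="pager">
  <li class="prev disabled"><a href="/p/0">Prev</a></li>
  <li class="num"><a href="/p/1">1</a></li>
  <li class="num current"><span>2</span></li>
  <li class="num"><a href="/p/3">3</a></li>
  <li class="next"><a href="/p/4">
      Next   page
  </a></li>
</ul>
"""

doc = etree.HTML(PAGE)
pager = doc.xpath('//ul[@id="pager"]')[0]

# .// searches only below the current node; /@href reads the attribute value.
hrefs = pager.xpath('.//a/@href')

# Sibling and parent axes, starting from the "current" item.
current = pager.xpath('./li[contains(@class, "current")]')[0]
after = current.xpath('following-sibling::li/a/@href')
before = current.xpath('preceding-sibling::li/a/@href')
parent_id = current.xpath('parent::ul/@id')

# Exclude items whose class contains "disabled".
enabled = pager.xpath('./li[not(contains(@class, "disabled"))]/a/@href')

# Exclude a tag by name: children of li that are not <a>.
not_links = pager.xpath('./li/*[name(.)!="a"]/text()')

# Skip the first and the last li.
middle = pager.xpath('./li[position() > 1 and position() < last()]/@class')

# normalize-space() trims both ends and collapses inner whitespace runs.
label = pager.xpath('normalize-space(./li[last()]/a)')

# contains(text(), ...) is a substring test on the element's text.
fuzzy = pager.xpath('.//a[contains(text(), "Next")]/@href')

print(hrefs, after, before, parent_id, enabled, not_links, middle, label, fuzzy)
```

Every query here starts from pager rather than from doc, which is why the expressions begin with ./ or .// instead of a bare //.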

Match every tr whose class is not tbhead and whose class is head:
xpath('//table/tr[not(@class="tbhead") and @class="head"]')
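
To see what the and predicate does, here is a small sketch in Python with lxml (assumed installed). The table and its class names are hypothetical. It also shows a looser contains() variant, which is my own addition rather than the original expression: with exact comparison @class="head" already rules out tbhead, but with contains() the not(...) part becomes necessary because the string "tbhead" itself contains "head".

```python
from lxml import etree  # third-party package, assumed installed

# Hypothetical table, invented for this example.
TABLE = """
<table>
  <tr class="tbhead"><td>Title</td></tr>
  <tr class="head"><td>Name</td></tr>
  <tr class="row head"><td>Alias</td></tr>
  <tr><td>Alice</td></tr>
</table>
"""

doc = etree.HTML(TABLE)

# Exact comparison on the whole class attribute, as in the expression above.
exact = doc.xpath(
    '//table/tr[not(@class="tbhead") and @class="head"]/td/text()'
)

# Substring variant (an assumption, not from the original note):
# "tbhead" contains "head", so it has to be excluded explicitly.
loose = doc.xpath(
    '//table/tr[not(contains(@class, "tbhead")) and contains(@class, "head")]'
    '/td/text()'
)

# Rows without any class attribute at all.
no_class = doc.xpath('//table/tr[not(@class)]/td/text()')

print(exact, loose, no_class)
```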

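Get the last a element: a[last()]
Several expressions can be written in one XPath, separated by |; each expression is evaluated independently and the results are combined.
Example: xpath("//tr[6]/td[2]/text() | //tr[7]/td[2]/text()")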
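
A quick sketch of last() and the | union, again in Python with lxml (assumed installed). The markup is invented for the example; the a[position() < last()] line is the mirror case of a[last()], i.e. every a except the last one.

```python
from lxml import etree  # third-party package, assumed installed

# Hypothetical markup, invented for this example.
DOC = """
<div id="nav"><a>Home</a><a>Docs</a><a>Blog</a></div>
<table>
  <tr><td>name</td><td>Alice</td></tr>
  <tr><td>city</td><td>Paris</td></tr>
  <tr><td>lang</td><td>Python</td></tr>
</table>
"""

doc = etree.HTML(DOC)

# Only the last <a> inside the nav block.
last_link = doc.xpath('//div[@id="nav"]/a[last()]/text()')

# Every <a> except the last one.
all_but_last = doc.xpath('//div[@id="nav"]/a[position() < last()]/text()')

# Two independent expressions joined with |: second cell of row 1 and of row 3.
picked = doc.xpath('//tr[1]/td[2]/text() | //tr[3]/td[2]/text()')

print(last_link, all_but_last, picked)
```
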
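Fuzzy match on the class attribute: [contains(@class,'Number Skill')]
Return the text contained in a tag as one string: string(//p) (when several nodes match, only the first one is used)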
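
The two notes above can be tried out like this. It is a sketch in Python with lxml (assumed installed) on made-up markup. Note that contains(@class, ...) is a plain substring test on the attribute value, so the order of the class names matters. The last two queries show why string() is handy: a child tag splits the paragraph into separate text nodes, which is also the situation the earlier text()=... and text()=... predicate is meant for.

```python
from lxml import etree  # third-party package, assumed installed

# Hypothetical markup, invented for this example.
DOC = """
<div>
  <span class="Number Skill big">42</span>
  <span class="Skill Number">7</span>
  <p>Total: <b>3</b> floors</p>
  <p>second paragraph</p>
</div>
"""

doc = etree.HTML(DOC)

# Substring match on the class value: "Skill Number" does not match.
by_class = doc.xpath("//span[contains(@class,'Number Skill')]/text()")

# string() flattens the first matching <p>, child tags included.
whole = doc.xpath("string(//p)")

# text() only sees the direct text nodes, which <b> splits in two.
pieces = doc.xpath("//p[1]/text()")

# Match a <p> by several of its separate text nodes at once.
split_match = doc.xpath('//p[text()="Total: " and text()=" floors"]')

print(by_class, repr(whole), pieces, len(split_match))
```
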
starts-with: locate the a elements whose id attribute value starts with cb

$x("//a[starts-with(@id,'cb')]/text()")

$x("//a/text()") // get the text contained in every a tag on the page
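
$x only exists in the browser console. Outside the browser the same two expressions can be run from a script; the following is a minimal sketch in Python with lxml (assumed installed), using invented ids and link texts.

```python
from lxml import etree  # third-party package, assumed installed

# Hypothetical links, invented for this example.
DOC = """
<div>
  <a id="cb_post_1">First</a>
  <a id="cb_post_2">Second</a>
  <a id="nav_home">Home</a>
  <a>No id</a>
</div>
"""

doc = etree.HTML(DOC)

# Only the <a> elements whose id starts with "cb".
cb_links = doc.xpath('//a[starts-with(@id, "cb")]/text()')

# Text of every <a> on the page.
all_links = doc.xpath('//a/text()')

print(cb_links, all_links)
```

An a element without an id attribute simply fails the starts-with test, so it needs no special handling.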