Selenium Notes - pikesaku’s blog

References

Two ways to wait in Selenium - ストックドッグ
Selenium API (reverse lookup)
4. 要素を見つける — Selenium Python Bindings 2 ドキュメント (Japanese translation of "Locating Elements")
Frequently used Selenium WebDriver methods - Qiita
4. Locating Elements — Selenium Python Bindings 2 documentation

Key points for locating elements

There are two main approaches.

① Public methods
find_element(s)_by_XXXXX(...)
② Private methods
find_element(s)(By.XXXXX, ...)

Quoted from the reference

Apart from the public methods given above, there are two private methods which might be useful with locators in page objects. These are the two private methods: find_element and find_elements.
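The "locators in page objects" idea from the quote can be sketched as below. This is only an illustration under assumptions: `LoginPage`, `FakeDriver`, and the locator values are hypothetical, and the strategy names are written as plain strings so the sketch runs without Selenium installed. They are the same strings held by `By.ID`, `By.NAME`, and `By.XPATH`, which real code would normally import from `selenium.webdriver.common.by`. Although the documentation calls them private, `find_element` and `find_elements` are ordinary methods of the driver, and recent Selenium 4 releases removed the `find_element_by_XXXXX` helpers, so this is the form that still works there.

```python
# Locator strategy names. These are the same strings that Selenium's
# By.ID, By.NAME and By.XPATH constants hold; real code would normally use
# `from selenium.webdriver.common.by import By` instead.
BY_ID = "id"
BY_NAME = "name"
BY_XPATH = "xpath"


class LoginPage:
    """Hypothetical page object: locators live in one place as (by, value) tuples."""

    FORM = (BY_ID, "loginForm")
    USERNAME = (BY_NAME, "username")
    CLEAR_BUTTON = (BY_XPATH, "//input[@name='continue'][@type='button']")

    def __init__(self, driver):
        self.driver = driver

    def form(self):
        # find_element(by, value) returns the first matching element
        return self.driver.find_element(*self.FORM)

    def inputs(self):
        # find_elements(by, value) returns a list of all matching elements
        return self.driver.find_elements(BY_XPATH, "//form[@id='loginForm']/input")


class FakeDriver:
    """Stand-in for a WebDriver (assumption): it only records the locators it receives."""

    def __init__(self):
        self.calls = []

    def find_element(self, by, value):
        self.calls.append((by, value))
        return f"<element {by}={value}>"

    def find_elements(self, by, value):
        self.calls.append((by, value))
        return []


page = LoginPage(FakeDriver())
print(page.form())
print(page.driver.calls)
```

With a real browser, `FakeDriver()` would be replaced by something like `webdriver.Chrome()`. The page object itself would not need to change, which is the main reason to keep locators as tuples.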

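XPath is important. XPath is a language for addressing nodes in an XML document. It is useful when the target cannot be identified by an ID or a similar attribute.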

Quoted from the reference

<html>
 <body>
  <form id="loginForm">
   <input name="username" type="text" />
   <input name="password" type="password" />
   <input name="continue" type="submit" value="Login" />
   <input name="continue" type="button" value="Clear" />
  </form>
</body>
</html>

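To select the form with id="loginForm" above, any of the following works. "/" walks the tree step by step from the root, while "//" matches at any depth, so the intermediate path can be omitted.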

login_form = driver.find_element_by_xpath("/html/body/form[1]")
login_form = driver.find_element_by_xpath("//form[1]")
login_form = driver.find_element_by_xpath("//form[@id='loginForm']")
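To try these expressions without launching a browser, the same markup can be queried with Python's standard `xml.etree.ElementTree`. This is only a rough sketch: ElementTree implements a limited subset of XPath and evaluates paths relative to the root `<html>` element, so `/html/body/form[1]` is written as `body/form[1]` and `//form` as `.//form`. The final query for the "Clear" button is an added illustration, not part of the quoted documentation.

```python
import xml.etree.ElementTree as ET

# The sample page from above, with the closing </html> tag in place
HTML = """
<html>
 <body>
  <form id="loginForm">
   <input name="username" type="text" />
   <input name="password" type="password" />
   <input name="continue" type="submit" value="Login" />
   <input name="continue" type="button" value="Clear" />
  </form>
 </body>
</html>
""".strip()

root = ET.fromstring(HTML)  # root is the <html> element

# Step-by-step path from the root (counterpart of /html/body/form[1])
by_path = root.find("body/form[1]")
# Match at any depth (counterpart of //form[1])
anywhere = root.find(".//form[1]")
# Match on an attribute value (counterpart of //form[@id='loginForm'])
by_id = root.find(".//form[@id='loginForm']")

# Two inputs share name="continue", so a second predicate narrows the match
clear = root.find(".//input[@name='continue'][@type='button']")

print(by_path is by_id, anywhere is by_id)
print(clear.get("value"))
```

All three form queries return the same element. The attribute-based one is usually the most robust because it does not depend on the form's position in the page.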

Waiting is also important

5. 待機 — Selenium Python Bindings 2 ドキュメント (Japanese translation of "Waits")
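Selenium offers an implicit wait (`driver.implicitly_wait(seconds)`) and an explicit wait (`WebDriverWait(driver, timeout).until(condition)`). The idea behind an explicit wait is to poll a condition until it returns something truthy or a deadline passes. The following is a dependency-free sketch of that idea, not Selenium's implementation: `wait_until`, `WaitTimeout`, and `element_ready` are hypothetical names made up for this note.

```python
import time


class WaitTimeout(Exception):
    """Raised when the condition never became truthy (hypothetical name)."""


def wait_until(condition, timeout=10.0, poll=0.5):
    """Call condition() repeatedly until it returns a truthy value.

    Returns that value, or raises WaitTimeout once `timeout` seconds have passed.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise WaitTimeout(f"condition not met within {timeout} s")
        time.sleep(poll)


# Simulate an element that only "appears" on the third check
attempts = []


def element_ready():
    attempts.append(1)
    return "element" if len(attempts) >= 3 else None


found = wait_until(element_ready, timeout=2.0, poll=0.01)
print(found, len(attempts))
```

Compared with a fixed `time.sleep`, polling returns as soon as the condition holds and fails loudly when it never does.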

Sample code

Lists the Red Hat vulnerabilities in JVN iPedia for fiscal year 2018 (April 2018 to March 2019) whose CVSS v3 severity is 9 or higher.

# -*- coding: utf-8 -*-

from selenium import webdriver
from selenium.webdriver.support.ui import Select
import time


def check_by_xpath(wd, xpath):
    em = wd.find_element_by_xpath(xpath)
    em.click()


def select_by_text(wd, select_tag, text):
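    # Locate the <select> element by its name attribute
    em = wd.find_element_by_name(select_tag)
    # Wrap it in Select so an option can be chosen by its visible text
    em = Select(em)
    # The available choices can be read from em.options
    #for i in em.options:
    #    print(i.text)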
    em.select_by_visible_text(text)


def click_by_link_text(wd, text):
    # Locate the link by its text
    em = wd.find_element_by_link_text(text)
    em.click()


def click_by_name(wd, name):
    # Locate the element by its name attribute
    em = wd.find_element_by_name(name)
    em.click()


def scraping():
    options = webdriver.ChromeOptions()
#    options.add_argument('--headless')
    wd = webdriver.Chrome(options=options)
    wd.implicitly_wait(20)
    wd.get('https://jvndb.jvn.jp/')
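    # The Japanese strings are labels shown on the site:
    # '詳細検索' is the detailed (advanced) search link, 'レッドハット' is Red Hat
    click_by_link_text(wd, '詳細検索')
    select_by_text(wd, 'vendor', 'レッドハット')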
    select_by_text(wd, 'product', 'Red Hat Enterprise Linux Server')
    select_by_text(wd, 'datePublicFromYear', '2018')
    select_by_text(wd, 'datePublicFromMonth', '04')
    select_by_text(wd, 'datePublicToYear', '2019')
    select_by_text(wd, 'datePublicToMonth', '03')
    check_by_xpath(wd, '//input[@class="cvss_v3" and @value="01"]')
    click_by_name(wd, 'search')
    # Keep the browser open for 10 seconds before quitting
    time.sleep(10)
    wd.quit()


if __name__ == '__main__':
    scraping()