Topic: python programming error

Organ

    Topic Starter


    Rookie

    • Experience: Beginner
    • OS: Windows 7
    python programming error
    « on: July 22, 2021, 02:03:11 AM »
    Code: [Select]

    import requests
    from bs4 import BeautifulSoup
    url = "https://[Moderator edit: host removed]/bizhitupian/meinvbizhi/yangyanmeinv.htm"
    dicc = {"user-agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.114 Safari/537.36"}
    a = requests.get(url, headers = dicc)
    a.encoding = 'utf-8'
    b = BeautifulSoup(a.text,'html.parser')
    c = b.find("div", class_="Typelist").find_all("a")
    print(c)

    AttributeError: 'NoneType' object has no attribute 'find_all'

    ____
    Moderator edit: removed unknown domain from code example
    « Last Edit: July 22, 2021, 06:07:57 AM by nil »

    nil

    • Global Moderator


    • Intermediate
    • Thanked: 15
      • Experience: Experienced
      • OS: Linux variant
      Re: python programming error
      « Reply #1 on: July 22, 2021, 05:49:32 AM »
      The code looks OK. My guess is that your HTML does not contain a div with class="Typelist". In that case b.find() returns None, Python then tries to run None.find_all("a"), and you get that error.
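
      A minimal sketch of that failure, using a made-up HTML string instead of a live page (the markup and the variable names are assumptions for illustration, not the poster's actual page):

```python
from bs4 import BeautifulSoup

# Made-up HTML for illustration; it contains no <div class="Typelist">.
html = '<div class="other"><a href="/x">x</a></div>'
soup = BeautifulSoup(html, "html.parser")

# find() returns None when no element matches.
result = soup.find("div", class_="Typelist")
print(result)

try:
    # Same chained call as in the question, split so the failure is visible.
    result.find_all("a")
except AttributeError as exc:
    message = str(exc)
    print(message)
```

      Because find() gives back None when nothing matches, it is the chained .find_all("a") that raises the error, not the request or the parser.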

      But the code itself should be fine. For example, this works:

      Code: [Select]
      import requests
      from bs4 import BeautifulSoup
      url = "https://www.computerhope.com"
      dicc = {"user-agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.114 Safari/537.36"}
      a = requests.get(url, headers = dicc)
      a.encoding = 'utf-8'
      b = BeautifulSoup(a.text,'html.parser')
      c = b.find("div", class_="skip").find_all("a")
      print(c)

      output:

      Code: [Select]
      [<a href="#main-content">Skip to Main Content</a>]

      If you're processing multiple files and some of them might not have a <div class="Typelist">, capture the return value of b.find() separately and branch on whether it has a value, e.g.:

      Code: [Select]
      b = BeautifulSoup(a.text,'html.parser')
      bf = b.find("div", class_="Typelist")
      if bf:      # if the result of b.find() was truthy, then
          c = bf.find_all("a")
          print(c)
      else:       # otherwise, bf is falsy (None, False, zero, empty string, etc.), so run this code instead
          print("Typelist not found")

      https://docs.python.org/3/library/stdtypes.html#truth-value-testing
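
      To illustrate the truth-value rules linked above, here is a small sketch in plain Python (no BeautifulSoup needed). The describe() helper is hypothetical; it only compares "if value:" with "if value is not None:".

```python
# A few values that Python treats as falsy.
candidates = [None, False, 0, "", []]
falsy = [bool(v) for v in candidates]
print(falsy)  # [False, False, False, False, False]


def describe(value):
    """Return how a truthiness check and an explicit None check treat a value."""
    truthy_check = "found" if value else "not found"
    none_check = "found" if value is not None else "not found"
    return truthy_check, none_check


print(describe(None))   # ('not found', 'not found')
print(describe(""))     # ('not found', 'found')
print(describe("<a>"))  # ('found', 'found')
```

      For b.find(), which as far as I know returns either an element or None, both checks should behave the same. Writing "is not None" just states the intent more explicitly.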