Scraping of Fake Jobs Webpage in Python with Beautiful Soup



Exhibit 25.33 Fake Jobs webpage.

The Fake Jobs webpage is a mock website used for educational purposes, particularly in web scraping tutorials and exercises. As shown in Exhibit 25.33, the site simulates a real job listing site with fictitious data, providing a safe and controlled environment for learning and practicing web scraping techniques.


Exhibit 25.34 Source code of Fake Jobs’ webpage.

As the source code in Exhibit 25.34 shows, the details of each job appear within a div tag of class card-content. The scraper has to find every one of these tags and extract the job details from it.
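To make the card-content idea concrete, here is a minimal sketch that parses a small inline HTML string instead of the live page. The markup is a simplified stand-in (an assumption for illustration, not the full page source), the second card is made-up sample data, and parse_cards is a hypothetical helper name.

```python
from bs4 import BeautifulSoup

# Simplified stand-in for the page markup (assumption, not the full source).
# The second card is invented sample data.
SAMPLE_HTML = """
<div id="ResultsContainer">
  <div class="card-content">
    <div class="media">
      <div class="media-content">
        <h2 class="title">Senior Python Developer</h2>
        <h3 class="company">Payne, Roberts and Davis</h3>
      </div>
    </div>
    <div class="content">
      <p class="location">Stewartbury, AA</p>
    </div>
  </div>
  <div class="card-content">
    <div class="media">
      <div class="media-content">
        <h2 class="title">Data Analyst</h2>
        <h3 class="company">Example Corp</h3>
      </div>
    </div>
    <div class="content">
      <p class="location">Sampleville, AA</p>
    </div>
  </div>
</div>
"""


def parse_cards(html):
    """Return one dict per card-content div found in the HTML (hypothetical helper)."""
    soup = BeautifulSoup(html, "html.parser")
    jobs = []
    for card in soup.find_all("div", class_="card-content"):
        jobs.append({
            "title": card.find("h2", class_="title").get_text(strip=True),
            "company": card.find("h3", class_="company").get_text(strip=True),
            "location": card.find("p", class_="location").get_text(strip=True),
        })
    return jobs


jobs = parse_cards(SAMPLE_HTML)
for job in jobs:
    print(job)
```

Each card yields one record, so the number of records equals the number of card-content divs in the markup.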

Exhibit 25.35 demonstrates the scraping of the Fake Jobs Webpage with the Beautiful Soup Python library. The process flow is as follows:

  1. Send a Request: Use libraries like requests to fetch a webpage.
  2. Parse the HTML: Use Beautiful Soup to parse the HTML content.
  3. Navigate the Parse Tree: Access the desired data by navigating through tags, attributes, or text.
  4. Extract and Use Data: Extract the data and use it for your application, such as saving it to a file or database.
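Step 4 mentions saving the extracted data to a file. As a minimal sketch of that step, assuming the jobs have already been extracted into dictionaries, Python's standard csv module can write them out. The field names and the jobs_to_csv helper are illustrative choices, not part of the exhibit, and an in-memory buffer stands in for a real file.

```python
import csv
import io

# Illustrative column names (assumption); any consistent set of keys works.
FIELDNAMES = ["title", "company", "location", "apply_url"]


def jobs_to_csv(jobs, out):
    """Write a list of job dicts to a file-like object as CSV (hypothetical helper)."""
    writer = csv.DictWriter(out, fieldnames=FIELDNAMES)
    writer.writeheader()
    writer.writerows(jobs)


# Two records copied from the scraper output for demonstration.
sample_jobs = [
    {
        "title": "Senior Python Developer",
        "company": "Payne, Roberts and Davis",
        "location": "Stewartbury, AA",
        "apply_url": "https://realpython.github.io/fake-jobs/jobs/senior-python-developer-0.html",
    },
    {
        "title": "Software Engineer (Python)",
        "company": "Garcia PLC",
        "location": "Ericberg, AE",
        "apply_url": "https://realpython.github.io/fake-jobs/jobs/software-engineer-python-10.html",
    },
]

buffer = io.StringIO()  # stands in for a file opened for writing
jobs_to_csv(sample_jobs, buffer)
csv_text = buffer.getvalue()
print(csv_text)
```

To write to disk instead, pass a file opened with open("jobs.csv", "w", newline="", encoding="utf-8"). Fields that contain commas, such as the company name and location, are quoted automatically by the csv module.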

Scraping Fake Jobs Webpage
import requests
from bs4 import BeautifulSoup

# (1) Send Request
URL = "https://realpython.github.io/fake-jobs/" # page URL
page = requests.get(URL) # get the webpage

# (2) Parse the HTML
soup = BeautifulSoup(page.content, "html.parser") # parse page

# (3) Navigate the Parse Tree
results = soup.find(id="ResultsContainer") # jobs are listed in this div ResultsContainer
# print(results.prettify())

# card-content divs contain the job details 
job_elements = results.find_all("div", class_="card-content")

# (4) Extract and Print Jobs Data
# for job_element in job_elements:
#     print(job_element, end="\n"*2)

# use a lambda function to find job titles containing the string 'python'
# (the None check guards against h2 tags whose .string is None)
python_jobs = results.find_all(
    "h2", string=lambda text: text is not None and "python" in text.lower()
)

print(len(python_jobs)) # number of Python jobs found on the page

# fetch great-grandparent elements to get access to all the information you want
python_job_elements = [
    h2_element.parent.parent.parent for h2_element in python_jobs
]

# print out the job details - title, company and location
for job_element in python_job_elements:
    title_element = job_element.find("h2", class_="title")
    company_element = job_element.find("h3", class_="company")
    location_element = job_element.find("p", class_="location")
    print(title_element.text.strip())
    print(company_element.text.strip())
    print(location_element.text.strip())
    
    # Print all links (disabled)
    # links = job_element.find_all("a")
    # for link in links:
    #     print(link.text.strip())
    #     link_url = link["href"]
    #     print(f"Apply here: {link_url}\n")
        
    # Print only the 2nd link
    link_url = job_element.find_all("a")[1]["href"]
    print(f"Apply here: {link_url}\n")

    print()
10
Senior Python Developer
Payne, Roberts and Davis
Stewartbury, AA
Apply here: https://realpython.github.io/fake-jobs/jobs/senior-python-developer-0.html


Software Engineer (Python)
Garcia PLC
Ericberg, AE
Apply here: https://realpython.github.io/fake-jobs/jobs/software-engineer-python-10.html


Python Programmer (Entry-Level)
Moss, Duncan and Allen
Port Sara, AE
Apply here: https://realpython.github.io/fake-jobs/jobs/python-programmer-entry-level-20.html


Python Programmer (Entry-Level)
Cooper and Sons
West Victor, AE
Apply here: https://realpython.github.io/fake-jobs/jobs/python-programmer-entry-level-30.html


Software Developer (Python)
Adams-Brewer
Brockburgh, AE
Apply here: https://realpython.github.io/fake-jobs/jobs/software-developer-python-40.html


Python Developer
Rivera and Sons
East Michaelfort, AA
Apply here: https://realpython.github.io/fake-jobs/jobs/python-developer-50.html


Back-End Web Developer (Python, Django)
Stewart-Alexander
South Kimberly, AA
Apply here: https://realpython.github.io/fake-jobs/jobs/back-end-web-developer-python-django-60.html


Back-End Web Developer (Python, Django)
Jackson, Ali and Mckee
New Elizabethside, AA
Apply here: https://realpython.github.io/fake-jobs/jobs/back-end-web-developer-python-django-70.html


Python Programmer (Entry-Level)
Mathews Inc
Robertborough, AP
Apply here: https://realpython.github.io/fake-jobs/jobs/python-programmer-entry-level-80.html


Software Developer (Python)
Moreno-Rodriguez
Martinezburgh, AE
Apply here: https://realpython.github.io/fake-jobs/jobs/software-developer-python-90.html

Exhibit 25.35 Scraping of Fake Jobs Webpage in Python with Beautiful Soup. Jupyter notebook.
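Exhibit 25.35 reaches each job card by stepping up three levels with .parent.parent.parent, which depends on the exact nesting depth of the page. As an alternative sketch, Beautiful Soup's find_parent can look up the enclosing card by tag and class instead. The inline markup below is a simplified assumption, the second card is invented sample data, and find_python_cards and mentions_python are hypothetical names. The title filter also checks for None, because a tag's .string is None when the tag has more than one child.

```python
from bs4 import BeautifulSoup

# Simplified stand-in for the page markup (assumption, not the full source).
SAMPLE_HTML = """
<div id="ResultsContainer">
  <div class="card-content">
    <div class="media">
      <div class="media-content">
        <h2 class="title">Senior Python Developer</h2>
        <h3 class="company">Payne, Roberts and Davis</h3>
      </div>
    </div>
    <p class="location">Stewartbury, AA</p>
  </div>
  <div class="card-content">
    <div class="media">
      <div class="media-content">
        <h2 class="title">Data Analyst</h2>
        <h3 class="company">Example Corp</h3>
      </div>
    </div>
    <p class="location">Sampleville, AA</p>
  </div>
</div>
"""


def mentions_python(text):
    """Title filter that tolerates tags whose .string is None."""
    return text is not None and "python" in text.lower()


def find_python_cards(html):
    """Return the card-content div enclosing each matching h2 title."""
    soup = BeautifulSoup(html, "html.parser")
    headings = soup.find_all("h2", string=mentions_python)
    # find_parent searches upward for the nearest matching ancestor,
    # so the result does not depend on how deeply the h2 is nested.
    return [h2.find_parent("div", class_="card-content") for h2 in headings]


cards = find_python_cards(SAMPLE_HTML)
for card in cards:
    print(card.find("h2", class_="title").get_text(strip=True))
    print(card.find("p", class_="location").get_text(strip=True))
```

If the page later wraps the title in an extra div, the chained .parent calls would land on the wrong element, while the find_parent lookup would still return the card.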

