08brt/scraper

Scraper Application

Overview

The Scraper Application is a Spring Boot project that automates three steps: scraping business details from Google Maps, extracting contact emails from the businesses' websites, and sending promotional emails to those addresses. Business data is retrieved through the Google Places API. The email body is generated dynamically by ChatGPT, with the aim of keeping messages engaging and reducing the risk of their being flagged as spam. The application runs as a scheduled task, so newly scraped businesses are picked up and contacted on a regular basis.

Features

  • Automated Data Scraping: Retrieves business information from Google Maps based on specified locations and keywords.
  • Email Extraction: Scrapes emails from the websites of the scraped businesses.
  • Email Communication: Sends promotional emails to businesses using predefined email templates and AI-generated content from ChatGPT.
  • Status Management: Tracks and updates the status of businesses throughout the scraping and email-sending processes.
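As an illustration of the status management feature, the sketch below models a business status as an enum with a fixed set of allowed transitions. The status names (SCRAPED, EMAIL_FOUND, EMAIL_SENT, FAILED) and the BusinessStatus type are hypothetical and chosen only for this example; the project's own status values may differ.

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical status values; the project's real statuses may use different names.
enum BusinessStatus {
    SCRAPED, EMAIL_FOUND, EMAIL_SENT, FAILED;

    // Statuses a business may move to from the current one.
    Set<BusinessStatus> allowedNext() {
        switch (this) {
            case SCRAPED:
                return EnumSet.of(EMAIL_FOUND, FAILED);
            case EMAIL_FOUND:
                return EnumSet.of(EMAIL_SENT, FAILED);
            default:
                // EMAIL_SENT and FAILED are treated as final in this sketch.
                return EnumSet.noneOf(BusinessStatus.class);
        }
    }

    boolean canMoveTo(BusinessStatus next) {
        return allowedNext().contains(next);
    }
}

final class StatusDemo {
    public static void main(String[] args) {
        BusinessStatus status = BusinessStatus.SCRAPED;
        System.out.println(status.canMoveTo(BusinessStatus.EMAIL_FOUND)); // prints true
        System.out.println(status.canMoveTo(BusinessStatus.EMAIL_SENT));  // prints false
    }
}
```

Keeping the transitions in one place makes it harder for a processor to skip a step, for example marking a business as emailed before an address has been found.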

Application Workflow

  • Data Scraping: The ScrapedBusinessProcessor retrieves unprocessed locations and fetches business details for each one using the Google Places API. These details are stored in the database as ScrapedBusiness entities.

  • Email Extraction: The ScrapedEmailProcessor processes businesses that have a website, extracts email addresses from the site using the Jsoup library, and updates the corresponding ScrapedBusiness entities.

  • Email Sending: The SendEmailProcessor retrieves businesses with email addresses and sends promotional emails using predefined templates and content generated by ChatGPT. The status of each business is updated accordingly.

  • Error Handling: Throughout the process, any errors encountered are logged, and the status of the affected business is updated to reflect the issue.
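The email extraction step can be sketched as follows. The project uses Jsoup for this; to stay dependency-free, the example below swaps in a plain java.util.regex pattern and covers only the matching step on HTML that has already been downloaded. The class name EmailExtractorSketch, the method extractEmails, and the pattern itself are assumptions made for illustration, not the project's code.

```java
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper; the project's ScrapedEmailProcessor relies on Jsoup instead.
final class EmailExtractorSketch {
    // Deliberately simple pattern; it does not cover every valid address form.
    private static final Pattern EMAIL =
            Pattern.compile("[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\\.[A-Za-z]{2,}");

    // Returns the distinct addresses in the page, lower-cased, in order of first appearance.
    static Set<String> extractEmails(String html) {
        Set<String> emails = new LinkedHashSet<>();
        Matcher matcher = EMAIL.matcher(html);
        while (matcher.find()) {
            emails.add(matcher.group().toLowerCase(Locale.ROOT));
        }
        return emails;
    }

    public static void main(String[] args) {
        String html = "<a href=\"mailto:Info@Example.com\">info@example.com</a>"
                + "<p>sales@shop.example.org</p>";
        System.out.println(extractEmails(html));
        // prints [info@example.com, sales@shop.example.org]
    }
}
```

Lower-casing and de-duplicating the matches means the same address appearing in both a mailto link and the visible text is stored once per business.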

Setup and Usage

Prerequisites

  • Java 17
  • Maven
  • Docker (for running PostgreSQL integration tests)

Requirements

  • Google Cloud Platform (GCP) API Key: Required for accessing the Google Places API to scrape business details.
  • OpenAI ChatGPT API Key: Required for generating dynamic email content.

Configuration

  1. Clone the repository.

  2. Configure the application properties in src/main/resources/application.properties to include your Google API key, OpenAI ChatGPT API key, database connection details, and other necessary configurations. Here is an example of the configuration:

    # Google Places API Key
    google.api.key=YOUR_GCP_API_KEY
    
    # OpenAI ChatGPT API Key
    openai.api.key=YOUR_CHATGPT_API_KEY
    
    # Database Configuration
    spring.datasource.url=jdbc:postgresql://localhost:5432/yourdatabase
    spring.datasource.username=yourusername
    spring.datasource.password=yourpassword
    
    
  3. Execute the SQL scripts provided in the Database Schema section to set up the required database tables.
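Before the first run, it can help to confirm that the placeholder values in application.properties have been replaced. The sketch below is one possible check and is not part of the project: the class ConfigCheckSketch and the method unsetKeys are hypothetical, the key names are taken from the example configuration above, and the sample value sk-example is made up.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

// Hypothetical helper for illustration; not part of the project.
final class ConfigCheckSketch {
    // Keys taken from the example application.properties.
    private static final String[] REQUIRED = {
        "google.api.key",
        "openai.api.key",
        "spring.datasource.url",
        "spring.datasource.username",
        "spring.datasource.password"
    };

    // Returns the required keys that are missing, blank, or still set to a YOUR_... placeholder.
    static List<String> unsetKeys(Properties props) {
        List<String> unset = new ArrayList<>();
        for (String key : REQUIRED) {
            String value = props.getProperty(key, "").trim();
            if (value.isEmpty() || value.startsWith("YOUR_")) {
                unset.add(key);
            }
        }
        return unset;
    }

    public static void main(String[] args) throws IOException {
        Properties props = new Properties();
        props.load(new StringReader(
                "google.api.key=YOUR_GCP_API_KEY\n"
              + "openai.api.key=sk-example\n"
              + "spring.datasource.url=jdbc:postgresql://localhost:5432/yourdatabase\n"));
        System.out.println(unsetKeys(props));
        // prints [google.api.key, spring.datasource.username, spring.datasource.password]
    }
}
```

In a real run the properties would be loaded from src/main/resources/application.properties rather than from an inline string.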

Running the Application

  1. Build the project using Maven:

    mvn clean install
    
  2. Run the core module:

    mvn spring-boot:run -pl :core
    
  3. Run the scheduled module:

    mvn spring-boot:run -pl :scheduled
