Overview
Key Features
- Dual-Site Navigation with Selenium: The scraper used Selenium to drive a headless browser against both ford.ca and fordtodealers.ca in a single run. Pricing was captured at two points on each site: the navigation menu, which surfaces prices at the model level, and the individual model page, which shows prices at the trim level. Selenium was required because pricing content on both sites is rendered by JavaScript after page load and is not present in the initial HTML response.
- Trim-Level Price Extraction: For each model, the scraper collected the price of every available trim from each site, building a structured dataset keyed by model and trim name that could be joined for comparison.
- Price Comparison and Diff Calculation: Once data was collected from both sites, the scraper joined the datasets by model and trim, computed the price difference for each row, and flagged any discrepancies. Trims present on one site but absent on the other were captured separately rather than silently dropped.
- Conditional Email Subject Line: The email subject line included the word "Mismatch" only when at least one price discrepancy was detected. A subject line without "Mismatch" told the recipient, before the email was even opened, that all prices were in sync for that day's run.
- Summary Table with Anchor Links: The email opened with a summary table listing every model and its comparison status. Each model name in the table was an anchor link that jumped directly to that model's section further down in the email, allowing the recipient to scan the summary and navigate to a specific mismatch without scrolling through the full report.
- Red Mismatch Badges: Any model with at least one price discrepancy was marked with a red badge in the summary table, making mismatches visible at a glance. Models where all prices matched carried no badge, which reduced visual noise and kept the focus on the rows that needed attention.
- Per-Model Source Links: Each model section in the email included source links to the exact pages on ford.ca and fordtodealers.ca from which the prices were scraped. The recipient could click directly to the live page to verify a price or investigate a discrepancy, without having to navigate the site manually.
- Back-to-Top Anchor Links: Each model section ended with a "back to top" anchor link returning the reader to the summary table. On a long email covering many models, this made it practical to work through mismatches one at a time without losing context.
- Containerised with Docker: The entire scraper (Python runtime, Selenium, ChromeDriver, and dependencies) was packaged into a Docker container, making the environment fully reproducible and removing any dependency on the host machine's configuration.
- Azure Container Instances: Hosting on Azure Container Instances (ACI) removed the need for a dedicated local machine to run the scraper. A locally hosted solution would have required a machine to be on and available at the scheduled time every day, which is unreliable and impractical for a side project. On ACI, each run was independent of any local environment: the container spun up, executed the full scrape-compare-email pipeline in under two minutes, and exited. Because ACI bills only for the time the container is running, the execution cost was minimal: at most a couple of minutes of compute per day.
- Azure Logic Apps Trigger: A Logic Apps workflow started the container once per day on a fixed schedule, removing the need for an always-on machine running cron or a dedicated scheduler.
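
The comparison, subject-line, and summary-table features described above can be illustrated with a minimal sketch. It is not the project's actual code: the `PriceMap` shape, the function and class names, the subject wording, the anchor format, and the badge styling are all assumptions made for illustration. It also assumes that only a price difference (not a trim missing from one site) counts as a mismatch, and that the summary lists only models priced on both sites.

```python
from dataclasses import dataclass, field
from html import escape

# Assumed shape: (model, trim) -> price in whole dollars.
PriceMap = dict[tuple[str, str], int]


@dataclass
class Comparison:
    rows: list[dict] = field(default_factory=list)  # trims priced on both sites
    only_public: list[tuple[str, str]] = field(default_factory=list)
    only_dealer: list[tuple[str, str]] = field(default_factory=list)

    def mismatched_models(self) -> set[str]:
        """Models with at least one trim whose prices differ."""
        return {row["model"] for row in self.rows if row["diff"] != 0}


def compare_prices(public: PriceMap, dealer: PriceMap) -> Comparison:
    """Join the two price maps on (model, trim) and compute per-row differences."""
    result = Comparison()
    for key in sorted(public.keys() & dealer.keys()):
        model, trim = key
        result.rows.append({
            "model": model, "trim": trim,
            "public": public[key], "dealer": dealer[key],
            "diff": public[key] - dealer[key],
        })
    # Keep unmatched trims in their own lists instead of silently dropping them.
    result.only_public = sorted(public.keys() - dealer.keys())
    result.only_dealer = sorted(dealer.keys() - public.keys())
    return result


def build_subject(comparison: Comparison, run_date: str) -> str:
    """Append "Mismatch" only when some price differs (subject wording is assumed)."""
    base = f"Ford price check {run_date}"
    return f"{base} - Mismatch" if comparison.mismatched_models() else base


def render_summary(comparison: Comparison) -> str:
    """Build the summary table; each model name links to an anchor for its section."""
    flagged = comparison.mismatched_models()
    models = sorted({row["model"] for row in comparison.rows})
    lines = ['<table id="top">']
    for model in models:
        anchor = "model-" + model.lower().replace(" ", "-")
        badge = (
            ' <span style="color:#fff;background:#c00">Mismatch</span>'
            if model in flagged else ""
        )
        lines.append(
            f'<tr><td><a href="#{escape(anchor)}">{escape(model)}</a>{badge}</td></tr>'
        )
    lines.append("</table>")
    return "\n".join(lines)
```

In this sketch, each model section further down the email would carry a matching `id` (for example `id="model-escape"`) and end with a link to `#top`, which is one way to get the summary-to-section and back-to-top navigation described in the list.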
Screenshots
fordtodealers.ca
Dealer Portal