-- CH-01: WORK

Pipeline runs.

Every project in pipeline format. Input, process, output. Grouped by signal type. The full chain from source to delivery.

Scraping and Data Extraction (5) | Data Pipelines and Automations (2) | Analytics and Dashboards (2) | Predictions / Modeling (2) | Environment / GIS (2)
CLUSTER 01 -- 5 PROJECTS

Scraping and Data Extraction

[JOB 001]

Reddit Job Intelligence Platform

COMPLETE

Built a full scraping and intelligence system that monitors Reddit job communities in real time, classifies posts using NLP, and serves insights through a live dashboard. Designed to cut through noise and surface actionable job market signals.

INPUT Reddit job communities -- dynamic content, pagination, anti-bot barriers
PROCESS Selenium + BeautifulSoup scraper -- LLM-assisted NLP classification -- SQL storage
OUTPUT Real-time Streamlit dashboard for job market intelligence
Multi-community coverage | Automated NLP classification | Live dashboard
Python, Selenium, BeautifulSoup, SQL, Streamlit, NLP
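Here is a minimal sketch of the classify-and-store step. It assumes posts have already been scraped into dicts with hypothetical `id`, `subreddit`, and `title` keys. A keyword rule set stands in for the LLM-assisted classifier, and SQLite stands in for whichever SQL database the project used. The labels and keywords are illustrative only.

```python
import sqlite3

# Hypothetical labels and keyword rules; a stand-in for the LLM-assisted classifier.
RULES: dict[str, tuple[str, ...]] = {
    "hiring": ("[hiring]", "job opening", "we're hiring"),
    "seeking": ("[for hire]", "open to work", "looking for work"),
}


def classify(title: str) -> str:
    """Return the first label whose keywords appear in the title, else 'other'."""
    text = title.lower()
    for label, keywords in RULES.items():
        if any(keyword in text for keyword in keywords):
            return label
    return "other"


def store(conn: sqlite3.Connection, posts: list[dict]) -> None:
    """Classify and insert posts; the primary key makes repeated scrapes idempotent."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS posts ("
        "id TEXT PRIMARY KEY, subreddit TEXT, title TEXT, label TEXT)"
    )
    conn.executemany(
        "INSERT OR IGNORE INTO posts (id, subreddit, title, label) VALUES (?, ?, ?, ?)",
        [(p["id"], p["subreddit"], p["title"], classify(p["title"])) for p in posts],
    )
    conn.commit()
```

Because rows are keyed on the post ID, re-running the scraper only adds posts that have not been seen before.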
[JOB 002]

Sydney Commercial Property Lead Gen

COMPLETE

High-concurrency async scraper targeting Sydney commercial property developers. Extracted structured lead data from development application (DA) PDFs and leasing brochures across multiple council portals, delivering clean prospect lists for a real estate client.

INPUT Sydney council DA portals, leasing brochures, commercial property directories
PROCESS Async Python + Apify actors -- pdfplumber PDF extraction -- deduplication and enrichment
OUTPUT Structured CSV of developer contacts with company, email, phone, and project details
Multiple council sources | PDF + web extraction | Async pipeline
Python, Asyncio, Apify, pdfplumber, Pandas
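One way to get high concurrency without hammering council servers is to cap in-flight requests with a semaphore. The sketch below assumes a hypothetical `fetch` coroutine, faked here with canned data, in place of real HTTP or Apify calls. It also assumes a simple normalized (company, email) key for deduplication. The project's actual PDF parsing and enrichment logic is not shown.

```python
import asyncio

# Hypothetical pages standing in for council portals; two rows describe the same developer.
FAKE_PAGES = {
    "https://council.example/da/1": {"company": "Acme Developments", "email": "info@acme.example"},
    "https://council.example/da/2": {"company": "ACME Developments ", "email": "Info@acme.example"},
    "https://council.example/da/3": {"company": "Harbour Property Group", "email": "leasing@harbour.example"},
}


async def fetch(url: str) -> dict:
    """Stand-in for an HTTP request or Apify actor run."""
    await asyncio.sleep(0.01)  # simulate network latency
    return dict(FAKE_PAGES[url], source=url)


def dedupe(records: list[dict]) -> list[dict]:
    """Keep the first record per normalized (company, email) pair."""
    seen: set = set()
    unique = []
    for rec in records:
        key = (rec["company"].strip().lower(), rec["email"].strip().lower())
        if key not in seen:
            seen.add(key)
            unique.append(rec)
    return unique


async def scrape_all(urls: list[str], max_concurrency: int = 5) -> list[dict]:
    """Fetch every URL with at most `max_concurrency` requests in flight."""
    sem = asyncio.Semaphore(max_concurrency)

    async def bounded(url: str) -> dict:
        async with sem:
            return await fetch(url)

    # gather() returns results in input order, so dedupe keeps the earliest source.
    results = await asyncio.gather(*(bounded(u) for u in urls))
    return dedupe(list(results))
```

Usage: `asyncio.run(scrape_all(list(FAKE_PAGES), max_concurrency=2))` returns two records, because the first two pages describe the same developer.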
[JOB 003]

B2B Lead Generation -- Australian Market

COMPLETE

Targeted lead list for a concrete-cutting company expanding into Melbourne growth corridors. Extracted verified contacts from construction directories with strict accuracy requirements on ABN, service area, and contact validity.

INPUT Australian construction and trade directories -- Melbourne metro focus
PROCESS Multi-source scraping -- field validation -- deduplication -- manual QA pass
OUTPUT 1,200+ verified B2B contacts with email, phone, ABN, and service area, delivered as a clean CSV
1,200+ verified contacts | 48hr turnaround | Strict accuracy QA
Python, BeautifulSoup, Pandas
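ABN validation can go beyond a format check, because ABNs carry a checksum: subtract 1 from the first digit, weight the 11 digits by 10, 1, 3, 5, 7, 9, 11, 13, 15, 17, 19, and the weighted sum must be divisible by 89. A minimal sketch of that check follows. Whether the project applied it, rather than looking numbers up in the ABN register, is an assumption. A passing checksum only proves that the number is well formed, not that it is active or belongs to the listed business.

```python
ABN_WEIGHTS = (10, 1, 3, 5, 7, 9, 11, 13, 15, 17, 19)


def is_valid_abn(abn: str) -> bool:
    """Check an Australian Business Number (spaces allowed) against its checksum."""
    compact = abn.replace(" ", "")
    # Exactly 11 ASCII digits; rejects letters and other stray characters.
    if len(compact) != 11 or not compact.isascii() or not compact.isdigit():
        return False
    digits = [int(c) for c in compact]
    digits[0] -= 1  # step 1: subtract 1 from the leading digit
    total = sum(d * w for d, w in zip(digits, ABN_WEIGHTS))
    return total % 89 == 0
```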
[JOB 004]

Swedish Metal & Steel Company Leads

COMPLETE

Compiled a targeted email list of Swedish metal and steel manufacturers for a B2B outreach campaign. Extracted company profiles, decision-maker contacts, and verified emails from European industrial directories.

INPUT European industrial directories and company registries -- Sweden focus
PROCESS Directory scraping -- company profiling -- email extraction and verification
OUTPUT Verified lead list with company name, industry segment, contact person, and email
Targeted industry vertical | Verified emails | European market
Python, BeautifulSoup, Pandas
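The email-extraction step might look roughly like the following sketch. A pragmatic regular expression pulls candidate addresses out of page text, and the results are deduplicated case-insensitively, which is a common simplification. The pattern is an assumption. It is deliberately simpler than RFC 5322 and skips internationalized domains, such as ones containing å or ö. Verification, such as MX lookups or SMTP checks, is a separate step not shown here.

```python
import re

# Pragmatic pattern for addresses embedded in page text; not a full RFC 5322 parser.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def extract_emails(text: str) -> list[str]:
    """Return unique, lower-cased email addresses in first-seen order."""
    seen: dict[str, None] = {}  # dicts preserve insertion order
    for match in EMAIL_RE.findall(text):
        seen.setdefault(match.lower(), None)
    return list(seen)
```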
[JOB 006]

Lagos Rent Price Predictor

COMPLETE

End-to-end system: scraped 10,000+ rental listings from a JS-rendered Nigerian property platform, engineered location and property features, trained a Random Forest model, and deployed predictions via a Flask API.

INPUT 10,000+ property listings from a JS-rendered real estate platform
PROCESS Multi-level scraper -- feature engineering pipeline -- Random Forest model
OUTPUT Flask API delivering real-time rent predictions for Lagos properties
10,000+ listings scraped | Random Forest model | Live API endpoint
Python, BeautifulSoup, Selenium, Scikit-learn, Flask, Pandas
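The multi-level pattern in this scraper, where index pages link to detail pages, can be sketched with the standard library. The sketch assumes a hypothetical `get_html(url)` callable. On a JS-rendered site, that callable would typically wrap Selenium and return `driver.page_source` after the page loads, but here it can be any function. The `listing` CSS class used to pick out detail links is also hypothetical.

```python
from html.parser import HTMLParser
from typing import Callable
from urllib.parse import urljoin


class ListingLinks(HTMLParser):
    """Collect href values of <a> tags carrying a (hypothetical) 'listing' class."""

    def __init__(self) -> None:
        super().__init__()
        self.links: list[str] = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        classes = (attributes.get("class") or "").split()
        if tag == "a" and "listing" in classes and attributes.get("href"):
            self.links.append(attributes["href"])


def crawl(
    index_urls: list[str],
    get_html: Callable[[str], str],
    parse_detail: Callable[[str], dict],
) -> list[dict]:
    """Level 1: index pages -> detail URLs. Level 2: detail pages -> records."""
    seen: set = set()
    records = []
    for index_url in index_urls:
        parser = ListingLinks()
        parser.feed(get_html(index_url))
        for href in parser.links:
            detail_url = urljoin(index_url, href)
            if detail_url in seen:
                continue  # the same listing often appears on several index pages
            seen.add(detail_url)
            records.append(parse_detail(get_html(detail_url)))
    return records
```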
CLUSTER 02 -- 2 PROJECTS

Data Pipelines and Automations

[JOB 007]

PSX ESG Controversy Validation Pipeline

COMPLETE

Validated 500+ ESG controversy records across 480 Pakistan Stock Exchange-listed firms. Built an LLM-powered pipeline that cross-referenced claims against source data, flagged inaccuracies, and produced a fully traceable correction log.

INPUT 500+ ESG controversy records requiring factual accuracy validation across 480 firms
PROCESS Apify extraction -- sequential LLM batching via OpenRouter -- quality-control checks
OUTPUT Validated, corrected ESG dataset with traceable error documentation
480 firms processed | LLM-validated | Traceable corrections
Python, Apify, OpenRouter, Claude API, Pandas
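Sequential batching keeps LLM calls within rate limits and makes every correction attributable to a specific batch. The following is a minimal sketch. It assumes a hypothetical `check_batch` callable that wraps the OpenRouter request and returns one verdict per record in the shape `{"id": ..., "corrections": {field: new_value}, "reason": ...}`. The prompt and response parsing are not shown.

```python
from typing import Callable, Iterator


def batched(items: list, size: int) -> Iterator[list]:
    """Yield consecutive chunks of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def validate_records(
    records: list[dict],
    check_batch: Callable[[list[dict]], list[dict]],
    batch_size: int = 20,
) -> tuple[list[dict], list[dict]]:
    """Run batches through `check_batch` one at a time and log every change."""
    working = {rec["id"]: dict(rec) for rec in records}  # copies; inputs stay untouched
    log = []
    for batch_no, batch in enumerate(batched(records, batch_size), start=1):
        for verdict in check_batch(batch):  # one request per batch, in order
            rec = working[verdict["id"]]
            for field, new_value in verdict.get("corrections", {}).items():
                log.append({
                    "batch": batch_no,
                    "id": rec["id"],
                    "field": field,
                    "old": rec.get(field),
                    "new": new_value,
                    "reason": verdict.get("reason", ""),
                })
                rec[field] = new_value
    return list(working.values()), log
```

Each log entry records the batch, field, old value, new value, and stated reason, so any correction can be traced back to a specific request.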
[JOB 008]

Automated X (Twitter) Content Pipeline

LIVE

Built a fully automated content pipeline that generates, schedules, and publishes posts to X (Twitter). It uses the Claude API for content generation, Make for orchestration, and Buffer for scheduling, and it runs hands-free.

INPUT Content prompts and topic seeds
PROCESS Claude API content generation -- Make scenario orchestration -- Buffer scheduling
OUTPUT Automated daily X posts published on schedule without manual intervention
Fully automated | Claude API powered | Hands-free publishing
Make, Claude API, Buffer, Python
CLUSTER 03 -- 2 PROJECTS

Analytics and Dashboards

[JOB 009]

Restaurant Menu Profitability Analysis

LIVE

Analyzed 547,918 POS transactions ($6.2M revenue) for a restaurant chain using menu engineering methodology. Classified every menu item as Star, Plowhorse, Puzzle, or Dog. Identified $209K-$271K in annual profit improvement opportunities.

INPUT 547,918 POS transactions totaling $6.2M in revenue
PROCESS Pandas ETL -- menu engineering matrix (Stars, Plowhorses, Puzzles, Dogs) -- profitability modeling
OUTPUT $209K-$271K projected annual profit improvement. Live Streamlit dashboard
547K transactions | $6.2M revenue analyzed | $209K-$271K improvement
Python, Pandas, Plotly, Streamlit
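The menu engineering matrix is usually built on two thresholds. The following is a minimal sketch, assuming the widely used Kasavana-Smith formulation. An item is popular if its share of units sold reaches 70% of an equal share (1/N). An item is profitable if its contribution margin (price minus food cost) reaches the sales-weighted average margin. The project's exact cut-offs are not stated, so the 0.7 factor is a parameter, and the example items are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class MenuItem:
    name: str
    units_sold: int
    price: float
    food_cost: float

    @property
    def margin(self) -> float:
        """Contribution margin per unit."""
        return self.price - self.food_cost


LABELS = {
    (True, True): "Star",        # popular and profitable
    (True, False): "Plowhorse",  # popular, below-average margin
    (False, True): "Puzzle",     # profitable but under-ordered
    (False, False): "Dog",       # neither
}


def classify_menu(items: list[MenuItem], popularity_factor: float = 0.7) -> dict[str, str]:
    """Assign each item a menu-engineering quadrant."""
    total_units = sum(item.units_sold for item in items)
    popularity_cutoff = popularity_factor / len(items)  # 70% of an equal share by default
    avg_margin = sum(item.margin * item.units_sold for item in items) / total_units
    return {
        item.name: LABELS[(item.units_sold / total_units >= popularity_cutoff,
                           item.margin >= avg_margin)]
        for item in items
    }
```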
[JOB 010]

Shopify Sales Performance Dashboard

COMPLETE

Built an interactive dashboard for a Shopify store processing 65,000+ transactions ($2.69M revenue). Covers KPI tracking, seasonal trends, payment method analysis, and heatmap visualizations for sales patterns.

INPUT 65,000+ e-commerce transactions totaling $2.69M revenue
PROCESS Pandas ETL -- KPI computation -- seasonal trend and payment method analysis
OUTPUT Interactive Streamlit dashboard with heatmaps, KPI cards, and trend charts
65K+ transactions | $2.69M revenue | Multi-dimensional analysis
Python, Pandas, Plotly, Streamlit
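The KPI layer reduces to a few grouped aggregations. The project used Pandas. This dependency-free sketch shows the same arithmetic, assuming hypothetical order dicts with `date`, `total`, and `payment_method` fields. In Pandas, the monthly and payment-method figures would typically come from a `groupby` on those columns.

```python
from collections import defaultdict
from datetime import date


def compute_kpis(orders: list[dict]) -> dict:
    """Headline KPIs plus monthly revenue and revenue share by payment method."""
    revenue = sum(order["total"] for order in orders)
    monthly: dict = defaultdict(float)
    by_method: dict = defaultdict(float)
    for order in orders:
        monthly[order["date"].strftime("%Y-%m")] += order["total"]
        by_method[order["payment_method"]] += order["total"]
    return {
        "revenue": revenue,
        "orders": len(orders),
        "average_order_value": revenue / len(orders) if orders else 0.0,
        "monthly_revenue": dict(sorted(monthly.items())),
        "payment_share": {m: v / revenue for m, v in by_method.items()} if revenue else {},
    }


# Hypothetical example orders.
SAMPLE_ORDERS = [
    {"date": date(2023, 1, 5), "total": 100.0, "payment_method": "card"},
    {"date": date(2023, 1, 20), "total": 50.0, "payment_method": "paypal"},
    {"date": date(2023, 2, 1), "total": 50.0, "payment_method": "card"},
]
```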
CLUSTER 04 -- 2 PROJECTS

Predictions / Modeling

[JOB 011]

Customer Churn Prediction Pipeline

COMPLETE

Full ML pipeline for predicting customer churn in a telecom dataset. Automated preprocessing, feature engineering, and model training with Random Forest and Logistic Regression. Evaluated with accuracy, precision, recall, F1, and ROC-AUC.

INPUT Telecom customer dataset with behavioral and usage features
PROCESS Automated preprocessing -- feature engineering -- Random Forest and Logistic Regression
OUTPUT Full ML pipeline with accuracy, precision, recall, F1, and ROC-AUC metrics
Multi-model comparison | Full evaluation suite | Automated pipeline
Python, Scikit-learn, Pandas, Jupyter
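The evaluation suite can be computed directly from predictions, which makes the definitions explicit. In this dependency-free sketch, precision, recall, F1, and accuracy come from confusion-matrix counts. ROC-AUC is computed as the probability that a randomly chosen churner is scored above a randomly chosen non-churner, with ties counting half. This pairwise form is O(n²) and is meant for illustration. In practice, `sklearn.metrics` provides `precision_score`, `recall_score`, `f1_score`, and `roc_auc_score` for the same quantities.

```python
def classification_metrics(y_true: list[int], y_pred: list[int]) -> dict[str, float]:
    """Accuracy, precision, recall, and F1 for binary labels (1 = churned)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": (tp + tn) / len(pairs), "precision": precision, "recall": recall, "f1": f1}


def roc_auc(y_true: list[int], scores: list[float]) -> float:
    """Probability that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```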
[JOB 012]

Lagos Rent Price Predictor -- ML Model

COMPLETE

The modeling layer of the Lagos Rent system. Trained a Random Forest regressor on engineered features from 10,000+ scraped listings. Deployed as a Flask API for real-time price predictions based on location, size, and property type.

INPUT 10,000+ cleaned property listings with engineered features
PROCESS Feature engineering -- Random Forest training -- hyperparameter tuning -- Flask deployment
OUTPUT Production Flask API serving rent price predictions for Lagos neighborhoods
10K+ training samples | Random Forest regressor | Production API
Python, Scikit-learn, Flask, Pandas
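This sketch shows how raw listings might be turned into a numeric matrix for a Random Forest regressor. The field names, the naira price format, and the category lists are hypothetical. The project's actual engineered features are not described beyond location, size, and property type. The resulting `X` and `y` are in the list-of-rows shape that scikit-learn's `RandomForestRegressor.fit(X, y)` accepts.

```python
import re
from typing import Optional


def parse_price(text: str) -> Optional[float]:
    """Extract the numeric rent from a string such as '₦ 2,500,000 / year'."""
    digits = re.sub(r"[^0-9]", "", text)
    return float(digits) if digits else None


def build_features(
    listings: list[dict], locations: list[str], property_types: list[str]
) -> tuple[list[list[float]], list[float]]:
    """Return (X, y): size features plus one-hot location and property type."""
    X, y = [], []
    for item in listings:
        price = parse_price(item["price"])
        if price is None:
            continue  # drop listings without a usable target
        loc = item["location"].strip().lower()
        ptype = item["type"].strip().lower()
        row = [float(item["bedrooms"]), float(item["bathrooms"])]
        # Unseen categories encode as all zeros rather than raising.
        row += [1.0 if loc == name else 0.0 for name in locations]
        row += [1.0 if ptype == name else 0.0 for name in property_types]
        X.append(row)
        y.append(price)
    return X, y
```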
CLUSTER 05 -- 2 PROJECTS

Environment / GIS

[JOB 013]

Groundwater Heavy Metal Contamination Study

COMPLETE

Spatial analysis of heavy metal concentrations in groundwater samples. Mapped contamination hotspots using GIS tools, assessed health risk indices, and produced visualizations for environmental compliance reporting.

INPUT Groundwater sampling data with heavy metal concentrations across multiple sites
PROCESS Spatial interpolation -- contamination mapping -- health risk index calculation
OUTPUT GIS maps of contamination hotspots with risk assessment documentation
Multi-site analysis | Health risk indices | Compliance-ready output
Python, QGIS, GeoPandas, Matplotlib
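The source does not name the interpolation method. Inverse distance weighting (IDW) is one common choice for groundwater data (kriging is another), so the following sketch assumes IDW with power 2. It estimates a concentration at an unsampled point from the sampled wells. The sketch also assumes coordinates in a projected system (metres), since plain Euclidean distance on latitude and longitude is distorted.

```python
import math


def idw(samples: list[tuple[float, float, float]], x: float, y: float, power: float = 2.0) -> float:
    """Estimate a value at (x, y) from (xi, yi, value) samples by inverse distance weighting."""
    numerator = denominator = 0.0
    for xi, yi, value in samples:
        distance = math.hypot(x - xi, y - yi)
        if distance == 0.0:
            return value  # the point coincides with a sampled well
        weight = 1.0 / distance ** power
        numerator += weight * value
        denominator += weight
    return numerator / denominator
```

Evaluating `idw` over a regular grid of points yields a raster that can be styled as a hotspot map. A higher `power` makes each estimate more local.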
[JOB 014]

Global Electricity vs GDP Dashboard

COMPLETE

Interactive dashboard exploring the relationship between electricity consumption and GDP across countries. Visualizes development patterns, energy intensity trends, and regional comparisons.

INPUT Global electricity consumption and GDP datasets across countries and years
PROCESS Data cleaning -- cross-country normalization -- trend and correlation analysis
OUTPUT Interactive Power BI dashboard with global maps, trend lines, and regional filters
Global coverage | Multi-year trends | Interactive filtering
Power BI, Python, Pandas
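Electricity use and GDP both span orders of magnitude across countries, so correlations are commonly computed on log-transformed values. The following is a minimal sketch. It assumes hypothetical rows of (country, electricity use per capita, GDP per capita) and defines energy intensity as electricity use per unit of GDP. The project's exact normalization is not stated.

```python
import math


def energy_intensity(electricity_kwh: float, gdp_usd: float) -> float:
    """Electricity used per dollar of GDP (kWh/USD)."""
    return electricity_kwh / gdp_usd


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)


def log_log_correlation(rows: list[tuple[str, float, float]]) -> float:
    """Correlate log10(electricity per capita) with log10(GDP per capita)."""
    xs = [math.log10(kwh) for _, kwh, _ in rows]
    ys = [math.log10(gdp) for _, _, gdp in rows]
    return pearson(xs, ys)
```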