Agentic Feature Mart: System Architecture
End-to-end ML Feature Platform with LangGraph Multi-Agent Orchestration · CarTrade Tech
Snowflake
LangGraph
Python + dbt
Argo Workflows
GPT-4o
Google Maps API
Kafka + Spark
Docker + K8s
AWS S3
① DATA SOURCES: Raw ingestion from 5 verticals
Marketing CRM
Campaign events, email opens, WhatsApp delivery, ROAS signals
MySQL CDC · REST API
Seller Platform
Listing data, renewals, seller profile, pricing history
PostgreSQL · Batch ETL
Clickstream
Page views, session depth, funnel events, dwell time per seller page
Kafka Topics · Spark Stream
Google Maps API
POI data, geocoding, distance matrix, reverse geocoding
REST API · GeoPandas
Web + Census
Regional intelligence, population density, income proxy, market data
Scrapy · CSV Ingest
② ETL + FEATURE ENGINEERING + LLM AUGMENTATION: Compute, transform & enrich
Python ETL
Data extraction, type normalization, deduplication, Snowflake connector writes
pandas · snowflake-connector · pydantic
SQL + dbt
Feature computation in Snowflake SQL. dbt models with tests, lineage DAG & docs
dbt-snowflake · dbt test · SQL macros
Kafka + Spark
Clickstream streaming. 5-min micro-batch aggregation → Snowpipe into staging tables
Kafka · PySpark · Snowpipe
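The 5-minute micro-batch aggregation can be illustrated without Spark. The sketch below is a plain-Python stand-in, not the PySpark job itself: it floors each event timestamp to its 5-minute window and aggregates per seller. The field names `seller_id`, `ts` and `dwell_seconds` are assumptions made for the example.

```python
from collections import defaultdict
from datetime import datetime, timezone

WINDOW_SECONDS = 5 * 60  # 5-minute micro-batch window


def window_start(ts: datetime) -> datetime:
    """Floor a timezone-aware timestamp to the start of its 5-minute window."""
    epoch = int(ts.timestamp())
    return datetime.fromtimestamp(epoch - epoch % WINDOW_SECONDS, tz=timezone.utc)


def aggregate(events):
    """Count page views and sum dwell time per (seller_id, window start)."""
    out = defaultdict(lambda: {"page_views": 0, "dwell_seconds": 0.0})
    for event in events:
        key = (event["seller_id"], window_start(event["ts"]))
        out[key]["page_views"] += 1
        out[key]["dwell_seconds"] += event["dwell_seconds"]
    return dict(out)


# Hypothetical events: two fall in the 10:00 window, one in the 10:05 window.
events = [
    {"seller_id": "S1", "ts": datetime(2024, 1, 1, 10, 1, tzinfo=timezone.utc), "dwell_seconds": 12.0},
    {"seller_id": "S1", "ts": datetime(2024, 1, 1, 10, 4, tzinfo=timezone.utc), "dwell_seconds": 8.0},
    {"seller_id": "S1", "ts": datetime(2024, 1, 1, 10, 6, tzinfo=timezone.utc), "dwell_seconds": 5.0},
]
batches = aggregate(events)
```

Each `(seller_id, window)` entry corresponds to one row that would be handed to the staging table.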
Geo Computation
POI radius counts, polygon intersections, catchment areas, density grids
GeoPandas · Shapely · H3 Index
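A POI radius count reduces to a distance test per point. The sketch below uses the haversine formula in plain Python instead of GeoPandas or H3, purely for illustration; the coordinates and the 2 km radius are made-up example values.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def poi_count_within(seller, pois, radius_km):
    """Number of POIs whose distance to the seller location is at most radius_km."""
    return sum(
        1 for lat, lon in pois
        if haversine_km(seller[0], seller[1], lat, lon) <= radius_km
    )


# Hypothetical seller location and POIs (about 0 km, 1 km and 14 km away).
seller = (19.0760, 72.8777)
pois = [(19.0760, 72.8777), (19.0850, 72.8777), (19.2000, 72.8777)]
count_2km = poi_count_within(seller, pois, radius_km=2.0)
```

At the scale of a full seller base, a spatial index such as the H3 grid named above would avoid the all-pairs comparison this sketch performs.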
GPT-4o LLM Layer
Area quality scoring, POI context analysis, neighbourhood classification, 1536-dim embeddings
GPT-4o API · text-embed-3 · JSON mode
③ PIPELINE ORCHESTRATION · CI/CD: Monthly automated, zero-touch
Argo Workflows
CronWorkflow triggers monthly. DAG with 6 sequential steps. 3 retries with backoff on failure. Slack alerts.
CronWorkflow · DAG steps · retry:3
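Argo applies the retry policy declaratively per step. The control flow it implies can be sketched in Python as below. The 3-retry limit comes from the description above, while the 1-second base delay and the doubling factor are assumptions, since the actual backoff settings are not stated.

```python
import time


def run_with_retries(step, retries=3, base_delay=1.0, factor=2.0, sleep=time.sleep):
    """Run step(); on failure, retry up to `retries` times with exponential backoff.

    base_delay and factor are illustrative defaults, not the pipeline's real values.
    """
    attempt = 0
    while True:
        try:
            return step()
        except Exception:
            if attempt >= retries:
                raise  # retries exhausted: surface the failure (e.g. to alerting)
            sleep(base_delay * factor ** attempt)
            attempt += 1


# Hypothetical step that fails twice, then succeeds.
calls = {"n": 0}


def flaky_step():
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("transient failure")
    return "ok"


delays = []  # record the waits instead of actually sleeping
result = run_with_retries(flaky_step, sleep=delays.append)
```

Injecting `sleep` keeps the example instant to run; with the default `time.sleep` the same call would wait 1 s and then 2 s before the third attempt.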
Docker + Kubernetes
Each pipeline step runs as an isolated K8s pod. Namespace: feature-pipelines. Auto-scaled by Argo.
Docker · K8s Pods · Helm Charts
Quality Gates
Null rate < 2%, PSI drift check, row count sanity, schema validation. Pipeline halts if any gate fails.
Great Expectations · PSI check
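Two of these gates are simple enough to write out. The sketch below computes a null rate and a Population Stability Index (PSI) in plain Python, independent of Great Expectations. The 2% null threshold is from the description above. The 0.2 PSI cut-off and the bin edges are assumptions, since the pipeline's actual drift threshold is not given.

```python
import math
from bisect import bisect_right


def null_rate(values):
    """Share of entries that are None."""
    return sum(v is None for v in values) / len(values)


def psi(expected, actual, edges, eps=1e-6):
    """PSI = sum((a_i - e_i) * ln(a_i / e_i)) over bins defined by `edges`."""
    n_bins = len(edges) - 1

    def shares(sample):
        counts = [0] * n_bins
        for x in sample:
            # Values outside the edges are clamped into the first or last bin.
            idx = min(max(bisect_right(edges, x) - 1, 0), n_bins - 1)
            counts[idx] += 1
        # eps keeps empty bins from producing log(0) or division by zero.
        return [max(c / len(sample), eps) for c in counts]

    e, a = shares(expected), shares(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


def passes_gates(column, baseline, edges, max_null=0.02, max_psi=0.2):
    """True only if both the null-rate gate and the PSI gate hold."""
    if null_rate(column) >= max_null:
        return False
    present = [v for v in column if v is not None]
    return psi(baseline, present, edges) <= max_psi
```

A pipeline step would call `passes_gates` per feature column and exit non-zero on the first `False`, which is what halts the run.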
Feature Stamping
Every row tagged with v{YYYY}.{MM} stamp. Enforces train-serve consistency. Registered in FEATURE_REGISTRY table.
version stamp · FEATURE_REGISTRY
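The stamp format can be made concrete with a small helper. This is a sketch under two assumptions: that the month is zero-padded (for example v2024.03), and that the stamp lives in a column called `feature_version`, a hypothetical name not given above.

```python
import re
from datetime import date

# Assumed format: "v" + 4-digit year + "." + zero-padded month.
STAMP_RE = re.compile(r"^v\d{4}\.(0[1-9]|1[0-2])$")


def make_stamp(run_date: date) -> str:
    """Build the v{YYYY}.{MM} stamp for a pipeline run date."""
    return f"v{run_date.year:04d}.{run_date.month:02d}"


def is_valid_stamp(stamp: str) -> bool:
    return bool(STAMP_RE.match(stamp))


def stamp_rows(rows, run_date: date):
    """Return copies of the rows tagged with the run's version stamp."""
    stamp = make_stamp(run_date)
    return [{**row, "feature_version": stamp} for row in rows]


def assert_train_serve_consistent(train_stamp: str, serve_stamp: str) -> None:
    """Refuse to serve features from a different version than the model was trained on."""
    if train_stamp != serve_stamp:
        raise ValueError(f"version mismatch: trained on {train_stamp}, serving {serve_stamp}")


stamped = stamp_rows([{"seller_id": "S1"}], date(2024, 3, 15))
```

Comparing stamps at scoring time is one simple way to enforce the train-serve consistency mentioned above.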
AWS S3
Intermediate artifacts, pipeline logs, final feature matrix exports. Prefixed per stamp version.
S3 Versioned · Parquet
④ SNOWFLAKE FEATURE MART: Single source of truth for all ML features
Snowflake · FEATURE_MART Schema
Warehouse: FEATURE_WH · Database: ANALYTICS_DB · Schema: FEATURE_MART
SELLER_GEO_FEATURES
SELLER_ENGAGEMENT_FEATURES
CITY_DEMOGRAPHICS_FEATURES
LLM_LOCATION_FEATURES
CLICKSTREAM_SELLER_90D
SELLER_TREND_FEATURES
PROPERTY_MARKET_FEATURES
MARKETING_ENGAGEMENT_FEATURES
FEATURE_REGISTRY
+ 41 more tables
⑤ AGENTIC LAYER · LangGraph Multi-Agent System: Conversational feature discovery & retrieval
Orchestrator Agent
Entry Point · Router
• GPT-4 function calling
• Parses NL feature requests
• Decomposes into sub-tasks
• Routes to specialist agents
• StateGraph shared memory
• Maintains conversation turns
LangGraph · GPT-4 · StateGraph
→
Table Recommender Agent
Semantic Search · Discovery
• Indexes FEATURE_REGISTRY
• Semantic match on 15K features
• Returns ranked table list
• Provides join keys & grain
• Relevance score per table
• Recommends filters
TF-IDF / Embed · FEATURE_REGISTRY
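The TF-IDF side of the table ranking can be sketched in a few lines of standard-library Python. This is an illustration of the technique, not the agent's code: the registry descriptions below are invented for the example, and the smoothed IDF variant is one common choice among several.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and split on anything that is not a letter or digit."""
    return re.findall(r"[a-z0-9]+", text.lower())


def rank_tables(query, registry):
    """Rank tables by TF-IDF cosine similarity between the query and name + description."""
    docs = {name: Counter(tokenize(name + " " + desc)) for name, desc in registry.items()}
    n = len(docs)
    df = Counter(term for counts in docs.values() for term in counts)
    idf = {t: math.log((1 + n) / (1 + d)) + 1.0 for t, d in df.items()}  # smoothed IDF

    def vec(counts):
        # Terms unseen in the registry carry no weight.
        return {t: c * idf[t] for t, c in counts.items() if t in idf}

    def cosine(u, v):
        dot = sum(w * v.get(t, 0.0) for t, w in u.items())
        nu = math.sqrt(sum(w * w for w in u.values()))
        nv = math.sqrt(sum(w * w for w in v.values()))
        return dot / (nu * nv) if nu and nv else 0.0

    q = vec(Counter(tokenize(query)))
    scored = [(name, cosine(q, vec(counts))) for name, counts in docs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Table names are from the feature mart; the descriptions are hypothetical.
registry = {
    "SELLER_GEO_FEATURES": "POI radius counts, catchment areas and density grids per seller location",
    "CLICKSTREAM_SELLER_90D": "page views, session depth and dwell time per seller page over 90 days",
    "CITY_DEMOGRAPHICS_FEATURES": "population density and income proxy per city",
}
ranked = rank_tables("poi density near seller location", registry)
```

The returned score doubles as the per-table relevance score; an embedding-based matcher would replace `vec` and `cosine` while keeping the same ranking interface.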
→
Data Fetcher Agent
SQL Gen · Snowflake Exec
• Auto-generates JOIN SQL
• Handles multi-grain alignment
• Executes on Snowflake
• Measures join success rates
• Returns quality report
• Null rate per column
SQL Gen · Snowflake · ToolNode
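A simplified picture of the JOIN generation and the null-rate report is sketched below. It assumes a single shared join key and grain, so it does not cover multi-grain alignment. The key `SELLER_ID` and the column names are hypothetical, and identifiers are validated before being interpolated so that only plain SQL names reach the query text.

```python
import re

_IDENT = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def _ident(name: str) -> str:
    """Reject anything that is not a plain SQL identifier (guards against injection)."""
    if not _IDENT.match(name):
        raise ValueError(f"unsafe identifier: {name!r}")
    return name


def build_join_sql(base, features, key, schema="ANALYTICS_DB.FEATURE_MART"):
    """LEFT JOIN each feature table onto the base table on a shared key.

    features maps table name -> list of columns to select from that table.
    """
    qualified = ".".join(_ident(part) for part in schema.split("."))
    key = _ident(key)
    aliases = {table: f"t{i}" for i, table in enumerate(features, start=1)}
    cols = [f"b.{key}"] + [
        f"{aliases[table]}.{_ident(col)}" for table, columns in features.items() for col in columns
    ]
    lines = ["SELECT " + ", ".join(cols), f"FROM {qualified}.{_ident(base)} AS b"]
    for table, alias in aliases.items():
        lines.append(f"LEFT JOIN {qualified}.{_ident(table)} AS {alias} ON {alias}.{key} = b.{key}")
    return "\n".join(lines)


def null_rates(rows, columns):
    """Null rate per column over the fetched rows (a list of dicts)."""
    return {c: sum(row.get(c) is None for row in rows) / len(rows) for c in columns}


sql = build_join_sql(
    "SELLER_ENGAGEMENT_FEATURES",
    {"SELLER_GEO_FEATURES": ["POI_COUNT_1KM"], "CLICKSTREAM_SELLER_90D": ["PAGE_VIEWS_90D"]},
    key="SELLER_ID",
)
# Hypothetical result rows: one seller has no clickstream match.
rows = [
    {"SELLER_ID": 1, "POI_COUNT_1KM": 12, "PAGE_VIEWS_90D": 340},
    {"SELLER_ID": 2, "POI_COUNT_1KM": 7, "PAGE_VIEWS_90D": None},
    {"SELLER_ID": 3, "POI_COUNT_1KM": 3, "PAGE_VIEWS_90D": 55},
    {"SELLER_ID": 4, "POI_COUNT_1KM": 9, "PAGE_VIEWS_90D": 18},
]
report = null_rates(rows, ["POI_COUNT_1KM", "PAGE_VIEWS_90D"])
```

With a LEFT JOIN, a column's null rate is an upper bound on the share of base rows that found no match in that table, which is why the same report can feed the join success rate.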
→
Feature Engineer Agent
Enrichment · Output
• Validates base feature set
• Proposes enrichment groups
• Asks clarifying questions
• Fetches additional tables
• Applies version stamp
• Exports to S3 as Parquet
Human-in-loop · S3 Export
Flow: User types natural language feature request → Orchestrator decomposes → Table Recommender surfaces tables → Data Fetcher writes & runs SQL on Snowflake → Feature Engineer enriches + asks follow-ups → Final versioned 1,141-feature matrix exported in ~10 min
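The hand-off between agents in this flow can be pictured as functions passing one shared state dictionary. The sketch below deliberately uses plain Python rather than the LangGraph API, and every agent body is a stub with placeholder outputs. It shows only the routing pattern, not the real agents.

```python
from typing import Callable, Dict, List

State = Dict[str, object]


def orchestrator(state: State) -> State:
    # Stub: a real orchestrator would use an LLM to decompose the request.
    return {**state, "plan": ["recommend", "fetch", "engineer"]}


def table_recommender(state: State) -> State:
    # Stub: placeholder table list instead of a registry search.
    return {**state, "tables": ["SELLER_GEO_FEATURES", "SELLER_ENGAGEMENT_FEATURES"]}


def data_fetcher(state: State) -> State:
    # Stub: records which tables would be joined instead of running SQL.
    return {**state, "sql": "-- JOIN over " + ", ".join(state["tables"])}


def feature_engineer(state: State) -> State:
    # Stub: placeholder version stamp; a real agent could pause for user input here.
    return {**state, "version": "v2024.03", "done": True}


AGENTS: Dict[str, Callable[[State], State]] = {
    "recommend": table_recommender,
    "fetch": data_fetcher,
    "engineer": feature_engineer,
}


def run(request: str) -> State:
    """Route a request through the planned agents, threading the shared state."""
    state = orchestrator({"request": request, "turns": [request]})
    visited: List[str] = []
    for step in state["plan"]:
        state = AGENTS[step](state)
        visited.append(step)
    return {**state, "visited": visited}


final_state = run("geo and engagement features for seller ranking")
```

In a graph framework the `plan` list would become edges between nodes, and clarifying questions would appear as a conditional edge that loops back to the user.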
⑥ MODEL CONSUMERS: 5+ product lines training on Feature Mart
Seller Prioritization
XGBoost ranker. Ranks sellers for sales outreach. ₹5Cr+ annual revenue attribution.
XGBoost · 1,141 features
CRM Scoring
Identifies sellers who don't need WhatsApp outreach. ₹1Cr/yr savings. 40% cost reduction.
Propensity · sklearn
Used Car Loan
Loan eligibility scoring using seller geo & engagement features as credit proxies.
LightGBM · Geo features
Personal Loan
Buyer income proxies and engagement signals for personal loan pre-qualification.
LogReg · Demographics
Buyer Model
Buyer intent & conversion scoring using clickstream and location context features.
Ensemble · Clickstream
Agentic Feature Mart · CarTrade Tech · ML Platform
15,000+ Features
50+ Snowflake Tables
₹6Cr+ Annual Value
5+ Product Lines
Monthly Auto-Pipeline