Job Description
SHYVA | Founding Engineer · Stealth · Enterprise AI · Remote
About
We are building something that should have existed a decade ago — in a market where the data is fragmented, unverified, and nobody has fixed it yet. The founder has been the customer for 25 years and knows exactly what is broken. Six Fortune 500 enterprises are already committed as design partners. We are in stealth and will stay there for now.
The Role
You will be one of the first engineering hires, working directly with the founder to build the core platform — a large-scale data intelligence system with an AI-native interface. The hard problems are data, not models: ingestion at volume, entity resolution across heterogeneous sources, auditability of every output, and a graph-based data model built to compound over time.
No platform team. No DevOps org. No PM handing you specs. Full architectural ownership from day one.
Must-Have
Full-Stack Engineering
• Python backend (FastAPI/Django) and React/Next.js frontend — you own the entire stack
• Cloud-native: AWS or GCP, Docker/Kubernetes
Large-Scale Data Engineering
• ETL/ELT pipelines at 10M+ record scale — Spark, dbt, Kafka, Airflow
• Experience ingesting and normalising licensed third-party commercial data feeds — bulk files, schema inconsistency, freshness tracking, provenance management
• Data lineage and auditability: every output traceable to a source record, timestamp, and confidence level
• Batch and event-driven ingestion patterns
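To make the lineage requirement above concrete, here is a minimal sketch of what "every output traceable to a source record, timestamp, and confidence level" can look like in Python. The field names (`source_id`, `ingested_at`, `confidence`) are illustrative assumptions, not our actual schema:

```python
from dataclasses import dataclass
from datetime import datetime, timezone

@dataclass(frozen=True)
class LineageTag:
    """Provenance attached to every derived output (illustrative fields)."""
    source_id: str         # identifier of the originating source record
    ingested_at: datetime  # when the record entered the pipeline
    confidence: float      # 0.0-1.0 score assigned at resolution time

def tag_output(value, source_id: str, confidence: float):
    """Pair a derived value with its lineage so it stays traceable end to end."""
    return value, LineageTag(source_id, datetime.now(timezone.utc), confidence)

value, tag = tag_output("ACME Corp", source_id="feed-a/row-1041", confidence=0.93)
```

In practice the tag would be persisted alongside the value, but the principle is the same: no output leaves the system without a pointer back to its source.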
Graph Data Modeling
• Neo4j or graph layers on relational DBs: node/edge schema design, relationship versioning, provenance preservation
• Graph traversal for network analysis and entity influence ranking
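As a flavour of the influence-ranking work above, here is a toy plain-Python PageRank over a directed edge list. It is a stand-in sketch only; the real system would run graph-native algorithms (e.g. in Neo4j), and the damping factor and iteration count here are conventional defaults, not tuned values:

```python
def influence_rank(edges, damping=0.85, iters=50):
    """Toy PageRank over directed (src, dst) edges: nodes with more
    inbound links from well-ranked nodes score higher."""
    nodes = {n for e in edges for n in e}
    out = {n: [] for n in nodes}
    for src, dst in edges:
        out[src].append(dst)
    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iters):
        nxt = {n: (1 - damping) / len(nodes) for n in nodes}
        for src in nodes:
            if out[src]:
                share = damping * rank[src] / len(out[src])
                for dst in out[src]:
                    nxt[dst] += share
            else:  # dangling node: spread its rank evenly
                for n in nodes:
                    nxt[n] += damping * rank[src] / len(nodes)
        rank = nxt
    return rank

# "b" has two inbound edges, so it outranks "c", which has none
r = influence_rank([("a", "b"), ("c", "b"), ("b", "a")])
```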
Entity Resolution & Deduplication
• Probabilistic record linkage, fuzzy matching, multi-attribute scoring at volume
• Blocking strategies for large record pools (LSH, phonetic encoding, prefix blocking)
• Canonical entity management with merge history and audit trail
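A minimal sketch of the blocking-plus-scoring pattern described above, using prefix blocking and stdlib fuzzy matching. The 4-character key and the 0.6 threshold are illustrative assumptions; at real volume you would swap in LSH or phonetic keys and a trained multi-attribute scorer:

```python
from difflib import SequenceMatcher
from itertools import combinations

def block_key(name: str, k: int = 4) -> str:
    """Prefix blocking: only records sharing a normalised k-character
    prefix are compared, avoiding O(n^2) over the full pool."""
    return "".join(ch for ch in name.lower() if ch.isalnum())[:k]

def candidate_pairs(records):
    """Group records by block key, then enumerate pairs inside each block."""
    blocks = {}
    for rec in records:
        blocks.setdefault(block_key(rec), []).append(rec)
    for group in blocks.values():
        yield from combinations(group, 2)

def similarity(a: str, b: str) -> float:
    """Crude string similarity; a real linker scores multiple attributes."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

records = ["Acme Corp", "ACME Corporation", "Globex Ltd"]
matches = [(a, b) for a, b in candidate_pairs(records) if similarity(a, b) > 0.6]
```

Note that "Globex Ltd" never enters a comparison: blocking pruned it before any scoring ran, which is the entire point at 10M+ record scale.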
LLM & Agent Orchestration
• LangChain, LangGraph, CrewAI or custom orchestrators — shipped multi-step agent workflows in production
• RAG pipelines: hybrid retrieval, chunking, reranking
• Guardrail architecture: post-generation validation, uncertainty flagging, stale-data detection
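To illustrate the guardrail idea above without tying it to any orchestration framework, here is a hedged sketch of post-generation validation: flag uncited answers and stale sources before anything reaches the user. The `as_of` field and the 90-day threshold are assumptions for the example:

```python
from datetime import datetime, timedelta, timezone

MAX_AGE = timedelta(days=90)  # assumed freshness threshold

def validate_answer(answer: str, cited_sources: list) -> dict:
    """Post-generation checks: flag answers with no citations or stale
    sources. Each source dict is assumed to carry an 'as_of' timestamp."""
    now = datetime.now(timezone.utc)
    flags = []
    if not cited_sources:
        flags.append("uncited")
    if any(now - s["as_of"] > MAX_AGE for s in cited_sources):
        flags.append("stale_data")
    return {"answer": answer, "flags": flags, "needs_review": bool(flags)}

flagged = validate_answer("Revenue grew 12%.", [])
fresh = validate_answer("Revenue grew 12%.",
                        [{"as_of": datetime.now(timezone.utc)}])
```

The validator runs after generation, so it works the same whether the answer came from LangGraph, CrewAI, or a custom orchestrator.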
Document Extraction
• OCR pipelines and structured extraction from complex business documents
• Field normalisation across currencies, date formats, and units of measure
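A small sketch of the field-normalisation problem above, assuming a handful of known feed formats (the format list and the `"1,250.00 USD"` amount convention are assumptions for illustration):

```python
from datetime import date, datetime

DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y", "%b %d, %Y")  # assumed feed formats

def normalise_date(raw: str) -> date:
    """Try each known feed format until one parses."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(raw.strip(), fmt).date()
        except ValueError:
            continue
    raise ValueError(f"unrecognised date: {raw!r}")

def normalise_amount(raw: str):
    """Split '1,250.00 USD'-style strings into (currency, value)."""
    value, currency = raw.rsplit(" ", 1)
    return currency.upper(), float(value.replace(",", ""))
```

The hard part at feed scale is not the parsing itself but tracking which format each provider uses and detecting when it silently changes.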
Semantic & Vector Search
• Elasticsearch, pgvector, Weaviate, or Pinecone — hybrid retrieval at scale
Background
• CS or Electrical Engineering degree from a strong institution
• 6–10 years hands-on; at least one role with genuine end-to-end ownership
Strong Plus
• Startup or early-stage experience — comfortable without guardrails
• Supply chain, procurement, or trade finance domain knowledge
• Multi-source data reconciliation across heterogeneous commercial providers
• Enterprise system connectors (SAP Ariba, Oracle, or similar)
What We Offer
• Founding engineer equity
• Direct collaboration with a domain expert founder — no translation layer between you and the customer problem
• Real customers from day one — six Fortune 500 design partners already committed
• Full architectural ownership
• India remote
How to Apply
Skip the cover letter. Answer three questions:
• What is the most technically complex data system you have built? What made it hard?
• Describe an architectural decision you made with incomplete information. What did you decide and why?
• What draws you to a role where the hardest problems are data quality and trust, not model performance?
Include a link to something you built that is running right now.