**Job Title:** Senior AI Systems / ML Infrastructure Engineer
**Location:** Onsite @ EMILO VENTURES PRIVATE LIMITED, Raipur, Chhattisgarh
**Experience:** 1+ years
**About the Role**
We are building a **scalable, multi-modal AI system** that handles text, image, and audio workloads using a combination of CPU and GPU services.
We are looking for a **Senior AI Systems Engineer** who can design, optimize, and scale AI pipelines — focusing on **performance, cost efficiency, and reliability**.
This role sits at the intersection of:
- AI/ML
- Backend systems
- Distributed architecture
- Infrastructure & performance optimization
**Key Responsibilities**
- Design and build **scalable AI pipelines** for text, image, and audio processing
- Optimize **GPU and CPU utilization** for cost and performance
- Implement **batching, queuing, and concurrency control** for high throughput
- Architect **CPU vs GPU service separation**
- Integrate and manage models such as:
- LLMs (Qwen / similar)
- Whisper (audio)
- Embedding models
- Moderation models (toxicity, sentiment, etc.)
- Build and manage **event-driven systems (NATS/Kafka)**
- Optimize model loading strategies (lazy loading, caching, quantization)
- Handle **OOM issues, latency bottlenecks, and scaling challenges**
- Design **fallback systems** (CPU fallback when GPU unavailable)
- Collaborate with product teams to balance **quality vs cost**
**Required Skills**
**AI / ML (Practical)**
- Experience with Transformers (Hugging Face ecosystem)
- Understanding of embeddings, NLP, and basic CV/audio models
- Experience deploying models in production (not just training)
**Backend & Systems**
- Strong Python (asyncio, multithreading, multiprocessing)
- Experience with FastAPI / backend frameworks
- Experience with **message queues (NATS / Kafka / RabbitMQ)**
**GPU & Performance Optimization**
- Experience working with GPU workloads (basic CUDA usage)
- Understanding of:
- VRAM management
- Batching
- Quantization (4-bit / 8-bit)
- Model loading/unloading strategies
- Ability to debug **OOM and latency issues**
**Infrastructure & DevOps**
- Docker (required)
- Cloud platforms (AWS / GCP / Azure)
- Experience with GPU instances (T4, A10, etc.)
- Autoscaling and cost optimization
**System Design**
- Microservices architecture
- CPU vs GPU workload separation
- High-throughput system design
- Fault-tolerant distributed systems
**Good to Have**
- Experience with **vLLM / TGI**
- Experience with **Prometheus / Grafana (monitoring)**
- Knowledge of **Kubernetes**
- Experience handling **real-time AI systems**
**What We Care About**
- Ability to **optimize systems, not just write code**
- Strong understanding of **trade-offs (cost vs latency vs quality)**
- Real-world experience with **production AI systems**
- Ownership mindset and problem-solving ability
**Why Join Us?**
- Work on **real-world AI systems at scale**
- Solve **challenging performance and cost problems**
- Build systems involving **LLMs, audio, and multi-modal AI**
- High ownership and impact
Job Types: Full-time, Permanent
Pay: ₹300,000.00 - ₹900,000.00 per year
Work Location: In person