SDE III – AI Software Engineer – RAG – Vector Database

Job Details
Full Time

Full Job Description

What You’ll Do

  • Architect, build, and scale agentic RAG and text-to-SQL copilots supporting 50K+ daily queries, delivering 99.9% uptime, low latency, and high semantic accuracy.
  • Design, operate, and continuously optimize a production-grade LLMOps platform, leveraging LangGraph, LangSmith, MLflow, Kubernetes, async inference, and leading cloud LLM providers such as AWS Bedrock, Google Vertex AI, Azure OpenAI, and Anthropic.
  • Develop and own MCP server integrations, ensuring reliable, efficient, and secure runtime execution across multi-agent workflows and toolchains.
  • Implement evaluation and guardrail frameworks (AI-as-a-Judge, grounding checks, safety filters, regression tests) to minimize hallucinations, control model drift, and reduce token usage and inference costs by 30%+.
  • Own end-to-end system observability and performance, including latency, throughput, reliability, cost optimization, caching strategies, and retrieval quality.
  • Optimize inference, retrieval, and orchestration pipelines to support high-traffic, enterprise-scale workloads.
  • Partner closely with product, infrastructure, and leadership teams to define SLAs, unblock customer requirements, and deliver robust, enterprise-ready AI capabilities.
  • Leverage AI-assisted development tools (GitHub Copilot, MCP-enabled IDEs, Claude, GPT, etc.) to improve development velocity, code quality, and system reliability.

What We’re Looking For

  • 5+ years of experience in software engineering or ML engineering, with hands-on ownership of production-grade LLM, RAG, or agent-based systems.
  • Strong Python engineering expertise, with deep experience building RAG pipelines, agent architectures, tool-calling workflows, and text-to-SQL copilots.
  • Proven experience working with MCP servers, vector databases, and retrieval-augmented system architectures.
  • Strong understanding of agent development, LLM integration patterns, prompt engineering, and runtime orchestration frameworks.
  • Hands-on experience with cloud-native infrastructure, including Kubernetes, async workers, queueing systems, and observability/monitoring stacks.
  • Demonstrated ability to build LLM evaluation pipelines, guardrails, monitoring, experiment tracking, and regression testing for AI systems.
  • Experience with multiple agent SDKs, such as:
      • Anthropic SDK
      • Claude Agent SDK
      • Google ADK (Agent Development Kit)
      • Bonus: LangChain, LlamaIndex, AutoGen, or custom agent runtimes
  • Strong ownership mindset, with a track record of taking AI prototypes from concept to scalable, reliable, high-traffic production systems.


Follow CareerXperts on LinkedIn: CareerXperts Consulting