SDE III – AI Software Engineer – RAG – Vector Database

Job Details
Full Time

Full Job Description

What You’ll Do

  • Architect, build, and scale agentic RAG and text-to-SQL copilots supporting 50K+ daily queries, delivering 99.9% uptime, low latency, and high semantic accuracy.
  • Design, operate, and continuously optimize a production-grade LLMOps platform, leveraging LangGraph, LangSmith, MLflow, Kubernetes, async inference, and leading cloud LLM providers such as AWS Bedrock, Google Vertex AI, Azure OpenAI, and Anthropic.
  • Develop and own MCP server integrations, ensuring reliable, efficient, and secure runtime execution across multi-agent workflows and toolchains.
  • Implement evaluation and guardrail frameworks (AI-as-a-Judge, grounding checks, safety filters, regression tests) to minimize hallucinations, control model drift, and reduce token usage and inference costs by 30%+.
  • Own end-to-end system observability and performance, including latency, throughput, reliability, cost optimization, caching strategies, and retrieval quality.
  • Optimize inference, retrieval, and orchestration pipelines to support high-traffic, enterprise-scale workloads.
  • Partner closely with product, infrastructure, and leadership teams to define SLAs, unblock customer requirements, and deliver robust, enterprise-ready AI capabilities.
  • Leverage AI-assisted development tools (GitHub Copilot, MCP-enabled IDEs, Claude, GPT, etc.) to improve development velocity, code quality, and system reliability.

What We’re Looking For

  • 5+ years of experience in software engineering or ML engineering, with hands-on ownership of production-grade LLM, RAG, or agent-based systems.
  • Strong Python engineering expertise, with deep experience building RAG pipelines, agent architectures, tool-calling workflows, and text-to-SQL copilots.
  • Proven experience working with MCP servers, vector databases, and retrieval-augmented system architectures.
  • Strong understanding of agent development, LLM integration patterns, prompt engineering, and runtime orchestration frameworks.
  • Hands-on experience with cloud-native infrastructure, including Kubernetes, async workers, queueing systems, and observability/monitoring stacks.
  • Demonstrated ability to build LLM evaluation pipelines, guardrails, monitoring, experiment tracking, and regression testing for AI systems.
  • Experience with multiple agent SDKs, such as:
      • Anthropic SDK
      • Claude Agent SDK
      • Google ADK (Agent Development Kit)
      • Bonus: LangChain, LlamaIndex, AutoGen, or custom agent runtimes
  • Strong ownership mindset, with a track record of taking AI prototypes from concept to scalable, reliable, high-traffic production systems.


Follow CareerXperts on LinkedIn: CareerXperts Consulting