Senior Data Engineer | Greenfield Data Engineering Role | Unifying 100+ Security Tools | Led by Seasoned Cybersecurity Founders | Pune
- 55K-65K
Full Job Description
Role Overview
This is a greenfield Senior Data Engineer (8+ years) role focused on building the core data foundation for a cybersecurity product. You’ll design, build, and operate large-scale data pipelines and connector systems that handle high-volume security data from multiple external platforms. You’ll own the work end to end: ingestion, normalization, processing, reliability, and data quality. You’ll work closely with backend, platform, and AI teams to ensure the data is production-grade, scalable, and ML-ready.
What you’ll own
Build and run scalable, fault-tolerant data pipelines for cybersecurity data such as logs, events, alerts, and vulnerabilities
Design and maintain Apache Airflow DAGs for batch, incremental, and event-driven ingestion, including backfills and reprocessing
Define schemas and normalization layers across diverse security data sources
Implement strong data quality checks including validation, deduplication, and consistency guarantees
Build and extend a connector framework to ingest data from SIEM, EDR, Vulnerability Management (VM), CSPM, and CNAPP platforms
Design API-based, streaming, and batch ingestion pipelines with proper auth, secrets management, rate limits, retries, and state handling
Optimize pipelines for performance, scale, and cost in cloud environments
Set up observability for pipelines with metrics, logging, alerting, and health monitoring
Own data systems from design to production support
Partner with backend and AI/ML teams to deliver analytics- and ML-ready datasets and validate data accuracy using real security tools
What you bring
8+ years of hands-on data engineering experience
Strong track record of building production-grade data pipelines at scale
Deep expertise with Apache Airflow including DAG design, scheduling, monitoring, and backfills
Strong Python skills are mandatory; Go is a plus
Experience working with large volumes of structured and semi-structured data like JSON, logs, and events
Solid understanding of distributed systems and data processing patterns
Experience with cloud platforms such as AWS, GCP, or Azure
Familiarity with CI/CD, Docker, Kubernetes, and cloud-native deployments
Strong preference for experience with cybersecurity or observability data including SIEM, EDR, Vulnerability Management, CSPM, CNAPP, or SOC platforms