Implement and manage AIOps platforms to enable intelligent alerting, anomaly detection, and root cause analysis
Integrate AIOps solutions with observability tools (e.g., New Relic) and incident management platforms (e.g., PagerDuty)
Develop event correlation rules, noise reduction strategies, and predictive analytics for proactive incident response
Collaborate with SRE, cloud, and application teams to embed AIOps into CI/CD pipelines and production workflows
Automate operational tasks and remediation workflows using scripting and orchestration tools
Monitor and fine-tune AIOps models for accuracy, performance, and relevance
Support observability and RunOps strategies with data-driven insights and continuous improvement initiatives
Hands-on experience with AIOps platforms such as Moogsoft, BigPanda, Dynatrace, or Splunk ITSI
Strong understanding of observability practices: metrics, logs, and traces
Familiarity with observability tools such as New Relic, Datadog, and Prometheus
Proficiency in scripting languages: Python, Bash, PowerShell
Experience with cloud platforms: AWS (preferred), Azure, or GCP
Solid grasp of DevOps/SRE practices and CI/CD integration
Experience with event correlation, anomaly detection, and ML-based alerting
Familiarity with internal developer platforms (IDPs) and developer enablement tools
Experience using AI-assisted coding tools such as GitHub Copilot for automation and code generation