Site Reliability Engineer
Company: The Josef Group
Location: Herndon
Posted on: February 1, 2025
Job Description:
TS/SCI or TS/SCI Poly
Herndon, VA
We are seeking a Site Reliability Engineer (SRE) to join our OpenShift
PaaS organization. You will be responsible for ensuring the
availability, performance, and scalability of our OpenShift
environments. You will collaborate with development, operations,
and product teams to automate processes, build robust monitoring
systems, and enhance the overall reliability of our platforms.
Key Responsibilities:
- System Reliability & Scalability: Design, implement, and
maintain highly available OpenShift clusters to support
mission-critical applications.
- Automation & Infrastructure as Code (IaC): Develop and maintain
automation scripts and tools to streamline deployment, scaling, and
recovery processes using tools like Ansible, Terraform, and
Helm.
- Monitoring & Incident Management: Build and enhance monitoring
and alerting systems (e.g., Prometheus, Grafana, ELK). Respond to
and resolve incidents, conducting post-mortem analyses to identify
root causes.
- Performance Optimization: Analyze and optimize system
performance, ensuring minimal latency and maximum throughput.
- Collaboration: Work closely with development teams to implement
DevOps best practices, CI/CD pipelines, and platform
enhancements.
- Security & Compliance: Ensure platforms meet security and
compliance requirements by integrating tools for vulnerability
scanning, policy enforcement, and logging.
Required Skills:
- Bachelor's degree in Computer Science, Engineering, or
equivalent experience.
- Minimum of 5 years of experience as an SRE, DevOps Engineer, or
related role.
- Expertise in OpenShift or Kubernetes platform
administration.
- Strong knowledge of Linux systems, networking, and
containerization technologies (Docker).
- Proficiency in scripting languages such as Python, Bash, or
Go.
- Experience with CI/CD pipelines (e.g., Jenkins, GitLab
CI/CD).
- Familiarity with monitoring and logging tools like Prometheus,
Grafana, ELK, or Splunk.
Desired Skills (Optional):
- OpenShift certification (e.g., Red Hat Certified Specialist in
OpenShift Administration).
- Experience with cloud platforms (AWS, Azure, or GCP).
- Knowledge of service mesh technologies (Istio, Linkerd).
- Strong understanding of microservices and distributed systems
architecture.