Our client, a leading financial services company, is hiring a Site Reliability Engineer on a long-term contract basis.
Alpharetta, GA (Hybrid Working Model)
As a Site Reliability Engineer (SRE), you’ll help build a meaningful engineering discipline, combining software and systems to develop creative engineering solutions to operations problems. Much of our support and software development focuses on optimizing existing systems, building infrastructure, and reducing work through automation. Our team of Site Reliability Engineers partners with product development teams to design, code, test, run, and evolve systems that maximize feature velocity while achieving high availability, performance, and correctness goals. Through this partnership, you will help drive adoption of modern reliability practices such as SLOs, error budget policies, actionable alerts, follow-the-sun on-call, incident retrospectives, chaos testing, infrastructure as code, and end-to-end ownership. Our work integrates with many teams across many organizations and requires highly technical, innovative, flexible thinkers with excellent communication skills and a passion for creating efficiencies through automation and tooling that enable developer velocity in our growing software development community.
The successful candidate will have excellent written and oral communication skills and cross-organizational skills, and will thrive in a highly innovative, fast-paced, evolving, and ambiguous environment. This requires the ability to work well in a team environment and build strong partnerships across different groups. SREs are crucial contributors to the product teams that deliver applications with high traffic, large data volumes, or demanding availability and performance requirements. In this position, you’ll take an active role in advancing our use of AWS and Azure to proactively improve infrastructure delivery.
A successful candidate:
– Will want to be a key part of enhancing customer engagement and enabling business processes through modern technology platforms and solutions;
– Is interested in the deployment, configuration, monitoring, security, performance, and high availability of cloud infrastructure and cloud-hosted applications;
– Cares about infrastructure as code and continuous compliance;
– Seeks a collaborative relationship with business partners and peer teams;
– Is an active learner who routinely engages with the global development community through books, blogs, podcasts, local communities, or social media;
– Will want to work with people they can learn from and who are open to their views, expertise, and mentorship.
Responsibilities:
– Work with internal teams to deliver substantially improved platforms to manage our hybrid cloud environment
– Share responsibility for the health, scalability, and availability of our cloud services
– Automate deployment of AWS infrastructure and services
– Work with the team to ensure cloud architecture meets scalability, availability, and cost requirements
– Follow good operational practices such as the creation of incident tickets along with maintaining documentation as needed
– Design, code, test, and deliver software to automate manual operational work
– Troubleshoot priority incidents, facilitate blameless post-mortems and ensure permanent closure of incidents
– Engage with development teams throughout the life cycle to help design software for reliability and scale, minimizing later refactoring or changes
– Identify application patterns and analytics in support of better service level objectives
– Design self-healing and resiliency patterns
– Design automated solutions for software and product upgrades and for release management
– Design, develop, ship, and champion software and systems that increase product reliability and organizational efficiency
– Protect the infrastructure and core applications from configuration drift using config management tooling for compliance
– Be an AWS platform expert in the core infrastructure areas (VPC, EC2, RDS, S3, etc.)
– Contribute to the codebase using Terraform and other automation and scripting languages
Required Skills & Experience:
– BS degree in Computer Science or a related technical field, or equivalent practical experience
– 5+ years of IT infrastructure experience overall
– Obsessive need to automate. Relevant languages and tools include Bash, Python, PowerShell, and the AWS CLI
– 3+ years of experience migrating and managing AWS workloads
– Familiarity with AWS services such as EC2, VPC, S3, ELB, RDS, Route 53, and more
– AWS Certified Solutions Architect (CSA) certification
– At least 3 years of experience as an on-call senior DevOps, SRE, or Cloud Operations engineer
– At least 2 years of experience implementing Terraform best practices for infrastructure in AWS
– Proven track record of designing, building, sizing, optimizing, and maintaining cloud infrastructure in AWS and Azure
– Proven track record of designing, implementing, and maintaining full build/release pipelines in a cloud environment (Jenkins/TeamCity/GitLab/GitHub Actions experience preferred)
– Understanding of and experience implementing security best practices in AWS, Linux, and Kubernetes, including penetration testing, internal vulnerability analysis, and incident response
– Experience in monitoring, system performance data collection and analysis, and reporting