Site Reliability Engineer: Search Platforms
At Wayfair, we are looking to strengthen and grow our Production Operations team by bringing on talented Site Reliability Engineers to join the platform team that manages the large-scale physical and virtual server environment underpinning our global e-commerce platform. In this role, you will work with and be exposed to a wide variety of systems and technologies. The team is moving aggressively toward an “Infrastructure as Code” model, and we are looking for someone with an automation mindset.
What You’ll Do:
- Own SLAs for production systems
- Drive faster MTTD/MTTR for critical systems
- Independently troubleshoot Sev1/Sev2 incidents
- Own, manage, and support baseline operating system images and templates
- Support CI/CD
- Maintain and review DSC/Puppet modules and support the provisioning infrastructure
- Own and operate all package repositories (Python, Java, RPM, etc.) using tools such as Pulp and Artifactory
- Drive efficiencies across hybrid cloud (GCP, Azure, on-prem)
- Find innovative ways to improve production operational efficiency
- Drive scalability and operability of supported systems/infrastructure
- Own production systems scaling & throughput
- Consult with other teams on systems architecture for new and existing production systems
- Participate in on-call rotation
- Create and maintain detailed documentation
Some of our larger initiatives include:
- End-to-end automation of system builds
- Ongoing scaling of our platform to support forecasted holiday traffic
- Puppet module standardization and improving automation of systems, processes, and services
- Data center expansion and moving to the cloud
What You’ll Need:
- BA or BS degree from a 4-year college or university desired
- Minimum of three years of systems administration, Site Reliability, Platform, or DevOps experience
- Experience with infrastructure including, but not limited to, data center operations, server hardware, web servers (IIS, JBoss, etc.), databases (MS SQL, MySQL, MongoDB), virtualization (VMware), networking, storage, and monitoring
- Experience with scripting and programming languages (PowerShell, Python, etc.)
- Experience with .NET and REST APIs is a plus
- Experience with continuous integration platforms such as Jenkins, Bamboo, or GitLab CI
- Understanding of Agile, ITIL, and DevOps practices such as CI/CD and automated testing
- Experience with JVM tuning and optimal configuration
- Java development experience
- Experience with Solr, Cassandra, Hadoop, Lucene and/or Elasticsearch.
- Experience with A/B and multivariate testing.