SalesforceIQ Senior Site Reliability Engineer
SalesforceIQ is built to power the world's customer relationships with products they love. Our relationship intelligence platform reimagines how sales organizations can automate their businesses by capturing and processing the millions of digital signals sent every day. Constructing delightful solutions from a sea of data and complexity is what we do. Our team moves with the speed, independence, and culture of a startup, but with the full support of a rapidly growing Fortune 500 business. We believe strongly in empowering engineers to create elegant and beautiful solutions for our customers: we push code daily, prize unique ideas, and take the time to enjoy the moments along the way.
Senior Site Reliability Engineers at SalesforceIQ are hybrid software/systems engineers who ensure that SalesforceIQ's services run smoothly and have the capacity for future growth. You will be responsible for managing our production services and will be working very closely with developers and other Ops teams to ensure reliability, scalability and performance of our cloud infrastructure.
- Develop and deliver configuration and deployment automation required for improving the functionality, availability, and manageability of our microservices using Python or Ruby and configuration automation tools such as Puppet, Chef, or Ansible.
- Build infrastructure and application monitoring by gathering application and system metrics, and implement tools for recovery.
- Troubleshoot availability/performance problems and build software-based solutions to prevent recurrences.
- Define and evangelize cloud-related optimizations and best practices to improve reliability and performance.
- Perform code reviews, evaluate implementations, and provide feedback about potential tool improvements.
- Participate in an on-call rotation alongside the engineers.
- BS in Computer Science or equivalent practical experience.
- Minimum 4 years of experience in production service troubleshooting that spans applications, systems, and networks.
- Experience building systems on cloud technology (AWS, GCE, Rackspace, OpenStack).
- Experience with queuing/data-pipelining solutions (Kafka, Storm, Flink, Spark, Amazon Kinesis, etc.).
- Experience with one or more configuration management tools such as Puppet, Chef, or Ansible.
- Experience with container technologies and orchestration layers (Docker, Vagrant, Mesos, Marathon, etc.).
- Demonstrated coding skills, preferably in Python, Ruby, or Java.
- Demonstrable knowledge of UDP, TCP/IP, HTTP, and distributed systems.
- Solid understanding of application design, including the operational trade-offs of various designs.
- Experience working with Unix/Linux operating systems and shell scripting.
- Excellent analytical skills, coupled with a strong sense of ownership, urgency, and drive.
- Ability to work independently and collaboratively with multiple partners.
- Comfortable with Agile methodologies and working within small teams.