The main purpose of this position is to implement and maintain a comprehensive Enterprise Monitoring solution that supports high-availability services through proactive and predictive monitoring and alerting in a 24x7 enterprise production environment. The solution covers network infrastructure, server hardware, operating systems, applications, and business processes.
The Enterprise Monitoring Engineer will focus on integrating our current monitoring and automation tools and technologies (AppDynamics, SCOM, Nagios, OEM) into a comprehensive solution that covers complete business transactions across the enterprise, and on providing alerting and management reports for key metrics, measurements, and trends.
The Enterprise Monitoring Engineer will also help identify and implement new monitoring capabilities and/or tools across IT systems. This includes working with IT departments and third-party vendors to produce and document a monitoring schema for every technology tier and stack that makes up each enterprise business function.
ROLES AND RESPONSIBILITIES:
- Provide system engineering for Enterprise Monitoring Systems (AppDynamics, OEM, SCOM, Entuity, xMatters, Nagios), including systems architecture, monitoring strategy, operational deployments, application design, and maintenance and administration.
- Engage with subject matter experts, from network to application teams, to define, deploy, and maintain system and service monitors.
- Engage Help Desk and Support teams and management to review current policies and procedures, identify shortcomings, and create a plan of action to address areas needing improvement.
- Work with other IT departments and vendors to plan and implement new features, enhancements, and upgrades.
- Document supporting policies, processes and procedures.
- Provide training as needed to operations teams regarding alarm correlation and threshold setting.
- Assist in the installation, maintenance, and general support of monitoring systems.
- Routinely review monitoring systems and services to ensure stability and security.
- Assist in interpreting diagnostic data obtained from monitoring solutions.
- Provide implementation support for custom monitoring requirements.
- Create and test all monitoring scripts.
- Manage the installation of new software releases and patches that resolve monitoring-related software problems.
- Participate as an Enterprise Monitoring resource on business and IT projects.
- Participate in application troubleshooting efforts to identify monitoring gaps and areas for improvement.
- Provide planning and monitoring guidance to support teams.
- Identify, diagnose, and resolve technical monitoring problems.
- Serve as the system administrator for all Enterprise Monitoring Systems.
- Provide capacity, performance, and availability reports for assigned systems.
- Define and recommend monitoring standards for fault detection, availability, capacity, and performance trending.
- Develop and distribute trend reports detailing availability, performance, and capacity metrics.
- Engineer methods to optimize the availability, capacity, performance and cost of assigned applications and services.
- Research and design new monitors that meet the needs of the engineering teams.
REQUIRED TECHNICAL SKILLS:
- Minimum of 3 years of system administration experience with Enterprise Monitoring Systems.
- Minimum of 3 years of experience with server management products such as AppDynamics, OEM, and Microsoft SCOM.
- Minimum of 3 years of networking experience in an enterprise environment.
- Minimum of 3 years of experience working with both Windows and UNIX-based systems in an enterprise environment, including advanced shell scripting.
- Advanced knowledge of Enterprise Monitoring metrics, reporting, logging and best practices.
- Experience with SNMP, TCP/IP and core LAN/WAN principles.
- Ability to perform network traffic analysis using network capture tools.
- Ability to translate low-level monitoring metrics into consolidated management-level reports.
- Bachelor's degree in engineering, computer science, management information systems, or a related field.
OTHER KEY QUALIFICATIONS:
- Creativity in proactively developing approaches to problems, recommending mitigation actions to management, and implementing those recommendations.
- Highly motivated self-starter with the ability to work independently.
- Strong team player with a willingness to do whatever it takes to complete assignments on time and with high quality.
- Excellent communication skills, both written and oral.
- Proven ability to work cross-functionally within the IT organization and with vendors.
PREFERRED SKILLS AND EXPERIENCE:
- ITIL Certification