HPC Systems Administrator (Hardware & Infrastructure Operations)

Information Technology Services
Requisition # 108777

Please note: Visa Sponsorship is not provided for this position. 

The Sherlock HPC cluster is the flagship of Stanford’s research computing environment, supporting thousands of users and a massive variety of scientific workloads. We are looking for an HPC Systems Administrator who thrives at the intersection of high-density hardware and Linux systems engineering.

In this role, you will be the primary steward of the physical infrastructure on Sherlock and other platforms. You will ensure that our 1,500+ compute nodes, high-density GPU racks, and petabyte-scale storage arrays are meticulously maintained, expertly tuned, and highly available.

 

Why Stanford?

You won't just be swapping parts; you will be managing the physical backbone of a world-class research environment. From debugging errors on NVIDIA H200s to optimizing InfiniBand cabling for our Lustre scratch tiers, your work is the foundation upon which Nobel-caliber research is built.

Primary Responsibilities

  • Hardware Lifecycle & Deployment: Lead the physical deployment, burn-in, troubleshooting, and decommissioning of compute nodes, GPU servers, and high-density storage systems.

  • Diagnostics & Root Cause Analysis: Troubleshoot hardware issues such as memory errors, GPU thermal throttling, and network failures, and coordinate with vendors for support and replacements.

  • Data Center Operations: Collaborate with the data centers team to plan and manage hardware deployments.

  • Provisioning & Automation: Work with lead platform administrators on testing and provisioning to ensure rapid, consistent deployment of cluster images across the fleet.

  • Health & Telemetry: Refine hardware-level monitoring to proactively identify failing components before they impact active research jobs.

 

Required Qualifications:

  • Education: Bachelor’s degree and eight years of relevant experience, or a combination of education and relevant experience.

  • Experience: 3-5+ years of experience in Linux systems administration, with a strong preference for candidates from HPC, large-scale data center, or research environments.

  • Hardware Proficiency: Solid understanding of x86 server architecture, GPU systems, Ethernet, and high-performance interconnects.

  • Scripting: Proficiency in scripting languages for automating hardware health checks, log parsing, and routine maintenance tasks.

  • Infrastructure Management: Experience using configuration management tools to manage hardware settings and firmware versions at scale. Experience working with data center teams to populate and maintain DCIM solutions preferred.

  • Physical Requirements: Ability to lift up to 50 lbs and work comfortably in a data center environment, including racking equipment and managing complex cable topologies.

  • Communication: Strong written and verbal communication skills.

Preferred Skills

  • Direct experience maintaining hardware for HPC systems and large-scale storage systems.

  • Familiarity with the Slurm workload manager and how hardware health impacts job scheduling.

  • Exposure to liquid cooling solutions or high-density rack power management.

Physical Requirements*:

  • Constantly perform desk-based computer tasks.

  • Frequently sit, grasp lightly/fine manipulation.

  • Occasionally stand/walk, write by hand.

  • Rarely use a telephone, lift/carry/push/pull objects that weigh up to 10 pounds.

 

Consistent with its obligations under the law, the University will provide reasonable accommodations to applicants and employees with disabilities. Applicants requiring a reasonable accommodation for any part of the application or hiring process should contact Stanford University Human Resources by submitting a contact form.

 

Working Conditions:

  • May work extended hours, evenings, and weekends.

Work Standards:

  • Interpersonal Skills: Demonstrates the ability to work well with Stanford colleagues and clients and with external organizations.

  • Promote Culture of Safety: Demonstrates commitment to personal responsibility and value for safety; communicates safety concerns; uses and promotes safe behaviors based on training and lessons learned.

  • Subject to and expected to comply with all applicable University policies and procedures, including but not limited to the personnel policies and other policies found in Stanford's Administrative Guide, http://adminguide.stanford.edu.

 

The expected pay range for this position is $150,289 to $171,674 per annum.

Stanford University provides pay ranges representing its good faith estimate of the salary or hourly wage the university reasonably expects to pay for a position. The pay offered to a selected candidate will be determined based on factors such as (but not limited to) the scope and responsibilities of the position, the qualifications of the selected candidate, departmental budget availability, internal equity, geographic location and external market pay for comparable jobs.

 

At Stanford University, base pay represents only one aspect of the comprehensive rewards package. The Cardinal at Work website (https://cardinalatwork.stanford.edu/benefits-rewards) provides detailed information on Stanford’s extensive range of benefits and rewards offered to employees. Specifics about the rewards package for this position may be discussed during the hiring process.

 

The job duties listed are typical examples of work performed by positions in this job classification and are not designed to contain or be interpreted as a comprehensive inventory of all duties, tasks, and responsibilities. Specific duties and responsibilities may vary depending on department or program needs without changing the general nature and scope of the job or level of responsibility. Employees may also perform other duties as assigned.

 

Stanford is an equal employment opportunity and affirmative action employer. All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, disability, protected veteran status, or any other characteristic protected by law.

 

 
