Hpc-syspros-basics.github.io

Installing NHC · HPC Basics

Executing the make test step in the previous example is optional but recommended, as this will run NHC’s built-in unit test suite to make sure everything is functioning properly.

URL: https://hpc-syspros-basics.github.io/HPC_Basics_menu/Node_Health_Check/Installing_NHC.html

Configuring NHC · HPC Basics

The starting configuration is meant to highlight the range of checks that NHC can run to ensure a node is healthy before it runs jobs. When configuring your nhc.conf file, keep in mind that checks are executed from the top of the file down. Therefore you may want to place the checks that you care about most at the top
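
The effect of top-down ordering can be illustrated with a minimal sketch. It is Python rather than NHC's own configuration syntax, the check names are hypothetical, and stopping at the first failure is an assumption of the sketch, not a statement about how NHC behaves.

```python
from typing import Callable, List, Optional, Tuple

# A check is a (name, function) pair; the function returns True when healthy.
Check = Tuple[str, Callable[[], bool]]


def first_failure(checks: List[Check]) -> Optional[str]:
    """Run checks in listed order and return the first failing name, or None.

    Assumption: evaluation stops at the first failure, so checks listed
    after a failing one are never run.
    """
    for name, check in checks:
        if not check():
            return name
    return None


if __name__ == "__main__":
    # Hypothetical checks, ordered with the most important first.
    checks = [
        ("filesystem_mounted", lambda: True),
        ("enough_memory", lambda: False),
        ("gpu_present", lambda: False),
    ]
    print(first_failure(checks))
```

Under that assumption, a check placed near the top is the one most likely to be reported, which is one reason to order checks by importance.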

Standalone Running NHC · HPC Basics

There is also value in running NHC as a standalone process to pick up any misconfiguration or missing components, for instance when you are checking on a node after hardware repairs or checking all nodes in a cluster following a maintenance period. To run NHC as a standalone process to check a node's health you can just run the binary from /usr

Writing a custom check for NHC · HPC Basics

Writing a custom check for NHC. Occasionally you will run into a case where the built-in checks do not satisfy a use case you have for checking the health of a node.

Scheduler Integration · HPC Basics

Slurm Integration. Add the following to /etc/slurm/slurm.conf (or wherever your slurm.conf file is located in your environment) on your controller node(s) AND your compute nodes (because, even though the HealthCheckProgram only runs on the nodes, your slurm.conf file must be the same across your entire system): This will execute NHC every 300

HPC Basics · Documentation home for HPC Basics instruction

We speak directly to the state of the practice of standing up and operating high performance systems, with an emphasis on solutions that can be implemented by systems staff at other institutions. HPC Basics by SIGHPCSYSPROS. Topics: Introduction to HPC; Designing a cluster; Introduction to HPC Storage; Parallel Filesystems; Cluster Stack Basics; Pr

Benchmarking · HPC Basics

Benchmarking. Benchmarking is the practice of running tests on your hardware to verify performance and, in some cases, to stress the hardware to ensure that it is capable of sustained workloads. When we talk about gathering performance numbers or ensuring that the performance of a node is within an acceptable range, we use the term benchmark.
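
One way to picture "within an acceptable range" is a tolerance test against a baseline figure. The sketch below is only an illustration: the function names, the 5% default tolerance, the node names, and the numbers are all hypothetical and do not come from the HPC Basics site.

```python
from typing import Dict, List


def within_range(measured: float, baseline: float, tolerance: float = 0.05) -> bool:
    """Return True if measured is within +/- tolerance (a fraction) of baseline."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return abs(measured - baseline) <= tolerance * baseline


def out_of_range_nodes(results: Dict[str, float], baseline: float,
                       tolerance: float = 0.05) -> List[str]:
    """Return the sorted names of nodes whose result falls outside the range."""
    return sorted(
        node for node, value in results.items()
        if not within_range(value, baseline, tolerance)
    )


if __name__ == "__main__":
    # Hypothetical per-node benchmark results in arbitrary units.
    results = {"node01": 98.0, "node02": 80.0, "node03": 101.5}
    print(out_of_range_nodes(results, baseline=100.0))
```

A site would choose its own baseline and tolerance per benchmark; the point is only that a node's result is compared against an expected value rather than judged in isolation.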

HPC Basics · HPC Basics

HPC Basics. Topics: What is a Supercomputer?; What does a Supercomputer do?; Moore’s Law; Scalability; Benchmarking; Supercomp

Monitoring HPC Systems and Infrastructure components

Reference Materials · HPC Basics

Reference Materials. Here you will find a collection of reference materials that were either used as sources for the topics covered on the website or are meant as supplemental resources to consult for further information on those topics.

Login Node Resource Management · HPC Basics

Login Node Resource Management. The login nodes of an HPC cluster are a shared, yet finite and small, resource. Users can easily overload the login nodes, creating a poor outcome for everyone.

Advanced Topics · HPC Basics

Advanced Topics: Benchmarking; CPU Benchmarks; Memory Benchmarks; Storage Benchmarks; GPU Benchmarks; Network Benchmar

What is a Supercomputer

What is a Supercomputer? A supercomputer is not simply a fast or very large computer: it works in an entirely different way, typically using parallel processing instead of the serial processing that an ordinary computer uses.

Reference Books · HPC Basics

Reference Books. High Performance Computing: Modern Systems and Practices is a fully comprehensive and easily accessible treatment of high performance computing, covering fundamental concepts and essential knowledge while also providing key skills training. With this book, domain …

Style Guide · HPC Basics

Style Guide. When writing new documentation, please follow the style guide for adding new content. Not following the guide delays how quickly the content is pulled into the documentation, as editors will need to adjust or correct it.

GPU Benchmarks · HPC Basics

gpu-burn. While we have this in the benchmarks section, it is hardly a benchmark, though it does output floating-point operations per second readouts. This piece of code can be used to stress the GPUs you have in a node and ensure that they can run optimally under fully utilized workloads. The code will easily push the GPU so that it

Contribution Guide · HPC Basics

Contribution Guide. We would greatly appreciate your help in developing this body of knowledge to educate and assist folks who are new to the administration of high performance computing resources.

Reference Materials · HPC Basics

Reference Materials: Reference Materials; Reference Books

Contributing to the HPC Basics documentation project

Contributing to the HPC Basics documentation project. Here we have resources and information needed to effectively contribute to the HPC Basics docu

Audience · HPC Basics

Prerequisites. For much of the documentation to make sense, we encourage folks to have a basic knowledge of a few concepts. Audience. When writing documentation for hpc-syspros-basics, you should keep in mind the intended audience for this documentation.

CPU Benchmarks · HPC Basics

The High Performance Linpack benchmark is the oldest and most widely accepted benchmark that measures the double-precision floating-point performance of distributed memory clusters. The benchmark is the standard for benchmarking the collective CPU performance of an entire system and is used as the primary means of measurement for …
