41 Commits

Author SHA1 Message Date
Wirawan Purwanto
db2ca075ed * Must re-add SLURM if it is not loaded. 2019-04-25 17:35:34 -04:00
Wirawan Purwanto
a65338a8bf * Added "wgo" (what's going on) to check the status of processes
on the login node. Currently intended for use on Turing only.
2019-04-25 17:34:23 -04:00
Wirawan Purwanto
1387997010 * Prints an error message and quits if Jupyter
does not start within 2 minutes.
2019-03-27 13:49:19 -04:00
Wirawan Purwanto
ba6c9f53ed * Added support for Anaconda as well as site-provided Python.
* Added support for fully headless mode (connect via local browser).
* Added some safeguards against the job failing to start because the
  executable was not found, etc.
2019-03-27 13:28:33 -04:00
Wirawan Purwanto
1c8a5da492 * Imported launch_jupyter from Turing.
Originally furnished by Min Dong, 2019-03-01 14:30 EST.
2019-03-27 11:25:32 -04:00
Wirawan Purwanto
28eb7a0d98 * sq: Customizable squeue wrapper that introduces new defaults and
default behavior for squeue.
2018-07-20 11:20:22 -04:00
Wirawan Purwanto
94e0aa9490 * jupyter-anaconda2: A script that starts a Jupyter notebook process
for the Anaconda2 distribution.

  Note: for now this script has to be submitted to run on a compute
  node. I will eventually upgrade it to a self-submitting script.
2018-07-20 09:19:59 -04:00
Wirawan Purwanto
95034685ff * interact: Tool to allocate an interactive session on a regular
compute node under SLURM.
2018-06-01 12:50:56 -04:00
Wirawan Purwanto
dbce662c5a * interact-gpu: Tool to allocate an interactive session on a GPU
compute node under SLURM.
2018-06-01 12:49:31 -04:00
Wirawan Purwanto
82ea3bc689 * Archived: SGE version of pwscf-5.3 script. 2017-05-24 14:37:32 -04:00
Wirawan Purwanto
9c82c4d465 * Update bash module support: with recent changes on Turing,
'module' seems to be supported out of the box for bash.
  If a 'module' environment is detected, we skip the initialization step.
2017-05-24 14:28:24 -04:00
Wirawan Purwanto
68c0e70d4d * Added "regular" runsas which runs with more limited memory. 2016-11-10 12:48:43 -05:00
Wirawan Purwanto
27b8ccd6ae * Added runsas-himem from earlier consultation this year. 2016-11-10 12:45:29 -05:00
Wirawan Purwanto
df6facce86 * pwscf: Ad-hoc fix for Turing after the 2016 upgrade.
We force the use of the old (Tcl) module system because the new module
  system (Lmod) currently always executes itself whenever a bash batch
  script is executed on Turing.
2016-11-07 11:57:34 -05:00
Wirawan Purwanto
739d765f53 * Added convenience tools for gathering and analyzing CPU information on the cluster.
* Documentation update.
2016-10-31 15:21:10 -04:00
Wirawan Purwanto
aa597b907c * In hoststats subcommand: Also print node status flags if they exist. 2016-10-20 10:11:18 -04:00
Wirawan Purwanto
999fe5f571 * Add more info to gather. 2016-10-20 10:10:31 -04:00
Wirawan Purwanto
cabacb58cb * Also added dumps of mount points and free disk space for compute nodes. 2016-09-26 13:04:30 -04:00
Wirawan Purwanto
b6d22cf68b * Added "hoststats" subcommand for summarizing host occupancy statistics
irrespective of queue.
2016-09-20 17:47:24 -04:00
Wirawan Purwanto
ebdc93e80f * Also collect dmesg snapshot. 2016-09-20 17:46:50 -04:00
Wirawan Purwanto
879927f16e * Added python workbench hpl_timing, for estimating/analyzing HPL timing. 2016-09-20 17:46:19 -04:00
Wirawan Purwanto
d6d71364de * Imported initial tools for extracting HPL benchmark results. 2016-09-20 17:44:39 -04:00
Wirawan Purwanto
483c6874c0 * Use getopt to handle command-line options.
* Include a help command.
2016-09-14 13:39:55 -04:00
Wirawan Purwanto
67bc899f4a * Fixes for unhandled/unrecognized command options.
* Documentation update.
* Added help command.
2016-09-14 13:38:29 -04:00
Wirawan Purwanto
850bd34377 * Documentation update. 2016-09-14 10:29:46 -04:00
Wirawan Purwanto
e382a5eb35 * Minor fix to strip domain name (can be truncated). 2016-09-14 10:27:45 -04:00
Wirawan Purwanto
f06803ba6c * show-node-status.py: A toolbox to analyze node status returned by SGE. 2016-09-14 10:16:35 -04:00
Wirawan Purwanto
acfb11e010 * Initial form of documentation. 2016-09-09 16:50:40 -04:00
Wirawan Purwanto
04515dcd35 * Allow external qstat-f file for raw node status dump. 2016-09-09 16:41:15 -04:00
Wirawan Purwanto
7957b28a05 * show-node-status.py: Initial tool to replace node-slot-status.sh.
This initial edition contains only the "--raw" command.
2016-09-09 16:39:28 -04:00
Wirawan Purwanto
34a7659f3d * show-cluster-usage.py: A tool to summarize the usage of an
SGE cluster at a given snapshot in time.

  At present the usage is broken down by user;
  other categories can be added in the future.
2016-08-29 19:16:11 -04:00
Wirawan Purwanto
4f28615bf0 * Added variants of node status to display (still a work in progress). 2016-08-29 13:08:10 -04:00
Wirawan Purwanto
a0ad7c25bc * Added analysis tool to summarize CPUs or group compute nodes based
on their CPUs.
2016-08-29 13:04:25 -04:00
Wirawan Purwanto
52619c3688 * Added tools to dump compute node info in batch. 2016-08-26 15:09:36 -04:00
Wirawan Purwanto
f1327c9562 * SGE: Added qconf dump tools.
Added from my hpc-explore/sge tools of the late-2015 time frame.
2016-08-26 10:00:46 -04:00
Wirawan Purwanto
79e5b77df2 * bash-module-env.sh: Update was required due to incomplete pre-existing
MODULEPATH on some of Turing's compute nodes.
2016-08-23 13:55:51 -04:00
Wirawan Purwanto
6f0880c547 * bash-module-env.sh: an effort to facilitate bash support for batch
scripts.

  Last modified date of this script: 2016-06-23.
2016-08-23 13:45:21 -04:00
Wirawan Purwanto
bf43a3b0b5 * sge-dump-job-status.sh: Initial version of a tool to dump desirable
SGE info from a running job.
2016-08-08 10:00:32 -04:00
Wirawan Purwanto
8b99995409 * Added find-run-hosts.sh: Swiss-army tool to find the hosts where a job
runs, dump the process trees, etc.
2016-07-14 00:24:48 -04:00
Wirawan Purwanto
8ae0841ca6 * pwscf: Initial version of self-submitting script to launch pwscf
calculation (version 5.3).
2016-07-08 23:33:51 -04:00
Wirawan Purwanto
7f83f897c8 * sge: Added node-slot-status.sh to aggregate slot availability per node type. 2015-10-28 11:35:49 -04:00