RELEASE NOTES FOR SLURM VERSION 17.11
24 October 2017

IMPORTANT NOTES:
THE MAXJOBID IS NOW 67,108,863. ANY PRE-EXISTING JOBS WILL CONTINUE TO RUN BUT
NEW JOB IDS WILL BE WITHIN THE NEW MAXJOBID RANGE. Adjust your configured
MaxJobID value as needed to eliminate any confusion.

If using the slurmdbd (Slurm DataBase Daemon) you must update this first.

NOTE: If using a backup DBD you must start the primary first to do any
database conversion, the backup will not start until this has happened.

The 17.11 slurmdbd will work with Slurm daemons of version 16.05 and above.
You will not need to update all clusters at the same time, but it is very
important to update slurmdbd first and having it running before updating
any other clusters making use of it.  No real harm will come from updating
your systems before the slurmdbd, but they will not talk to each other
until you do.  Also at least the first time running the slurmdbd you need to
make sure your my.cnf file has innodb_buffer_pool_size equal to at least 64M.
You can accomplish this by adding the line

innodb_buffer_pool_size=64M

under the [mysqld] reference in the my.cnf file and restarting the mysqld. The
buffer pool size must be smaller than the size of the MySQL tmpdir. This is
needed when converting large tables over to the new database schema.

Slurm can be upgraded from version 16.05 or 17.02 to version 17.11 without loss
of jobs or other state information. Upgrading directly from an earlier version
of Slurm will result in loss of state information.

If using SPANK plugins that use the Slurm APIs, they should be recompiled when
upgrading Slurm to a new major release.

NOTE FOR THOSE RUNNING 17.11.[0|1]: It was found a seeded MySQL auto_increment
      value would be lost eventually if used.  This was found in the tres_table
      which tracks static TRES under 1001.  This was fixed in MariaDB >=10.2.4,
      but at the time of writing this was still around in MySQL.  Regardless if
      you are tracking licenses or GRES in the database
      (i.e. AccountingStorageTRES=gres/gpu) you might have this this issue.
      This would mean the id for gres/gpu could have been issued 5 instead of
      1001.  This is fine uptil 17.11 where a new static TRES was added taking
      up the id of 5.  If you are already running 17.11 you can easily check to
      see if you hit this problem by running 'sacctmgr list tres'.  If you see
      any Name for the Type 'billing' TRES (id=5) you are unfortunately hit
      with the bug. The fix for this issue requires manual intervention with the
      database.  Most likely if you started a slurmctld up against the slurmdbd
      the overwritten TRES is now at a different id.  You can fix the double
      issue by altering all the tables with the new TRES id back to 5, remove
      that entry in the tres_table, and then change the Type of billing back to
      the original Type and restart the slurmdbd which should finish the
      conversion.  SchedMD can assist with this. Supported sites please open a
      ticket at https://bugs.schedmd.com/.  Non-supported sites please contact
      SchedMD at sales@schedmd.com if you would like to discuss commercial
      support options.

NOTE: The slurm.spec file used to build RPM packages has been aggressively
      refactored, and some package names may now be different. Notably,
      the three daemons (slurmctld, slurmd/slurmstepd, slurmdbd) each
      have their own separate package with the binary and the appropriate
      systemd service file, which will be installed automatically (but
      not enabled).
      The slurm-plugins, slurm-munge, and slurm-lua package has been removed,
      and the contents moved in to the main slurm package.
      The slurm-sql package has been removed, and merged in with the slurm
      (job_comp_mysql.so) and slurm-slurmdbd (accounting_storage_mysql)
      packages.
      The example configuration files have been moved to slurm-example-configs.

NOTE: The refactored slurm.spec file now requires systemd to build. When
      building on older distributions, you must use the older variant which
      has been preserved as contribs/slurm.spec-legacy.

NOTE: The slurmctld is now set to fatal if there are any problems with
      any state files.  To avoid this use the new '-i' flag.

NOTE: systemd services files are installed automatically, but not enabled.
      You will need to manually enable them on the appropriate systems:
      - Controller: systemctl enable slurmctld
      - Database: systemctl enable slurmdbd
      - Compute Nodes: systemctl enable slurmd

NOTE: If you are not using Munge, but are using the "service" scripts to
      start Slurm daemons, then you will need to remove this check from the
      etc/slurm*service scripts.

NOTE: If you are upgrading with any jobs from 14.03 or earlier
      (i.e. quick upgrade from 14.03 -> 15.08 -> 17.02) you will need
      to wait until after those jobs are gone before you upgrade to 17.02
      or 17.11.

NOTE: If you interact with any memory values in a job_submit plugin, you will
      need to test against NO_VAL64 instead of NO_VAL, and change your printf
      format as well.

NOTE: The SLURM_ID_HASH used for Cray systems has changed to fully use the
      entire 64 bits of the hash.  Previously the stepid was multiplied by
      10,000,000,000 to make it easy to read both the jobid as well as the
      stepid in the hash separated by at least a couple of zeros, but this
      lead to overflow on the hash with steps like the batch and extern step
      where they used all 32 bits to represent the step.  While the new method
      ruins the easy readability it fixes the more important overflow issue.
      This most likely will go unnoticed by most, just a note of the change.

NOTE: Starting in 17.11 the slurm commands and daemons dynamically link to
      libslurmfull.so instead of statically linking.  This dramatically reduces
      the footprint of Slurm.  If for some reason this creates issues with
      your build you can configure slurm with --without-shared-libslurm.

NOTE: Spank options handled in local and allocator contexts should be able to
      handle being called multiple times. An option could be set multiple times
      through environment variables and command line options. Environment
      variables are processed first.

NOTE: IBM BlueGene/Q and Cray/ALPS modes are deprecated and will be removed
      in an upcoming release. You must add the --enable-deprecated option to
      configure to build these targets.

NOTE: Built-in BLCR support is deprecated, no longer built automatically, and
      will be removed in an upcoming release. You must add --with-blcr and
      --enable-deprecated options to configure to build this plugin.

NOTE: srun will now only read in the environment variables SLURM_JOB_NODES and
      SLURM_JOB_NODELIST instead of SLURM_NNODES and SLURM_NODELIST.  These
      latter variables have been obsolete for some time please update any
      scripts still using them.

HIGHLIGHTS
==========
 -- Support for federated clusters to manage a single work-flow across a set of
    clusters.
 -- Support for heterogeneous job allocations (various processor types, memory
    sizes, etc. by job component). Support for heterogeneous job steps within a
    single MPI_COMM_WORLD is not yet supported for most configurations.
 -- X11 support is now fully integrated with the main Slurm code. Remove any
    X11 plugin configured in your plugstack.conf file to avoid errors being
    logged about conflicting options.
 -- Added new advanced reservation flag of "flex", which permits jobs requesting
    the reservation to begin prior to the reservation's start time and use
    resources inside or outside of the reservation. A typical use case is to
    prevent jobs not explicitly requesting the reservation from using those
    reserved resources rather than forcing jobs requesting the reservation to
    use those resources in the time frame reserved.
 -- The sprio command has been modified to report a job's priority information
    for every partition the job has been submitted to.
 -- Group ID lookup performed at job submit time to avoid lookup on all compute
    nodes. Enable with PrologFlags=SendGIDs configuration parameter.
 -- Slurm commands and daemons dynamically link to libslurmfull.so instead of
    statically linking.  This dramatically reduces the footprint of Slurm.  If
    for some reason this creates issues with your build you can configure slurm
    with --without-shared-libslurm.
 -- In switch plugin, added plugin_id symbol to plugins and wrapped
    switch_jobinfo_t with dynamic_plugin_data_t in interface calls in
    order to pass switch information between clusters with different switch
    types.
 -- Changed default ProctrackType to cgroup.
 -- Changed default sched_min_interval from 0 to 2 microseconds.
 -- CRAY: --enable-native-cray is no longer an option and is on by default.
    If you want to run with ALPS please configure with --disable-native-cray.
 -- Added new 'scontrol write batch_script <jobid>' command to fetch a job's
    batch script. Removed the ability to see the script as part of the
    'scontrol -dd show job' command.
 -- Add new "billing" TRES which allows jobs to be limited based on the job's
    billable TRES calculated by the job's partition's TRESBillingWeights.
 -- Regular user use of "scontrol top" command is now disabled. Use the
    configuration parameter "SchedulerParameters=enable_user_top" to enable
    that functionality. The configuration parameter
    "SchedulerParameters=disable_user_top" will be silently ignored.
 -- Change default to let pending jobs run outside of reservation after
    reservation is gone to put jobs in held state. Added NO_HOLD_JOBS_AFTER_END
    reservation flag to use old default.
 -- Support for PMIx v2.0 as well as UCX support.

RPMBUILD CHANGES
================
 -- Components rearranged into obvious packages.
    - slurm - libraries and all commands.
    - slurmctld - daemon binary and service file for head node.
    - slurmd - daemon binary and service file for compute nodes.
    - slurmdbd - daemon binary and service file for database.

CONFIGURATION FILE CHANGES (see man appropriate man page for details)
=====================================================================
 -- Add FederationParameters=fed_display slurm.conf option to configure status
    commands to display a federated view by default if the cluster is a member
    of a federation.
 -- Add PrivateData=events configuration option.
 -- Change name of acct_gather_infiniband to acct_gather_interconnect.
 -- Add NoDecay flag to QOS.
 -- Add SchedulerParameters option of bf_max_job_user_part to specifiy the
    maximum number of jobs per user for any single partition. This differs from
    bf_max_job_user in that a separate counter is applied to each partition
    rather than having a single counter per user applied to all partitions.
 -- Changed default ProctrackType to cgroup.
 -- Add bf_max_time to SchedulerParameters.
 -- Add bf_max_job_assoc to SchedulerParameters.
 -- Add new SchedulerParameters option bf_window_linear to control the rate at
    which the backfill test window expands. This can be used on a system with
    a modest number of running jobs (hundreds of jobs) to help prevent expected
    start times of pending jobs to get pushed forward in time. On systems with
    large numbers of running jobs, performance of the backfill scheduler will
    suffer and fewer jobs will be evaluated.
 -- Add slurm.conf configuration parameters SlurmctldSyslogDebug and
    SlurmdSyslogDebug to control which messages from the slurmctld and slurmd
    daemons get written to syslog.
 -- Add slurmdbd.conf configuration parameter DebugLevelSyslog to control which
    messages from the slurmdbd daemon get written to syslog.
 -- Add slurmdbd.conf configuration parameter MaxQueryTimeRange to prevent
    overly broad accounting database lookups from overwhelming SlurmDBD.

COMMAND CHANGES (see man pages for details)
===========================================
 -- The '-q' option to srun has changed from being the short form of
    '--quit-on-interrupt' to '--qos'.
 -- Modify all daemons to re-open log files on receipt of SIGUSR2 signal. This
    is much than using SIGHUP to re-read the configuration file and rebuild
    various tables.
 -- Add scancel "--hurry" option to avoid staging out any burst buffer data.
 -- Add new advanced reservation flags of "weekday" (repeat on each weekday;
    Monday through Friday) and "weekend" (repeat on each weekend day; Saturday
    and Sunday).
 -- Node "OS" field expanded from "sysname" to "sysname release version" (e.g.
    change from "Linux" to
    "Linux 4.8.0-28-generic #28-Ubuntu SMP Sat Feb 8 09:15:00 UTC 2017").
 -- scontrol modified to report core IDs for reservation containing individual
    cores.
 -- Add ability to define features on clusters for directing federated jobs to
    different clusters.
 -- Modify slurm_load_jobs() function to load job information from all clusters
    in a federation.
 -- Add squeue --local and --sibling options to modify filtering of jobs on
    federated clusters.
 -- Add sprio -p/--partition option to filter jobs by partition name.
 -- Add sprio --local and --sibling options for use in federation of clusters.
 -- Add sprio "%c" format to print cluster name in federation mode.
 -- Extend the output of the seff utility to also include the job's wall-clock
    time.
 -- The --cpu_bind and --mem_bind options have been renamed to --cpu-bind and
    --mem-bind for consistency with the rest of the command line options.
    (Both old and new syntax are supported for now.)

OTHER CHANGES
=============
 -- KNL features: Always keep active and available features in the same order:
    first site-specific features, next MCDRAM modes, last NUMA modes.

API CHANGES
===========

Changed members of the following structs
========================================

Added members to the following struct definitions
=================================================
In slurmbdb_cluster_fed_t: Added feature_list to hold cluster features.
In job_desc_msg_t: Added cluster_features for passing cluster features to
	controller.
		   Renamed fed_siblings to fed_siblings_active.
		   Added fed_siblings_viable.
In job_info_t: Added cluster_features for passing back a job's cluster features
	from the controller.
               Renamed fed_siblings[_str] fed_siblings_active[_str]
	       Added fed_siblings_viable[_str].
In struct job_details: Added cluster_features to hold requestsed cluster
	features.
In job_fed_details_t: Rename siblings to siblings_active.
		      Added siblings_viable.
In job_info_request_msg: Added job_ids to be able to request job info for
		       specific jobs.
In job_step_kill_msg_t: Added sibling string to remove active sibling job.
In slurm_ctl_conf_t: Added group_force and group_time. (Were both in group_info
		     previously.)

Added the following struct definitions
======================================
In job_alloc_info_msg_t: add req_cluster to indicate where the request is coming
	from.
Added reroute_msg_t to route a message to a different cluster.

Removed members from the following struct definitions
=====================================================
In slurm_ctl_conf_t: Removed group_info and slurmd_plugstack.

Changed the following enums and #defines
========================================

Added the following API's
=========================
Added slurm_kill_job_msg: to send prepared job_step_kill_msg_t.
Added slurm_job_batch_script: to retrieve the batch script for a given jobid
	into a local file.

Changed the following API's
============================

Removed the following API's
===========================
Removed unused slurm_allocation_lookup() and rename slurm_allocation_lookup_lite() to
	slurm_allocation_lookup().
Removed unused slurm_get_slurmd_plugstack() function.
