JaguarCNL Calendar

Current Events

  • Since hardware maintenance was completed during the outage last night, the scheduled downtime for today has been cancelled.
    Updated: 2008-04-29 08:30:00

Outage Details

Downtime for April 2007

Start           End             Comments


Downtime for May 2007

Start           End             Comments


Downtime for June 2007

Start           End             Comments


Downtime for July 2007

Start           End             Comments


Downtime for August 2007

Start           End             Comments
19 Aug 10:32    19 Aug 13:42    System rebooted after an OSS panic caused Lustre to become unresponsive. Jobs running at the time of the outage were killed; those in the queue (but not yet running) were not affected.
20 Aug 23:40    21 Aug 02:30    Many nodes reported "out of memory". The system was rebooted to bring these nodes back into the compute pool. Jobs running at the time of the outage were killed; those in the queue (but not yet running) were not affected.
21 Aug 16:44    21 Aug 18:57    System rebooted to enable a new version of ALPS. During the outage, a module was replaced. Jobs running at the time of the outage were killed; those in the queue (but not yet running) were not affected.
24 Aug 18:44    24 Aug 20:50    System rebooted due to a Lustre panic. Jobs running at the time of the outage were killed; jobs in the queue (but not yet running) were not affected.
25 Aug 12:57    25 Aug 14:06    System rebooted. Jobs running prior to the outage were killed; jobs in the queue (but not yet running) were not affected.


Downtime for September 2007

Start           End             Comments
02 Sep 11:12    02 Sep 12:08    System rebooted after reports of hanging jobs. Jobs running at the time of the outage were killed; jobs in the queue (but not yet running) were not affected.
04 Sep 13:00    04 Sep 15:00    Downtime to install a kernel patch. During the outage, a VRTY was replaced and diagnostic testing was performed.
05 Sep 16:29    05 Sep 17:33    System rebooted due to Lustre problems and slow response time. Jobs running at the time of the outage were killed; jobs in the queue (but not yet running) were not affected.
06 Sep 15:05    06 Sep 15:44    System rebooted after Lustre and several OSSs stopped responding. Jobs running at the time of the outage were killed; jobs in the queue (but not yet running) were not affected.
07 Sep 08:46    07 Sep 09:12    System crashed after Portals problems caused Lustre to fail. Jobs running at the time of the outage were killed; jobs in the queue (but not yet running) were not affected.
07 Sep 17:43    07 Sep 18:06    System crashed and was rebooted. Jobs running at the time of the crash were killed; jobs in the queue (but not yet running) were not affected.
08 Sep 05:00    08 Sep 15:19    System unavailable while NFS-mounted directories were moved to a new server. During the outage, additional system testing was performed, and as a result the system was rebooted.
12 Sep 13:00    12 Sep 16:07    System taken down for dedicated application testing.
13 Sep 11:54    13 Sep 17:37    System crashed due to a failed hardware link. During the outage, maintenance was performed: failed VRTYs and processors were replaced, and system patches were applied. Jobs running at the time of the outage were killed; jobs in the queue (but not yet running) were not affected.
13 Sep 20:54    13 Sep 21:51    System crashed due to a failed hardware link. Jobs running at the time of the outage were killed; jobs in the queue (but not yet running) were not affected.
17 Sep 21:20    17 Sep 21:52    Portals errors caused system performance to degrade. The system was rebooted to clear the errors. Jobs running at the time of the outage were killed; jobs in the queue (but not yet running) were not affected.
18 Sep 08:49    18 Sep 09:13    System crashed due to a problem with Global Arrays. Jobs running at the time of the outage were killed; jobs in the queue (but not yet running) were not affected.
21 Sep 15:05    21 Sep 17:05    System rebooted after the PBS node became unresponsive. Jobs running at the time of the outage were killed; jobs in the queue (but not yet running) were not affected.
22 Sep 11:37    22 Sep 13:43    System rebooted after a module powered off and would not power back on. Jobs running at the time of the outage were killed; jobs in the queue (but not yet running) were not affected.


Downtime for October 2007

Start           End             Comments
02 Oct 08:00    02 Oct 14:25    Installed patches and performed hardware maintenance.
04 Oct 08:00    10 Oct 08:00    System unavailable while a new Lustre filesystem was built. Additionally, several dedicated runs were performed.
22 Oct 08:30    22 Oct 11:52    System rebooted after many nodes became unresponsive. Jobs running at the time of the outage were killed; jobs in the queue (but not yet running) were not affected.
26 Oct 08:00    26 Oct 18:30    OS upgraded to UNICOS/lc 2.0.26.
30 Oct 08:00    30 Oct 12:00    Scheduled maintenance.


Downtime for November 2007

Start           End             Comments
01 Nov 08:00    01 Nov 15:00    Moved 8 cabinets from Jaguar to JaguarCNL. JaguarCNL now has 40 cabinets.
01 Nov 17:16    01 Nov 21:46    System rebooted due to Portals problems. Jobs running at the time of the outage were killed; jobs in the queue (but not yet running) were not affected.
02 Nov 07:39    02 Nov 17:03    System crashed. Jobs running at the time of the outage were killed; jobs in the queue (but not yet running) were not affected.
14 Nov 08:00    14 Nov 15:16    Hardware maintenance followed by system testing.
20 Nov 08:00    20 Nov 12:00    System maintenance.
20 Nov 12:00    20 Nov 22:07    The system reboot following the maintenance downtime failed due to several hardware problems. These problems were corrected and the system was rebooted with one node disabled.
27 Nov 22:15    27 Nov 23:25    System rebooted after one of the OSS nodes crashed.
29 Nov 11:37    29 Nov 17:55    System rebooted due to a hardware link failure. During the outage, a hardware module and a DDN controller were replaced.
29 Nov 19:55    30 Nov 04:12    System rebooted due to problems with the Lustre filesystem.


Downtime for December 2007

Start           End             Comments
06 Dec 08:00    06 Dec 15:00    OS upgraded to UNICOS/lc 2.0.33.
07 Dec 16:07    07 Dec 20:15    System rebooted due to problems with a hardware module.
12 Dec 07:30    12 Dec 09:47    System down due to maintenance on the site chilled-water system. A patch was installed during the outage.
30 Dec 16:40    30 Dec 17:16    System performance had been degrading (node panics, login node problems, etc.). A debug patch was removed and the system was rebooted to clear these problems.


Downtime for January 2008

Start           End             Comments
04 Jan 08:17    04 Jan 10:35    System rebooted due to a failed VRTY.
15 Jan 05:45    15 Jan 10:45    System rebooted.
15 Jan 23:08    15 Jan 23:31    System rebooted due to a failed hardware link.
18 Jan 15:52    18 Jan 16:25    Several compute nodes were marked 'up' but were causing jobs to hang. The system was rebooted to clear the problems on these nodes.
22 Jan 08:00    22 Jan 11:30    Replaced a mezzanine card and relocated several hardware modules during scheduled maintenance.
24 Jan 20:00    24 Jan 21:09    System rebooted. During the outage, a new Portals patch was installed.
28 Jan 00:09    28 Jan 00:45    System rebooted after a module powered off.
28 Jan 01:51    28 Jan 03:09    System rebooted after a module powered off.
29 Jan 07:20    29 Jan 11:02    During maintenance, replaced a mezzanine card and two DIMMs. Additionally, replaced three VRTYs that had caused modules to power off earlier in the week.
29 Jan 20:11    29 Jan 21:23    System rebooted after a hardware link failure.
31 Jan 10:02    31 Jan 11:38    System was unavailable. During the outage, a Portals patch was installed.


Downtime for February 2008

Start           End             Comments
01 Feb 16:32    01 Feb 17:45    System rebooted after a panic.
05 Feb 07:30    05 Feb 12:24    Maintenance to repair failed hardware links.
12 Feb 09:39    12 Feb 10:51    System rebooted after an OSS panic.
19 Feb 08:00    19 Feb 12:00    System testing.
19 Feb 20:39    19 Feb 21:51    System rebooted to clear problems with one of the OSS nodes.
26 Feb 08:00    26 Feb 09:00    System maintenance.


Downtime for March 2008

Start           End             Comments
11 Mar 08:00    11 Mar 11:43    System maintenance.
14 Mar 16:00    17 Mar 05:47    System unavailable due to a site power outage.
18 Mar 07:30    18 Mar 14:01    System unavailable.
28 Mar 11:20    28 Mar 13:49    System rebooted due to a high-speed network (HSN) hang.


Downtime for April 2008

Start           End             Comments
01 Apr 08:00    01 Apr 11:23    Replaced a hardware module during system maintenance.
17 Apr 18:32    17 Apr 19:17    System rebooted to repair a failed hardware link.
19 Apr 10:14    19 Apr 11:55    Several nodes powered off. They were disabled and the system was rebooted.
25 Apr 22:08    25 Apr 23:58    System rebooted due to problems on an OSS node.
28 Apr 16:00    28 Apr 18:19    System crashed due to a DDN failure. Other hardware was replaced during the outage, so the maintenance scheduled for 29 April was cancelled.
29 Apr 18:30    29 Apr 22:30    System down due to a failure on a hardware module. The module was disabled and the system was rebooted.


Downtime for May 2008

Start           End             Comments
01 May 04:25    01 May 05:23    System was rebooted due to an error on one of the hardware modules.


Downtime for June 2008

Start           End             Comments