Hello,
I'm wondering whether this is a known issue: we occasionally see the "UNMOUNT DATABASE" command hang.
I can reproduce the issue by filling up the disk before issuing a "MOUNT DATABASE" command, which forces that command to fail. I then issue the "UNMOUNT DATABASE" command to recover (to roll back to a good state).
ASE Version:
Adaptive Server Enterprise/15.7/EBF 22235 SMP SP121 /P/Solaris AMD64/OS 5.10/ase157sp12x/3660/64-bit/FBO/Thu Mar 20 06:07:20 2014
Reproduction:
1. Unmount a database to create a manifest file:
UNMOUNT DATABASE [dxs5r7xPNcv733OTr3_marke] to '/export/home/sybase/toolkit/564d5bfd-187c-ff17-14af-7e77caead436-staging-73/datafile/manifest' WITH OVERRIDE
2. Fill the file system where the manifest file and devices reside.
3. Attempt to mount the database. This should fail because the disk is full, with output similar to:
1> MOUNT DATABASE [dxs5r7xPNcv733OTr3_marke] AS [dxs5r7xPNcv733OTr3_marke] FROM '/export/home/sybase/toolkit/564d5bfd-187c-ff17-14af-7e77caead436-staging-73/datafile/manifest' USING
2> '/export/home/sybase/toolkit/564d5bfd-187c-ff17-14af-7e77caead436-staging-73/datafile/dxvfr723W37I7s0Vrzewc2_7.dat' = 'dxvfr723W37I7s0Vrzewc2_7_dat',
3> '/export/home/sybase/toolkit/564d5bfd-187c-ff17-14af-7e77caead436-staging-73/datafile/dxnF6dfL3fLo0zr3nbn4gv_2.dat' = 'dxnF6dfL3fLo0zr3nbn4gv_2_dat',
4> '/export/home/sybase/toolkit/564d5bfd-187c-ff17-14af-7e77caead436-staging-73/datafile/dxtyv8fZ914w4uyw9eP80t_5.dat' = 'dxtyv8fZ914w4uyw9eP80t_5_dat',
5> '/export/home/sybase/toolkit/564d5bfd-187c-ff17-14af-7e77caead436-staging-73/datafile/dxvelXNt54UW9xedxZa8uT_1.dat' = 'dxvelXNt54UW9xedxZa8uT_1_dat',
6> '/export/home/sybase/toolkit/564d5bfd-187c-ff17-14af-7e77caead436-staging-73/datafile/dx6S8O9r7Qkt7pe5749db7_0.dat' = 'dx6S8O9r7Qkt7pe5749db7_0_dat',
7> '/export/home/sybase/toolkit/564d5bfd-187c-ff17-14af-7e77caead436-staging-73/datafile/dxu09nflgYuv91eO1vXso7_4.dat' = 'dxu09nflgYuv91eO1vXso7_4_dat',
8> '/export/home/sybase/toolkit/564d5bfd-187c-ff17-14af-7e77caead436-staging-73/datafile/dxuuNmX99b0Ip7fr9O9a4q_3.dat' = 'dxuuNmX99b0Ip7fr9O9a4q_3_dat',
9> '/export/home/sybase/toolkit/564d5bfd-187c-ff17-14af-7e77caead436-staging-73/datafile/dxr998wB92dYpu95Vv1ked_6.dat' = 'dxr998wB92dYpu95Vv1ked_6_dat'
10> go
Msg 823, Level 24, State 2:
Server 'SRC_157_4K', Line 1:
I/O error detected during wait for BUF=0xfffffd7ffb9e8f10, MASS=0xfffffd7ffb9e8f10, Buf#=0, page=0xfffffd7ffb942000, dbid=10, Mass vdevno=25, vpage=48, Buf lpage=24, Mass stat=0x4091108, Buf stat=0x1, size=4096, cid=0
('default data cache'), Pinned xdes=0x0000000000000000, spid=0.
Msg 834, Level 20, State 4:
Server 'SRC_157_4K', Line 1:
Illegal attempt to clean buffer: BUF=0xfffffd7ffb9e8f10, MASS=0xfffffd7ffb9e8f10, Buf#=0, page=0xfffffd7ffb942000, dbid=10, Mass vdevno=25, vpage=48, Buf lpage=24, Mass stat=0x4091108, Buf stat=0x1, size=4096, cid=0
('default data cache'), Pinned xdes=0x0000000000000000, spid=0.
Msg 145, Level 14, State 10:
Server 'SRC_157_4K', Line 1:
A subquery with no aggregate functions may only contain expressions in its GROUP BY clause that are in the select-list.
4. Try to recover from the failure by unmounting the database:
1> UNMOUNT DATABASE [dxs5r7xPNcv733OTr3_marke] to '/export/home/sybase/toolkit/564d5bfd-187c-ff17-14af-7e77caead436-staging-73/datafile/manifest'
2> go
The UNMOUNT DATABASE command above never returns.
NOTE: On the attempts where I was unable to reproduce the issue, the spid executing the "MOUNT DATABASE" command had been terminated. I then reconnected and issued the UNMOUNT DATABASE command; it did not hang, although it could return an error.
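Step 2 of the reproduction can be scripted. The sketch below is one possible way to do it, not necessarily how the disk was filled in the test above: it writes data to a filler file until the file system reports ENOSPC. The function name fill_disk, its parameters, and the filler path are hypothetical. Random data is used on the assumption that the file system might compress or skip zero-filled blocks.

```python
import errno
import os


def fill_disk(path, chunk_size=1024 * 1024, max_bytes=None):
    """Write random data to `path` until the file system is full (ENOSPC)
    or until `max_bytes` have been written. Returns the bytes written.

    `max_bytes` is a safety cap for trying the function out; leave it as
    None to really fill the file system that holds `path`.
    """
    written = 0
    chunk = os.urandom(chunk_size)
    # buffering=0 gives unbuffered writes, so a full disk is reported by
    # write()/fsync() rather than much later when the file is closed.
    with open(path, "wb", buffering=0) as f:
        while max_bytes is None or written < max_bytes:
            if max_bytes is None:
                data = chunk
            else:
                data = chunk[: max_bytes - written]
            try:
                n = f.write(data)
                os.fsync(f.fileno())
            except OSError as e:
                if e.errno == errno.ENOSPC:
                    break  # file system is full: this is the goal
                raise
            written += n
    return written
```

The filler file would be placed in the same directory as the manifest and device files; deleting it afterwards frees the space again.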
Diagnostics:
1> dbcc traceon(3604)
2> go
DBCC execution completed. If DBCC printed error messages, contact a user with System Administrator (SA) role.
1> dbcc sqltext(30)
2> go
SQL Text: UNMOUNT DATABASE [dxs5r7xPNcv733OTr3_marke] to '/export/home/sybase/toolkit/564d5bfd-187c-ff17-14af-7e77caead436-staging-73/datafile/manifest'
DBCC execution completed. If DBCC printed error messages, contact a user with System Administrator (SA) role.
1> dbcc stacktrace(30)
2> go
Start dbcc stacktrace...
pc: 0x0000000001dbaef4 kpsuspend+0xa4()
pc: 0x0000000001dbdcb4 upsleepgeneric+0x2a4()
pc: 0x0000000000c15e9c dbt_get+0xf8c()
pc: 0x000000000146987f dbt_get_with_statuscheck+0xf()
pc: 0x0000000001451e16 dbcontext_change+0x36()
pc: 0x0000000001451fe2 usedb+0x42()
pc: 0x0000000001cc9b35 umt_set_dbt_unmount+0x55()
[Handler pc: 0x0000000001cc6f30 umt_handle installed by the following function:-]
pc: 0x0000000001ccc23b unmountdb+0x8b()
pc: 0x0000000000b5e008 s_execute+0x7228()
[Handler pc: 0x0000000001988460 hdl_stack installed by the following function:-]
[Handler pc: 0x000000000191e4d0 s_handle installed by the following function:-]
pc: 0x0000000000bd57ae sequencer+0x10de()
pc: 0x0000000000c72bcb tdsrecv_language+0x25b()
[Handler pc: 0x0000000001cd8020 ut_handle installed by the following function:-]
pc: 0x0000000000ba8021 conn_hdlr+0x1e81()
pc: 0x0000000001dbaff2 kpstartproc+0x32()
dbcc stacktrace finished.
DBCC execution completed. If DBCC printed error messages, contact a user with System Administrator (SA) role.
I have a customer who sometimes experiences this on AIX with ASE 15.0.3 ESD#2. Their stack trace is somewhat different; I'm not sure whether that is due to version or platform differences, or whether it is an entirely different problem.
1> dbcc sqltext(240)
2> go
SQL Text: UNMOUNT DATABASE [dxrf0vzZdLXYz9df0f_PBMIS] to '/delphix_toolkit/toolkit/42338f9c-74c2-edac-e996-fe4127b28d4b-staging-11/datafile/manifest' WITH OVERRIDE
DBCC execution completed. If DBCC printed error messages, contact a user with System Administrator (SA) role.
1> dbcc stacktrace ( 240 )
2> go
Start dbcc stacktrace...
pc: 0x000000016b5b9b98 ()
pc: 0x0000000101c8ba68 upsleepgeneric+0x264()
pc: 0x00000001001fdadc fio__p_sema+0xe0()
pc: 0x00000001001fd430 ufopen+0x6c()
pc: 0x0000000100680d20 umtm_setup+0x1c()
pc: 0x000000010067aa98 umt_generate_manifest+0x1c()
pc: 0x000000010067ab28 umt_make_manifest+0x24()
pc: 0x000000010067b3d8 umt_create_dbmanifest+0x3c()
[Handler pc: 0x0000000100676064 umt_handle+0x0 installed by the following function:-]
pc: 0x000000010067b4a8 unmountdb+0x90()
pc: 0x000000010011f600 s_execute__fdpr_6+0xd54()
[Handler pc: 0x0000000101aa2028 hdl_stack+0x0 installed by the following function:-]
[Handler pc: 0x0000000101aab11c s_handle+0x0 installed by the following function:-]
pc: 0x0000000101cabbc8 sequencer+0x230()
pc: 0x0000000101d1f958 tdsrecv_language+0xb8()
[Handler pc: 0x0000000101aa30b0 ut_handle+0x0 installed by the following function:-]
pc: 0x0000000101ca0894 conn_hdlr__fdpr_10+0x3c()
dbcc stacktrace finished.
DBCC execution completed. If DBCC printed error messages, contact a user with System Administrator (SA) role.
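To make the difference between the two stack traces easier to see, the frames beneath unmountdb can be extracted and compared. The following is a minimal sketch that uses excerpts of the two traces above. The helper names are hypothetical, and treating the __fdpr_N suffix on the AIX symbols as a build artifact that can be stripped is an assumption.

```python
import re

# Matches lines such as "pc: 0x0000000001dbaef4 kpsuspend+0xa4()".
# Handler lines and frames without a symbol name do not match.
FRAME_RE = re.compile(r"^pc: 0x[0-9a-fA-F]+ (\w+)\+0x[0-9a-fA-F]+\(\)")


def normalize(name):
    """Strip a trailing __fdpr_N suffix (assumed to be a build artifact)."""
    return re.sub(r"__fdpr_\d+$", "", name)


def frame_names(trace_text):
    """Return normalized function names from 'pc:' lines, innermost first."""
    names = []
    for line in trace_text.splitlines():
        m = FRAME_RE.match(line.strip())
        if m:
            names.append(normalize(m.group(1)))
    return names


def frames_below(trace_text, entry="unmountdb"):
    """Return the frames called beneath `entry`, innermost first."""
    names = frame_names(trace_text)
    if entry not in names:
        return names
    return names[: names.index(entry)]


# Excerpts copied from the two dbcc stacktrace outputs above.
SOLARIS_TRACE = """
pc: 0x0000000001dbaef4 kpsuspend+0xa4()
pc: 0x0000000001dbdcb4 upsleepgeneric+0x2a4()
pc: 0x0000000000c15e9c dbt_get+0xf8c()
pc: 0x000000000146987f dbt_get_with_statuscheck+0xf()
pc: 0x0000000001451e16 dbcontext_change+0x36()
pc: 0x0000000001451fe2 usedb+0x42()
pc: 0x0000000001cc9b35 umt_set_dbt_unmount+0x55()
[Handler pc: 0x0000000001cc6f30 umt_handle installed by the following function:-]
pc: 0x0000000001ccc23b unmountdb+0x8b()
pc: 0x0000000000b5e008 s_execute+0x7228()
"""

AIX_TRACE = """
pc: 0x000000016b5b9b98 ()
pc: 0x0000000101c8ba68 upsleepgeneric+0x264()
pc: 0x00000001001fdadc fio__p_sema+0xe0()
pc: 0x00000001001fd430 ufopen+0x6c()
pc: 0x0000000100680d20 umtm_setup+0x1c()
pc: 0x000000010067aa98 umt_generate_manifest+0x1c()
pc: 0x000000010067ab28 umt_make_manifest+0x24()
pc: 0x000000010067b3d8 umt_create_dbmanifest+0x3c()
[Handler pc: 0x0000000100676064 umt_handle+0x0 installed by the following function:-]
pc: 0x000000010067b4a8 unmountdb+0x90()
pc: 0x000000010011f600 s_execute__fdpr_6+0xd54()
"""

for label, trace in (("Solaris", SOLARIS_TRACE), ("AIX", AIX_TRACE)):
    print(label, " <- ".join(frames_below(trace)))
```

On these excerpts, the Solaris spid is sleeping under usedb and dbt_get, while the AIX spid is sleeping under ufopen while creating the manifest, so the two hangs are at least not waiting in the same place.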
1> sp_lock
2> go
The class column will display the cursor name for locks associated with a cursor for the current user and the cursor id for other users.
fid    spid   loid        locktype                     table_id    page        row    dbname          class                          context
------ ------ ----------- ---------------------------- ----------- ----------- ------ --------------- ------------------------------ ----------------------------
0      240    480         Ex_intent                    31          0           0      master          Non Cursor Lock
0      240    480         Ex_intent-blk                35          0           0      master          Non Cursor Lock
0      240    480         Ex_table-blk                 35          0           0      master          Non Cursor Lock
0      451    902         Sh_intent                    1020527638  0           0      master          Non Cursor Lock
(4 rows affected)
In their case, spid 240 (the UNMOUNT DATABASE) appears to hold a lock that blocks other spids. In the sp_who output, spid 233 is in "lock sleep" with blk_spid 240:
1> sp_who
2> go
fid spid status     loginame     origname     hostname                 blk_spid dbname tempdbname cmd               block_xloid
--- ---- ---------- ------------ ------------ ------------------------ -------- ------ ---------- ----------------- -----------
0   2    sleeping   NULL         NULL         NULL                     0        master tempdb     DEADLOCK TUNE     0
0   3    sleeping   NULL         NULL         NULL                     0        master tempdb     ASTC HANDLER      0
0   4    sleeping   NULL         NULL         NULL                     0        master tempdb     ASTC HANDLER      0
0   5    sleeping   NULL         NULL         NULL                     0        master tempdb     ASTC HANDLER      0
0   6    sleeping   NULL         NULL         NULL                     0        master tempdb     ASTC HANDLER      0
0   7    sleeping   NULL         NULL         NULL                     0        master tempdb     CHECKPOINT SLEEP  0
0   8    sleeping   NULL         NULL         NULL                     0        master tempdb     HK WASH           0
0   9    sleeping   NULL         NULL         NULL                     0        master tempdb     HK GC             0
0   10   sleeping   NULL         NULL         NULL                     0        master tempdb     HK CHORES         0
0   11   sleeping   NULL         NULL         NULL                     0        master tempdb     AUDIT PROCESS     0
0   12   sleeping   NULL         NULL         NULL                     0        master tempdb     PORT MANAGER      0
0   26   running    syb_sam      syb_sam      sybase_cdc1pbdbprd16vp01 0        master tempdb     INSERT            0
0   41   sleeping   NULL         NULL         NULL                     0        master tempdb     LICENSE HEARTBEAT 0
0   104  sleeping   delphix_disc delphix_disc cdc1pbsybdelphixprd04    0        master tempdb     MOUNT DATABASE    0
0   183  sleeping   NULL         NULL         NULL                     0        master tempdb     NETWORK HANDLER   0
0   193  sleeping   delphix_disc delphix_disc cdc1pbsybdelphixprd04    0        master tempdb     MOUNT DATABASE    0
0   233  lock sleep delphix_disc delphix_disc cdc1pbsybdelphixprd04    240      master tempdb     LOAD DATABASE     0
0   240  sleeping   delphix_disc delphix_disc cdc1pbsybdelphixprd04    0        master tempdb     UNMOUNT DATABASE  0
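The blocking relationship in the sp_who output can be read off the blk_spid column. As a small sketch (the tuple layout and function names are hypothetical, and only the delphix_disc spids from the output above are included):

```python
from collections import defaultdict

# (spid, status, blk_spid, cmd) copied from the sp_who output above.
ROWS = [
    (104, "sleeping", 0, "MOUNT DATABASE"),
    (193, "sleeping", 0, "MOUNT DATABASE"),
    (233, "lock sleep", 240, "LOAD DATABASE"),
    (240, "sleeping", 0, "UNMOUNT DATABASE"),
]


def blocked_by(rows):
    """Map each blocking spid to the list of spids waiting on it."""
    waiters = defaultdict(list)
    for spid, _status, blk_spid, _cmd in rows:
        if blk_spid != 0:  # blk_spid 0 means the spid is not blocked
            waiters[blk_spid].append(spid)
    return dict(waiters)


def root_blockers(rows):
    """Return spids that block others but are not blocked themselves."""
    blocked = {spid for spid, _s, blk, _c in rows if blk != 0}
    blocking = {blk for _spid, _s, blk, _c in rows if blk != 0}
    return sorted(blocking - blocked)


print(blocked_by(ROWS))
print(root_blockers(ROWS))
```

For the rows above this reports spid 240 (UNMOUNT DATABASE) as the only root blocker, with spid 233 (LOAD DATABASE) waiting on it.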
The stack trace suggests the spid is stuck opening a file (ufopen), but there does not appear to be a problem with the file system:
cdc1pbdbprd8vp04:/opt/sybase/log 639>ls -la /delphix_toolkit/toolkit/42338f9c-74c2-edac-e996-fe4127b28d4b-staging-11/datafile
total 170156950
drwxr-x--x 2 sybase sybase 12 Aug 5 20:41 .
drwxrwxr-x 5 delphix sybase 256 Aug 11 11:32 ..
-rw-r--r-- 1 sybase sybase 33074712576 Aug 16 19:45 dx95SjqQLYUrWu0dfLfyY1_0.dat
-rw-r----- 1 sybase sybase 24 Aug 5 18:54 dx_staging_db_name
-rw-r--r-- 1 sybase sybase 33074712576 Aug 16 19:45 dxeBL39TtegMP3mojf3Qwr_3.dat
-rw-r--r-- 1 sybase sybase 33074712576 Aug 16 19:45 dxnrv3e9333YySu335wAnH_7.dat
-rw-r--r-- 1 sybase sybase 33074712576 Aug 16 19:45 dxuvfvefXU8AvY294ts87X_1.dat
-rw-r--r-- 1 sybase sybase 33074712576 Aug 16 19:39 dxv12TXzAvf9fdvu7w6L5W_6.dat
-rw-r--r-- 1 sybase sybase 33074712576 Aug 16 19:29 dxv2l3reislx26fVAu5o39_4.dat
-rw-r--r-- 1 sybase sybase 33074712576 Aug 16 19:34 dxvbvesnQ0n7fr97wQ8jvP_5.dat
-rw-r--r-- 1 sybase sybase 33074712576 Aug 16 19:45 dxvvvv39kvUtfq1rzEx6e1_2.dat
-rw-r--r-- 1 sybase sybase 4096 Aug 9 21:35 manifest
I'm curious whether this is a known issue, since the UNMOUNT command shouldn't hang forever; I would expect it to fail at some point.
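Until the server-side behavior is understood, the calling side can at least avoid waiting forever by putting its own time limit on the client command. The sketch below is a generic wrapper and an assumption about how the command might be driven, not part of ASE: the function name is hypothetical, and the actual client command line (for example an isql invocation) is left to the caller. Stopping the client only unblocks the caller; it does not by itself show that the server-side spid or its locks have been released.

```python
import subprocess


def run_with_timeout(argv, stdin_text, timeout_s):
    """Run a client command, feeding it `stdin_text` on standard input.

    Returns the subprocess.CompletedProcess if the command finishes
    within `timeout_s` seconds, or None if it had to be killed.
    """
    try:
        return subprocess.run(
            argv,
            input=stdin_text,
            capture_output=True,
            text=True,
            timeout=timeout_s,  # run() kills the child when this expires
        )
    except subprocess.TimeoutExpired:
        return None
```

A caller would pass the client command line as argv and the UNMOUNT DATABASE batch as stdin_text, and treat a None result as "hung" rather than "failed".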
Thanks,
Neal