|
|
Introduction
Smenu has a
specific implementation for diagnose Database based on the idea
of strong DBA, namely Patrick Calwaert. The main idea is quite
simple but yield excellent results. We usually get
the Right picture of problems within 30s. A more accurate picture can
be taken from hours of sampling, but usually, it only confirms the
measurements of the first
seconds. The concept is based on the Oracle Wait Interface and
consists in duping the contents of V$SESSION_WAIT, V$SQL, V$SQLTEXT .
The specificity of this OWI implementaiton is that we aggregate the
values by SQL HASH_VALUE and completely
discart the notion of session. In fact we don't care who is
doing what, we are only
interrested on what the database had to endure, regardless of its
origin.
Once we successfully assert what slows the DB or has slowed it -
sampler can be used for history tracking - then we can
start to point those who runs it. The 4 tasks of 'spl'
Sampling the v$session_wait in Smenu does not creates
tables in the target DB. Smenu dump the contents of v$session_wait and
v$sql
into flat files using the utl_file in connection with the Create DIR.
For that reason, it can only work for version 9ir2 and above. Starting sampler options:
| |||||||||||||||||||||
| You can use
your windows Desktop/laptop with Linux or Windows Cygwin as a central
dunning repository, holding all your dump file and measurement from all
the network. On this box you will configure the graphic library. |
As soon as the sampler is started, it
produces flat files. You can start to query these flats files immediatly
[befsam18:POLDEV]> spl -t
MACHINE
- ORACLE_SID :
POLDEV
Page: 1
Date
- Monday 30th January 2006
13:09:51 Username
- SYS
Report
- Top 10 events for POLDEV.30130934 nline
.
Type 'spl -h' for
help Displaying
10 rows
Total number
of entries in waits file
: 9
Sample
Nbr
Wait Event
name
SQL_HASH_VALUE WAIT_TIME Perc Object name
---------
------------------------------ -------------- ---------- -------
----------------------------------------
3 db file sequential
read
511653754
2 33.33
2 db file sequential
read
4002059376
2 22.22
1 SQL*Net message to
client
79360305
0 11.11
1 SQL*Net message to
client
644478545
0 11.11
1 db file sequential
read
1308883746
11 11.11
1 db file sequential
read
2259161642
0 11.11
This tells you that there is 9
events captured by the sampler and gives you first glinch over what is
happening. The sql has
h_value are taken from v$sql
area
and you can get the text by typing spl
-hv <hash_value>
After 45 seconds we retake a
sample we start to see more precise info:
/opt/smenu>
spl -t
MACHINE
- ORACLE_SID :
POLDEV
Page: 1
Date
- Monday 30th January 2006
13:11:37 Username
- SYS
Report
- Top 10 events for POLDEV.30130934 nline
.
Type 'spl -h' for
help Displaying
10 rows
Total number
of entries in waits file : 132
Sample
Nbr
Wait Event
name
SQL_HASH_VALUE WAIT_TIME Perc Object name
---------
------------------------------ -------------- ---------- -------
----------------------------------------
74 db file sequential
read
511653754
97 56.06
16 db file scattered
read
511653754
21 12.12
10 SQL*Net message to
client
79360305
0 7.58
4 control file parallel
write
0
8 3.03
4 db file sequential
read
4002059376
2 3.03
3 SQL*Net message to
client
1076837221
2 2.27
3 db file sequential
read
4028880301
2 2.27
3 db file sequential
read
2347967350
0 2.27
2 db file sequential
read
0
2 1.52
2 db file sequential
read
1384212996
0 1.52
sql 511653754 is now
emerging in our sample file. Let's have a look to it.
We can take the text :
[befsam18:POLDEV]:>
spl -hv 511653754
Using default
SAMPLE_WORK_DIR
Using default
SPL
SELECT
COUNT(pr.CUST_REGARDING_ID) FROM CUSTOMER_FACT pr WHERE
pr.PRT_CUSTO_TP_ID = 1 AND SYSDATE BETWEEN pr.START_DATE AND
NVL(p
r.END_DATE,TO_DATE('12-12-2050','DD-MM-YYYY')) GROUP BY
pr.CUSTOMER_FACT_ID HAVING COUNT(pr.CUST_REGARDING_ID) != 1
AND pr.CUST
_REGARDING_ID = :b1
We can get also from 'spl' the
link between the SQL and dba_extents and get the segments over which it
was waiting.
Note:
| You need to access the original DB for DBA_SEGMENTS, as the P1 and P2 arguments containted in the sample file refers to files and blocks numbers of the original DB. This restriction does not exits for others queries throuhg SPL and you can transfer the sample files to another location to interrogate them. Then the only restriction is to have a DB up and running version 9ir2 and above. |
[befsam18:POLDEV]>
spl -tseg
Using default
SAMPLE_WORK_DIR
Using default
SPL
MACHINE
- ORACLE_SID :
POLDEV
Page: 1
Date
- Monday 30th January 2006
13:13:21 Username
- SYS
Report
- Top 10 events for POLDEV.30130934 nline
.
Type 'spl -h' for
help Displaying
10 rows
Total number
of entries in waits file : 239
Nbr
SQL Time
Sample
waits
Event
name
Hash Value waits Perc
Segment
name
Type
-------
------------------------- ------------ --------- -------
-------------------------------- ---------
67 db file sequential read
511653754 159 48.54
CUST
TABLE
13 db file scattered read
511653754
22 9.21
CUST
TABLE
9 db file sequential read
511653754
32 6.69
ORG_INF_TRNS
TABLE
4 db file sequential read
511653754
0 1.67
NDX_ORG_INF_TRNS06
INDEX
4 db file sequential read
4002059376
2 1.67
CUST
TABLE
2 db file sequential
read
0 2
0.84
CUST
TABLE
2 db file sequential read
511653754
6 1.67
NDX_CUST02
INDEX
2 db file sequential read
1384212996
0 0.84
CUST
TABLE
2 db file sequential read
2347967350
0 0.84
CUST
TABLE
2 db file sequential read
2354715425
0 0.84
NDX_CUST14
INDEX
If the monitoring is running,
the explain plan for the hash value is probably still there :
[befsam18:POLDEV]> sx 511653754
Date
- Monday 30th January 2006
13:14:30
Username
- SYS
Show explain
plan for query hash_value = 511653754
Search
ID
PARENT
OPERATION
COST CARDINALITY Cols OBJECT_NOD OBJECT_NAME
---- ------
----------------------------------- ---------- ----------- ------
---------- ------------------------------
0 SELECT
STATEMENT
70
0
1 0
FILTER
0
2 1 SORT GROUP
BY
70
1 0
3 2 TABLE ACCESS BY
INDEX
ROWID
61
1836
0
CUSTOMER_FACT
4 3 INDEX RANGE
SCAN
221
36727
1
NDX_CUSTOMER_FACT01
4 3 INDEX RANGE
SCAN
221
36727
1
NDX_CUSTOMER_FACT01
3 2 TABLE ACCESS BY
INDEX
ROWID
61
1836
0
CUSTOMER_FACT
4 3 INDEX RANGE
SCAN
221
36727
1
NDX_CUSTOMER_FACT01
4 3 INDEX RANGE
SCAN
221
36727
1
NDX_CUSTOMER_FACT01
There are lot of
options and sort options for reporting :
spl -d <DIR> -s <SID_ddhhmmss> -t [-hv <SQL_HASH_VALUE>] spl -t [-hv <SQL_HASH_VALUE>] : Top events, group by event, segment_name; optional limit to an hash value spl -t2 [-hv <SQL_HASH_VALUE>] : Top events, group by event,p1,p2 ; optional limit to an hash value spl -enq : Show all enqueue events and enqueue type (count enqueue) spl -enqh : Show enqueue events, group by hash_value (list sql with enqueue) spl -tseg : Top events, give segment name per hash_value, when applicable npl -ev : List top 10 events in sample for all event spl -ev <EVENT NAME> : List top 10 events in sample for specific event. Name must exist in v spl -de <SQL_HASH_VALUE> -gr <n> : extract delta for SQL_HASH_VALUE (give results only for one hash_Value) group sample by <n> minutes spl -dw -img -rn <ROWS> : Display all values Sort by rows processed spl -dr -img -rn <ROWS> : Display all values Sort by disk_reads spl -dg -img -rn <ROWS> : Display all values Sort by buffer gets spl -dx -img -rn <ROWS> : Display all values Sort by executions spl -dwx . : Display all values Sort by rows/execution spl -drx . : Display all values Sort by disk reads/executions spl -dgx . : Display all values Sort by buffer/executions spl -img -w -rn <ROWS> : List SQL sorted by rows processed spl -img -r -rn <ROWS> : List SQL sorted by Disk reads spl -img -g -rn <ROWS> : List SQL sorted by buffer gets spl -img -x -rn <ROWS> : List SQL sorted by execution spl -img -wx -rn <ROWS> : List SQL sorted by rows/exec spl -img -gx -rn <ROWS> : List SQL sorted by buffer_gets/exec spl -img -rx -rn <ROWS> : List SQL sorted by disk_reads/exec spl -img -rw -rn <ROWS> : List SQL sorted by disk_reads/rows spl -img -gw -rn <ROWS> : List SQL sorted by buffer_gets/rows You can pile the arguments, in this case the SQL is sorted by the first argument : spl -img -g -r -w -x Default are used when If DIR is not given then C:/app/cygwin/opt/smenu/tmp is used if file extention is not givent then latest file for ORACLE_SID is used -s : Oracle SID and time in the name of the sample file -d : SAMPLE_DIR |
============
Examples :
============
A) At 10h00 you want to start a sampler
of 4 hours:
spl
start -l 240
# it will stop automatically after 240 minutes
# You can stop it before with 'spl stop'
You can have the type of events already catched :
spl -ev
Nbr
Wait Event
name
WAIT_TIME Perc Object name
---------
------------------------------ ---------- -------
---------------------------
9005 i/o slave
wait
32468 42.69
1774 Queue Monitor
Wait
269290 8.41
1101 db file sequential
read
14 5.22
1005 latch: cache buffers
chains
4103 4.76
745 db file sequential
read
4973 3.53
464 io
done
1 2.20
247 jobq slave
wait
7053 1.17
218 db file sequential
read
4 1.03
209 db file sequential
read
28 0.99
157 db file sequential
read
0 0.74
B) You want to see the distribution for
"latch: cach buffers chains". Just type spl -ev and
cut and past the string text "latch: cache buffers chains"
spl -ev "latch: cache buffers
chains"
# double quotes are not needed
Sample
Nbr
Wait Event
name
SQL_HASH_VALUE WAIT_TIME Perc Object name
---------
------------------------------ -------------- ---------- -------
---------------------------
1005 latch: cache buffers
chains
1323209015 4103
4.76
32 latch: cache buffers
chains
4231337329
257 0.15
19 latch: cache buffers
chains
1323209015
87 0.09
16 latch: cache buffers
chains
815308336
4 0.08 PUBLIC./2ccf0331_Method
10 latch: cache buffers
chains
815308336
6 0.05 PUBLIC.sun/misc/Timeable
7 latch: cache buffers chains
4231337329
16 0.03 SYS./724f4e6d_Builder
C) You would like to know which SQL is
waiting on witch object and what it is waiting:
spl -tseg
Nbr
SQL Time
Sample
waits
Event
name
Hash Value waits Perc
Segment
name
Type
-------
------------------------- ------------ --------- -------
-------------------------------- ---------
15 db file sequential read
2834669896
0 0.07
SYS_LOB0000049528C000073912
LOBSEGMENT
3 db file sequential read
475547433
0 0.01
SYS_LOB0000049528C000073912
LOBSEGMENT
3 db file sequential read
2240780814
0 0.01
SYS_LOB0000049528C000073912
LOBSEGMENT
D) While the sampler was running or
after you asked the top 10 sorted by execution, buffer gets,
row_processed, disk_reads and you want a graphic also:
/opt/smenu>spl -x -g -r
-d -img
MACHINE poldev -
ORACLE_SID :
asdb
Page: 1
Date
- Thursday 13th April 2006
16:22:57
Username
-
SYS
Report
- List Sample Delta
From
- 2006-03-31 08:07:18 -->
08:37:45
HASH_VALUE
EXECUTIONS BUFFER_GETS DISK_READS ROWS_PROCESSED
----------
---------- ----------- ---------- --------------
1016061570
1414014
3508850
26 680822
545138317
785587
1571174
1 785587
4010258057
226010
0
0 226007
921443695
166367
833422
28 166362
3119962695
164954
1822700
1080 164954
3728733328
164944
1977065
1563 164302
3986006057
144903
435680
1530 143970
3208295378
144902
434505
23 144847
211256130
109595
8777
0 109595
4080626437
87105
636040
11585 87105
Sample
duration : 30Min26
Image
required : calling
C:/app/cygwin/opt/smenu/module3/s1/smenu_create_img.ksh -x -g -r -w
-img -f C:/app/cygwin/opt/smen
_420.log
C:/app/cygwin/opt/smenu/tmp/img_2444.png
done

We could have asked only for the
execution and also why not change the colors :

E) You notice query HASH_VALUE
1016061570 is the top. You retrieve its text with:
/smenu/opt> spl -hv 1016061570
| SELECT DECODE(:B6 , 'Y', 1/
ERE.RATE , ERE.RATE) ERE_RATE FROM EXCHANGE_RATES ERE WHERE
ERE.PERIOD_CODE = :B5 AND ERE.CURRENCY_CODE_ EXCHANGED_FROM = :B4 AND ERE.CURRENCY_CODE_EXCHANGED_TO = :B3 AND ERE.DATA_TYPE_CODE = :B2 AND ERE.TYPE = :B1 AND ERE.STATUS = 'A' |
Reporting For a Single
SQL over time
At 15h30 you are asked about a
graphich for query HASH_VALUE 1016061570
behaviour between 08h05 and 08h35
Information will be given grouped by 5 minutes. You must go back from
the top of the file (14h00)
spl -min 90 -de 345678 -gr 5 -rn 6 -img
This will display starting from
08h07 (start of our sample), 6 lines, each line will sum up the
activity for 5 minutes
for rows_processed, disk_reads,
Start date :
20060331080821 Total entries in
sample file: 65246
End
date : 20060331083745
SQL_HASH_VALUE
: 1016061570
TIME
ROWS_PROCESSED DISK_READS EXECUTIONS
BUFFER_GETS
SORTS CPU_TIME
--------
-------------- ---------- ----------
----------- ---------- ---------
08:09:25
1228860
1514
2719171
6667141
0 182.102951
-------#
08:14:39
437492
15
888008
2213508
0 53.916941
08:19:54
78676
0
196694
472064
0 11.484244
08:25:09
3
3
9
21
0 .00066
08:30:24
37211
23764
145642
334337
0 .001447
08:35:39
2
4
4
10
0 .000411
08:37:45
0
0
0
0
0 0

F) If you have a sample of 24h and want
to have the detail of a query by hour you will type:
spl -min 1440 -de
<HASH_VALUE> -gr 60
if you want only the last 12h summing
the activity of each hour a graphic:
spl -min 720 -de
<HASH_VALUE> -gr 60 -img
G) Your database is slow, you started a
sampler and want to retrieve all infos sorted executions for the
last 15 minutes. Note that the options
-dx
gives you the ratios also. In fact it is sorted by execution for
you choose -dx. You can sort it by buffer (-dg) and rows processed (-dw)
spl -min 15 -dx -img:
HASH_VALUE ROWS_PROCESSED DISK_READS BUFFER_GETS EXECUTIONS Bufs/Exec Rows/Exec Reads/Exec CPU_TIME
---------- -------------- ---------- ----------- ---------- ---------- ---------- ---------- ----------
1016061570 680822 26 3508850 1414014 2 0 0 0
545138317 785587 1 1571174 785587 2 1 0 0
4010258057 226007 0 0 226010 0 1 0 0
921443695 166362 28 833422 166367 5 1 0 0
3119962695 164954 1080 1822700 164954 11 1 0 0
.
.

|
'Spl' offers other goodies such
as usefull ratio, among the most importants are:
spl -img -gx -rn <ROWS> : List SQL sorted by buffer_gets/exec spl -img -rx -rn <ROWS> : List SQL sorted by disk_reads/exec spl -img -rw -rn <ROWS> : List SQL sorted by disk_reads/rows spl -img -gw -rn <ROWS> : List SQL sorted by buffer_gets/rows All these ratio are most usefull
and underline those who are make lots of disk read in order to produce
a small amount of ro |
