Sampler

Home Up

 

DB diagnostic with Smenu
 
  1. Introduction
  2. Start sampler again local DB or remote DB
  3. First investigation
  4. how to access the explain plan for a given hash_value in sample
  5. Maintain ascii  samples
  6. Reporting : Values for Multiple SQL
  7. reporting :  Values for a single SQL over time
  8. Reporting : SQL ratio

Introduction

                    Smenu has a specific implementation for diagnose Database based on the idea of strong DBA, namely Patrick Calwaert. The main idea is quite simple but yield excellent results. We usually get the Right picture of problems within 30s. A more accurate picture can be taken from hours of sampling, but usually, it only confirms the measurements of the first seconds.  The concept is based on the Oracle Wait Interface and consists in duping the contents of V$SESSION_WAIT, V$SQL, V$SQLTEXT . The specificity of this OWI implementaiton is that we aggregate the values by SQL HASH_VALUE and completely discart the notion of session. In fact we don't care who is doing what, we are only interrested on what the database had to endure, regardless of its origin. Once we successfully assert what slows the DB or has slowed it - sampler can be used for history tracking - then we can start to point those who runs it.

  There is only one shortcut in Smenu that deal with this, 'spl' and it stands for 'sampler'. it is in fact a false shortcuts as behind the scene there are 4 scripts.

The 4 tasks of 'spl'

  1.       Interface for fine grained setting. This is for lazy people for you can have all with command line, but you can also set the default behaviour for futur uses.
  2.       Generate the Korn shell that will be run in background. this leaves you the hability to request a sample of the DB  for several days and be able to logout.
  3.       Check and maintains the external tables definitions and connect the proper flat file with the proper table definition
  4.       Issue SQL irequest against the sampled files, produce and display the graphics when required

               Sampling the v$session_wait in Smenu does not creates tables in the target DB. Smenu dump the contents of v$session_wait and v$sql into flat files using the utl_file in connection with the Create DIR. For that reason, it can only work for version 9ir2 and above.

Starting sampler options:

 To start a sampler, just put yourself in the TMP dir of smenu ('tp') and type 'spl start'

[befsam18:POLDEV]> spl start
/export/TAR/oracle/scripts/smenu/tmp/sampler_sql_w_POLDEV.ksh
Sampler started for waits and sql text

Here are the differents option of 'spl' to start a sampler :


         spl start  -l <duration seconds> -i <sec> -s<sec>
         spl stop

               -l : duration of sampler in seconds: default is 1800
               -i : interval of dump for OWI, default is 1 second
               -s : interval of dump for SQL figures, default is 60s


Sampler exists in 2 forms :

       korn shell/PLSQL   # need a user with the create directory rights and execute on utl_file plus a local directory.
       Korn shell/Perl    # need a user with access to v$sql, v$session_wait, v$sysstats, no others access needed on server

Both produce the same contents. The PLSQL version was the first version but lack the capability to dump a remote DB to a local filesystem. So the perl version was created. Howerver for the moment the perl version is not able to connect as sys. Spl choose himeself which version to use. If he cannot detect the DB, it assumes the DB is remote and check for the default user of smenu and eventully a specific default user for the remote instance if you ever assigne one (SM/1.3). It then connect and start it work.

You can force the perl version with the -perl option and force the korn shell version with option -local. This last option is usefull when usiug cygwin, as Cygwin is unable to detect the processes of a DB if it is local to your windows machine.

spl -start local
spl -start -u system -o REMOTE_SID

if you password is not in the $SBIN/.passwd file (use 'vpas' or SM/1.2) then you can still provide the passwd:

spl -start -u system -p manager -o REMOTE_SID

Still you can give the -l -i -s options :

 spl -start -u system -p manager -o DBPROD -l 900 -i 2 -s 30.

Will start to dump in your current directory the waits states taken every 2 seconds, dump v$sql every 30s for a total duration of 900 secs (15 min), connecting to DBPROD with user system/manager. You can use any user, providing that


Maintaining dumped ascii files locally

 


    If you used the perl sampler to get measuremment from a remote DB, you created local files from a remote DB. In Order to query these ascii files, Smenu needs to create a directory mount this directory and create the ascii file as external tables. After that these tables can be queried. But oracle directories can only be local to a DB server. So the trick is to rename these ascii file to the local DB

      spl -rename -o <REMOTE SID> [-n <LOCAL SID>] [-d <date>]


   ie) Given that my local DB is TESTDB

-rwxr-xr-x  1 a118990 mkgroup-l-d 3248945 Apr 19 21:38 sample_delta_w_DBPROD.04191426
-rwxr-xr-x  1 a118990 mkgroup-l-d  104814 Apr 19 21:38 sample_sql_w_DBPROD.04191426
-rwxr-xr-x  1 a118990 mkgroup-l-d   16766 Apr 19 21:38 sample_sys_w_DBPROD.04191426
-rwxr-xr-x  1 a118990 mkgroup-l-d 1584080 Apr 19 21:38 sample_w_txt_DBPROD.04191426

       spl -rename -o DBPROD -n POLDB1

-rwxr-xr-x  1 a118990 mkgroup-l-d 3248945 Apr 19 21:38 sample_delta_w_POLDB1.04191426
-rwxr-xr-x  1 a118990 mkgroup-l-d  104814 Apr 19 21:38 sample_sql_w_POLDB1.04191426
-rwxr-xr-x  1 a118990 mkgroup-l-d   16766 Apr 19 21:38 sample_sys_w_POLDB1.04191426
-rwxr-xr-x  1 a118990 mkgroup-l-d 1584080 Apr 19 21:38 sample_w_txt_POLDB1.04191426

And now I can easily use the file against the local POLDB1

TIPS:

You can use your windows Desktop/laptop with Linux or Windows Cygwin as a central dunning repository, holding all your dump file and measurement from all the network. On this box you will configure the graphic library.



First investigation

As soon as the sampler is started, it produces flat files. You can start to query these flats files immediatly

[befsam18:POLDEV]> spl -t

MACHINE  - ORACLE_SID : POLDEV                   Page: 1

Date              -  Monday    30th January   2006  13:09:51 Username          -  SYS
Report            -  Top 10 events for POLDEV.30130934 nline

.     Type 'spl -h' for help          Displaying 10 rows

Total number of entries in waits file :          9

                                                                   Sample
 Nbr Wait Event name                     SQL_HASH_VALUE  WAIT_TIME  Perc   Object name
--------- ------------------------------ -------------- ---------- ------- ----------------------------------------
        3 db file sequential read             511653754          2   33.33
        2 db file sequential read            4002059376          2   22.22
        1 SQL*Net message to client            79360305          0   11.11
        1 SQL*Net message to client           644478545          0   11.11
        1 db file sequential read            1308883746         11   11.11
        1 db file sequential read            2259161642          0   11.11


This tells you that there is 9 events captured by the sampler and gives you first glinch over what is happening. The sql has
h_value are taken from v$sql area and you can get the text by typing spl -hv <hash_value>
After 45 seconds we retake a sample we start to see more precise info:

/opt/smenu> spl -t

MACHINE  - ORACLE_SID : POLDEV                   Page: 1

Date              -  Monday    30th January   2006  13:11:37 Username          -  SYS
Report            -  Top 10 events for POLDEV.30130934 nline

.     Type 'spl -h' for help          Displaying 10 rows

Total number of entries in waits file :        132
                                                                   Sample
 Nbr Wait Event name                     SQL_HASH_VALUE  WAIT_TIME  Perc   Object name
--------- ------------------------------ -------------- ---------- ------- ----------------------------------------
       74 db file sequential read             511653754         97   56.06
       16 db file scattered read              511653754         21   12.12
       10 SQL*Net message to client            79360305          0    7.58
        4 control file parallel write                 0          8    3.03
        4 db file sequential read            4002059376          2    3.03
        3 SQL*Net message to client          1076837221          2    2.27
        3 db file sequential read            4028880301          2    2.27
        3 db file sequential read            2347967350          0    2.27
        2 db file sequential read                     0          2    1.52
        2 db file sequential read            1384212996          0    1.52

sql  511653754 is now emerging in our sample file. Let's have a look to it.

We can take the text :

[befsam18:POLDEV]:> spl -hv 511653754

Using default SAMPLE_WORK_DIR
Using default SPL

SELECT COUNT(pr.CUST_REGARDING_ID) FROM CUSTOMER_FACT pr WHERE pr.PRT_CUSTO_TP_ID = 1 AND SYSDATE BETWEEN pr.START_DATE AND
NVL(p r.END_DATE,TO_DATE('12-12-2050','DD-MM-YYYY')) GROUP BY pr.CUSTOMER_FACT_ID HAVING COUNT(pr.CUST_REGARDING_ID) != 1
AND pr.CUST _REGARDING_ID = :b1


We can get also from 'spl' the link between the SQL and dba_extents and get the segments over which it was waiting.

Note:

 

You need to access the original  DB for DBA_SEGMENTS, as the P1 and P2 arguments containted in the sample file refers to files and blocks numbers of the original DB. This restriction does not exits for others queries throuhg SPL and you can transfer the sample files to another location to interrogate them. Then the only restriction is to have a DB up and running version 9ir2 and above.


[befsam18:POLDEV]> spl -tseg
Using default SAMPLE_WORK_DIR
Using default SPL
MACHINE  - ORACLE_SID : POLDEV                   Page: 1

Date              -  Monday    30th January   2006  13:13:21 Username          -  SYS
Report            -  Top 10 events for POLDEV.30130934 nline

.     Type 'spl -h' for help          Displaying 10 rows

Total number of entries in waits file :        239

  Nbr                                 SQL        Time    Sample
 waits  Event name                 Hash Value    waits    Perc   Segment name                       Type
------- ------------------------- ------------ --------- ------- -------------------------------- ---------
     67 db file sequential read      511653754       159   48.54 CUST                             TABLE
     13 db file scattered read       511653754        22    9.21 CUST                             TABLE
      9 db file sequential read      511653754        32    6.69 ORG_INF_TRNS                     TABLE
      4 db file sequential read      511653754         0    1.67 NDX_ORG_INF_TRNS06               INDEX
      4 db file sequential read     4002059376         2    1.67 CUST                             TABLE
      2 db file sequential read              0         2    0.84 CUST                             TABLE
      2 db file sequential read      511653754         6    1.67 NDX_CUST02                       INDEX
      2 db file sequential read     1384212996         0    0.84 CUST                             TABLE
      2 db file sequential read     2347967350         0    0.84 CUST                             TABLE
      2 db file sequential read     2354715425         0    0.84 NDX_CUST14                       INDEX

Access the explain plan of a hash_value

 

If the monitoring is running, the explain plan for the hash value is probably still there :

[befsam18:POLDEV]> sx 511653754

Date              -  Monday    30th January   2006  13:14:30
Username          -  SYS
Show explain plan for query hash_value = 511653754

                                                                       Search
  ID PARENT OPERATION                                 COST CARDINALITY   Cols OBJECT_NOD OBJECT_NAME
---- ------ ----------------------------------- ---------- ----------- ------ ---------- ------------------------------
   0        SELECT STATEMENT                            70                  0
   1      0  FILTER                                                         0
   2      1   SORT GROUP BY                             70           1      0
   3      2    TABLE ACCESS BY INDEX ROWID              61        1836      0            CUSTOMER_FACT
   4      3     INDEX RANGE SCAN                       221       36727      1            NDX_CUSTOMER_FACT01
   4      3     INDEX RANGE SCAN                       221       36727      1            NDX_CUSTOMER_FACT01
   3      2    TABLE ACCESS BY INDEX ROWID              61        1836      0            CUSTOMER_FACT
   4      3     INDEX RANGE SCAN                       221       36727      1            NDX_CUSTOMER_FACT01
   4      3     INDEX RANGE SCAN                       221       36727      1            NDX_CUSTOMER_FACT01


Reporting with the sample

 

    There are lot of options and sort options for reporting :


     spl -d <DIR> -s <SID_ddhhmmss> -t [-hv <SQL_HASH_VALUE>]


     spl -t   [-hv <SQL_HASH_VALUE>]   : Top events, group by event, segment_name; optional limit to an hash value
     spl -t2  [-hv <SQL_HASH_VALUE>]   : Top events, group by event,p1,p2 ; optional limit to an hash value
     spl -enq                          : Show all enqueue events and enqueue type (count enqueue)
     spl -enqh                         : Show enqueue events, group by hash_value (list sql with enqueue)
     spl -tseg                         : Top events, give segment name per hash_value, when applicable
     npl -ev                           : List top 10 events in sample for all event
     spl -ev <EVENT NAME>              : List top 10 events in sample for specific event. Name must exist in v

     spl -de  <SQL_HASH_VALUE> -gr <n> : extract delta for SQL_HASH_VALUE  (give results only for one hash_Value)
                                         group sample by <n> minutes
     spl -dw  -img  -rn <ROWS>         : Display all values Sort by rows processed
     spl -dr  -img  -rn <ROWS>         : Display all values Sort by disk_reads
     spl -dg  -img  -rn <ROWS>         : Display all values Sort by buffer gets
     spl -dx  -img  -rn <ROWS>         : Display all values Sort by executions
     spl -dwx  .                       : Display all values Sort by rows/execution
     spl -drx  .                       : Display all values Sort by disk reads/executions
     spl -dgx  .                       : Display all values Sort by buffer/executions

     spl -img -w  -rn <ROWS>           : List SQL sorted by rows processed
     spl -img -r  -rn <ROWS>           : List SQL sorted by Disk reads
     spl -img -g  -rn <ROWS>           : List SQL sorted by buffer gets
     spl -img -x  -rn <ROWS>           : List SQL sorted by execution
     spl -img -wx -rn <ROWS>           : List SQL sorted by rows/exec
     spl -img -gx -rn <ROWS>           : List SQL sorted by buffer_gets/exec
     spl -img -rx -rn <ROWS>           : List SQL sorted by disk_reads/exec
     spl -img -rw -rn <ROWS>           : List SQL sorted by disk_reads/rows
     spl -img -gw -rn <ROWS>           : List SQL sorted by buffer_gets/rows

 You can pile the arguments, in this case the SQL is sorted by the first argument :

      spl -img -g -r -w -x

Default are used when

       If DIR is not given then C:/app/cygwin/opt/smenu/tmp is used
       if file extention is not givent then latest file for ORACLE_SID is used

       -s   : Oracle SID and time in the name of the sample file
       -d   : SAMPLE_DIR


  ============
  Examples :
  ============

     A)  At 10h00 you want to start a sampler of 4 hours:

            spl start -l 240                        # it will stop automatically after 240 minutes
                                              # You can stop it before with 'spl stop'

         You can have the type of events already catched :

            spl -ev

 Nbr Wait Event name                      WAIT_TIME  Perc   Object name
--------- ------------------------------ ---------- ------- ---------------------------
     9005 i/o slave wait                      32468   42.69
     1774 Queue Monitor Wait                 269290    8.41
     1101 db file sequential read                14    5.22
     1005 latch: cache buffers chains          4103    4.76
      745 db file sequential read              4973    3.53
      464 io done                                 1    2.20
      247 jobq slave wait                      7053    1.17
      218 db file sequential read                 4    1.03
      209 db file sequential read                28    0.99
      157 db file sequential read                 0    0.74


      B) You want to see the distribution for "latch: cach buffers chains". Just type spl -ev and
           cut and past the string text "latch: cache buffers chains"

            spl -ev "latch: cache buffers chains"                # double quotes are not needed

                                                                   Sample
 Nbr Wait Event name                     SQL_HASH_VALUE  WAIT_TIME  Perc   Object name
--------- ------------------------------ -------------- ---------- ------- ---------------------------
     1005 latch: cache buffers chains        1323209015       4103    4.76
       32 latch: cache buffers chains        4231337329        257    0.15
       19 latch: cache buffers chains        1323209015         87    0.09
       16 latch: cache buffers chains         815308336          4    0.08 PUBLIC./2ccf0331_Method
       10 latch: cache buffers chains         815308336          6    0.05 PUBLIC.sun/misc/Timeable
        7 latch: cache buffers chains        4231337329         16    0.03 SYS./724f4e6d_Builder


      C)  You would like to know which SQL is waiting on witch object and what it is waiting:

            spl -tseg


  Nbr                                 SQL        Time    Sample
 waits  Event name                 Hash Value    waits    Perc   Segment name                       Type
------- ------------------------- ------------ --------- ------- -------------------------------- ---------
     15 db file sequential read     2834669896         0    0.07 SYS_LOB0000049528C000073912        LOBSEGMENT
      3 db file sequential read      475547433         0    0.01 SYS_LOB0000049528C000073912        LOBSEGMENT
      3 db file sequential read     2240780814         0    0.01 SYS_LOB0000049528C000073912        LOBSEGMENT


      D)  While the sampler was running or after you asked the top 10 sorted by execution, buffer gets, row_processed, disk_reads and you want a graphic also:

               /opt/smenu>spl -x -g -r -d  -img

MACHINE poldev - ORACLE_SID : asdb                                                   Page: 1
Date              -  Thursday  13th April     2006  16:22:57
Username          -  SYS                            Report            -  List Sample Delta
From              -  2006-03-31    08:07:18  -->  08:37:45


HASH_VALUE EXECUTIONS BUFFER_GETS DISK_READS ROWS_PROCESSED
---------- ---------- ----------- ---------- --------------
1016061570    1414014     3508850         26         680822
 545138317     785587     1571174          1         785587
4010258057     226010           0          0         226007
 921443695     166367      833422         28         166362
3119962695     164954     1822700       1080         164954
3728733328     164944     1977065       1563         164302
3986006057     144903      435680       1530         143970
3208295378     144902      434505         23         144847
 211256130     109595        8777          0         109595
4080626437      87105      636040      11585          87105

Sample duration : 30Min26
Image required : calling C:/app/cygwin/opt/smenu/module3/s1/smenu_create_img.ksh -x -g -r -w -img -f C:/app/cygwin/opt/smen
_420.log
C:/app/cygwin/opt/smenu/tmp/img_2444.png done

SQL sorted by executions

We could have asked only for the execution and also why not change the colors :

[/opt/smenu/tmp]> spl -x -img

MACHINE bebrw993 - ORACLE_SID : asdb                                                   Page: 1

Date              -  Thursday  13th April     2006  16:27:24
Username          -  SYS                            Report            -  List Sample Delta
From              -  2006-03-31    08:07:18  -->  08:37:45


HASH_VALUE EXECUTIONS
---------- ----------
1016061570    1414014
 545138317     785587
4010258057     226010
 921443695     166367
3119962695     164954
3728733328     164944
3986006057     144903
3208295378     144902
 211256130     109595
4080626437      87105

Sample duration : 30Min26
Image required : calling C:/app/cygwin/opt/smenu/module3/s1/smenu_create_img.ksh -x -img -f C:/app/cygwin/opt/smenu/tmp/spl_716.log
C:/app/cygwin/opt/smenu/tmp/img_2468.png done       

SQL shown by only executions
      E)  You notice query HASH_VALUE 1016061570 is the top. You retrieve its text with:

       /smenu/opt> spl -hv
1016061570

SELECT DECODE(:B6 , 'Y', 1/ ERE.RATE , ERE.RATE) ERE_RATE FROM EXCHANGE_RATES ERE WHERE ERE.PERIOD_CODE = :B5 AND ERE.CURRENCY_CODE_
EXCHANGED_FROM = :B4 AND ERE.CURRENCY_CODE_EXCHANGED_TO = :B3 AND ERE.DATA_TYPE_CODE = :B2 AND ERE.TYPE = :B1 AND ERE.STATUS = 'A'

 Reporting For a Single SQL over time

          At 15h30 you are asked about a graphich for query HASH_VALUE 1016061570 behaviour between 08h05 and 08h35
          Information will be given grouped by 5 minutes. You must go back from the top of the file (14h00)

             spl -min 90 -de 345678 -gr 5 -rn 6 -img

          This will display starting from 08h07 (start of our sample), 6 lines, each line will sum up the activity for 5 minutes
          for rows_processed, disk_reads,


Start date : 20060331080821       Total entries in sample file:      65246
End date   : 20060331083745       SQL_HASH_VALUE              : 1016061570

TIME     ROWS_PROCESSED  DISK_READS  EXECUTIONS  BUFFER_GETS     SORTS       CPU_TIME
-------- --------------  ----------  ----------  -----------   ----------   ---------
08:09:25      1228860         1514      2719171      6667141            0 182.102951
-------#
08:14:39       437492           15       888008      2213508            0  53.916941
08:19:54        78676            0       196694       472064            0  11.484244
08:25:09            3            3            9           21            0     .00066
08:30:24        37211       23764        145642      334337            0    .001447
08:35:39            2            4            4           10            0    .000411
08:37:45            0            0            0            0            0          0

values by time for a single hash value sorted by rows processed

      F)  If you have a sample of 24h and want to have the detail of a query by hour you will type:

             spl -min 1440 -de <HASH_VALUE> -gr 60

          if you want only the last 12h summing the activity of each hour a graphic:

             spl -min 720 -de <HASH_VALUE> -gr 60 -img

      G)  Your database is slow, you started a sampler and want to retrieve all infos sorted executions  for the last 15 minutes. Note that the options
            -dx gives you the ratios also. In fact  it is sorted by execution for you choose -dx. You can sort it by buffer (-dg) and rows processed (-dw)

             spl -min 15 -dx -img:


HASH_VALUE ROWS_PROCESSED DISK_READS BUFFER_GETS EXECUTIONS  Bufs/Exec  Rows/Exec Reads/Exec   CPU_TIME
---------- -------------- ---------- ----------- ---------- ---------- ---------- ---------- ----------
1016061570         680822         26     3508850    1414014          2          0          0          0
 545138317         785587          1     1571174     785587          2          1          0          0
4010258057         226007          0           0     226010          0          1          0          0
 921443695         166362         28      833422     166367          5          1          0          0
3119962695         164954       1080     1822700     164954         11          1          0          0
.
. 


some sql sorted by rows_processed

Reporting SQL Ratios:

'Spl' offers other goodies such as usefull ratio, among the most importants are:

  •           Disk reads retrieved per row processed  (-rw)    
  •           Rows retrieved per execuction                (-wx)
  •           Buffer retrieved per execution                 (-gx)
  •           Disk reads per exec                                (-rx)
  •           Buffer gets per rows                               (-gw)
     spl -img -wx -rn <ROWS>           : List SQL sorted by rows/exec
     spl -img -gx -rn <ROWS>           : List SQL sorted by buffer_gets/exec
     spl -img -rx -rn <ROWS>           : List SQL sorted by disk_reads/exec
     spl -img -rw -rn <ROWS>           : List SQL sorted by disk_reads/rows
     spl -img -gw -rn <ROWS>           : List SQL sorted by buffer_gets/rows

All these ratio are most usefull and underline those who are make lots of disk read in order to produce a small amount of ro



    /opt/smenu> spl -gw -rw -img  -rn 5

MACHINE bebrw993 - ORACLE_SID : asdb                                                   Page: 1

Date              -  Thursday  13th April     2006  17:07:14
Username          -  SYS                            Report            -  List Sample Delta
From              -  2006-03-31    08:07:18  -->  08:37:45


HASH_VALUE BUFFER_GETS/ROWS DISK_READS/ROWS
---------- ---------------- ---------------
1853727248         56225045          645312
 815308336         45039002          126171
 985792044           169361               0
1765180702            28500            3893
3720083330            27219            7777

Sample duration : 30Min26
Image required : calling C:/app/cygwin/opt/smenu/module3/s1/smenu_create_img.ksh -gw -rw -img -rn 5 -f C:/app/cygwin/opt/sme
pl_3392.log
C:/app/cygwin/opt/smenu/tmp/img_1252.png done

RAtio for some SQL


                                                    B. Polarski@yahoo.com                 Last update 20 April 2006