Histogram (12c)
----------------
A histogram is a type of column statistic that describes the data distribution of a particular column. The histogram type is chosen based on the NDV (number of distinct values) of that column.
The different types of histograms are:
1) Frequency & Top Frequency
2) Height Balanced (Legacy)
3) Hybrid
Purpose
By default, the optimizer assumes a uniform distribution of rows across the distinct values in a column. For columns that contain data skew (a nonuniform distribution of data within the column), a histogram enables the optimizer to generate accurate cardinality estimates for filter and join predicates that involve these columns. For example, a California-based book store ships 95% of its books to California, 4% to Oregon, and 1% to Nevada. The book orders table has 300,000 rows, and a table column stores the state to which orders are shipped. A user queries the number of books shipped to Oregon. Without a histogram, the optimizer assumes an even distribution of 300,000/3 (the NDV is 3), estimating the cardinality at 100,000 rows. With this estimate, the optimizer chooses a full table scan. With a histogram, the optimizer calculates that 4% of the books are shipped to Oregon, and chooses an index scan.
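The arithmetic in this example can be sketched in Python (the numbers come from the example above; the helper functions are illustrative, not an Oracle API):

```python
# Cardinality estimates for the book-store example above.
num_rows = 300_000
ndv = 3  # three shipping states: CA, OR, NV

def uniform_estimate(num_rows, ndv):
    """Without a histogram: rows assumed evenly spread across distinct values."""
    return num_rows / ndv

def histogram_estimate(num_rows, pct):
    """With a histogram: rows * observed percentage for the value."""
    return num_rows * pct / 100

print(uniform_estimate(num_rows, ndv))   # 100000.0 -> optimizer picks a full table scan
print(histogram_estimate(num_rows, 4))   # 12000.0 for Oregon -> optimizer picks an index scan
```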
SQL> select TABLE_NAME, COLUMN_NAME, ENDPOINT_NUMBER, ENDPOINT_VALUE, ENDPOINT_ACTUAL_VALUE, ENDPOINT_REPEAT_COUNT, scope
     from user_tab_histograms
     where table_name = 'BOOK_SALES' and column_name = 'SOLD_TO_COUNTRY_ID'
     order by 3;
TABLE_NAME           COLUMN_NAME          ENDPOINT_NUMBER ENDPOINT_VALUE ENDPOINT_ACTUAL_VALU ENDPOINT_REPEAT_COUNT SCOPE
-------------------- -------------------- --------------- -------------- -------------------- --------------------- -------
BOOK_SALES           SOLD_TO_COUNTRY_ID             18000           1001 1001                                     0 SHARED
BOOK_SALES           SOLD_TO_COUNTRY_ID             30000           1002 1002                                     0 SHARED
BOOK_SALES           SOLD_TO_COUNTRY_ID             40000           1003 1003                                     0 SHARED
BOOK_SALES           SOLD_TO_COUNTRY_ID             50000           1004 1004                                     0 SHARED
BOOK_SALES           SOLD_TO_COUNTRY_ID             65000           1005 1005                                     0 SHARED
BOOK_SALES           SOLD_TO_COUNTRY_ID             75000           1006 1006                                     0 SHARED
BOOK_SALES           SOLD_TO_COUNTRY_ID             80000           1007 1007                                     0 SHARED
BOOK_SALES           SOLD_TO_COUNTRY_ID             90000           1008 1008                                     0 SHARED
BOOK_SALES           SOLD_TO_COUNTRY_ID             93000           1009 1009                                     0 SHARED
BOOK_SALES           SOLD_TO_COUNTRY_ID             95000           1010 1010                                     0 SHARED
BOOK_SALES           SOLD_TO_COUNTRY_ID            100000           1011 1011                                     0 SHARED
BOOK_SALES           SOLD_TO_COUNTRY_ID            130000           1012 1012                                     0 SHARED
BOOK_SALES           SOLD_TO_COUNTRY_ID            160000           1013 1013                                     0 SHARED
BOOK_SALES           SOLD_TO_COUNTRY_ID            160001           1014 1014                                     0 SHARED
SQL> select SOLD_TO_COUNTRY_ID, count(*) from
BOOK_SALES group by SOLD_TO_COUNTRY_ID order by 1;
SOLD_TO_COUNTRY_ID   COUNT(*)
------------------ ----------
1001 18000
1002 12000
1003 10000
1004 10000
1005 15000
1006 10000
1007 5000
1008 10000
1009 3000
1010 2000
1011 5000
1012 30000
1013 30000
1014 1
14 rows selected.
Popular values
a) The optimizer checks whether a value is popular by first checking whether it is an endpoint value of a bucket. If so, the optimizer subtracts the endpoint number of the previous bucket from the endpoint number of the current bucket; if the difference is greater than 1, the value is popular.
Eg. (popular)
SQL> select SOLD_TO_COUNTRY_ID, count(*)
from BOOK_SALES where SOLD_TO_COUNTRY_ID = 1010 group by SOLD_TO_COUNTRY_ID;
SOLD_TO_COUNTRY_ID COUNT(*)
------------------ ----------
1010 2000
The optimizer uses the following formula to calculate the cardinality of popular values:
Cardinality of popular value = number of rows in table * (number of endpoints spanned by this value / total number of endpoints)
SQL> select count(*) from BOOK_SALES;
COUNT(*)
----------
160001
Number of rows in table = 160001
Number of endpoints spanned by this value (1010) = (95000 - 93000) = 2000
Total number of endpoints = 160001
Then:
C = 160001 * (2000/160001) = 2000
Thus the optimizer derives the cardinality.
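The same derivation can be expressed as a short Python sketch (numbers taken from the histogram output above):

```python
# Popular-value cardinality:
#   C = num_rows * (endpoints spanned by the value / total endpoints)
num_rows = 160_001
total_endpoints = 160_001       # highest ENDPOINT_NUMBER in the histogram
endpoint_curr = 95_000          # ENDPOINT_NUMBER for value 1010
endpoint_prev = 93_000          # ENDPOINT_NUMBER for the previous value (1009)

spanned = endpoint_curr - endpoint_prev          # 2000
cardinality = num_rows * (spanned / total_endpoints)
print(round(cardinality))                        # 2000
```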
Eg. (non-popular)
SQL> select SOLD_TO_COUNTRY_ID, count(*)
from BOOK_SALES where SOLD_TO_COUNTRY_ID = 1014 group by SOLD_TO_COUNTRY_ID;
SOLD_TO_COUNTRY_ID COUNT(*)
------------------ ----------
1014 1
Cardinality estimate of a non-popular value = number of rows in table * density
SQL> select table_name, COLUMN_NAME, NUM_DISTINCT, DENSITY, HISTOGRAM
     from user_tab_col_statistics
     where table_name = 'BOOK_SALES' and column_name = 'SOLD_TO_COUNTRY_ID';

TABLE_NAME           COLUMN_NAME          NUM_DISTINCT    DENSITY HISTOGRAM
-------------------- -------------------- ------------ ---------- ---------------
BOOK_SALES           SOLD_TO_COUNTRY_ID             14 .000003125 FREQUENCY
The optimizer calculates density using factors such as the number of buckets and the NDV. Density is expressed as a decimal number between 0 and 1: a value close to 1 indicates that the optimizer expects many rows to be returned, and a value close to 0 indicates that it expects few rows to be returned.
C = 160001 * .000003125
SQL> select 160001*DENSITY from user_tab_col_statistics
     where table_name = 'BOOK_SALES' and column_name = 'SOLD_TO_COUNTRY_ID';

160001*DENSITY
--------------
            .5
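The non-popular estimate can be verified the same way, using the DENSITY value from USER_TAB_COL_STATISTICS above:

```python
# Non-popular-value cardinality: C = num_rows * density
num_rows = 160_001
density = 0.000003125   # DENSITY for SOLD_TO_COUNTRY_ID

cardinality = num_rows * density
print(cardinality)      # about 0.5; cardinality estimates have a floor of 1
```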
Creating a Top Frequency Histogram
If a small number of values occupies most of the rows, then creating a frequency histogram on this small set of values is useful even when the NDV is greater than n (the number of buckets). To create a better quality histogram for the popular values, the optimizer ignores the non-popular values. The database creates a top frequency histogram when the following conditions are met:
a) The NDV of the data set is greater than the number of buckets specified.
b) The percentage of rows occupied by the top n frequent values is >= the threshold p, where p = (1-(1/n))*100 and n = number of buckets.
c) The estimate_percent parameter is set to AUTO_SAMPLE_SIZE in dbms_stats.
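Conditions (a) and (b) can be checked with a small Python sketch (row counts taken from the GROUP BY output earlier; this illustrates the rule, not Oracle's internal code):

```python
# Top-frequency conditions for SOLD_TO_COUNTRY_ID with n = 10 buckets.
counts = {1001: 18000, 1002: 12000, 1003: 10000, 1004: 10000, 1005: 15000,
          1006: 10000, 1007: 5000, 1008: 10000, 1009: 3000, 1010: 2000,
          1011: 5000, 1012: 30000, 1013: 30000, 1014: 1}

n = 10
ndv = len(counts)                  # 14
total_rows = sum(counts.values())  # 160001

# Condition (a): NDV must exceed the number of buckets.
print(ndv > n)                     # True

# Condition (b): top-n values must cover at least p = (1 - 1/n) * 100 percent.
p = (1 - 1 / n) * 100              # 90.0
top_n = sum(sorted(counts.values(), reverse=True)[:n])
coverage = top_n / total_rows * 100
print(coverage >= p)               # True (the top 10 values cover roughly 93.7%)
```

Note that the database's own check, shown later, measures the values it actually kept as histogram endpoints, which is why it reports 90.62% rather than the raw top-10 coverage.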
Eg.
SQL> select SOLD_TO_COUNTRY_ID, count(*) from book_sales
     group by SOLD_TO_COUNTRY_ID order by 1;

SOLD_TO_COUNTRY_ID   COUNT(*)
------------------ ----------
1001 18000
1002 12000
1003 10000
1004 10000
1005 15000
1006 10000
1007 5000
1008 10000
1009 3000
1010 2000
1011 5000
1012 30000
1013 30000
1014 1
14 rows selected.
In the above, the number of distinct values is 14. Let us create a histogram by specifying 10 buckets.
SQL> exec dbms_stats.gather_table_stats('MANZOOR','BOOK_SALES', method_opt => 'FOR COLUMNS SOLD_TO_COUNTRY_ID SIZE 10');
PL/SQL procedure successfully completed.
SQL> select TABLE_NAME, COLUMN_NAME, ENDPOINT_NUMBER, ENDPOINT_VALUE, ENDPOINT_ACTUAL_VALUE, ENDPOINT_REPEAT_COUNT, scope
     from user_tab_histograms
     where table_name = 'BOOK_SALES' and column_name = 'SOLD_TO_COUNTRY_ID'
     order by 3;

TABLE_NAME           COLUMN_NAME          ENDPOINT_NUMBER ENDPOINT_VALUE ENDPOINT_ACTUAL_VALU ENDPOINT_REPEAT_COUNT SCOPE
-------------------- -------------------- --------------- -------------- -------------------- --------------------- -------
BOOK_SALES SOLD_TO_COUNTRY_ID 18000 1001 1001 0 SHARED
BOOK_SALES SOLD_TO_COUNTRY_ID 30000 1002 1002 0 SHARED
BOOK_SALES SOLD_TO_COUNTRY_ID 40000 1003 1003 0 SHARED
BOOK_SALES SOLD_TO_COUNTRY_ID 50000 1004 1004 0 SHARED
BOOK_SALES SOLD_TO_COUNTRY_ID 65000 1005 1005 0 SHARED
BOOK_SALES SOLD_TO_COUNTRY_ID 75000 1006 1006 0 SHARED
BOOK_SALES SOLD_TO_COUNTRY_ID 85000 1008 1008 0 SHARED
BOOK_SALES SOLD_TO_COUNTRY_ID 115000 1012 1012 0 SHARED
BOOK_SALES SOLD_TO_COUNTRY_ID 145000 1013 1013 0 SHARED
BOOK_SALES SOLD_TO_COUNTRY_ID 145001 1014 1014 0 SHARED
10 rows selected.
SQL> select table_name, COLUMN_NAME, NUM_DISTINCT, DENSITY, HISTOGRAM
     from user_tab_col_statistics
     where table_name = 'BOOK_SALES' and column_name = 'SOLD_TO_COUNTRY_ID';

TABLE_NAME           COLUMN_NAME          NUM_DISTINCT    DENSITY HISTOGRAM
-------------------- -------------------- ------------ ---------- ---------------
BOOK_SALES           SOLD_TO_COUNTRY_ID             14 .000003125 TOP-FREQUENCY
As shown above, a TOP-FREQUENCY histogram has been created using the popular values. Let us see whether the conditions are met:
1) NDV > number of buckets = TRUE (the NDV is 14 and the number of buckets specified is 10).
2) The percentage of rows occupied by the top frequent values should be greater than or equal to the threshold p, where p = (1-(1/n))*100:
p = (1-(1/10))*100
p = 90
As per the formula, the percentage of rows occupied by the top frequent values should be >= 90%. Let us check.
SQL> select (a.top_frequent_sum/b.total_sum)*100 "Top Frequent values %"
     from (select sum(full_count) top_frequent_sum
             from (select count(*) full_count from book_sales
                    where SOLD_TO_COUNTRY_ID in
                          (select ENDPOINT_VALUE from user_tab_histograms
                            where table_name = 'BOOK_SALES'
                              and column_name = 'SOLD_TO_COUNTRY_ID'))) a,
          (select count(*) total_sum from book_sales) b
/

Top Frequent values %
---------------------
           90.6250586
As we can see, the top frequent values occupy 90.62% of the rows in the table, so this condition is satisfied.
3) The estimate percent should be AUTO_SAMPLE_SIZE. We have not specified it, so the default value was used. Let us check the default value.

SQL> var A VARCHAR2(30);
SQL> exec :A := dbms_stats.get_prefs (PNAME => 'ESTIMATE_PERCENT', OWNNAME => 'MANZOOR', TABNAME => 'BOOK_SALES');

PL/SQL procedure successfully completed.

SQL> print :A

A
------------------------------------------------------------------
DBMS_STATS.AUTO_SAMPLE_SIZE

Condition satisfied.
Height Balanced Histogram (Legacy) in 12c
The default number of buckets is 254 when it is not specified using the method_opt parameter. Before 12c, Oracle creates a height-based histogram when the NDV is greater than the number of buckets. This type of histogram was useful for range predicates and for equality predicates on values that appear as an endpoint in at least 2 buckets.
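Conceptually, a height-balanced histogram sorts the rows and records the value at every num_rows/n-th row as a bucket endpoint, so each bucket holds roughly the same number of rows; a heavily skewed value then spans several endpoints and is detected as popular. A minimal sketch of this idea (not Oracle's actual algorithm):

```python
# Toy height-balanced histogram: n buckets, each covering ~len(data)/n rows.
def height_balanced_endpoints(sorted_values, n):
    """Value at the end of each of n equal-height buckets."""
    rows_per_bucket = len(sorted_values) / n
    return [sorted_values[int((i + 1) * rows_per_bucket) - 1] for i in range(n)]

# Value 7 dominates the data, so it becomes the endpoint of several buckets;
# appearing as an endpoint of >= 2 buckets is what marks a value as popular.
data = sorted([1] * 2 + [3] * 2 + [7] * 12 + [9] * 4)
print(height_balanced_endpoints(data, 5))   # [3, 7, 7, 7, 9]
```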
Eg.
SQL> exec dbms_stats.delete_table_stats('MANZOOR','BOOK_SALES');

PL/SQL procedure successfully completed.

To simulate Oracle Database 11g behavior, which is necessary to create a height-based histogram, set estimate_percent to a nondefault value. If you specify a nondefault percentage, then the database creates frequency or height-balanced histograms.
SQL> exec dbms_stats.gather_table_stats ('MANZOOR','BOOK_SALES', METHOD_OPT => 'FOR COLUMNS SOLD_TO_COUNTRY_ID SIZE 10', ESTIMATE_PERCENT => 100);

PL/SQL procedure successfully completed.

SQL> select TABLE_NAME, COLUMN_NAME, ENDPOINT_NUMBER, ENDPOINT_VALUE, ENDPOINT_ACTUAL_VALUE, ENDPOINT_REPEAT_COUNT, scope
     from user_tab_histograms
     where table_name = 'BOOK_SALES' and column_name = 'SOLD_TO_COUNTRY_ID'
     order by 3;

TABLE_NAME           COLUMN_NAME          ENDPOINT_NUMBER ENDPOINT_VALUE ENDPOINT_ACTUAL_VALU ENDPOINT_REPEAT_COUNT SCOPE
-------------------- -------------------- --------------- -------------- -------------------- --------------------- -------
BOOK_SALES SOLD_TO_COUNTRY_ID 1 1001 1001 0 SHARED
BOOK_SALES SOLD_TO_COUNTRY_ID 2 1003 1003 0 SHARED
BOOK_SALES SOLD_TO_COUNTRY_ID 3 1004 1004 0 SHARED
BOOK_SALES SOLD_TO_COUNTRY_ID 4 1005 1005 0 SHARED
BOOK_SALES SOLD_TO_COUNTRY_ID 5 1007 1007 0 SHARED
BOOK_SALES SOLD_TO_COUNTRY_ID 6 1011 1011 0 SHARED
BOOK_SALES SOLD_TO_COUNTRY_ID 8 1012 1012 0 SHARED
BOOK_SALES SOLD_TO_COUNTRY_ID 9 1013 1013 0 SHARED
BOOK_SALES SOLD_TO_COUNTRY_ID 10 1014 1014 0 SHARED
9 rows selected.
SQL> select table_name, COLUMN_NAME, NUM_DISTINCT, DENSITY, HISTOGRAM
     from user_tab_col_statistics
     where table_name = 'BOOK_SALES' and column_name = 'SOLD_TO_COUNTRY_ID';

TABLE_NAME           COLUMN_NAME          NUM_DISTINCT    DENSITY HISTOGRAM
-------------------- -------------------- ------------ ---------- ---------------
BOOK_SALES           SOLD_TO_COUNTRY_ID             14 .098844776 HEIGHT BALANCED
SQL> select SOLD_TO_COUNTRY_ID, count(*) from book_sales
     group by SOLD_TO_COUNTRY_ID order by 1;

SOLD_TO_COUNTRY_ID   COUNT(*)
------------------ ----------
1001 18000
1002 12000
1003 10000
1004 10000
1005 15000
1006 10000
1007 5000
1008 10000
1009 3000
1010 2000
1011 5000
1012 30000
1013 30000
1014 1