PostgreSQL Wait Events: What They Mean and How to Tune

In this article I group related wait events together.

LWLock:BufferIO and IO:DataFileRead

LWLock:BufferIO:

  • This wait event occurs when a session is waiting for a lightweight lock (LWLock) on a buffer I/O operation.
  • It typically indicates contention for accessing shared buffers or waiting for I/O operations to complete.

IO:DataFileRead:

  • This wait event occurs when a session is waiting for a read operation from a data file (e.g., reading a table or index block from disk).
  • It indicates that the requested data is not in the shared buffers and must be read from disk.

These wait events in PostgreSQL (or Amazon RDS PostgreSQL) indicate that sessions are waiting for I/O operations related to reading data from disk into the shared buffers. These wait events are common in database workloads, but if they occur frequently or for long durations, they can indicate performance bottlenecks. Here’s how to diagnose and address these wait events.
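To see which sessions are currently in these waits, query pg_stat_activity (a minimal sketch; the wait_event columns exist in PostgreSQL 9.6 and later):

SELECT pid, wait_event_type, wait_event, state, left(query, 60) AS query
FROM pg_stat_activity
WHERE wait_event IS NOT NULL;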

Cause:

1) These wait events appear mainly when inefficient queries perform full table scans.

2) When tables are bloated. Check for dead tuples and whether the table requires a vacuum (see the query after this list).

3) When appropriate indexes are missing.

4) Finally, when shared_buffers is too small to keep up with the number of read operations.
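A quick way to check causes 1–3 is pg_stat_user_tables; a sketch (the ordering and LIMIT are illustrative):

SELECT relname, seq_scan, idx_scan, n_live_tup, n_dead_tup, last_vacuum, last_autovacuum
FROM pg_stat_user_tables
ORDER BY n_dead_tup DESC
LIMIT 10;

Tables with high seq_scan counts and many dead tuples are the first candidates for indexing and vacuuming.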

How to identify the issue

1) In the AWS RDS console, open the Monitoring tab and search for DiskQueueDepth; a persistently high count (usually above 15 or so) indicates I/O pressure.

2) Performance Insights, also under the Monitoring tab, shows read and write operations per second; you will see a spike in read operations per second.

3) If your PostgreSQL runs on a VM, check the I/O statistics with iostat; you will see a spike in read operations, as shown below.
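For example, on a Linux VM (assuming the sysstat package is installed), watch the r/s and %util columns:

iostat -x 5 3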


Tombstones

Unlike traditional RDBMSs, Cassandra does not perform in-place updates, deletes, or any other modification; deletes are themselves a kind of insert, and conflicts are resolved by the rule of Last Write Wins (LWW). Data is always written to immutable files called SSTables. In Cassandra, a tombstone is a special marker stored alongside standard data: a delete does nothing more than insert a tombstone.

Tombstones are not cleared until a certain amount of time has passed, controlled by gc_grace_seconds, a table-level property set via CQL.
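For example, to change it on one table (my_keyspace.my_table is a hypothetical name; 864000 is the default):

ALTER TABLE my_keyspace.my_table WITH gc_grace_seconds = 864000;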

Why are tombstones not cleared immediately?

Because Cassandra is a distributed database. Suppose a delete happens with a replication factor (RF) of 3 and consistency level QUORUM. Before acknowledging to the client, Cassandra makes sure the delete is written to 2 out of 3 replicas; because RF is 3, the delete will eventually be written to the third node as well. Even if the third node is down while the delete happens, Cassandra considers the transaction successful because the consistency level was satisfied. Once the third node comes back up, it still holds the deleted data without the tombstone. A subsequent read involving that node would be ambiguous, since there is no way to determine what is correct: return an empty response or return the data? Cassandra always considers returning data to be the correct thing to do, so deletes would often lead to reappearing data, called "zombies" or "ghosts", with unpredictable behavior.

As a safeguard against this, Cassandra introduced the gc_grace_seconds property; its default value is 864000 seconds (10 days). When there are large numbers of deletes, it is always good to run a repair before gc_grace_seconds expires.

Tombstones cause performance issues because Cassandra must read them while reading the actual data. Two parameters in cassandra.yaml help to log a warning, or abort the query, when a query scans more than a certain number of tombstones.

tombstone_warn_threshold (default: 1000): if the number of tombstones scanned by a query exceeds this number, Cassandra logs a warning in system.log (which will likely propagate to your monitoring system and trigger an alert).

tombstone_failure_threshold (default: 100000): if the number of tombstones scanned by a query exceeds this number, Cassandra aborts the query. This is a mechanism to prevent one or more nodes from running out of memory and crashing.
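Both thresholds are set in cassandra.yaml; shown here with their default values:

tombstone_warn_threshold: 1000
tombstone_failure_threshold: 100000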

Ways to clear tombstones

Tombstones are cleaned regularly, over time, by compactions (see the compaction article for when a compaction triggers and the types of compaction). In some cases, because of shadowed data or a bulk delete, tombstones are not cleaned quickly enough and can cause queries to fail (because of tombstone_failure_threshold) or nodes to be overloaded (because of the processing overhead). In these cases, there are a few ways to clean them more aggressively (at the cost of using more system resources while the cleaning is being done).
 

A major compaction on the table, one node at a time, compacts all the SSTables together into one big SSTable. With SizeTieredCompactionStrategy you can use the '-s' option so the output is split into several SSTables; with TimeWindowCompactionStrategy '-s' will not help and all the SSTables will be combined into one big SSTable, so it is advised to then stop DSE and use sstablesplit to split the table into manageably sized SSTables.
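A sketch of a major compaction with split output under SizeTieredCompactionStrategy (keyspace and table names are placeholders):

nodetool compact -s my_keyspace my_table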

A less invasive approach is to tune the compaction sub-properties of the table's compaction strategy: tombstone_threshold (default 0.2; if the ratio of tombstones in an SSTable exceeds this limit, Cassandra starts a compaction on that SSTable alone to purge the tombstones) and tombstone_compaction_interval (default 86400 seconds, i.e. 1 day; the minimum age an SSTable must reach before Cassandra considers it for tombstone compaction, provided it exceeds the tombstone_threshold ratio). These properties can be altered in the table definition to clean tombstones more aggressively. Also, with SizeTieredCompactionStrategy, min_threshold (default 4; the minimum number of SSTables that triggers a minor compaction) can be reduced to compact SSTables more quickly.
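A sketch of tuning these sub-properties via CQL (the table name and values are illustrative, not recommendations):

ALTER TABLE my_keyspace.my_table
WITH compaction = {
  'class': 'SizeTieredCompactionStrategy',
  'tombstone_threshold': '0.1',
  'tombstone_compaction_interval': '3600',
  'min_threshold': '2'
};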

One more approach is to use "nodetool garbagecollect" and then either perform single-SSTable compactions on the SSTables, or run garbagecollect a second time, to clean out partition-level tombstones. On the first run, garbagecollect clears out the deleted (or "shadowed") data that tombstones mark. On a second run, or during later compactions, if there is a tombstone that deletes an entire partition, and that partition is not in other SSTables, the tombstone can be safely cleaned out. (Tombstones that shadow individual rows or cells are not as easy to drop, and they will not be cleaned out by garbagecollect or a single-SSTable compaction.)

'nodetool garbagecollect' clears deleted data from SSTables. What actually counts as deleted data?

Suppose you insert a bunch of rows and run nodetool flush, then delete a few of those rows and flush again. You now have two SSTables: one with tombstones and one with data that is shadowed by those tombstones.

Garbagecollect clears the deleted data in the first SSTable that is no longer needed because it is shadowed by tombstones in the second SSTable, which in turn makes it possible to clear the tombstones in the second SSTable. Note that "nodetool garbagecollect" does not generally drop the tombstones themselves, even after gc_grace_seconds; it is used to reclaim space and to make it more likely that tombstones will be dropped in subsequent compactions.
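A sketch of the two-pass approach described above (keyspace and table names are placeholders):

nodetool garbagecollect my_keyspace my_table

# second pass, after the first run has cleared the shadowed data
nodetool garbagecollect my_keyspace my_table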

Troubleshooting network issues

This section covers troubleshooting network connectivity issues such as "connection refused" errors or connections that hang because of port problems.

Step 1.

Verify the ping time from both ends (source and destination):

ping 10.10.10.11

Ping times of 100 ms and below are average for most broadband connections, so there should be no lag. A ping of 150 ms or more indicates a problem and there will be noticeable lag.

Step 2.

Verify whether the port is listening, using netstat or nc:

netstat -ntl|grep 7199

tcp 0 0 0.0.0.0:7199 0.0.0.0:* LISTEN

Or it can be verified as below:

nc -lk 7199

nc: Address already in use

If the port is free and no process is listening on it, the above command (nc -lk) will itself start listening on that port; the "Address already in use" error therefore confirms that something is already bound to it.

Step 3.

Check whether the port on the remote server is open. Log in to the source server and try to connect to the remote server using any one of the following methods (telnet, nc, or nmap).

telnet 10.10.10.12 7199

Trying 10.10.10.12…

Connected to 10.10.10.12.

Escape character is ‘^]’.

^] — ctrl+ ]

telnet> quit

nc -zvw3 10.10.10.12 7199

Connection to 10.10.10.12 7199 port [tcp/*] succeeded!

nmap -sT 10.10.10.12 -p 7199 -Pn

The output should show the port as open; it should not show filtered or closed.

Step 4.

The TCP handshake and acknowledgments between the servers' ports can be verified using ngrep or tcpdump.

For example

sudo tcpdump tcp port 7199

Or

sudo tcpdump tcp port 7199 -w trace7199.pcap    -- to save the capture to a file

tcpdump: verbose output suppressed, use -v or -vv for full protocol decode

listening on eth0, link-type EN10MB (Ethernet), capture size 262144 bytes

21:07:35.356796 IP ip-10-10-10-11.srv101.dsinternal.org.47608 > ip-10.10.10.12.srv101.dsinternal.org.7199: Flags [S], seq 1054145878, win 29200, options [mss 1460,sackOK,TS val 989707819 ecr 0,nop,wscale 9], length 0

21:07:35.356836 IP ip-10.10.10.12.srv101.dsinternal.org.7199 > ip-10.10.10.11.srv101.dsinternal.org.47608: Flags [S.], seq 3203231434, ack 1054145879, win 28960, options [mss 1460,sackOK,TS val 989707126 ecr 989707819,nop,wscale 9], length 0

Note: Check the length field; if the length stays at 0, no payload is being pushed.

Or

sudo ngrep port 7199

T 10.10.10.12:46996 -> 10.10.10.11:7199

T 10.10.10.11:7199 -> 10.10.10.12:46996

Step 5.

Get confirmation from the network team on whether the outbound (ephemeral, short-lived) port ranges are open. For example, on Linux we can verify the range using the command below.

cat /proc/sys/net/ipv4/ip_local_port_range

32768 60999
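If the range needs to be widened, it can be changed at runtime (a sketch; choose a range appropriate for your environment):

sudo sysctl -w net.ipv4.ip_local_port_range="32768 60999"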

Step 6.

Verify outbound port connectivity by passing text between the servers, as below.

Server 1

Listen on a local outbound port:

cassandra@ip-10-10-10-11:~$ sudo nc -l 32768

Server 2

Connect to server 1's outbound port:

cassandra@10-10-10-12:~$ nc 10.10.10.11 32768

Now type text on either server; it should appear on the other server's terminal.


Note: Please test in the reverse direction as well to verify the outbound port's remote connectivity.

Step 7.

Check the firewall rules using iptables.

10.10.10.11:~$ sudo iptables -L

Chain INPUT (policy ACCEPT)
target prot opt source destination
Chain FORWARD (policy ACCEPT)
target prot opt source destination
Chain OUTPUT (policy ACCEPT)
target prot opt source destination

Step 8.

Finally, confirm with the networking team whether inter-subnet firewalls or firewall appliance policies are regulating traffic between the subnets.

Specific to JMX port 7199: if nodetool is not working across servers, verify the following steps.

Step 1.

Make sure the following parameters are set in cassandra-env.sh or in the jvm.options file:

-Dcassandra.jmx.remote.port=7199

-Dcom.sun.management.jmxremote.authenticate=false

-Dcom.sun.management.jmxremote.ssl=false

-Djava.rmi.server.hostname=10.10.10.11

Step 2.

Download the jmxterm utility (for example from https://docs.cyclopsgroup.org/jmxterm) and then test port 7199 as below:

10-10-10-11:~$ java -jar jmxterm-1.0.2-uber.jar

$>open localhost:7199

#Connection to localhost:7199 is opened

$>exit

#bye

10-10-10-11:~$ java -jar jmxterm-1.0.2-uber.jar

$>open 10.10.10.12:7199

#Connection to 10.10.10.12:7199 is opened

Note: If the connection hangs or is refused, the likely causes are those covered above: either port 7199 or the outbound ports are not open, or the Step 1 parameters are missing or misconfigured. A quick remote check is shown below.
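A quick end-to-end check from another node (assuming the JMX settings from Step 1):

nodetool -h 10.10.10.12 -p 7199 status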

Cassandra Memory

Cassandra memory consists of heap memory plus off-heap memory:

Off-heap memory (native or direct memory, managed by the OS)

Heap memory (managed by the JVM)

The following are part of off-heap memory:

Partition key Cache
Row cache
Chunk cache
Memtable space   

Note: Depending on memtable_allocation_type, memtable space can live off-heap or on-heap, as shown below.
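For example, in cassandra.yaml (heap_buffers keeps memtables on-heap; offheap_buffers and offheap_objects move them off-heap):

memtable_allocation_type: heap_buffers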

Java takes care of heap memory, and we have some control over heap space; off-heap (native) memory is controlled by the OS (operating system).

Here are a few parameters that control heap space and off-heap space.

We can set the maximum heap size in the jvm.options file on a single node. For example:

-Xms48G
-Xmx48G

Set the min (-Xms) and max (-Xmx) heap sizes to the same value to avoid GC pauses during resize, and to lock the heap in memory on startup.

We can set memtable size and threshold for the flush of memtable data in cassandra.yaml on single node. For example

memtable_cleanup_threshold: 0.50   -- once usage reaches 50%, memtables are flushed from memory
memtable_heap_space_in_mb: 4096    -- default is 1/4 of the heap

We can manage off-heap (native, maximum direct) memory in jvm.options or cassandra-env.sh; for example, in jvm.options we can set:

-XX:MaxDirectMemorySize=1M

Note: When MAX_DIRECT_MEMORY (-XX:MaxDirectMemorySize) is not set, the default is (MAX_SYSTEM_MEMORY - MAX_HEAP_SIZE) / 2. When file_cache_size_in_mb is not set, the chunk cache size defaults to 1/2 of the maximum direct memory; if no maximum direct memory is set, the cache size defaults to 1/3 of system memory.
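For example, on a hypothetical host with 64 GB of RAM and a 48 GB heap, the default max direct memory is (64 - 48) / 2 = 8 GB, and the chunk cache then defaults to 8 / 2 = 4 GB.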

Scripts for troubleshooting OOM issues

The following helps to check whether bloom filters are consuming a lot of off-heap space:

for i in {1..10}; do nodetool sjk mx -mg -b 'org.apache.cassandra.metrics:type=Table,name=BloomFilterOffHeapMemoryUsed' -f Value; date ; sleep 10; done

The following helps to check how native memory is being used:

for i in {1..10}; do nodetool sjk mxdump -q "org.apache.cassandra.metrics:type=NativeMemory,name=*"; date ; sleep 10; done > $(hostname -i)_NativeMemory

The following helps with leak detection:

for i in {1..20}; do nodetool leaksdetection; sleep 30; done > $(hostname -i)_leaksdetection.txt

Try reducing the heap and native memory allocations to control OOM issues.

Optimizer Access Paths, PostgreSQL vs Oracle

This article is about access paths in PostgreSQL, relating them to Oracle access paths so that Oracle DBAs can understand access paths in PostgreSQL and vice versa. To be clear, I am not comparing the algorithms behind these access paths in this article.

Sequential Scan in PostgreSQL / Table Access Full in Oracle

When all, or most, of the rows in a table are accessed, the optimizer chooses a sequential scan in PostgreSQL; the equivalent access path in Oracle is called a "full table scan" or "table access full".

PostgreSQL execution plan

DEVL=# explain analyze select * from sample;

                                                   QUERY PLAN                                                   

—————————————————————————————————————–

 Seq Scan on sample  (cost=0.00..3481.36 rows=127936 width=428) (actual time=0.010..10.449 rows=127936 loops=1)

 Planning Time: 0.375 ms

 Execution Time: 15.537 ms

Oracle execution plan

SQL> select plan_table_output from table(dbms_xplan.display_cursor(null,null,'TYPICAL'));

PLAN_TABLE_OUTPUT

——————————————————————————–

SQL_ID 99kss4rwnchth, child number 1

————————————-

select * from sample

Plan hash value: 1092880916

------------------------------------------------------------------------------
| Id  | Operation         | Name   | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |        |       |       |   557 (100)|          |
|   1 |  TABLE ACCESS FULL| SAMPLE |  127K |   13M |   557   (1)| 00:00:01 |
------------------------------------------------------------------------------

Index Only Scan

When we select only a few rows, or a small percentage of rows (for instance, less than 5%) on an indexed column, the optimizer chooses an index scan.

In the next three examples we select only the indexed column, so the optimizer chooses an "Index Only Scan", which pulls rows (tuples) directly from the index without visiting the table.

Example 1

PostgreSQL

DEVL=# explain analyze  select sample_surrogate_id from sample where sample_surrogate_id <19;

                                                            QUERY PLAN                                                            

———————————————————————————————————————————–

 Index Only Scan using surrogate_idx on sample  (cost=0.42..36.47 rows=1146 width=4) (actual time=0.012..0.128 rows=1152 loops=1)

   Index Cond: (sample_surrogate_id < 19)

   Heap Fetches: 0

 Planning Time: 0.127 ms

 Execution Time: 0.189 ms

Oracle

SQL> select plan_table_output from table(dbms_xplan.display_cursor(null,null,'TYPICAL'));

PLAN_TABLE_OUTPUT

——————————————————————————–

SQL_ID 9vwjzy7aaxqt6, child number 0

————————————-

select sample_surrogate_id from sample where sample_surrogate_id <19

Plan hash value: 2013376945

--------------------------------------------------------------------------------
| Id  | Operation        | Name          | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------
|   0 | SELECT STATEMENT |               |       |       |     2 (100)|          |
|*  1 |  INDEX RANGE SCAN| SURROGATE_IDX |    64 |   320 |     2   (0)| 00:00:01 |
--------------------------------------------------------------------------------

Predicate Information (identified by operation id):

—————————————————

   1 - access("SAMPLE_SURROGATE_ID"<19)

Example 2

PostgreSQL

DEVL=# explain analyze select sample_surrogate_id from sample order by sample_surrogate_id;

                                                                QUERY PLAN                                                                

——————————————————————————————————————————————

 Index Only Scan using surrogate_idx on sample  (cost=0.42..3651.46 rows=127936 width=4) (actual time=0.079..14.143 rows=127936 loops=1)

   Heap Fetches: 0

 Planning Time: 0.103 ms

 Execution Time: 19.231 ms

Oracle

SQL> select plan_table_output from table(dbms_xplan.display_cursor(null,null,'TYPICAL'));

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------

SQL_ID 9sn2jw1zvhnmq, child number 0

————————————-

select sample_surrogate_id from sample order by sample_surrogate_id

Plan hash value: 3286072040

--------------------------------------------------------------------------------
| Id  | Operation        | Name          | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------
|   0 | SELECT STATEMENT |               |       |       |   279 (100)|          |
|   1 |  INDEX FULL SCAN | SURROGATE_IDX |  127K |  624K |   279   (1)| 00:00:01 |
--------------------------------------------------------------------------------

Example 3

PostgreSQL

DEVL=# explain analyze select sample_surrogate_id from sample where sample_surrogate_id=1;

                                                          QUERY PLAN                                                         

——————————————————————————————————————————

 Index Only Scan using surrogate_idx on sample  (cost=0.42..5.54 rows=64 width=4) (actual time=0.018..0.026 rows=64 loops=1)

   Index Cond: (sample_surrogate_id = 1)

   Heap Fetches: 0

 Planning Time: 0.102 ms

 Execution Time: 0.041 ms

(5 rows)

Oracle

SQL> select plan_table_output from table(dbms_xplan.display_cursor(null,null,'TYPICAL'));

PLAN_TABLE_OUTPUT

——————————————————————————–

SQL_ID a9j5jtsh0ssck, child number 0

————————————-

select sample_surrogate_id from sample where sample_surrogate_id=1

Plan hash value: 2013376945

--------------------------------------------------------------------------------
| Id  | Operation        | Name          | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------
|   0 | SELECT STATEMENT |               |       |       |     1 (100)|          |
|*  1 |  INDEX RANGE SCAN| SURROGATE_IDX |    64 |   320 |     1   (0)| 00:00:01 |
--------------------------------------------------------------------------------

Predicate Information (identified by operation id):

—————————————————

   1 - access("SAMPLE_SURROGATE_ID"=1)

Bitmap Heap Scan / Table Access by Index Rowid (2-step plans)

In this example we select all the columns, not just the indexed column, so the plan has two steps: the first step visits the index to find the locations of rows matching the index condition, and then the upper plan node actually fetches those rows from the table.

The "bitmap" mentioned in the plan is the in-memory structure that sorts the matching row locations into physical order before the heap is visited. Note that PostgreSQL does not support creating bitmap indexes manually the way Oracle does; this bitmap is built on the fly for the duration of the query.

DEVL=# explain analyze select * from sample where sample_surrogate_id=1;

                                                       QUERY PLAN                                                      

————————————————————————————————————————

 Bitmap Heap Scan on sample  (cost=4.91..228.98 rows=64 width=428) (actual time=0.029..0.099 rows=64 loops=1)

   Recheck Cond: (sample_surrogate_id = 1)

   Heap Blocks: exact=64

   ->  Bitmap Index Scan on surrogate_idx  (cost=0.00..4.90 rows=64 width=0) (actual time=0.018..0.018 rows=64 loops=1)

         Index Cond: (sample_surrogate_id = 1)

 Planning Time: 0.145 ms

 Execution Time: 0.127 ms

SQL> select plan_table_output from table(dbms_xplan.display_cursor(null,null,'TYPICAL'));

PLAN_TABLE_OUTPUT

——————————————————————————–

SQL_ID 19rfbsk5hh17r, child number 0

————————————-

select * from sample where sample_surrogate_id=1

Plan hash value: 210076746

------------------------------------------------------------------------------------------------
| Id  | Operation                           | Name          | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                    |               |       |       |    65 (100)|          |
|   1 |  TABLE ACCESS BY INDEX ROWID BATCHED| SAMPLE        |    64 |  6976 |    65   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN                  | SURROGATE_IDX |    64 |       |     1   (0)| 00:00:01 |
------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):

—————————————————

   2 - access("SAMPLE_SURROGATE_ID"=1)

Tablespace PostgreSQL vs Oracle

Tablespaces in PostgreSQL define, or link to, a directory on the physical disk where the files of objects (tables) are stored.
In Oracle, tablespaces are not links to directories: they are linked to physical files and are allocated to a schema in a database, so that all tables created in that schema are stored in the allocated tablespace (or a tablespace name can be given when creating a table). In PostgreSQL, a tablespace is assigned to a database (or to a table), and one tablespace can be allocated to one or more databases.

A simple practical example showing how a tablespace is created, how it is assigned to a database, and its physical structure:

postgres=# create tablespace development location '/mnt/PostgreSQL/data';

CREATE TABLESPACE

postgres=# \db

                List of tablespaces

    Name     |  Owner   |         Location         

————-+———-+————————–

 development | postgres | /mnt/PostgreSQL/data

 pg_default  | postgres | 

 pg_global   | postgres | 

(3 rows)

Whenever a new tablespace is created, a symbolic link is also created under <data directory>/pg_tblspc pointing to the actual location, /mnt/PostgreSQL/data:

cd /Library/PostgreSQL/12/data/pg_tblspc/

postgres$ ls -ltr

total 0

lrwx——  1 postgres  daemon  24 Nov 15 12:46 16582 -> /mnt/PostgreSQL/data

postgres=# create database DEVLDB tablespace development;

CREATE DATABASE

postgres=# \l+

                                                                List of databases

   Name    |  Owner   | Encoding | Collate | Ctype |   Access privileges   |   Size   | Tablespace  |                Description                 

———–+———-+———-+———+——-+———————–+———-+————-+——————————————–

  devldb    | postgres | UTF8     | C       | C     |                       | 8025 kB  | development | 

 postgres  | postgres | UTF8     | C       | C     |                       | 8033 kB  | pg_default  | default administrative connection database

 template0 | postgres | UTF8     | C       | C     | =c/postgres          +| 7881 kB  | pg_default  | unmodifiable empty database

           |          |          |         |       | postgres=CTc/postgres |          |             | 

 template1 | postgres | UTF8     | C       | C     | =c/postgres          +| 8025 kB  | pg_default  | default template for new databases

           |          |          |         |       | postgres=CTc/postgres |          |             | 

(6 rows)

postgres=# \c devldb

You are now connected to database “devldb” as user “postgres”.

devldb=# create table t1(name char(30));

CREATE TABLE

devldb=# select table_catalog,table_schema,table_name from information_schema.tables where table_name='t1';

 table_catalog | table_schema | table_name 

—————+————–+————

 devldb        | public       | t1

(1 row)

So the above table is created in the devldb database, and the file related to the table should be stored under /mnt/PostgreSQL/data.

We can check the path of a table's physical file using the function below:

devldb=# SELECT pg_relation_filepath('t1');

            pg_relation_filepath             

———————————————

 pg_tblspc/16582/PG_12_201909212/16583/16584

postgres$ cd /mnt/PostgreSQL/data/PG_12_201909212/16583

16583 postgres$ ls -l 16584

-rw——-  1 postgres  daemon  0 Nov 15 12:49 16584

Schemas PostgreSQL vs Oracle

Users are not schemas in PostgreSQL. Schemas in PostgreSQL are separate entities: a schema sits under a database and is always assigned to an owner, and we can assign more than one schema to a user. In PostgreSQL it is good practice to create a schema for each user with the same name as that user, so that each user accesses their own schema by default.

Schemas in Oracle are similar to users; we can say they are synonymous, and depending on context the words "schema" and "user" are interchanged. One advantage of PostgreSQL over Oracle is that access can be granted at the schema level, which is not possible in Oracle. For example, in Oracle, if user A wants to access all the tables in schema (user) B, grants have to be issued at the individual table level; if there are 80 objects under that schema, the grant loop repeats 80 times, as sketched below.
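For illustration, the Oracle grant loop might look like this PL/SQL sketch (schema names A and B are placeholders):

BEGIN
  FOR t IN (SELECT table_name FROM dba_tables WHERE owner = 'B') LOOP
    EXECUTE IMMEDIATE 'GRANT SELECT ON B.' || t.table_name || ' TO A';
  END LOOP;
END;
/

In PostgreSQL the equivalent is a single schema-level grant, demonstrated at the end of this section.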

However, in both Oracle and PostgreSQL schemas are used to avoid conflict of the table names when different applications are using the same database.

The following is a simple practical example of users, schemas, the search path, and schema-level grants in PostgreSQL.

postgres=# \c DEVL

You are now connected to database “DEVL” as user “postgres”.   -- postgres is a superuser at the cluster level

\dn shows the schemas and their owners:

DEVL=# \dn

  List of schemas

  Name  |  Owner   

——–+———-

 public | postgres

(1 row)

In the DEVL database, create user a1 and schemas named a1 and a2, assigning both schemas to user a1:

DEVL=# create user a1 password '**';

CREATE ROLE

DEVL=# 

DEVL=# create schema a1 authorization a1;

CREATE SCHEMA

DEVL=# create schema a2 authorization a1;

CREATE SCHEMA

Now user a1 owns two schemas in the DEVL database.

DEVL=# \dn                   

  List of schemas

  Name  |  Owner   

——–+———-

 a1     | a1

 a2     | a1

 public | postgres

(3 rows)

Now creating table t1 in 3 schemas (public, a1 and a2)

create table t1(name char(30));

CREATE TABLE

insert into t1(name) values ('table is in public schema');

postgres=# \c DEVL a1

Password for user a1: 

You are now connected to database “DEVL” as user “a1”.

DEVL=> create table t1(name char(30));

CREATE TABLE

insert into a1.t1(name) values ('table is in a1 schema');

As the user name and schema name are not the same, we must prefix the table name as a2.t1:

DEVL=> create table a2.t1(name char(30));

CREATE TABLE

insert into a2.t1(name) values ('table is in a2 schema');

Now check the output of the dictionary views

postgres=# \c DEVL

You are now connected to database “DEVL” as user “postgres”.

DEVL=# select table_catalog,table_schema,table_name from information_schema.tables where table_name='t1';

 table_catalog | table_schema | table_name 

—————+————–+————

 DEVL          | public       | t1

 DEVL          | a1           | t1

 DEVL          | a2           | t1

(3 rows)

DEVL=# select catalog_name,schema_name,schema_owner from information_schema.schemata where schema_owner='a1';

 catalog_name | schema_name | schema_owner 

————–+————-+————–

 DEVL         | a1          | a1

 DEVL         | a2          | a1

(2 rows)

DEVL=> \dn

  List of schemas

  Name  |  Owner   

——–+———-

 a1     | a1

 a2     | a1

 public | postgres

PostgreSQL determines which table is meant by following the search_path:

DEVL=> show search_path;

   search_path   

—————–

 “$user”, public

Since the default search_path is “$user”, public, if the application logs in as user a1 and selects t1 without a schema prefix, the table from schema a1 is returned.
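The search_path can also be changed for the current session; a sketch:

DEVL=> set search_path to a2, public;

After this, an unqualified select * from t1 would read a2.t1 instead.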

DEVL=# \c DEVL a1

Password for user a1: 

You are now connected to database “DEVL” as user “a1”.

DEVL=> select * from t1;

              name              

——————————–

 table is in a1 schema         

(1 row)

DEVL=> select * from a2.t1;

              name              

——————————–

 table is in a2 schema         

(1 row)

Once you exit from a1 and reconnect as postgres, the same query returns the table from the public schema:

DEVL=# 

DEVL=# select * from t1;

              name              

——————————–

 table is in public schema     

(1 row)

One more example shows how grants are assigned at the schema level:

DEVL=# create user u1 password '**';

CREATE ROLE

DEVL=# create schema u1 authorization u1;

CREATE SCHEMA

postgres=# \c DEVL u1

Password for user u1: 

You are now connected to database “DEVL” as user “u1”.

DEVL=# \dn

  List of schemas

  Name  |  Owner   

——–+———-

 a1     | a1

 a2     | a1

 public | postgres

 u1     | u1

(4 rows)

DEVL=> create table t1(name char(30));

CREATE TABLE

DEVL=> 

DEVL=> insert into t1(name) values ('table is in u1 schema');

INSERT 0 1

Now logging in as user a1:

DEVL=> \c DEVL a1

Password for user a1: 

You are now connected to database “DEVL” as user “a1”.

DEVL=> 

User a1 cannot access the table in schema u1:

DEVL=> select * from u1.t1;

ERROR:  permission denied for schema u1

LINE 1: select * from u1.t1;

postgres=# \c DEVL

You are now connected to database “DEVL” as user “postgres”.

Granting access on schema u1 to user a1

DEVL=# grant usage on schema u1 to a1;

GRANT

DEVL=# grant select on all tables in schema u1 to a1;

GRANT

DEVL=# \c DEVL a1

Password for user a1: 

You are now connected to database “DEVL” as user “a1”.

DEVL=> select * from u1.t1;

              name              

——————————–

 table is in u1 schema         

(1 row)

PostgreSQL vs Oracle AutoAnalyze/Gather Stats Jobs

This article compares the automatic statistics collection jobs in PostgreSQL and Oracle.

Parameters

PostgreSQL: auto analyze/vacuum, default ON (autovacuum = on, track_counts = on).

Oracle: automatic statistics gathering, default STATISTICS_LEVEL = 'TYPICAL'.

Metadata

PostgreSQL: the background "stats collector" process collects data about inserts, updates and deletes all the time. PG_STAT_ALL_TABLES holds the counts of modified records or tuples (update/insert/delete).

Oracle: information about inserts, updates and deletes on tables is held in the SGA; every 15 minutes SMON pushes this data into the data dictionary tables. You can also flush the stats manually: exec dbms_stats.FLUSH_DATABASE_MONITORING_INFO(); DBA_TAB_MODIFICATIONS holds the counts of modified records or tuples (update/insert/delete).

Frequency of auto analyze / gather stats

PostgreSQL: AUTOVACUUM_NAPTIME (default: 60 sec) schedules how often tables are checked for analyze/vacuum. Verify it from the server command line with show autovacuum_naptime; The parameter can be modified in postgresql.conf and reloaded without a database restart; for example, connect as a superuser and run select pg_reload_conf();

Oracle: the job scheduler controls the frequency of statistics collection. The SQL below verifies the default settings and window names, and shows how to modify the job scheduler:

select client_name, status, window_group from DBA_AUTOTASK_CLIENT;

select window_name from DBA_SCHEDULER_WINGROUP_MEMBERS where window_group_name = 'ORA$AT_WGRP_OS';

EXECUTE DBMS_SCHEDULER.SET_ATTRIBUTE('MONDAY_WINDOW','repeat_interval','freq=daily;byday=MON;byhour=1;byminute=0;bysecond=0');

Stale percentage

PostgreSQL: the following parameters determine when a table should be analyzed, based on the number of DMLs (insert/update/delete) or the stale percentage:

AUTOVACUUM_ANALYZE_SCALE_FACTOR = 0.1 (default stale percentage; 0.1 is 10%)
AUTOVACUUM_ANALYZE_THRESHOLD = 50 (number of records modified)

Parameters for reusing space:

AUTOVACUUM_VACUUM_SCALE_FACTOR = 0.2
AUTOVACUUM_VACUUM_THRESHOLD = 50

Note: PostgreSQL considers both the percentage of modified records and the absolute number of modified records, depending on the table size and other factors.

Oracle: by default, Oracle gathers stats when the stale percentage is >= 10.

Modify at the global level: exec DBMS_STATS.SET_GLOBAL_PREFS('STALE_PERCENT','10');

Modify at the table level: exec DBMS_STATS.SET_TABLE_PREFS(null,'EMP','STALE_PERCENT',15);

PostgreSQL vs Oracle Comparing reuse of space, bloat, and fragmentation

In PostgreSQL, the old row versions (dead tuples, obsolete records) generated by deletes and updates do not release their space unless the table is vacuumed. Vacuum releases the space occupied by obsolete records, but the released space stays within the table (segment); those blocks are reused for subsequent inserts and updates into the table. A table is vacuumed whenever it reaches a threshold, which the following parameters determine.

autovacuum_vacuum_scale_factor = 0.2 (whenever 20% of records are modified)

autovacuum_vacuum_threshold = 50 (number of records)

For example, a table with 1000 rows will be vacuumed once 1000 * 0.2 + 50 = 250 rows have been deleted (or updated), i.e. 20% plus 50.

The above parameters can be set at the cluster level as well as the table level, as shown below.
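For example, to make a single table vacuum more aggressively (the values are illustrative):

ALTER TABLE sample SET (autovacuum_vacuum_scale_factor = 0.05, autovacuum_vacuum_threshold = 100);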

Oracle reuses the space in blocks freed by deletes (there is no bloat for updates as in PostgreSQL). Oracle reuses space in the blocks listed in the segment's freelist, and PCTUSED determines when a block is put back on the freelist: if PCTUSED is 40 and the space occupied by rows in a block drops below 40%, that block is listed in the freelist. These blocks are then used for subsequent inserts and updates into the table. PCTUSED is usually set at the tablespace level, but it can be set at the table level as well.

Going back to PostgreSQL: once a table is vacuumed, the space occupied by obsolete records can be reused, but it is not released to disk. This is similar to how Oracle keeps freed blocks on the segment freelist below the high-water mark rather than releasing them.

Vacuum full is similar to "move" (not "shrink") in Oracle, and it releases space back to the disk. Like a "move" operation, "vacuum full" locks the table, requires additional free space of the size of the table to complete, and reduces the size of the table. "Shrink" in Oracle has advantages: no additional space is needed and the lock time on the table is much shorter than with "move", as sketched below.
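For comparison, a sketch of the Oracle shrink approach mentioned above (shrink requires row movement to be enabled):

SQL> alter table sample enable row movement;

SQL> alter table sample shrink space;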

I created a "sample" table with 1999 tuples (rows) in both Oracle and PostgreSQL to demonstrate space reuse (vacuum), bloat, and vacuum full (defragmentation).

select count(1) from sample;

 count 

——-

   1999

PostgreSQL : Table Size and Space occupied with rows(estimated size)

DEVL=# select table_len as table_size,tuple_len as estimated_size from pgstattuple('sample');

 table_size | estimated_size 

————+—————-

     286720 |         267452

Oracle: Table Size and Space occupied with rows (estimated size)

SQL> select blocks*8*1024 as table_size,num_rows*avg_row_len as estimated_size from user_tables where table_name = 'SAMPLE';

TABLE_SIZE ESTIMATED_SIZE

———-              ————–

    262144   217891

Note: There is a slight difference between table size and estimated size (total bytes occupied by data) because of metadata overhead.

Bloat Example

PostgreSQL: Update

Bloat in PostgreSQL: updating all 1999 records. Due to the MVCC architecture, the table size increases even during an update; vacuum helps to reuse the space below the high-water mark.

DEVL=# UPDATE sample SET entity_ind = 'T' WHERE entity_ind = 'I';

UPDATE 1999

DEVL=# select table_len as table_size,tuple_len as estimated_size from pgstattuple('sample');

 table_size | estimated_size 

————+—————-

     565248 |         267452

DEVL=# UPDATE sample SET entity_ind = 'I' WHERE entity_ind = 'T';

UPDATE 1999

DEVL=# select table_len as table_size,tuple_len as estimated_size from pgstattuple('sample');

 table_size | estimated_size 

————+—————-

     843776 |         267452

DEVL=# vacuum sample;

VACUUM

DEVL=# analyze sample;

ANALYZE

The table now reuses the freed space because it has been vacuumed; note that the size does not grow after the next update:

DEVL=# UPDATE sample SET entity_ind = 'E' WHERE entity_ind = 'I';

UPDATE 1999

DEVL=# select table_len as table_size,tuple_len as estimated_size from pgstattuple('sample');

 table_size | estimated_size 

————+—————-

     843776 |         267452

(1 row)

Oracle: Update

There is no bloat: Oracle keeps the old versions of records in undo segments.

SQL> UPDATE sample SET entity_ind = 'T' WHERE entity_ind = 'I';

1999 rows updated.

SQL> commit;

Commit complete.

SQL> exec dbms_stats.gather_table_stats('','SAMPLE');

PL/SQL procedure successfully completed.

SQL> select blocks*8*1024 as table_size,num_rows*avg_row_len as estimated_size from user_tables where table_name = 'SAMPLE';

TABLE_SIZE ESTIMATED_SIZE

     ———-             ————–

    262144   217891

PostgreSQL: DELETE -- fragmentation

DEVL=# delete from sample where de_user='NA';

DELETE 1752

DEVL=# select table_len as table_size,tuple_len as estimated_size from pgstattuple('sample');

 table_size | estimated_size 

————+—————-

     843776 |          34896

(1 row)

DEVL=# vacuum full sample;   -- releases wasted space back from the table (segment) to disk and creates a new table file

VACUUM

DEVL=# analyze sample;

ANALYZE

DEVL=# select table_len as table_size,tuple_len as estimated_size from pgstattuple('sample');

 table_size | estimated_size 

————+—————-

      40960 |          34896

Oracle: DELETE -- creates fragmentation

SQL> delete from sample where de_user='NA';

1752 rows deleted.

SQL> commit;

Commit complete.

SQL> exec dbms_stats.gather_table_stats('','SAMPLE');

PL/SQL procedure successfully completed.

SQL> select blocks*8*1024 as table_size,num_rows*avg_row_len as estimated_size from user_tables where table_name = 'SAMPLE';

TABLE_SIZE ESTIMATED_SIZE

———- ————–

    262144     28899

SQL> alter table sample move;   -- helps reclaim space in a table; it recovers space above and below the high-water mark

Table altered.

SQL> exec dbms_stats.gather_table_stats('','SAMPLE');

PL/SQL procedure successfully completed.

SQL> select blocks*8*1024 as table_size,num_rows*avg_row_len as estimated_size from user_tables where table_name = 'SAMPLE';

TABLE_SIZE ESTIMATED_SIZE

———-                 ————–

     40960   28899

Note: Overhead space used by a block or page was not considered here, in either PostgreSQL or Oracle.