Segmentation fault restoring data to Azure VM

Regina Obe lr at pcorp.us
Tue Apr 23 11:05:30 PDT 2024


Yeah, dropping the raster extension if you are not using it would be my first
suggestion.

 

If you do 

 

DROP EXTENSION postgis_raster;

 

And it successfully drops, then you definitely don't have data tied to it, so
it's safe to drop before you do your pg_dump and restore.
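
The full sequence on the source side would be something like this (my_db is
just a placeholder for your database name):

psql -d my_db -c "DROP EXTENSION postgis_raster;"
pg_dump --format=custom --dbname=my_db --file=/tmp/my_db.backup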

 

I've seen people complain about random crashes caused by, for example, a bad
NetCDF driver that gets loaded when the GDAL library is loaded.  So if you are
running into one of those types of issues, not installing the raster extension
will save you from that pain.

 

I do see your Azure PostGIS was compiled against GEOS 3.10.2 but is running
with 3.11.1.  In theory that shouldn't be an issue, since code compiled against
3.10.2 is upward compatible with 3.11.

 

Everything else at a glance looks to be in good shape.

 

Hope that helps,

Regina

 

From: Jeffrey Durrence <jeffrey.durrence at mcleanengineering.com> 
Sent: Tuesday, April 23, 2024 1:21 PM
To: Regina Obe <lr at pcorp.us>; postgis-users at lists.osgeo.org
Subject: Re: Segmentation fault restoring data to Azure VM

 

Regina,

 

Thank you so much for your quick response (and for your contributions to the
rest of us in your writings)!

On the physical server I have:
 POSTGIS="3.4.2 c19ce56" [EXTENSION] PGSQL="150" GEOS="3.10.2-CAPI-1.16.0"
PROJ="8.2.1 NETWORK_ENABLED=OFF URL_ENDPOINT=https://cdn.proj.org
USER_WRITABLE_DIRECTORY=/tmp/proj DATABASE_PATH=/usr/share/proj/proj.db"
GDAL="GDAL 3.4.3, released 2022/04/22" LIBXML="2.9.13" LIBJSON="0.15"
LIBPROTOBUF="1.3.3" WAGYU="0.5.0 (Internal)" (core procs from "3.4.1
ca035b9" need upgrade) RASTER (raster procs from "3.4.1 ca035b9" need
upgrade)

 

On the Azure server I have 
POSTGIS="3.4.2 c19ce56" [EXTENSION] PGSQL="150" GEOS="3.11.1-CAPI-1.17.1"
(compiled against GEOS 3.10.2) PROJ="8.2.1 NETWORK_ENABLED=OFF
URL_ENDPOINT=https://cdn.proj.org USER_WRITABLE_DIRECTORY=/tmp/proj
DATABASE_PATH=/usr/share/proj/proj.db" GDAL="GDAL 3.4.3, released
2022/04/22" LIBXML="2.9.13" LIBJSON="0.15" LIBPROTOBUF="1.3.3" WAGYU="0.5.0
(Internal)" RASTER

There are some differences, though I'm not sure of the best strategy to
reconcile this.  As for the raster extension, I don't use it for my
workflows.  I could drop the postgis_raster extension and see if that allows
my export/import to succeed without error.
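
Before dropping it, I can confirm that nothing actually references it, for
example by checking the raster_columns view that the extension provides
(my_db is a placeholder):

psql -d my_db -c "SELECT count(*) FROM raster_columns;"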

 

-Jeffrey

 

Jeffrey Durrence

McLean Engineering Company

www.mcleanengineering.com

Office: 229-985-1148

Mobile: 229-798-0480

 

  _____  

From: Regina Obe <lr at pcorp.us>
Sent: Tuesday, April 23, 2024 1:09 PM
To: Jeffrey Durrence <jeffrey.durrence at mcleanengineering.com>;
postgis-users at lists.osgeo.org
Subject: RE: Segmentation fault restoring data to Azure VM 

 

What does 

 

SELECT postgis_full_version();  

 

return on your current setup vs. on your Azure VM setup?

 

My thinking is it might be a dependent library crashing.

 

Usually the issues are with the postgis_raster extension, since that drags in
GDAL, which drags in several more libraries.

 

It could also be an issue with PROJ.  As I recall, at least on Windows, I used
to get crashes if the PROJ_LIB path wasn't set, but I think that issue has gone
away with recent releases.

 

Not sure if it was ever an issue on Linux or Unix.
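
If you want to rule PROJ out, you could make sure PROJ_LIB is set in the
environment the postgres server is started from and check that PostGIS can see
PROJ, roughly like this (the path is just a typical default, use whatever your
install actually reports):

export PROJ_LIB=/usr/share/proj
psql -d my_db -c "SELECT postgis_proj_version();"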

 

Hope that helps,

Regina

 

From: Jeffrey Durrence <jeffrey.durrence at mcleanengineering.com>
Sent: Tuesday, April 23, 2024 11:42 AM
To: postgis-users at lists.osgeo.org
Subject: Segmentation fault restoring data to Azure VM

 

Greetings,

 

I am migrating PostgreSQL/PostGIS databases from a physical server with
these specs:
Host OS: Ubuntu 22.04.4 LTS x64 (32GB RAM, 8GB Swap, 1TB disk space, XEON
E5-1660 CPU)
Postgres: 15.6.1-pgdg22.04+1
PostGIS: 3.4.2

 

To an Azure VM with 

Host OS: Ubuntu 22.04.4 LTS x64 (32GB RAM, 8GB Swap, 512GB disk space, XEON
8171M) (again, this is an Azure VM)
Postgres: 15.6.1-pgdg22.04+1
PostGIS: 3.4.2

 

 

In the past, I have migrated the same data on the same physical hardware for
changes in host OS, Postgres version, and/or PostGIS version.  I've also
migrated from one physical host to another.  When doing so, I try to make sure
that my software packages are up to date and that the packages in the
destination environment match, and I did this for this planned environment as
well.  Because I know that some of my dbs have several years of history in
them, I export these dbs individually using pg_dump (using the custom format
when possible, though in some cases I use the plain-text option) and restore
them with pg_restore on the destination host.
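
For reference, the dump side is roughly along these lines (names and paths are
just examples):

pg_dump --format=custom --dbname=my_db --file=/tmp/my_db.backup
pg_dump --format=plain --dbname=my_db --file=/tmp/my_db.sql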

 

During the process of migrating data in this manner, I observed that
restoration of many of my dbs failed.  I did lots of testing with this and
have made these additional observations:

1.	I am able to restore my export files to another physical host with
the same software environment without error (the errors occur when importing
to the virtual environment).

2.	If I split my export/import into separate steps for the schema and
data, it is the schema import that produces the error, not the data import
(see the example commands after this list).

3.	When the error occurs, the db process terminates with a segmentation
fault (see log output below), and then restarts.

4.	With logging of all statements enabled, I can see that the
segmentation fault occurs at the very end of the import, when it seems that
all commands have completed.  As long as I wait for the server process to
recover, I have no errors when I import the data after the crash.

5.	The db imports that produce these errors seem to be those with more
history (e.g., it's more likely that these were created with an older
version of PostGIS and subsequently upgraded to the versions they have now).


6.	The errors do seem related to the installation of the PostGIS
extension.  If I purposely drop the extension from the source database
before I export, the resulting file can be imported in the target
environment without error.  This is obviously undesirable, as it results in
losing all of the geometry info from tables, views, etc. in the db; I only
did this to experiment.
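
The split export/import mentioned in observation 2 is done roughly like this
(file names are just examples):

pg_dump --format=custom --schema-only --dbname=my_db --file=/tmp/my_db_schema_only.backup
pg_dump --format=custom --data-only --dbname=my_db --file=/tmp/my_db_data_only.backup
pg_restore --verbose --format=custom --dbname=my_db /tmp/my_db_schema_only.backup
pg_restore --verbose --format=custom --dbname=my_db /tmp/my_db_data_only.backup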

 

My typical restore command looks like this:
pg_restore --verbose --format=custom --dbname=my_db
/tmp/my_db_schema_only.backup 

(plus some options to capture the output) 

 

Here's what the command-line output looks like when this happens:

LOG:  server process (PID 52476) was terminated by signal 11: Segmentation
fault

LOG:  terminating any other active server processes

LOG:  all server processes terminated; reinitializing

LOG:  database system was interrupted; last known up at 2024-04-23 02:16:49
UTC

LOG:  database system was not properly shut down; automatic recovery in
progress

LOG:  redo starts at 5/51381960

WARNING:  could not open directory "base/696372": No such file or directory

CONTEXT:  WAL redo at 5/51381A48 for Database/DROP: dir 1663/696372

WARNING:  some useless files may be left behind in old database directory
"base/696372"

CONTEXT:  WAL redo at 5/51381A48 for Database/DROP: dir 1663/696372

LOG:  invalid record length at 5/52679558: expected at least 24, got 0

LOG:  redo done at 5/526794E0 system usage: CPU: user: 0.07 s, system: 0.01
s, elapsed: 0.10 s

LOG:  checkpoint starting: end-of-recovery immediate wait

LOG:  checkpoint complete: wrote 2699 buffers (16.5%); 0 WAL file(s) added,
1 removed, 0 recycled;

B; lsn=5/52679558, redo lsn=5/52679558

LOG:  database system is ready to accept connections

 

In the log file, the section written as the error occurs looks something like
this:


[60419] [6627bcbb.ec03] [2024-04-23 13:50:54.813 UTC] [0]: LOG:  statement:
GRANT ALL ON TABLE my_schema.my_table TO my_user_name;

[60419] [6627bcbb.ec03] [2024-04-23 13:50:54.815 UTC] [0]: LOG:  statement:
-- Completed on 2024-04-23 13:50:54 UTC

[60419] [6627bcbb.ec03] [2024-04-23 13:50:54.815 UTC] [0]: LOG:  statement:
--

        -- PostgreSQL database dump complete

        --

[59940] [6627b39f.ea24] [2024-04-23 13:50:54.943 UTC] [0]: LOG:  server
process (PID 60419) was terminated by signal 11: Segmentation fault

[59940] [6627b39f.ea24] [2024-04-23 13:50:54.943 UTC] [0]: LOG:  terminating
any other active server processes

[59940] [6627b39f.ea24] [2024-04-23 13:50:54.945 UTC] [0]: LOG:  all server
processes terminated; reinitializing

[60423] [6627bcbf.ec07] [2024-04-23 13:50:55.007 UTC] [0]: LOG:  database
system was interrupted; last known up at 2024-04-23 13:50:48 UTC

[60423] [6627bcbf.ec07] [2024-04-23 13:51:05.511 UTC] [0]: LOG:  syncing
data directory (fsync), elapsed time: 10.00 s, current path:
./base/897556/1249

[60423] [6627bcbf.ec07] [2024-04-23 13:51:12.554 UTC] [0]: LOG:  database
system was not properly shut down; automatic recovery in progress

[60423] [6627bcbf.ec07] [2024-04-23 13:51:12.606 UTC] [0]: LOG:  redo starts
at E/3A8785A8

[60423] [6627bcbf.ec07] [2024-04-23 13:51:12.607 UTC] [0]: WARNING:  could
not open directory "base/562219": No such file or directory

[60423] [6627bcbf.ec07] [2024-04-23 13:51:12.607 UTC] [0]: CONTEXT:  WAL
redo at E/3A878690 for Database/DROP: dir 1663/562219

[60423] [6627bcbf.ec07] [2024-04-23 13:51:12.607 UTC] [0]: WARNING:  some
useless files may be left behind in old database directory "base/562219"

[60423] [6627bcbf.ec07] [2024-04-23 13:51:12.607 UTC] [0]: CONTEXT:  WAL
redo at E/3A878690 for Database/DROP: dir 1663/562219

[60423] [6627bcbf.ec07] [2024-04-23 13:51:12.704 UTC] [0]: LOG:  invalid
record length at E/3BACDAF8: expected at least 24, got 0

[60423] [6627bcbf.ec07] [2024-04-23 13:51:12.704 UTC] [0]: LOG:  redo done
at E/3BACDA80 system usage: CPU: user: 0.07 s, system: 0.01 s, elapsed: 0.09
s

[60424] [6627bcbf.ec08] [2024-04-23 13:51:12.804 UTC] [0]: LOG:  checkpoint
starting: end-of-recovery immediate wait

[60424] [6627bcbf.ec08] [2024-04-23 13:51:13.143 UTC] [0]: LOG:  checkpoint
complete: wrote 2592 buffers (15.8%); 0 WAL file(s) added, 1 removed, 0
recycled; writ

9 s, average=0.001 s; distance=18773 kB, estimate=18773 kB; lsn=E/3BACDAF8,
redo lsn=E/3BACDAF8


[59940] [6627b39f.ea24] [2024-04-23 13:51:13.156 UTC] [0]: LOG:  database
system is ready to accept connections


 

 

I searched the mail archive and performed some additional web searches based
on my observations.  Unfortunately, I haven't found a similar-enough incident
that might suggest further troubleshooting steps.  In over 10 years of using
PostgreSQL+PostGIS, I've never had to dig this deeply to determine the source
of an error.  I'm looking for advice on further troubleshooting, or for shared
experiences in similar situations.

 

Thank you for your time!

 

Jeffrey Durrence

McLean Engineering Company

www.mcleanengineering.com

Office: 229-985-1148

Mobile: 229-798-0480

 
