[postgis-users] Various ways to handle addresses in postgresql

Shaozhong SHI shishaozhong at gmail.com
Sat Jan 9 08:02:56 PST 2021


Hi, Steve,

Many thanks.

We are strongly interested in dealing with UK addresses.

Last time, I passed on messages for testing.  I was told not working for UK
addresses.

I will try it again.   Anything else do we have to watch out for?

Regards,

David

On Sat, 9 Jan 2021 at 15:53, Stephen Woodbridge <
stephenwoodbridge37 at gmail.com> wrote:

> David,
>
> This is the link to the address standardizer:
> https://github.com/woodbri/address-standardizer
>
> This is a link to all my code that I developed consulting. It includes a
> few SQL geocoders based on the code above. And has some README files
> discussing how to build a geocoder which is the basis for how the
> geocoders work.
>
> https://github.com/woodbri/imaptools.com
>
> this is the geocoder for Tiger data, but the code is essentially the
> same for every country because the when you load country specific data
> into the database it goes into its own table and then you standardize
> that data into stdstreets table and all queries are done against the
> stdstreets table and you only have to tweak the address range
> interpolation function which needs to access the source streets table
> for the geometry and house number ranges.
>
>
> https://github.com/woodbri/imaptools.com/blob/master/sql-scripts/geocoder/prep-tiger-geo-new.sql
>
> I would approach this by:
>
> 1. get the address standardizer compiled and installed. I can help if
> you run into problems or have questions.
> 2. load your UK street data into rawdata schema, ideally it would be
> best if we can create a table/view that presents this data as a single
> table where each record represents one side of the street and one
> jurisdiction this may mean that a single record in your source data will
> generate multiple records in this table/view (this greatly simplifies
> the coding and performance later)
> 3. look at the prep-tiger-geo-new.sql file
> 4. create a stdstreets table and standardize your table/view data into it
> 5. look at standardization failures and adjust lexicon and grammar as
> needed
> 6. loop back to 4 until good enough
> 7. load functions from prep-tiger-geo-new.sql file and adjust any for
> your data
> 8. try it out!
>
> -Steve
>
>
> On 1/9/2021 10:22 AM, Shaozhong SHI wrote:
> > Hi, Stephen,
> >
> > Many thanks.  We are interested in it is working with the UK addresses.
> >
> > Please send me the link to this.
> >
> > Regards,
> >
> > David
> >
> > On Sat, 9 Jan 2021 at 15:00, Stephen Woodbridge
> > <stephenwoodbridge37 at gmail.com <mailto:stephenwoodbridge37 at gmail.com>>
> > wrote:
> >
> >     David,
> >
> >     Yup and this is just one a dozens of cases that you have to deal
> >     with. You are dealing with a natural language processing problem.
> >     And you have to deal with human input that has typos and
> >     abbreviations.
> >
> >     These issues are what the address standardizer fixes. It tokenized
> >     the address and uses the gazette to standardize the terms and then
> >     classifies each term and assigns it to part of the address based
> >     on a grammar.
> >
> >     So there is a simple solution, use my address standardizer, it is
> >     free, MIT license, it has a sample lexicon/ gazette and grammar
> >     for the UK, it is easy to modify these to fit your needs, and it
> >     just works. Oh if you want to do another county it also has sample
> >     files for 25 countries.
> >
> >     Sent from my iPhone
> >
> >>     On Jan 9, 2021, at 4:42 AM, Darafei Komяpa Praliaskouski
> >>     <me at komzpa.net <mailto:me at komzpa.net>> wrote:
> >>
> >>     
> >>     Hello,
> >>
> >>     People make neural networks for this kind of task:
> >>
> >>     https://github.com/openvenues/libpostal
> >>     <https://github.com/openvenues/libpostal>
> >>
> >>     сб, 9 сту 2021, 12:40 карыстальнік Shaozhong SHI
> >>     <shishaozhong at gmail.com <mailto:shishaozhong at gmail.com>> напісаў:
> >>
> >>         Hi, Steve W,
> >>
> >>         it is easy to parse addresses as tokens. But it is difficult
> >>         to put tokens in right columns, due to that the same address
> >>         could be expressed with partial address or full address.
> >>
> >>         The same address can be written like, Flat 1 122 Great Avenue
> >>         London UK, or Flat 1 122 Greet Avenue Central London London
> >>         United Kingdom.
> >>
> >>         When this happens, each address has different number of
> >>         tokens, so different numbers of tokens.  Is there a way to
> >>         deal with this issue so that each token can get into right
> >>         column?
> >>
> >>         Please enlighten me.
> >>
> >>         Regards,
> >>
> >>         David
> >>
> >>         On Sat, 25 Apr 2020 at 05:09, Stephen Woodbridge
> >>         <stephenwoodbridge37 at gmail.com
> >>         <mailto:stephenwoodbridge37 at gmail.com>> wrote:
> >>
> >>             And I have create an address-standardizer project here
> >>             https://github.com/woodbri/address-standardizer
> >>             <https://github.com/woodbri/address-standardizer> which
> >>             is user
> >>             configurable. I might be over kill is you just want to
> >>             strip off the
> >>             number, in which case you might just use a SQL regexp
> >>             replace to remove it.
> >>
> >>             -Steve W
> >>
> >>             On 4/25/2020 12:04 AM, Stephen Woodbridge wrote:
> >>             > PostGIS has address_standardizer extension that includes
> >>             > parse_address() and standardize_address() functions.
> >>             >
> >>             > -Steve W
> >>             >
> >>             > On 4/24/2020 9:54 PM, Imre Samu wrote:
> >>             >> > handle addresses in postgresql
> >>             >>
> >>             >> maybe you can use the
> >>             https://github.com/openvenues/libpostal
> >>             <https://github.com/openvenues/libpostal> library
> >>             >> with your favorite language bindings ( Python / Ruby /
> >>             Go / PHP /
> >>             >> Node / R / Java  ...)
> >>             >>
> >>             >> or as a Postgres database extension:
> >>             >>
> >>
> https://info.crunchydata.com/blog/quick-and-dirty-address-matching-with-libpostal
> >>             <
> https://info.crunchydata.com/blog/quick-and-dirty-address-matching-with-libpostal
> >
> >>
> >>             >>
> >>             >> https://github.com/pramsey/pgsql-postal
> >>             <https://github.com/pramsey/pgsql-postal>
> >>             >>
> >>             >> Regards,
> >>             >>  Imre
> >>             >>
> >>             >>
> >>             >>
> >>             >>
> >>             >> Shaozhong SHI <shishaozhong at gmail.com
> >>             <mailto:shishaozhong at gmail.com>
> >>             >> <mailto:shishaozhong at gmail.com
> >>             <mailto:shishaozhong at gmail.com>>> ezt írta (időpont:
> >>             2020. ápr. 25.,
> >>             >> Szo, 2:49):
> >>             >>
> >>             >>     I find this is a simple, but important question.
> >>             >>
> >>             >>     How best to split numbers and the rest of address?
> >>             >>
> >>             >>     For instance, one tricky one is as follows:
> >>             >>
> >>             >>     21-1 Great Avenue, a city, a country, this planet
> >>             >>
> >>             >>     How to turn this into the following:
> >>             >>
> >>             >>     column 1,       column 2
> >>             >>
> >>             >>       21-1              Great Avenue, a city, a
> >>             country, this planet
> >>             >>
> >>             >>     Note:  there is a hyphen in 21-1
> >>             >>
> >>             >>     Any clue?
> >>             >>
> >>             >>     Regards,
> >>             >>
> >>             >>     Shao
> >>             >> _______________________________________________
> >>             >>     postgis-users mailing list
> >>             >> postgis-users at lists.osgeo.org
> >>             <mailto:postgis-users at lists.osgeo.org>
> >>             <mailto:postgis-users at lists.osgeo.org
> >>             <mailto:postgis-users at lists.osgeo.org>>
> >>             >> https://lists.osgeo.org/mailman/listinfo/postgis-users
> >>             <https://lists.osgeo.org/mailman/listinfo/postgis-users>
> >>             >>
> >>             >>
> >>             >> _______________________________________________
> >>             >> postgis-users mailing list
> >>             >> postgis-users at lists.osgeo.org
> >>             <mailto:postgis-users at lists.osgeo.org>
> >>             >> https://lists.osgeo.org/mailman/listinfo/postgis-users
> >>             <https://lists.osgeo.org/mailman/listinfo/postgis-users>
> >>             >
> >>
> >>             _______________________________________________
> >>             postgis-users mailing list
> >>             postgis-users at lists.osgeo.org
> >>             <mailto:postgis-users at lists.osgeo.org>
> >>             https://lists.osgeo.org/mailman/listinfo/postgis-users
> >>             <https://lists.osgeo.org/mailman/listinfo/postgis-users>
> >>
> >>         _______________________________________________
> >>         postgis-users mailing list
> >>         postgis-users at lists.osgeo.org
> >>         <mailto:postgis-users at lists.osgeo.org>
> >>         https://lists.osgeo.org/mailman/listinfo/postgis-users
> >>         <https://lists.osgeo.org/mailman/listinfo/postgis-users>
> >>
> >>     _______________________________________________
> >>     postgis-users mailing list
> >>     postgis-users at lists.osgeo.org <mailto:postgis-users at lists.osgeo.org
> >
> >>     https://lists.osgeo.org/mailman/listinfo/postgis-users
> >>     <https://lists.osgeo.org/mailman/listinfo/postgis-users>
> >     _______________________________________________
> >     postgis-users mailing list
> >     postgis-users at lists.osgeo.org <mailto:postgis-users at lists.osgeo.org>
> >     https://lists.osgeo.org/mailman/listinfo/postgis-users
> >     <https://lists.osgeo.org/mailman/listinfo/postgis-users>
> >
> >
> > _______________________________________________
> > postgis-users mailing list
> > postgis-users at lists.osgeo.org
> > https://lists.osgeo.org/mailman/listinfo/postgis-users
>
> _______________________________________________
> postgis-users mailing list
> postgis-users at lists.osgeo.org
> https://lists.osgeo.org/mailman/listinfo/postgis-users
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.osgeo.org/pipermail/postgis-users/attachments/20210109/b4fb13f9/attachment.html>


More information about the postgis-users mailing list