How can I optimize a Point in Polygon query for millions of points when most of the points lie within the...

I have 150 million points in a point table and would like to find the few points lying outside a given polygon geometry. I know that 99.9% of the points are within the polygon geometry. I am interested in finding the few points which lie outside the polygon.

My present best query using indexed PostGIS tables takes about 30 minutes to complete. Is there a way to optimize the following query knowing that most of the points are within the polygon (border)?

SELECT COUNT(*) 

FROM italy_points pt

JOIN borders poly

ON ST_WITHIN (pt.the_geom, poly.geom)

WHERE poly.iso3 = 'ITA';

The polygon is basically the admin 0 border of Italy.
Vertices - 405,000. Parts - 510. The envelope is much larger than the polygon (The polygon covers 24% of the envelope)

edited Dec 17 at 22:25

asked Dec 17 at 16:49

Prithvi

433

2

Please Edit the question to give an indication of the complexity of the polygon -- How many parts? How many vertices? What percentage of the envelope of the polygon is within the polygon. I've found that partitioning complex polygons can improve point-in polygon evaluation, but you need to handle the condition where a single point intersects more than one partition.
– Vince
Dec 17 at 17:02

The first optimization for this type of operation is commonly to check whether the point is in the bounding box of the polygon before carrying on to the full point-in-polygon operation. Point-in-box is a very efficient operation by comparison.
– WhiteboxDev
Dec 17 at 19:56

@Vince If duplicates are possible (the only case I think think of is when it falls exactly on the border of two partitions), this is trivially handled in PostGIS. You need only GROUP BY the primary key of the points. (PostgreSQL conveniently allows you to reference any columns in the SELECT clause that come from a table where the primary key is included in the GROUP BY clause.)
– jpmc26
Dec 17 at 21:02

@WhiteboxDev ST_Within already does a boundary box check that enables to use of the index. (Almost all of PostGIS's functions include this optimization.) If it's still slow, then clearly the problem is with the complexity of the polygon.
– jpmc26
Dec 17 at 21:05

@jpmc26 Certainly, but the SQL query would also need to be modified to use ST_Intersects, since ST_Within would not reliably match internal boundary conditions.
– Vince
Dec 17 at 21:52

|
show 1 more comment

My present best query using indexed PostGIS tables takes about 30 minutes to complete. Is there a way to optimize the following query knowing that most of the points are within the polygon (border)?

SELECT COUNT(*) 

FROM italy_points pt

JOIN borders poly

ON ST_WITHIN (pt.the_geom, poly.geom)

WHERE poly.iso3 = 'ITA';

The polygon is basically the admin 0 border of Italy.
Vertices - 405,000. Parts - 510. The envelope is much larger than the polygon (The polygon covers 24% of the envelope)

edited Dec 17 at 22:25

asked Dec 17 at 16:49

Prithvi

433

2

Please Edit the question to give an indication of the complexity of the polygon -- How many parts? How many vertices? What percentage of the envelope of the polygon is within the polygon. I've found that partitioning complex polygons can improve point-in polygon evaluation, but you need to handle the condition where a single point intersects more than one partition.
– Vince
Dec 17 at 17:02

The first optimization for this type of operation is commonly to check whether the point is in the bounding box of the polygon before carrying on to the full point-in-polygon operation. Point-in-box is a very efficient operation by comparison.
– WhiteboxDev
Dec 17 at 19:56

@Vince If duplicates are possible (the only case I think think of is when it falls exactly on the border of two partitions), this is trivially handled in PostGIS. You need only GROUP BY the primary key of the points. (PostgreSQL conveniently allows you to reference any columns in the SELECT clause that come from a table where the primary key is included in the GROUP BY clause.)
– jpmc26
Dec 17 at 21:02

@WhiteboxDev ST_Within already does a boundary box check that enables to use of the index. (Almost all of PostGIS's functions include this optimization.) If it's still slow, then clearly the problem is with the complexity of the polygon.
– jpmc26
Dec 17 at 21:05

@jpmc26 Certainly, but the SQL query would also need to be modified to use ST_Intersects, since ST_Within would not reliably match internal boundary conditions.
– Vince
Dec 17 at 21:52

|
show 1 more comment

My present best query using indexed PostGIS tables takes about 30 minutes to complete. Is there a way to optimize the following query knowing that most of the points are within the polygon (border)?

SELECT COUNT(*) 

FROM italy_points pt

JOIN borders poly

ON ST_WITHIN (pt.the_geom, poly.geom)

WHERE poly.iso3 = 'ITA';

The polygon is basically the admin 0 border of Italy.
Vertices - 405,000. Parts - 510. The envelope is much larger than the polygon (The polygon covers 24% of the envelope)

edited Dec 17 at 22:25

asked Dec 17 at 16:49

Prithvi

433

My present best query using indexed PostGIS tables takes about 30 minutes to complete. Is there a way to optimize the following query knowing that most of the points are within the polygon (border)?

SELECT COUNT(*) 

FROM italy_points pt

JOIN borders poly

ON ST_WITHIN (pt.the_geom, poly.geom)

WHERE poly.iso3 = 'ITA';

The polygon is basically the admin 0 border of Italy.
Vertices - 405,000. Parts - 510. The envelope is much larger than the polygon (The polygon covers 24% of the envelope)

postgis postgresql

edited Dec 17 at 22:25

asked Dec 17 at 16:49

Prithvi

433

edited Dec 17 at 22:25

asked Dec 17 at 16:49

Prithvi

433

edited Dec 17 at 22:25

asked Dec 17 at 16:49

Prithvi

433

asked Dec 17 at 16:49

Prithvi

433

asked Dec 17 at 16:49

Prithvi

433

2

Please Edit the question to give an indication of the complexity of the polygon -- How many parts? How many vertices? What percentage of the envelope of the polygon is within the polygon. I've found that partitioning complex polygons can improve point-in polygon evaluation, but you need to handle the condition where a single point intersects more than one partition.
– Vince
Dec 17 at 17:02

The first optimization for this type of operation is commonly to check whether the point is in the bounding box of the polygon before carrying on to the full point-in-polygon operation. Point-in-box is a very efficient operation by comparison.
– WhiteboxDev
Dec 17 at 19:56

@Vince If duplicates are possible (the only case I think think of is when it falls exactly on the border of two partitions), this is trivially handled in PostGIS. You need only GROUP BY the primary key of the points. (PostgreSQL conveniently allows you to reference any columns in the SELECT clause that come from a table where the primary key is included in the GROUP BY clause.)
– jpmc26
Dec 17 at 21:02

@WhiteboxDev ST_Within already does a boundary box check that enables to use of the index. (Almost all of PostGIS's functions include this optimization.) If it's still slow, then clearly the problem is with the complexity of the polygon.
– jpmc26
Dec 17 at 21:05

@jpmc26 Certainly, but the SQL query would also need to be modified to use ST_Intersects, since ST_Within would not reliably match internal boundary conditions.
– Vince
Dec 17 at 21:52

|
show 1 more comment

2

Please Edit the question to give an indication of the complexity of the polygon -- How many parts? How many vertices? What percentage of the envelope of the polygon is within the polygon. I've found that partitioning complex polygons can improve point-in polygon evaluation, but you need to handle the condition where a single point intersects more than one partition.
– Vince
Dec 17 at 17:02

The first optimization for this type of operation is commonly to check whether the point is in the bounding box of the polygon before carrying on to the full point-in-polygon operation. Point-in-box is a very efficient operation by comparison.
– WhiteboxDev
Dec 17 at 19:56

@Vince If duplicates are possible (the only case I think think of is when it falls exactly on the border of two partitions), this is trivially handled in PostGIS. You need only GROUP BY the primary key of the points. (PostgreSQL conveniently allows you to reference any columns in the SELECT clause that come from a table where the primary key is included in the GROUP BY clause.)
– jpmc26
Dec 17 at 21:02

@WhiteboxDev ST_Within already does a boundary box check that enables to use of the index. (Almost all of PostGIS's functions include this optimization.) If it's still slow, then clearly the problem is with the complexity of the polygon.
– jpmc26
Dec 17 at 21:05

@jpmc26 Certainly, but the SQL query would also need to be modified to use ST_Intersects, since ST_Within would not reliably match internal boundary conditions.
– Vince
Dec 17 at 21:52

Please Edit the question to give an indication of the complexity of the polygon -- How many parts? How many vertices? What percentage of the envelope of the polygon is within the polygon. I've found that partitioning complex polygons can improve point-in polygon evaluation, but you need to handle the condition where a single point intersects more than one partition.
– Vince
Dec 17 at 17:02

The first optimization for this type of operation is commonly to check whether the point is in the bounding box of the polygon before carrying on to the full point-in-polygon operation. Point-in-box is a very efficient operation by comparison.
– WhiteboxDev
Dec 17 at 19:56

@Vince If duplicates are possible (the only case I think think of is when it falls exactly on the border of two partitions), this is trivially handled in PostGIS. You need only GROUP BY the primary key of the points. (PostgreSQL conveniently allows you to reference any columns in the SELECT clause that come from a table where the primary key is included in the GROUP BY clause.)
– jpmc26
Dec 17 at 21:02

@WhiteboxDev ST_Within already does a boundary box check that enables to use of the index. (Almost all of PostGIS's functions include this optimization.) If it's still slow, then clearly the problem is with the complexity of the polygon.
– jpmc26
Dec 17 at 21:05

@jpmc26 Certainly, but the SQL query would also need to be modified to use ST_Intersects, since ST_Within would not reliably match internal boundary conditions.
– Vince
Dec 17 at 21:52

|
show 1 more comment

1 Answer
1

active

oldest

votes

Use ST_Subdivide to cut your polygon into smaller polygons, save them to a table and create a spatial index. Then do your query on the gridded polygons.

Without that, spatial indexing does not provide any advantages in your case (only 1 polygon of interest).

answered Dec 17 at 17:00

Mesa

989615

4

Further to the ST_Subdivide() comment, you might find that using less than the default number of vertices yields further gains by increasing index leverage and decreasing geometry recovery time. Try 64 or even 32.
– Paul Ramsey
Dec 17 at 17:27

The query now completes in 5 minutes as opposed to 30! Thanks for the suggestion.
– Prithvi
Dec 17 at 22:28

add a comment |

Your Answer

StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "79"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});

function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: false,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: null,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});

}
});

draft saved

draft discarded

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fgis.stackexchange.com%2fquestions%2f306295%2fhow-can-i-optimize-a-point-in-polygon-query-for-millions-of-points-when-most-of%23new-answer', 'question_page');
}
);

Post as a guest

Name

Required, but never shown

1 Answer
1

active

oldest

votes

1 Answer
1

active

oldest

votes

Use ST_Subdivide to cut your polygon into smaller polygons, save them to a table and create a spatial index. Then do your query on the gridded polygons.

Without that, spatial indexing does not provide any advantages in your case (only 1 polygon of interest).

answered Dec 17 at 17:00

Mesa

989615

4

Further to the ST_Subdivide() comment, you might find that using less than the default number of vertices yields further gains by increasing index leverage and decreasing geometry recovery time. Try 64 or even 32.
– Paul Ramsey
Dec 17 at 17:27

The query now completes in 5 minutes as opposed to 30! Thanks for the suggestion.
– Prithvi
Dec 17 at 22:28

add a comment |

Use ST_Subdivide to cut your polygon into smaller polygons, save them to a table and create a spatial index. Then do your query on the gridded polygons.

Without that, spatial indexing does not provide any advantages in your case (only 1 polygon of interest).

answered Dec 17 at 17:00

Mesa

989615

4

Further to the ST_Subdivide() comment, you might find that using less than the default number of vertices yields further gains by increasing index leverage and decreasing geometry recovery time. Try 64 or even 32.
– Paul Ramsey
Dec 17 at 17:27

The query now completes in 5 minutes as opposed to 30! Thanks for the suggestion.
– Prithvi
Dec 17 at 22:28

add a comment |

Use ST_Subdivide to cut your polygon into smaller polygons, save them to a table and create a spatial index. Then do your query on the gridded polygons.

Without that, spatial indexing does not provide any advantages in your case (only 1 polygon of interest).

answered Dec 17 at 17:00

Mesa

989615

Use ST_Subdivide to cut your polygon into smaller polygons, save them to a table and create a spatial index. Then do your query on the gridded polygons.

Without that, spatial indexing does not provide any advantages in your case (only 1 polygon of interest).

answered Dec 17 at 17:00

Mesa

989615

answered Dec 17 at 17:00

Mesa

989615

answered Dec 17 at 17:00

Mesa

989615

answered Dec 17 at 17:00

Mesa

989615

4

Further to the ST_Subdivide() comment, you might find that using less than the default number of vertices yields further gains by increasing index leverage and decreasing geometry recovery time. Try 64 or even 32.
– Paul Ramsey
Dec 17 at 17:27

The query now completes in 5 minutes as opposed to 30! Thanks for the suggestion.
– Prithvi
Dec 17 at 22:28

add a comment |

4

Further to the ST_Subdivide() comment, you might find that using less than the default number of vertices yields further gains by increasing index leverage and decreasing geometry recovery time. Try 64 or even 32.
– Paul Ramsey
Dec 17 at 17:27

The query now completes in 5 minutes as opposed to 30! Thanks for the suggestion.
– Prithvi
Dec 17 at 22:28

Further to the ST_Subdivide() comment, you might find that using less than the default number of vertices yields further gains by increasing index leverage and decreasing geometry recovery time. Try 64 or even 32.
– Paul Ramsey
Dec 17 at 17:27

The query now completes in 5 minutes as opposed to 30! Thanks for the suggestion.
– Prithvi
Dec 17 at 22:28

add a comment |

draft saved

draft discarded

Thanks for contributing an answer to Geographic Information Systems Stack Exchange!

Please be sure to answer the question. Provide details and share your research!

But avoid …

Asking for help, clarification, or responding to other answers.

Making statements based on opinion; back them up with references or personal experience.

To learn more, see our tips on writing great answers.

Some of your past answers have not been well-received, and you're in danger of being blocked from answering.

Please pay close attention to the following guidance:

Please be sure to answer the question. Provide details and share your research!

But avoid …

Asking for help, clarification, or responding to other answers.

Making statements based on opinion; back them up with references or personal experience.

To learn more, see our tips on writing great answers.

draft saved

draft discarded

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Name

Required, but never shown

Name

Required, but never shown

This page is only for reference, If you need detailed information, please check here

搜尋此網誌

Cfrtjryk