Python and SQL - Query takes too long to execute [on hold]
I've recently begun working with database queries: I was asked to develop a program that reads the last month of data from a Firebird DB with almost 100 million rows.



After stumbling a little, I finally got the code working in Python (more specifically, with the pandas library), but it takes more than 8 hours just to filter the data, and it has to be run every day.
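When a full month of data does have to come over the wire, reading it in chunks at least keeps memory bounded and lets downstream processing start before the whole result is materialised. A minimal sketch of the technique, using an in-memory SQLite table as a stand-in for the Firebird DB (the `EVENTS` table and `DAY` column are made-up names, not from the original code):

```python
import sqlite3

import pandas as pd

# In-memory SQLite stands in for the real Firebird connection;
# EVENTS and DAY are hypothetical names for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE EVENTS (ID INTEGER PRIMARY KEY, DAY TEXT)")
conn.executemany(
    "INSERT INTO EVENTS (DAY) VALUES (?)",
    [("2018-10-15",)] * 10_000,
)

# With chunksize set, read_sql_query returns an iterator of DataFrames
# instead of materialising the whole result set at once.
chunks = pd.read_sql_query("SELECT ID, DAY FROM EVENTS", conn, chunksize=2_500)
total = sum(len(chunk) for chunk in chunks)
print(total)  # 10000
```

This does not make the query itself faster, but it caps the process's memory footprint at roughly one chunk.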



The rest of the code runs quickly, since I only need roughly the last 3000 rows of the dataset.



So far, the function responsible for executing the query is:



def read_query(access):
    start_time = time.time()
    conn = pyodbc.connect(access)
    df = pd.read_sql_query(r"SELECT * from TABLE where DAY >= DATEADD(MONTH,-1, CURRENT_TIMESTAMP(2)) AND DAY <= 'TODAY'", conn)

Is there any better (and quicker) way to filter this data? Perhaps reading rows from the DB one at a time, starting from the end? What could I do to optimize the run time?
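Since only the last ~3000 rows are ultimately needed, one generally faster route is to push both the filtering and the row limit into SQL, so the database (ideally with an index on the date column) returns only those rows instead of a month of data for pandas to sift. The sketch below uses an in-memory SQLite table as a stand-in, with hypothetical table/column names; in Firebird the limit clause would be spelled `ROWS 3000` (or `FETCH FIRST 3000 ROWS ONLY` in Firebird 3) rather than SQLite's `LIMIT`:

```python
import sqlite3

import pandas as pd

# SQLite stand-in for the Firebird DB; EVENTS and DAY are hypothetical names.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE EVENTS (ID INTEGER PRIMARY KEY, DAY TEXT)")
conn.execute("CREATE INDEX IX_EVENTS_DAY ON EVENTS (DAY)")  # index the filter column
conn.executemany(
    "INSERT INTO EVENTS (DAY) VALUES (?)",
    [(f"2018-10-{d:02d}",) for d in range(1, 31) for _ in range(200)],
)

# Let the database sort and limit: only ~3000 rows cross the wire,
# instead of SELECT * over a month of data.
# Firebird equivalent: SELECT ID, DAY FROM EVENTS ORDER BY DAY DESC ROWS 3000
df = pd.read_sql_query(
    "SELECT ID, DAY FROM EVENTS ORDER BY DAY DESC LIMIT 3000",
    conn,
)
print(len(df))  # 3000
```

Selecting only the needed columns rather than `SELECT *` also cuts transfer time, and an index on the date column lets the engine satisfy the `ORDER BY ... ROWS` without scanning the whole table.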










put on hold as off-topic by Dannnno, 200_success, Gerrit0, яүυк, Mast 7 hours ago


This question appears to be off-topic. The users who voted to close gave this specific reason:


  • "Lacks concrete context: Code Review requires concrete code from a project, with sufficient context for reviewers to understand how that code is used. Pseudocode, stub code, hypothetical code, obfuscated code, and generic best practices are outside the scope of this site." – Dannnno, Gerrit0, яүυк, Mast

If this question can be reworded to fit the rules in the help center, please edit the question.









  • Hello! Some questions: How big is your table? What does the data look like? Is the table indexed in any way? Where's the rest of the code? How can I run it? Is using Firebird a requirement? There's a start_time = time.time(), but where does it end? Why did you choose it? Summary: please add more context :)
    – яүυк
    11 hours ago












  • Unfortunately, it was my company's clients' choice to use Firebird, and yes, I believe running the analysis on it is the best way to find a solution. Most of the data is confidential, unfortunately, but it basically consists of dates. About the code, there isn't much more, actually: just printing the selected rows. And the size is described in the question. Hope it helps ^^
    – Helena Martins
    11 hours ago















python python-3.x sql






asked 11 hours ago









Helena Martins
