Python and SQL - Query takes too long to execute [on hold]
I've recently begun working with database queries: I was asked to develop a program that reads the last month of data from a Firebird DB with almost 100 million rows.



After stumbling a little, I finally got the code working in Python (more specifically, with the pandas library), but it takes more than 8 hours just to filter the data, and it has to be run every day.
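When a full month of data does have to come over the wire, reading it in chunks at least keeps memory bounded and lets downstream processing start before the whole result is materialised. A minimal sketch of the technique, using an in-memory SQLite table as a stand-in for the Firebird DB (the `EVENTS` table and `DAY` column are made-up names, not from the original code):

```python
import sqlite3

import pandas as pd

# In-memory SQLite stands in for the real Firebird connection;
# EVENTS and DAY are hypothetical names for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE EVENTS (ID INTEGER PRIMARY KEY, DAY TEXT)")
conn.executemany(
    "INSERT INTO EVENTS (DAY) VALUES (?)",
    [("2018-10-15",)] * 10_000,
)

# With chunksize set, read_sql_query returns an iterator of DataFrames
# instead of materialising the whole result set at once.
chunks = pd.read_sql_query("SELECT ID, DAY FROM EVENTS", conn, chunksize=2_500)
total = sum(len(chunk) for chunk in chunks)
print(total)  # 10000
```

This does not make the query itself faster, but it caps the process's memory footprint at roughly one chunk.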



The rest of the code runs quickly, since I only need roughly the last 3000 rows of the dataset.



So far, the function responsible for executing the query is:



def read_query(access):
    start_time = time.time()
    conn = pyodbc.connect(access)
    df = pd.read_sql_query(r"SELECT * from TABLE where DAY >= DATEADD(MONTH,-1, CURRENT_TIMESTAMP(2)) AND DAY <= 'TODAY'", conn)

Is there any better (and quicker) way to filter this data? Perhaps reading rows from the DB one at a time, starting from the end? What could I do to optimize the run time?
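Since only the last ~3000 rows are ultimately needed, one generally faster route is to push both the filtering and the row limit into SQL, so the database (ideally with an index on the date column) returns only those rows instead of a month of data for pandas to sift. The sketch below uses an in-memory SQLite table as a stand-in, with hypothetical table/column names; in Firebird the limit clause would be spelled `ROWS 3000` (or `FETCH FIRST 3000 ROWS ONLY` in Firebird 3) rather than SQLite's `LIMIT`:

```python
import sqlite3

import pandas as pd

# SQLite stand-in for the Firebird DB; EVENTS and DAY are hypothetical names.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE EVENTS (ID INTEGER PRIMARY KEY, DAY TEXT)")
conn.execute("CREATE INDEX IX_EVENTS_DAY ON EVENTS (DAY)")  # index the filter column
conn.executemany(
    "INSERT INTO EVENTS (DAY) VALUES (?)",
    [(f"2018-10-{d:02d}",) for d in range(1, 31) for _ in range(200)],
)

# Let the database sort and limit: only ~3000 rows cross the wire,
# instead of SELECT * over a month of data.
# Firebird equivalent: SELECT ID, DAY FROM EVENTS ORDER BY DAY DESC ROWS 3000
df = pd.read_sql_query(
    "SELECT ID, DAY FROM EVENTS ORDER BY DAY DESC LIMIT 3000",
    conn,
)
print(len(df))  # 3000
```

Selecting only the needed columns rather than `SELECT *` also cuts transfer time, and an index on the date column lets the engine satisfy the `ORDER BY ... ROWS` without scanning the whole table.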










put on hold as off-topic by Dannnno, 200_success, Gerrit0, яүυк, Mast 7 hours ago


This question appears to be off-topic. The users who voted to close gave this specific reason:


  • "Lacks concrete context: Code Review requires concrete code from a project, with sufficient context for reviewers to understand how that code is used. Pseudocode, stub code, hypothetical code, obfuscated code, and generic best practices are outside the scope of this site." – Dannnno, Gerrit0, яүυк, Mast

If this question can be reworded to fit the rules in the help center, please edit the question.









  • Hello! Some questions: How big is your table? What does the data look like? Is the table indexed in any way? Where's the rest of the code? How can I run it? Is using Firebird a requirement? There's a start_time = time.time(), but where does it end? Why did you choose it? Summary: please add more context :)
    – яүυк
    11 hours ago












  • Unfortunately, it was my company's clients' choice to use Firebird, and yes, I believe running the analysis on it is the best way to find a solution. Most of the data is confidential, unfortunately, but it basically consists of dates. About the code, there isn't much more, actually: just printing the selected rows. And the size is described in the question. Hope it helps ^^
    – Helena Martins
    11 hours ago















python python-3.x sql






asked 11 hours ago









Helena Martins
