Python and SQL - Query takes too long to execute [on hold]

I recently began working with database queries when I was asked to develop a program that reads the last month of data from a Firebird DB with almost 100M rows.
After stumbling a bit, I finally got the code working, using Python (more specifically, the Pandas library), but it takes more than 8 hours just to filter the data and has to be executed every day.
The rest of the code runs quickly, since I only need roughly the last 3000 rows of the dataset.
So far, the function responsible for executing the query is:
import time
import pandas as pd
import pyodbc

def read_query(access):
    start_time = time.time()
    conn = pyodbc.connect(access)
    df = pd.read_sql_query("SELECT * FROM TABLE WHERE DAY >= DATEADD(MONTH, -1, CURRENT_TIMESTAMP(2)) AND DAY <= 'TODAY'", conn)
    return df
Is there a better (and quicker) approach to filtering this data? Maybe reading rows from the DB one at a time, starting from the bottom? If so, what can I do to optimize the run time?
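Since only the last ~3000 rows are used, one likely faster approach is to let the server limit the result instead of pulling a month of a ~100M-row table into pandas. A sketch, assuming the TABLE/DAY names from the question, an index on DAY, and Firebird 2.1+ (for DATEADD and the ROWS clause); build_recent_sql is a hypothetical helper:

```python
def build_recent_sql(limit=3000):
    # Push the filtering and the row cap into Firebird: order by the
    # (ideally indexed) DAY column descending, then keep only the
    # newest `limit` rows with Firebird's ROWS clause.
    return ("SELECT * FROM TABLE "
            "WHERE DAY >= DATEADD(MONTH, -1, CURRENT_TIMESTAMP) "
            "ORDER BY DAY DESC "
            f"ROWS {int(limit)}")

# Usage with the question's setup (needs pandas, pyodbc, a live DB):
# df = pd.read_sql_query(build_recent_sql(), pyodbc.connect(access))
```

Naming only the columns actually used instead of SELECT * would cut the transferred data further.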
python python-3.x sql
New contributor
Helena Martins is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
Check out our Code of Conduct.
put on hold as off-topic by Dannnno, 200_success, Gerrit0, яүυк, Mast 7 hours ago
This question appears to be off-topic. The users who voted to close gave this specific reason:
- "Lacks concrete context: Code Review requires concrete code from a project, with sufficient context for reviewers to understand how that code is used. Pseudocode, stub code, hypothetical code, obfuscated code, and generic best practices are outside the scope of this site." – Dannnno, Gerrit0, яүυк, Mast
If this question can be reworded to fit the rules in the help center, please edit the question.
3
Hello! Some questions: How big is your table? What does the data look like? Is the table indexed in any way? Where's the rest of the code? How can I run it? Is using Firebird a requirement? There's a start_time = time.time() but where does it end? Why did you choose it? Summary: please add more context :)
– яүυк
11 hours ago
Unfortunately, it was my company's clients' choice to use Firebird, and yes, I believe running the analysis on it is the best way to find a solution. Most of the data is confidential, unfortunately, but it basically consists of dates. About the code, there isn't much more, actually: just printing the selected rows. And the size is described in the question. Hope it helps ^^
– Helena Martins
11 hours ago
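On the commenter's point that start_time is started but never used: a minimal sketch of how the measurement could be closed (timed is a hypothetical helper, not from the question):

```python
import time

def timed(fn, *args):
    # Record the clock before and after the call and report the gap,
    # answering "where does start_time end?".
    start_time = time.time()
    result = fn(*args)
    elapsed = time.time() - start_time
    print(f"{fn.__name__} took {elapsed:.2f}s")
    return result, elapsed
```

For example, timed(read_query, access) would print how long the query itself takes, separating it from the rest of the program.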
edited 11 hours ago
asked 11 hours ago


Helena Martins
12