Python and SQL - Query takes too long to execute [on hold]
up vote
-1
down vote
favorite
I've recently began to work with Database queries when I was asked to develop a program that would have read data from the last 1 month in a Firebird DB with almost 100M rows.
After stumbling a little bit, I finally managed to make the code work, using Python (and, more specifically, Pandas library), but the code takes more than 8 hours just to filter the data and has to be executed everyday.
The rest of the code runs really quickly, since I just need around the 3000 last rows of the dataset.
So far, my function responsible to execute the query is:
def read_query(access):
start_time = time.time()
conn = pyodbc.connect(access)
df = pd.read_sql_query(r"SELECT * from TABLE where DAY >= DATEADD(MONTH,-1, CURRENT_TIMESTAMP(2)) AND DAY <= 'TODAY'", conn)
Is there any better (and quicker) approach to filter this data? Maybe reading each row from DB alone, starting by its bottom? And if so, what could I do to optimize my code run time?
python python-3.x sql
New contributor
put on hold as off-topic by Dannnno, 200_success, Gerrit0, яүυк, Mast 7 hours ago
This question appears to be off-topic. The users who voted to close gave this specific reason:
- "Lacks concrete context: Code Review requires concrete code from a project, with sufficient context for reviewers to understand how that code is used. Pseudocode, stub code, hypothetical code, obfuscated code, and generic best practices are outside the scope of this site." – Dannnno, Gerrit0, яүυк, Mast
If this question can be reworded to fit the rules in the help center, please edit the question.
add a comment |
up vote
-1
down vote
favorite
I've recently began to work with Database queries when I was asked to develop a program that would have read data from the last 1 month in a Firebird DB with almost 100M rows.
After stumbling a little bit, I finally managed to make the code work, using Python (and, more specifically, Pandas library), but the code takes more than 8 hours just to filter the data and has to be executed everyday.
The rest of the code runs really quickly, since I just need around the 3000 last rows of the dataset.
So far, my function responsible to execute the query is:
def read_query(access):
start_time = time.time()
conn = pyodbc.connect(access)
df = pd.read_sql_query(r"SELECT * from TABLE where DAY >= DATEADD(MONTH,-1, CURRENT_TIMESTAMP(2)) AND DAY <= 'TODAY'", conn)
Is there any better (and quicker) approach to filter this data? Maybe reading each row from DB alone, starting by its bottom? And if so, what could I do to optimize my code run time?
python python-3.x sql
New contributor
put on hold as off-topic by Dannnno, 200_success, Gerrit0, яүυк, Mast 7 hours ago
This question appears to be off-topic. The users who voted to close gave this specific reason:
- "Lacks concrete context: Code Review requires concrete code from a project, with sufficient context for reviewers to understand how that code is used. Pseudocode, stub code, hypothetical code, obfuscated code, and generic best practices are outside the scope of this site." – Dannnno, Gerrit0, яүυк, Mast
If this question can be reworded to fit the rules in the help center, please edit the question.
3
Hello! Some questions: How big is your table? How does the data look like? It's the table indexed in any way? Where's the rest of the code? How can I run it? Is using FireDB a requirement? There's astart_time = time.time()
but where does it end? Why did you chose it? Summary: please add more context :)
– яүυк
11 hours ago
Unfortunately, it was a choice from my company's clients to use Firebase and yes, I believe running the analysis on it is the best way to find a solution. And most of the data is sigilous, unfortunately, but it basicaly consists in dates. About thecode, there isn't much more, actually: just printing the selected rows. And the size is described on the question. Hope it helps ^^
– Helena Martins
11 hours ago
add a comment |
up vote
-1
down vote
favorite
up vote
-1
down vote
favorite
I've recently began to work with Database queries when I was asked to develop a program that would have read data from the last 1 month in a Firebird DB with almost 100M rows.
After stumbling a little bit, I finally managed to make the code work, using Python (and, more specifically, Pandas library), but the code takes more than 8 hours just to filter the data and has to be executed everyday.
The rest of the code runs really quickly, since I just need around the 3000 last rows of the dataset.
So far, my function responsible to execute the query is:
def read_query(access):
start_time = time.time()
conn = pyodbc.connect(access)
df = pd.read_sql_query(r"SELECT * from TABLE where DAY >= DATEADD(MONTH,-1, CURRENT_TIMESTAMP(2)) AND DAY <= 'TODAY'", conn)
Is there any better (and quicker) approach to filter this data? Maybe reading each row from DB alone, starting by its bottom? And if so, what could I do to optimize my code run time?
python python-3.x sql
New contributor
I've recently began to work with Database queries when I was asked to develop a program that would have read data from the last 1 month in a Firebird DB with almost 100M rows.
After stumbling a little bit, I finally managed to make the code work, using Python (and, more specifically, Pandas library), but the code takes more than 8 hours just to filter the data and has to be executed everyday.
The rest of the code runs really quickly, since I just need around the 3000 last rows of the dataset.
So far, my function responsible to execute the query is:
def read_query(access):
start_time = time.time()
conn = pyodbc.connect(access)
df = pd.read_sql_query(r"SELECT * from TABLE where DAY >= DATEADD(MONTH,-1, CURRENT_TIMESTAMP(2)) AND DAY <= 'TODAY'", conn)
Is there any better (and quicker) approach to filter this data? Maybe reading each row from DB alone, starting by its bottom? And if so, what could I do to optimize my code run time?
python python-3.x sql
python python-3.x sql
New contributor
New contributor
edited 11 hours ago
New contributor
asked 11 hours ago
Helena Martins
12
12
New contributor
New contributor
put on hold as off-topic by Dannnno, 200_success, Gerrit0, яүυк, Mast 7 hours ago
This question appears to be off-topic. The users who voted to close gave this specific reason:
- "Lacks concrete context: Code Review requires concrete code from a project, with sufficient context for reviewers to understand how that code is used. Pseudocode, stub code, hypothetical code, obfuscated code, and generic best practices are outside the scope of this site." – Dannnno, Gerrit0, яүυк, Mast
If this question can be reworded to fit the rules in the help center, please edit the question.
put on hold as off-topic by Dannnno, 200_success, Gerrit0, яүυк, Mast 7 hours ago
This question appears to be off-topic. The users who voted to close gave this specific reason:
- "Lacks concrete context: Code Review requires concrete code from a project, with sufficient context for reviewers to understand how that code is used. Pseudocode, stub code, hypothetical code, obfuscated code, and generic best practices are outside the scope of this site." – Dannnno, Gerrit0, яүυк, Mast
If this question can be reworded to fit the rules in the help center, please edit the question.
3
Hello! Some questions: How big is your table? How does the data look like? It's the table indexed in any way? Where's the rest of the code? How can I run it? Is using FireDB a requirement? There's astart_time = time.time()
but where does it end? Why did you chose it? Summary: please add more context :)
– яүυк
11 hours ago
Unfortunately, it was a choice from my company's clients to use Firebase and yes, I believe running the analysis on it is the best way to find a solution. And most of the data is sigilous, unfortunately, but it basicaly consists in dates. About thecode, there isn't much more, actually: just printing the selected rows. And the size is described on the question. Hope it helps ^^
– Helena Martins
11 hours ago
add a comment |
3
Hello! Some questions: How big is your table? How does the data look like? It's the table indexed in any way? Where's the rest of the code? How can I run it? Is using FireDB a requirement? There's astart_time = time.time()
but where does it end? Why did you chose it? Summary: please add more context :)
– яүυк
11 hours ago
Unfortunately, it was a choice from my company's clients to use Firebase and yes, I believe running the analysis on it is the best way to find a solution. And most of the data is sigilous, unfortunately, but it basicaly consists in dates. About thecode, there isn't much more, actually: just printing the selected rows. And the size is described on the question. Hope it helps ^^
– Helena Martins
11 hours ago
3
3
Hello! Some questions: How big is your table? How does the data look like? It's the table indexed in any way? Where's the rest of the code? How can I run it? Is using FireDB a requirement? There's a
start_time = time.time()
but where does it end? Why did you chose it? Summary: please add more context :)– яүυк
11 hours ago
Hello! Some questions: How big is your table? How does the data look like? It's the table indexed in any way? Where's the rest of the code? How can I run it? Is using FireDB a requirement? There's a
start_time = time.time()
but where does it end? Why did you chose it? Summary: please add more context :)– яүυк
11 hours ago
Unfortunately, it was a choice from my company's clients to use Firebase and yes, I believe running the analysis on it is the best way to find a solution. And most of the data is sigilous, unfortunately, but it basicaly consists in dates. About thecode, there isn't much more, actually: just printing the selected rows. And the size is described on the question. Hope it helps ^^
– Helena Martins
11 hours ago
Unfortunately, it was a choice from my company's clients to use Firebase and yes, I believe running the analysis on it is the best way to find a solution. And most of the data is sigilous, unfortunately, but it basicaly consists in dates. About thecode, there isn't much more, actually: just printing the selected rows. And the size is described on the question. Hope it helps ^^
– Helena Martins
11 hours ago
add a comment |
active
oldest
votes
active
oldest
votes
active
oldest
votes
active
oldest
votes
active
oldest
votes
3
Hello! Some questions: How big is your table? How does the data look like? It's the table indexed in any way? Where's the rest of the code? How can I run it? Is using FireDB a requirement? There's a
start_time = time.time()
but where does it end? Why did you chose it? Summary: please add more context :)– яүυк
11 hours ago
Unfortunately, it was a choice from my company's clients to use Firebase and yes, I believe running the analysis on it is the best way to find a solution. And most of the data is sigilous, unfortunately, but it basicaly consists in dates. About thecode, there isn't much more, actually: just printing the selected rows. And the size is described on the question. Hope it helps ^^
– Helena Martins
11 hours ago