Python Pandas exploratory data analysis - output in Windows [on hold]
My goal is to generate EDA reports for new datasets.
Step 1 - python script
To accomplish this, I have written a Python function embedded in a script that (1) loads pandas, (2) loads a dataset, and (3) calls the function on that dataset.
import pandas as pd


def EDA(df, name):
    '''
    name == string version of df's name
    provides count, unique
    to use:
    1. run cmd as administrator
    2. in cmd run
       chcp 65001  # enables utf-8 to display in the cmd console
       set PYTHONIOENCODING=utf-8  # enables utf-8 output
    3. python scriptpath.py > Output/dfnameEDA.txt
    '''
    df.name = name
    print('#{}\n'.format(df.name))
    for col in df.columns:
        print('#{}\n'.format(col))
        print(df[col].describe())
        print('\n')
        print(df[col].value_counts(dropna=False))
        print('\n')


path = 'Data/active.csv'
active = pd.read_csv(path, encoding='utf-8')

# runs for 'active' dataset. must change for each dataset in directory
if __name__ == '__main__':
    EDA(active, name='active')
Step 2 - Administrator CMD commands
There are two parts here: (1) change the output encoding to UTF-8, and (2) run the script and save the output as a .txt file.
My data come from a global database and often contain characters that the default codec cannot handle.
chcp 65001
set PYTHONIOENCODING=utf-8
python Syntax\nameofscript.py > Output\datasetEDA.txt
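One possible way to avoid the console settings altogether is to let Python open the output file with an explicit encoding and redirect `print` into it. The following is a minimal sketch, not the script above: `save_report` and `demo` are hypothetical names, and `demo` merely stands in for a print-based function such as `EDA`.

```python
from contextlib import redirect_stdout


def save_report(func, out_path, *args, **kwargs):
    """Run func(*args, **kwargs) and write everything it prints to out_path as UTF-8."""
    # The encoding is fixed here, so the file no longer depends on
    # chcp or PYTHONIOENCODING being set in the console.
    with open(out_path, 'w', encoding='utf-8') as fh:
        with redirect_stdout(fh):
            func(*args, **kwargs)


def demo(name):
    # Hypothetical stand-in for a print-based function such as EDA.
    print('#{}\n'.format(name))
    print('caf\u00e9, \u6771\u4eac')  # non-ASCII text, written as escapes
```

With the script's own function, this might be called as `save_report(EDA, 'Output/activeEDA.txt', active, name='active')`, assuming the `Output` directory already exists.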
Issues
- I have to create a customized script for each dataset
- I have to run the cmd commands for each dataset
- the output is one .txt file per dataset
Ideally I could generate a .doc file or at least a single .txt file for all datasets in the directory.
The reason for this is that I would like to expand the function to include histograms and other graphics that will not work with the .txt format.
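Since plain text cannot hold graphics, one option (an assumption here, not something the question commits to) is an HTML report with images embedded as base64 data URIs, which keeps everything in a single file. The sketch below uses only the standard library; `html_section` and `html_report` are hypothetical helpers, and the PNG bytes are assumed to come from whatever plotting library is used.

```python
import base64
import html


def html_section(title, text, png_bytes=None):
    """Return one report section: a heading, preformatted text and an optional image."""
    parts = ['<h2>{}</h2>'.format(html.escape(title)),
             '<pre>{}</pre>'.format(html.escape(text))]
    if png_bytes is not None:
        # Embed the image in the page itself so the report stays a single file.
        encoded = base64.b64encode(png_bytes).decode('ascii')
        parts.append('<img src="data:image/png;base64,{}" alt="{}">'.format(
            encoded, html.escape(title)))
    return '\n'.join(parts)


def html_report(sections):
    """Wrap already rendered sections in a minimal UTF-8 HTML document."""
    body = '\n'.join(sections)
    return ('<!DOCTYPE html>\n<html><head><meta charset="utf-8">'
            '<title>EDA report</title></head>\n<body>\n' + body +
            '\n</body></html>')
```

With matplotlib, for instance, the bytes could be obtained by saving a figure to an `io.BytesIO` buffer with `savefig(buf, format='png')`; the text part could be the string form of `describe()` for a column.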
Overall, it just seems like I shouldn't be doing so many repetitive steps to get this to work. I would ultimately like to run a single command in a directory to create a single EDA report for all data files in that directory.
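The single-command goal could be sketched roughly as follows. This is only an illustration under assumptions: the datasets are taken to be `.csv` files in one directory, and `summarize_csv` and `write_report` are hypothetical names, not part of the script above. pandas is imported inside `summarize_csv`, so the directory walk can be tried with any other summary function.

```python
from pathlib import Path


def summarize_csv(path):
    """Return describe() and value_counts() for every column of one CSV file as text."""
    import pandas as pd  # imported lazily; only this helper needs pandas

    df = pd.read_csv(path, encoding='utf-8')
    parts = []
    for col in df.columns:
        parts.append('#{}'.format(col))
        parts.append(df[col].describe().to_string())
        parts.append(df[col].value_counts(dropna=False).to_string())
    return '\n\n'.join(parts)


def write_report(data_dir, out_path, summarize=summarize_csv):
    """Write one UTF-8 report with a section for each *.csv file in data_dir."""
    files = sorted(Path(data_dir).glob('*.csv'))
    with open(out_path, 'w', encoding='utf-8') as fh:
        for path in files:
            # The file name without its extension doubles as the dataset name.
            fh.write('#{}\n\n'.format(path.stem))
            fh.write(summarize(path))
            fh.write('\n\n')
    # Return the processed file names so the caller can check what was covered.
    return [path.name for path in files]
```

Calling `write_report('Data', 'Output/EDA.txt')` from a small script would then cover every CSV file in `Data` in one run; the paths here simply mirror the ones used in the question.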
python
put on hold as off-topic by 200_success, Quill, Toby Speight, Graipher, t3chb0t yesterday
This question appears to be off-topic. The users who voted to close gave this specific reason:
- "Code not implemented or not working as intended: Code Review is a community where programmers peer-review your working code to address issues such as security, maintainability, performance, and scalability. We require that the code be working correctly, to the best of the author's knowledge, before proceeding with a review." – 200_success, Quill
If this question can be reworded to fit the rules in the help center, please edit the question.
Could you please expand (for the ignorant such as myself) what you mean by "EDA"? I'm guessing it's not European Dairy Association, but beyond that, it's hard to guess what is meant (perhaps it's standard jargon in your field, but that doesn't help reviewers not immersed in your world!).
– Toby Speight
yesterday
Also, what is chcp (I have no idea, since I am on a UNIX system), and why do you need to set PYTHONIOENCODING if it is used nowhere in your script (and you even explicitly set the encoding again when reading the file)?
– Graipher
yesterday
EDA means exploratory data analysis. It's the step of getting to know a dataset before creating new variables and doing inferential statistics.
– Andrew
yesterday
chcp is a command that lets the console display UTF-8. PYTHONIOENCODING is necessary for Python to use UTF-8 when writing to the output file. The call inside the .py script makes sure that it reads UTF-8 when loading the dataset into memory, but the program will throw an error when writing to the .txt file unless I also set the I/O encoding.
– Andrew
yesterday
First I import the dataset and give it a name, then pass that name to the function; this could probably be automated. Next, in order to save the output to a .txt file (which is my real goal here; I don't just want it to print to the console), I have to run the script from cmd using the `python script.py > output.txt` format.
– Andrew
yesterday
asked 2 days ago by Andrew (new contributor), edited yesterday