Python Pandas exploratory data analysis - output in Windows [on hold]























My goal is to generate exploratory data analysis (EDA) reports for new datasets.



Step 1 - Python script



To accomplish this, I have written a Python function, embedded in a script that (1) loads pandas, (2) loads a dataset, and (3) calls the function on that dataset.



import pandas as pd


def EDA(df, name):
    '''
    Print summary statistics and value counts for every column of df.

    name == string version of df's name
    provides count, unique
    to use:
    1. run cmd as administrator
    2. in cmd run
        chcp 65001                  # enables utf-8 to display in the cmd console
        set PYTHONIOENCODING=utf-8  # enables utf-8 output
    3. python scriptpath.py > Output/dfnameEDA.txt
    '''
    # use name directly: assigning df.name would overwrite a column called 'name'
    print('#{}\n'.format(name))
    for col in df.columns:
        print('#{}\n'.format(col))
        print(df[col].describe())
        print('\n')
        # dropna=False keeps missing values in the counts
        print(df[col].value_counts(dropna=False))
        print('\n')


path = 'Data/active.csv'
active = pd.read_csv(path, encoding='utf-8')

# runs for 'active' dataset. must change for each dataset in directory
if __name__ == '__main__':
    EDA(active, name='active')


Step 2 - Administrator CMD commands



There are two parts here: (1) change the output encoding to UTF-8, and (2) run the script and save the output as a .txt file.



My data come from a global database and often contain characters that the default codec cannot encode.



chcp 65001
set PYTHONIOENCODING=utf-8

python Syntax\nameofscript.py > Output\datasetEDA.txt
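
As a hedged aside on the encoding step: when stdout is redirected to a file, Python on Windows typically falls back to the locale's code page unless PYTHONIOENCODING is set, which would explain the error. One way to sidestep this is to open the output file from inside the script with an explicit encoding. The sketch below uses only the standard library; `run_to_file` is a hypothetical helper name, not something from the original script.

```python
import contextlib


def run_to_file(func, out_path, *args, **kwargs):
    """Run func while everything it prints goes to a UTF-8 text file."""
    # The file's encoding is fixed here, so it no longer depends on
    # the console code page or on PYTHONIOENCODING.
    with open(out_path, "w", encoding="utf-8") as out:
        with contextlib.redirect_stdout(out):
            return func(*args, **kwargs)
```

With this in place, a call such as `run_to_file(EDA, 'Output/activeEDA.txt', active, name='active')` should write the report without the two cmd settings. Displaying the same characters in the console is a separate matter and may still need `chcp 65001`.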


Issues




  1. I have to create a customized script for each dataset.

  2. I have to run the cmd commands for each dataset.

  3. The output is one .txt file per dataset.
    Ideally I could generate a .doc file, or at least a single .txt file, for all datasets in the directory.
    The reason is that I would like to expand the function to include histograms and other graphics, which will not work in the .txt format.
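
On the third issue, one format that can hold both tables and, later, images is HTML. The following is only a sketch under that assumption, not a .doc generator: `column_section` and `html_report` are hypothetical names, and the tables come from pandas' own `to_html` method. Graphics are not shown here; they could be added afterwards as `<img>` tags.

```python
import html

import pandas as pd


def column_section(series):
    """Return an HTML fragment summarising one column."""
    title = html.escape(str(series.name))
    summary = series.describe().to_frame().to_html()
    # dropna=False keeps missing values in the counts, as in EDA()
    counts = series.value_counts(dropna=False).to_frame().to_html()
    return "<h3>{}</h3>\n{}\n{}\n".format(title, summary, counts)


def html_report(df, name):
    """Return a complete HTML page describing every column of df."""
    parts = [
        "<html><head><meta charset='utf-8'></head><body>",
        "<h2>{}</h2>".format(html.escape(name)),
    ]
    parts.extend(column_section(df[col]) for col in df.columns)
    parts.append("</body></html>")
    return "\n".join(parts)
```

The returned string can be written with `open(path, 'w', encoding='utf-8')`, and the resulting file opens in any browser.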


Overall, it seems like I shouldn't need so many repetitive steps to get this to work. Ultimately, I would like to run a single command in a directory to create a single EDA report for all data files in that directory.
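
A minimal sketch of what such a single command could look like, assuming every dataset is a UTF-8 CSV file in one folder. `eda_text` and `report_directory` are hypothetical names; the per-column logic mirrors the `EDA` function above but returns a string instead of printing.

```python
from pathlib import Path

import pandas as pd


def eda_text(df, name):
    """Return the per-column summary of df as a single string."""
    lines = ["# {}\n".format(name)]
    for col in df.columns:
        lines.append("## {}\n".format(col))
        lines.append(df[col].describe().to_string())
        lines.append("")
        lines.append(df[col].value_counts(dropna=False).to_string())
        lines.append("")
    return "\n".join(lines)


def report_directory(data_dir, out_path):
    """Write one UTF-8 report covering every CSV file in data_dir."""
    csv_paths = sorted(Path(data_dir).glob("*.csv"))
    with open(out_path, "w", encoding="utf-8") as out:
        for csv_path in csv_paths:
            df = pd.read_csv(csv_path, encoding="utf-8")
            # The file stem (e.g. 'active') doubles as the dataset name.
            out.write(eda_text(df, csv_path.stem))
            out.write("\n")
    return [p.stem for p in csv_paths]
```

Calling `report_directory('Data', 'Output/EDA_report.txt')` from a script's `__main__` block would then cover the whole folder in one run, provided the `Output` folder already exists. The report file name here is an assumption.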





















put on hold as off-topic by 200_success, Quill, Toby Speight, Graipher, t3chb0t yesterday


This question appears to be off-topic. The users who voted to close gave this specific reason:


  • "Code not implemented or not working as intended: Code Review is a community where programmers peer-review your working code to address issues such as security, maintainability, performance, and scalability. We require that the code be working correctly, to the best of the author's knowledge, before proceeding with a review." – 200_success, Quill

If this question can be reworded to fit the rules in the help center, please edit the question.









  • 2




    Could you please expand (for the ignorant such as myself) what you mean by "EDA"? I'm guessing it's not European Dairy Association, but beyond that, it's hard to guess what is meant (perhaps it's standard jargon in your field, but that doesn't help reviewers not immersed in your world!).
    – Toby Speight
    yesterday






  • 1




    Also, what is chcp (I have no idea, since I am on a UNIX system) and why do you need to set PYTHONIOENCODING if it is used nowhere in your script (and you even explicitly set the encoding again when reading the file)?
    – Graipher
    yesterday










  • EDA means exploratory data analysis. It's the step of getting to know a dataset before creating new variables and doing inferential statistics.
    – Andrew
    yesterday










  • chcp is a command that allows UTF-8 to display in the console. PYTHONIOENCODING is necessary for Python to use UTF-8 when writing to the output file. The call inside the .py script makes sure that it reads UTF-8 when loading the dataset into memory, but the program will throw an error when writing to the .txt file unless I also set the IO encoding.
    – Andrew
    yesterday










  • First I import the dataset and give it a name, then enter that name into the function. This could probably be automated. Next, in order to save the output to a .txt file (which is my real goal here; I don't just want it to print to the console), I have to run the script from cmd using the `python script.py > output.txt` format.
    – Andrew
    yesterday


























































asked 2 days ago by Andrew, edited yesterday



































