Python Pandas exploratory data analysis - output in Windows [on hold]

My goal is to generate exploratory data analysis (EDA) reports for new datasets.

Step 1 - Python script

To accomplish this, I have written a Python function, embedded in a script that (1) imports pandas, (2) loads a dataset, and (3) calls the function on that dataset.

import pandas as pd


def EDA(df, name):
    '''
    Print summary statistics and value counts for every column of df.

    name == string version of df's name
    provides count, unique

        to use:
        1. run cmd as administrator
        2. in cmd run
            chcp 65001                   # enables utf-8 to display in the cmd console
            set PYTHONIOENCODING=utf-8   # enables utf-8 output
        3. python scriptpath.py > Output/dfnameEDA.txt
    '''
    # label the report with the dataset name
    df.name = name
    print('#{}\n'.format(df.name))
    for col in df.columns:
        # one block per column: header, summary statistics, value counts
        print('#{}\n'.format(col))
        print(df[col].describe())
        print('\n')
        print(df[col].value_counts(dropna=False))
        print('\n')


path = 'Data/active.csv'
active = pd.read_csv(path, encoding='utf-8')

# runs for 'active' dataset. must change for each dataset in directory
if __name__ == '__main__':
    EDA(active, name='active')

Step 2 - Administrator CMD commands

There are two parts here: (1) change the output encoding to UTF-8, and (2) run the script and save its output as a txt file.

My data come from a global database and often contain characters that the default codec cannot encode.

chcp 65001
set PYTHONIOENCODING=utf-8

python Syntax\nameofscript.py > Output\datasetEDA.txt
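One possible way to avoid depending on the console code page and on `PYTHONIOENCODING` is to let the script open the output file itself with an explicit encoding, instead of redirecting with `>`. The following is only a minimal sketch under that assumption; `save_printed_output` is a hypothetical helper name, not part of the original script.

```python
import contextlib


def save_printed_output(func, out_path, *args, **kwargs):
    """Run func(*args, **kwargs) and write everything it prints to out_path as UTF-8."""
    # The file is opened with an explicit encoding, so the console settings
    # do not affect how the text is written.
    with open(out_path, "w", encoding="utf-8") as fh:
        # While this block is active, print() writes to fh instead of the console.
        with contextlib.redirect_stdout(fh):
            func(*args, **kwargs)
```

With the function from Step 1, a call such as `save_printed_output(EDA, 'Output/activeEDA.txt', active, name='active')` should produce the same text as the redirected command, provided the `Output` directory already exists. On Python 3.7 or later, calling `sys.stdout.reconfigure(encoding='utf-8')` at the top of the script may be another option if the `>` redirection is kept.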

Issues

I have to create a customized script for each dataset.

I have to run the cmd commands for each dataset.

The output is one txt file per dataset.
Ideally I could generate a .doc file, or at least a single .txt file, for all datasets in the directory.
The reason is that I would like to expand the function to include histograms and other graphics, which will not work with the .txt format.
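Since plain text cannot hold graphics, one format that can is a single HTML page with the images embedded inline. The sketch below assumes HTML is an acceptable substitute for a .doc file, which the question does not confirm; `html_section` and `write_html_report` are hypothetical names, and the image is passed in as raw PNG bytes so the example itself needs only the standard library.

```python
import base64
import html


def html_section(title, text, png_bytes=None):
    """Return one HTML fragment: a heading, preformatted text, and an optional inline PNG."""
    parts = [
        "<h2>{}</h2>".format(html.escape(title)),
        "<pre>{}</pre>".format(html.escape(text)),
    ]
    if png_bytes is not None:
        # Embed the image as a base64 data URI so the report stays a single file.
        encoded = base64.b64encode(png_bytes).decode("ascii")
        parts.append('<img alt="{}" src="data:image/png;base64,{}">'.format(
            html.escape(title), encoded))
    return "\n".join(parts)


def write_html_report(out_path, sections):
    """Write a list of HTML fragments into one UTF-8 encoded page."""
    page = (
        "<!DOCTYPE html>\n"
        '<html><head><meta charset="utf-8"><title>EDA report</title></head>\n'
        "<body>\n{}\n</body></html>\n"
    ).format("\n".join(sections))
    with open(out_path, "w", encoding="utf-8") as fh:
        fh.write(page)
```

In this setup the text for a column could be `str(df[col].describe())`, and the PNG bytes could come from saving a matplotlib figure into an `io.BytesIO` buffer with `fig.savefig(buf, format="png")`; both are left out here to keep the sketch free of third-party imports.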

Overall, it seems like I shouldn't need so many repetitive steps to get this to work. Ultimately, I would like to run a single command in a directory to create a single EDA report for all data files in that directory.
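A rough sketch of that single-command workflow might loop over every CSV file in a directory, derive each dataset name from its file name, and write all sections into one UTF-8 file. Everything here is an assumption for illustration: `report_directory` and `pandas_report` are hypothetical names, and the per-file report simply mirrors the `EDA` function above with `print(..., file=out)`.

```python
from pathlib import Path


def report_directory(data_dir, out_path, report_one, pattern="*.csv"):
    """Call report_one(csv_path, name, out) for each matching file, sharing one output file."""
    files = sorted(Path(data_dir).glob(pattern))
    with open(out_path, "w", encoding="utf-8") as out:
        for csv_path in files:
            # The file stem ('active' for 'active.csv') serves as the dataset name.
            report_one(csv_path, csv_path.stem, out)
    return [p.stem for p in files]


def pandas_report(csv_path, name, out):
    """Write describe() and value_counts() for every column of one CSV file."""
    import pandas as pd  # imported lazily so report_directory has no hard dependency

    df = pd.read_csv(csv_path, encoding="utf-8")
    print("#{}\n".format(name), file=out)
    for col in df.columns:
        print("#{}\n".format(col), file=out)
        print(df[col].describe(), file=out)
        print(file=out)
        print(df[col].value_counts(dropna=False), file=out)
        print(file=out)


# Example call (assumes the Data and Output directories exist):
# report_directory("Data", "Output/EDA.txt", pandas_report)
```

Passing the per-file step in as a function keeps the directory loop separate from the pandas code, so the same loop could later drive a report format that includes graphics.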

asked 2 days ago by Andrew (new contributor), edited yesterday

put on hold as off-topic by 200_success, Quill, Toby Speight, Graipher, t3chb0t yesterday

This question appears to be off-topic. The users who voted to close gave this specific reason:

"Code not implemented or not working as intended: Code Review is a community where programmers peer-review your working code to address issues such as security, maintainability, performance, and scalability. We require that the code be working correctly, to the best of the author's knowledge, before proceeding with a review." – 200_success, Quill

If this question can be reworded to fit the rules in the help center, please edit the question.

Could you please expand (for the ignorant such as myself) what you mean by "EDA"? I'm guessing it's not European Dairy Association, but beyond that, it's hard to guess what is meant (perhaps it's standard jargon in your field, but that doesn't help reviewers not immersed in your world!).
– Toby Speight
yesterday

Also, what is chcp (I have no idea, since I am on a UNIX system) and why do you need to set PYTHONIOENCODING if it is used nowhere in your script (and you even explicitly set the encoding again when reading the file)?
– Graipher
yesterday

EDA means exploratory data analysis. It's the step of getting to know a dataset before creating new variables and doing inferential statistics.
– Andrew
yesterday

chcp is a command that allows UTF-8 to display in the console. PYTHONIOENCODING is necessary for Python to use UTF-8 when writing to the output file. The encoding argument inside the .py script makes sure that it reads UTF-8 when loading the dataset into memory, but the program will throw an error when writing to the txt file unless I also set the IO encoding.
– Andrew
yesterday

First I import the dataset and give it a name, then pass that name to the function; this could probably be automated. Next, in order to save the output to a txt file (which is my real goal here, I don't just want it to print to the console), I have to run the script from cmd using the `python script.py > output.txt` format.
– Andrew
yesterday







