Taking text from a file and formatting it

My code reads numbers from a large text file, splits each line to normalise the spacing, and places the values into a 2-dimensional array. The code is used to get data for a job scheduler that I'm building.

# reading in workload data
def getworkload():
    work = []
    strings = []
    with open("workload.txt") as f:
        read_data = f.read()
        jobs = read_data.split("\n")
    for j in jobs:
        # collapse runs of whitespace into single spaces
        strings.append(" ".join(j.split()))
    for i in strings:
        work.append([float(s) for s in i.split(" ")])
    return work

print(getworkload())

The text file is over 2000 lines long, and looks like this:

    1        0 1835117 330855  640   5886   945   -1     -1    -1  5   2   1   4  9 -1 -1 -1
    2        0 2265800 251924  640   3124   945   -1     -1    -1  5   2   1   4  9 -1 -1 -1
    3        1 3114175     -1  640     -1   945   -1     -1    -1  5   2   1   4  9 -1 -1 -1
    4  1813487   7481     -1  128     -1 20250   -1     -1    -1  5   3   1   5  8 -1 -1 -1
    5  1814044      0    122  512   1.13  1181   -1     -1    -1  1   1   1   1  9 -1 -1 -1
    6  1814374      1     51  512     -1  1181   -1     -1    -1  1   1   1   2  9 -1 -1 -1
    7  1814511      0     55  512     -1  1181   -1     -1    -1  1   1   1   2  9 -1 -1 -1
    8  1814695      1     51  512     -1  1181   -1     -1    -1  1   1   1   2  9 -1 -1 -1
    9  1815198      0     75  512   2.14  1181   -1     -1    -1  1   1   1   2  9 -1 -1 -1
   10  1815617      0    115  512   1.87  1181   -1     -1    -1  1   1   1   1  9 -1 -1 -1
    …

It takes 2 and a half minutes to run but I can print the returned data. How can it be optimised?

asked Nov 12 at 10:59 by timtti, edited Nov 13 at 7:15 by 200_success

Welcome to Code Review. I'm afraid this question does not match what this site is about. Code Review is about improving existing, working code. If you're having trouble getting something working, or are asking for features, you'd be better off asking on Stack Overflow (the main site).
– Calak, Nov 12 at 11:07

The code works, as I can print work_row without any problems and I know that work will be a two-dimensional array/list. I just believe it can be sped up.
– timtti, Nov 12 at 11:11

"If I try to print work the text is too long and I get an overflow error": to me, that sounds like you have a problem. Try to reformulate your question to remove this doubt.
– Calak, Nov 12 at 11:26

Tags: python, performance, csv, formatting


1 Answer (accepted, score 1)

You are doing a lot of unnecessary work. Why split each row only to join it with single spaces and then split it again by those single spaces?
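A quick check of that redundancy can be sketched as follows. The sample line is a shortened copy of one row from the question, and the variable names are only illustrative:

```python
# One row in the question's format, shortened to eight columns for brevity.
line = "    5  1814044      0    122  512   1.13  1181   -1\n"

# The question's approach: split on whitespace, re-join with single
# spaces, then split again on those single spaces.
round_trip = " ".join(line.split()).split(" ")

# str.split() with no argument already splits on runs of whitespace and
# ignores leading and trailing whitespace, including the newline.
direct = line.split()

print(round_trip == direct)
```

For any non-blank line, both expressions give the same list of strings, so the join and the second split only add work.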

Instead, here is a list comprehension that should do the same thing:

def get_workload(file_name="workload.txt"):
    with open(file_name) as f:
        return [[float(x) for x in row.split()] for row in f]

This uses the fact that files are iterable and when iterating over them you get each row on its own.
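As a small usage sketch, the function can be tried on a temporary file holding two shortened rows from the question. The temporary file and the `sample` text are assumptions made only for this demonstration; the real data would come from workload.txt:

```python
import os
import tempfile


def get_workload(file_name="workload.txt"):
    # Iterating over the file object yields one line at a time.
    with open(file_name) as f:
        return [[float(x) for x in row.split()] for row in f]


# Two rows in the question's format, shortened to six columns.
sample = (
    "    1        0 1835117 330855  640   5886\n"
    "    5  1814044       0    122  512   1.13\n"
)

# Write the sample to a temporary file so the sketch is self-contained.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write(sample)
    path = tmp.name

try:
    work = get_workload(path)
finally:
    os.remove(path)

print(len(work), work[1][5])
```

Note that a completely blank line in the file would become an empty inner list here, since splitting it yields no fields.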

If this is still too slow, or the data is too large to fit into memory, then you need to process each row separately. To do that, turn the function into a generator of processed rows:

def get_workload(file_name="workload.txt"):
    with open(file_name) as f:
        for row in f:
            yield [float(x) for x in row.split()]
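A minimal sketch of consuming the generator lazily, assuming a small temporary file with made-up three-column rows (the real file has more columns) and using `itertools.islice` to take only the first rows:

```python
import os
import tempfile
from itertools import islice


def get_workload(file_name="workload.txt"):
    with open(file_name) as f:
        for row in f:
            yield [float(x) for x in row.split()]


# Five made-up rows, purely for illustration.
rows_text = "".join(f"{i}   {i * 10}  -1\n" for i in range(1, 6))

with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write(rows_text)
    path = tmp.name

try:
    jobs = get_workload(path)
    # Only the first two lines are read and converted here.
    first_two = list(islice(jobs, 2))
    # Closing the generator exits its `with` block and closes the file.
    jobs.close()
finally:
    os.remove(path)

print(first_two)
```

One consequence of this design is that the file stays open for as long as the generator is being consumed, so it should either be exhausted or closed explicitly.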

answered Nov 12 at 15:51 by Graipher, edited Nov 15 at 16:13
