Should computer code be included within publications that present numerical results?
Many research papers include numerical results obtained through computation. Most of the time such computations are performed using software that is used by many mathematicians, e.g., Maple, Mathematica, or even C/C++ code. Should such code be included in the body of the published paper?
I've heard arguments from both sides:
- Including such code can greatly decrease the time a referee needs to replicate the results,
- The code can easily be modified by other authors who wish to extend the result,
- The reader does not need to spend time searching the journal website or the Internet for any "auxiliary files" containing the code.
On the other hand,
- Pages of code degrade the aesthetic quality of the publication,
- The author might need to spend additional space explaining the coding decisions made in the algorithms,
- There likely exist (much) better ways to write the same algorithms in the given, or any other, language.
So what is the standard in mathematical research papers that present numerical results, either as a main result or as a side result? Should code be included within the body of the publication, as an auxiliary file, or not at all?
soft-question na.numerical-analysis journals
asked 2 days ago by Flermat (community wiki)
Just provide a github.com link; that should suffice, and it helps ensure reproducibility if your code actually runs. That said, YMMV: some reviewers refuse to "believe" results despite having the code available (it has happened to me more than once!), which makes one wonder what the point was of working hard to release code....
– Suvrit, 2 days ago
@Suvrit Interesting comment. GitHub is indeed a good idea; however, hosting the code on a separate website removes the stand-alone nature of the publication. What if github.com ceases to exist, or the code becomes "no longer available"?
– Flermat, 2 days ago
Whatever you do with the code you produce, I think the paper accompanying the code should explain in detail what the code is doing, so that someone else who is interested in your research and who is somewhat proficient at coding could write the appropriate code themselves and verify your results.
– Sam Hopkins, 2 days ago
@SamHopkins I think you mean "independently confirm", which is stronger than "verify" (simply re-running the original code could "verify" the result, assuming you've eyeballed what the code is doing to achieve its output).
– literature-searcher, 2 days ago
@Flermat Even if GitHub goes away, the code will be findable by a search engine as long as it is anywhere on the internet, so that problem is not really a problem. GitHub has a lateral benefit too: if your code is of wider appeal, somebody may fork the repo, carry the work further, and likely contribute bugfixes. So overall, it is worth putting it there.
– Suvrit, 2 days ago
5 Answers
Answer by Paul Siegel (21 votes, community wiki, answered 2 days ago)
My answer is:
Don't put code in your paper. Do put pseudocode in your paper, version-control your code on GitHub, and add a link to your GitHub repository to your paper.
- The purpose of a paper is to be read; the purpose of code is to be executed by a computer. These purposes should not be mixed, so what belongs in your paper is a readable representation of your code. That is exactly why pseudocode was invented.
- All code intended to be used by more than one person should be version controlled. This balances the two most relevant concerns: the original version of the code is preserved for posterity, but the author retains the ability to update it as bugs or improvements are discovered. (Additionally, the forking mechanism on GitHub allows others to transparently modify your code or apply it to their own ends.)
In fact, I am willing to make a more general argument: mathematicians should version control their papers as well. The reasons are the same: the original version still exists (with a timestamp) so that priority disputes can be settled easily, but the paper can be maintained and updated; no more errata for old papers and textbooks!
The underlying premise of this answer is that maintaining and distributing code is a software engineering problem, and to the extent that mathematicians need to solve it they should follow software engineers' lead. This has two advantages: on the one hand, software engineers have a much more severe version of the problem and will therefore solve it better; on the other hand, as the solutions inevitably change, they will be accompanied by tools and strategies for migrating old code into new frameworks, which is ultimately the best way to ensure that the code survives as long as possible.
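As a minimal illustration of how the "link in the paper, code on GitHub" workflow can support reproducibility, the sketch below (the file name, output layout, and sample value are hypothetical, not taken from any particular project) stamps each computed result with the exact commit that produced it, so the repository link in the paper plus the recorded hash pins down the code:

```python
# Minimal provenance sketch: record the git commit that produced a numerical
# result, so that the repository link in the paper plus this hash identifies
# the exact code. File name and sample value below are placeholders.
import json
import subprocess
from datetime import datetime, timezone

def current_commit() -> str:
    """Return the commit hash of the working directory, or 'unknown'."""
    try:
        return subprocess.check_output(
            ["git", "rev-parse", "HEAD"], text=True
        ).strip()
    except (OSError, subprocess.CalledProcessError):
        return "unknown"

def save_result(name: str, value: float) -> None:
    """Write a result together with the commit hash and a UTC timestamp."""
    record = {
        "name": name,
        "value": value,
        "commit": current_commit(),
        "generated": datetime.now(timezone.utc).isoformat(),
    }
    with open(f"{name}.json", "w") as f:
        json.dump(record, f, indent=2)

if __name__ == "__main__":
    save_result("table1_entry", 3.14159)  # placeholder value
```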
I would suggest augmenting this with a fixed snapshot of the code as ancillary files on the arXiv (as an addition to GitHub, not a replacement).
– Neil Strickland, yesterday
Why specifically GitHub? Must anyone sharing code have a GitHub account and use git?
– Dror Speiser, yesterday
@Dror You need neither an account nor git to access source code on GitHub. If GitHub goes down, well, that's another matter.
– rubenvb, yesterday
@rubenvb I believe Dror Speiser was referring more to the restriction to git/GitHub specifically, as there are many (free) online repository hosts and multiple version control tools available. Git is the most popular right now, but Mercurial is another viable one, and many people still use SVN. Ultimately, though, you do have to choose one if you want web-hosted version control. (A personal server could work as well, but it may be less reliable and may introduce security risks if you are not used to self-hosting.)
– JAB, yesterday
@DrorSpeiser The reason I specifically recommended GitHub is basically the last paragraph of my answer: mathematicians don't really have specialized needs for distributing and maintaining code (quite the opposite), so they should use whatever the industry standard is. Not only is it easier to learn and maintain, but if/when GitHub falls out of favor the software engineering community will produce a robust method for migrating onto some new system. Basically, the minor differences between the various version control systems are outweighed by the advantages of standardization.
– Paul Siegel, yesterday
Answer by Federico Poloni (14 votes, community wiki, answered 2 days ago)
At least in my field (numerical linear algebra), the current standard is that including the full source code is not mandatory for publication. That said, there are many reasons why sharing your code is a good idea; for instance, this article on SIAM News makes some very compelling arguments.
Unless it is just a few lines, it is quite unusual to have code included verbatim in a publication; it would be cumbersome to copy and paste, for instance. Common solutions are:
- hosting it on your institutional page
- offering to share the source code with interested parties via e-mail
- having a GitHub repository
- including it in the arXiv version of your paper as an ancillary file (see the packaging sketch below)
- sharing it on Zenodo.
If you are concerned about long-term archival, the last two items in my list are meant to solve this problem, although it could be argued that GitHub, too, is becoming "too big to fail" these days.
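To make the arXiv option in the list above concrete: arXiv treats files placed in an anc/ subdirectory of the submission source as ancillary files and lists them for download next to the paper. Below is a minimal packaging sketch; the concrete file names are hypothetical.

```python
# Sketch: bundle a LaTeX source together with code as arXiv ancillary files.
# Files placed under anc/ in the submission are offered for download alongside
# the paper. The file names used here are placeholders.
import tarfile
from pathlib import Path

def build_submission(tex_file: str, code_files: list[str],
                     out: str = "submission.tar.gz") -> None:
    """Create a gzipped tar with the paper at top level and code under anc/."""
    with tarfile.open(out, "w:gz") as tar:
        tar.add(tex_file, arcname=Path(tex_file).name)
        for src in code_files:
            tar.add(src, arcname=f"anc/{Path(src).name}")

if __name__ == "__main__":
    build_submission("paper.tex", ["experiments.py", "Makefile"])
```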
These are all interesting options, but how do they fare with respect to copyright? Some journals explicitly forbid making parts of the paper publicly available elsewhere; hosting the code on GitHub could therefore violate their terms.
– Flermat, 2 days ago
@Flermat If you don't include the source code in the body of the paper, then the publishers cannot have any copyright claim on it.
– Federico Poloni, 2 days ago
@FedericoPoloni Yet it is part of the "work", however the journal decides to interpret that term.
– Flermat, 2 days ago
@Flermat They can claim they have copyright on the code, but (1) I don't think they ever did in practice, (2) I don't think it's going to hold up in court anyway, and (3) the journals would have to fear some serious backlash from the mathematical community if they tried doing that.
– Federico Poloni, 2 days ago
Answer by Timothy Chow (8 votes, community wiki, answered 2 days ago)
I think that Federico Poloni's answer gives good advice as of 2018, but as a mathematical community we should be thinking harder about this question. Simply making source code available, even via something like the arXiv, which will be around "forever", is not a complete solution, because source code may be nearly useless after (say) 50 years if the compilers are no longer readily available or, worse, the code runs only on some proprietary software that no longer exists. This concern applies even if the computation has been formalized in a proof assistant, since who knows whether today's proof assistants will be around 50 years from now?
One idea would be for professional societies such as the American Mathematical Society to develop a long-term archival plan, perhaps in collaboration with government entities such as the Library of Congress.
Even if today's compilers die off in a few decades, there will be emulators and interpreters. Even better, there may be high level translators which will convert the source code to future source code automatically, so that the old ideas can be propagated. Gerhard "See, It's Really About Ideas" Paseman, 2018.11.25.
– Gerhard Paseman, 2 days ago
Yup, emulation has done a lot to mitigate this problem in recent years. You can even run an IBM PC or C64 games inside your browser, for instance.
– Federico Poloni, 2 days ago
Emulators are only a partial solution. Suppose my code requires a specific version of Mathematica or CPLEX or even of Sage (which in turn might require a specific version of Python). Forget about 50 years in the future---I often have trouble running a colleague's code on my machine today.
– Timothy Chow, 2 days ago
@GerhardPaseman : The dream of automatic conversion to new formats is an old one and has already been shattered. The Library of Congress already has a ton of old electronic media that is effectively inaccessible and lacks the budget to deal with it. It's not just the software but the manpower to perform the conversions on a massive scale. I remember reading about how the LC had to borrow a machine from the Smithsonian to try to read some old electronic media.
– Timothy Chow, 2 days ago
@AndreiSmolensky : I believe that this is changing. To cite one example I know well: the proof of the q-TSPP conjecture by Kauers, Koutschan and Zeilberger relies crucially on some Mathematica computations. The authors make the Mathematica notebook available but there is still the problem that Mathematica is proprietary. And I am confident that the q-TSPP result will be of interest 50 years from now. By then there may be a shorter proof, but there is no guarantee of that. Or for a more famous example, what about the Kepler conjecture?
– Timothy Chow, 2 days ago
Answer (3 votes)
This is more of a very extended comment than a complete answer.
I tend to find that "should" questions boil down to values as much as anything else: "should" in order to achieve what?
Let me suggest that we need to understand a few things:
- The advantages
- The disadvantages
- Is there a real problem with reproducibility that needs fixing?
- The cost of not doing so, or of doing so halfheartedly
- The opportunity cost, or motivational/funding challenges
- Variation between subfields
- Technical challenges, short and long term
- Expectations, or even standards
- Cultural challenges
I'll try to avoid repeating the observations from the existing answers and comments, but let me add some thoughts:
- In software engineering, the process for shipping code is very different from the way a typical mathematical program is written. A key reason for this is quality, and correctness is the most important element of quality. Correctness is of overwhelming importance in any proof, so maybe open code and peer review would be a good thing.
- Related to that: writing code to be read is different from writing code just to convince oneself; how are mathematicians to learn that?
- There is a difference between learning enough about programming to get a result and the skills needed to write good tests, make code readable, and convince readers that the code is valid (see the testing sketch after this list). I'd ask: if you have not done that well, why should readers find your conclusions credible?
- What is the penalty for coding errors as things stand? I would have thought that in maths, publishing results that are subsequently proven false would not do one's career any good. This compares interestingly with other fields of science, where to some extent one expects many "results" in papers not to be borne out. It would be interesting to hear feedback on what happens in practice.
- Do people feel that time spent publishing code would be unproductive?
- A software-engineering-style code review is not anonymous (at least usually); is this a problem?
- There is an argument for using "lowest common denominator" languages that might be old but whose longevity and wide accessibility are proven, e.g. C.
- Timothy Chow noted the use of notebooks; they provide a great way to document code and the overall approach, and I can see them becoming more and more widely used. Interestingly, I think this might conflict with "lowest common denominator" languages, as the notebook-hosting language (Jupyter or Mathematica) might have less longevity.
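As promised in the list above, here is a minimal sketch of the kind of test that travels well with paper code: a fast implementation checked against an independent brute-force computation on small cases. The derangement-counting example and function names are purely illustrative, not taken from any particular paper.

```python
# Sketch: verify a fast implementation against an independent brute-force
# computation on small inputs. The derangement example is illustrative only.
from itertools import permutations

def count_derangements_fast(n: int) -> int:
    """Recurrence D(n) = (n - 1) * (D(n-1) + D(n-2))."""
    a, b = 1, 0  # D(0), D(1)
    for k in range(2, n + 1):
        a, b = b, (k - 1) * (a + b)
    return b if n >= 1 else a

def count_derangements_bruteforce(n: int) -> int:
    """Direct enumeration of permutations, used only as an independent check."""
    return sum(
        all(p[i] != i for i in range(n))
        for p in permutations(range(n))
    )

def test_agreement_on_small_cases() -> None:
    for n in range(8):
        assert count_derangements_fast(n) == count_derangements_bruteforce(n)

if __name__ == "__main__":
    test_agreement_on_small_cases()
    print("ok")
```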
Answer (2 votes)
There are some issues that are not emphasised enough in the previous comments and answers. Having the source code used by an author does not let you check that the author's theorems are correct. It only lets you check that the program does what the author claims. Transcribing the program output to the published paper is the step where an error is least likely to have occurred. Much more likely is an error in the program.
So, can you eyeball the program to check if it is correct? Not unless it is a very short simple program. I publish articles that rely on tens of thousands of lines of code that took me and others months of hard work to write and debug. Your chances of looking at it and checking its correctness in a reasonable amount of time are next to zero. One day there will be programs that can check correctness for you; the beginnings exist today but generally useful checkers are still a long way off.
So what to do? If you are an author, get a coauthor and aim for separately implemented programs that get the same result, hopefully using different methods. (An axiom of software engineering is that programmers solving the same problem using the same method tend to make the same mistakes.) Intermediate results are very useful for checking, especially when the final answer has low entropy (like "yes" or "empty set").
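A minimal sketch of how intermediate results can be compared between two independently written programs follows; the file names and checkpoint contents are illustrative only. Each run dumps its checkpoints in a canonical form, and a small helper reports the stages at which the runs disagree.

```python
# Sketch: dump intermediate results in a canonical form so that two
# independently written programs can be compared stage by stage rather than
# only on a low-entropy final answer. File names and keys are illustrative.
import json

def dump_checkpoint(path: str, stage: str, data: dict) -> None:
    """Append one checkpoint as a single JSON line with sorted keys."""
    with open(path, "a") as f:
        f.write(json.dumps({"stage": stage, "data": data}, sort_keys=True) + "\n")

def compare_logs(path_a: str, path_b: str) -> list[str]:
    """Return the stages at which the two runs disagree (up to the shorter log)."""
    mismatches = []
    with open(path_a) as fa, open(path_b) as fb:
        for line_a, line_b in zip(fa, fb):
            rec_a, rec_b = json.loads(line_a), json.loads(line_b)
            if rec_a != rec_b:
                mismatches.append(rec_a.get("stage", "?"))
    return mismatches

if __name__ == "__main__":
    dump_checkpoint("run_a.jsonl", "counts_n_le_5", {"n=4": 9, "n=5": 44})
    dump_checkpoint("run_b.jsonl", "counts_n_le_5", {"n=5": 44, "n=4": 9})
    print(compare_logs("run_a.jsonl", "run_b.jsonl"))  # [] means agreement so far
```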
Another fact is that problems which needed very tricky programming and bulk computer time 20 years ago can now be solved in a reasonable time using simpler programs. Presumably that trend will continue. Any computational result that is important enough will eventually be replicated independently without so much effort.
5 Answers
5
active
oldest
votes
5 Answers
5
active
oldest
votes
active
oldest
votes
active
oldest
votes
up vote
21
down vote
My answer is:
Don't put code in your paper. Do: put pseudocode in your paper, version control your code on Github, and add a link to your Github repository to your paper.
- The purpose of a paper is to be read; the purpose of code is to be executed by a computer. These purposes should not be mixed, so a readable representation of your code should be included in your paper. That is exactly why pseudocode was invented.
- All code intended to be used by more than one person should be version controlled. This balances the two most relevant concerns: the original version of the code is preserved for posterity, but the author retains the ability to update it as bugs or improvements are discovered. (Additionally, the forking mechanism in Github allows others to transparently modify your code or apply it to their own ends.)
In fact, I am willing to make a more general argument: mathematicians should version control their papers as well. The reasons are the same: the original version still exists (with a timestamp) so that priority disputes can be settled easily, but the paper can be maintained and updated - no more errata for old papers / textbooks!
The underlying premise of this answer is that maintaining and distributing code is a software engineering problem, and to the extent that mathematicians need to solve it they should follow software engineers' lead. This has two advantages: on one hand software engineers have a much more severe version of the problem and will therefore solve it better, and on the other hand as the solutions inevitably change they will be accompanied by tools and strategies for migrating old code into new frameworks which is ultimately the best way to ensure that the code survives as long as possible.
6
I would suggest augmenting this with a fixed snapshot of the code as auxiliary files on the arxiv (as an addition to github, not a replacement).
– Neil Strickland
yesterday
1
Why specifically github? Anyone sharing code must have a github account and use git?
– Dror Speiser
yesterday
@Dror you neither need an account nor git to access the source code on github. If github goes down, well, that's another matter.
– rubenvb
yesterday
2
@rubenvb I believe Dror Speiser was referring more to the restriction of using git/Github specifically, as there are many possible (free) online repository sites to use and multiple version control tools available. Git is the most popular right now, but Mercury is another viable one, and many people still use SVN. Ultimately, though, you do have to make a choice of which to use if you want to use web-hosted version control. (A personal server could work as well but may be less reliable and may introduce security risks if you are not used to self-hosting.)
– JAB
yesterday
@DrorSpeiser The reason I specifically recommended Github is basically the last paragraph of my answer: mathematicians don't really have specialized needs for distributing and maintaining code (quite the opposite), so they should use whatever the industry standard is. Not only is it easier to learn and maintain, but if/when Github falls out of favor the software engineering community will produce a robust method for migrating onto some new system. Basically the minor differences between the various version control systems are outweighed by the advantages of standardization.
– Paul Siegel
yesterday
add a comment |
up vote
21
down vote
My answer is:
Don't put code in your paper. Do: put pseudocode in your paper, version control your code on Github, and add a link to your Github repository to your paper.
- The purpose of a paper is to be read; the purpose of code is to be executed by a computer. These purposes should not be mixed, so a readable representation of your code should be included in your paper. That is exactly why pseudocode was invented.
- All code intended to be used by more than one person should be version controlled. This balances the two most relevant concerns: the original version of the code is preserved for posterity, but the author retains the ability to update it as bugs or improvements are discovered. (Additionally, the forking mechanism in Github allows others to transparently modify your code or apply it to their own ends.)
In fact, I am willing to make a more general argument: mathematicians should version control their papers as well. The reasons are the same: the original version still exists (with a timestamp) so that priority disputes can be settled easily, but the paper can be maintained and updated - no more errata for old papers / textbooks!
The underlying premise of this answer is that maintaining and distributing code is a software engineering problem, and to the extent that mathematicians need to solve it they should follow software engineers' lead. This has two advantages: on one hand software engineers have a much more severe version of the problem and will therefore solve it better, and on the other hand as the solutions inevitably change they will be accompanied by tools and strategies for migrating old code into new frameworks which is ultimately the best way to ensure that the code survives as long as possible.
6
I would suggest augmenting this with a fixed snapshot of the code as auxiliary files on the arxiv (as an addition to github, not a replacement).
– Neil Strickland
yesterday
1
Why specifically github? Anyone sharing code must have a github account and use git?
– Dror Speiser
yesterday
@Dror you neither need an account nor git to access the source code on github. If github goes down, well, that's another matter.
– rubenvb
yesterday
2
@rubenvb I believe Dror Speiser was referring more to the restriction of using git/Github specifically, as there are many possible (free) online repository sites to use and multiple version control tools available. Git is the most popular right now, but Mercury is another viable one, and many people still use SVN. Ultimately, though, you do have to make a choice of which to use if you want to use web-hosted version control. (A personal server could work as well but may be less reliable and may introduce security risks if you are not used to self-hosting.)
– JAB
yesterday
@DrorSpeiser The reason I specifically recommended Github is basically the last paragraph of my answer: mathematicians don't really have specialized needs for distributing and maintaining code (quite the opposite), so they should use whatever the industry standard is. Not only is it easier to learn and maintain, but if/when Github falls out of favor the software engineering community will produce a robust method for migrating onto some new system. Basically the minor differences between the various version control systems are outweighed by the advantages of standardization.
– Paul Siegel
yesterday
add a comment |
up vote
21
down vote
up vote
21
down vote
My answer is:
Don't put code in your paper. Do: put pseudocode in your paper, version control your code on Github, and add a link to your Github repository to your paper.
- The purpose of a paper is to be read; the purpose of code is to be executed by a computer. These purposes should not be mixed, so a readable representation of your code should be included in your paper. That is exactly why pseudocode was invented.
- All code intended to be used by more than one person should be version controlled. This balances the two most relevant concerns: the original version of the code is preserved for posterity, but the author retains the ability to update it as bugs or improvements are discovered. (Additionally, the forking mechanism in Github allows others to transparently modify your code or apply it to their own ends.)
In fact, I am willing to make a more general argument: mathematicians should version control their papers as well. The reasons are the same: the original version still exists (with a timestamp) so that priority disputes can be settled easily, but the paper can be maintained and updated - no more errata for old papers / textbooks!
The underlying premise of this answer is that maintaining and distributing code is a software engineering problem, and to the extent that mathematicians need to solve it they should follow software engineers' lead. This has two advantages: on one hand software engineers have a much more severe version of the problem and will therefore solve it better, and on the other hand as the solutions inevitably change they will be accompanied by tools and strategies for migrating old code into new frameworks which is ultimately the best way to ensure that the code survives as long as possible.
My answer is:
Don't put code in your paper. Do: put pseudocode in your paper, version control your code on Github, and add a link to your Github repository to your paper.
- The purpose of a paper is to be read; the purpose of code is to be executed by a computer. These purposes should not be mixed, so a readable representation of your code should be included in your paper. That is exactly why pseudocode was invented.
- All code intended to be used by more than one person should be version controlled. This balances the two most relevant concerns: the original version of the code is preserved for posterity, but the author retains the ability to update it as bugs or improvements are discovered. (Additionally, the forking mechanism in Github allows others to transparently modify your code or apply it to their own ends.)
In fact, I am willing to make a more general argument: mathematicians should version control their papers as well. The reasons are the same: the original version still exists (with a timestamp) so that priority disputes can be settled easily, but the paper can be maintained and updated - no more errata for old papers / textbooks!
The underlying premise of this answer is that maintaining and distributing code is a software engineering problem, and to the extent that mathematicians need to solve it they should follow software engineers' lead. This has two advantages: on one hand software engineers have a much more severe version of the problem and will therefore solve it better, and on the other hand as the solutions inevitably change they will be accompanied by tools and strategies for migrating old code into new frameworks which is ultimately the best way to ensure that the code survives as long as possible.
answered 2 days ago
community wiki
Paul Siegel
6
I would suggest augmenting this with a fixed snapshot of the code as auxiliary files on the arxiv (as an addition to github, not a replacement).
– Neil Strickland
yesterday
1
Why specifically github? Anyone sharing code must have a github account and use git?
– Dror Speiser
yesterday
@Dror you neither need an account nor git to access the source code on github. If github goes down, well, that's another matter.
– rubenvb
yesterday
2
@rubenvb I believe Dror Speiser was referring more to the restriction of using git/Github specifically, as there are many possible (free) online repository sites to use and multiple version control tools available. Git is the most popular right now, but Mercury is another viable one, and many people still use SVN. Ultimately, though, you do have to make a choice of which to use if you want to use web-hosted version control. (A personal server could work as well but may be less reliable and may introduce security risks if you are not used to self-hosting.)
– JAB
yesterday
@DrorSpeiser The reason I specifically recommended Github is basically the last paragraph of my answer: mathematicians don't really have specialized needs for distributing and maintaining code (quite the opposite), so they should use whatever the industry standard is. Not only is it easier to learn and maintain, but if/when Github falls out of favor the software engineering community will produce a robust method for migrating onto some new system. Basically the minor differences between the various version control systems are outweighed by the advantages of standardization.
– Paul Siegel
yesterday
add a comment |
6
I would suggest augmenting this with a fixed snapshot of the code as auxiliary files on the arxiv (as an addition to github, not a replacement).
– Neil Strickland
yesterday
1
Why specifically github? Anyone sharing code must have a github account and use git?
– Dror Speiser
yesterday
@Dror you neither need an account nor git to access the source code on github. If github goes down, well, that's another matter.
– rubenvb
yesterday
2
@rubenvb I believe Dror Speiser was referring more to the restriction of using git/Github specifically, as there are many possible (free) online repository sites to use and multiple version control tools available. Git is the most popular right now, but Mercury is another viable one, and many people still use SVN. Ultimately, though, you do have to make a choice of which to use if you want to use web-hosted version control. (A personal server could work as well but may be less reliable and may introduce security risks if you are not used to self-hosting.)
– JAB
yesterday
@DrorSpeiser The reason I specifically recommended Github is basically the last paragraph of my answer: mathematicians don't really have specialized needs for distributing and maintaining code (quite the opposite), so they should use whatever the industry standard is. Not only is it easier to learn and maintain, but if/when Github falls out of favor the software engineering community will produce a robust method for migrating onto some new system. Basically the minor differences between the various version control systems are outweighed by the advantages of standardization.
– Paul Siegel
yesterday
6
6
I would suggest augmenting this with a fixed snapshot of the code as auxiliary files on the arxiv (as an addition to github, not a replacement).
– Neil Strickland
yesterday
I would suggest augmenting this with a fixed snapshot of the code as auxiliary files on the arxiv (as an addition to github, not a replacement).
– Neil Strickland
yesterday
1
1
Why specifically github? Anyone sharing code must have a github account and use git?
– Dror Speiser
yesterday
Why specifically github? Anyone sharing code must have a github account and use git?
– Dror Speiser
yesterday
@Dror you neither need an account nor git to access the source code on github. If github goes down, well, that's another matter.
– rubenvb
yesterday
@Dror you neither need an account nor git to access the source code on github. If github goes down, well, that's another matter.
– rubenvb
yesterday
2
2
@rubenvb I believe Dror Speiser was referring more to the restriction of using git/Github specifically, as there are many possible (free) online repository sites to use and multiple version control tools available. Git is the most popular right now, but Mercury is another viable one, and many people still use SVN. Ultimately, though, you do have to make a choice of which to use if you want to use web-hosted version control. (A personal server could work as well but may be less reliable and may introduce security risks if you are not used to self-hosting.)
– JAB
yesterday
@rubenvb I believe Dror Speiser was referring more to the restriction of using git/Github specifically, as there are many possible (free) online repository sites to use and multiple version control tools available. Git is the most popular right now, but Mercury is another viable one, and many people still use SVN. Ultimately, though, you do have to make a choice of which to use if you want to use web-hosted version control. (A personal server could work as well but may be less reliable and may introduce security risks if you are not used to self-hosting.)
– JAB
yesterday
@DrorSpeiser The reason I specifically recommended Github is basically the last paragraph of my answer: mathematicians don't really have specialized needs for distributing and maintaining code (quite the opposite), so they should use whatever the industry standard is. Not only is it easier to learn and maintain, but if/when Github falls out of favor the software engineering community will produce a robust method for migrating onto some new system. Basically the minor differences between the various version control systems are outweighed by the advantages of standardization.
– Paul Siegel
yesterday
@DrorSpeiser The reason I specifically recommended Github is basically the last paragraph of my answer: mathematicians don't really have specialized needs for distributing and maintaining code (quite the opposite), so they should use whatever the industry standard is. Not only is it easier to learn and maintain, but if/when Github falls out of favor the software engineering community will produce a robust method for migrating onto some new system. Basically the minor differences between the various version control systems are outweighed by the advantages of standardization.
– Paul Siegel
yesterday
add a comment |
up vote
14
down vote
At least in my field (numerical linear algebra), the current standard is that including the full source code is not mandatory for a publication. That said, there are many reasons why sharing your code is a good idea; for instance this article on SIAM news makes some very compelling arguments.
Unless it's just a few lines, it is quite unusual to have code included verbatim in the publications. It would be cumbersome to copy and paste, for instance. Common solutions are:
- hosting it on your institutional page
- offering to share the source code to interested parties via e-mail
- having a Github repository
- including it into the Arxiv version of your paper as an ancillary_file
- sharing it on Zenodo.
If you are concerned about long-time archival, the last two items in my list are meant to solve this problem; although it could be argued that also Github is becoming "too big to fail" these days.
These are all interesting options. But how do they fare within a copyright context? I.e., some journals explicitly forbid the publication of parts of the paper in public. Hosting the code on Github would therefore be in violation of their terms.
– Flermat
2 days ago
2
@Flermat If you don't include the source code in the body of the paper, then the publishers cannot have any copyright claim on it.
– Federico Poloni
2 days ago
@FredericoPoloni Yet it is part of the "work", however the journal decides to interpret that term.
– Flermat
2 days ago
8
@Flermat They can claim they have copyright on the code, but (1) I don't think they ever did in practice, (2) I don't think it's going to hold up in court anyway, and (3) the journals would have to fear some serious backlash from the mathematical community if they tried doing that.
– Federico Poloni
2 days ago
add a comment |
up vote
14
down vote
At least in my field (numerical linear algebra), the current standard is that including the full source code is not mandatory for a publication. That said, there are many reasons why sharing your code is a good idea; for instance this article on SIAM news makes some very compelling arguments.
Unless it's just a few lines, it is quite unusual to have code included verbatim in the publications. It would be cumbersome to copy and paste, for instance. Common solutions are:
- hosting it on your institutional page
- offering to share the source code to interested parties via e-mail
- having a Github repository
- including it into the Arxiv version of your paper as an ancillary_file
- sharing it on Zenodo.
If you are concerned about long-time archival, the last two items in my list are meant to solve this problem; although it could be argued that also Github is becoming "too big to fail" these days.
These are all interesting options. But how do they fare within a copyright context? I.e., some journals explicitly forbid the publication of parts of the paper in public. Hosting the code on Github would therefore be in violation of their terms.
– Flermat
2 days ago
2
@Flermat If you don't include the source code in the body of the paper, then the publishers cannot have any copyright claim on it.
– Federico Poloni
2 days ago
@FredericoPoloni Yet it is part of the "work", however the journal decides to interpret that term.
– Flermat
2 days ago
8
@Flermat They can claim they have copyright on the code, but (1) I don't think they ever did in practice, (2) I don't think it's going to hold up in court anyway, and (3) the journals would have to fear some serious backlash from the mathematical community if they tried doing that.
– Federico Poloni
2 days ago
add a comment |
up vote
14
down vote
up vote
14
down vote
At least in my field (numerical linear algebra), the current standard is that including the full source code is not mandatory for a publication. That said, there are many reasons why sharing your code is a good idea; for instance this article on SIAM news makes some very compelling arguments.
Unless it's just a few lines, it is quite unusual to have code included verbatim in the publications. It would be cumbersome to copy and paste, for instance. Common solutions are:
- hosting it on your institutional page
- offering to share the source code to interested parties via e-mail
- having a Github repository
- including it into the Arxiv version of your paper as an ancillary_file
- sharing it on Zenodo.
If you are concerned about long-time archival, the last two items in my list are meant to solve this problem; although it could be argued that also Github is becoming "too big to fail" these days.
At least in my field (numerical linear algebra), the current standard is that including the full source code is not mandatory for a publication. That said, there are many reasons why sharing your code is a good idea; for instance this article on SIAM news makes some very compelling arguments.
Unless it's just a few lines, it is quite unusual to have code included verbatim in the publications. It would be cumbersome to copy and paste, for instance. Common solutions are:
- hosting it on your institutional page
- offering to share the source code to interested parties via e-mail
- having a Github repository
- including it into the Arxiv version of your paper as an ancillary_file
- sharing it on Zenodo.
If you are concerned about long-time archival, the last two items in my list are meant to solve this problem; although it could be argued that also Github is becoming "too big to fail" these days.
answered 2 days ago
community wiki
Federico Poloni
These are all interesting options. But how do they fare within a copyright context? I.e., some journals explicitly forbid the publication of parts of the paper in public. Hosting the code on Github would therefore be in violation of their terms.
– Flermat
2 days ago
2
@Flermat If you don't include the source code in the body of the paper, then the publishers cannot have any copyright claim on it.
– Federico Poloni
2 days ago
@FredericoPoloni Yet it is part of the "work", however the journal decides to interpret that term.
– Flermat
2 days ago
8
@Flermat They can claim they have copyright on the code, but (1) I don't think they ever did in practice, (2) I don't think it's going to hold up in court anyway, and (3) the journals would have to fear some serious backlash from the mathematical community if they tried doing that.
– Federico Poloni
2 days ago
add a comment |
These are all interesting options. But how do they fare within a copyright context? I.e., some journals explicitly forbid the publication of parts of the paper in public. Hosting the code on Github would therefore be in violation of their terms.
– Flermat
2 days ago
2
@Flermat If you don't include the source code in the body of the paper, then the publishers cannot have any copyright claim on it.
– Federico Poloni
2 days ago
@FredericoPoloni Yet it is part of the "work", however the journal decides to interpret that term.
– Flermat
2 days ago
8
@Flermat They can claim they have copyright on the code, but (1) I don't think they ever did in practice, (2) I don't think it's going to hold up in court anyway, and (3) the journals would have to fear some serious backlash from the mathematical community if they tried doing that.
– Federico Poloni
2 days ago
These are all interesting options. But how do they fare within a copyright context? I.e., some journals explicitly forbid the publication of parts of the paper in public. Hosting the code on Github would therefore be in violation of their terms.
– Flermat
2 days ago
These are all interesting options. But how do they fare within a copyright context? I.e., some journals explicitly forbid the publication of parts of the paper in public. Hosting the code on Github would therefore be in violation of their terms.
– Flermat
2 days ago
2
2
@Flermat If you don't include the source code in the body of the paper, then the publishers cannot have any copyright claim on it.
– Federico Poloni
2 days ago
@Flermat If you don't include the source code in the body of the paper, then the publishers cannot have any copyright claim on it.
– Federico Poloni
2 days ago
@FredericoPoloni Yet it is part of the "work", however the journal decides to interpret that term.
– Flermat
2 days ago
@FredericoPoloni Yet it is part of the "work", however the journal decides to interpret that term.
– Flermat
2 days ago
8
8
@Flermat They can claim they have copyright on the code, but (1) I don't think they ever did in practice, (2) I don't think it's going to hold up in court anyway, and (3) the journals would have to fear some serious backlash from the mathematical community if they tried doing that.
– Federico Poloni
2 days ago
@Flermat They can claim they have copyright on the code, but (1) I don't think they ever did in practice, (2) I don't think it's going to hold up in court anyway, and (3) the journals would have to fear some serious backlash from the mathematical community if they tried doing that.
– Federico Poloni
2 days ago
add a comment |
up vote
8
down vote
I think that Federico Poloni's answer gives good advice as of 2018, but as a mathematical community I think we should be thinking harder about this question. Simply making source code available, even via something like the arXiv which will be around "forever", is not a complete solution, because source code may be nearly useless after (say) 50 years because the compilers are no longer readily available, or worse, the code runs only on some proprietary software that no longer exists. This concern applies even if the computation has been formalized in a proof assistant, since who knows if today's proof assistants will be around 50 years from now?
One idea would be for professional societies such as the American Mathematical Society to develop a long-term archival plan, perhaps collaborating with government entities such as the Library of Congress.
11
Even if today's compilers die off in a few decades, there will be emulators and interpreters. Even better, there may be high level translators which will convert the source code to future source code automatically, so that the old ideas can be propagated. Gerhard "See, It's Really About Ideas" Paseman, 2018.11.25.
– Gerhard Paseman
2 days ago
3
Yup - emulation has done a lot to mitigate this problem in the past years. You can even run an IBM PC, or C64 games inside your browser, for instance.
– Federico Poloni
2 days ago
8
Emulators are only a partial solution. Suppose my code requires a specific version of Mathematica or CPLEX or even of Sage (which in turn might require a specific version of Python). Forget about 50 years in the future---I often have trouble running a colleague's code on my machine today.
– Timothy Chow
2 days ago
5
@GerhardPaseman : The dream of automatic conversion to new formats is an old one and has already been shattered. The Library of Congress already has a ton of old electronic media that is effectively inaccessible and lacks the budget to deal with it. It's not just the software but the manpower to perform the conversions on a massive scale. I remember reading about how the LC had to borrow a machine from the Smithsonian to try to read some old electronic media.
– Timothy Chow
2 days ago
3
@AndreiSmolensky : I believe that this is changing. To cite one example I know well: the proof of the q-TSPP conjecture by Kauers, Koutschan and Zeilberger relies crucially on some Mathematica computations. The authors make the Mathematica notebook available but there is still the problem that Mathematica is proprietary. And I am confident that the q-TSPP result will be of interest 50 years from now. By then there may be a shorter proof, but there is no guarantee of that. Or for a more famous example, what about the Kepler conjecture?
– Timothy Chow
2 days ago
|
show 11 more comments
up vote
8
down vote
I think that Federico Poloni's answer gives good advice as of 2018, but as a mathematical community I think we should be thinking harder about this question. Simply making source code available, even via something like the arXiv which will be around "forever", is not a complete solution, because source code may be nearly useless after (say) 50 years because the compilers are no longer readily available, or worse, the code runs only on some proprietary software that no longer exists. This concern applies even if the computation has been formalized in a proof assistant, since who knows if today's proof assistants will be around 50 years from now?
One idea would be for professional societies such as the American Mathematical Society to develop a long-term archival plan, perhaps collaborating with government entities such as the Library of Congress.
answered 2 days ago
community wiki
Timothy Chow
11
Even if today's compilers die off in a few decades, there will be emulators and interpreters. Even better, there may be high level translators which will convert the source code to future source code automatically, so that the old ideas can be propagated. Gerhard "See, It's Really About Ideas" Paseman, 2018.11.25.
– Gerhard Paseman
2 days ago
3
Yup - emulation has done a lot to mitigate this problem in recent years. You can even run an IBM PC or C64 games inside your browser, for instance.
– Federico Poloni
2 days ago
8
Emulators are only a partial solution. Suppose my code requires a specific version of Mathematica or CPLEX or even of Sage (which in turn might require a specific version of Python). Forget about 50 years in the future---I often have trouble running a colleague's code on my machine today.
– Timothy Chow
2 days ago
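One way to at least soften this version problem is to archive a snapshot of the software environment together with the published results, so a future reader knows exactly which interpreter and package versions produced them. A minimal sketch, assuming a pip-based Python 3.8+ setup; the output file name environment.json is purely illustrative:

    # Record the interpreter, platform and installed package versions next to
    # the numerical results they produced (requires Python 3.8+ for importlib.metadata).
    import json
    import platform
    import sys
    from importlib import metadata

    def environment_snapshot():
        """Return a dictionary describing the interpreter and installed packages."""
        return {
            "python": sys.version,
            "platform": platform.platform(),
            "packages": {dist.metadata["Name"]: dist.version
                         for dist in metadata.distributions()},
        }

    if __name__ == "__main__":
        # Write the snapshot alongside the results it accompanies.
        with open("environment.json", "w") as f:
            json.dump(environment_snapshot(), f, indent=2)

This does not make old code run again by itself, but it removes the guesswork about which versions were in play.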
5
@GerhardPaseman : The dream of automatic conversion to new formats is an old one and has already been shattered. The Library of Congress already has a ton of old electronic media that is effectively inaccessible and lacks the budget to deal with it. It's not just the software but the manpower to perform the conversions on a massive scale. I remember reading about how the LC had to borrow a machine from the Smithsonian to try to read some old electronic media.
– Timothy Chow
2 days ago
3
@AndreiSmolensky : I believe that this is changing. To cite one example I know well: the proof of the q-TSPP conjecture by Kauers, Koutschan and Zeilberger relies crucially on some Mathematica computations. The authors make the Mathematica notebook available but there is still the problem that Mathematica is proprietary. And I am confident that the q-TSPP result will be of interest 50 years from now. By then there may be a shorter proof, but there is no guarantee of that. Or for a more famous example, what about the Kepler conjecture?
– Timothy Chow
2 days ago
|
show 11 more comments
up vote
3
down vote
This is more of a very extended comment than a complete answer.
I tend to find that "should" questions boil down to values as much as anything; "should" in order to achieve what?
Let me suggest that we need to understand a few things:
- The advantages
- The disadvantages
- Is there a real problem with reproducibility that needs to be fixed?
- The cost of not doing so or of doing so halfheartedly.
- The opportunity cost, or motivational/funding challenges
- Variation between sub fields
- Technical challenges, short and long term
- Expectations or even standards
- Cultural challenges
I'll try to avoid repeating the observations from the existing answers and comments, but let me add some thoughts:
- In software engineering, the process for shipping code is very different from that behind the typical mathematical program. A key reason for this is quality, of which correctness is the most important element. Correctness is a value of overwhelming importance in any proof, so maybe open code and peer review would be a good thing.
- Related to that: writing code to be read is different to writing code to just convince oneself; how are mathematicians to learn that?
- There is a difference between learning enough about programming to get a result and the skills needed to write good tests, make code readable, and convince readers that the code is valid (a minimal sketch of such a test appears after this list). I'd ask: if you have not done that well, how can you expect your conclusions to be credible?
- What is the penalty for coding errors as things stand? I would have thought that in maths, publishing results that are subsequently proven false would not do one's career any good. This compares interestingly with other fields of science, where to some extent many published "results" are expected not to be borne out. It would be interesting to hear feedback on what happens in practice.
- Do people feel that time spent publishing code would be unproductive?
- A software engineering style code review is not anonymous (at least usually); is this a problem?
- There is an argument to use "lowest common denominator" languages that might be old but that proves their longevity and wide accessibility; e.g. 'C'.
- Timothy Chow noted the use of notebooks; they provide a great way to document code and the overall approach, and I can see them becoming more and more used. Interestingly, I think this might conflict with "lowest common denominator" languages, as the notebook environment (Jupyter or Mathematica) might have less longevity.
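To make the point about tests concrete, here is a minimal sketch of the kind of check that could accompany a computational claim; the function, the tolerance and the use of Python's unittest module are illustrative choices, not a prescription:

    # A tiny test suite for a numerical routine: check against a known closed
    # form and check a structural property, rather than trusting output by eye.
    import math
    import unittest

    def partial_zeta(s, terms):
        """Truncated zeta sum: sum of n**(-s) for n = 1, ..., terms."""
        return sum(n ** (-s) for n in range(1, terms + 1))

    class TestPartialZeta(unittest.TestCase):
        def test_against_known_value(self):
            # zeta(2) = pi**2 / 6; after 10**6 terms the truncation error is below 10**-6.
            self.assertAlmostEqual(partial_zeta(2, 10**6), math.pi**2 / 6, places=5)

        def test_monotone_in_terms(self):
            # Adding positive terms can only increase the partial sum.
            self.assertLess(partial_zeta(2, 100), partial_zeta(2, 200))

    if __name__ == "__main__":
        unittest.main()

Tests like these do not prove a program correct, but they make it much easier for a referee or a later author to gain confidence in, and to modify, the code.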
answered 2 days ago
community wiki
Keith
add a comment |
up vote
2
down vote
There are some issues that are not emphasised enough in the previous comments and answers. Having the source code used by an author does not let you check that the author's theorems are correct. It only lets you check that the program does what the author claims. Transcribing the program output to the published paper is the step where an error is least likely to have occurred. Much more likely is an error in the program.
So, can you eyeball the program to check if it is correct? Not unless it is a very short simple program. I publish articles that rely on tens of thousands of lines of code that took me and others months of hard work to write and debug. Your chances of looking at it and checking its correctness in a reasonable amount of time are next to zero. One day there will be programs that can check correctness for you; the beginnings exist today but generally useful checkers are still a long way off.
So what to do? If you are an author, get a coauthor and aim for separately implemented programs that get the same result, hopefully using different methods. (An axiom of software engineering is that programmers solving the same problem using the same method tend to make the same mistakes.) Intermediate results are very useful for checking, especially when the final answer has low entropy (like "yes" or "empty set").
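As an illustration of this suggestion, two independently written routines can be compared on a range of intermediate inputs rather than only on the final answer. A minimal sketch; the counted quantity (derangements) is chosen purely for the example and is not taken from the answer:

    # Cross-check two independent implementations of the same count: brute-force
    # enumeration versus the standard recurrence for derangements,
    # D(n) = (n - 1) * (D(n - 1) + D(n - 2)).
    from itertools import permutations

    def derangements_by_enumeration(n):
        """Count fixed-point-free permutations of n elements by direct enumeration."""
        return sum(1 for p in permutations(range(n))
                   if all(p[i] != i for i in range(n)))

    def derangements_by_recurrence(n):
        """Count the same objects via the recurrence, without enumerating anything."""
        if n == 0:
            return 1
        if n == 1:
            return 0
        prev2, prev1 = 1, 0
        for k in range(2, n + 1):
            prev2, prev1 = prev1, (k - 1) * (prev1 + prev2)
        return prev1

    if __name__ == "__main__":
        # Compare every intermediate value, not just the largest case.
        for n in range(9):
            a = derangements_by_enumeration(n)
            b = derangements_by_recurrence(n)
            assert a == b, f"mismatch at n={n}: {a} != {b}"
            print(n, a)

Because the two routines share no code and use different methods, agreement across all intermediate values is much stronger evidence than a single matching final number.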
Another fact is that problems which needed very tricky programming and bulk computer time 20 years ago can now be solved in a reasonable time using simpler programs. Presumably that trend will continue. Any computational result that is important enough will eventually be replicated independently without so much effort.
answered yesterday
community wiki
Brendan McKay
add a comment |
1
@SamHopkins I think you mean "independently confirm", which is stronger than "verify" (simply re-running the original code could "verify" the result, assuming you've eyeballed what the code is doing to achieve its output).
– literature-searcher
2 days ago
2
@Flermat even if github goes away, if the code is anywhere on the internet, a search engine will find it -- as long as the code is findable via a search, it seems that the problem is not really a problem. Github has a lateral benefit too -- if your code is of wider appeal, somebody may fork the repo and carry the work further, and likely contribute bugfixes to your code -- so overall, worth putting it there....
– Suvrit
2 days ago