Cosine Similarity on Huge Dataset























I have a very large data file full of movie ratings that I am analyzing for work, and I want to process it cleanly and efficiently. Each line of the ratings file contains the following columns:



userID, movieID, rating...



I have parsed the files, and I am now trying to compute the cosine similarity over all 100,000 ratings for each movie. To do so, I'm using a HashMap to store the ratings of each movie. For each of the 1,000 or so movies, I need to compute the cosine similarity. This is what I have done so far; what do you think?



import java.util.*;
import java.io.*;

public class MovieRatingParser {
    static HashMap<String, Double> ratings = new HashMap<>();

    public void parseMovieFile() throws FileNotFoundException, IOException {
        //Create an ArrayList to store movies
        ArrayList<Movie> movies = new ArrayList<Movie>();
        try {
            //Create a buffered file reader for FileReader to read in movies.dat
            BufferedReader br = new BufferedReader(new FileReader("movies.dat"));

            String readFile = br.readLine();
            while (readFile != null) {
                //Use String split delimiter to load each movie one by one
                //File delimiter is "\\|"
                String tokenDelimiter = readFile.split("\\|");
                String movieID = tokenDelimiter[0];
                String movieTitle = tokenDelimiter[1];

                Movie movieToAdd = new Movie(movieID, movieTitle);
                movies.add(movieToAdd);
                readFile = br.readLine();
            }
            br.close();
        } catch (FileNotFoundException e) {
            System.out.println("file was not Found!");
        }
        System.out.println("==============================================");

    }

    public static void parseRatingFile() throws FileNotFoundException, IOException{

        try {
            BufferedReader br = new BufferedReader(new FileReader("ratings.dat"));
            String readFile = br.readLine();
            while (readFile != null) {
                String tokenDelimiter = readFile.split("\\|");
                String userID = tokenDelimiter[0];
                String movieID = tokenDelimiter[1];
                double rating = Double.parseDouble(tokenDelimiter[2]);

                ratings.put(movieID, rating);
                readFile = br.readLine();
            }
            br.close();
        } catch (FileNotFoundException e) {
            System.out.println("File was not Found!");
        }
    }

    public static double computeCosineSimilarity(HashMap<String, Double> movieA, HashMap<String, Double> movieB) {

        double dotProduct = 0.0;
        double normA = 0.0;
        double normB = 0.0;
        parseRatingFile();
        for (int j = 0; j < ratings.size(); j++) {
            movieA.put(ratings.get(3), ratings.values());
        }

        for (int i = 0; i < movieA.size(); i++) {
            dotProduct += movieA[i] * movieB[i];
            normA += Math.pow(movieA[i], 2);
            normB += Math.pow(movieB[i], 2);
        }

        return dotProduct / (Math.sqrt(normA) * Math.sqrt(normB));
    }

}


What can I do to improve the code? It looks very sloppy.





















  • Math.sqrt(normA) * Math.sqrt(normB) == Math.sqrt(normA * normB)
    – coderodde
    May 3 '17 at 13:04

  • How exactly do you plan to use the method computeCosineSimilarity? I mostly find it strange that you would call parseRatingFile() each time but store the result in a static list. Also, my IDE complains that the types are incorrect on the line movieA.put(ratings.get(3),ratings.values()); You sure this code works?
    – Imus
    May 3 '17 at 14:14

  • Not sure what the purpose is. What does the cosine similarity of all ratings for a movie represent?
    – paparazzo
    Jun 1 at 12:45















java clustering data-mining














asked May 3 '17 at 12:44 by Al-geBra
edited May 3 '17 at 12:50 by 200_success















1 Answer













I'm not familiar with the algorithm you've implemented, so I cannot suggest improvements there. But some things in the code can be improved.





Use informative error messages. For instance, instead of:

    ...
    } catch (FileNotFoundException e) {
        System.out.println("file was not Found!");
    }
    ...

consider something like:

    ...
    } catch (FileNotFoundException e) {
        String detailedMessage =
            String.format("File [%s] was not found. Reason was [%s]!", "movies.dat", e.getMessage());
        // BTW "movies.dat" can be extracted into a constant.
        System.out.println(detailedMessage);
    }
    ...

In the latter snippet, the error message includes detailed information about what actually happened. Also note the square brackets that surround the variable data: such delimiters not only expose corner cases in the log (for example, an empty input file name specified by mistake) but also make grep (or any other text search) more effective.





Consider try-with-resources. That will reduce the amount of boilerplate code when dealing with readers, and the reader is closed even when an exception is thrown while reading.
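As a minimal sketch of that suggestion (the class name LineReaderSketch and the method readLines are hypothetical, not part of the original code, and UTF-8 is an assumed encoding for the data files), reading a file with try-with-resources could look like this:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class, used only for illustration.
final class LineReaderSketch {

    // Reads every line of the given file into a list.
    // The reader declared in the try header is closed automatically when the
    // block exits, whether it finishes normally or throws an exception.
    static List<String> readLines(Path file) throws IOException {
        List<String> lines = new ArrayList<>();
        // Assumption: the input is UTF-8 encoded; adjust the charset if not.
        try (BufferedReader br = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
            String line;
            while ((line = br.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }
}
```

There is no explicit br.close() call and no risk of skipping it when readLine() throws. For a very large file you would probably process each line inside the loop rather than collect all lines in memory.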





Move the parsing logic, e.g.:

    ...
    String tokenDelimiter = readFile.split("\\|");
    String userID = tokenDelimiter[0];
    String movieID = tokenDelimiter[1];
    double rating = Double.parseDouble(tokenDelimiter[2]);
    ...

into a separate helper method, as is already done for computeCosineSimilarity().
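One possible shape for such a helper is sketched below. The names RatingLineParser, Rating and parseRatingLine are hypothetical, and the sketch assumes the pipe-delimited layout userID|movieID|rating described in the question, with any further columns ignored. Note that String.split returns a String[], so the tokens are stored in an array here.

```java
import java.util.regex.Pattern;

// Hypothetical helper class, used only for illustration.
final class RatingLineParser {

    // Compiled once and reused for every line; "\\|" matches a literal pipe.
    private static final Pattern DELIMITER = Pattern.compile("\\|");

    // Hypothetical value type holding the fields of one parsed line.
    static final class Rating {
        final String userId;
        final String movieId;
        final double rating;

        Rating(String userId, String movieId, double rating) {
            this.userId = userId;
            this.movieId = movieId;
            this.rating = rating;
        }
    }

    // Parses one line of the assumed form userID|movieID|rating[|...].
    // Throws IllegalArgumentException when fields are missing; a non-numeric
    // rating surfaces as NumberFormatException, which is a subclass of it.
    static Rating parseRatingLine(String line) {
        String[] tokens = DELIMITER.split(line);
        if (tokens.length < 3) {
            throw new IllegalArgumentException(
                "Expected at least 3 fields but got [" + tokens.length + "] in line [" + line + "]");
        }
        return new Rating(tokens[0], tokens[1], Double.parseDouble(tokens[2]));
    }
}
```

With the parsing isolated like this, the file-reading loop only has to call parseRatingLine(line) and decide where to store the result, and the parser can be tested without touching the file system.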





After all these "little" improvements are done, you will see the code more clearly. Then you can concentrate on the algorithm (that is, on the pure logic), add checks for corner cases (like an empty input file), use strict math for floating-point numbers, handle the encoding of input files gracefully, improve overall processing speed for large files, etc.
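For the algorithm itself, here is a minimal sketch under stated assumptions, not a definitive implementation. It assumes each movie is represented as a Map from userID to rating (a structure the posted code does not build yet, since ratings.put(movieID, rating) keeps only one rating per movie), and that a user who did not rate a movie contributes 0. The class and method names are hypothetical. It also uses the single square root pointed out in the comments.

```java
import java.util.Map;

// Hypothetical helper class, used only for illustration.
final class CosineSimilaritySketch {

    // Cosine similarity of two sparse rating vectors keyed by userID.
    // Missing keys are treated as 0, so only users present in both maps
    // contribute to the dot product.
    static double cosineSimilarity(Map<String, Double> a, Map<String, Double> b) {
        // Iterate over the smaller map and look up in the larger one.
        Map<String, Double> small = a.size() <= b.size() ? a : b;
        Map<String, Double> large = (small == a) ? b : a;

        double dotProduct = 0.0;
        for (Map.Entry<String, Double> entry : small.entrySet()) {
            Double other = large.get(entry.getKey());
            if (other != null) {
                dotProduct += entry.getValue() * other;
            }
        }

        double normA = sumOfSquares(a);
        double normB = sumOfSquares(b);
        // Assumption: similarity with an all-zero or empty vector is defined as 0
        // here, which avoids a division by zero.
        if (normA == 0.0 || normB == 0.0) {
            return 0.0;
        }
        return dotProduct / Math.sqrt(normA * normB);
    }

    private static double sumOfSquares(Map<String, Double> vector) {
        double sum = 0.0;
        for (double value : vector.values()) {
            sum += value * value;
        }
        return sum;
    }
}
```

The method only reads its arguments and does no file parsing, so the ratings file needs to be loaded once, before any similarities are computed.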






answered Jun 5 '17 at 14:59 by flaz14





























