Trie structure using Hash in Ruby

up vote
0
down vote

favorite

This is a follow-up to Boggle board solver in Ruby

class Trie < Hash

  def build(string)

    string << '.' # Terminator indicating end of a complete word

    string.chars.inject(self) do |h, char|

      h[char] ||= {}

    end

  end

end



# Build the trie from all the words

@trie = Trie.new

File.foreach('wordlist.txt') { |w| @trie.build(w.chomp) }

Here I use a simplistic Trie structure based on a Hash in Ruby. It does the trick very nicely, but I am wondering whether there is a better way to implement this in Ruby that will consume less memory.
(It lacks lookup methods which is perfectly fine, because I track through the Trie along with the path through letters on a board - the '.' acts as a marker for a complete word).

When I load the file, which contains 270K+ words and is ~2.9MB in size, the memory consumption is over 188 megabytes.

Is there a way to reduce the memory consumption, perhaps, implementing this differently?

asked Sep 10 at 5:18

mydoghasworms

1214

bumped to the homepage by Community♦ yesterday

This question has answers that may be good or bad; the system has marked it active so that they can be reviewed.

add a comment |

up vote
0
down vote

favorite

This is a follow-up to Boggle board solver in Ruby

class Trie < Hash

  def build(string)

    string << '.' # Terminator indicating end of a complete word

    string.chars.inject(self) do |h, char|

      h[char] ||= {}

    end

  end

end



# Build the trie from all the words

@trie = Trie.new

File.foreach('wordlist.txt') { |w| @trie.build(w.chomp) }

When I load the file, which contains 270K+ words and is ~2.9MB in size, the memory consumption is over 188 megabytes.

Is there a way to reduce the memory consumption, perhaps, implementing this differently?

asked Sep 10 at 5:18

mydoghasworms

1214

bumped to the homepage by Community♦ yesterday

This question has answers that may be good or bad; the system has marked it active so that they can be reviewed.

add a comment |

up vote
0
down vote

favorite

This is a follow-up to Boggle board solver in Ruby

class Trie < Hash

  def build(string)

    string << '.' # Terminator indicating end of a complete word

    string.chars.inject(self) do |h, char|

      h[char] ||= {}

    end

  end

end



# Build the trie from all the words

@trie = Trie.new

File.foreach('wordlist.txt') { |w| @trie.build(w.chomp) }

When I load the file, which contains 270K+ words and is ~2.9MB in size, the memory consumption is over 188 megabytes.

Is there a way to reduce the memory consumption, perhaps, implementing this differently?

asked Sep 10 at 5:18

mydoghasworms

1214

This is a follow-up to Boggle board solver in Ruby

class Trie < Hash

  def build(string)

    string << '.' # Terminator indicating end of a complete word

    string.chars.inject(self) do |h, char|

      h[char] ||= {}

    end

  end

end



# Build the trie from all the words

@trie = Trie.new

File.foreach('wordlist.txt') { |w| @trie.build(w.chomp) }

When I load the file, which contains 270K+ words and is ~2.9MB in size, the memory consumption is over 188 megabytes.

Is there a way to reduce the memory consumption, perhaps, implementing this differently?

ruby memory-optimization trie

asked Sep 10 at 5:18

mydoghasworms

1214

asked Sep 10 at 5:18

mydoghasworms

1214

asked Sep 10 at 5:18

mydoghasworms

1214

asked Sep 10 at 5:18

mydoghasworms

1214

asked Sep 10 at 5:18

mydoghasworms

1214

bumped to the homepage by Community♦ yesterday

This question has answers that may be good or bad; the system has marked it active so that they can be reviewed.

bumped to the homepage by Community♦ yesterday

This question has answers that may be good or bad; the system has marked it active so that they can be reviewed.

add a comment |

1 Answer
1

active

oldest

votes

up vote
0
down vote

Bloom Filters

I suggest a new data structure, specifically a Bloom Filter. A bloom filter is a boolean array of length m. You will also need k different hashing algorithms (more on m and k in a moment). Each hash should accept complete word and return a number 0 through m.

A bloom filter is a big hashing data structure that returns one of two results when checking if an item is in the hash:
1) The item is definitely not in the data structure
2) The item is maybe in the data structure

To use the bloom filter, you will load it with every word in your list. Specifically, for every word, load the first two letters, then first three, then so on, until the whole word is loaded. I like the idea of using a period to mark the end of the word, so do that too. For example, for the word 'cart', you would load your filter with ['ca', car', 'cart'].

Then, on the boggle array, you can do an exhaustive DFS search fromevery starting position. if the letter combination (with or without period) is in the list, then you keep branching and searching. If the combination isn't in the list, you choose a different path, etc.

Simple concrete example

Here is a simple example of a bloom filter being used. Let's set m=100 (array of 100 booleans, initialized to true) and k=2 (use two different hashing algorithms).

When hashing the word 'cat', hash1 might return 7 and hash2 might return 36, so you set indexes 7 and 36 to false. When hashing the word 'dog', hash1 might return 7 and hash2 might return 70, so those indices are set to false. Next, we check to see if 'fish' is in the array. Hash1 returns 80 and hash2 returns 7. Because all the indices are not false, we know that 'fish' is definitely not in the array. Finally, we check if 'bird' is in the array. Hash1 returns 70 and Hash2 returns 7. 'Bird' might be in the array and we will have to check the actual wordlist to confirm.

Some notes

1) You should be able to search using only the bloom filter until you are checking for a complete word (For example when checking for 'car.', and not checking 'car' as a substring of 'cart.' or 'carriage.'). Otherwise if the bloom filter says it's probably in the list, just assume it's true.

2) Depending on how thorough your edge cases are, it may be possible to occasionally miss a possible word depending on the hash algoirthms chosen, size of filter, etc. In general, I would say that while you are not 100% guaranteed to find the highest scoring word, you will definitely find a high scoring word.
3) The bloom filter here is chosen to reduce the memory space of the search. The running time may very well be longer than the trie structure.
4) When confirming a word exist in your word list, you should, of course, use a binary search.

Choosing m and k

There is some math involved in choosing m and k. I'll spare you the finer details and suggest you look at wiki section about choosing optimal sizes.

answered Oct 4 at 14:37

Zack

1,736312

add a comment |

Your Answer

StackExchange.ifUsing("editor", function () {
return StackExchange.using("mathjaxEditing", function () {
StackExchange.MarkdownEditor.creationCallbacks.add(function (editor, postfix) {
StackExchange.mathjaxEditing.prepareWmdForMathJax(editor, postfix, [["\$", "\$"]]);
});
});
}, "mathjax-editing");

StackExchange.ifUsing("editor", function () {
StackExchange.using("externalEditor", function () {
StackExchange.using("snippets", function () {
StackExchange.snippets.init();
});
});
}, "code-snippets");

StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "196"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});

function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
convertImagesToLinks: false,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: null,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});

}
});

draft saved

draft discarded

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fcodereview.stackexchange.com%2fquestions%2f203441%2ftrie-structure-using-hash-in-ruby%23new-answer', 'question_page');
}
);

Post as a guest

Name

Required, but never shown

1 Answer
1

active

oldest

votes

1 Answer
1

active

oldest

votes

up vote
0
down vote

Bloom Filters

Simple concrete example

Here is a simple example of a bloom filter being used. Let's set m=100 (array of 100 booleans, initialized to true) and k=2 (use two different hashing algorithms).

Some notes

Choosing m and k

There is some math involved in choosing m and k. I'll spare you the finer details and suggest you look at wiki section about choosing optimal sizes.

answered Oct 4 at 14:37

Zack

1,736312

add a comment |

up vote
0
down vote

Bloom Filters

Simple concrete example

Here is a simple example of a bloom filter being used. Let's set m=100 (array of 100 booleans, initialized to true) and k=2 (use two different hashing algorithms).

Some notes

Choosing m and k

There is some math involved in choosing m and k. I'll spare you the finer details and suggest you look at wiki section about choosing optimal sizes.

answered Oct 4 at 14:37

Zack

1,736312

add a comment |

up vote
0
down vote

Bloom Filters

Simple concrete example

Here is a simple example of a bloom filter being used. Let's set m=100 (array of 100 booleans, initialized to true) and k=2 (use two different hashing algorithms).

Some notes

Choosing m and k

There is some math involved in choosing m and k. I'll spare you the finer details and suggest you look at wiki section about choosing optimal sizes.

answered Oct 4 at 14:37

Zack

1,736312

Bloom Filters

Simple concrete example

Here is a simple example of a bloom filter being used. Let's set m=100 (array of 100 booleans, initialized to true) and k=2 (use two different hashing algorithms).

Some notes

Choosing m and k

There is some math involved in choosing m and k. I'll spare you the finer details and suggest you look at wiki section about choosing optimal sizes.

answered Oct 4 at 14:37

Zack

1,736312

answered Oct 4 at 14:37

Zack

1,736312

answered Oct 4 at 14:37

Zack

1,736312

answered Oct 4 at 14:37

Zack

1,736312

add a comment |

draft saved

draft discarded

Thanks for contributing an answer to Code Review Stack Exchange!

Please be sure to answer the question. Provide details and share your research!

But avoid …

Asking for help, clarification, or responding to other answers.

Making statements based on opinion; back them up with references or personal experience.

Use MathJax to format equations. MathJax reference.

To learn more, see our tips on writing great answers.

Some of your past answers have not been well-received, and you're in danger of being blocked from answering.

Please pay close attention to the following guidance:

Please be sure to answer the question. Provide details and share your research!

But avoid …

Asking for help, clarification, or responding to other answers.

Making statements based on opinion; back them up with references or personal experience.

To learn more, see our tips on writing great answers.

draft saved

draft discarded

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Name

Required, but never shown

Name

Required, but never shown

This page is only for reference, If you need detailed information, please check here

搜尋此網誌

Cfrtjryk