#### Howdy, Stranger!

It looks like you're new here. If you want to get involved, click one of these buttons!

Options

# Blog - Linear maps that fill pigeon holes

Hi. Over at Blog - Linear maps that fill pigeon holes there's most of an article that might be suitable for the Azimuth blog. It's not yet in the stages appropriate for final copy editing, but it's hopefully in a pretty much readable state. I still need to produce some final graphs for the results section and polish some of the phrasing, but other than that it should be reasonably complete. (There's a couple of formatting glitches I need to figure out how to fix, and the references are (((1))) style because I'm working off a master version in LaTeX at this stage and those don't get screwed up by the conversion software; when complete they'll be changed to [1] style references.)

If anyone feels like reading it and letting me know of any bits that are incomprehensible, etc, that'd be much appreciated.

• Options
1.

Didn't John say he avoids creating clever titles in favor of a descriptive name that can be used for searching? It looks like you are describing a hashing technique, so maybe hashing should be in the title as well.

I remember incorporating a hashing algorithm when I wrote a Prolog interpreter years ago. I recall that it wasn't important to get anywhere near a perfect hash key the first time, because it would just go into a linear linked list search if a collision occurred, and that was efficient enough.

Comment Source:Didn't John say he avoids creating clever titles in favor of a descriptive name that can be used for searching? It looks like you are describing a hashing technique, so maybe hashing should be in the title as well. I remember incorporating a hashing algorithm when I wrote a Prolog interpreter years ago. I recall that it wasn't important to get anywhere near a perfect hash key the first time, because it would just go into a linear linked list search if a collision occurred, and that was efficient enough.
• Options
2.
edited September 2016

Thanks WebHubTel: it may be that John said that during a period when I wasn't reading the forum. I'll try to find another title.

Regarding the hashing question: it's a bit tricky because, while I want to point out that this mathematical problem arises in real-world problems I also don't really want to have things get too easily sidetracked into a discussion of alternative methods of hashing rather than alternative approaches to the mathematical problem discussed here.

In terms of general hashing vs perfect hashing, they're appropriate in different situations and there's lots of active research on both general hashing and perfect hashing. In the situations where it's able to be used (which isn't everywhere), perfect hashing has some very nice properties particularly as you scale to huge amounts of data (100s of GB, even terabytes), such as the ability to do sorting via a linear-time counting sort rather than needing to do general comparison based sorting.

Comment Source:Thanks WebHubTel: it may be that John said that during a period when I wasn't reading the forum. I'll try to find another title. Regarding the hashing question: it's a bit tricky because, while I want to point out that this mathematical problem arises in real-world problems I also don't really want to have things get too easily sidetracked into a discussion of alternative methods of hashing rather than alternative approaches to the mathematical problem discussed here. In terms of general hashing vs perfect hashing, they're appropriate in different situations and there's lots of active research on both general hashing and perfect hashing. In the situations where it's able to be used (which isn't everywhere), perfect hashing has some very nice properties particularly as you scale to huge amounts of data (100s of GB, even terabytes), such as the ability to do sorting via a linear-time counting sort rather than needing to do general comparison based sorting.
• Options
3.

Here was the example that you missed: https://forum.azimuthproject.org/discussion/comment/15422/#Comment_15422

Maybe your title isn't too bad but I didn't get the pigeon-holing double meaning at first. I think I get it, as hashing is a pigeon-holing of information?

Comment Source:Here was the example that you missed: https://forum.azimuthproject.org/discussion/comment/15422/#Comment_15422 Maybe your title isn't too bad but I didn't get the pigeon-holing double meaning at first. I *think* I get it, as hashing is a pigeon-holing of information? 
• Options
4.
edited October 2016

Thanks.

I was thinking by analogy with the physical "pigeon hole" within a box of pigeon-holes that an individual may have for their mail or a university course might have for feedback forms. The pigeon hole for "Dr Dirichlet" is, say, the 17th one in the block in the same way one would say that the perfect hash assigns hash code 17 to the string "Dr Dirichlet".

Comment Source:Thanks. I was thinking by analogy with the physical "pigeon hole" within a box of pigeon-holes that an individual may have for their mail or a university course might have for feedback forms. The pigeon hole for "Dr Dirichlet" is, say, the 17th one in the block in the same way one would say that the perfect hash assigns hash code 17 to the string "Dr Dirichlet".
• Options
5.
edited September 2016

Nice!

I took the liberty of adding a couple of paragraph breaks to the introduction, which to me improves flow. Revert if you prefer the original.

I suggest adding a couple of sentences at the end of the intro, to the effect of:

• All these problems can be abstracted to a vector-processing problem. In the book title example, each title gets represented by a vector of integers, one per character.

• These techniques have application to a technique known in computer science as perfect hashing, in which hashing gets performed in guaranteed constant time.

On a first, cursory, reading, I found the sections "Extending solutions to satisfy more equations" and "A concrete procedure," to be fairly dense. I will have to reread and stare at it again. I would revisit this, to see how you can help the reader along there.

The references sound interesting. Can you edit them so that the links work.

Overall: good work. Thanks for contributing.

Comment Source:Nice! I took the liberty of adding a couple of paragraph breaks to the introduction, which to me improves flow. Revert if you prefer the original. I suggest adding a couple of sentences at the end of the intro, to the effect of: * All these problems can be abstracted to a vector-processing problem. In the book title example, each title gets represented by a vector of integers, one per character. * These techniques have application to a technique known in computer science as perfect hashing, in which hashing gets performed in *guaranteed* constant time. On a first, cursory, reading, I found the sections "Extending solutions to satisfy more equations" and "A concrete procedure," to be fairly dense. I will have to reread and stare at it again. I would revisit this, to see how you can help the reader along there. The references sound interesting. Can you edit them so that the links work. Overall: good work. Thanks for contributing. 
• Options
6.

Sorry I've been so slow about looking at the Forum. Within the last 2 weeks I came back from Singapore, did a lot of household repairs---and then classes started and so did the CASCADE project. If anyone ever gets impatient about a slow reply from me, please feel free to send me an email. I think y'all know my address.

I indeed advocate dull, precisely descriptive titles for blog articles, like "The Circular Electron Positron Collider", "Complex Adaptive System Design", and so on. The idea is that one should know exactly what one is going to read about.

Comment Source:Sorry I've been so slow about looking at the Forum. Within the last 2 weeks I came back from Singapore, did a lot of household repairs---and then classes started and so did the [CASCADE project](https://johncarlosbaez.wordpress.com/2016/10/02/complex-adaptive-system-design-part-1/#comment-83908). If anyone ever gets impatient about a slow reply from me, please feel free to send me an email. I think y'all know my address. I indeed advocate dull, precisely descriptive titles for blog articles, like "The Circular Electron Positron Collider", "Complex Adaptive System Design", and so on. The idea is that one should know exactly what one is going to read about. 
• Options
7.

Thanks. I'm still working on this, but I've been adding (hopefully illuminating) animations that take a lot longer to make than additional words. (I still seem to be hitting the old issue where the wiki thinks a link to an uploaded illustration points to something that's not there when it is; all the linked illustrations do exist.) Hopefully I'll get a "ready for serious copy-editing" version by the the weekend. Whenever that is I'll announce it here and also drop John a personal email.

I'll change the title to something else, but I'm leaving it as is for the moment just so this discussion stays associated with it.

Comment Source:Thanks. I'm still working on this, but I've been adding (hopefully illuminating) animations that take a lot longer to make than additional words. (I still seem to be hitting the old issue where the wiki thinks a link to an uploaded illustration points to something that's not there when it is; all the linked illustrations do exist.) Hopefully I'll get a "ready for serious copy-editing" version by the the weekend. Whenever that is I'll announce it here and also drop John a personal email. I'll change the title to something else, but I'm leaving it as is for the moment just so this discussion stays associated with it.
• Options
8.
edited October 2016

Dave Tanzer: I constructed a reply to your comments in my head when I read them, but I see I clearly never actually got around to typing it up into the forum! (I spend a lot time accessing the internet on trains on my smartphone these days.) So apologies for the delay in replying, and many thanks for the comments.

I've kept your breaks in the introduction. I'm working on trying to make the sections on extending the solution more apparent. In terms of the concrete recipe section, it's partly there because I get a little bit annoyed when I read a document that claims to describe an algorithm but doesn't include the details to actually implement what the author did. So it's partly there just so I don't do that; I'm not sure if expanding it with further words is worthwhile?

Comment Source:Dave Tanzer: I constructed a reply to your comments in my head when I read them, but I see I clearly never actually got around to typing it up into the forum! (I spend a lot time accessing the internet on trains on my smartphone these days.) So apologies for the delay in replying, and many thanks for the comments. I've kept your breaks in the introduction. I'm working on trying to make the sections on extending the solution more apparent. In terms of the concrete recipe section, it's partly there because I get a little bit annoyed when I read a document that claims to describe an algorithm but doesn't include the details to actually implement what the author did. So it's partly there just so I _don't_ do that; I'm not sure if expanding it with further words is worthwhile?
• Options
9.

Hi, David! Again, I'm sorry to be slow about checking in here these days; just send me an email if you want something done quick. Your article looks okay except for some typos and two "notes to self":

EXPLANATION OF HOW NUMBER OF COMPONENTS INCREASES IN SPURTS

and this:

TODO: point out how we’re choosing the RHS elements, not being given them

As far as I'm concerned you can either include them or leave them out.

I'll give your article a careful read when I enter it into the Azimuth blog, and I may have more questions at this time.

The title does not, on reflection, seem too "cute", though I think the usual spelling is "pigeonhole", not "pigeon hole".

Comment Source:Hi, David! Again, I'm sorry to be slow about checking in here these days; just send me an email if you want something done quick. Your article looks okay except for some typos and two "notes to self": > EXPLANATION OF HOW NUMBER OF COMPONENTS INCREASES IN SPURTS and this: > TODO: point out how we’re choosing the RHS elements, not being given them As far as I'm concerned you can either include them or leave them out. I'll give your article a careful read when I enter it into the Azimuth blog, and I may have more questions at this time. The title does not, on reflection, seem too "cute", though I think the usual spelling is "[pigeonhole](http://www.merriam-webster.com/dictionary/pigeonhole)", not "pigeon hole". 
• Options
10.

Hi John. The article has been hit been hit by the trifecta of issues: * a certain perfectionism in the desired result graphs * generating plots is harder work than writing * whenever I get close to finishing something like this, there's always a panic at work.

I'll try to get those final couple of points finished off by the end of this week and let you know both on the forum and by direct email.

Comment Source:Hi John. The article has been hit been hit by the trifecta of issues: * a certain perfectionism in the desired result graphs * generating plots is harder work than writing * whenever I get close to finishing something like this, there's always a panic at work. I'll try to get those final couple of points finished off by the end of this week and let you know both on the forum and by direct email.
• Options
11.

Okay... it's been a month so far. Drop the perfectionism! Send me an email.

Comment Source:Okay... it's been a month so far. Drop the perfectionism! Send me an email.