I talked with Connie Reece (@conniereece) today about Tweeterboard and gave away some information I haven’t talked about before. You can listen to the podcast and pick up a few details on her blog.

The top-secret algorithmic mojo 

The one thing I mentioned that I haven’t talked about anywhere is how the top 100 algorithm works.  It’s pretty simple, actually: it’s PageRank.  I implemented it in the same way it’s described in the original paper. (I’m no computer scientist, but I’m pretty sure I’ve got it right. Also, I’m hoping that I don’t get a cease and desist for patent infringement.)

The interesting thing for me was applying a solution that works well in one domain (web search) to a completely different one (Twitter conversations). And I was happy to see that it confirmed my intuitions about influence within Twitter: basically, that @ replies are the most useful metric so far for understanding who is influential.
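For anyone curious what that looks like in practice, here is a rough Python sketch of PageRank computed by power iteration over a graph of @ replies. It is not Tweeterboard's actual code. The sample data, the `pagerank` function name, the 0.85 damping factor, the iteration count, and the choice to treat each reply as a "link" from the sender to the recipient are all assumptions made for illustration. It also uses the normalized form of the formula, so the ranks sum to 1.

```python
def pagerank(replies, damping=0.85, iterations=50):
    """Rank users by power iteration over a graph of @ replies.

    `replies` maps each sender to the list of users they replied to.
    This structure is hypothetical; each reply is treated like a web link.
    """
    # Collect every user who sends or receives a reply.
    users = set(replies)
    for targets in replies.values():
        users.update(targets)
    n = len(users)

    # Start with rank spread evenly across all users.
    rank = {u: 1.0 / n for u in users}

    for _ in range(iterations):
        # Every user gets a small baseline share (the "random surfer" jump).
        new_rank = {u: (1.0 - damping) / n for u in users}
        for sender in users:
            targets = replies.get(sender, [])
            if targets:
                # Split the sender's rank evenly among the people they replied to.
                share = damping * rank[sender] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Users who reply to no one spread their rank across everybody.
                share = damping * rank[sender] / n
                for u in users:
                    new_rank[u] += share
        rank = new_rank
    return rank


# Made-up sample data: who replied to whom.
sample = {
    "alice": ["bob", "carol"],
    "bob": ["carol"],
    "carol": ["alice"],
    "dave": ["carol", "alice"],
}
ranks = pagerank(sample)
top = sorted(ranks, key=ranks.get, reverse=True)
print(top)
```

In this toy data nobody replies to dave, so he ends up at the bottom no matter how many replies he sends, which matches the intuition that receiving @ replies is what signals influence.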

Anyway, listen to Connie’s podcast for more details.

Last week I changed how Tweeterboard polls (the fancy word for capturing and processing) your tweets. This is mainly an under-the-hood change, but a few of you might notice some changes in your stats. If you do, please let me know.

Hashtags

I also started tracking hashtags. Hashtags are a simple way of tagging your tweets. You place the hash mark (#) in front of the word or words you want to tag (e.g. #sxsw, #microformats, #yahoo), and then sites like Terraminds and Hashtags.org, and eventually Tweeterboard, can aggregate them. [*]

Right now Tweeterboard captures any alphanumeric tag. If you want to use multi-word tags, you can separate the words with a plus sign (+) or underscore (_). All tags are normalized before they’re stored, which means they’re converted to lowercase with pluses and underscores removed. So #san_francisco ends up being the same as #sanfrancisco, and #new+york+city becomes #newyorkcity.
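The normalization described above can be sketched in a few lines of Python. This is an illustration only, not Tweeterboard's code: the `TAG_PATTERN` regular expression and the `normalize_tag` and `extract_tags` names are assumptions.

```python
import re

# Assumed pattern: a hash mark followed by letters, digits, "+" or "_".
TAG_PATTERN = re.compile(r"#([A-Za-z0-9_+]+)")


def normalize_tag(tag):
    """Lowercase a tag and remove the plus and underscore separators."""
    return tag.lower().replace("+", "").replace("_", "")


def extract_tags(tweet):
    """Return the normalized form of every tag found in a tweet."""
    return [normalize_tag(tag) for tag in TAG_PATTERN.findall(tweet)]


print(extract_tags("Heading to #San_Francisco then #new+york+city for #SXSW"))
# ['sanfrancisco', 'newyorkcity', 'sxsw']
```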

This is a preliminary implementation so I can see what the data looks like. I’ll eventually show your tags, and let you find people and URLs by tags. But for now I have a couple of minor technical details to figure out.

Just for fun, here are a few of the most popular tags since I’ve been tracking:

  • ipan – 226 mentions
  • 8217 – 142
  • 1 – 114
  • ip2 – 79
  • cparty – 78
  • 2 – 63
  • lift08 – 50
  • 8230 – 40
  • socon08 – 38
  • barcampmd – 35
  • 3 – 34
  • 4 – 30
  • twitter – 28
  • supertuesday – 28
  • music – 28
  • caprimary – 27
  • myspace – 27

You can probably guess what those technical details are. Anyway, if you know your way around PHP and regular expressions and you have some advice, please add a comment.
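One guess at what is going on, offered as a sketch rather than a diagnosis: tags like 8217 and 8230 look like the numeric parts of the HTML entities &#8217; (a curly apostrophe) and &#8230; (an ellipsis), and the bare digits probably come from things like "#1". The example below is in Python rather than PHP, and the pattern and the "a tag needs at least one letter" rule are assumptions, but the same idea (decode entities first, then match) should carry over to PHP's `html_entity_decode` and `preg_match_all`.

```python
import html
import re

# Assumed rules:
#  - the "#" must not follow "&" or a word character (skips leftover entities)
#  - the tag must contain at least one letter (skips "#1", "#2", ...)
TAG_PATTERN = re.compile(r"(?<![&\w])#(?=[0-9_+]*[A-Za-z])([A-Za-z0-9_+]+)")


def extract_tags(tweet):
    """Decode HTML entities, then return normalized tags."""
    # "&#8217;" becomes a real apostrophe, so its digits are never seen as a tag.
    text = html.unescape(tweet)
    return [
        tag.lower().replace("+", "").replace("_", "")
        for tag in TAG_PATTERN.findall(text)
    ]


print(extract_tags("It&#8217;s my #1 pick for #lift08&#8230; see #super_tuesday"))
# ['lift08', 'supertuesday']
```

Requiring a letter is a design choice with a cost: it would also drop purely numeric tags that someone used on purpose.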

* In hindsight, I probably should’ve written about hashtags in my book.

A few people have commented on how Tweeterboard handles capitals. Just the other day nateritter said, “fix your capitalization problems. They’re the same people, sheesh.” And then a few hours ago, yndygo said, “you show up twice on my Tweeterboard stats – once as ‘maxweb’ and once as ‘MaxWeb’ – gotta love case sensative apps.”

A case-sensitive app? I like to think I’m smarter than that, but I had to hunt through the database and code to understand what was happening. The good news is that Tweeterboard doesn’t treat “MaxWeb” and “maxweb” as different users. However, it does store that user name in a couple of different places. In the users table, it’s “MaxWeb,” which is how it’s spelled on Twitter. In the table that records conversations between users, it’s “maxweb” because the function that extracts user names from tweets converts everything to lowercase.

On yndygo’s profile, you see “MaxWeb” under the “Gets Love” tab because that name is pulled from the users table. Under the “Gives Love” tab, the name is pulled from the conversations table, so it’s “maxweb.” If Tweeterboard were truly case sensitive, “MaxWeb” and “maxweb” would be treated as separate user names and would appear in both places. But they don’t. I confirmed that with a bunch of manual database inspection so, while I wouldn’t bet my life on it, I’m pretty sure that’s the case. Like 99%.

In the next day or so I’m going to tidy things up so all user names appear in lowercase. (I’ll leave it up for now so you can see the differences I point out above.)
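As a rough illustration of why the two spellings are still one user, here is a small Python sketch. The lists and the `count_by_user` name are made up for the example and don't reflect Tweeterboard's real tables. Lowercasing every name before counting or displaying it means “MaxWeb” and “maxweb” always collapse into a single entry.

```python
from collections import Counter

# Hypothetical rows: the users table keeps Twitter's spelling,
# while the conversations table stores lowercase names.
gets_love = ["MaxWeb", "MaxWeb"]
gives_love = ["maxweb"]


def count_by_user(names):
    """Count mentions per user, ignoring capitalization."""
    return Counter(name.lower() for name in names)


totals = count_by_user(gets_love + gives_love)
print(totals)
# Counter({'maxweb': 3})
```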

I thought it would be helpful to have a blog.  I can push announcements out via Twitter, but it’s hard to discuss the merits of this feature vs. that in 140 characters.  I’m sure I won’t post much (there’s a lot of code to tidy up), but this will be the place for announcements, discussions, new features, etc.