Category Archives: Search Engines

Short URLs Re-defining SEO

It’s conventional search engine optimization wisdom that URLs should contain words, separated by either dashes or underscores. This approach improves the readability of the URL – making it more usable for people while simultaneously giving internet robots something to work on.

But with people sharing URLs within places – like Twitter and Facebook (and … and … and …) – places with a default social context, we’re seeing a URL’s context trump its readability as a significant usability factor.

Who is sharing and how they describe what they’re sharing is more important than the readability of the shared URL itself.

That leaves search engine robots either blocked out completely (disallow, nofollow, etc.) or piecing together a pile of redirect URLs (which may or may not exist tomorrow, e.g. RE07.US).

Additionally, the share-er pays for each URL with their social capital. ‘Good’ URLs (as deemed by each individual follower) raise the share-er’s capital while ‘bad’ URLs lower it.

Throw in the proliferation of other difficult-to-index assets like images and video – and we’re talking about an internet that’s not Search Engine Optimized, but Social Engagement Optimized.

Search Engines Not Following

There are two semantic phenomena made prominent with the advent of tagging:

  1. We use related words to describe a concept wrapped in a point-of-view.
    If memory serves, in his Ontology is Overrated presentation Clay Shirky uses “film”, “movies”, and “cinema” as an example. Each of these words describes similar, but different things.
  2. We use the same word to describe vastly different concepts.
    Take “java”, for example: it could refer to coffee, code, or an island.

Surprisingly, Google, Yahoo, and MSN haven’t yet connected people with their points of view. Dave Winer suggests:

“Let me tell [search engines] where my weblog is. Then it knows what my interests are. Give me search results relevant to who I am.”

This reminds me of something I wrote on why Google AdSense doesn’t work.

Whether it’s the words in this blog or other sites I’ve read – I’m implicitly declaring context and point-of-view every time the browser refreshes. Then it goes straight down the memory hole.

(What I’ve looked at before + What I’m looking at now) / What you’re trying to tell me = Targeted ad

Excuse me, could I get a refill on the Attention Kool-aid?

Aggregation Not Adding Value?

Splogs, or spam-blogs, are a problem I’ve touched on before. I find them annoying, and whenever Technorati points me to something smelling sploggy, I hit my SplogReporter bookmarklet.

My criteria for splog:

  • whole-cloth copying of another weblog’s post
  • minimal or nonexistent attribution to the original authors and weblogs
  • no explicit “we’re aggregating these sites” messaging
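Here is a rough sketch of how those three criteria might be turned into signals a robot could check. Everything in it is my own assumption for illustration: the function name `splog_signals`, the 0.9 threshold, the word-level comparison, and the crude "does the site's blurb mention aggregating" test are all hypothetical, and how to combine the signals is left to whoever is calling it.

```python
from difflib import SequenceMatcher


def splog_signals(candidate_text, original_text, original_url,
                  original_author, site_blurb="", copy_threshold=0.9):
    """Return three rough signals for a suspected copy of a post.

    All names and the threshold are illustrative assumptions,
    not an established splog-detection method.
    """
    original_words = original_text.split()
    candidate_words = candidate_text.split()

    # Fraction of the original's words that reappear, in order,
    # in the candidate. autojunk=False keeps common words in play.
    matcher = SequenceMatcher(None, original_words, candidate_words,
                              autojunk=False)
    matched = sum(block.size for block in matcher.get_matching_blocks())
    fraction = matched / len(original_words) if original_words else 0.0

    return {
        # Criterion 1: whole-cloth copying.
        "copied": fraction >= copy_threshold,
        # Criterion 2: some attribution to the original author or post.
        "credited": (original_url in candidate_text
                     or original_author in candidate_text),
        # Criterion 3: the site says it is an aggregator (crude test).
        "declares_aggregation": "aggregat" in site_blurb.lower(),
    }
```

A post that comes back copied, uncredited, and undeclared smells sploggy to me; anything else deserves a human look before reporting.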

RSS makes it real easy to communicate with readers frequently and automatically – and real easy for robots to make splogs. Simply subscribing to an RSS feed isn’t “content theft” – doing so and not explicitly crediting the original site/author is. Absolutely. No Question.

I can appreciate Mark Cuban’s position that “a search on any blog engine should uncover the unique content on their original source” – not any of the derivatives. The lack of this strictness is why splogs exist anyway. I don’t agree with his position that aggregation doesn’t add value. Aggregation is a very simple way to provide value – Bloglines, Yahoo, and Google have based a number of products on that belief. To me, aggregation and search are two ways of answering the same problem. The trick is to know who’s the aggregator and who’s the source when the aggregator is being dishonest.

When I’m pulling together some feeds for an aggregator, say PodcastMN or MNRep, I use the link – or preferably the guid – element in RSS to point back to the original author. Upon reviewing the spec while writing this post, it looks like the source element exists “to propagate credit for links, to publicize the sources of news items.”

Makes sense – and I’ve just added that tag to the aggregators. Seems to me that being strict about RSS tags first and checking sources second is a useful way to fight splogs and un-attributed content aggregation.
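For the curious, here is a minimal sketch of what carrying attribution through an aggregator could look like, using Python's standard `xml.etree.ElementTree`. It is not the code behind PodcastMN or MNRep; the function name `attribute_item` and the example URLs are made up for illustration. It assumes RSS 2.0, where the source element holds the originating feed's title as text and that feed's address in a `url` attribute.

```python
import xml.etree.ElementTree as ET


def attribute_item(original_item, feed_title, feed_url):
    """Copy an RSS <item>, keeping link/guid and adding a <source> credit.

    feed_title and feed_url describe the feed the item came from.
    (Hypothetical helper, for illustration only.)
    """
    item = ET.Element("item")

    # Carry over the elements that point back to the original post.
    for tag in ("title", "link", "guid", "description"):
        child = original_item.find(tag)
        if child is not None:
            copy = ET.SubElement(item, tag, child.attrib)
            copy.text = child.text

    # RSS 2.0 <source>: text is the source feed's title,
    # the url attribute points at the source feed itself.
    source = ET.SubElement(item, "source", {"url": feed_url})
    source.text = feed_title
    return item


# Example with made-up values:
original = ET.fromstring(
    "<item>"
    "<title>Post</title>"
    "<link>http://example.com/post</link>"
    '<guid isPermaLink="true">http://example.com/post</guid>'
    "</item>"
)
credited = attribute_item(original, "Example Blog",
                          "http://example.com/rss.xml")
print(ET.tostring(credited, encoding="unicode"))
```

The point of keeping guid untouched is that anyone comparing feeds later can still match the aggregated item to its original.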