Cullect was originally built during a time of stagnation within feed readers. Since it went down, sites like Facebook and Twitter have increased the number of people comfortable with the noisiness of real-time publishing and processing. Simultaneously, the innovation those services once provided has also stagnated. Additionally, many of the services Cullect integrated with are also down for the count.
Though Cullect.com is down, the Cullect engine is still being actively worked on; it's currently powering RealTimeAds.com.
If you’re wondering – no, I haven’t used another feed reader since Cullect.
In my work to bring Cullect back from hiatus, I’ve been doing a full code review and asking myself what should stay, what should be fixed, and what should go.
A number of the services Cullect originally integrated with no longer exist (Ma.gnol.ia, for example). Cullect had fairly deep Twitter integration (at the time), but that seems far less useful or valuable today.
Importance is difficult to discern with a 5-minute half-life.
Adding to that, I've got another project with fairly significant Twitter integration, and I'm just not terribly interested in building it out. Nor am I seeing the demand for it.
Use Your Own Domain Name
At Culld.us, you get a subdomain, like grv.culld.us; to use your own domain, just point a CNAME record from it to your subdomain.
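For example, if you want short URLs on go.example.com, the DNS entry would look something like this (go.example.com is a placeholder for your own hostname):

    ; hypothetical entry in your domain's zone file
    go.example.com.    IN    CNAME    grv.culld.us.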
Use Your Own Web Analytics
Get the statistics on your short URLs in your existing web analytics package, whether it's Google Analytics or something else: just paste the tracking code into your subdomain's settings.
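For Google Analytics, that tracking code is the standard asynchronous snippet (swap your own account ID in for the placeholder):

    <script type="text/javascript">
      var _gaq = _gaq || [];
      _gaq.push(['_setAccount', 'UA-XXXXXXX-X']);  // your Analytics account ID
      _gaq.push(['_trackPageview']);
      (function() {
        // load ga.js asynchronously so it doesn't block page rendering
        var ga = document.createElement('script');
        ga.type = 'text/javascript';
        ga.async = true;
        ga.src = ('https:' == document.location.protocol ? 'https://ssl' : 'http://www') + '.google-analytics.com/ga.js';
        var s = document.getElementsByTagName('script')[0];
        s.parentNode.insertBefore(ga, s);
      })();
    </script>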
.htaccess and Archival Feeds
If you want to leave Culld.us, you can take your redirects with you. Anytime you want, you can grab the .htaccess file, containing your shortened URLs and the web pages they redirect to, and upload it to your own server.
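The exported file would look something like this, as standard Apache redirect rules (an illustrative excerpt; the short codes and targets are made up):

    # hypothetical excerpt of an exported Culld.us .htaccess file
    Redirect 301 /a1b2 http://example.com/some-long-article
    Redirect 301 /c3d4 http://example.com/another-page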
You can also grab the RSS or JSON feeds.
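The JSON feed might look something like this (a sketch only; the field names here are illustrative, not the actual format):

    [
      {
        "short_url": "http://grv.culld.us/a1b2",
        "target_url": "http://example.com/some-long-article",
        "created_at": "2010-03-14T20:18:00Z",
        "clicks": 42
      }
    ]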
Fully Customized Stylesheet
Anything you can change in CSS can be changed in your Culld.us subdomain.
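For instance (the selectors here are illustrative; the actual class names depend on the page's markup):

    /* hypothetical overrides for a Culld.us subdomain */
    body        { font-family: Georgia, serif; background: #fafafa; }
    a.short-url { color: #c00; text-decoration: none; }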
Anyone you authorize has access to add URLs to your Culld.us subdomain. Everyone gets their own login and API tokens.
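A sketch of what adding a URL through the API could look like, in Ruby (the endpoint and parameter names are assumptions, not the published API):

    require 'net/http'
    require 'uri'

    # hypothetical endpoint and parameter names -- not the published API
    uri = URI.parse('http://grv.culld.us/api/urls')
    response = Net::HTTP.post_form(uri,
      'token' => 'your-api-token',                       # per-user API token
      'url'   => 'http://example.com/some-long-article') # page to shorten
    puts response.body  # presumably the new short URL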
While both of them pointed in the right direction, I realized I was caching the wrong stuff in the wrong way.
Since then, I tried replicating the database: one database for writes, one for reads. Unfortunately, they got terribly out of sync, just making things worse. I turned off the second database and replaced it with another pack of mongrels (a much more satisfying use of the server anyway).
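For context, the split was wired up roughly like this (a sketch, not the actual code; :read_replica is a hypothetical entry in database.yml):

    # a minimal sketch of the abandoned read/write split.
    # Read-only queries went through a model bound to the replica;
    # everything else wrote to the master as usual.
    class ReadItem < ActiveRecord::Base
      set_table_name 'items'                # same table as Item
      establish_connection :read_replica    # replica defined in database.yml
    end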
Cullect has two very database-intensive processes: grabbing the items within a given reading list (of which calculating 'importance' is the most intensive) and parsing the feeds.
Both cause problems, for opposite reasons: the former is read- and sort-intensive, while the latter is write-intensive. Both can grind the website itself to a halt.
Over the last couple of weeks, I moved all the intensive tasks to a queue processed by Delayed_Job, and I'm caching the reading lists' items in a database table rather than in memcached.
Yes, this means every request is pulled from the cache, and an 'update reading list' job is put into the queue.
So far, this ‘stale while update’ approach is working far better than the previous approaches.
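In outline, the flow looks something like this (a sketch assuming Delayed_Job and a cached-items table; the class and method names are illustrative, not Cullect's actual code):

    # a sketch of the 'stale while update' flow -- illustrative names only
    class ReadingList < ActiveRecord::Base
      has_many :cached_items

      def items(view = :important)
        # 1. serve whatever is already cached, even if slightly stale
        cached = cached_items.find(:all, :conditions => {:view => view.to_s})
        # 2. queue a refresh; Delayed_Job works it off outside the request
        Delayed::Job.enqueue UpdateReadingListJob.new(id, view)
        cached
      end
    end

    # the job recalculates the expensive view and rewrites the cache table
    UpdateReadingListJob = Struct.new(:reading_list_id, :view) do
      def perform
        list = ReadingList.find(reading_list_id)
        list.recalculate_and_cache!(view)  # hypothetical heavy lifting
      end
    end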
BTW, as this practice confirms, the easiest way to increase the performance of your Ruby-based app is to give your database more resources.
After a couple of very rough weeks, I'm happy with where Cullect and its caching strategy are. It's slightly different from what I described last time. I've also added a slave DB to the mix since my last write-up. Overall, it feels more solid and is performing as well as or better than before.
In part 1, I laid out my initial approach to caching in Cullect.
It had some obvious deficiencies:
- It really only sped up the latest 20 (or so) items. That's fine if those items don't change frequently (i.e. /important vs. /latest) or if you only want the first 20 items (not the second 20).
- The hardest, most expensive database queries weren't being cached effectively (um, yes, that's kind of the purpose).
I just launched a second approach. It dramatically simplified the cache key (6 attributes, down from 10), and rather than caching the entire items under that key, I just stored pointer objects to them.
Unfortunately, even the collection of pointer objects was too big to store in the cache, so I tried a combination of putting a LIMIT on the database query and storing 20 items at a time in separate cache objects.
This second approach had the additional problem of continually presenting hidden/removed items (there were two layers of caching that needed to be updated).
Neither was a satisfactory performance improvement.
I've just launched a solution I'm pretty happy with, and it seems to be working (the cache store is updating as I write this).
Each Cullect.com reading list has four primary caches: important, latest, recommended, and hidden, with variants for keyword-search filters. Each of these primary caches is a string containing the IDs of all the items in that view; not the items themselves, nor any derivative objects, as both take up too much space.
When items are hidden, they’re removed from the appropriate cache strings as well.
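In pseudo-Ruby, the bookkeeping looks roughly like this (a sketch; read_cache and write_cache stand in for whatever the cache store is, and all the names are illustrative):

    # a sketch of the ID-string caches -- illustrative names only;
    # read_cache/write_cache are hypothetical helpers around the cache store
    def cache_view(reading_list, view, items)
      write_cache("list_#{reading_list.id}_#{view}", items.map { |i| i.id }.join(','))
    end

    def fetch_view(reading_list, view)
      ids = read_cache("list_#{reading_list.id}_#{view}").to_s.split(',')
      # hydrate on demand (re-ordering to match the cached sequence omitted)
      Item.find(:all, :conditions => {:id => ids})
    end

    # hiding an item strips its ID out of every affected cache string
    def hide_item(reading_list, item)
      %w(important latest recommended).each do |view|
        key = "list_#{reading_list.id}_#{view}"
        ids = read_cache(key).to_s.split(',') - [item.id.to_s]
        write_cache(key, ids.join(','))
      end
    end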