Liveblogging a Joule fix
Jun. 27th, 2009 06:31 pm
Ryan Tucker reported a bug in Joule. When a user has more than 5,000 followers, on some days Joule will throw a database error about a duplicate key. This is mysterious, since the keys come from a hash and should be unique. I thought I'd try liveblogging fixing it, in case anyone wanted to watch. Times are EST.
- 18:30: Can we replicate it in staging? I don't want to bring the real Joule down while I look for a fix.
- 18:39: Yes, astronautics on Twitter has >5,000 followers and causes Joule to exhibit the bug.
- 18:44: Okay, Joule is instrumented so it will dump the old and new lists to a file, plus what it thinks the changes are.
- 18:45: It failed again and I have the log file. Good! I hate when you set up debugging and it suddenly starts working.
- 18:48: Well, it's not because there are duplicates in the old or the new lists, so it must be a comparison error.
- 19:01: Fascinating. The new version of the comparison code is reporting one of the userids as both added and removed, which the DB constraints obviously won't allow. This didn't come up in testing...
- 19:24: It seems that when you have two users A and B, where A's name is a prefix of B's, and A unfriends you, the system gets confused and reports B as having both friended and unfriended you. Fixing now.
- 19:31: I think I have a solution. Taking out all the instrumentation to test it.
- 19:36: Tests pass. So do old tests. Will write a regression test in a few minutes.
- 19:37: The moment of truth... yes! It works on staging. Rolling out to production.
- 19:52: Fix checked in and in production. The remaining problem here is that astronautics had 3064 follows and 2421 unfollows, and Joule is set up so that it shows "Hiccup" if you have more than 100 follows or unfollows on the same day (for three reasons; I could tell you, but does anyone care?). Suggestions for working around this one are welcome.
  - We have to do a separate lookup in Twitter for every userid we haven't seen before, to get the icon and username. For 5000 changes in a day, that slows page load times a lot. This is still a problem.
  - There is an old pre-Twitter assumption that 100 follows or unfollows means either that Joule broke, or that LJ broke when it sent us the names. Clearly this is outdated.
  - There isn't enough space in the chart for more than a few hundred names a day without making the page insanely long.
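
The double-reporting bug from the 19:01 and 19:24 entries is easier to see next to a comparison that cannot produce it. The sketch below is not Joule's actual code, which this post doesn't show. It is a minimal Python illustration, with a hypothetical `diff_followers` function and made-up names, of diffing the old and new lists by exact set membership, so that a name which is a prefix of another ("ann" and "anna") is never confused with it.

```python
def diff_followers(old, new):
    """Return (added, removed) as sorted lists of names.

    Hypothetical helper for illustration; not Joule's real comparison code.
    """
    old_set, new_set = set(old), set(new)

    # Exact-match set differences: "ann" and "anna" are distinct elements,
    # so one being a prefix of the other has no effect on the result.
    added = new_set - old_set
    removed = old_set - new_set

    # Sanity check mirroring the DB constraint: nobody may be both
    # added and removed in the same run.
    assert not (added & removed), "name reported as both added and removed"

    return sorted(added), sorted(removed)


# "ann" (a prefix of "anna") unfollows, and "carol" follows.
old = ["ann", "anna", "bob"]
new = ["anna", "bob", "carol"]

added, removed = diff_followers(old, new)
print(added)    # ['carol']
print(removed)  # ['ann']
```

Because both results come from set differences, no name can land in both lists, which is the property the duplicate-key constraint was enforcing.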