Thursday, November 27, 2008

SourceForge.net application hosting cache issue

I have a new open source project, VolunteerCake, that is using SourceForge.net’s recently released web hosting service. This service includes the typical LAMP stack with MySQL, Apache and PHP, so I thought it would be a great place to keep a demo of the site running.

It was working fine, and then one day I noticed that the pages were being cached too aggressively. For instance, if I clicked the login button on the front page and logged in successfully, I expected to see a “logout” button and my user name, but instead I was seeing the original page. By hitting “shift-refresh”, I was able to get the right page to display, but obviously that wasn’t a good way to demonstrate the software.

During my work on figuring out my Plaxo problem, I found a really cool tool called Fiddler2 that acts as a web proxy and lets you do nifty things like see the headers on web requests. Using this tool, I was able to look at the cache headers being sent by the server which looked like:

HTTP/1.1 200 OK
Server: nginx/0.6.31
Date: Tue, 18 Nov 2008 22:02:49 GMT
Content-Type: text/html
Connection: keep-alive
X-Powered-By: PHP/5.2.6
Set-Cookie: CAKEPHP=b7pvoorvj11tb45micnfqhc4b2; path
P3P: CP="NOI ADM DEV PSAi COM NAV OUR OTRo STP IND DEM"
Cache-Control: max-age=172800
Expires: Thu, 20 Nov 2008 22:02:46 GMT
Content-Length: 444

The Cache-Control and Expires headers were the problem. They were being set to 48 hours in the future for my pages (max-age=172800 is 172,800 seconds, or 48 hours), so the browser was displaying the cached version of the page instead of asking the server for a new copy.
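To make the arithmetic concrete, here is a minimal Python sketch that works out how long a browser is allowed to reuse a response, using only the header values captured above. The function name `freshness_lifetime` is my own, and the logic is a simplification of what browsers do (max-age wins, otherwise Expires minus Date), not a full implementation of HTTP caching.

```python
from datetime import timedelta
from email.utils import parsedate_to_datetime


def freshness_lifetime(headers):
    """Return how long a response may be reused, or None if nothing says so.

    Simplified rule: a max-age directive in Cache-Control takes priority;
    otherwise fall back to the gap between the Expires and Date headers.
    """
    for directive in headers.get("Cache-Control", "").split(","):
        name, _, value = directive.strip().partition("=")
        if name.lower() == "max-age" and value.isdigit():
            return timedelta(seconds=int(value))
    if "Expires" in headers and "Date" in headers:
        expires = parsedate_to_datetime(headers["Expires"])
        sent = parsedate_to_datetime(headers["Date"])
        return expires - sent
    return None


# Header values copied from the response shown above.
headers = {
    "Date": "Tue, 18 Nov 2008 22:02:49 GMT",
    "Cache-Control": "max-age=172800",
    "Expires": "Thu, 20 Nov 2008 22:02:46 GMT",
}

print(freshness_lifetime(headers))  # 2 days, 0:00:00
```

Until that lifetime runs out, the browser has no reason to contact the server again, which is why the logged-in page never showed up without a forced refresh.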

Knowing this, I opened a case with the SF.net support team to see if they could help figure out why the server was setting these headers for the PHP pages. I had a suspicion it had to do with the fact that Cake uses a new file extension of “.ctp” for the view files, but I really had no proof of this.

The SourceForge.net guys told me that their service had just been moved to some new servers, so it was possible this was related to that. They suggested that my application was responsible for setting the cache headers. While Cake does do some caching, it didn’t fit with what I knew. This exact same setup was working on my hosting service at http://volunteer.lctd.org/, which didn’t send those same headers.

I did some research on the Apache settings for caching, and while it is generally something you configure at the server level, I found that it is possible to override these settings in the .htaccess file for a particular directory. I had already tweaked this file to get Cake to work properly, so my .htaccess file looked something like:

<IfModule mod_rewrite.c>
RewriteEngine on
RewriteBase /
RewriteRule    ^$    webroot/    [L]
RewriteRule    (.*) webroot/$1    [L]
</IfModule>

So what I needed to do was to tell the server not to set the Cache-Control or Expires headers. After some experiments, I ended up with a new .htaccess file that looked like:

<IfModule mod_rewrite.c>
RewriteEngine on
RewriteBase /
RewriteRule ^$ webroot/ [L]
RewriteRule (.*) webroot/$1 [L]
</IfModule>

# Turn off mod_expires so Apache stops generating Expires and Cache-Control headers
ExpiresActive Off

This basically turned off all caching on the http://volunteercake.sourceforge.net site. Since this is just a demo application, I figured that was good enough, so I didn’t spend any more time figuring out how to restrict the change to a specific type of file (which would be important if this were a large application).
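One way to confirm that a change like this took effect, without a proxy tool, is to request the page and look only at the headers that influence caching. This is a rough Python sketch rather than what I actually ran. The names `CACHING_HEADERS`, `pick_caching_headers` and `fetch_caching_headers` are made up for illustration, and the list of headers is just my own short list.

```python
import urllib.request

# Headers that affect whether a browser reuses a stored copy (my own short list).
CACHING_HEADERS = ("Cache-Control", "Expires", "Pragma", "ETag", "Last-Modified")


def pick_caching_headers(headers):
    """Keep only the caching-related entries, matching names case-insensitively."""
    wanted = {name.lower() for name in CACHING_HEADERS}
    return {name: value for name, value in headers.items() if name.lower() in wanted}


def fetch_caching_headers(url):
    """Request the URL and return just the caching-related response headers."""
    with urllib.request.urlopen(url, timeout=10) as response:
        return pick_caching_headers(dict(response.headers.items()))


# Example call (needs network access):
# print(fetch_caching_headers("http://volunteercake.sourceforge.net/"))
```

If the change worked, the far-future Cache-Control and Expires values should no longer appear in the result.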

Wednesday, November 26, 2008

Fun with HTML and CSS

I spent some time yesterday figuring out CSS problems for Job Connections.

The Job Connections site was built with a print stylesheet that left out parts of the page that should be printed. They use a stylesheet called print.css, and when somebody tried to print a page, they got nothing but the text in the middle of the page.

I took a look and found that the stylesheet was setting all of the region styles to “display: none”, which tells the browser not to display them. Editing the stylesheet to remove these bits was all that was needed, so I set it up to print everything but the menu bar at the top and down the side.

In the same file, there was a rule that looked like an attempt to make the links display in bold when the page was printed. The code that was trying to do this looked like:

a {
font-weight: bold;
border-width: 0px;
text-decoration: none;
}

That wasn’t working, mostly because the style was being applied to all anchors. I updated it to look like:

a:link, a:visited {
font-weight: bold;
border-width: 0px;
text-decoration: underline;
color: #520;
background: transparent;
}

This change applied the style to both links and visited links. I then went one step further and added some magic to get the actual link to print (works in CSS2 compliant browsers):

/* this bit will print the URL after the link
when done from the CSS2 compliant browsers */
a:link:after, a:visited:after {
content: " (" attr(href) ") ";
font-size: 90%;
}

The magic is in the “:after” bit, which basically says “after you display the link, display something else”. With this applied, the links all get bolded, underlined, and are followed by the actual URL in parentheses afterward.

I don’t have access to post this to the JobConnections web site yet, so you can see the problem if you go there and look at the printed page (print preview from your browser).

I got access to the web site (thanks to Walt Feigenson), so this is partially fixed now. It looks pretty good except the content still has quite a large area of whitespace to the left due to the way the style sheets are interacting. I’m playing with updating this now to make the print CSS work the way it should and not inherit the styles that cause this from the “screen” CSS.

Friday, November 21, 2008

Plaxo: the service I love/hate

A couple of days back, I solved a problem I was having with Plaxo. For a few weeks, I was unable to connect to any of the Plaxo web servers from any of my home machines.

Being a fairly knowledgeable network person, I spent hours trying to diagnose the problem. I could get to all other web sites, but not to anything in the plaxo.com domain. Worse, the names resolved, and ping and traceroute looked fine.

First I thought it might be something caused by Plaxo being bought by Comcast. Comcast had just recently been in the news for blocking traffic to keep bandwidth available, so I figured it wasn’t inconceivable that somebody made a mistake in a firewall somewhere that was blocking traffic between them and AT&T.

I sent an email to Plaxo to ask them if their site was up, and called AT&T to see if we could diagnose the problem. AT&T as usual was very nice (and annoying) and started me out with the normal insane steps:

  1. Turn off your firewall
  2. Clear your cache
  3. Turn off your router

After getting past all the annoying stuff, I got to their level 2 support, and then to the 2Wire support to see if they could find anything with my router that might be causing this. Naturally they found nothing, and everything looked OK.

So I escalated with Plaxo, calling them on the phone to see if there was anything they could do. There were emails and phone calls back and forth that never solved the problem:

  • First call I was told that there was a problem with one of their servers, and that it would be working the next day (not).
  • Another call I was told they had found the problem in their web server, and it would be fixed shortly.
  • I got numerous emails telling me to uninstall the Plaxo software and log in again, which of course didn’t work since I couldn’t even get to the web site.
  • I had numerous emails diagnosing the problem as a Mac issue, or a PC issue, which again it wasn’t since it was happening on the Mac, iPhone and PC (and the iPhone doesn’t even have a Plaxo client).

Finally at some point, I got a support guy who told me that my IP address was indeed blocked at their server. Now we’re getting somewhere. But no, it still doesn’t work.

Luckily for me this guy is good, so he tells me that there was an old version of the Plaxo client for Mac that their servers were detecting as a bot attack, so if I uninstall that everything should be golden. I do, and lo and behold I can get to Plaxo again …

So it appears that Plaxo can be incompatible with itself …

I wonder how many people are blocked with the same problem right now.

Wednesday, November 19, 2008

Web marketing

Recently I’ve entered the world of using the web for self marketing.

I saw a very interesting talk by Walter Feigenson at the last CPC Job Connections meeting about marketing yourself using the web.

I already had a LinkedIn profile, and had my resume on a couple different places, but his talk convinced me that I ought to do some more. So I did the following:

  1. Set up Google Reader so I can see all the web changes in one place.
  2. Built a profile on Naymz (http://www.naymz.com), unclear on exactly what this one does.
  3. Ziki (http://www.ziki.com) – Signed up, but never got the validation email. This is supposed to be a job finding service.
  4. Spokeo (http://www.spokeo.com) – Signed up – not clear on what this site does beyond search for names.
  5. Ziggs (http://www.ziggs.com) – Signed up and built profile, this one looks interesting.

Just signing up for these things takes time, and getting them to be consistent seems like it will be a pain. It reminds me of posting your resume to all of the job search sites. It’s not too bad the first time, but going back to update everything is going to be hard.

Next thing I did was to add cross links from as many different places as I could to my web site (http://www.accuweaver.com). This is supposed to help with the ranking on the search engines, since the search engines use the assumption that if a lot of sites link to you, you must be important.

I also cleaned up my LinkedIn profile, added links, and added my company to the Companies part of LinkedIn. Then after all of this, I got hit again with the suggestion that I should set up a Facebook profile. Walt had mentioned it, but it took hearing it a few more times for me to act. It still seems a bit smarmy, and unlikely to be useful as a business networking tool, but we’ll see.

Next: Making sure I’m posted on a huge list of sites I got from Valerie Colber