My favorite (dead simple) PHP debugging tool

Last week, I was sitting on my couch, binging on some new episodes on NetFlix.


A text message? This late? Can't be good.

Grudgingly, I pause NetFlix after a few seconds ("How it's made" is super addicting) to check out the messages…

From PagerDuty: Your API is Down!

Fuck! An alert telling me that my site is down. EXCUSE ME! I'm over here trying to learn how Nail Clippers are made. Damnit. Off to the computer.

I sit down and try to load up and… nothing. Chrome just sits there, the condescending little blue circle thing spinning into oblivion.

My site is DOWN, and it's down hard. Requests are timing out.

What I do next might surprise you.. but first, a question—

What tactics would you use? Some good answers might be…

  • Check out automated alerts, like ones sent from Nagios
  • Look at your alerting/graphing systems like Copperegg or Graphite
  • Verify the "usual suspects" like Memcache or MySQL are healthy

But that's NOT what I did. Any guesses?

I logged into one of my PHP servers, grabbed one of the PIDs for a PHP-FPM worker, and ran strace -p.

What?? Isn't strace a tool for hardcore unix neckbeards?


If you're not using strace on a reocurring basis, you're doing yourself a MAJOR disservice. It's perfect for quick and dirty debugging. Sure, in an IDEAL world, you'd get a notification telling you exactly which service was down, but I'm a realist and I know that doesn't always happen.

When your site is down, wishing you had better alerting isn't the answer. You need to get your API back up ASAP. Everything else comes second.

strace is your cheat code

Strace is a free unix tool that comes with almost every linux distro out there. You give Strace a PID (in my case, the PID of a PHP-FPM worker) and it instantly hooks into the running process and tells you what kernel system calls it's making.

A system call is like a low-level function that your PHP code makes to the operating system. Things like— "read a file" and "connect to this mysql server" use system calls.

It so happens that when your app is blocking and dead-in-the-water, it's because your PHP code is waiting for a system call to respond. Viola!

So, I log into my webserver over SSH and run ps to grab a random PHP-FPM process id, like this:

$ ps aux | grep php-fpm
  myapp 128131 2.9 0.0 324132 57396 ? R 10:54 11:50 php-fpm: pool www

See 128131? That's a process id I can get a sneak peak into using strace.

Next, I attach strace to the process and see what's up…

$ sudo strace -p 128131

  gettimeofday({1399247077, 563898}, NULL) = 0
  sendto(17, "SELECT * FROM users WHERE id =", 254, MSG_DONTWAIT, NULL, 0) = 254

I see it waiting on the last line for several seconds. Just, like that, I can immediately see that it's sending a SQL query to the database server and is waiting for the response.. looks like my database server is foobared again!

And just like magic, I'm immediately able to diagnose the downtime. Strace gives me the power to address the database server faster, fix the problem, and get back to watching "How It's Made".

A good friend of mine is a scaling expert…

He's the kind of guy that can not only program well, but can also talk for hours about hardcore linux internals. I asked him—

"If you could teach everyone just ONE thing about scaling… what would it be?"

His response? Strace. 1000 times strace. Nothing else can give you so much insight and information about your stack in a single command. You can have all of the monitoring IN THE WORLD, and strace will STILL give you new insights into your application.

Get a free chapter from the book

Want to check out the book before buying? Just tell me where to send the entire first chapter— I'll let you check it out for free.

It's your choice, you can spend 2 years learning this stuff on your own, or I can guide you through it in a weekend.