Channel: Tom Limoncelli's EverythingSysadmin Blog
Viewing all 568 articles

January LOPSA NJ Chapter Meeting Announcement


SSH debugging sucks


How much human productivity is lost every day due to the horrible debugging messages in SSH? I bet it is thousands of hours world-wide. It isn't just sysadmins: programmers, web developers, and many non-technical users are frustrated by this.

I'm pretty good at debugging ssh authentication problems. The sad fact is that most of my methodology involves ignoring the debug messages and just "knowing" what to check. That's a sad state of affairs.

The debug messages for "ssh -v" should look like this:

HELLO!
I AM TRYING TO LOG IN. I'VE TOLD THE SERVER I CAN USE (method,method,method).
I AM NOW TRYING TO LOG IN VIA (method).
I AM SENDING (public key).
THAT DID NOT WORK. I AM SAD.
I AM NOW TRYING TO LOG IN VIA (method).
I AM SENDING USERNAME foo AND a password of length x.
THAT DID WORK. I AM LOGGING IN.  I AM HAPPY.

Similarly, on the server side, "sshd -d" should look more like:

HELLO!
SOMEONE HAS CONTACTED ME FROM IP ADDRESS 1.1.1.1.
THEY HAVE TOLD ME THEY CAN LOG IN USING THE FOLLOWING METHODS: (method1,method2,method3).
THEY ARE NOW TRYING (method)
THEY GAVE ME (first 100 bytes of base64 of public key)
THAT DID NOT WORK.
TIME TO TRY THE NEXT METHOD.
THEY ARE NOW TRYING (method)
THEY GAVE ME A PASSWORD OF LENGTH x
THAT DID WORK.
I WILL LET THEM LOG IN NOW.

Instead we have to look at messages like:

debug1: monitor_child_preauth: tal has been authenticated by privileged process
debug3: mm_get_keystate: Waiting for new keys
debug3: mm_request_receive_expect entering: type 26
debug3: mm_request_receive entering
debug3: mm_newkeys_from_blob: 0x801410a80(150)
debug2: mac_setup: found hmac-md5-etm@openssh.com
debug3: mm_get_keystate: Waiting for second key
debug3: mm_newkeys_from_blob: 0x801410a80(150)

Sigh.

I actually started looking at the source code to OpenSSH today to see how difficult this would be. It doesn't look too difficult. Sadly I had to stop myself because I was procrastinating from the project I really needed to be working on.
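In the meantime, you can fake a little of this narration in userspace. Here's a toy sed filter that rewrites a few of the client's real debug strings into the shouty format above. It's a sketch: the patterns match message wording from recent OpenSSH releases, which varies between versions, and a real patch would narrate failures too. For real use you'd pipe "ssh -v host 2>&1" through it.

```shell
#!/bin/sh
# Translate a few common "ssh -v" authentication messages into
# plain-English narration. The patterns match recent OpenSSH debug
# strings; exact wording varies by version.
translate() {
  sed -n \
    -e 's/.*Authentications that can continue: \(.*\)/I MAY TRY THESE METHODS: \1/p' \
    -e 's/.*Next authentication method: \(.*\)/I AM NOW TRYING TO LOG IN VIA \1./p' \
    -e 's/.*Authentication succeeded (\(.*\)).*/THAT DID WORK (\1). I AM HAPPY./p'
}

# Demo on captured debug lines; real use: ssh -v host 2>&1 | translate
printf '%s\n' \
  'debug1: Authentications that can continue: publickey,password' \
  'debug1: Next authentication method: publickey' \
  'debug1: Authentication succeeded (publickey).' | translate
```

Three sed expressions get you most of the way there, which suggests how little translation work the real fix would need.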

I'd consider paying a "bounty" to someone that would submit a patch to OpenSSH that would make the debug logs dead simple to understand. Maybe a kickstarter would be a better idea.

The hard part would be deciding what the messages should be. I like the Kibo-esque (well, actually B1FF-esque) version above. I hope you do too.

If anyone is interested in working on this, I'd be glad to give input. If someone wants to do a kickstarter I promise to be the first to donate.

bash: Restart bash if old version detected


I write a lot of small bash scripts. Many of them have to run on MacOS as well as FreeBSD and Linux. Sadly MacOS comes with a bash 3.x which doesn't have many of the cooler features of bash 4.x.

Recently I wanted to use read's "-i" option, which doesn't exist in bash 3.x.

My Mac does have bash 4.x but it is in /opt/local/bin because I install it using MacPorts.

I didn't want to list anything but "#!/bin/bash" on the first line because the script has to work on other platforms and on other people's machines. "#!/opt/local/bin/bash" would have worked for me on my Mac but not on my Linux boxes, FreeBSD boxes, or friend's machines.

I finally came up with this solution. If the script detects it is running under an old version of bash it looks for a newer one and exec's itself with the new bash, reconstructing the command line options correctly so the script doesn't know it was restarted.

#!/bin/bash
# If old bash is detected. Exec under a newer version if possible.
if [[ $BASH_VERSINFO < 4 ]]; then
  if [[ $BASH_UPGRADE_ATTEMPTED != 1 ]]; then
    echo '[Older version of BASH detected.  Finding newer one.]'
    export BASH_UPGRADE_ATTEMPTED=1
    export PATH=/opt/local/bin:/usr/local/bin:"$PATH":/bin
    exec "$(which bash)" --noprofile "$0" """$@"""
  else
    echo '[Nothing newer found.  Gracefully degrading.]'
    export OLD_BASH=1
  fi
else
  echo '[New version of bash now running.]'
fi

# The rest of the script goes below.
# You can use "if [[ $OLD_BASH == 1 ]]" to
# write code that also works with old
# bash versions.

Some explanations:

  • $BASH_VERSINFO returns just the major release number; much better than trying to parse $BASH_VERSION.
  • export BASH_UPGRADE_ATTEMPTED=1 Note that the variable is exported. Exported variables survive "exec".
  • export PATH=/opt/local/bin:/usr/local/bin:"$PATH":/bin We prepend a few places that the newer version of bash might be. We postpend /bin because if it isn't found anywhere else, we want the current bash to run. We know bash exists in /bin because of the first line of the script.
  • exec "$(which bash)" --noprofile "$0" """$@"""
    • exec This means "replace the running process with this command".
    • $(which bash) finds the first command called "bash" in the $PATH.
    • "$(which bash)" By the way... this is in quotes because the path to bash might include spaces. In fact, any time we use a variable that may contain spaces we put quotes around it so the script can't be hijacked.
    • --noprofile We don't want bash to source .bashrc and other files.
    • "$0" The name of the script being run.
    • """$@""" The command line arguments will be inserted here with proper quoting so that if they include spaces or other special chars it will all still work.
  • You can comment out the "echo" commands if you don't want it to announce what it is doing. You'll also need to remove the last "else" since else clauses can't be empty.
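For reference, here's a quick look at what those version variables hold (a sketch; the values shown depend on which bash you have installed):

```shell
# BASH_VERSINFO is an array; element 0 is the major version, which is
# why the unsubscripted $BASH_VERSINFO test above works: it is
# shorthand for ${BASH_VERSINFO[0]}.
bash -c 'echo "major: ${BASH_VERSINFO[0]}"'
bash -c 'echo "full:  $BASH_VERSION"'
```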

Enjoy!

DrupalCamp NJ, Princeton, Feb 1, 2014


Feb 1 will be the 3rd annual DrupalCamp NJ on the campus of Princeton University http://www.drupalcampnj.org/. This is the first year with a keynote speaker - Brian Kernighan! Tickets are only $25, which includes coffee, lunch, and an after-party.

In addition, the day prior on Jan 31, there are 4 low-cost, full-day training sessions http://www.drupalcampnj.org/training.

awk. How I missed you.

awk </dev/null \
'END { for (i = 0; i < 13; i++) \
{ printf("%02d:00-%02d:30\n%02d:30-%02d:00\n", i, i, i, i+1) }}'

The output was pasted into a spreadsheet.

I don't think this is how the creators of the original spreadsheet imagined things.
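(For the curious: the same loop can be written with a BEGIN block, which runs before any input is read, so the </dev/null redirect isn't needed. It produces 26 half-hour slots, 00:00-00:30 through 12:30-13:00.)

```shell
# Same output as the one-liner above, but in a BEGIN block so awk
# never waits for input: 26 half-hour time slots.
awk 'BEGIN { for (i = 0; i < 13; i++)
  printf("%02d:00-%02d:30\n%02d:30-%02d:00\n", i, i, i, i + 1) }'
```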

LOPSA-East - CFP deadline Jan 22, 2014


The call for participation deadline is Wednesday, January 22nd, 2014.

LOPSA-East is looking for talks on system administration related topics, especially advanced techniques, DevOps, and the like. I particularly enjoy hearing about project successes... if you have done something exciting where you work, propose a talk about it. That's how I got my start!

The full CFP is here: http://lopsa-east.org/2014/

If you haven't heard of LOPSA-East, it is our regional Linux/Sysadmin conference; we expect about 150 people. People come from all over the east coast (and often Europe!).

The event is May 2 - 3, 2014, in lovely New Brunswick, NJ, USA.

Spread the word!

LOPSA-East: CFP extended to Jan 31!


The LOPSA-East "call for participation" has extended the submission deadline to Fri, Jan 31. You have an extra week to send in your proposed talks.

In particular, anything related to cutting-edge operational issues ("devops") and new technology (what sysadmins should know about "new" things like SSDs, etc.) is welcome. Personally I'd like to see more "culture" talks. If you've done an awesome project in the last year and would like to talk about it, write it up and submit it soon!

LOPSA-East is May 2-3, 2014 in New Brunswick, NJ. Easy to get to via train or car from anywhere on the east coast.

Seattle: CascadiaIT'14 Registration is OPEN


Tell your friends, tell your neighbors, tell your friends' neighbors and your neighbors' friends!

http://casitconf.org/casitconf14/registration-is-now-open/

I'll be teaching "Evil Genius 101" and "Team Time Management & Collaboration" half-day tutorials. Plus I'll be giving a talk on Saturday about "The Stack at StackExchange".

The conference is March 7-8, 2014 in Seattle, WA. While it is a regional conference, people come from all over.

Hope to see you there!


We're hiring at StackExchange! SRE with Networking focus


Come join the team that runs ServerFault, StackOverflow, and over 100 other Q&A websites plus "Careers 2.0", the most awesome job site around. We have a great manager (I'm not just saying that because he reads my blog) and cool coworkers!

Site Reliability Engineer, Networking (by which we mean a Linux SRE who knows networking really well and isn't afraid of Windows)

Examples of projects that you'll work on:

  • Bring configuration management to our network infrastructure so we can scale to N datacenters
  • Tune both our network equipment and servers to have the lowest latency possible
  • Make our site-to-site VPN connections highly available

https://careers.stackoverflow.com/jobs/47588

LOPSA NJ Chapter: Feb meeting is a dinner meetup


LOPSA NJ's February meeting is a dinner meetup at two different restaurants, one in Northern NJ and one in Southern-ish NJ. The planned discussion topic is "What are some of the most challenging problems that have come up in the last 24 months?"

In the past these "cluster meetings" have been really fun, full of interesting war stories as well as technical info.

If you are in the area, I hope to see you there!

Seattle: CascadiaIT'14 hotel discount ends Feb 8!


The hotel discount ends on Feb 8th so book your room as soon as possible!

CascadiaIT is an awesome regional conference for sysadmins and devops. If you look at the schedule you're sure to see talks and tutorials you won't want to miss.

I'll be teaching "Evil Genius 101" (on how to influence your boss and team) and "Team Time Management & Collaboration". On Saturday I'll be giving a talk about how StackExchange works.

While this is a "regional conference" it draws people from all over the West Coast, the Pacific Northwest, and beyond. You should be there too.

http://casitconf.org

Time Management Tip: The Oscars


Do you marathon through entire seasons of TV shows in a weekend? You might want to check this out.

AMC has an event where you can watch all the "best picture" nominees. It is pretty intense but awesome. On the first Saturday you watch 4 of them in a row in a 10-hour session. On the second Saturday you watch the other 5 in a 12-hour session. The following day is the awards show.

Watching the awards when you've seen 9 of the most nominated films is a different experience.

Some of the benefits: If you've been too busy to get to the theaters all year, this is a great way to consolidate a lot of what you missed into 2 days. The theater is filled with film devotees, so there's no talking, no kids, no texting. There is a dinner break around 5pm, but otherwise you get to gorge yourself on movie theater junk food. (The AMCs near me have pizza.) If you are an AMC "Stubs" member you get $5 worth of "Stubs Bonus Bucks" each day.

https://www.amctheatres.com/events/best-picture-showcase

I try to do this every year. This year I might only be able to get to one of the two days, which is a shame.

If you are a movie lover I highly recommend this.

(P.S. This isn't a paid endorsement but for full disclosure, I own a small number of AMC shares.)

Sysadmins that can't script have a choice.


Scripting is becoming more and more important. With everything from computers to networks going virtual, installation is becoming an API call, not a "walk to a rack/desk/whatever and plug it in" call.

If you know how to script, you can automate those things.

In a few years I can't imagine a system administrator being able to keep their job and/or compete with others if they can't script.

There is an exception, of course: people who do desktop/laptop system administration and general in-office IT service. However, those jobs are turning more and more into the equivalent of working at a mobile phone store: helping people with basic equipment problems and customer support. Those jobs pay less than half what a sysadmin normally gets paid.

So... do you want to change jobs to one that cuts your salary in half, or lose your job completely when someone that does know how to script replaces you?

It's your choice.

Tool Building Versus Automation


I make a distinction between tool building and automation. Tool building improves a manual task so that it can be done better. Automation eliminates the task. A process is automated when a person does not have to do it any more. Once a process is automated a system administrator's role changes from doing the task to maintaining the automation.

There is a discussion on Snopes about this photo. It looks like the machine magically picks and places bricks. Sadly it does not.

Tiger Stone brick road laying machine

If you watch the video, you see that it requires people to select and place the bricks. It is a better tool. It doesn't eliminate the skilled work of a bricklayer, but it assists them so that the job is done more easily and better. Watch the video:

(The music was very soothing, wasn't it?)

This machine doesn't automate the process of bricklaying. It helps the bricklayer considerably.

In a typical cloud computing environment every new machine must be configured for its role in the service. The manual process might involve loading the operating system, installing certain packages, editing configuration files, running commands and starting services. An SA could write a script that does these things. For each new machine the SA runs the script and the machine is configured. This is an improvement over the manual process. It is faster, less error-prone, and the resulting machines will be more consistently configured. However, an SA still needs to run the script, so the process of setting up a new machine is not automated.

Automated processes do not require system administrator action. To continue our example, an automated solution means that when the machine boots, it discovers its identity, configures itself, and becomes available to provide service. There is no role for an SA who configures machines. The SA's role transforms into maintaining the automation that configures machines.

Cloud administrators often maintain the systems that make up a service delivery platform. To give each new developer access an SA might have to create accounts on several systems. The SA can create the accounts manually, write a script that creates the accounts, or write a job that runs periodically to check if new developers are listed in the human resources database and then automatically create the new accounts. In this last case, the SA no longer creates accounts---the human resources department does. The SA's job is maintaining the account creation automation.
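The account-reconciliation step in that last case is essentially a set difference: names in the HR export that aren't in the account system yet. A minimal sketch (the file names are hypothetical stand-ins for dumps from HR and the account system; comm requires sorted input):

```shell
# Sample exports, sorted, one name per line. The file names are
# made-up placeholders for real HR and account-system dumps.
printf 'alice\n'             > existing_accounts.txt
printf 'alice\nbob\ncarol\n' > hr_developers.txt

# comm -13 prints lines found only in the second file: developers
# listed in HR who do not yet have accounts.
comm -13 existing_accounts.txt hr_developers.txt
```

The automation's job then reduces to looping over that output and calling whatever creates accounts, on a schedule, with no SA in the loop.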

SAs often repeat particular tasks, such as configuring machines, creating accounts, building software packages, testing new releases, deploying new releases, deciding that more capacity is needed, providing that capacity, failing over services, moving services, moving or reducing capacity. All of these tasks can be improved with better tools. Good tools are a stepping stone to automation. The real goal is full automation.

Another advantage of full automation is that it enables SAs to collect statistics about defects, or in IT terms: failures. If certain situations tend to make the automation fail, those situations can be tracked and investigated. Often automation is incomplete and certain edge cases require manual intervention. Those cases can also be tracked, categorized, and the more pervasive ones become prioritized for automation.

Tool building is good, but full automation is required for scalable cloud computing.

How not to use Cron


A friend of mine told me of a situation where a cron job took longer to run than usual. As a result the next instance of the job started running and now they had two cronjobs running at once. The result was garbled data and an outage.

The problem is that they were using the wrong tool. Cron is good for simple tasks that run rarely. It isn't even good at that. It has no console, no dashboard, no dependency system, no API, no built-in way to have machines run at random times, and it's a pain to monitor. All of these issues are solved by CI systems like Jenkins (free), TeamCity (commercial), or any of a zillion other similar systems. Not that cron is all bad... just pick the right tool for the job.

Some warning signs that a cron job will overrun itself: If it has any dependencies on other machines, chances are one of them will be down or slow and the job will take an unexpectedly long time to run. If it processes a large amount of data, and that data is growing, eventually it will grow enough that the job will take longer to run than you had anticipated. If you find yourself editing longer and longer crontab lines, that alone could be a warning sign.

I tend to only use cron for jobs that have little or no dependencies (say, only depend on the local machine) and run daily or less. That's fairly safe.

There are plenty of jobs that are too small for a CI system like Jenkins but too big for cron. So what are some ways to prevent this problem of cron job overrun?

It is tempting to use locks to solve the problem. Tempting but bad. I once saw a cron job that paused until it could grab a lock. The problem with this is that when the job overran there was now an additional process waiting to run. They ended up with zillions of processes all waiting on the lock. Unless the job magically started taking less time to run, all the jobs would never complete. That wasn't going to happen. Eventually the process table filled and the machine crashed. Their solution (which was worse) was to check for the lock and exit if it existed. This solved the problem but created a new one. The lock jammed and now every instance of the job exited. The processing was no longer being done. This was fixed by adding monitoring to alert if the process wasn't running. So, the solution added more complexity. Solving problems by adding more and more complexity makes me a sad panda.
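For what it's worth, on Linux the flock(1) utility (from util-linux) sidesteps both failure modes described above: with -n it gives up immediately instead of queueing, and the kernel releases the lock when the process exits, so a crash can never leave a stale lock jammed behind. It's still added complexity, just borrowed complexity that works. A runnable sketch (process_the_thing is stood in for by sleep):

```shell
# flock -n: run the command only if the lock is free; otherwise skip.
# In a crontab this would look like:
#   */10 * * * * flock -n /var/lock/thing.lock /usr/local/bin/process_the_thing
LOCK=/tmp/process_the_thing.lock

flock -n "$LOCK" sleep 2 &           # first run takes the lock
sleep 1                              # give it time to start
flock -n "$LOCK" echo "second run" \
  || echo "previous run still in progress; skipping" >&2
wait
```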

The best solution I've seen is to simply not use cron when doing frequent, periodic, big processes. Just write a service that does the work, sleeps a little bit, and repeats.

while true ; do
   process_the_thing
   sleep 600
done

Simple. Yes, you need a way to make sure that it hasn't died, but there are plenty of "watcher" scripts out there. You probably have one already in use. Yes, it isn't going to run precisely n times per hour, but usually that's not needed.

You should still monitor whether or not the process is being done. However you should monitor whether results are being generated rather than if the process is running. By checking for something that is at a high level of abstraction (i.e. "black box testing"), it will detect if the script stopped running or the program has a bug or there's a network outage or any other thing that could go wrong. If you only monitor whether the script is running then all you know is whether the script is running.
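Concretely, a black-box check can be as simple as asking "has the output been updated recently?" A sketch, where the path and the 20-minute threshold are made-up examples for a job that runs every 10 minutes:

```shell
# Alert if the results file is missing or older than 20 minutes.
# This catches a dead loop, a buggy program, a network outage, or
# anything else that stops results from appearing.
RESULTS=/var/data/output.csv   # hypothetical path
if [ -z "$(find "$RESULTS" -mmin -20 2>/dev/null)" ]; then
  echo "ALERT: no fresh results at $RESULTS" >&2
fi
```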

And before someone posts a funny comment like, "Maybe you should write a cron job that restarts it if it isn't running". Very funny.


Seattle: CascadiaIT'14 keynote: Æleen Frisch


I'm excited to see that long-time sysadmin and author Æleen Frisch will be the keynote speaker at this year's Cascadia IT conference, Seattle, March 7-8! If you don't recognize her name, check your bookshelf. You probably have a few of her books! http://casitconf.org/

There is still time to register. There are still a few seats left in the tutorials "Evil Genius 101" and "Team Time Management & Collaboration". Don't wait, register today!

There are also dozens of other excellent tutorials and talks. Plus, there are a lot of networking opportunities.

Hope to see you there!

http://casitconf.org/

LOPSA-East schedule published!


The schedule of talks and tutorials has been published!

I'm glad to announce that I'll be teaching 2 tutorials and giving 2 talks: "Tom's Top 5 Time Management Tips" and "Book Preview: The Practice of Cloud Administration".

My tutorials include "Evil Genius 101", which was standing-room only last year, plus "Intro to Time Management for System Administrators", which hasn't been taught at LOPSA-East in quite a few years.

Registration opens soon. I look forward to seeing you at this year's conference!

Tom

Anti-Pattern: "The big project" that never ships


I was reminded of this excellent blog post by Leon Fayer of OmniTI.

As software developers, we often think our job is to develop software, but, really, that is just the means to an end, and the end is to empower business to reach their goals. Your code may be elegant, but if it doesn't meet the objectives (be they time or business) it doesn't f*ing work.

Likewise I've seen sysadmin projects that spent so much time in the planning stage that they never were going to ship unless someone stood up and said, "We've planned enough. I'm going to start coding whether you like it or not." Yes, that means some aspect of the design won't be perfect. Yet the suggestion that more planning would lead to the elimination of all design imperfections is simply hubris. (If not hubris, it is a sign that one's OCD or OCD-like tendencies are being used as a cowardly excuse to not get started.)

But what I really want to write about today is...

"The big project that won't ship".

There once was a team that had a large software base. One part of it was obsolete and needed to be rewritten. It was written in an unsupported language. It didn't have half the features it needed. It didn't even have a GUI.

There were two proposals:

One was to refactor and recode bits of it until the system was replaced. Along the way every few weeks the results would see the light of day. There were many milestones: add a read-only "viewer" GUI, build a better data storage system, refactor the old code to use the new GUI, enhance the GUI to include full editing, etc.

The competing proposal was to assign 4 developers to build a replacement system. They'd be given 2 years to write the new system from scratch. During that time they'd be protected and, essentially, hidden. The justification for this was that the old system was so broken that building any kind of Frankenstein half-old, half-new system would be flatly impossible, or at least a drag on efficiency. It would be more efficient to code it "pure" and not constantly be dealing with the old system.

Management approved the competing proposal. 1.5 years later the project hadn't gotten anywhere. When people were needed for other projects, management looked around and decided to steal the 4 engineers. This is because it is good management to take resources away from low-priority projects and put them on high-priority projects. Any project, no matter how noble, with no results for 18 months, is lower priority than a project with a burning need. In fact, the definition of a low-priority project is that you can wait 2 years for the results.

The project was cancelled and 1.5 years of work was thrown away. 4 engineers times 18 months... at least a million dollars down the tube.

Meanwhile the person that proposed the incremental project had gone forward in parallel with the first milestone: a simple enhancement to the existing system that solved the biggest complaint of the system. It talked to the old datastore and would have to be re-engineered when the new datastore was finally available, but it worked and solved a very serious problem. It was a "half measure" but served its purpose.

The person who created the "half measure" had been scolded for wasting time on a parallel project. Yet the "big" project was cancelled and this "half measure" is still in use today. At least he had the grace to not say "I told you so".

The biggest "cost" to a company is opportunity cost. That is, the loss of $$$ from not taking action. By shipping early and often you grab opportunity.

Imagine a factory that made widgets for 24 months, stored them in a warehouse, and then started shipping them all at once. That would be crazy. A factory sells what they make as soon as they are manufactured. Software companies used to write code for years and then ship it. That was crazy. Now you make a minimum viable product, ship that, and use the knowledge gained to make the next iteration.

My career advice is to only do projects that produce usable output every few weeks or months. Being on a project that will not show any results for a year or more is a good way to hide from management. Being invisible is a career killer. For software projects this means setting early milestones of some kind of minimal viable product. For purely operational projects be able to announce milestones or progress (number of machines converted, number of ms latency improvement, etc.)

At StackExchange there is a big project coming up related to how we provision new machines. While a "green field" approach would be nice, I'm looking into how we can refactor the current cruddy bits so that we can do this project incrementally. The biggest problem is that we have a crappy CMDB with no API. Everything seems to touch that one element and replacing it is going to be a pain. (I'd like to evaluate Flipkart/HostDb if anyone has opinions, let me know.) However I think we can restructure the project into 5 independent milestones. By "independent" I mean they can be done in any order with the other 4 requiring minimal refactoring as a result.

This will have a few benefits: We'll get the benefit of each milestone as it happens. Certain milestones can be done in parallel by different sub-teams. If the first few completed milestones make the process "good enough", we don't have to do the other milestones.

The FreeBSD Journal: Read it even if you don't use FreeBSD


The first issue of The FreeBSD Journal has finally shipped!

I got to read an early draft of the first issue and I was quite impressed by the content. It was a great way to learn what's new and interesting with FreeBSD plus read extended articles about specific FreeBSD technologies such as ZFS, DTrace and more.

Even if you don't use FreeBSD, this is a great way to learn about Unix in general and expand your knowledge of advanced computing technologies.

The Journal is a brand new, professionally produced, on-line magazine available from the various app stores, including Apple iTunes, Google Play, and Amazon Kindle.

Issue #1 is dedicated to FreeBSD 10 and contains articles not only about the latest release but also about running FreeBSD on the ARM-based Beagle Bone Black, working with ZFS, and all about our new compiler tool chain.

The Journal is guided by an editorial board made up of people you know from across the FreeBSD community including: John Baldwin, Daichi Goto, Joseph Kong, Dru Lavigne, Michael W. Lucas, Kirk McKusick, George Neville-Neil, Hiroki Sato, and Robert Watson. It is published 6 times a year.

You can subscribe through any of the app stores listed above.

The Journal is supported by the FreeBSD Foundation.

Jumpy, nervous, jittery user interfaces reduce "user trust"


What's with the trend of making user interfaces that hide until you mouse over them and then they spring out at you?

How did every darn company hop on this trend at the same time? Is there a name for this school of design? Was there a trendy book that I missed? Is there some UI blog encouraging this?

For example look at the new Gmail editor. To find half the functions you need to be smart enough or lucky enough to move the mouse over the right part of the editor for those functions to appear. Microsoft, Facebook, and all the big names are just as guilty.

I get it. The old way was to show everything but "grey out" the parts that weren't appropriate at the time. People are baffled by seeing all the options, even the ones they can't use. I get it. I really do. Showing only what people can actually use should be, in theory, a lot better.

However, we've gone too far in the other direction. I recently tried to help someone use a particular web-based system and he literally couldn't find the button I was talking about, because our mouse pointers were hovering over different parts of the screen and we were seeing different user interface elements.

Most importantly the new user interfaces are "jumpy". When you move the mouse across the screen (say, to click on the top menu bar) the windows you pass over all jump and flip and pop out at you. It is unnerving. As someone that already has a nervous and jittery personality, I don't need my UI to compete with me for being more jumpy, nervous and jittery.

I'm not against innovation. I like the fact that these designs give the user more "document space" by moving clutter out of the way. I understand that too many choices is stifling to people. I read The Paradox of Choice before most people. I swear... I get it!

But shouldn't there be a "reveal all" button that shows all the buttons, or changes the color of all the "hover areas", for those of us who didn't think of moving the mouse to the-left-side-of-the-screen-but-not-the-top-one-inch-because-for-some-reason-that-isn't-part-of-the-hover-space-oh-my-god-that-prevented-me-from-finding-an-option-for-two-months?

Why can't there be a way to achieve these goals without making a user interface that is jumpy and jittery?

User interfaces should exude confidence. They should be so responsive that they snap. Applications that are jumpy and jittery look nervous, uncomfortable, and unsure.

I can't trust my data to a UI that looks that way.
