Sheeri Dot Org – Page 20 – Musings on topics including product management, tech, art.

Liveblogging: Senior Skills: Python for Sysadmins

Why Python?

– Low WTF per minute factor
– Passes the 6-month test (if you write python code, going back in 6 months, you pretty much know what you were trying to do)
– Small Shift/no-Shift ratio (ie, you use the “Shift” key a lot in Perl because you use $ % ( ) { } etc, so you can tell what something is by context, not by $ or %)
– It’s hard to make a mess
– Objects if you need them, ignore them if you don’t.

Basics
Here’s a sample interpreter session. The >>> is the python prompt, and the … is the second/subsequent line prompt:

>>> x='hello, world!';
>>> x.upper()
'HELLO, WORLD!'
>>> def swapper(mystr):
... return mystr.swapcase()
  File "<stdin>", line 2
    return mystr.swapcase()
         ^
IndentationError: expected an indented block

You need to put a space on the second line because whitespace ‘tabbing’ is enforced in Python:

>>> def swapper(mystr):
...  return mystr.swapcase()
...
>>> swapper(x)
'HELLO, WORLD!'
>>> x
'hello, world!'

Substrings
partition is how to get substrings based on a separator:

>>> def parts(mystr, sep=','):
...  return mystr.partition(sep)
...
>>> parts(x, ',')
('hello', ',', ' world!')

You can replace text, too, using replace.

>>> def personalize(greeting, name='Brian'):
...  """Replaces 'world' with a given name"""
...  return greeting.replace('world', name)
...
>>> personalize(x, 'Brian')
'hello, Brian!'

By the way, the stuff in the triple quotes is automatic documentation. A double underscore, also called a “dunder”, is to print the stuff in the triple quotes:

>>> print personalize.__doc__
Replaces 'world' with a given name

Loop over a list of functions and do that function to some data:

>>> funclist=[swapper, personalize, parts]
>>> for func in funclist:
...  func(x)
...
'HELLO, WORLD!'
'hello, Brian!'
('hello', ',', ' world!')

Lists

>>> v=range(1,10)
>>> v
[1, 2, 3, 4, 5, 6, 7, 8, 9]
>>> v[1]
2
>>> v[5]
6
>>> v[-1]
9
>>> v[-3]
7

List slicing with “:”
>>> v[:2]
[1, 2]
>>> v[4:]
[5, 6, 7, 8, 9]
>>> v[4:9]
[5, 6, 7, 8, 9]
Note that there’s no error returned even though there’s no field 9. If you did v[9], you’d get an error:
>>> v[9]
Traceback (most recent call last):
File ““, line 1, in
IndexError: list index out of range

Python uses pointers (or pointer-like things) so v[1:-1] does not print the first and last values:

>>> v[1:-1]
[2, 3, 4, 5, 6, 7, 8]

The full array syntax is [start:end:index increment]:

>>> v[::2]
[1, 3, 5, 7, 9]
>>> v[::-1]
[9, 8, 7, 6, 5, 4, 3, 2, 1]
>>> v[1:-1:4]
[2, 6]
>>> v[::3]
[1, 4, 7]

Make an array of numbers with range

>>> l=range(10)
>>> l
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

Make a list from another list

>>> [pow(num,2) for num in l]
[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]

append appends to the end of a list

>>> l.append( [pow(num,2) for num in l])
>>> l
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]]
>>> l.pop()
[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]

extend takes a sequence and puts it at the end of the array.

>>> l.extend([pow(num,2) for num in l])
>>> l
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 4, 9, 16, 25, 36, 49, 64, 81]

A list can be made of a transformation, an iteration and optional filter:
[ i*i for i in mylist if i % 2 == 0]
transformation is i*i
iteration is for i in mylist
optional filter is if i % 2 == 0

>>> L=range(1,6)
>>> L
[1, 2, 3, 4, 5]
>>> [ i*i for i in L if i % 2 == 0]
[4, 16]

Tuples
Tuples are immutable lists, and they use () instead of []
A tuple always has 2 elements, so a one-item tuple is defined as
x=(1,)

Dictionaries aka associative arrays/hashes:

>>> d = {'user':'jonesy', 'room':'1178'}
>>> d
{'user': 'jonesy', 'room': '1178'}
>>> d['user']
'jonesy'
>>> d.keys()
['user', 'room']
>>> d.values()
['jonesy', '1178']
>>> d.items()
[('user', 'jonesy'), ('room', '1178')]
>>> d.items()[0]
('user', 'jonesy')
>>> d.items()[0][1]
'jonesy'
>>> d.items()[0][1].swapcase()
'JONESY'

There is no order to dictionaries, so don’t rely on it.

Quotes and string formatting
– You can use single and double quotes inside each other
– Inside triple quotes, you can use single and double quotes
– Variables are not recognized in strings, uses printf-style string formatting:

>>> word='World'
>>> punc='!'
>>> print "Hello, %s%s" % (word, punc)
Hello, World!

Braces, semicolons, indents
– Use indents instead of braces
– End-of-line instead of semicolons

if x == y:
 print "x == y"
for k,v in mydict.iteritems():
 if v is None:
  continue
 print "v has a value: %s" % v

This seems like it might be problematic because of long blocks of code, but apparently code blocks don’t get that long. You can also use folds in vim [now I need to look up what folds in vim are].

You can’t assign a value in a conditional statement’s expression — because you can’t use an = sign. This is on purpose, it avoids bugs resulting from typing if x=y instead of if x==y.

The construct has no place in production code anyway, since you give up catching any exceptions.

Python modules for sysadmins:
– sys
– os
– urlib/urlib2
– time, datetime (and calendar)
– fileinput
– stat
– filecmp
– glob (to use wildcards)
– shutil
– gzip
– tarfile
– hashlib, md5, crypt
– logging
– curses
– smtplib and email
– cmd

The Zen of Python
To get this, type ‘python’ in a unix environment, then type ‘import this’ at the commandline. I did this on my Windows laptop running Cygwin:

cabral@pythianbos2 ~
$ python
Python 2.5.2 (r252:60911, Dec  2 2008, 09:26:14)
[GCC 3.4.4 (cygming special, gdc 0.12, using dmd 0.125)] on cygwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import this
The Zen of Python, by Tim Peters

Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
Complex is better than complicated.
Flat is better than nested.
Sparse is better than dense.
Readability counts.
Special cases aren't special enough to break the rules.
Although practicality beats purity.
Errors should never pass silently.
Unless explicitly silenced.
In the face of ambiguity, refuse the temptation to guess.
There should be one-- and preferably only one --obvious way to do it.
Although that way may not be obvious at first unless you're Dutch.
Now is better than never.
Although never is often better than *right* now.
If the implementation is hard to explain, it's a bad idea.
If the implementation is easy to explain, it may be a good idea.
Namespaces are one honking great idea -- let's do more of those!

This was liveblogged, please let me know any issues, as they may be typos….

Why Python?

Basics
Here’s a sample interpreter session. The >>> is the python prompt, and the … is the second/subsequent line prompt:

>>> x='hello, world!';
>>> x.upper()
'HELLO, WORLD!'
>>> def swapper(mystr):
... return mystr.swapcase()
  File "<stdin>", line 2
    return mystr.swapcase()
         ^
IndentationError: expected an indented block

You need to put a space on the second line because whitespace ‘tabbing’ is enforced in Python:

>>> def swapper(mystr):
...  return mystr.swapcase()
...
>>> swapper(x)
'HELLO, WORLD!'
>>> x
'hello, world!'

Substrings
partition is how to get substrings based on a separator:

>>> def parts(mystr, sep=','):
...  return mystr.partition(sep)
...
>>> parts(x, ',')
('hello', ',', ' world!')

You can replace text, too, using replace.

>>> def personalize(greeting, name='Brian'):
...  """Replaces 'world' with a given name"""
...  return greeting.replace('world', name)
...
>>> personalize(x, 'Brian')
'hello, Brian!'

By the way, the stuff in the triple quotes is automatic documentation. A double underscore, also called a “dunder”, is to print the stuff in the triple quotes:

>>> print personalize.__doc__
Replaces 'world' with a given name

Loop over a list of functions and do that function to some data:

>>> funclist=[swapper, personalize, parts]
>>> for func in funclist:
...  func(x)
...
'HELLO, WORLD!'
'hello, Brian!'
('hello', ',', ' world!')

Lists

>>> v=range(1,10)
>>> v
[1, 2, 3, 4, 5, 6, 7, 8, 9]
>>> v[1]
2
>>> v[5]
6
>>> v[-1]
9
>>> v[-3]
7

Python uses pointers (or pointer-like things) so v[1:-1] does not print the first and last values:

>>> v[1:-1]
[2, 3, 4, 5, 6, 7, 8]

The full array syntax is [start:end:index increment]:

>>> v[::2]
[1, 3, 5, 7, 9]
>>> v[::-1]
[9, 8, 7, 6, 5, 4, 3, 2, 1]
>>> v[1:-1:4]
[2, 6]
>>> v[::3]
[1, 4, 7]

Make an array of numbers with range

>>> l=range(10)
>>> l
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

Make a list from another list

>>> [pow(num,2) for num in l]
[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]

append appends to the end of a list

>>> l.append( [pow(num,2) for num in l])
>>> l
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]]
>>> l.pop()
[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]

extend takes a sequence and puts it at the end of the array.

>>> l.extend([pow(num,2) for num in l])
>>> l
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 4, 9, 16, 25, 36, 49, 64, 81]

>>> L=range(1,6)
>>> L
[1, 2, 3, 4, 5]
>>> [ i*i for i in L if i % 2 == 0]
[4, 16]

Tuples
Tuples are immutable lists, and they use () instead of []
A tuple always has 2 elements, so a one-item tuple is defined as
x=(1,)

Dictionaries aka associative arrays/hashes:

>>> d = {'user':'jonesy', 'room':'1178'}
>>> d
{'user': 'jonesy', 'room': '1178'}
>>> d['user']
'jonesy'
>>> d.keys()
['user', 'room']
>>> d.values()
['jonesy', '1178']
>>> d.items()
[('user', 'jonesy'), ('room', '1178')]
>>> d.items()[0]
('user', 'jonesy')
>>> d.items()[0][1]
'jonesy'
>>> d.items()[0][1].swapcase()
'JONESY'

There is no order to dictionaries, so don’t rely on it.

>>> word='World'
>>> punc='!'
>>> print "Hello, %s%s" % (word, punc)
Hello, World!

Braces, semicolons, indents
– Use indents instead of braces
– End-of-line instead of semicolons

if x == y:
 print "x == y"
for k,v in mydict.iteritems():
 if v is None:
  continue
 print "v has a value: %s" % v

This seems like it might be problematic because of long blocks of code, but apparently code blocks don’t get that long. You can also use folds in vim [now I need to look up what folds in vim are].

You can’t assign a value in a conditional statement’s expression — because you can’t use an = sign. This is on purpose, it avoids bugs resulting from typing if x=y instead of if x==y.

The construct has no place in production code anyway, since you give up catching any exceptions.

The Zen of Python
To get this, type ‘python’ in a unix environment, then type ‘import this’ at the commandline. I did this on my Windows laptop running Cygwin:
Continue reading “Liveblogging: Senior Skills: Python for Sysadmins”

Liveblogging: Senior Skills: Grok awk

[author’s note: personally, I use awk a bunch in MySQL DBA work, for tasks like scrubbing data from a production export for use in qa/dev, but usually have to resort to Perl for really complex stuff, but now I know how to do .]

Basics:
By default, fields are separated by any number of spaces. The -F option to awk changes the separator on commandline.
Print the first field, fields are separated by a colon.
awk -F: '{print $1}' /etc/passwd

Print the first and fifth field:
awk -F: '{$print $1,$5}' /etc/passwd

Can pattern match and use files, so you can replace:
grep foo /etc/passwd | awk -F: '{print $1,$5}'
with:
awk -F: '/foo/ {print $1,$5}' /etc/passwd

NF = built in variable (no $) used to mean “field number”
This will print the first and last fields of lines where the first field matches “foo”
awk -F: '$1 ~/foo/ {print $1,$NF}' /etc/passwd

NF = number of fields, ie, “7″
$NF = value of last field, ie “/bin/bash”
(similarly, NR is record number)

Awk makes assumptions about input, variables, and processing that you’d otherwise have to code yourself.

– “main loop” of input processing is done for you
– awk initializes variables for you, to 0
– input is viewed by awk as ‘records’ which are splittable into ‘fields’

This all makes a lot of operations very concise in awk, many things can be done w/ a one-liner that would otherwise require several lines of code.

awk key points:
– splits text into fields
– default delimiter is “any number of spaces”
– reference fields
– $0 is entire line
– create filters using ‘addresses’ which can be regexps (similar to sed)
– Turing-complete language
– has if, while, for, do-while, etc
– built-in math like exp, log, rand, sin, cos
– built-in string sub, split, index, toupper/lower

Patterns and actions
Pattern is first, then action(s)
Actions are enclosed in {}

only a pattern, no action:
'length>42'
but, the default action is to print the whole line, so this will actually do something — print lines where the length of the line is > 42. strings are just arrays in awk

only action, no pattern:
{print $2,$1}
do this to all lines of input

NR % 3 == 0
print every third line (pattern is %NR mod 3)

{print $1, $NF, $(NF-1)}
print the first field, last field, and 2nd to last field

built-in variables
NF, NR we’ve done
FS = field separator (can be regexp)
OFMT = output format for numbers (default %.6g)

Patterns
– used to filter lines processed by awk
– can be regexp
/^root/ is the pattern in the following
awk -F:'/^root/ {print $1,$NF}' /etc/passwd

– Patterns can use fields and relational operators
To print 1st, 4th and last field if value of 4th field >10:
awk -F: '$4 > 10 {print $1, $4, $NF}' /etc/passwd

awk -F: '$0 !~ /^#/ && $4 > 10 {print $1, $4, $NF}' /etc/passwd

Range patterns
sed-like addressing : you can have start and end addresses
awk ‘NR==1,NR==3′
prints only first three lines of the file
You can use regular expressions in range patterns:
awk -F:’/^root/,/^daemon/ {print $1,$NF}’ /etc/passwd
start printing at the line that starts with “root”, the last line that is processed is the line starting with “daemon”

Range pattern “gotcha” – can’t mix a range with other patterns:
To do “start at non-commented line where value of $4 is less than 10, end at the first line where value of $4 is greater than 10″

This does not work!
awk -F: '$0 !~ /^#/ $4 <= 10, $4 > 10' /etc/passwd

This is how to do it, {next} is an action that skips:
awk -F: '$0 ~ /^#/ {next} $4 <= 10, $4 > 10 {print $1, $4' /etc/passwd

Basic Aggregation
awk -F: ‘$3 > 100 {x+=1; print x}’ /etc/passwd
This gives a line of output as each matching line is processed. This gives a running total of x.

awk -F: ‘$3 > 100 {x+=1} END {print x}’ /etc/passwd
This processes the “{print x}” action only after the entire file has been processed. This gives only the final value of x.

Arrays:
Support for regular arrays
Technically multi-dimensional arrays are not supported, but array indexes are not supported, so you can make your own associative arrays.

Example:
awk -F: ‘{x[$1] = $2*($4 – $3)} END {for(key in x) {print key, x[key]}}’ stocks.txt

The part before the END creates the associative array, the part after the END prints the array.

Extreme data munging:
awk -f: '{x[$1]=($2'($4 - $3))} END {for(z in x) {print z, x[z]}}' stocks.txt

ABC,100,12.14,19.12
FOO,100,24.01,17.45

output
BAR 271.5
ABC 698

For the line “ABC,100,12.14,19.12″
the function becomes

x[ABC] = 100 * (19.12 - 12.14) = 698

Aggregate across multiple variables:
awk -F, '{x[$1] = $2*($4 - $3); y+=x[$1]} END {for(z in x) {print z, x[z]}} {print "Net:"y}}' stocks.txt

Note that y is a running *sum* (not a running count like before).

Now, the above is hard to read, this is much easier.

#!/usr/bin/awk -f

BEGIN { FS="," }
{ x[$1] = $2*($4 - $3)
  y+=x[$1]
 }
END {
 for(z in x) {
  print z, x[z]
  }
 }  # end for loop
 {
 print "Net:"y
 } # end END block

This was liveblogged, so please point out any issues, as they may be typos on my part….