Decouple with kwargs

So I’ve been attempting to make a suite of CLI scripts for work. I recently discovered Python’s multiprocessing module, really liked its simplicity, and started using it with great success. Everything was faster.

This then spurred me to take the scripts, which were for the most part “copy the common-ish bits and then modify to suit”, and turn them into a library of sorts. The neat part arose when I wanted to import an argument parser as well as hand off to a process-creation component.

In doing this I had in mind that the ‘script’ would only need to define a function that builds the list of commands to run on a given server, and a __main__ section that passes in a list of servers, that command-list function, and some other info. This way the script itself would have only two sections, containing just the parts that are unique to it.

The problem came up when I wanted the function that makes the command list to take more arguments than normal. How would I pass them in, and how would I define them so that I don’t have to edit my libraries to accommodate them? It was kwargs that saved me there, along with some optparse tweaking.

Here is a basic example:

script.py

def get_commands(host, **kwargs):
    command_list = []

    user = kwargs['user']
    key_file = kwargs['key_file']

    command_list.append("echo %s" % user)

    return command_list    

if __name__ == '__main__':
    import sys

    import automation

    # process_args returns the host list and the parsed options as a dict
    (hosts, options) = automation.process_args(sys.argv)

    automation.thread_hosts(
            hosts,
            get_commands,
            options,
            user="test"
            )

automation.py (library w/ functions)

def thread_hosts(hosts, get_commands, options={}, **kwargs):
    import multiprocessing
 
    kwargs.update(options)

    jobs = []

    for host in hosts:
        p = multiprocessing.Process(
                target=run_commands,
                args=(
                    host,
                    get_commands(host, **kwargs),
                    ), )

        jobs.append(p)
        p.start()

So this example is a script that defines the function that returns a command list, and provides the list of hosts and the options. thread_hosts then loops over the hosts, each time calling get_commands for the host and passing the host and the resulting command list to another library function (run_commands, not shown) that connects to said host and loops over the returned command list.

A part that might be confusing is that the process_args function returns not optparse’s options object itself but its options.__dict__ representation. This allows me to update kwargs with any options that I allow to be set at the command line, the example in the script being the key_file variable.
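
As a rough sketch of what such a process_args could look like: the -k/--key-file flag, the sample key path and the host names below are assumptions made for illustration, not the actual library code. It shows the options coming back as a plain dict and then being merged into kwargs.

```python
import optparse


def process_args(argv):
    """Split argv into (hosts, options), with options as a plain dict."""
    parser = optparse.OptionParser(usage="%prog [options] host [host ...]")
    # Hypothetical option: the post only says key_file can be set this way.
    parser.add_option("-k", "--key-file", dest="key_file", default=None)

    # parse_args returns (values, leftover positional arguments).
    options, hosts = parser.parse_args(argv[1:])

    # vars(options) is the same mapping as options.__dict__; copying it
    # keeps later updates away from the parser's own Values object.
    return hosts, dict(vars(options))


# Sample command line (made-up path and host names).
hosts, options = process_args(["script.py", "-k", "/tmp/id_rsa", "web1", "web2"])

kwargs = {"user": "test"}
kwargs.update(options)  # a command-line value wins if a key collides
print(hosts)
print(sorted(kwargs.items()))
```

Because update() overwrites existing keys, a keyword the script passes in under the same name as a command-line option would be replaced by the parsed value.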

The neat part of all this is being able to take the kwargs for one function and pass them right along to the next. This is key, because it allows the library function to be entirely decoupled from the script itself.

With this implementation I am able to write a script that defines extra args to use, and only the script needs to know what they are. In the examples the library just dumbly passes them along in the kwargs dict. I never have to tell it that I want to pass a user variable through it, which makes the script a nice self-contained unit.
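
The pass-through idea can be tried in isolation with a stripped-down sketch. Here run_on_hosts is a hypothetical stand-in for thread_hosts that skips multiprocessing and simply collects the command lists, so the forwarding is easy to see; the host names are made up.

```python
def run_on_hosts(hosts, get_commands, **kwargs):
    """Library side: forwards kwargs without knowing what is in them."""
    plan = {}
    for host in hosts:
        # The same keyword arguments are handed on untouched.
        plan[host] = get_commands(host, **kwargs)
    return plan


def get_commands(host, **kwargs):
    """Script side: the only place that knows about 'user'."""
    user = kwargs["user"]
    return ["echo %s@%s" % (user, host)]


plan = run_on_hosts(["web1", "web2"], get_commands, user="test")
print(plan["web1"])
```

Adding another keyword to the call and reading it in get_commands needs no change to run_on_hosts, which is the decoupling described above.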

Comments 3

  1. Heikki Toivonen wrote:

    There are some subtle gotchas in the way you coded thread_hosts. First of all, the default is a dict, which means that if you modify options in the function, the modifications are remembered in future calls. Try this:

    def foo(options={}):
        options[2] = 2
        print options

    foo({1:1})
    foo()

    The normal way to handle this is to make the default value None, and if it is None, make the value a dict inside the function:

    def foo(options=None):
        if options is None:
            options = {}
        options[2] = 2
        print options

    The other thing is that you are modifying the passed-in data structure, which would probably surprise callers. A better way would be to create a copy of kwargs to play with.

    Posted 24 Jul 2009 at 5:33 pm
  2. Alfred Rossi wrote:

    @Heikki Toivonen

    Your example doesn’t make your point. Since foo always sets options[2] = 2, it never does anything unexpected.

    alfred@alfred-desktop:~$ cat ./foo.py
    def foo(pair = None, options={}):
        if pair:
            key, value = pair
            options[key] = value

        print options

    foo()
    foo((2,2))
    foo()

    alfred@alfred-desktop:~$ python ./foo.py
    {}
    {2: 2}
    {2: 2}
    Posted 24 Jul 2009 at 5:33 pm
  3. Morgan wrote:

    That’s good to know about the function definition line. It wasn’t a case that came up in this instance, only because thread_hosts is only run once. I don’t see a need to run it more than once, but I’ll probably implement it similarly to how you’ve suggested, because I can’t be sure that someone else won’t want to run it twice in one script.

    I think the most helpful suggestion was on my modification of kwargs by rolling options into it. It got me thinking that the script user needn’t have any interaction with the argument parser at all. Instead I will pass the thread_hosts function the processed sys.argv, and let it deal with deciding what it needs. The issue of changing kwargs is still present, but in this implementation it’s now hidden, and only crops up when the user wants to specify a kwarg that is already used in the optparse options.

    End result looking something like this:
    script.py

    if __name__ == '__main__':
        import sys

        import automation

        automation.thread_hosts(
                automation.process_args(sys.argv),
                get_commands,
                user="test"
                )

    automation.py

    def thread_hosts(processed_args, get_commands, **kwargs):
        import multiprocessing

        hosts = []
        jobs = []

        (hosts, options) = processed_args
        kwargs.update(options)

        for host in hosts:
            p = multiprocessing.Process(
                    target=run_commands,
                    args=(
                        host,
                        get_commands(host, **kwargs),
                        ), )

            jobs.append(p)
            p.start()
    Posted 24 Jul 2009 at 5:33 pm
