Pages

Showing posts with label Code. Show all posts
Showing posts with label Code. Show all posts

Friday, February 27, 2015

Pipes as Code

Finally we have started to move away from having build pipes as a chain of Jenkins jobs. There has been alot written of the subject that CI systems arnt well suited to implement CD processes. Let me first give a short recap on why before I get into how we now delivery our Pipes as Code.

First of all pipes in CI systems have bad portability. They are usually a chain of jobs set up through either a manual process or through some sort of automation based on a api provided by the CI system. The inherrited problem here is that the pipe executes in the CI system. This means that it is very hard to test and develope a pipe using Continuous Delivery. Yes we need to use Continuous Delivery when implementing our Continuous Delivery tooling otherwise we will nto be able to deliver our CD Processes in a qualitative, rapid and reliable way.

Then there is the problem of that data that we collect during the pipe. By default the data in a CI system is stored in that CI systems. Often on disk on that instans of that CI server. Adding insult to injury navigation of the build data is often tided to the current implementation of the build pipe. This means that a change to the build pipe means that we can no longer access the build data.

For a few years now we have been off loading all the build data into different types of storages depending on what type of data it is. Meta data around the build we store in a custom database. Logs go to our ELK stack, metrices to Graphite and reports to S3.

Still we have had trouble delivering quality Pipes. Now that has changed.

We still use a CI Server to trigger the Pipe. On the CI server we now have one job "DoIt". The "DoIt" job executes the right build pipe for every application. Lets talk a bit on how we pick the pipe.

Each git repo contains a YML file that says how we should build that repo. Thats more or less the only thing that has to be in the repo for us to start building it. We ingore all repos without the YML files. So we listen to all the gerrit triggers and ignore ones withouth

The YML is simply pretty much just

pipe: application-pipe
jdk: JDK8


We describe our build pipes in YML and implement our tasks in Groovy. Here is a simple definition. 

build: 
    first:
     - do: setup.Clean     - do: setup.Init
    main:
     - do: build.Build     - do: test.Test
    last:    
     - do: log.ReportBuildStatus
last: 
   last:    
     - do: notify.Email

Each task has a lifecycle of first, main, last. The first section is always executed and all of the "do´s" in the first section are executed regardless of result. In the main secion the "do´s" are only execute if everything has gone well so far. Last is always executed regardless of how things went.

The "do´s" are references to groovy classes with the first mandatory part of the package stripped. So there is a com.something.something.something.setup.Clean class.

A Context object is passed through all the execute methods of the "do´s". By setting context.mock=true the main executing process adds the sufix "Mock" to all "do´s". This allows us to unit test the build pipe inorder to assert that all the steps that we expect to happen do happen in the correct order.

When alot of things start happening its not really practicall to have a build task all that verbose especially since we have multiple pipes that share the same build task. So we can create a "build.yml"  and a "notify.yml" which we then can include like this.

build: 
    ref : build
last: 
   last:    
     - do: notify

So this is how our build pipes look and we can unit test the pipe, the tasks and each "do" implementaiton.

Looking at a full pipe example we get something like this.

init:
    ref: init
build:
    parallel:
        build:
            ref: build.deployable
        provision:
            ref: provision.create-test-environment  
deploy:            
    ref: deploy.deploy-engine
test:            
    functional-test
        ref: test.functional-tests
    load-test: 
        ref: test.load-tests
release: 
    parallel:
        release:
            ref: release.publish-to-nexus
        bake:
            ref: bake.ami-with-packer
last: 
    parallel:
        deprovision:
            ref: provision.destroy-test-environment
        end:

            ref: end

Thats it.!

This pipe builds, functional tests, load tests and publishes our artifacts as well as baking images for our AWS environments. All the steps report to our meta data database, elk, graphite, s3 and slack.

And ofcourse we use our build pipes to build our build pipe tooling.

Continuous Delivery of Continuous Delivery through build Pipes as Code. High score on the buzzword bingo!

Monday, July 7, 2014

Continuous Deployment in the Cloud Part 2: The Pipeline Engine in 100 lines of code

As I talked about in my previous post in this series we need to treat our Continuous Delivery process as a distributed system and as part of that we need to move the Pipe out of Jenkins and into a first class citizen of its own. Aside from the facts that a CI Tool is a very bad Continuous Delivery/Deploy orchestrator I find the potential of executing my pipe from anywhere in the cloud very tempting.

If my pipe is a first class citizen of its own and executable on any plain old node then I can execute it anywhere in the cloud. Which means I can execute it locally on my laptop, a simple minion in my own cloud or in one of all of the managed CI services that have surfaced in the Cloud.

To accomplish this we need five very basic and simple things

  1. a pipeline configuration that defines what tasks to execute for that pipe
  2. a pipeline engine that executes tasks
  3. a library of task implementations
  4. a definition of what pipe to use with my artefact
  5. a way of distributing the pipeline engine to the node where I want to execute my pipeline
Lets have a look.

Define the pipeline

The pipeline is a relatively simple process that executes tasks. For the purpose of this blog series and for simple small deliveries sequential execution of tasks can be sufficient but at my work we do run a lot of parallel sub pipes to improve the throughput on the test parts. So our pipe should be able to handle both.

We also want to be able to run the pipe to a certain stage. Like from start to regression test and then step by step launch in QA and launch in Prod. Obviously of we want to do continuous deployment we don't need to worry too much about that capability. But I include it just to cover a bit more scope.

Defining pipelines is no real rocket science and in most cases a few archetype pipelines will cover like 90% of the pipes needed for a large scale delivery. So I do like to define a few flavours of pipes that gives us the ability to distribute a base set of pipes for CI to CD.

Once we have defined the base set of pipe flavours the each team should configure which pipe they want to handle their deliverables.

I define my pipes something like this.

name: Strawberry
pipe: 
  - do:
     - tasks:
       - name: Build
         type: mock.Dummy
  - do:
     - tasks:
       - name: Deploy A
         type: mock.Dummy
       - name: Test A
         type: mock.Dummy
     - tasks:
       - name: Deploy B
         type: mock.Dummy
       - name: Test B
         type: mock.Dummy
    parallel: true     
  - do:
     - tasks:
       - name: Publish
         type: mock.Dummy
  
A pipe named Strawberry which builds our services then deploys it in two parallel pipes where it executes two test suites and finally publishes the artefacts in our artefact repo.  At this stage each task is just executed with a Dummy task implementation.

The pipeline engine

We need a mechanism that understands our yml config and links it to our library of executable tasks.

I use Groovy to build my engine but it can just as easily be built in any language. Ive intentionally stripped down some of the logging I do but this is basically it.  In about 80 lines of code we have a engine that loads tasks defined in a yml, executes then in serial or parallel and has the capability to run all tasks, the tasks up to one point or a single task.

@Log
class BalthazarEngine {

    def int start(Map context){
        def status = 0
        def definition = context.get "balthazar-pipe" 
        
        for (def doIt:  definition["pipe"] ){
            status = executePipe(doIt, context)
        }
        return status
    }
    def int executePipe(Map doIt, Map context){
        def status = 0
        if (doIt.parallel == true){
            status = doItParallel(doIt,context)
        } else {
            status = doItSerial(doIt,context)
        }
        return status
    }
    
    def int doItSerial(def doIt, def context){
        def status = 0
        for (def tasks : doIt.do.tasks){
            status = executeTasks(tasks, context)
        }
        return status
    }
    
    def int doItParallel(def doIt, def context){
        def status = new AtomicInteger()
        def th
        for (def tasks : doIt.do.tasks){
            def cloneContext = deepcopy(context)
            def cloneTasks = deepcopy(tasks)
            th = Thread.start {
                status = executeTasks(cloneTasks, cloneContext)
            }
        }
        th.join()
        return status
    }   
    
    def int executeTasks(def tasks, def context){
        def status = 0
        for (def task : tasks){
            //execute if the run-task is not specified or if run-task equqls this task
            if (!context["run-task"] || context["run-task"] == task.name){
                log.info "execute ${task.name}"
                context["this.task"] = task
                def impl = loadInstanceForTask task;
                status = impl.execute context 
            }
            
            if (status != BalthazarTask.TASK_SUCCESS){
                break
            }
            
            if (context["run-to"] == task.name){
                log.info "Executed ${context["run-to"]} which is the last task, done executing."
                break
            }
        }
        return status
    }
    def loadInstanceForTask(def task){
        def className = "balthazar.tasks.${task.type}"
        def forName = Class.forName className
        return forName.newInstance()
    
    }
    def deepcopy(orig) {
        def bos = new ByteArrayOutputStream()
        def oos = new ObjectOutputStream(bos)
        oos.writeObject(orig); oos.flush()
        def bin = new ByteArrayInputStream(bos.toByteArray())
        def ois = new ObjectInputStream(bin)
        return ois.readObject()
    }
}

A common question I tend to get is "why not implement it as a lifecycle in maven or gradle". Well I want a process that can support building in maven, gradle or any other tool for any other language.  Also as soon as we use another tool to do our job (be it a build tool, a ci server or what ever) we need to adopt to its lifecycle definition of how it executes its processes. Maven has its lifecycle stages quite rigidly defined and I find it a pita to redefine them. Jenkins has its pre, build, post stages where its a pita to share variables. And so on. But most importantly use build tools for what they do well and ci tools for what they do well and none of that is implementing CD pipes.

Task library.

We need tasks for our pipe engine to execute. The interfaces for a task is simple.

public interface BalthazarTask {
    int execute(Map<String, Object> context);
}

Then we just implement them. For my purpose I package tasks in "balthazar.task.<type>.<task>" and just define the type and task in my yml. 

Writing tasks in a custom framework over say jobs in Jenkins is a joy.  You no longer need to do workaround to tooling limitations for simple things such as setting variables during execution.  

Anything you want to share you just put it on the context.

Here is an example of how two tasks share data.

  - tasks: 
    - name: Initiate Pipe
      type: init.Cerebro
  - tasks: 
    - name: Build
      type: build.Gradle
      command: gradle clean fatJar

I have two tasks. The first task creates a new version of the artefact we are building in my master data repository that I call Cerebro. (More on Cerebro in the next post). Cerebro is the master of all my build things and hence my version numbers come from there. So the init.Cerebro task takes the version from Cerebro and puts it on the context.

@Log
class Cerebro implements BalthazarTask {
    @Override
    def int execute(Map<String, Object> context){
        def affiliation = context.get("cerebro-affiliation")
        def hero = context.get("cerebro-hero")
        def key = System.env["CEREBRO_KEY"]
        def reincarnation =  CerebroClient.createNewHeroReincarnation(affiliation, key, hero)
        context.put("cerebro-reincarnation",reincarnation)
        return TASK_SUCCESS
    }
}


My build.Gradle task takes the version number from cerebro (called reincarnation) and sends it to the build script. As you can see I can use custom commands and in this case I do as fat jars is what I build. By default the task does gradle build. I can also define what log level I want my gradle script to run.

@Log
class Gradle implements BalthazarTask {
    @Override
    def int execute(Map<String, Object> context){
        def affiliation = context["cerebro-affiliation"]
        def hero = context["cerebro-hero"]
        def reincarnation = context["cerebro-reincarnation"]
        def command = context["this.task"]["command"] == null ? "gradle build": context["this.task"]["command"]
        def loglevel = context["this.task"]["loglevel"] == null ? "" : "--${context["this.task"]["loglevel"]}"
        
        def gradleCommand = """${command} ${loglevel} -Dcerebro-affiliation=${affiliation} -Dcerebro-hero=${hero} -Dcerebro-reincarnation=${reincarnation}"""

        def proc = gradleCommand.execute()                 
        proc.waitFor()                               
        return proc.exitValue()
    }
}

This is how hard it is to build tasks (jobs) if its done with code instead of configuring it in a CI tool. Sure some tasks like building Amazon AMI´s take a bit more of code. (j/k they don't). But ok a launch task that implements a rolling deploy on amazon using a A/B release pattern does but I will come back to that specific case.

Configure my repository

So I have a build pipe executor, pre built build pipes and tasks that execute in them. Now I need to configure my repository.

In my experience 90% of your teams will be able to use prefab pipes without investing too much effort into building tons of prefabs. A few CI a few simple CD and a few parallelized pipes should cover a lot of demand if you are good enough at putting an interface between the deploy tasks and the deploy as well as the test tasks and the deploy tools.

So in my repo I have a .balthazar.yml which contains.

balthazar-pipe: Strawberry

Distributing the pipeline engine and the task library

First thing we need is a balthazar client that starts the engine using the configuration provided inside my repository.  Simply a Groovy script does the trick.

@Log
class BalthazarRunner {
    def int start(Map<String, Object> context){
        Yaml yaml = new Yaml()
        if (!context){
            def projectfile = new File(".balthazar.yml")            
            if (projectfile.exists()){
                context = yaml.load projectfile.text
            } else {
                throw new Exception("No .balthazar.yml in project")
            }
            
        }
        def name = context.get "balthazar-pipe" 
        def definition = yaml.load this.getClass().getResource("/processes/${name}.yml").text 
        BalthazarEngine engine = new BalthazarEngine()
        
        context["balthazar-pipe"] =  definition
        context["run-to"] = System.properties["run-to"]
        context["run-task"] = System.properties["run-task"]
        return engine.start(context)
    }
}
def runner = new BalthazarRunner()
runner.start([:])

Now we need to distribute the client, our engine and our library of tasks to the node where we want to execute the pipeline with our code repository. This can be done in many ways.

We can package balthazar as a installable package and install it using yum or similar tool. This works quite well on build servers but it does limit us a bit on where we can run it as we need "that installer" to be installed on the target environment. In many cases its really isn't a problem because if your a Debian shop then you have your deb everywhere and if your a Redhat shop then you have your yum. 

I personally opted for another way of distributing the client. Partially because Im lazy and partially because it works on a lot of environments. When I make my balthazar.yml I also checkout the balthazar client project as a git submodule.

So all my projects have a .balthazar.yml and a balthazar-client folder. In my client folder I have a balthazar.sh and a gradle.build file. I use gradle to fetch the latest artefacts from my repo and then the shell script does the java -jar part. Not all that pretty but it works.

Summary

So now on all my repos I can just do...

>. balthazar-client/balthazar.sh

... and I run the pipe configured in .balthazar.yml on that repo. Since all my tasks integrate with my Build Data Repository I get ONE view of all my pipe executions regardless of where they where executed.

CD made fun! Cheers!