
Deploying on Virtual Machines (VMs): A Comprehensive Guide

When we talk about software deployment, we're discussing the process of moving the source code – crafted for humans – into a location where it can be executed by machines. This guide will delve deep into deploying on virtual machines (VMs), touching upon different methods, considerations, and nuances.

Understanding the Deployment Landscape

In today's diversified technological landscape, there are myriad ways to deploy an application to servers, each with its own pros and cons. Kubernetes, for instance, offers standardized deployment models. However, VM deployments can be more variable, requiring a tailored approach.

Should You Directly Ship Your Git Repository?

One basic deployment technique involves pushing the entire Git repository directly onto the server. If you're dealing with a single server, this might suffice. However, for multi-server environments, it's prudent to consider alternatives. Git directories can become corrupted, necessitating manual intervention.

Services like GitHub provide deploy keys that can be added to your production server's SSH configuration, allowing you to run git pull directly on production. This method particularly suits interpreted applications (PHP, Ruby, etc.) that don't require a compilation step. However, modern tech stacks, even UI code, often need a build step; in these cases, it's not the source code but the compiled output that gets deployed.
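As a rough sketch of this approach, assuming a read-only deploy key is already configured on the server, and with a hypothetical host name and application directory:

```bash
#!/usr/bin/env bash
# Minimal "git pull" deployment sketch. The host name and application
# directory are hypothetical; the deploy key is assumed to be set up already.
set -euo pipefail

SERVER=deploy@prod-1.example.com
APP_DIR=/var/www/app

ssh "$SERVER" "cd '$APP_DIR' && git pull --ff-only origin main"
```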

Direct Git repository deployment also poses challenges when tweaking production configurations since manual adjustments become necessary.

Distinguishing Between Source Code and Production Code

Remember: not everything in your repository should find its way to the production server. Developer tooling, documentation, and development-only code are best kept out of production to enhance performance, simplicity, and security.

The production environment often necessitates a different file set than what's used during development. Files created or modified during the build process, such as compiled code or pre-cached data, are added to the mix before deployment. Additionally, the production environment may require unique configurations and security keys, all of which can be addressed during the deployment orchestration.
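As an illustration, a build step might assemble a clean release directory containing only what production needs; the build command, paths, and exclusion list below are hypothetical:

```bash
#!/usr/bin/env bash
# Sketch: build, then copy only production-relevant files into a release dir.
# The build command, paths, and exclusions are hypothetical.
set -euo pipefail

RELEASE_DIR=build/release
rm -rf "$RELEASE_DIR" && mkdir -p "$RELEASE_DIR"

npm run build   # or `go build`, asset precompilation, etc.

rsync -a \
  --exclude '.git/' \
  --exclude 'build/' \
  --exclude 'docs/' \
  --exclude 'tests/' \
  ./ "$RELEASE_DIR/"
```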

The Building and Testing Conundrum

Where to Construct and Validate?

Consistency is paramount when it comes to building and testing projects. Leveraging standardized environments ensures uniformity, scalability, and efficient collaboration. Several CI/CD solutions, including GitHub Actions, GitLab CI, CircleCI, Travis CI, and Jenkins, facilitate this.

Yet, empowering developers to run builds and tests on their machines can boost productivity. It minimizes the back-and-forth with the CI tool, especially when developers can selectively run tests based on their changes.

Orchestrating Multi-environment Builds, Tests, and Deployments

Your repository should house clear scripts that can build, test, and deploy applications for any given environment. Utilizing these scripts in both CI/CD pipelines and locally provides consistency across all phases.
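For example, the repository might expose a small set of entry-point scripts that CI/CD pipelines and developers invoke the same way; the layout and arguments here are just one possible convention:

```bash
# Hypothetical entry points kept in the repository under scripts/:
#   scripts/build.sh   <environment>
#   scripts/test.sh    [test-path]
#   scripts/deploy.sh  <environment> <version>
#
# CI/CD pipelines and developers call them identically:
./scripts/build.sh staging
./scripts/test.sh tests/unit
./scripts/deploy.sh production v1.42.0
```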

Depending on complexity, these scripts could be fully-featured console applications written in various languages. While Bash offers simplicity for those proficient in it, other languages like Go, JS/TS, Ruby, and Python can be integrated seamlessly into the process.

My technology preferences

I personally like Bash: once you master it, things become very simple and concise to do, but it is difficult to master. Go binaries have the benefit of not needing a virtual execution environment like most languages do; once a Go binary is built for a target platform, it can run there without any extra dependencies. However, Go is quite a verbose language, and integrating it with the shell requires some libraries to be utilised carefully. JS/TS, Ruby, Python, and PHP all require an execution environment to be present on the servers, and one of the problems with this is that the execution environment changes and gets upgraded over time. Tool X might require Ruby 2 while another tool requires Ruby 3, and maintaining all of these can become difficult, even with nice version managers like asdf available. I personally use Bash as the starting point, and if the process becomes very complex, those Bash scripts can always be integrated into other languages; multiple languages can be used together, glued through the shell environment.
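As a small illustration of that glue role, a Bash script can hand a complex step to another language and pick up its output through the shell; the script names, paths, and arguments here are hypothetical:

```bash
#!/usr/bin/env bash
# Bash as the glue layer: delegate one complex step to Python while the
# surrounding orchestration stays in Bash. Names and paths are hypothetical.
set -euo pipefail

VERSION="$(git describe --tags --always)"

# A more involved step, e.g. generating a deployment manifest, lives in Python.
python3 scripts/build_manifest.py --version "$VERSION" > build/manifest.json

# The rest of the packaging stays in plain shell.
tar -czf "build/app-$VERSION.tar.gz" -C build/release .
```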

Efficient Testing Techniques

As projects evolve, testing becomes more time-intensive. With today's AI tools, writing tests is easier than ever, and a large test suite can accumulate very quickly. However, optimization techniques can speed up this phase:

  • Modularizing the project: By separating components, tests can be run more selectively based on changes. This works best on mature projects where component boundaries are already clearly visible.

  • Parallel Testing: Running tests concurrently reduces wall-clock time, though not necessarily total computational cost (a sketch follows this list). Fast, distraction-free feedback keeps engineers focused.

  • Test Optimization: Reusing environments by resetting them between tests is often faster than rebuilding from scratch. For supported languages, tools like Bazel track file-level dependencies and run only the tests affected by a change, at the expense of not running every test, which can occasionally let issues slip through.

  • Continuous Testing: While running all tests before deployment adds a safety layer, selective testing during development or for pull requests can be efficient.
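As a minimal sketch of the parallel approach, assuming a hypothetical run_tests.sh runner that accepts a directory of tests:

```bash
#!/usr/bin/env bash
# Sketch: run test groups concurrently and fail if any group fails.
# The runner script and directory layout are hypothetical.
set -euo pipefail

pids=()
for group in tests/unit tests/integration tests/api; do
  ./run_tests.sh "$group" & pids+=("$!")
done

status=0
for pid in "${pids[@]}"; do
  wait "$pid" || status=1   # collect each group's result
done
exit "$status"
```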

The Deployment Process Post-Build and Post-Testing

After successfully building and verifying your application, the pivotal task ahead is ensuring a secure transfer of these files to the production servers. The Continuous Deployment (CD) pipeline can adopt one of two approaches:

  • Directly execute code that targets the servers.
  • Initiate deployment operations on the servers, which presupposes that these servers are configured for self-updating.

The pipeline then needs to choose a transfer method:

  • Archiving and Transferring: Archive and version the built files, upload them to cloud storage, and then download and extract them on the production servers. This method's efficiency diminishes as the project size increases.

  • Direct File Transfer: Tools like rsync offer efficient synchronization by updating only the changed files. While powerful, rsync comes with its own challenges: file modification timestamps are hard to keep intact in CI/CD environments, and checksum-based comparison is slower (a sketch of both the archive and rsync approaches follows this list).

  • Shared network volumes: Many cloud providers offer network-attached file storage services. Although these can be applicable in some scenarios, network storage is slower than local disk, which may hurt performance. If all production assets are loaded into memory at startup, however, this can still work well.
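A rough sketch of the first two options follows; the bucket name, paths, host, and version are all hypothetical, and the object-storage CLI is only illustrative:

```bash
#!/usr/bin/env bash
# Sketch of two transfer options. Bucket, host, and paths are hypothetical.
set -euo pipefail

VERSION=v1.42.0
SERVER=deploy@prod-1.example.com

# Option 1: archive, upload to object storage, then pull and extract on the server.
tar -czf "app-$VERSION.tar.gz" -C build/release .
aws s3 cp "app-$VERSION.tar.gz" "s3://my-releases/app-$VERSION.tar.gz"
ssh "$SERVER" "aws s3 cp 's3://my-releases/app-$VERSION.tar.gz' /tmp/ \
  && mkdir -p '/srv/app/releases/$VERSION' \
  && tar -xzf '/tmp/app-$VERSION.tar.gz' -C '/srv/app/releases/$VERSION'"

# Option 2: sync only changed files. --checksum avoids relying on modification
# times that CI checkouts reset, at the cost of extra reads on both sides.
rsync -az --delete --checksum build/release/ "$SERVER:/srv/app/releases/$VERSION/"
```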

Activating and Switching to the New Build

Once the new build reaches the servers, it's crucial to transition from the old version seamlessly. Strategies vary based on the underlying technology. For instance:

  • Memory-resident applications can be restarted by marking each server offline at the load balancer during the restart (rolling deployments).
  • Applications not resident in memory can be launched on a different port, and traffic redirected there, allowing the older version to be terminated without downtime.
  • Symlinks can be used to point to the new build directory atomically.
  • SSH is the gold standard for executing commands on remote servers from the CD pipeline, or for triggering deploy scripts on the servers themselves, as sketched below.
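For instance, the CD pipeline might simply trigger a deploy script that already lives on each server; the host list, script path, and version argument below are hypothetical:

```bash
#!/usr/bin/env bash
# Sketch: trigger a server-side deploy script from the CD pipeline over SSH.
# Hosts, script path, and the version argument are hypothetical.
set -euo pipefail

VERSION="$1"
for host in prod-1.example.com prod-2.example.com; do
  ssh "deploy@$host" "/srv/app/bin/activate-release.sh '$VERSION'"
done
```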

One significant caveat to consider is inter-service dependencies. When one service relies on another's API, deployment order matters. Implementing API versioning can prevent potential breaks during deploys.

How to activate new files with zero downtime deployment

Now, once the new files are on the servers, preferably in a temporary location, we need to start using them and stop using the old version. When using VMs, I don't suggest discarding the whole VM and spinning up and configuring a new one, as that takes time. We can simply update the files and restart the application process if one needs restarting: for languages like PHP there is no long-running process to restart, while for many other languages there is.

Depending on the technology of the application, various solutions can be applied here. If our application runs in memory, we can safely update the files and restart the application by taking the host offline at the load balancer and bringing it back online afterwards. This can be done one group of servers at a time, i.e. rolling deployments.
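A minimal sketch of one rolling step, where lb-ctl stands in for whatever tool drains and re-enables hosts at your load balancer, and the service name and health endpoint are hypothetical:

```bash
#!/usr/bin/env bash
# Sketch of one rolling-deployment step on a single host.
# `lb-ctl`, the service name, and the health URL are hypothetical stand-ins.
set -euo pipefail

HOST="$1"

lb-ctl drain "$HOST"                               # stop sending new traffic
ssh "deploy@$HOST" "sudo systemctl restart myapp"  # pick up the new files
until curl -fsS "http://$HOST:8080/healthz" > /dev/null; do
  sleep 2                                          # wait until it is healthy
done
lb-ctl enable "$HOST"                              # put it back in rotation
```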

If not, we can spin up the same application on another port and point the local reverse proxy at the new port, which is usually easy to do without downtime, then stop the old application. It is also possible to use symlinks, atomically pointing them to the new directory, though this often requires a more customised server setup.
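A minimal sketch of the symlink approach, with hypothetical paths and service name; creating a new link and renaming it over the old one keeps the switch atomic:

```bash
#!/usr/bin/env bash
# Sketch: atomically repoint a `current` symlink to a new release directory.
# Paths and the service name are hypothetical.
set -euo pipefail

VERSION="$1"
APP_ROOT=/srv/app

ln -s "$APP_ROOT/releases/$VERSION" "$APP_ROOT/current.tmp"
mv -T "$APP_ROOT/current.tmp" "$APP_ROOT/current"   # rename(2) is atomic

# In-memory runtimes still need a reload/restart to pick up the new path.
sudo systemctl reload myapp
```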

There is a big catch with most deployment models, though. As noted above, if service A uses service B's APIs, and service A or B is updated before the other, there is a chance that things will break. It is crucial to have API versioning so that existing APIs keep working while new ones are introduced.

Rollbacks: An Essential Safety Net

No matter how rigorous your testing, there's always a chance that issues might emerge post-deployment. A well-defined rollback strategy is essential. By maintaining versioned application files, it's possible to revert to a previous state by merely re-triggering the deployment process with older version tags.
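Under that model, a rollback can be as simple as re-running the deploy entry point with an older tag; the script name and version below are hypothetical:

```bash
# Roll back by re-deploying a previously built, versioned artifact.
# Script name and version tag are hypothetical.
./scripts/deploy.sh production v1.41.0
```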

Conclusion

Deploying on virtual machines demands meticulous planning, but with the right strategies in place, you can ensure efficient, consistent, and secure application deployments.