I feel pondering hard Questions leads to more knowledge than just seeking answers. Here I'll try to strike a balance between the Questions I've had and the potentially correct Solutions to match.

Friday, April 25, 2014

Python to Scala: Virtualenvs to sbt for project management

At PDX Scala on 2014/4/9, Thomas gave a great introduction to using sbt for simple to complex project management.  Most of my experience dealing with significant dev environments comes from the Python world using Virtualenv and its handy wrapper.

sbt big takeaways:

  • Fully unified tool built in Scala for project management and development
  • The tilde (~) prefix re-runs a command whenever sources change, giving compiled Scala some of the flexibility of a scripting language
  • Very similar to Python's Virtualenv-Pip tools, but unified into a single tool
  • (Learned the hard way)  Many of the simplicities of Python/Interpreted languages don't translate to the Scala/JVM world. The sbt documentation expects a certain amount of JVM domain knowledge which I had long forgotten.
    • Ergo: Configuration is more difficult than Virtualenv's
Going further, I hope to compare and contrast the two tool sets to gain a better understanding of both.  Everything I say about the sbt side of things should be taken with immense salt, given my newbie understanding of the Scala/Java world.  Please respond with corrections, constructive criticism, and improvements!

If there's one area lacking in my understanding of the Scala world, it is the legacy of Java and all the paradigms of managing the JVM.

Setup Project:

Virtualenv:  Use virtualenvwrapper to initialize environment, then create directories for project.
$ mkvirtualenv venv
(venv)$ cd projectdir
(venv)$ mkdir  projectsrc
(venv)$ touch projectsrc/__init__.py
(venv)$ echo 'print("hihi!")' > projectsrc/hihi.py
(venv)$ python -m projectsrc.hihi

sbt: In project's directory, activate sbt tool and create directory structure to match what is expected by sbt:
  • Sources in the base directory
  • Sources in src/main/scala or src/main/java
  • Tests in src/test/scala or src/test/java
  • Data files in src/main/resources or src/test/resources
  • jars in lib
$ cd projectdir
$ touch build.sbt #sbt base config
$ mkdir -p src/main/scala
$ echo 'object Hi { def main(args: Array[String]) = println("hihi!") }' > src/main/scala/hw.scala
$ sbt
> run
Memory management voodoo from Thomas: in the home directory, create a ~/.sbtconfig file and add memory management flags for execution:
SBT_OPTS="-Xms512M -Xmx1536M -Xss1M -XX:+CMSClassUnloadingEnabled -XX:MaxPermSize=256M"
Apparently sbt can start to eat up a lot of memory if left running for a long time. Don't know details, just standing on larger shoulders here.

Project Customization

Dependency Management

Virtualenv/Python: Using virtualenv pip to install all the necessary libraries makes it easy to export the versioned dependencies of a project:
(venv)$ pip freeze > projectdir/requirements.txt
When cloning a codebase, assuming the owner has been keeping the requirements file up to date, a new user can use the file to mirror install all the necessary dependencies.
(venv)$ pip install -r projectdir/requirements.txt

Furthermore, a properly configured projectdir/setup.py file, which runs under setuptools to build installable artifacts for deployment, should also contain a manifest of required libraries.
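
As a rough sketch of keeping the two in sync, setup.py could derive its dependency list from the same requirements file. The helper name parse_requirements and the sample package pins below are hypothetical, and this assumes a plain pip freeze style file (one name==version per line), not the full requirements syntax.

```python
def parse_requirements(text):
    """Return requirement specifiers from pip-freeze style text.

    Assumption: one specifier per line; blank lines, comments and
    pip options (lines starting with '-') are skipped.
    """
    requirements = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop trailing comments
        if not line or line.startswith("-"):
            continue
        requirements.append(line)
    return requirements


# Hypothetical contents of projectdir/requirements.txt
sample = """
# exported with: pip freeze
SQLAlchemy==0.9.4
psycopg2==2.5.2  # postgres driver
"""

print(parse_requirements(sample))
```

In setup.py, the resulting list could then be handed to the setuptools setup() call through its install_requires keyword, after reading the text of requirements.txt from disk.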

sbt: Like seemingly most JVM systems, configuration runs dark and deep.  sbt is fairly clean but can become very powerful if the dev is knowledgeable enough.  I'm only going to cover the basic build.sbt file. Deeper documentation on using the Build.scala files can be found here (maybe another blog post).

Simple library requirements: use the Maven Central repository to find the library, pull up its Artifact Details page, and under Dependency Information copy the 'Scala SBT' definition and add it to build.sbt, keeping a blank line between settings. e.g.:
libraryDependencies += "com.typesafe.slick" % "slick_2.10" % "2.0.1"
Running compile from the sbt prompt afterwards will resolve the dependencies.
Additionally, there are ways of adding dependencies via sbt's CLI, which can be found TODO:here.


Environment Variables

Virtualenv: As a matter of personal taste, I use the virtualenv setup script postactivate for loading any environment variables.  Mileage will vary, and virtualenvs allow several points of entry for customization. I prefer to tie the env-vars to the virtualenv so that if you want to check something in the REPL, there's no requirement to be in the project's directory, as there is when using autoenv (although it is a cool tool).
cd ~/.virtualenvs//bin
vi postactivate
Write: export POSTGRESPASS="123456"
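
On the Python side, a variable exported by postactivate can be read anywhere inside the activated virtualenv, including the REPL. A minimal sketch using os.environ; the function name postgres_password and the error message are illustrative assumptions, and only POSTGRESPASS comes from the example above.

```python
import os


def postgres_password(environ=None):
    """Look up POSTGRESPASS, failing loudly if the virtualenv hook never ran."""
    environ = os.environ if environ is None else environ
    try:
        return environ["POSTGRESPASS"]
    except KeyError:
        raise RuntimeError(
            "POSTGRESPASS is not set; activate the virtualenv so postactivate runs"
        )


# Passing a dict stands in for the real environment in this demonstration.
print(postgres_password({"POSTGRESPASS": "123456"}))
```

Called with no argument, the function reads the real process environment, so it works the same from a script or from a REPL started inside the virtualenv.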

sbt:  Figuring out how to get environment variables into the sbt runtime became my White Whale.  Ultimately it simply required a deeper understanding of sbt's internals and settings management, along with realizing that the 'envVars' setting is only applied to runtimes where the compiled process is forked.

While environment variables are often used in Python systems for defining sensitive information or development state, the JVM ecosystem prefers compile-time or runtime configuration (arguments/flags) over system definitions like environment variables, which interpreted languages tend to favor.  Via the freenode #scala channel, tpolecat nicely confirmed that the general JVM practice is to specify runtime configuration as system properties via CLI arguments (I trust his opinion).

HOWEVER, if there is still a wild need to specify environment variables for runtime, sbt recently added support for it (with exceptions).
In the declarative build.sbt file:
fork := true

envVars ++= Map("ENVIRONMENT_DEF" -> "dev")
Caveat/"fork := true" explanation: the "envVars" setting is only applied to VMs which have been forked from the standard sbt process.  The envVars setting is not loaded into the sbt process itself and therefore can't be referenced in the 'console' REPL.

The previous build.sbt definition sets "ENVIRONMENT_DEF" to "dev", which can be read in a forked VM with something like:
sys.env("ENVIRONMENT_DEF")


Virtualenv links to the project's Python binary, which is configured to use all of the libraries installed by the localized pip.  This can include a nicer REPL like IPython, which will be scoped to the virtualenv.

sbt has a 'console' command which acts like the normal 'scala' REPL.  When used, the interpreter runs under the project's configuration, and the defined dependencies are accessible.  One caveat mentioned earlier is that the console runs in the same VM process as sbt, so settings that only apply to forked processes (such as envVars) will not propagate to the REPL.


Python: Once a proper setuptools setup.py definition has been created, producing the project artifact to install is simple.
python setup.py sdist
This will tar up all of the specified files into a source distribution artifact, which can be installed by pip remotely on the server with fabric.

sbt: Assuming that the library dependencies are properly specified, sbt will build a jar file very simply with the 'package' command.
> package
There is also tooling similar to fabric which can handle deployment under the 'publish' operation, though this obviously requires more configuration.

This is just my simple overview of how Virtualenvs and sbt compare. There's more to cover, but I think these are good basics to start with.  Please comment to point out any inaccuracies or things I might have missed.