Announcing scikit-survival – a Python library for survival analysis built on top of scikit-learn

I had been meaning to do this release for quite a while, and last week I finally had some time to package everything and update the dependencies. scikit-survival contains the majority of the code I developed during my Ph.D.

About Survival Analysis

Survival analysis (also referred to as reliability analysis in engineering) is a type of problem in statistics where the objective is to establish a connection between a set of measurements (often called features or covariates) and the time to an event. The name survival analysis originates from clinical research: in many clinical studies, one is interested in predicting the time to death, i.e., survival. Broadly speaking, survival analysis is a type of regression problem (one wants to predict a continuous value), but with a twist. Consider a clinical study that investigates coronary heart disease and has been carried out over a one-year period, as in the figure below.

Using IPython for parallel computing on an MPI cluster using SLURM

IPython and IPython notebook (or Jupyter, as it is now called) are great tools that make interactive Python work incredibly easy. If you want to do large-scale computations, like the nested cross-validation I mentioned in a previous post, you want to distribute your work automatically across multiple compute nodes instead of looking at results interactively. Thankfully, the folks behind IPython provide IPython.parallel, a library for parallel computing. It is versatile in the sense that you can choose whether to distribute your work locally, via SSH, or via MPI simply by adjusting config files.

MPI-based Nested Cross-Validation for scikit-learn

If you are working with machine learning, at some point you have to choose hyper-parameters for your model of choice and do cross-validation to estimate how well the model generalizes to unseen data. Usually, you want to avoid over-fitting on your data when selecting hyper-parameters, so that you get a less biased estimate of the model's true performance. Therefore, the data you run the hyper-parameter search on has to be independent of the data you use to assess the model's performance.
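
To make the separation concrete, here is a minimal sketch of nested cross-validation that uses only the Python standard library. It is not the MPI-based implementation from the post: the toy 1-D k-nearest-neighbour regressor, the synthetic data, and all function names (`k_folds`, `cv_error`, `nested_cv`) are assumptions made up for illustration. The point is only that the inner loop picks the hyper-parameter from the outer training part, and the outer test fold is touched once, for scoring.

```python
import random
from statistics import mean


def k_folds(indices, n_folds):
    """Split indices into n_folds disjoint test folds (strided split)."""
    return [indices[i::n_folds] for i in range(n_folds)]


def knn_predict(train, x, k):
    """Predict y at x as the mean target of the k nearest training points."""
    nearest = sorted(train, key=lambda point: abs(point[0] - x))[:k]
    return mean(y for _, y in nearest)


def mse(train, test, k):
    """Mean squared error on test for a k-NN model built from train."""
    return mean((knn_predict(train, x, k) - y) ** 2 for x, y in test)


def cv_error(data, k, n_folds):
    """Plain cross-validation error of one hyper-parameter value."""
    idx = list(range(len(data)))
    errors = []
    for fold in k_folds(idx, n_folds):
        held_out = set(fold)
        train = [data[i] for i in idx if i not in held_out]
        test = [data[i] for i in fold]
        errors.append(mse(train, test, k))
    return mean(errors)


def nested_cv(data, grid, outer_folds=5, inner_folds=3):
    """Return one (chosen k, test error) pair per outer fold."""
    idx = list(range(len(data)))
    scores = []
    for fold in k_folds(idx, outer_folds):
        held_out = set(fold)
        outer_train = [data[i] for i in idx if i not in held_out]
        outer_test = [data[i] for i in fold]
        # Inner loop: select k using the outer training part only.
        best_k = min(grid, key=lambda k: cv_error(outer_train, k, inner_folds))
        # Outer loop: score the selected k on data the search never saw.
        scores.append((best_k, mse(outer_train, outer_test, best_k)))
    return scores


# Synthetic, randomly ordered toy data: y = 2x + noise.
random.seed(0)
xs = [random.uniform(0, 10) for _ in range(60)]
data = [(x, 2.0 * x + random.gauss(0, 1)) for x in xs]
results = nested_cv(data, grid=[1, 3, 5, 9])
```

Averaging the second element of each pair in `results` gives the performance estimate. With scikit-learn itself, the same structure is commonly obtained by passing a `GridSearchCV` estimator to `cross_val_score`.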

GitLab 6 + Apache + Phusion Passenger

I've been successfully running an instance of GitLab for almost a year now. The same server also runs Redmine, so GitLab and Redmine each run in their own sub-directory. Phusion Passenger is my application server of choice. Unfortunately, it has become increasingly difficult to keep this setup running with newer versions of GitLab. First, GitLab does not officially support running out of a sub-directory; second, it uses Unicorn by default. Here, I want to describe my setup and show how you can still achieve the GitLab + Apache + Phusion Passenger combo, because I could only find slightly outdated guides online.

PyGObject 3.1.0 released

I am pleased to announce version 3.1.0 of the Python bindings for GObject. This is the first release of the unstable branch, which will eventually result in the stable 3.2.x series.

It is important to point out that this release reverts the change "Convert all strings to utf-8 encoding when retrieving from TreeModel" to restore backwards compatibility. If you are using Python 2 it is recommended to always use the byte-representation of UTF-8 encoded strings (str class) instead of the unicode objects. I'm going to add a section to the Python GTK+ 3 tutorial that will explain this issue and how to deal with it in more detail.
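
As a rough illustration of the recommendation above, the following sketch converts between a unicode object and its UTF-8 byte representation. It deliberately does not touch Gtk or a TreeModel; the variable names and the sample string are assumptions chosen for the example, and the snippet is written so that it runs on both Python 2 and Python 3.

```python
# Runs on Python 2 and Python 3; the source is kept ASCII-only on purpose.
text = u"Gr\u00fc\u00dfe"            # a unicode object with two non-ASCII characters

# UTF-8 byte representation: an instance of `str` on Python 2, `bytes` on Python 3.
encoded = text.encode("utf-8")
assert isinstance(encoded, bytes)    # `bytes` is an alias of `str` on Python 2

# Decode again when a unicode object is needed, e.g. for text processing.
decoded = encoded.decode("utf-8")

text_length = len(text)              # counts characters
encoded_length = len(encoded)        # counts bytes
```

The two lengths differ because each of the two non-ASCII characters occupies two bytes in UTF-8.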


sha256sum: a5b36eff7c4b14f161bc9ba2ae09a03ddb47d9f2c769589b8f389ae3a92cc92e
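
If you want to check a download against the checksum above, a small sketch using Python's `hashlib` could look like this. The helper names (`sha256_of_file`, `matches_published_checksum`) and the path you pass in are assumptions for illustration; the release announcement does not prescribe a particular way of verifying the file.

```python
import hashlib

# Checksum published in the release announcement above.
EXPECTED_SHA256 = "a5b36eff7c4b14f161bc9ba2ae09a03ddb47d9f2c769589b8f389ae3a92cc92e"


def sha256_of_file(path, chunk_size=65536):
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops once read() returns empty bytes.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_checksum(path, expected=EXPECTED_SHA256):
    """True if the file at path hashes to the expected digest."""
    return sha256_of_file(path) == expected.lower()
```

Call `matches_published_checksum()` with the location of the archive you downloaded; a `False` result means the file is corrupted or is not the released archive.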

Updates to Python GTK+ 3 Tutorial

I have continued working on the Python GTK+ 3 Tutorial since I announced it almost a month ago. I added sections describing the Gtk.Grid widget, menus, dialogs and Gtk.TextView, the last of which was added just today. In addition, I added screenshots for all examples and merged a couple of grammar and typo fixes from other people. The current contents should cover the most important widgets and should allow you to create more complex applications.

Python GTK+ 3 Tutorial

One of the big advantages of PyGTK is that it is documented very well. Unfortunately, despite the efforts to make PyGObject as compatible with PyGTK as possible, the differences are still huge. A big portion of them is due to the changes between GTK+ version 2 and 3, of course. To date, you basically have to look into the GIR file or the C reference manual to figure out how things work. Once you are familiar with the way C functions are converted to Python, you can guess most methods.
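
The usual naming pattern can be sketched as a small string transformation. The helper below is purely hypothetical (it is not part of PyGObject and does not import Gtk); it only encodes the common convention that a C function such as `gtk_window_set_title` becomes the method `set_title` on `Gtk.Window`. Real bindings have exceptions and overrides that this heuristic does not cover.

```python
import re


def guess_python_name(c_function, namespace, class_name):
    """Guess the Python spelling of a GObject-style C function name.

    Hypothetical helper for illustration; it only handles the common case
    where the C name is <namespace>_<class in snake_case>_<method>.
    """
    # CamelCase class name -> snake_case, e.g. "TreeView" -> "tree_view".
    snake_class = re.sub(r"(?<!^)(?=[A-Z])", "_", class_name).lower()
    prefix = "{}_{}_".format(namespace.lower(), snake_class)
    if not c_function.startswith(prefix):
        raise ValueError("{!r} does not start with {!r}".format(c_function, prefix))
    method = c_function[len(prefix):]
    return "{}.{}.{}".format(namespace, class_name, method)


window_title = guess_python_name("gtk_window_set_title", "Gtk", "Window")
append_column = guess_python_name("gtk_tree_view_append_column", "Gtk", "TreeView")
```

Here `window_title` is `"Gtk.Window.set_title"` and `append_column` is `"Gtk.TreeView.append_column"`, which matches how these methods are spelled when called from Python.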