This is the second part of our series about code performance in R. It presents a range of approaches for reducing the time your code needs to run. These ideas are useful to know before you start writing new code, and they also help when optimizing existing code.

If you have already written some code you want to speed up, but don’t know which part of it is actually slow, I recommend reading the first part of this series on profiling. …

This is the first part of our series about code performance in R.

Let’s assume you have written some code: it works and it computes the results you need, but it is really slow. If you don’t want it to slow down your work, you have no choice but to improve the code’s performance. But where should you start? The first step is to find out which part of the code is worth optimizing.

It is not always obvious which part of the code makes it so slow, or which of multiple alternatives is fastest. There is a risk of spending a lot of time…

The map shows the local 7-day incidence rate of officially reported Covid-19 infections in Germany over time. The calculation is based on the official data from the Robert Koch Institute (RKI), which is freely available online on a daily basis at the district level. The map results from the master’s thesis of Lukas Fuchs in the joint master’s program in Statistics, in cooperation with Prof. Dr. Ulrich Rendtel (Department of Economics, Freie Universität Berlin) and INWT Statistics. An advanced algorithm is used to plot nationwide infection cases, revealing more visible patterns and providing at least 30% higher accuracy compared to…

Traditionally, marketing decisions have been made by executives on the basis of instinct, experience, and what data are available. But what if this could be automated, with an artificial agent making use of huge amounts of data to determine the optimal marketing strategy for every customer individually at a particular moment in time? This is precisely the promise of reinforcement learning.

What is reinforcement learning (RL)?

In reinforcement learning, an “agent” can take different actions within an “environment.” Unlike many machine learning algorithms which involve only a single step, reinforcement learning is an iterative process: the agent sees a representation of the environment’s “state,” and…

In our first blog article on Continuous Integration, we presented a selection of CI tools, including the very widespread and long-standing tool Jenkins. Jenkins is a web-based, free, open source continuous integration system written in Java. The configuration, visualization, and evaluation of projects take place exclusively via the browser. A specialty of Jenkins is its high flexibility: with over 1500 plugins currently available, configurations can be individually designed. The basic requirement for continuous integration is that the code is managed in a version control system. Here, too, Jenkins offers a large selection of compatible version control systems.

Incidentally, the…

Continuous Integration is a software development practice where members of a team integrate their work frequently […]. Each integration is verified by an automated build (including test) to detect integration errors as quickly as possible.

Martin Fowler on

So, what is this “CI” everyone is talking about?

Let’s take a closer look at Martin Fowler’s definition of Continuous Integration: Continuous Integration, a.k.a. CI, is a software development practice that is applied at the integration stage of software development, as the name suggests. …

Missing or incomplete data can have a huge negative impact on any data science project. This is particularly relevant for companies in the early stages of developing solid data collection and management systems.

While the best solution for missing data is to avoid it in the first place by developing good data-collection and stewardship policies, often we have to make do with what’s available.

This blog post covers the different kinds of missing data, and what we can do about missing data once we know what we’re dealing with. These strategies range from simple (for example, choosing models that handle…

Business is changing as a result of the increasing quantity and variety of data available. Significant new opportunities can be realized by harnessing the knowledge contained in these data, if you know where to look. A data science team can help to bring raw data through the analysis process and derive insights that are critical in today’s technologically competitive environment.

For many companies, however, building a data science team can be daunting: the field is technical, the roles are varied, and buzzwords are common. This article aims to help with navigating this process by touching on what kinds of positions…

When you write code, you’re sure to run into problems from time to time. Debugging is the process of finding errors in your code to figure out why it’s behaving in unexpected ways. This typically involves:

  1. Running the code
  2. Stopping the code where something suspicious is taking place
  3. Stepping through the code from this point on, either changing the values of some variables or modifying the code itself.
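
The three steps above can be sketched in a few lines. This is only an illustration in Python using its built-in `breakpoint()` function; the function `mean_positive` and its `debug` flag are hypothetical names invented for this example, and the article itself may use different tooling (in R, for instance, `browser()` plays a similar role).

```python
def mean_positive(values, debug=False):
    """Return the average of the strictly positive entries in `values`."""
    total = 0.0
    count = 0
    for value in values:
        # Step 2: stop where something suspicious is taking place,
        # here a value that is not a number (hypothetical condition).
        if debug and not isinstance(value, (int, float)):
            breakpoint()  # opens the interactive debugger (pdb) at this line
        if value > 0:
            total += value
            count += 1
    return total / count if count else 0.0


# Step 1: run the code.
result = mean_positive([1, 2, -3, 4])
print(result)

# Step 3 happens interactively. Calling, for example,
#   mean_positive([1, "2", 4], debug=True)
# would pause at the string "2". In pdb, `p value` prints the variable,
# `!value = 2` changes it, and `n` steps to the next line.
```

The `debug` flag keeps the breakpoint inactive during normal runs, so the function can stay in place while the suspicious input is investigated.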

Debugging can be a frustrating process, particularly if you’re lacking the skills or tools to approach it efficiently. …

These days, most people are familiar with the concept of A/B testing. This is one of the most common ways to make advertising decisions, particularly in online marketing. In an A/B test, the customer base is divided into two or more groups, each of which is served a different version of whatever is being tested (such as a special offer, or the layout of an advertising campaign). At the end of the test, whichever variant was most successful is pursued for the customer base at large.
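
To make the procedure concrete, here is a minimal sketch in Python of the A/B test described above: customers are randomly split across variants, conversions are counted, and the variant with the highest conversion rate is chosen. Everything specific in it is an assumption for illustration only: the function names, the variant labels "A" and "B", and the simulated conversion probabilities in `TRUE_RATES` are made up and do not come from real data.

```python
import random


def run_ab_test(customers, variants, converts, seed=0):
    """Randomly assign each customer to a variant; return conversion rates."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    shown = {v: 0 for v in variants}
    converted = {v: 0 for v in variants}
    for customer in customers:
        variant = rng.choice(variants)  # random group assignment
        shown[variant] += 1
        if converts(customer, variant, rng):
            converted[variant] += 1
    # Conversion rate per variant (0.0 if a variant was never shown)
    return {v: converted[v] / shown[v] if shown[v] else 0.0 for v in variants}


def pick_winner(rates):
    """Return the variant with the highest observed conversion rate."""
    return max(rates, key=rates.get)


# Hypothetical simulation: made-up "true" conversion probabilities.
TRUE_RATES = {"A": 0.05, "B": 0.20}


def simulated_converts(customer, variant, rng):
    """Simulate whether a customer converts under the given variant."""
    return rng.random() < TRUE_RATES[variant]


rates = run_ab_test(range(10_000), ["A", "B"], simulated_converts)
winner = pick_winner(rates)
print(rates, winner)
```

In practice the comparison would usually be backed by a significance test rather than a plain maximum; the sketch leaves that out to keep the basic mechanism visible.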

While this method is tried and true, there are some potential situations where it…


