## October 15, 2017

### OpenDreamKit

#### WP6 Math-in-the-Middle Integration Use Case to be Published at MACIS-2017 (two papers)

OpenDreamKit WP6 (Data/Knowledge/Software-Bases) has reported on the first use cases in two papers to be published at MACIS 2017.

# Introduction

One of the main tasks for OpenDreamKit (T3.1) is improving portability of mathematical software across hardware platforms and operating systems.

One particular such challenge, which has dogged the SageMath project practically since its inception, is getting a fully working port of Sage on Windows (and by extension this would mean working Windows versions of all the CAS’s and other software Sage depends on, such as GAP, Singular, etc.)

This is particularly challenging, not so much because of the Sage Python library (which has some, but relatively little system-specific code). Rather, the challenge is in porting all of Sage’s 150+ standard dependencies, and ensuring that they integrate well on Windows, with a passing test suite.

Although UNIX-like systems are popular among open source software developers and some academics, the desktop and laptop market share of Windows computers is estimated to be more than 75% and is an important source of potential users, especially students.

However, for most of its existence, the only way to “install” Sage on Windows was to run a Linux virtual machine that came pre-installed with Sage, which is made available on Sage’s downloads page. This is clumsy and onerous for users–it forces them to work within an unfamiliar OS, and it can be difficult and confusing to connect files and directories in their host OS to files and directories inside the VM, and likewise for web-based applications like the notebook. Because of this Windows users can feel like second-class citizens in the Sage ecosystem, and this may turn them away from Sage.

Attempts at Windows support are almost as old as Sage itself (the initial Sage release was in 2005). Microsoft offered funding to work on a Windows version as far back as 2007, but it was far too little for the amount of effort needed.

Additional work was done off and on through 2012, and partial support was possible at times. This included admirable work to try to support building with the native Windows development toolchain (e.g. MSVC). There was even at one time an earlier version of a Sage installer for Windows, but it has long since been abandoned.

However, Sage development (and more importantly Sage’s dependencies) continued to advance faster than there were resources for the work on Windows support to keep up, and work mostly stalled after 2013. OpenDreamKit has provided a unique opportunity to fund the kind of sustained effort needed for Sage’s Windows support to catch up.

# Sage for Windows overview

As of SageMath version 8.0, Sage will be available for 64-bit versions of Windows 7 and up. It can be downloaded through the SageMath website, and up-to-date installation instructions are being developed at the SageMath wiki. A 32-bit version had been planned as well, but is on hold due to technical limitations that will be discussed later.

The installer contains all software and documentation making up the standard Sage distribution, all libraries needed for Cygwin support, a bash shell, numerous standard UNIX command-line utilities, and the Mintty terminal emulator, which is generally more user-friendly and better suited for Cygwin software than the standard Windows console.

It is distributed in the form of a single-file executable installer, with a familiar install wizard interface (built with the venerable InnoSetup). The installer comes in at just under a gigabyte, but unpacks to more than 4.5 GB in version 8.0.

Because of the large number of files comprising the complete SageMath distribution, and the heavy compression of the installer, installation can take a fair amount of time even on a recent system. On my Intel i7 laptop it takes about ten minutes, but results will vary. Fortunately, this has not yet been a source of complaints–beta testers have been content to run the installer in the background while doing other work–on a modern multi-core machine the installer itself does not use overly many resources.

If you don’t like it, there’s also a standard uninstaller.

The installer includes three desktop and/or start menu shortcuts:

The shortcut titled just “SageMath 8.0” launches the standard Sage command prompt in a text-based console. In general it integrates well enough with the Windows shell to launch files with the default viewer for those file types. For example, plots are saved to files and displayed automatically with the default image viewer registered on the computer.

(Because Mintty supports SIXEL mode graphics, it may also be possible to embed plots and equations directly in the console, but this has not been made to work yet with Sage.)

“SageMath Shell” runs a bash shell with the environment set up to run software in the Sage distribution. It is intended for more advanced users, or users who wish to directly use other software included in the Sage distribution (e.g. GAP, Singular) without going through the Sage interface. Finally, “SageMath Notebook” starts a Jupyter Notebook server with Sage configured as the default kernel and, where possible, opens the Notebook interface in the user’s browser.

In principle this could also be used as a development environment for doing development of Sage and/or Sage extensions on Windows, but the current installer is geared primarily just for users.

# Rationale for Cygwin and possible alternatives

There are a few possible routes to supporting Sage on Windows, of which Cygwin is just one. For example, before restarting work on the Cygwin port I experimented with a solution that would run Sage on Windows using Docker. I built an installer for Sage that would install Docker for Windows if it was not already installed, install and configure a pre-built Sage image for Docker, and install some desktop shortcuts that attempted to launch Sage in Docker as transparently as possible to the user. That is, it would ensure that Docker was running, that a container for the Sage image was running, and then would redirect I/O to the Docker container.

This approach “worked”, but was still fairly clumsy and error-prone. In order to make the experience as transparent as possible a fair amount of automation of Docker was needed. This could get particularly tricky in cases where the user also uses Docker directly, and accidentally interferes with the Sage Docker installation. Handling issues like file system and network port mapping, while possible, was even more complicated. What’s worse, running Linux images in Docker for Windows still requires virtualization. On older versions this meant running VirtualBox in the background, while newer versions require the Hyper-V hypervisor (which is not available on all versions of Windows–particularly “Home” versions). Furthermore, this requires hardware-assisted virtualization (HAV) to be enabled in the user’s BIOS. This typically does not come enabled by default on home PCs, and users must manually enable it in their BIOS menu. We did not consider this a reasonable step to ask of users merely to “install Sage”.

Another approach, which was looked at in the early efforts to port Sage to Windows, would be to get Sage and all its dependencies building with the standard Microsoft toolchain (MSVC, etc.). This would mean both porting the code to work natively on Windows, using the MSVC runtime, as well as developing build systems compatible with MSVC. There was a time when, remarkably, many of Sage’s dependencies did meet these requirements. But since then the number of dependencies has grown so much, and Sage itself has become so dependent on the GNU toolchain, that this would be an almost impossible undertaking.

A middle ground between MSVC and Cygwin would be to build Sage using the MinGW toolchain, which is a port of GNU build tools (including binutils, gcc, make, autoconf, etc.) as well as some other common UNIX tools like the bash shell to Windows. Unlike Cygwin, MinGW does not provide emulation of POSIX or Linux system APIs–it just provides a Windows-native port of the development tools. Many of Sage’s dependencies would still need to be updated in order to work natively on Windows, but at the very least their build systems would require relatively little updating–not much more than is required for Cygwin. This would actually be my preferred approach, and with enough time and resources it could probably work. However, it would still require a significant amount of work to port some of Sage’s more non-trivial dependencies, such as GAP and Singular, to work on Windows without some POSIX emulation.

So Cygwin is the path of least resistance. Although bugs and shortcomings in Cygwin itself occasionally require some effort to work around (as a developer–users should not have to think about it), for the most part it just works with software written for UNIX-like systems. It also has the advantage of providing a full UNIX-like shell experience, so shell scripts and scripts that use UNIX shell tools will work even on Windows. However, since it works directly on the native filesystem, there is less opportunity for confusion regarding where files and folders are saved. In fact, Cygwin supports both Windows-style paths (starting with C:\\) and UNIX-style paths (in this case starting with C:/).
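
To make the relationship between the two path styles concrete, here is a minimal sketch of how a drive-letter Windows path corresponds to a fully POSIX-style path under Cygwin. It assumes Cygwin's default `/cygdrive` mount prefix, and the function name `to_cygwin_path` is purely illustrative; inside a real Cygwin installation the `cygpath` utility is the authoritative way to do this conversion.

```python
from pathlib import PurePosixPath, PureWindowsPath


def to_cygwin_path(win_path: str, prefix: str = "/cygdrive") -> str:
    """Map an absolute drive-letter Windows path to a Cygwin-style path.

    Assumes the default "/cygdrive" prefix; a given installation may be
    configured differently.
    """
    p = PureWindowsPath(win_path)
    # Only absolute paths with a single drive letter (e.g. "C:\\...") are handled.
    if len(p.drive) != 2 or p.drive[1] != ":" or not p.root:
        raise ValueError(f"expected an absolute drive-letter path, got {win_path!r}")
    drive_letter = p.drive[0].lower()
    # p.parts[0] is the anchor ("C:\\"); the remaining parts are path components.
    return str(PurePosixPath(prefix, drive_letter, *p.parts[1:]))


print(to_cygwin_path(r"C:\Users\me\plot.png"))
print(to_cygwin_path("C:/Users/me"))
```

Both the backslash and forward-slash spellings name the same location on the native filesystem, which is why files saved from Sage end up where a Windows user expects them.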

Finally, a note on the Windows Subsystem for Linux (WSL), which debuted shortly after I began my Cygwin porting efforts, as I often get asked about this: “Why not ‘just’ use the ‘bash for Windows’?” The WSL is a new effort by Microsoft to allow running executables built for Linux directly on Windows, with full support from the Windows kernel for emulation of Linux system calls (including ones like fork()). Basically, it aims to provide all the functionality of Cygwin, but with full support from the kernel, and the ability to run Linux binaries directly, without having to recompile them. This is great of course. So the question is asked if Sage can run in this environment, and experiments suggest that it works pretty well (although the WSL is still under active development and has room for improvement).

I wrote more about the WSL in a blog post last year, which also addresses why we can’t “just” use it for Sage for Windows. But in short: 1) The WSL is currently only intended as a developer tool: There’s no way to package Windows software for end users such that it uses the WSL transparently. And 2) It’s only available on recent updates of Windows 10–it will never be available on older Windows versions. So to reach the most users, and provide the most hassle-free user experience, the WSL is not currently a solution. However, it may still prove useful for developers as a way to do Sage development on Windows. And in the future it may be the easiest way to install UNIX-based software on Windows as well, especially if Microsoft ever expands its scope.

# Development challenges

The main challenge with porting Sage to Windows/Cygwin has relatively little to do with the Sage library itself, which is written almost entirely in Python/Cython and involves relatively few system interfaces (a notable exception to this is the advanced signal handling provided by Cysignals, but this has been found to work almost flawlessly on Cygwin thanks to the Cygwin developers’ heroic efforts in emulating POSIX signal handling on Windows). Rather, most of the effort has gone into build and portability issues with Sage’s more than 150 dependencies.

The majority of issues have been build-related issues. Runtime issues are less common, as many of Sage’s dependencies are primarily mathematical, numerical code–mostly CPU-bound algorithms that have little use of platform-specific APIs. Another reason is that, although there are some anomalous cases, Cygwin’s emulation of POSIX (and some Linux) interfaces is good enough that most existing code just works as-is. However, because applications built in Cygwin are native Windows applications and DLLs, there are Windows-specific subtleties that come up when building some non-trivial software. So most of the challenge has been getting all of Sage’s dependencies building cleanly on Cygwin, and then maintaining that support (as the maintainers of most of these dependencies are not themselves testing against Cygwin regularly).

In fact, maintenance was the most difficult aspect of the Cygwin port (and this is one of the main reasons past efforts failed–without a sustained effort it was not possible to keep up with the pace of Sage development). I had a snapshot of Sage that was fully working on Cygwin, with all tests passing, as early as the end of summer in 2016. That is, I started with one version of Sage and added to it all the fixes needed for that version to work. However, by the time that work was done, there were many new developments to Sage that I had to redo my work on top of, and there were many new issues to fix. This cycle repeated itself a number of times.

## Continuous integration

The critical component that was missing for creating a sustainable Cygwin port of Sage was a patchbot for Cygwin. The Sage developers maintain a (volunteer) army of patchbots–computers running a number of different OS and hardware platforms that perform continuous integration testing of all proposed software changes to Sage. The patchbots are able, ideally, to catch changes that break Sage–possibly only on specific platforms–before they are merged into the main development branch. Without a patchbot testing changes on Cygwin, there was no way to stop changes from being merged that broke Cygwin. With some effort I managed to get a Windows VM with Cygwin running reliably on UPSud’s OpenStack infrastructure, that could run a Cygwin patchbot for Sage. By continuing to monitor this patchbot the Sage community can now receive prior warning if/when a change will break the Cygwin port. I expect this will impact only a small number of changes–in particular those that update one of Sage’s dependencies.

In so doing we are, indirectly, providing continuous integration on Cygwin for Sage’s many dependencies–something most of those projects do not have the resources to do on their own. So this should be considered a service to the open source software community at large. (I am also planning to piggyback on the work I did for Sage to provide a Cygwin buildbot for Python–this will be important moving forward as the official Python source tree has been broken on Cygwin for some time, but is one of the most critical dependencies for Sage).

## Runtime bugs

All that said, a few of the runtime bugs that come up are non-trivial as well. One particular source of bugs is subtle synchronization issues in multi-process code, that arise primarily due to the large overhead of creating, destroying, and signalling processes on Cygwin, as compared to most UNIXes. Other problems arise in areas of behavior that are not specified by the POSIX standard, and assumptions are made that might hold on, say, Linux, but that do not hold on Cygwin (but that are still POSIX-compliant!) For example, a difference in (undocumented, in both cases) memory management between Linux and Cygwin made for a particularly challenging bug in PARI. Another interesting bug came up in a test that invoked a stack overflow bug in Python, which only came up on Cygwin due to the smaller default stack size of programs compiled for Windows. There are also occasional bugs due to small differences in numerical results, due to the different implementation of the standard C math routines on Cygwin, versus GNU libc. So one should not come away with the impression that porting software as complex as Sage and its dependencies to Cygwin is completely trivial, nor that similar bugs might not arise in the future.
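
As an illustration of the last kind of bug, the sketch below shows why a test that compares floating-point results exactly can pass on one platform and fail on another, and how a tolerance-based comparison avoids that. The values are constructed for the example (a result that is off by one unit in the last place) rather than taken from an actual Cygwin failure, and the helper name `results_agree` is hypothetical.

```python
import math


def results_agree(a: float, b: float, rel_tol: float = 1e-12) -> bool:
    """Compare two floating-point results up to a relative tolerance."""
    return math.isclose(a, b, rel_tol=rel_tol)


reference = math.sqrt(2.0)
# Simulate a different math library returning the neighbouring double:
# doubles in [1, 2) are spaced exactly 2**-52 apart.
other = reference + 2.0 ** -52

print(reference == other)               # exact comparison is platform-sensitive
print(results_agree(reference, other))  # tolerance-based comparison is not
```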

## Challenges with 32-bit Windows/Cygwin

The original work of porting Sage to Cygwin focused on the 32-bit version of Cygwin. In fact, at the time that was the only version of Cygwin–the first release of the 64-bit version of Cygwin was not until 2013. When I picked up work on this again I focused on 64-bit Cygwin–most software developers today are working primarily on 64-bit systems, and so from many projects I’ve worked on in the past my experience has been that they have been more stable on 64-bit systems. I figured this would likely be true for Sage and its dependencies as well.

In fact, after getting Sage working on 64-bit Cygwin, when it came time to test on 32-bit Cygwin I hit some significant snags. Without going into too many technical details, the main problem is that 32-bit Windows applications have a user address space limited to just 2 GB (or 3 GB with a special boot flag). This is in fact not enough to fit all of Sage into memory at once. The good news is that for most cases one would never try to use all of Sage at once–this is only an issue if one tries to load every library in both Sage, and all its dependencies, into the same address space. In practical use this is rare, though this limit can be hit while running the Sage test suite.

With some care, such as reserving address space for the most likely to be used (especially simultaneously) libraries in Sage, we can work around this problem for the average user. But the result may still not be 100% stable.

It becomes a valid question whether it’s worth the effort. There are unfortunately few publicly available statistics on the current market share of 64-bit versus 32-bit Windows versions among desktop users. Very few new desktops and laptops sold anymore to the consumer market include 32-bit OSes, but it is still not too uncommon to find them on some older, lower-end laptops. In particular, some laptops sold not too long ago with Windows 7 were 32-bit. According to Net Market Share, as of writing Windows 7 still makes up nearly 50% of all desktop operating system installations. This still does not tell us about 32-bit versus 64-bit. The popular (12.5 million concurrent users) Steam PC gaming platform publishes the results of their usage statistics survey, which as of writing shows barely over 5% of users with 32-bit versions of Windows. However, computer gamers are not likely to be representative of the overall market, being more likely to upgrade their software and hardware.

So until some specific demand for a 32-bit version of SageMath for Windows is heard, we will not likely invest more effort into it.

# Conclusion and future work

Focusing on Cygwin for porting Sage to Windows was definitely the right way to go. It took me only a few months in the summer of 2016 to get the vast majority of the work done. The rest was just a question of keeping up with changes to Sage and fixing more bugs (this required enough constant effort that it’s no wonder nobody managed to quite do it before). Now, however, enough issues have been addressed that the Windows version has remained fairly stable, even in the face of ongoing updates to Sage.

Porting more of Sage’s dependencies to build with MinGW and without Cygwin might still be a worthwhile effort, as Cygwin adds some overhead in a few areas, but if we had started with that it would have been too much effort.

In the near future, however, the priority needs to be improvements to user experience of the Windows Installer. In particular, a better solution is needed for installing Sage’s optional packages on Windows (preferably without needing to compile them). And an improved experience for using Sage in the Jupyter Notebook, such that the Notebook server can run in the background as a Windows Service, would be nice. This feature would not be specific to Sage either, and could benefit all users of the Jupyter Notebook on Windows.

Finally, I need to better document the process of doing Sage development on Cygwin, including the typical kinds of problems that arise. I also need to better document how to set up and maintain the Cygwin patchbot, and how to build releases of the Sage on Windows installer so that its maintenance does not fall solely on my shoulders.

## September 22, 2017

### William Stein

#### DataDog's pricing: don't make the same mistake I made

(I wrote a 1-year followup post here.)

I stupidly made a mistake recently by choosing to use DataDog for monitoring the infrastructure for my startup (SageMathCloud).

I got bit by their pricing UI design that looks similar to many other sites, but is different in a way that caused me to spend far more money than I expected.

I'm writing this post so that you won't make the same mistake I did.  As a product, DataDog is of course a lot of hard work to create, and they can try to charge whatever they want. However, my problem is that what they are going to charge was confusing and misleading to me.

I wanted to see some nice web-based data about my new autoscaled Kubernetes cluster, so I looked around at options. DataDog looked like a new and awesomely-priced service for seeing live logging. And when I looked (not carefully enough) at the pricing, it looked like only $15/month to monitor a bunch of machines. I'm naive about the cost of cloud monitoring -- I've been using Stackdriver on Google cloud platform for years, which is completely free (for now, though that will change), and I've also used self hosted open solutions, and some quite nice solutions I've written myself. So my expectations were way out of whack. Ever busy, I signed up for the "$15/month plan":

One of the people on my team spent a little time and installed datadog on all the VM's in our cluster, and also made DataDog automatically start running on any nodes in our Kubernetes cluster. That's a lot of machines.

Today I got the first monthly bill, which is for the month that just happened. The cost was $639.19 USD charged to my credit card. I was really confused for a while, wondering if I had bought a year subscription. After a while I realized that the cost is per host! When I looked at the pricing page the first time, I had just seen in big letters "$15", and "$18 month-to-month" and "up to 500 hosts". I completely missed the "Per Host" line, because I was so naive that I didn't think the price could possibly be that high.

I tried immediately to delete my credit card and cancel my plan, but the "Remove Card" button is greyed out, and it says you can "modify your subscription by contacting us at [email protected]". So I wrote to [email protected]:

> Dear Datadog,
>
> Everybody on my team was completely mislead by your horrible pricing description. Please cancel the subscription for wstein immediately and remove my credit card from your system.
>
> This is the first time I've wasted this much money by being misled by a website in my life.
>
> I'm also very unhappy that I can't delete my credit card or cancel my subscription via your website. It's like one more stripe API call to remove the credit card (I know -- I implemented this same feature for my site).

And they responded:

> Thanks for reaching out. If you'd like to cancel your Datadog subscription, you're able to do so by going into the platform under 'Plan and Usage' and choose the option downgrade to 'Lite', that will insure your credit card will not be charged in the future. Please be sure to reduce your host count down to the (5) allowed under the 'Lite' plan - those are the maximum allowed for the free plan.
>
> Also, please note you'll be charged for the hosts monitored through this month. Please take a look at our billing FAQ.

They were right -- I was able to uninstall the daemons, downgrade to Lite, remove my card, etc. all through the website without manual intervention.

When people have been confused with billing for my site, I have apologized, immediately refunded their money, and opened a ticket to make the UI clearer. DataDog didn't do any of that.

I wish DataDog would at least clearly state that when you use their service you are potentially on the hook for an arbitrarily large charge for any month. Yes, if they had made that clear, they wouldn't have had me as a customer, so they are not incentivized to do so.

A fool and their money are soon parted. I hope this post reduces the chances you'll be a fool like me. If you choose to use DataDog, and their monitoring tools are very impressive, I hope you'll be aware of the cost.

ADDED: On Hacker News somebody asked: "How could their pricing page be clearer? It says per host in fairly large letters underneath it. I'm asking because I will be designing a similar page soon (that's also billed per host) and I'd like to avoid the same mistakes."

My answer:

[EDIT: This pricing page by the top poster in this thread is way better than I suggest below -- https://www.serverdensity.com/pricing/]

1. VERY clearly state that when you sign up for the service, then you are on the hook for up to $18*500 = $9000 + tax in charges for any month. Even Google compute engine (and Amazon) don't create such a trap, and have a clear explicit quota increase process.
2. Instead of "HUGE $15" newline "(small light) per host", put "HUGE $18 per host" all on the same line. It would easily fit. I don't even know how the $15/host datadog discount could ever really work, given that the number of hosts might constantly change and there is no prepayment.
3. Inform users clearly in the UI at any time how much they are going to owe for that month (so far), rather than surprising them at the end. Again, Google Cloud Platform has a very clear running total in their billing section, and any time you create a new VM it gives the exact amount that VM will cost per month.
4. If one works with a team, 3 is especially important. The reason that I had monitors on 50+ machines is that another person working on the project, who never looked at pricing or anything, just thought -- hey, I'll just set this up everywhere. He had no idea there was a per-machine fee.
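
The arithmetic behind points 1 and 3 is simple enough to sketch. The first function reproduces the $18 × 500 worst case quoted above; the second is a hypothetical "owed so far" estimate that assumes the charge is driven by the highest host count seen so far in the month. That billing rule is a simplification made up for this example and is not a description of how DataDog or any other vendor actually bills.

```python
from decimal import Decimal


def worst_case_monthly_charge(rate_per_host: Decimal, max_hosts: int) -> Decimal:
    """Upper bound on a month's charge (before tax) under per-host pricing."""
    return rate_per_host * max_hosts


def estimated_charge_so_far(daily_host_counts: list[int], rate_per_host: Decimal) -> Decimal:
    """Running estimate, ASSUMING the peak host count so far is what gets billed."""
    peak_hosts = max(daily_host_counts, default=0)
    return rate_per_host * peak_hosts


rate = Decimal("18")  # the month-to-month per-host price from the pricing page
print(worst_case_monthly_charge(rate, 500))
# Hypothetical host counts for the first few days of a month.
print(estimated_charge_so_far([3, 12, 50, 41], rate))
```

Showing even a rough running figure like this in the UI is the kind of thing that would have made the per-host model obvious long before the first bill arrived.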

#### DataDog: Don't make the same mistake I did -- a followup and thoughts about very unhappy customers

This is a followup to my previous blog post about DataDog billing.

TL;DR:
- I don't recommend DataDog,
- dealing with unhappy customers is hard,
- monitoring for data science nerds?

I was recently at the Seattle Google Cloud Summit and DataDog was well represented, with the biggest booth and top vendor billing during the keynote. Clearly they are doing something right. I had a past unpleasant experience with them, and I had just been auditing my records and discovered that last year DataDog had actually charged me a lot more than I thought, so I was kind of annoyed. Nonetheless, they kept coming up and talking to me, server monitoring is life-and-death important to me, and their actual software is very impressive in some ways.

## Conference Call with DataDog

Jay set up a conference call with me today at 10am (September 22, 2017). Before the call, I sent him a summary of my blog post, and also requested a refund, especially for the surprise bill they sent me nearly 6 weeks after my post.

During the call, Jay explained that he was "protecting" Nick from me, and that I would mostly talk with Michelle Danis who is in charge of customer success. My expectation for the call was that we would find some common ground, and that they would at least appreciate the chance to make things right and talk with an unhappy customer. I was also curious about how a successful startup company addresses the concerns of an unhappy customer (me).

I expected the conversation to be difficult but go well, with me writing a post singing the praises of the charming DataDog sales and customer success people. A few weeks ago CoCalc.com (my employer) had a very unhappy customer who got (rightfully) angry over a miscommunication, told us he would no longer use our product, and would definitely not recommend it to anybody else. I wrote to him wanting to at least continue the discussion and help, but he was completely gone. I would do absolutely anything I could to ensure he is satisfied, if only he would give me the chance. Also, there was a recent blog post from somebody unhappy with using CoCalc/Sage for graphics, and I reached out to them as best I could to at least clarify things...

In any case, here's what DataDog charged us as a result of us running their daemon on a few dozen containers in our Kubernetes cluster (a contractor who is not a native English speaker actually set up these monitors for us):

- 07/22/2016  449215JWJH87S8N4  DATADOG 866-329-4466 NY  $639.19
- 08/29/2016  2449215L2JH87V8WZ  DATADOG 866-329-4466 NY  $927.22

I was shocked by the 07/22 bill which spurred my post, and discovered the 08/29 one only later. We canceled our subscription on July 22 (cancelling was difficult in itself).

Michelle started the conference call by explaining that the 08/29 bill was for charges incurred before 07/22, and that their billing system has over a month lag (and it does even today, unlike Google Cloud Platform, say). Then Michelle explained at length many of the changes that DataDog has made to their software to address the issues I (and others?) have pointed out with their pricing description and billing. She was very interested in whether I would be a DataDog customer in the future, and when I said no, she explained that they would not refund my money since the bill was not a mistake.

I asked if they now provide a periodic summary of the upcoming bill, as Google cloud platform (say) does. She said that today they don't, though they are working on it. They do now provide a summary of usage so far in an admin page.

Finally, I explained in no uncertain terms that I felt misled by their pricing. I expected that they would understand, and pointed out that they had just described to me many ways in which they were addressing this very problem. Very surprisingly, Michelle's response was that she absolutely would not agree that there was any problem with their pricing description a year ago, and they definitely would not refund my money. She kept bringing up the terms of service. I agreed that I didn't think legally they were in the wrong, given what she had explained, just that -- as they had just pointed out -- their pricing and billing was unclear in various ways. They would not agree at all.

I can't recommend doing business with DataDog. I had very much hoped to write the opposite in this updated post. Unfortunately, their pricing and terms are still confusing today compared to competitors, and they are unforgiving of mistakes. This dog bites.

(Disclaimer: I took notes during the call, but most of the above is from memory, and I probably misheard or misunderstood something. I invite comments from DataDog to set the record straight.)

Also, for what it is worth, I definitely do recommend Google Cloud Platform.  They put in the effort to do many things right regarding clear billing.

## How do Startups Deal with Unhappy Customers?

I am very curious about how other startups deal with unhappy customers. At CoCalc we have had very few "major incidents" yet... but I want to be as prepared as possible. At the Google Cloud Summit, I went to some amazing "war stories by SRE's" session in which they talked about situations they had been in years ago in which their decisions meant the difference between whether they would have a company or job tomorrow or not. These guys clearly have amazing instincts for when a problem was do-or-die serious and when it wasn't. And their deep preparation "in depth" was WHY they were on that stage, and a big reason why older companies like Google are still around. Having a strategy for addressing very angry customers is surely just as important.

Google SRE's: these guys are serious.

I mentioned my DataDog story to a long-time Google employee there (16 years!) and he said he had recently been involved in a similar situation with Google's Stackdriver monitoring, where the bill to a customer was $85K in a month just for Stackdriver. I asked what Google did, and he said they refunded the money, then worked with the customer to better use their tools. There is of course no way to please all of the people all of the time. However, I genuinely feel that I was ripped off and misled by DataDog, but I have the impression that Jay and Michelle honestly view me as some jerk trying to rip them off for$1500.   And they probably hate me for telling you about my experiences.

So far, with CoCalc we charge customers in advance for any service we provide, so fewer people are surprised by a bill. Sometimes there are problems with recurring subscriptions: when a person is charged for the upcoming month of a subscription and doesn't want to continue (e.g., because their course is over), we always fully refund the charge. What does your company do? Why? I do worry that our billing model means that we miss out on potential revenue.

We all know what successful huge consumer companies like Amazon and Wal-Mart do.

## Monitoring for Data Science Nerds?

I wonder if there is interest in a service like DataDog, but targeted at Data Science Nerds, built on CoCalc, which provides hosted collaborative Jupyter notebooks with Pandas, R, etc., pre-installed. This talk at PrometheusCon 2017 mostly discussed the friction that people face moving data from Prometheus to analyze using data science tools (e.g., R, Python, Jupyter). CoCalc provides a collaborative data science environment, so if we were to smooth over those points of friction, perhaps it could be useful to certain people. And much more efficient...

## August 26, 2017

### OpenDreamKit

#### Workshop on live structured documents

OpenDreamKit is hosting a workshop on live structured documents to take place at Simula Research Laboratory in Oslo, Norway from Monday 16. October to Friday 20. October.

The workshop is dedicated to various aspects of live documents, including:

• Authoring language and tools (ReST, XML, latex, …),
• Converters (Sphinx, pandoc, docbook, …),
• Infrastructure for rendering documents,
• Infrastructure for interacting with the computing backends (e.g. Thebe based on the Jupyter protocol, sympy live, …),
• Online services providing computing backends (e.g. tmpnb, sagecell, binder, …),
• Integrated collaborative authoring environments (sharelatex, SMC, texmacs, …)

Participants can register via eventbrite.

## OpenDreamKit at Groups St Andrews in Birmingham

Groups St Andrews is a conference series with a conference every four years. This year’s Groups St Andrews will be in Birmingham, and I will attend, bring a poster, give a contributed talk about computing in permutation groups, and teach a course on GAP.

This post serves the main purpose of providing a page on the OpenDreamKit website that can hold all the links that will appear on the poster, and possible further information.

## June 09, 2017

### OpenDreamKit

#### Sphinx documentation of Cython code using "binding=True"

One of the deliverables (D4.13) of the OpenDreamKit project is refactoring the documentation system of SageMath. The SageMath documentation is built using a heavily customized Sphinx. Many of the customizations are necessary to support autodoc (automatically generated documentation from docstrings) for Cython files.

Thanks to some changes I made to Sphinx, autodoc for Cython now works provided that:

1. You use Sphinx version 1.6 or later.

2. The Cython code is compiled with the binding=True directive. See How to set directives in the Cython documentation.

3. A small monkey-patch is applied to inspect.isfunction. You can put this in your Sphinx conf.py for example:

    def isfunction(obj):
        return hasattr(type(obj), "__code__")

    import inspect
    inspect.isfunction = isfunction


This was used successfully for the documentation of cysignals and fpylll. There is ongoing work to do the same for SageMath.

## Implementation of functions in Python

To understand why items 2 and 3 on the above list are needed, we need to look at how Python implements functions. In Python, there are two kinds of functions (we really mean functions here, not methods or other callables):

1. User-defined functions, defined with def or lambda:

    >>> def foo(): pass
    >>> type(foo)
    <class 'function'>
    >>> type(lambda x: x)
    <class 'function'>

2. Built-in functions such as len, repr or isinstance:

    >>> type(len)
    <class 'builtin_function_or_method'>


In the CPython implementation, these are completely independent classes with different behaviours.

## User-defined functions binding as methods

Just to give one example, built-in functions do not have a __get__ method, which means that they do not become methods when used in a class.

Let’s consider this class:

    class X(object):
        def printme(self):
            return repr(self)


This is essentially equivalent to

    class X(object):
        printme = (lambda self: repr(self))

    >>> X().printme()
    '<__main__.X object at 0x7fb342f960b8>'


However, directly putting the built-in function repr in the class does not work as expected:

    class Y(object):
        printme = repr

    >>> Y().printme()
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    TypeError: repr() takes exactly one argument (0 given)


This is simply something that built-in functions do not support.
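The difference can be checked directly. The following is a minimal sketch, assuming CPython 3; the names `user_defined` and `Z` are made up for illustration and do not come from the original post.

```python
def user_defined(self):
    return repr(self)

# User-defined functions are descriptors: they have __get__ and bind as methods.
has_get_user = hasattr(user_defined, "__get__")
# Built-in functions have no __get__, so they stay plain callables in a class.
has_get_builtin = hasattr(repr, "__get__")


class Z(object):
    bound = user_defined   # becomes a method, so it receives the instance
    unbound = repr         # stays a plain function, so it receives nothing


z = Z()
bound_ok = z.bound().startswith("<")

try:
    z.unbound()
    unbound_raised = False
except TypeError:
    # repr() was called with zero arguments because no binding happened.
    unbound_raised = True

print(has_get_user, has_get_builtin, bound_ok, unbound_raised)
```

Wrapping the built-in in a small `def` or `lambda`, as in the `X` example, is the usual way to get method binding back.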

## User-defined vs. built-in functions

Here is a list of the main differences between user-defined and built-in functions:

• User-defined functions are implemented in Python, built-in functions are implemented in C.

• Only user-defined functions support __get__ and can become methods (see above).

• Only user-defined functions support introspection such as inspect.getargspec() and inspect.getsourcefile().

• CPython has specific optimizations for calling built-in functions.

• The inspect module and profiling make a difference between the two kinds of functions.

Cython generates C code, so Cython functions must be built-in functions. This has unfortunate disadvantages, such as the lack of introspection support, which is particularly important for Sphinx.
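How the inspect module tells the two kinds apart can be seen in a few lines. This is only a sketch, assuming CPython 3: the function `add` is a made-up example, and `inspect.signature` is used in place of `inspect.getargspec()`, which is no longer available in recent Python versions.

```python
import inspect


def add(a, b=1):
    return a + b


# Each entry is (inspect.isfunction(obj), inspect.isbuiltin(obj)).
kinds = {
    "add": (inspect.isfunction(add), inspect.isbuiltin(add)),
    "len": (inspect.isfunction(len), inspect.isbuiltin(len)),
}

# Argument introspection of the user-defined function.
params = list(inspect.signature(add).parameters)

# Source lookup needs a Python-level code object, which built-ins lack.
try:
    inspect.getsourcefile(len)
    builtin_has_source = True
except TypeError:
    builtin_has_source = False

print(kinds, params, builtin_has_source)
```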

## The Cython function type: cyfunction

Luckily, the Cython developers came up with a solution: they invented a completely new function type (called cyfunction), which is implemented like built-in functions but which behaves as much as possible like user-defined functions.

By default, functions in Cython are built-in functions. With the directive binding=True, functions in Cython become cyfunctions. Since cyfunctions are not specifically optimized by CPython, this comes with a performance penalty. More precisely, calling cyfunctions from Python is slower than calling built-in functions from Python. The slowdown can be significant for simple functions. Within Cython, cyfunctions are as fast as built-in functions.

Since a cyfunction is neither a built-in function nor a user-defined function (those two types are not subclassable), the inspect module (and hence Sphinx) does not recognize it as being a function. So, to have full inspect support for Cython functions, we need to change inspect.isfunction. After various attempts, I came up with hasattr(type(obj), "__code__") to test whether the object obj is a function (for introspection purposes). This will match user-defined functions and cyfunctions but not built-in functions, nor any other Python type that I know of.
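The pure-Python part of that claim is easy to check. The sketch below assumes CPython 3 and applies the test to a few ordinary objects; `Plain` is a hypothetical class used only here, and cyfunctions are not covered because they would need compiled Cython code.

```python
def isfunction(obj):
    # Same test as in the post: look for __code__ on the type, not the object.
    return hasattr(type(obj), "__code__")


class Plain(object):
    pass


def foo():
    pass


results = {
    "def": isfunction(foo),               # user-defined function
    "lambda": isfunction(lambda x: x),    # also of type 'function'
    "builtin": isfunction(len),           # builtin_function_or_method
    "instance": isfunction(Plain()),      # arbitrary object
    "int": isfunction(3),                 # arbitrary built-in value
}

print(results)
```

Looking at `type(obj)` rather than `obj` means an arbitrary instance that merely carries a `__code__` attribute is not mistaken for a function.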

## The future: a PEP to change the function types?

I have some vague plans for a Python Enhancement Proposal (PEP) to change the implementation of the Python function types. The goal is that Cython functions can be implemented on top of some standard Python function type, with all features that cyfunctions currently have, the performance of built-in functions and introspection support of user-defined functions.

At this point, it is too early to say anything about the implementation of this hypothetical future Python function type. If anything happens, I will surely post an update.

## What is RethinkDB?

UPDATE: Several months after I wrote this post, RethinkDB was relicensed. For the CoCalc project, it was too late, and by then we had already switched to PostgreSQL.

RethinkDB is an INCREDIBLE, high-quality, polished open source realtime database that is easy to deploy, shard, and replicate, and that supports a reactive client programming model, which is useful for collaborative web-based applications. Shockingly, the 7-year-old company that created RethinkDB has just shut down. I am the CEO of a company, SageMath, Inc., that uses RethinkDB very heavily, so I have a strong interest in RethinkDB surviving as an independent open source project.

## Three Types of Open Source Projects

There are many types of open source projects. RethinkDB was the type of open source project where most work on RethinkDB has been fulltime focused work, done by employees of the RethinkDB company. RethinkDB is licensed under the AGPL, but the company promised to make the software available to customers under other licenses.

Academia: I started the SageMath open source math software project in 2005, which has over 500 contributors, and a relatively healthy volunteer ecosystem, with about a hundred contributors to each release, and many releases each year. These are mostly volunteer contributions by academics: usually grad students, postdocs, and math professors. They contribute because SageMath is directly relevant to their research, and they often contribute state of the art code that implements algorithms they have created or refined as part of their research. Sage is licensed under the GPL, and that license has worked extremely well for us. Academics sometimes even get significant grants from the NSF or the EU to support Sage development.

Companies: I also started the Cython compiler project in 2007, which has had dozens of contributors and is now the de facto standard for writing or wrapping fast code for use by Python. The developers of Cython mostly work at companies (e.g., Google) and contribute as a side project in their spare time. (Here's a message today about a new release from a Cython developer, who works at Google.) Cython is licensed under the Apache License.

## What RethinkDB Will Become

RethinkDB will no longer be an open source project whose development is sponsored by a single company dedicated to the project. Will it be an academic project, a company-supported project, or dead?

A friend of mine at Oxford University surveyed his academic CS colleagues about RethinkDB, and they said they had zero interest in it. Indeed, from an academic research point of view, I agree that there is nothing interesting about RethinkDB. I myself am a college professor, and understand these people! Academic volunteer open source contributors are definitely not going to come to RethinkDB's rescue. The value in RethinkDB is not in the innovative new algorithms or ideas, but in the high quality carefully debugged implementations of standard algorithms (largely the work of bad ass German programmer Daniel Mewes). The RethinkDB devs had to carefully tune each parameter in those algorithms based on extensive automated testing, user feedback, the Jepsen tests, etc.

That leaves companies. Whether or not you like or agree with this, many companies will not touch AGPL licensed code:
"Google open source guru Chris DiBona says that the web giant continues to ban the lightning-rod AGPL open source license within the company because doing so "saves engineering time" and because most AGPL projects are of no use to the company."

With RethinkDB today, the only option is AGPL. This very strongly discourages use by the only possible group of users and developers that has any chance to keep RethinkDB from death. If this situation is not resolved as soon as possible, I am extremely afraid that it never will be resolved. Ever. If you care about RethinkDB, you should be afraid too. Ignoring the landscape and culture of volunteer open source projects is dangerous.

## A Proposal

I don't know who can make the decision to relicense RethinkDB. I don't know what is going on with investors or who is in control. I am an outsider. Here is a proposal that might provide a way out today:

PROPOSAL: Dear RethinkDB, sell me an Apache (or BSD) license to the RethinkDB source code. Make this the last thing your company sells before it shuts down. Just do it.

Hacker News Discussion

## 6 GSoC SageMath Projects

During the past couple of summers, SageMath successfully managed many Google Summer of Code projects. This year we are again happy to have six projects:

### Implementing matroid classes and plotting improvements

(Zachary Gershkoff / Stefan van Zwam)
This project seeks to implement several common matroid classes in SageMath, along with algorithms for their display and relevant computations. The graphic matroid class in particular will be implemented with a representative graph with methods for Whitney switching and minor operations. This will be accompanied by improvements to the graph theory library, with methods relevant to matroids enabled to support multigraphs. Other modules for this project include improved plotting of rank 3 matroids to eliminate false colinearities, computation of a matroid's automorphism group using SageMath's group theory libraries, and faster minor testing based on an existing trac ticket.

### Expanding the Functionality of Dynamical Systems

(Rebecca Lauren Miller / Paul Fili and Ben Hutz)

Researchers in the sage-dynamics community, of which I am a member, have compiled a wishlist of algorithms and functionality they would like added. I would like to shorten that wishlist for us. For my project I will be completing some desired additions to Sage from the Sage Dynamics Wiki. I will implement Well’s Algorithm, strengthen the numerical precision in canonical_height, and implement reduced_form for higher dimensions.

### Improvement of Complex Dynamics in Sage

(Ben Barros / Adam Towsley and Ben Hutz)
There are three major things that I would like to implement to improve the functionality of Sage in the area of complex dynamics. The details of the project are summarized in the following list:
• Complex Dynamics Graphical package: Integrate or implement a complex dynamics software such as Mandel into Sage. This will be done by creating an optional package for Sage. If there is enough demand, the package may become a standard package for Sage at some point.
• Spider Algorithm: The object of the Spider Algorithm is to construct polynomials with assigned combinatorics. For example, we may want to find a polynomial that has a periodic orbit of period 7. The Spider Algorithm provides a way for us to compute this polynomial efficiently. I plan to implement this algorithm into Sage.
• Coercion: If you have a map defined over Q, you should be able to take the image of a point over C (i.e. somewhere you have a well-defined embedding) without having to use the command "change_ring()". Something similar works for polynomials in Sage but it does not work for morphisms/schemes.

### Linear-time Implementation of Modular Decomposition of Undirected and Directed Graphs

(Lokesh Jain / Dima Pasechnik)
This project aims to provide a linear-time implementation of modular decomposition for graphs and digraphs. Modular decomposition is the decomposition of a graph into modules. A module is a subset of vertices that generalizes the notion of a connected component. Take, for example, a module X: every vertex v ∉ X is either adjacent to every vertex of X or adjacent to none of them. Another property of modules is that one module can be a subset of another. Various algorithms have been published for the modular decomposition of graphs. The focus of this project is on linear-time algorithms that can be implemented in practice. The project further aims to use the modules developed for modular decomposition to implement other functionality, such as skew partitions. A skew partition is a partition of the vertices of a graph into two sets such that the subgraph induced by one set is disconnected and the subgraph induced by the other set has a disconnected complement. Modular decomposition is a very important concept in graph theory and has a number of use cases. For instance, it has been an important tool for solving optimization and combinatorics problems.
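The module property described above can be written down as a direct check. This is a minimal brute-force sketch for undirected graphs, not one of the linear-time algorithms the project targets; the function name `is_module`, the adjacency-dictionary representation, and the small example graph are all assumptions made for illustration.

```python
def is_module(adjacency, candidate):
    """Return True if `candidate` is a module of the undirected graph.

    `adjacency` maps each vertex to the set of its neighbours.
    """
    candidate = set(candidate)
    for v in adjacency:
        if v in candidate:
            continue
        seen = adjacency[v] & candidate
        # An outside vertex must see all of the candidate set or none of it.
        if seen and seen != candidate:
            return False
    return True


# Example graph with edges 1-3, 2-3 and 3-4.
graph = {
    1: {3},
    2: {3},
    3: {1, 2, 4},
    4: {3},
}

pair_is_module = is_module(graph, {1, 2})      # 3 sees both, 4 sees neither
mixed_is_module = is_module(graph, {1, 3})     # 2 sees 3 but not 1
whole_is_module = is_module(graph, {1, 2, 3, 4})

print(pair_is_module, mixed_is_module, whole_is_module)
```

The whole vertex set and every single vertex pass this check trivially, which is why a decomposition looks for the non-trivial modules.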

### Modular Decomposition of graphs and digraphs

(Maria Ioanna Spyrakoy / Dima Pasechnik)
Modular decomposition of (di)graphs is a generalization of the concept of the decomposition of (di)graphs into connected components. Its current implementation in Sage relies on badly broken abandoned C code, and badly needs to be replaced by something that works and is not too slow. However, the only open-source implementations of some of these procedures are either in Java or in Perl, and thus aren't really useful for Sage.

Note: An attentive reader might notice the similarity between those projects. They will be split according to the type of graph and coordinated so that they do not overlap but augment each other.

### Visualizing constructs in cluster algebras and quiver representations

(Bryan Wang / Travis Scrimshaw)
I aim to implement visualizations of several key constructs in cluster algebras and quiver representations. The first is Auslander-Reiten quivers, for at least the A_n and D_n cases. The second is labelled endomorphism quivers and mutations within a cluster category, focusing on the A_n case. The third is posets of down-mutations for the A_n case. These features will be useful not only for research purposes, but also as nice examples to play around with and learn from. Aside from these features, I am interested in implementing features for the Quantum Cluster Algebras project.

All the best for this summer, thank you to Google for making this possible, and sorry to all those candidates who didn't make it ...

### OpenDreamKit

#### Release: nbdime 0.3.0

We are happy to announce release 0.3.0 of nbdime, continuing to improve the process of working with Jupyter notebooks in version control.

The highlight of 0.3 is much improved integration with git, making it easier than ever to get started with nbdime in git:

pip install --upgrade nbdime # install nbdime
nbdime config-git --global --enable # tell git to use nbdime when it sees notebooks


and you can get nice GUI diffs directly from git refs on the command-line:

nbdiff-web master mynotebook.ipynb


## April 26, 2017

### OpenDreamKit

#### Debriefing from a successful Formal Project Review by the EU Commission

On April 26th, OpenDreamKit underwent its first formal review by the European Commission. We presented the achievements of the first 18 months of the project, including 30 deliverables (reports, slides). Overall, the feedback was very positive, with language such as “enthusiast”, “brilliant”, “amazing job”, or “things have come along fantastically”. We made a strong point in our reports and presentations that a vast majority of what’s happening comes from the ecosystem we support. All we do is exploit the special resources the EU is entrusting us with to knock down some tough hurdles that are preventing the ball from rolling. Kudos to our communities!

## Debriefing notes I sent to the OpenDreamKit participants

About twenty of us were in Brussels early this week for the OpenDreamKit Month 18 formal review. After two days of intensive preparation, we presented our work on Wednesday to our project officer and reviewers.

There are a few points that we need to think about (not unexpected). But otherwise the hard work we have all put in since the beginning of the project came out as quite a show. The panel gave very constructive feedback and were overall really happy. They appreciate our approach, our work, our spirit.

Now is the time to enjoy that appreciation and build on that energy to do even better in the coming years. Pass this on to our communities!

Speaking of funding: the reviewers made a strong point that we bear a big responsibility: apparently mathematics does not have a good press in the high spheres these days. We were very lucky, as a math project, to be funded; it’s really because they appreciated so much the strength of the proposal and our “clever and creative interpretation of the call” that we made it through. “No other math project is being funded” (this quote obviously does not apply to ERCs; the scope is plausibly that of H2020 projects).

They now need strong ammunition to make sure that future calls leave room for mathematics. So not only do we have to succeed because we care so much about our aims (and should investigate followups to pursue them further), but also for the sake of other projects elsewhere in mathematics. We also need to proactively explain and highlight to a wider audience what we do in collaboration with our communities. There is very good stuff going on, let it be seen.

Some further thought needs to be put into how to achieve that. For now, the take home message is simple: If you witness something nice happening, from a technical achievement with a wow factor to a thought provoking anecdote, write a blog post about it. See the instructions, or even just send a brief draft text by e-mail to Mike Croucher with me in CC.

Let me conclude by thanking the whole band that came to Brussels (with a special nod to the presenters on whom we dumped the most delicate presentations). I was as frustrated as you all were at spending all this time together without tackling what we all really care about most. However, we built on our image at the Commission and used the occasion to strengthen our group around a joint vision. This is a worthwhile long-term investment.

Thank you everybody for all the enthusiastic, dedicated and beautiful work. It’s an honor and a pleasure to be working with such a team.

Remember: pass it on to those supporting you and to your communities.

Cheers, Nicolas

## April 24, 2017

### OpenDreamKit

#### Formal Project Review for OpenDreamKit's first reporting period (Sept. 2015 to Feb. 2017)

On the occasion of its first (very successful!) formal review by the EU Commission, twenty OpenDreamKit participants met in the last week of April 2017 at the CLORA (Club of associated research organisations) headquarters in Brussels.

## Participants

A framadate poll was created

### Present on the steering committee day:

Nicolas THIERY; Benoît PILORGET; Erik BRAY; Viviane PONS; Vincent DELECROIX; Michael KOHLHASE; Dennis MUELLER; Florian RABE; Tom WIESING; Clément PERNET; Wolfram DECKER; William HART; Dmitrii PASECHNIK; Marcin KOSTUR; Mike CROUCHER; Hans FANGOHR; Alexander KONOVALOV; Stephen LINTON; Luca DE FEO; John CREMONA; Paul-Olivier DEHAYE; Benjamin RAGAN-KELLEY; Jeroen DEMEYER; Konrad HINSEN

## Agenda

• Second amendment to the Grant
• WP7: overview of the situation and of the measures taken and to be planned
• Overview of the deliverables due for M18
• Best practice for the Project Review

## Minutes

### Second amendment to the Grant

Benoît Pilorget (BP) announced that the Second Amendment should be over in April-May. All the modifications concerning deliverables and the scientific context were accepted. The remaining blocking points were purely administrative and require some time. At the moment these notes are being written (19/05/2017), the Commission seems to have fully agreed on all terms and is about to sign the amendment.

Related to this amendment, the consortium expressed their congratulations to Hans Fangohr (HF) for his new position at XFEL, in Hamburg. All points of the amendment can be found in the GitHub issue #193.

### WP7: overview of the situation and of the measures taken and to be planned:

An open brainstorming session took place. Following the retirement of Ursula Martin, it appears that some aspects of WP7 that require research-grade expertise in sociology will be hard to achieve with WP7 as it is organised today and with the current consortium. Therefore solutions must be found so that we don’t just tick the boxes but actually deliver high quality material.

Several options have been discussed:

1) Hire new staff specialised in sociology or a similar field. This solution would probably lead to the transfer of some funding within the consortium, unless it turns out that enough Person-Months are planned at UOXF.

2) Subcontract the planned work that is not feasible within the consortium. For this solution to work, an adequate subcontractor must be found (provided enough funds are available within UOXF or the consortium) and an amendment to the Grant must be signed with the Commission.

3) Rethink the scientific content (objectives, tasks, deliverables), to take into account all we have learned since the writing of the proposal, and make the best use of the available resources and consortium expertise. This of course would require a negotiation with the EU and probably a new amendment to the grant agreement.

The consortium is expecting official feedback from the Project Officer and reviewers after the formal review. In the meantime, the Coordinator and Principal Investigators of WP7 will be informally brainstorming all possibilities. Should an amendment be necessary, it will be written after the current amendment for the addition of FAU Erlangen and XFEL is signed by the Commission.

### Overview of the deliverables due for M18 and 24

After a tour de table, the Coordinator ensured that all deliverables due for Month 24 (31/08/2017) have a leader and a definite working plan.

### Best practice for the Project Review:

BP reminded the consortium of the support slides for the Review that were presented at the Edinburgh steering committee meeting.

### Feedback from the Quality Review Board

HF, the chair of the Quality Review Board (QRB), reported broadly positive feedback from the first QRB meeting. A full report will be made available to all participants.

## April 06, 2017

### OpenDreamKit

#### The story behind our website

OpenDreamKit uses a static website powered by Jekyll and GitHub. Ever wondered what that means? Read this post to find out.

#### Report on WomenInSage

Last January, Viviane Pons, Jessica Striker and Jennifer Balakrishnan organized the first WomenInSage event in Europe with OpenDreamKit. 20 women spent a week together coding and learning in a rented house in the Paris area.

# The Workshop

## Opening event

To open the workshop, Viviane, Jessica, and Jennifer gave a series of introductory Sage lectures at the Institut Henri Poincaré in Paris, covering combinatorics and number theory.

## The Week

The workshop then moved to the rented house. There, we organized short talk sessions to get to know our respective research fields and expectations for the week. After that, we were able to split into small groups to work on many different projects: STL export, Kummer surfaces, Kuznyechik cipher, Motzkin words, Shioda invariants, and more. We also had presentations on How to contribute to Sage (with a crash course on git) and How to write a Sage package. Every evening, we had a Status report session to share our progress with the group. You can read our program and final status reports on the event wikipage.

## Special PyLadies coding cafe

Viviane Pons is one of the organizers of the local Paris chapter of PyLadies. She organized a meeting between the WomenInSage mathematicians and the PyLadies developers. We were welcomed by Algolia for an afternoon of coding-and-chatting with the PyLadies.

# Impact

The data presented here come from a post-event questionnaire sent to the participants.

The gender gap is very large in the mathematical software development community. In the OpenDreamKit project, we are only 3 women among the 54 participants. This reflects the global situation in the field. Many women mathematicians are still hesitant to join our community and lack confidence in their abilities as developers. Organizing an event targeted at women is a way to motivate them and build up self-confidence in a safe and casual atmosphere.

The women who attended the conference had various levels of programming experience, ranging from 1 (no experience) to 5 (a lot of experience).

This disparity was also reflected in their knowledge of Sage.

As for contributions, only 4 participants had contributed to Sage in the past, including the 3 organizers. Also, a majority of participants had never attended a Sage Days before. Actually, 6 of them had never even heard of Sage Days and 2 of them said they did not think it was “for them”.

To the question “How did the fact that the event was targeted to women impact your decision to come? (Would you have participated in a classical SageDays)”, many participants answered that it was indeed a factor in their decision.

Yes, but it helped. I didn't feel so sure about my skills and being surrounded by women made things easier.

It was a new experience that I don’t regret at all.

I might have participated, but would have been less confident.

I have participated in and benefited from classical SageDays, but found this event to be even better at creating an atmosphere where everyone felt empowered to learn and contribute.

I made a special effort I would not have done for regular sage days.

One of the participants said she would not have felt comfortable sharing a house with men, but that this event was such a positive experience that she would now consider it for other Sage Days. The event helped build up the confidence of the participants: 9 of them said they felt more confident about attending classical Sage Days after the event.

We took advantage of the diverse knowledge background of our group to work together and learn from each other. It was an occasion for many “first times” among participants who had very little experience with Sage:

• 5 participants installed a source version of Sage for the first time (so that they could edit the source).
• 3 used git for the first time.
• 5 used git within Sage for the first time.
• 11 got their first Trac account.
• 5 made their first contribution to a Sage ticket.
• 8 are in the process of getting their first code integrated into Sage.

We worked on 14 tickets during the week, 6 of which have been merged since the conference. All participants said they had learned new things and that it would impact their careers.

This was also an occasion to start projects and form more research and development collaborations for the future.

All of this happened in a very casual and welcoming atmosphere. We used the common rooms of the house to work. We cooked international, vegetarian friendly meals (some participants had brought food and recipes from their home countries). We got to know each other and shared more than code. All participants agreed that it was a very positive experience. When asked to rate the general atmosphere of the conference, all of them gave a 5.

As an organizer, it was also very rewarding and it motivates me to do it again. To the question: “Any other comment you might have?”, we only got one answer.

All three organizers were so very generous with their time and expertise, and created a wonderful supportive environment. Thank-you!

## Berlin workshop

This week the KWARC team (Michael Kohlhase, Florian Rabe, Dennis Müller) and I met in Berlin at the WIAS. The goal was to meet some of the modelers working there, who are very interested in the MMT system and the work in OpenDreamKit. Their entry point is Work Package 6 (interoperability), motivated by the benefits they would get intrinsically from formalizing the mathematical work they do into the OMDoc/MMT language (e.g. addressability of mathematical models), but also with an eye on all the other work packages from OpenDreamKit (e.g. interactive documents). Personally, I was focused on working out what I could of a semantic interchange of mathematical objects between Sage and GAP.

## Formalization of mathematical concepts

To start, we decided to do a bit of prototyping around transitive groups. The first step in the Math-in-the-Middle methodology for interoperability between computer algebra systems is to formalize the mathematical concept itself. Recent progress on the MMT language has actually made this very practical (see also here):

A mathematician should be able to point to this and get near universal agreement in the community on what that means.

Line 24 is of course critical to the definition, but one can see that the rest is well structured and readable. I have omitted here the first five lines, which consist of include statements, and make the whole thing a completely formal definition yet implemented at a very high level of abstraction. You could slim down those includes and build the same thing on flexiformal foundations, e.g. not bother with the logic “deep down”.

Overall, not many mathematicians might be able to write this, but almost any mathematician can navigate her way through it. It also helps that the jEdit editor and the MathHub webserver have drastically improved, especially in ease of use (work done as part of Work Package 4), but also installation and resilience (work done as part of Work Package 3).

## Math-in-the-Middle methodology

Now that we have a target formalization, the idea is to separately make Sage and GAP interact with it. In the Math-in-the-Middle (MitM) formalism adopted for Work Package 6, we think of having in the “center” a system-independent flexiformalization of the mathematical domains (represented in this diagram in blue; replace in your head EC for elliptic curves with TG for transitive groups).

The next step is to work on the reddish clouds, which are the interface theories between this center and the other systems. These interface theories mainly flexiformalize the system-specific aspects of the domain.

On the GAP side, GAP generates for those interfaces OMDoc/MMT Content Dictionaries (CDs) that contain name, type, and documentation for all API functions (constructors, predicates, methods, …). This is automated, has good coverage and is very rich semantically (more on that towards the end of the post). The next step of the plan is then to align the generated system CDs with the MitM formalization through the MMT “implements” alignment relation (e.g. an alignment could be: GAP-transitive_group MMT-implements MitM-transitive group). If equivalent Sage CDs were available, as well as Sage alignments, we would get a semantic crosswalk between GAP and Sage by composing the MitM alignments between all those different CDs. This would provide the necessary framework for interoperability.
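To make the idea of composing alignments concrete, here is a minimal sketch. It assumes that each alignment can be flattened into a plain table from a system symbol to a MitM symbol; the symbol names and the `crosswalk` helper are hypothetical and do not come from the actual CDs or from MMT.

```python
# Hypothetical alignment tables: system symbol -> MitM symbol.
# All identifiers are illustrative, not names taken from the real CDs.
GAP_TO_MITM = {
    "GAP:TransitiveGroup": "MitM:transitive_group",
    "GAP:Order": "MitM:group_order",
}
SAGE_TO_MITM = {
    "Sage:TransitiveGroup": "MitM:transitive_group",
    "Sage:PermutationGroup_generic.order": "MitM:group_order",
    "Sage:PermutationGroup_generic.holomorph": "MitM:holomorph",
}


def crosswalk(source_to_mitm, target_to_mitm):
    """Compose two 'implements' alignments through the MitM center."""
    # Invert the target alignment: MitM symbol -> target symbol.
    mitm_to_target = {mitm: sym for sym, mitm in target_to_mitm.items()}
    # Keep only source symbols whose MitM concept the target also implements.
    return {
        sym: mitm_to_target[mitm]
        for sym, mitm in source_to_mitm.items()
        if mitm in mitm_to_target
    }


gap_to_sage = crosswalk(GAP_TO_MITM, SAGE_TO_MITM)
sage_to_gap = crosswalk(SAGE_TO_MITM, GAP_TO_MITM)
```

Note that a symbol aligned on only one side (here the holomorph) simply drops out of the crosswalk, which is one way to measure how much alignment work remains.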

At the moment Sage does export some of its knowledge into CDs, thanks to what was implemented by Nicolas Thiéry, leveraging his category framework. This is unfortunately not enough to cover transitive groups, which have rich structure as category objects (but the “Category of Transitive Groups” does not exist in Sage). Given the circumstances of this workshop, I thus decided to focus on the Sage side, and see what information I could extract about transitive groups.

## Exporting knowledge from Sage

If you look at Sage’s TransitiveGroup, a lot of mathematical knowledge is acquired from elsewhere through the class hierarchy lying above TransitiveGroup, and the category framework that instruments that hierarchy. This led me to first try to build a model of how the Sage class TransitiveGroup was actually implemented and what it was doing, but this was a mistake. Indeed, it was very difficult, as I got lost between meta-logics and what I was actually trying to do: modeling Sage? modeling how Sage models math? how Python uses Sage to model math? I was trying to do too much, too early and was probably the wrong person to do that.

If you look back at the methodology, the MitM CDs don’t need to link up to the Math-in-the-Middle content dictionary right away. This is actually up to the alignments, which come later (and could be done by a different person). I was trying to do both at once, while my focus should really have been: “how do I export, but not align, as much as possible of the math knowledge embedded in Sage into a language that can easily be processed by the KWARC team?” (for the categories export built by Nicolas Thiéry, the export went through JSON).

OK then, the question now becomes: “where is math knowledge embedded in Sage that is relevant to the mathematical concept of transitive group?” The first response is of course still “Everywhere!”, but where is the low-hanging fruit?

### A math skeleton

I found that the best way to communicate around this issue with the KWARC team is by extracting from Sage code a “math skeleton”. For this, the Sage-specific module sageinspect was very useful. I thus introspected the sage object corresponding to the class TransitiveGroup, and related objects:

# sage/src/sage/structure/sage_object.pyx
cdef class SageObject:

# sage/local/lib/python2.7/site-packages/sage/categories/category.py
class Category(UniqueRepresentation, SageObject):

# sage/src/sage/structure/category_object.pyx
cdef class CategoryObject(SageObject):

# sage/local/lib/python2.7/site-packages/sage/structure/parent.pyx
cdef class Parent(category_object.CategoryObject):

# sage/src/sage/groups/group.pyx
cdef class Group(Parent):

# sage/src/sage/groups/group.pyx
cdef class FiniteGroup(Group):

# sage/local/lib/python2.7/site-packages/sage/groups/perm_gps/permgroup.py
class PermutationGroup_generic(group.FiniteGroup):

# sage/local/lib/python2.7/site-packages/sage/groups/perm_gps/permgroup_named.py
class PermutationGroup_unique(CachedRepresentation, PermutationGroup_generic):

# sage/local/lib/python2.7/site-packages/sage/groups/perm_gps/permgroup_named.py
class TransitiveGroup(PermutationGroup_unique):


What is mathematical here? Clearly, just about everything, but that is because I was selective in the printout given above: I worked up the class hierarchy from TransitiveGroup by hand, but excluded all the python objects that don’t inherit from SageObject. For instance, you don’t see in that list:

# sage/local/lib/python2.7/site-packages/sage/structure/unique_representation.py
class CachedRepresentation:


CachedRepresentation is only relevant, from a mathematical standpoint, in where it appears as a superclass. Its own internals are pure design decisions for CAS software, not mathematics.

The criterion to use for “related objects” is thus that only objects inheriting from SageObject need to be navigated. So we are navigating in the class hierarchy diamond between TransitiveGroup and SageObject, collecting classes, which I manually imported from the Sage library (obviously this could be automated):

from sage.structure.sage_object import SageObject
from sage.categories.category import Category     # not strictly in the class hierarchy, but included to facilitate discussion
from sage.structure.category_object import CategoryObject
from sage.structure.parent import Parent
from sage.groups.group import Group
from sage.groups.group import FiniteGroup
from sage.groups.perm_gps.permgroup import PermutationGroup_generic
from sage.groups.perm_gps.permgroup_named import PermutationGroup_unique
from sage.groups.perm_gps.permgroup_named import TransitiveGroup


This is how I selected the objects from which I wanted to extract more information, producing the list of class definitions above.

[Note by the way the weird changes in the path to sageinspect.sage_getsource in the listing above (why??? because of interactions between import statements?)]
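The manual walk up the hierarchy could be automated along the following lines. This is only a sketch: the classes below are empty stand-ins that mirror the names in the printout, so that the example runs without Sage, and `math_skeleton` is a hypothetical helper. With Sage installed, one would pass the real TransitiveGroup and SageObject instead.

```python
# Empty stand-ins mirroring the Sage class names (not the real classes).
class SageObject: pass
class CachedRepresentation: pass          # plain Python helper, not a SageObject
class CategoryObject(SageObject): pass
class Parent(CategoryObject): pass
class Group(Parent): pass
class FiniteGroup(Group): pass
class PermutationGroup_generic(FiniteGroup): pass
class PermutationGroup_unique(CachedRepresentation, PermutationGroup_generic): pass
class TransitiveGroup(PermutationGroup_unique): pass


def math_skeleton(cls, root):
    """Classes on the way from cls up to root, most specific first.

    The method resolution order lists every ancestor of cls; the filter
    keeps only those that inherit from root, which is the criterion
    used for 'related objects'.
    """
    return [c for c in cls.__mro__ if issubclass(c, root)]


skeleton_names = [c.__name__ for c in math_skeleton(TransitiveGroup, SageObject)]
print(skeleton_names)
```

CachedRepresentation and Python’s own `object` are filtered out automatically, which matches the hand-made selection.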

### More flesh on the skeleton

The next step is to add a bit of flesh to that skeleton export. Obviously this is going to be more intricate. I have included here what you get when you look at all the methods coming out of the source code for TransitiveGroup, PermutationGroup_unique, etc. In other words, a completely static navigation to the specific methods. This was the right thing to do for communicating with the KWARC team (or in a blog post), as it distilled Sage to its most interesting bits, and we could fill the gaps relying on common concepts (like “class hierarchy”). It is wrong for our ultimate purpose, however: as a quicker way to get more consistent and richer Sage output, I could have navigated dynamically to the relevant classes and extracted all the methods available from the live objects. This is because tons of methods get added when the object gets created, with a lot of mathematics packed into that. The same math could be reconstructed from the source code, but that would be harder to do, as we would be re-emulating a lot of what Python does.
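The difference between the static and the dynamic view can be illustrated with the standard `inspect` module. This is a toy sketch: `Base`, `Child`, `static_methods` and `dynamic_methods` are hypothetical names, and the real Sage objects pick up far more methods at creation time (through the category framework) than plain inheritance shows here.

```python
import inspect


class Base:
    def order(self): ...
    def is_abelian(self): ...


class Child(Base):
    def degree(self): ...


def static_methods(cls):
    """Methods written in the body of cls itself, as seen when reading its source."""
    return sorted(name for name, value in vars(cls).items() if inspect.isfunction(value))


def dynamic_methods(obj):
    """Every public callable reachable from a live object, inherited or not."""
    return sorted(
        name
        for name, _ in inspect.getmembers(obj, callable)
        if not name.startswith("_")
    )


static_view = static_methods(Child)
dynamic_view = dynamic_methods(Child())
```

The static view only sees `degree`, while the live object also exposes `order` and `is_abelian`.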

In any case, here is the full printout of what I get for just the method declarations for PermutationGroup_generic, the Parent that is most interesting:


# sage/local/lib/python2.7/site-packages/sage/groups/perm_gps/permgroup.py
class PermutationGroup_generic(group.FiniteGroup):
def __init__(self, gens=None, gap_group=None, canonicalize=True, domain=None, category=None):
def construction(self):
def _has_natural_domain(self):
def _gap_init_(self):
def _magma_init_(self, magma):
def __cmp__(self, right):
def _element_class(self):
def __call__(self, x, check=True):
def _coerce_impl(self, x):
def list(self):
def __contains__(self, item):
def has_element(self, item):
def __iter__(self):
def gens(self):
def gens_small(self):
def gen(self, i=None):
def identity(self):
def exponent(self):
def largest_moved_point(self):
def degree(self):
def domain(self):
def _domain_gap(self, domain=None):
def smallest_moved_point(self):
def representative_action(self,x,y):
def orbits(self):
def orbit(self, point, action="OnPoints"):
def transversals(self, point):
def stabilizer(self, point, action="OnPoints"):
def base(self, seed=None):
def strong_generating_system(self, base_of_group=None):
def _repr_(self):
def _latex_(self):
def _order(self):
def order(self):
def random_element(self):
def group_id(self):
def id(self):
def group_primitive_id(self):
def center(self):
def socle(self):
def frattini_subgroup(self):
def fitting_subgroup(self):
def intersection(self, other):
def conjugacy_class(self, g):
def conjugacy_classes(self):
def conjugate(self, g):
def direct_product(self, other, maps=True):
def semidirect_product(self, N, mapping, check=True):
def holomorph(self):
def subgroup(self, gens=None, gap_group=None, domain=None, category=None, canonicalize=True, check=True):
def as_finitely_presented_group(self, reduced=False):
def quotient(self, N):
def commutator(self, other=None):
def cohomology(self, n, p = 0):
def cohomology_part(self, n, p = 0):
def homology(self, n, p = 0):
def homology_part(self, n, p = 0):
def character_table(self):
def irreducible_characters(self):
def trivial_character(self):
def character(self, values):
def conjugacy_classes_representatives(self):
def conjugacy_classes_subgroups(self):
def subgroups(self):
def _regular_subgroup_gap(self):
def has_regular_subgroup(self, return_group = False):
def blocks_all(self, representatives = True):
def cosets(self, S, side='right'):
def minimal_generating_set(self):
def normalizer(self, g):
def centralizer(self, g):
def isomorphism_type_info_simple_group(self):
def is_abelian(self):
def is_commutative(self):
def is_cyclic(self):
def is_elementary_abelian(self):
def isomorphism_to(self, right):
def is_isomorphic(self, right):
def is_monomial(self):
def is_nilpotent(self):
def is_normal(self, other):
def is_perfect(self):
def is_pgroup(self):
def is_polycyclic(self):
def is_simple(self):
def is_solvable(self):
def is_subgroup(self, other):
def is_supersolvable(self):
def non_fixed_points(self):
def fixed_points(self):
def is_transitive(self, domain=None):
def is_primitive(self, domain=None):
def is_semi_regular(self, domain=None):
def is_regular(self, domain=None):
def normalizes(self, other):
def composition_series(self):
def derived_series(self):
def lower_central_series(self):
def molien_series(self):
def normal_subgroups(self):
def poincare_series(self, p=2, n=10):
def sylow_subgroup(self, p):
def upper_central_series(self):


Here are things a semi-intelligent mathematician can deduce from this fleshed-out skeleton, and that we might be able to export automatically:

• The arity of all those functions is useful. Unfortunately, this being Python (2.x), the type cannot simply be read. Michael Kohlhase has some interesting ideas regarding mathematicians, types and the modeling necessary for MMT. I think he is right, partly, and there is much to look forward to in the services MMT can provide around type inference. Note that it will be core to this process that MMT allows for flexiformalisation as well!
• I omitted docstrings in this export, but of course this is also useful for semantic information in natural language. Often the docstring contains structured information too, for instance some typing information (see above).
• There is, as often, a method called __init__ that specifies a constructor. In other words, some combination of maps from some parameter space into the object modeled by PermutationGroup_generic. That relationship is messy though, most of the time. Note that the GAP team took the opportunity over last summer to have an intern refactor/regularize the way they did constructors into a more “semantic” way: essentially instead of using the elementary __init__, they made a defconstructor and gave it documentation, type information,… as parameters. Of course defconstructor elaborates to a call to __init__ but the parameters can be used in the CD generation (and for static type-based optimizations later; ask Markus Pfeiffer @ St. Andrews if you are interested in the details).
• _gap_xxxx and _magma_xxxx indicate that the relevant “stuff” exists in the corresponding CASes. This is thus indicating a good place to bootstrap the alignment process between GAP and Sage, and therefore extract KPIs and generally optimize our progress. This would be best done by instrumenting at the SageObject level, since this is where all those _other-computer-algebra-system_xxxx methods are first located, as abstract methods.
• the presence of magic methods __xxxxxxx__ indicates the existence of a relation of some kind on the elements of PermutationGroup_generic, which is a Sage Parent. However, this information is best extracted from the categories export itself, presumably all(?) the time.
• is_xxxx methods indicate the existence of a test and thus a property.
• after some very basic pruning, all the other methods indicate the existence of clear mathematical objects, often relatively simple maps.

Many of the deductions made above will be done in the same way for all Parents (at least if we go for the easiest information to grab), so that’s where the instrumentation should go. Most of that instrumentation actually makes sense to have in a CAS, because it exposes mathematically relevant concepts. It would simply be used by the exporter generating the Content Dictionary.
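As a rough illustration of such an exporter, the sketch below applies the naming heuristics from the list above to a small stand-in class and dumps the result as JSON (the format used for the categories export). The class, the `classify` and `export` helpers, the category labels and the JSON layout are all assumptions made for this example, not an existing Sage or MMT interface.

```python
import inspect
import json
import re

# Prefixes signalling an interface to another CAS (from the heuristics above).
INTERFACE_PREFIXES = {"_gap_": "gap", "_magma_": "magma"}


def classify(name):
    """Guess the role of a method from its name alone."""
    if re.fullmatch(r"__\w+__", name):
        return "magic"
    for prefix, system in INTERFACE_PREFIXES.items():
        if name.startswith(prefix):
            return "interface:" + system
    if name.startswith("is_"):
        return "property"
    if name.startswith("_"):
        return "private"      # pruned before looking for mathematical maps
    return "map"


class PermutationGroupStandIn:
    """Stand-in with a few signatures copied from the printout above."""
    def __init__(self, gens=None): ...
    def _gap_init_(self): ...
    def _magma_init_(self, magma): ...
    def __contains__(self, item): ...
    def is_transitive(self, domain=None): ...
    def orbit(self, point, action="OnPoints"): ...
    def _repr_(self): ...


def export(cls):
    """Collect name, guessed role and arity for each method defined on cls."""
    entries = []
    for name, func in vars(cls).items():
        if not inspect.isfunction(func):
            continue
        params = list(inspect.signature(func).parameters)[1:]  # drop self
        entries.append(
            {"name": name, "kind": classify(name), "arity": len(params), "parameters": params}
        )
    return entries


skeleton = export(PermutationGroupStandIn)
print(json.dumps(skeleton, indent=2))
```

Types are deliberately absent from the output: as noted above, they cannot simply be read off the signatures.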

Remark: Ultimately we want to extract information from live objects. It should not be lost, however, that what we are trying to do is partly a social process (the study of this process is itself the topic of Work Package 7). Humans have built the code from which we are trying to extract information, and now we want to communicate that with other humans so they can in turn code on top of that. Those other humans are familiar with different tools. For instance the KWARC team uses MMT related tools, like MathHub, but not Sage. Presumably other CAS developers or even “plain” mathematicians will just see Sage through an interface built on top of MMT. So I would advocate that we:

1. Make sure to export all the information containing math from Sage into MMT, even that which is not readable beyond text by the system we export to;
2. Devise methods to make this informal export as addressable as possible from within MMT, but not necessarily runnable.

Step 1. could be useful for instance if one is working in GAP and asking “How does Sage do that?”. We should be able to access Sage source code from within GAP, and it will be useful for automating some tasks.

Step 2. would be useful for students in the KWARC group, for instance, who would then be able to extract semantically richer information from a system like Sage with just verbal instructions from domain specific experts, because the data is now in MMT format. It splits the step in two: MMT extraction and semantic extraction, and requires different skills.

The process could be further accelerated, I bet, by exposing also deep sage introspection tools into MMT.

At this stage self-preservation instincts kick in and I don’t want to think more deeply about this proposal from a logical standpoint.

I wish to thank Michael Kohlhase for suggestions that have improved the first draft of this post.

## March 24, 2017

### OpenDreamKit

#### Report on the WP6-WIAS Workshop on Math-in-the-Middle Content

WP6 participants JacU (Florian Rabe), FAU (Dennis Müller, Michael Kohlhase) and UZH (Paul Olivier Dehaye) came together with members of the Weierstrass Institute for Applied Analysis and Stochastics (WIAS: Thomas Koprucki and Carsten Tabelow) for a one-week code sprint (20–24 March) on the Math-in-the-Middle Content and Logic and the encoding of mathematical models. The result was a significant extension of the MitM ontology (in particular for the meta-theories for Sage) and a WIAS preprint on formalizations of models.

## March 08, 2017

### OpenDreamKit

#### SageMathCloud for OpenDreamKit

Part of OpenDreamKit’s mission is to work on user interfaces for better collaboration and also component architectures. This is why the SageMathCloud platform is of special interest for us. One of our tasks is even to have a deeper look into its code base. In this post, as part of our Review on emerging technologies, we propose an overview of the platform.

## What is SageMathCloud?

SageMathCloud is an online platform which allows the creation of collaborative scientific projects including many scientific software packages and tools like SageMath, Jupyter, SciPy, Julia, LaTeX, and more.

Its codebase is open-source, distributed under the GNU General Public License. The platform is run by a private company (SageMath Inc.) created by William Stein who is also the initiator of the SageMath software. The platform offers both free and paying premium accounts.

### Projects

The main tool of the SageMathCloud platform is the possibility to create projects from which you can access the many features. A single user can create as many projects as needed. Each project is an independent Linux virtual machine. It thus comes with a full file system and an online terminal that allows you to run Linux commands. The storage of each project is limited by default but can be extended on premium accounts. You can access the files through the SageMathCloud web interface or through ssh.

One key feature is that each project can be shared by multiple users. This allows sharing access to the files and also real-time editing through the platform. Single files or folders can also be made public. A link is then provided which allows either viewing or downloading the files (even without a SageMathCloud account) and also an easy way to copy them into a different SageMathCloud project owned by the viewer.

### Software

When you create a SageMathCloud project, your Linux virtual machine comes with many software packages and tools especially useful for mathematicians and scientists in general. We list here the most important ones.

• Sage and Sage worksheets. As the name indicates, the platform was primarily developed as a replacement for the old Sage notebook server to allow collaborative online work using Sage. The SageMath software is of course installed by default on the virtual machine and one can run Sage through the online terminal. The platform also offers its own Sage worksheet filetype to edit and run Sage code in a cell-type system (as in the Jupyter notebook or the old Sage notebook) mixed with other cell types like text and HTML. This is used to create interactive worksheets that can be easily shared and copied.

• Jupyter. SageMathCloud includes a Jupyter notebook interface with many kernel options (Python 2, Python 3, Anaconda, Sage, R, Julia, and more). On top of the usual interface, SageMathCloud’s Jupyter offers real-time synchronization among multiple users.

• LaTeX. The common document preparation system LaTeX is installed on the virtual machine. The platform also offers a multi-user editor with real-time synchronization and a dual view of both the LaTeX source code and the PDF output.

### Notebooks: SMC, Sage and Jupyter

SageMathCloud offers very innovative features in terms of notebooks, which should be studied from both the technical and the usability angle.

• Real-time notebooks. Real-time multi-user synchronization is a key aspect of SageMathCloud development. In particular, it has been a motivation for the development of SageMathCloud’s homemade Sage worksheet. More recently, it has also been added to the Jupyter notebook by enhancing the original software. This enhancement is of particular interest for OpenDreamKit as it could benefit all Jupyter users.

• Multi-kernel, multi-client. The multi-kernel philosophy is an inherent part of Jupyter development. Indeed, Jupyter is a notebook interface that can be used with many different language kernels (Python, Sage, Julia, and more). SageMathCloud follows the same spirit and offers a variety of kernels on the Sage worksheet. One advantage of the Sage worksheet is that it allows for many kernels to be used in different cells of one single worksheet (in Jupyter, the kernel has to be chosen once and for all for the entire worksheet). Furthermore, SageMathCloud has developed what they call the Jupyter bridge, which allows users to run a Jupyter kernel from within a Sage worksheet. In this sense, the Sage worksheet could be seen as an alternative client to Jupyter, the same way one can develop alternative kernels. More on this question can be read on the GitHub page of SageMathInc.

## Sharing and teaching with SageMathCloud

### Accessibility

The great advantage of SageMathCloud is that it offers a complete scientific environment without the usual setup hassle. It makes the different software very easy to access independently of the user’s personal system, as long as a good Internet connection is available. As an example, a mathematician can share a demo of code (in a Jupyter or a Sage notebook) that can be used directly by their collaborators. Of course, Internet access is itself a limit: the platform is hard to use with poor network access, for example (but not only) in some developing countries where bandwidth is sometimes limited.

### Teaching

Where teaching is concerned, the sharing facilities of SageMathCloud are very useful. Moreover, the platform offers a course management system. The principle is as follows: the teacher has access to a “main project” containing the class material; every student has their own project, which is shared with the teacher. The course management system allows for automatic actions like:

• Create all the student projects where the teacher is automatically added as a collaborator.
• Create assignments by copying some material from the main project to the students’ projects.
• Collect, grade, and return assignments by copying back and forth between the students’ projects and the main project.

An assignment is just a folder. Its content can vary depending on the class. Of course, the system is especially interesting when the assignment is given within an interactive worksheet and can then be completed by the student directly in the interface. SageMathCloud then becomes a very good way to introduce students to the many scientific software packages it offers.

## SageMathCloud and OpenDreamKit

The many features of SageMathCloud make it a very interesting project for OpenDreamKit to look at. Indeed, it offers one of the leading technologies for scientists in terms of cloud project management, teaching and sharing facilities. In particular it showcases a collection of features that have been selected and adopted by a wide community. It also has some limits which we would like to address through our project:

• Accessibility. As previously mentioned, the cloud-based interface cannot be easily accessed in places where the Internet connection is not good enough. One solution would be to have clear, easy-to-follow instructions on how to install a SageMathCloud platform in a local institution or on a personal machine. This is to be taken care of in D3.2 and D3.4.

• Interoperability and file formats. At the moment, the SageMathCloud platform offers two file formats for interactive worksheets: the Jupyter one and a home-made Sage worksheet one. It is not possible to run the Sage worksheets anywhere other than on the platform. In particular, there is no way to run a Sage worksheet on a local Sage installation. It is not yet clear what a long-term unified worksheet solution would be, and it is part of the OpenDreamKit project to work on this question. The technical choices made for the Sage worksheets are interesting to investigate in this regard, as are file conversions and so on.

## March 01, 2017

### OpenDreamKit

#### Jupyter Notebooks Facilitating Productivity, Sustainability, and Accessibility of Data Science

Min Ragan-Kelley presented a poster on Jupyter notebooks facilitating productivity, sustainability, and accessibility of data science and computational science in general. The poster included the role of OpenDreamKit-supported projects, such as nbdime and nbval in facilitating reproducible science.

PDF of Poster

DOI: 10.6084/m9.figshare.4696414.v1

## February 18, 2017

### Liang Ze

#### Distributive Laws

I’ve been participating in the Kan Extension Seminar II, and this week it’s my turn to post about Jon Beck’s “Distributive Laws” at the n-Category Cafe!

The post uses lots of string diagrams for monads, resulting in pictures like the following:

See you there!

## February 15, 2017

### OpenDreamKit

#### Reports from "Computational Mathematics with Jupyter" workshop

Jointly with the Collaborative Computational Project “CoDiMa - CCP in the area of Computational Discrete Mathematics”, we have organised the workshop “Computational Mathematics with Jupyter”, which took place at the International Centre for Mathematical Sciences in Edinburgh on 16-20 January 2017. You can find some reports from the workshop here:

## January 20, 2017

### OpenDreamKit

#### Task based parallelization of recursive linear algebra routines using Kaapi

Clément Pernet gave a talk at the Journée Runtime, on the work of the UGA partner (formerly UJF) on the parallelization of exact linear algebra using recursive tasks.

Pdf slides of Talk

#### A case study of computational science in Jupyter notebooks: JOOMMF

Hans Fangohr gave an introduction to computational micromagnetics and the current workflow that is used by thousands of scientists across the planet. He then introduced a new Python interface to the computational tool (OOMMF), and demonstrated how this can be driven from within a Jupyter Notebook. Through the notebook, work can be carried out more effectively, and more reproducibly. A roadmap and update for the Jupyter-OOMMF project was presented as well.

Pdf slides of Talk

Notebook of micromagnetic simulation (standard problem 3)

Notebook of micromagnetic model

Blog entry mentioning the presentation

## January 19, 2017

### OpenDreamKit

#### Biannual ODK Steering Committee meeting

The biannual OpenDreamKit Steering Committee meeting is taking place in Edinburgh on the occasion of the Computational Mathematics with Jupyter workshop.

## Brief agenda

• Preparation for the formal review
• Progress reports per site
• Advisory Board and Quality Review Board
• Amendment to the grant agreement
• Key Performance Indicators
• Deliverables due Month 18
• WP7 topics in view of personnel changes
• Future funding

## January 16, 2017

### OpenDreamKit

#### Workshop: Computational Mathematics with Jupyter

Jointly with the Collaborative Computational Project “CoDiMa - CCP in the area of Computational Discrete Mathematics”, we are currently organising a workshop “Computational Mathematics with Jupyter”.

It will take place at the International Centre for Mathematical Sciences in Edinburgh on 16-20 January 2017. Please see the workshop website for further details.

The Software Sustainability Institute blog is hosting a summary of the event.

## January 13, 2017

### OpenDreamKit

#### nbdime released

nbdime has had its first stable release. nbdime provides tools for diffing and merging Jupyter notebooks, and for integrating notebooks into git workflows. nbdime aims to alleviate some common difficulties when working with Jupyter notebooks.

Particular features of nbdime:

• recognizing binary outputs that cannot be reasonably interpreted in the terminal
• recognizing transient fields and eliminating them from merge conflicts
• ensuring that merged notebooks are always valid
• integration with git as drivers for passive diff/merge integration and tools for interactive GUI integration

Tools provided by nbdime:

• nbshow: show a legible formatting of a notebook on the command-line
• nbdiff: command-line diff of notebooks, eliding outputs that are known to not be renderable in a terminal. nbdiff can be integrated into git as a diff driver
• nbdiff-web: create a rich, rendered web view of the changes between two notebooks
• nbmerge: three-way merge with automatic conflict resolution, which can be integrated into git as a merge driver, ensuring always-valid notebooks and eliminating merge conflicts on transient fields
• nbmerge-web: interactive three-way merge tool for manually resolving merge conflicts
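To give an idea of what “transient fields” means in practice, here is a small sketch that compares two notebooks while ignoring execution counters. It is not nbdime’s implementation and does not use its API: it assumes that `execution_count` is one such transient field, works on notebooks already loaded as plain dictionaries, and the helper names `strip_transient` and `same_content` are made up for this example.

```python
import copy


def strip_transient(notebook):
    """Return a copy of a notebook dict with execution counters blanked out."""
    clean = copy.deepcopy(notebook)
    for cell in clean.get("cells", []):
        if cell.get("cell_type") != "code":
            continue
        cell["execution_count"] = None
        for output in cell.get("outputs", []):
            if "execution_count" in output:
                output["execution_count"] = None
    return clean


def same_content(left, right):
    """True if two notebooks differ at most in their execution counters."""
    return strip_transient(left) == strip_transient(right)


# Two runs of the same notebook: only the counters differ.
first_run = {
    "cells": [
        {"cell_type": "code", "source": "1 + 1", "execution_count": 1,
         "outputs": [{"output_type": "execute_result", "execution_count": 1,
                      "data": {"text/plain": "2"}}]},
    ]
}
second_run = copy.deepcopy(first_run)
second_run["cells"][0]["execution_count"] = 7
second_run["cells"][0]["outputs"][0]["execution_count"] = 7

# A genuine edit: the source of the cell changed.
edited = copy.deepcopy(first_run)
edited["cells"][0]["source"] = "1 + 2"
```

A plain text diff would flag `second_run` as changed, while a notebook-aware comparison like this one would not.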

## December 28, 2016

### OpenDreamKit

#### From pythran import typing

As part of its OpenDreamKit deliverable D5.4 (Make Pythran typing better to improve error information), the Pythran team has written an in-depth article about an unsound type checker.

## December 22, 2016

### OpenDreamKit

#### Full-time open-ended research software engineer position

We are seeking a full-time research software engineer at Warwick to work with Professor John Cremona on those parts of the OpenDreamKit project connected with the LMFDB in relation to Work Package 6. The position is open-ended. The post holder will be employed within the Scientific Computing Research Technology Platform (RTP) at Warwick, initially under secondment to Professor Cremona for the duration of the ODK project, together with longer-term responsibilities for the development and support of research software across the University.

The full advertisement may be found here.

Deadline for applications: 26 January 2017.

# Location

The developer will work at the University of Warwick, UK.

# Mission and activities

Initially, to work as part of the OpenDreamKit collaboration, on those parts of WP6 related to the LMFDB project. Longer term, to play a leading role in the development of research software support at Warwick.

# Skills requirements

See the full job advertisement for a detailed job description, and a list of essential and desirable criteria for the person to be appointed.

# Context

Until 31 August 2019, the position will be mainly funded by OpenDreamKit, a Horizon 2020 European Research Infrastructure project that will run for four years, starting from September 2015. This project brings together the open-source computational mathematics ecosystem (in particular LinBox, MPIR, SageMath, GAP, PARI/GP, LMFDB, Singular, MathHub, and the IPython/Jupyter interactive computing environment) toward building a flexible toolkit for Virtual Research Environments for mathematics. Led by Université Paris-Sud, this project involves about 50 people spread over 15 sites in Europe, with a total budget of about 7.6 million euros.

From 1 September 2019 the post will be funded by the University of Warwick.

# Expressions of interest

Interested candidates should send an email to Professor John Cremona ([email protected]) for further information about the position, as soon as possible.

## December 16, 2016

### Sébastien Labbé

#### A time evolution picture of packages built in parallel by Sage

Compiling Sage takes a while and does a lot of stuff. Each time, I wonder which components take so much time and which are fast. I wrote a module in my slabbe package (version 0.3b2), available on PyPI, to figure this out.

This is after compiling 7.5.beta6 after an upgrade from 7.5.beta4:

sage: from slabbe.analyze_sage_build import draw_sage_build
sage: draw_sage_build().pdf()


From scratch from a fresh git clone of 7.5.beta6, after running MAKE='make -j4' make ptestlong, I get:

sage: from slabbe.analyze_sage_build import draw_sage_build
sage: draw_sage_build().pdf()


The picture does not include the start and ptestlong because there was an error compiling the documentation.

By default, draw_sage_build considers all of the log files in logs/pkgs, but options are available to consider only log files created in a given interval of time. See draw_sage_build? for more info.
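As a rough illustration of the time-interval idea, here is a minimal sketch using only the Python standard library. It is not the code in slabbe: the function name `logs_in_interval` is made up for this example, and it assumes that a log file's modification time is a reasonable stand-in for when the package was built.

```python
import os
import tempfile
import time


def logs_in_interval(log_dir, start, stop):
    """Return sorted (mtime, filename) pairs for .log files modified in [start, stop].

    `start` and `stop` are POSIX timestamps. Using the modification time
    is an assumption of this sketch, not necessarily what slabbe does.
    """
    selected = []
    for name in os.listdir(log_dir):
        path = os.path.join(log_dir, name)
        if not (name.endswith(".log") and os.path.isfile(path)):
            continue
        mtime = os.path.getmtime(path)
        if start <= mtime <= stop:
            selected.append((mtime, name))
    return sorted(selected)


# Demo on a throwaway directory with made-up log names.
demo_dir = tempfile.mkdtemp()
now = time.time()
for name, age in [("old_pkg.log", 7200), ("recent_pkg.log", 600)]:
    path = os.path.join(demo_dir, name)
    with open(path, "w") as f:
        f.write("build output\n")
    # Pretend the file was last written `age` seconds ago.
    os.utime(path, (now - age, now - age))

last_hour = logs_in_interval(demo_dir, now - 3600, now)
print([name for _, name in last_hour])
```

Sorting by modification time gives the order in which packages finished, which is the kind of ordering a time evolution picture needs.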

### OpenDreamKit

#### nbdime 0.1.0

nbdime 0.1.0 has been released, implementing tools for diffing and merging Jupyter notebooks.

Key features:

• nbdiff for diffing notebooks in the terminal
• nbdiff-web for viewing a rich, rendered diff of two notebooks
• nbmerge for merging three notebooks, with automatic conflict resolution that should always guarantee a valid notebook, even with unresolved conflicts
• nbmerge-web for manually resolving conflicts when merging notebooks
• nbshow for quickly viewing a notebook in the terminal
• git integration for using the diff and merge tools on notebook files by default

• Read the docs!
• Contribute!

## December 15, 2016

### OpenDreamKit

#### Full-time mathematical software developer position at TU Kaiserslautern

We are seeking a full-time mathematical software developer at TU Kaiserslautern to work with Prof. Wolfram Decker on the Singular contribution to the OpenDreamKit project.

Deadline for applications: TBA.

# Location

The developer will work at TU Kaiserslautern in the city of Kaiserslautern, Germany. Kaiserslautern is next to one of the largest contiguous forests in Europe.

# Mission

To work as part of the OpenDreamKit collaboration, to implement improvements via parallelisation of components of Singular.

# Activities

To implement parallel algorithms in Singular in C/C++.

Particular deliverables include:

• Improving the quadratic sieve for integer factorisation.

• Parallelising the new polynomial arithmetic functionality in Singular.

Depending on the skills of the applicant, the developer may also wish to contribute to other aspects of the Singular project and mathematical research in Kaiserslautern.

# Skills requirements

• C/C++ programming experience

• Interest in either:

  • algebra/number theory/algebraic geometry
  • fast arithmetic
  • the design and development of computer algebra systems

• Fluency in English

• Must hold a fully certified Master's degree in Mathematics

• Experience in Open Source development and tooling (GitHub)

# Context

The position will be funded by OpenDreamKit, a Horizon 2020 European Research Infrastructure project that will run for four years, starting from September 2015. This project brings together the open-source computational mathematics ecosystem – and in particular LinBox, MPIR, SageMath, GAP, PARI/GP, LMFDB, Singular, MathHub, and the IPython/Jupyter interactive computing environment – toward building a flexible toolkit for Virtual Research Environments for mathematics. Led by Université Paris-Sud, this project involves about 50 people spread over 15 sites in Europe, with a total budget of about 7.6 million euros.

# Applications

Interested candidates should send an email to both decker {at} mathematik dot uni-kl dot de and goodwillhart {at} googlemail dot com with a CV and short letter of application, as soon as possible.

## RethinkDB and sustainable business models

Three weeks ago, I spent the evening of Sept 12, 2016 with Daniel Mewes, who is the lead engineer of RethinkDB (an open source database). I was also supposed to meet with the co-founders, Slava and Michael, but they were too busy fundraising and couldn't join us. I pestered Daniel the whole evening about what RethinkDB's business model actually was. Yesterday, on October 6, 2016, RethinkDB shut down.

I met with some RethinkDB devs because an investor who runs a fund at the VC firm Andreessen-Horowitz (A16Z) had kindly invited me there to explain my commercialization plans for SageMath, Inc., and RethinkDB is one of the companies that A16Z has invested in. At first, I wasn't going to take the meeting with A16Z, since I had never met with venture capitalists before, and do not intend to raise VC. However, some of my advisors convinced me that VCs can be very helpful even if you never intend to take their investment, so I accepted the meeting.

In the first draft of my slides for my presentation to A16Z, I had a slide with the question: "Why do you fund open source companies like RethinkDB and CoreOS, which have no clear (to me) business model? Is it out of some sense of charity to support the open source software ecosystem?" After talking with people at Google and the RethinkDB devs, I removed that slide, since charity is clearly not the answer (I don't know if there is a better answer than "by accident").

I have used RethinkDB intensely for nearly two years, and I might be their biggest user in some sense. My product SageMathCloud, which provides web-based course management, Python, R, LaTeX, etc., uses RethinkDB for everything. For example, every single time you enter some text in a realtime synchronized document, a RethinkDB table gets an entry inserted in it. I have RethinkDB tables with nearly 100 million records. I gave a talk at a RethinkDB meetup, filed numerous bug reports, and have been described by them as "their most unlucky user". In short, in 2015 I bet big on RethinkDB, just like I bet big on Python back in 2004 when starting SageMath. And when visiting the RethinkDB devs in San Francisco (this year and also last year), I have said to them many times "I have a very strong vested interest in you guys not failing." My company SageMath, Inc. also pays RethinkDB for a support contract.

Sustainable business models were very much on my mind, because of my upcoming meeting at A16Z and the upcoming board meeting for my company. SageMath, Inc.'s business model involves making money from subscriptions to SageMathCloud (which is hosted on Google Cloud Platform); of course, there are tons of details about exactly how our business works, which we've been refining based on customer feedback. Though absolutely all of our software is open source, what we sell is convenience, ease of access and use, and we provide value by hosting hundreds of courses on shared infrastructure, so it is much cheaper and easier for universities to pay us rather than hosting our software themselves (which is also fairly easy). So that's our business model, and I would argue that it is working; at least our MRR is steadily increasing and is more than twice our hosting costs (we are not cash flow positive yet due to developer costs).

So far as I can determine, the business model of RethinkDB was to make money in the following ways:

1) Sell support contracts to companies (I bought one).
2) Sell a closed-source proprietary version of RethinkDB with extra features that were of interest to enterprise (they had a handful of such features, e.g., audit logs for queries).
3) Turn Horizon into a cloud-hosted competitor to Firebase, with the unique advantages that users would have the option to migrate from the cloud to their own private data center, and more customizability. This strategy depends on a trend for users to migrate away from the cloud, rather than to it, which some people at RethinkDB thought was a real trend (I disagree).

I don't know of anything else they were seriously trying right now. The closed-source proprietary version of RethinkDB also seemed like a very recent last-ditch effort that had only just begun; perhaps it directly contradicted a desire to be a 100% open source company?

With enough users, it's easier to make certain business models work. I suspect RethinkDB does not have a lot of real users. Number of users tends to be roughly linearly related to mailing list traffic, and the RethinkDB mailing list has an order of magnitude less traffic compared to the SageMath mailing lists, and SageMath has around 50,000 users. RethinkDB wasn't even advertised to be production ready until just over a year ago, so even they were telling people not to use it seriously until relatively recently. The adoption cycle for database technology is slow -- people wisely wait for Aphyr's tests, benchmarks comparing with similar technology, etc. I was unusual in that I chose RethinkDB much earlier than most people would, since I love the design of RethinkDB so much. It's the first database I loved, having seen a lot over many decades.

Conclusion: RethinkDB wasn't a real business, and wouldn't become one without year(s) more runway.

I'm also very worried about the future of RethinkDB as an open source project. I don't know if the developers have experience growing an open source community of volunteers; it's incredibly hard and it's unclear they are even going to be involved. At a bare minimum, I think they must switch to a very liberal license (Apache instead of AGPL), and make everything (e.g., automated testing code, documentation, etc.) open source. It's insanely hard getting any support for open source infrastructure work -- support mostly comes from small government grants (for research software) or contributions from employees at companies (that use the software). Relicensing in a company-friendly way is thus critical.

## Company Incentives

Companies can be incentivized in various ways, including:
• to get to the next round of VC funding
• to be a sustainable profitable business by making more money from customers than they spend, or
• to grow to have a very large number of users and somehow pivot to making money later.
When founding a company, you have a chance to choose how your company will be incentivized based on how much risk you are willing to take, the resources you have, the sort of business you are building, the current state of the market, and your model of what will happen in the future.

For me, SageMath is an open source project I started in 2004, and I'm in it for the long haul. I will make the business I'm building around SageMathCloud succeed, or I will die trying -- therefore I have very, very little tolerance for risk. Failure is not an option, and I am not looking for an exit. For me, the strategy that best matches my values is to incentivize my company to build a profitable business, since that is most likely to survive, and also to give us the freedom to maintain our long-term support for open source and pure mathematics software.

Thus for my company, neither optimizing for raising the next round of VC nor growing at all costs makes sense. You would be surprised how many people think I'm completely wrong for concluding this.

## Andreessen-Horowitz

I spent the evening with RethinkDB developers, which scared the hell out of me regarding their business prospects. They are probably the most open source friendly VC-funded company I know of, and they had given me hope that it is possible to build a successful VC-funded tech startup around open source. I prepared for my meeting at A16Z, and deleted my slide about RethinkDB.

I arrived at A16Z, and was greeted by incredibly friendly people. I was a little shocked when I saw their nuclear bomb art in the entry room, then went to a nice little office to wait. The meeting time arrived, and we went over my slides, and I explained my business model, goals, etc. They said there was no place for A16Z to invest directly in what I was planning to do, since I was very explicit that I'm not looking for an exit, and my plan about how big I wanted the company to grow in the next 5 years wasn't sufficiently ambitious. They were also worried about how small the total market cap of Mathematica and Matlab is (only a few hundred million?!). However, they generously and repeatedly offered to introduce me to more potential angel investors.

We argued about the value of outside investment to the company I am trying to build. I had hoped to get some insight or introductions related to their portfolio companies that are of interest to my company (e.g., Udacity, GitHub), but they deflected all such questions. There was also some confusion, since I showed them slides about what I'm doing, but was quite clear that I was not asking for money, which is not what they are used to. In any case, I greatly appreciated the meeting, and it really made me think. They were crystal clear that they believed I was completely wrong to not be trying to do everything possible to raise investor money.

## Basecamp

During the first year of SageMath, Inc., I was planning to raise a round of VC, and was doing everything to prepare for that. I then read some of DHH's books about Basecamp, and realized many of those arguments applied to my situation, given my values, and -- after a lot of reflection -- I changed my mind. I think Basecamp itself is mostly closed source, so they may have an advantage in building a business. SageMathCloud (and SageMath) really are 100% open source, and building a completely open source business might be harder. Our open source IP is considered worthless by investors. Witness: RethinkDB just shut down and Stripe hired just the engineers -- all the IP, customers, etc., of RethinkDB were evidently considered worthless by investors.

The day after the A16Z meeting, I met with my board, which went well (we discussed a huge range of topics over several hours). Some of the board members also tried hard to convince me that I should raise a lot more investor money.

## Will Poole: you're doomed

Two weeks ago I met with Will Poole, who is a friend of a friend, and we talked about my company and plans. I described what I was doing, that everything was open source, and that I was incentivizing the company around building a business rather than raising investor money. He listened and asked a lot of follow-up questions, making it very clear he understands building a company very, very well.

His feedback was discouraging -- I said "So, you're saying that I'm basically doomed." He responded that I wasn't doomed, but might be able to run a small "lifestyle business" at best via my approach, but there was absolutely no way that what I was doing would have any impact or pay for my kids' college tuition. If this had been feedback from some random person, it might not have been so disturbing, but Will Poole joined Microsoft in 1996, where he went on to run Microsoft's multibillion dollar Windows business. Will Poole is like a retired four-star general who executed a successful campaign to conquer the world; he has been around the block a few times. He tried pretty hard to convince me to make as much of SageMathCloud closed source as possible, and to try to convince my users to make content they create in SMC something that I can reuse however I want. I felt pretty shaken and convinced that I needed to close parts of SMC, e.g., the new Kubernetes-based backend that we spent all summer implementing. (Will: if you read this, though our discussion was really disturbing to me, I really appreciate it and respect you.)

My friend, who introduced me to Will Poole, introduced me to some other people and described me as that really frustrating sort of entrepreneur who doesn't want investor money. He then remarked that one of the things he learned in business school, which really surprised him, was that it is good for a company to have a lot of debt. I gave him a funny look, and he added "of course, I've never run a company".

I left that meeting with Will convinced that I would close source parts of SageMathCloud, to make things much more defensible. However, after thinking things through for several days, and talking this over with other people involved in the company, I have chosen not to close anything. This just makes our job harder. Way harder. But I'm not going to make any decisions based purely on fear. I don't care what anybody says, I do not think it is impossible to build an open source business (I think WordPress is an example), and I do not need to raise VC.

Hacker News Discussion: https://news.ycombinator.com/item?id=12663599

Chinese version: http://www.infoq.com/cn/news/2016/10/Reflection-sustainable-profit-co

# !!! REGISTRATION IS CLOSED

The LoOPS network, DevLog and OpenDreamKit are organising a day dedicated to the various tools available in the Jupyter environment. Notebooks are increasingly used among research communities thanks to their ease of use and their interactivity. They give students easy access to class documentation and appealing practical exercises, make it easy to share ideas between colleagues, and encourage reflection on how to make research work reproducible.

• Where: Room 1-2-3 of the Institut d’Astrophysique Spatiale, Orsay, France
• When: 6th of December 2016
• Who: four core developers of Jupyter tools will be present (S. Corlay, A. Darian, T. Kluyver, B. Ragan-Kelley) and V. Pons who is working on SageMathCloud.
• Event organisation: Loïc Gouarin

Registration is free but mandatory. Most talks and workshops will be given in English. You may need to bring your own training materials, in which case we will warn you in advance.

## Agenda:

• 9h15-9h45: Welcome
• 9h45-12h45: Presentations
  • A. Darian and S. Corlay : JupyterLab and third-party extensions, featuring ipywidgets: the next generation of Jupyter notebooks.
  • B. Ragan-Kelley : JupyterHub: Deploying Jupyter Notebooks for students and researchers.
  • V. Fauske : nbdime: diffing and merging notebooks.
  • V. Pons : the SageMathCloud platform
• 12h45-14h00: Buffet
• 14h00-14h30: T. Kluyver : Nbconvert: make things from notebooks
• 14h30-17h30: Workshops run in parallel

## Presentations:

1) A. Darian and S. Corlay : JupyterLab and third-party extensions, featuring ipywidgets: the next generation of Jupyter notebooks

This talk will consist of an architectural overview and the current state of affairs of the new JupyterLab and ipywidgets. It will feature demos of the master branch of these projects, reflecting the latest developments.

2) B. Ragan-Kelley : JupyterHub: Deploying Jupyter Notebooks for students and researchers

Since the Jupyter notebook is a web-based environment, the notebook server can be run remotely, not just on your local machine. JupyterHub is a multi-user server, aimed at helping research groups and instructors host notebook servers for their users or students. By default, JupyterHub uses the local system users and PAM authentication, but it can be customized to use any authentication system, including GitHub, CILogon, Shibboleth, and more. The way single-user servers are spawned can also be customized to use services such as Docker, Kubernetes, or HPC cluster queuing systems. The tutorial will cover a basic deployment of JupyterHub on a single machine, then extend it to use Docker and GitHub authentication, and close with general best practices for JupyterHub deployment.

3) V. Fauske : nbdime: diffing and merging notebooks

Jupyter notebooks are JSON documents containing a combination of code, prose, and output. These outputs may be rich media, such as HTML or images. The use of JSON and including output can present challenges when working with version control systems and code review. The JSON structure significantly impedes the readability of diffs, and simple line-based merge tools can produce invalid results. nbdime aims to provide diff and merge tools specifically for notebooks. For diffs, nbdime shows rendered diffs of notebooks, so that the content can be compared efficiently, rather than the raw JSON. Merges performed with nbdime will guarantee a valid notebook as a result, even in the event of conflicts. nbdime integrates with existing tools, such as git, so you shouldn’t need to change how you work.
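To make the point about raw JSON diffs concrete, here is a small sketch using only the Python standard library. It does not use nbdime and is not how nbdime works internally; the helper names (`make_notebook`, `raw_json_diff`, `cell_source_diff`) are invented for this illustration, and the cell-by-cell comparison is deliberately naive (it ignores added or removed cells).

```python
import difflib
import json


def make_notebook(sources):
    """Build a minimal notebook-like dict with one code cell per source string."""
    return {
        "cells": [
            {"cell_type": "code", "source": src, "outputs": [],
             "metadata": {}, "execution_count": None}
            for src in sources
        ],
        "metadata": {},
        "nbformat": 4,
        "nbformat_minor": 0,
    }


def raw_json_diff(a, b):
    """Line-based diff of the serialized JSON, as a generic tool would see it."""
    a_lines = json.dumps(a, indent=1, sort_keys=True).splitlines()
    b_lines = json.dumps(b, indent=1, sort_keys=True).splitlines()
    return list(difflib.unified_diff(a_lines, b_lines,
                                     "a.ipynb", "b.ipynb", lineterm=""))


def cell_source_diff(a, b):
    """Report (index, old, new) for cells whose source changed.

    Naive on purpose: cells are paired by position only.
    """
    changes = []
    for i, (cell_a, cell_b) in enumerate(zip(a["cells"], b["cells"])):
        if cell_a["source"] != cell_b["source"]:
            changes.append((i, cell_a["source"], cell_b["source"]))
    return changes


old = make_notebook(["x = 1", "print(x)"])
new = make_notebook(["x = 2", "print(x)"])

print(len(raw_json_diff(old, new)), "lines of raw JSON diff")
print(cell_source_diff(old, new))
```

The content-level comparison reports one changed cell, while the raw diff surrounds the same one-character change with JSON structure that the reader has to mentally filter out.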

4) V. Pons : the SageMathCloud platform

We will present the open-source interactive platform SageMathCloud and its many useful aspects for research collaboration and teaching:

• creation of a collaborative project;
• sharing files and worksheets;
• using Jupyter in SageMathCloud;
• multi-user real time editing;
• course management with Jupyter and SageMathCloud

5) T. Kluyver : Nbconvert: make things from notebooks

Nbconvert is a set of tools to convert notebooks to other file types, such as HTML, LaTeX, or executable scripts. We’ll cover how to use it at the command line and in the notebook interface, along with an overview of how it works. Nbconvert is also designed to be highly extensible, and we’ll describe some of the things that can be done by building on nbconvert, such as extra converters, reports based on input, and cross-linking between converted notebooks.
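As a rough idea of what converting a notebook to an executable script involves, here is a simplified sketch using only the Python standard library. It is not nbconvert's implementation (nbconvert is template-based and far more general); the function `notebook_to_script` is hypothetical and only handles code and markdown cells.

```python
import json


def notebook_to_script(nb_json):
    """Concatenate the code cells of a notebook (a JSON string) into a script.

    Markdown cells are kept as comments so the prose is not lost.
    """
    nb = json.loads(nb_json)
    chunks = []
    for cell in nb.get("cells", []):
        source = cell.get("source", "")
        # The notebook format allows `source` to be a string or a list of lines.
        if isinstance(source, list):
            source = "".join(source)
        if cell.get("cell_type") == "code":
            chunks.append(source)
        elif cell.get("cell_type") == "markdown":
            chunks.append("\n".join("# " + line for line in source.splitlines()))
    return "\n\n".join(chunks) + "\n"


# A tiny made-up notebook with one markdown cell and one code cell.
example = json.dumps({
    "nbformat": 4,
    "nbformat_minor": 0,
    "metadata": {},
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["Compute a sum"]},
        {"cell_type": "code", "metadata": {}, "outputs": [],
         "execution_count": None,
         "source": ["total = 1 + 2\n", "print(total)"]},
    ],
})

script = notebook_to_script(example)
print(script)
```

With nbconvert itself, the equivalent conversion is done from the command line with `jupyter nbconvert --to script notebook.ipynb`.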

## November 02, 2016

### OpenDreamKit

#### OOMMF Python interface presentation

Hans Fangohr presented the first prototype of the Python OOMMF interface at the 61st international meeting on magnetism and magnetic materials in New Orleans (US).

Pdf slides of Talk