May 02, 2018

OpenDreamKit

Free Computational Mathematics

This is an announcement for a large one-week dissemination conference organized by OpenDreamKit at the CIRM premises near Marseille, France. With OpenDreamKit approaching its end (August 2019), this will be our main public closing event.

Webpage on the CIRM website for registrations

May 02, 2018 11:01 PM

Jupyter receives the ACM Software System Award!

Project Jupyter has been awarded the 2017 ACM Software System Award, joining an illustrious list of projects that contains major highlights of computing history, including Unix, TeX, S (R’s predecessor), the Web, Mosaic, Java, INGRES (modern databases) and more. We are delighted by this strong recognition of a major piece of the ecosystem that OpenDreamKit is based upon and contributing to.

Congratulations to the Jupyter team, including our fellows Min Ragan-Kelley and Thomas Kluyver!

May 02, 2018 12:00 AM

April 24, 2018

Harald Schilly

SageMath GSoC 2018 Projects

We're happy to announce our list of Google Summer of Code projects for 2018! Thank you to everyone involved: all students, mentors, and of course, Google!


Sai Harsh Tondomker (David Coudert and Dima Pasechnik):

Addition of SPQR-tree to graph module

In this project, our goal is to extend the graph theory library in Sage by implementing functionality to find triconnected subgraphs, partitioning an undirected graph into 3-connected components and constructing the corresponding SPQR-tree. These modules can later be used as pre-processing for several graph problems, such as the Hamiltonian cycle and traveling salesman problems.



Meghana M Reddy (David Coudert and Dima Pasechnik):

Addition of SPQR-trees to the graph module of SageMath

The aim of the project is to implement the linear-time algorithm for partitioning a graph into 3-connected components and constructing the corresponding SPQR-tree of the graph. Further, this algorithm can be used as a subroutine for several other graph problems, such as the recognition of chordless graphs, the Hamiltonian cycle problem, etc.
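For context, Sage's graph module already exposes the analogous decomposition one level down, namely into 2-connected components (blocks); the two SPQR-tree projects above would refine this picture to triconnected components. A minimal illustration, assuming a recent Sage:

```
# Sage session sketch: the biconnected analogue of what the projects add.
# PathGraph(4) is the path 0-1-2-3: three blocks, with cut vertices 1 and 2.
sage: G = graphs.PathGraph(4)
sage: blocks, cuts = G.blocks_and_cut_vertices()
sage: sorted(sorted(b) for b in blocks), sorted(cuts)
([[0, 1], [1, 2], [2, 3]], [1, 2])
```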



Filip Ion (Johan Rosenkilde):

Databases and bounds of codes

This proposal details some possible improvements to the coding theory component of SageMath. We aim to build databases of precomputed bounds and optimal examples of linear codes for each choice of parameters below a maximum range.



Raghukul Raman (Travis Scrimshaw and Benjamin Hutz):

Rational Points on Varieties

This project aims at implementing basic algorithms for finding rational points on varieties. Classically, an algebraic variety is defined as the set of solutions of polynomial equations over a number field. A rational point of an algebraic variety is a solution of the set of equations over the given field (the rational field if not mentioned otherwise). Much of number theory can be viewed as the study of rational points of algebraic varieties. Some of the great achievements of number theory amount to determining the rational points on particular curves. For example, Fermat’s last theorem is equivalent to the statement that for an integer n ≥ 3, the only rational points of the curve x^n + y^n = z^n in P^2 over Q are the obvious ones. Common variants of this question include determining the set of all points of V(K) of height up to some bound. The aim of this project is to implement some basic rational point finding algorithms (sieving modulo primes and enumeration) and extend these to the product_projective_space scheme.
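As a starting point, Sage can already enumerate projective rational points of bounded height by naive search; the project would add sieving and support for products of projective spaces. A minimal sketch, assuming a recent Sage in which rational_points accepts a height bound over Q:

```
# Sage session sketch: enumerate points of height <= 10 on the Fermat
# cubic x^3 + y^3 = z^3 in P^2 over Q; only the obvious points exist,
# e.g. (0 : 1 : 1), (1 : 0 : 1) and (-1 : 1 : 0).
sage: P.<x,y,z> = ProjectiveSpace(QQ, 2)
sage: X = P.subscheme([x^3 + y^3 - z^3])
sage: X.rational_points(bound=10)
```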



Dario Asprone (Dima Pasechnik):

Checking graph isomorphism using the Weisfeiler-Lehman algorithm

Currently SageMath checks for graph isomorphism through an internal package with a corresponding method, called isomorphic and contained in sage.groups.perm_gps.partn_ref.refinement_graphs. This method, in addition to being quite convoluted and lacking documentation about its inner workings, was last updated in a significant manner in 2011.
The project aims at creating a new package which implements a heuristic method, efficient in practice, for checking whether two graphs are isomorphic, using one of the members of the Weisfeiler-Lehman (WL) algorithm hierarchy. An attempt was made in the past at the same task (with a narrower scope, limited to implementing the second-order version of the WL algorithm), but the code was found to contain a bug and was never integrated into the codebase (see Trac #10482).
To the best of my knowledge, this would be the first working open-source implementation of k-WL, for k > 1.
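For a flavour of the base of this hierarchy, here is a short, self-contained sketch of 1-WL (color refinement), the k = 1 member of the family; it is written from the textbook description, not taken from the proposed package:

```python
from collections import Counter

def wl_colors(adj):
    """1-WL color refinement on a graph given as {vertex: neighbours}.
    Refines vertex colors by the multiset of neighbour colors until
    the partition stabilizes."""
    colors = {v: 0 for v in adj}
    while True:
        # Signature = (own color, sorted multiset of neighbour colors).
        sigs = {v: (colors[v], tuple(sorted(colors[u] for u in adj[v])))
                for v in adj}
        # Canonically relabel signatures as small integers.
        relabel = {s: i for i, s in enumerate(sorted(set(sigs.values())))}
        new = {v: relabel[sigs[v]] for v in adj}
        if new == colors:
            return colors
        colors = new

def wl_distinguishes(adj1, adj2):
    """True if 1-WL proves the two graphs non-isomorphic. The converse
    fails: equal color histograms do not imply isomorphism."""
    hist = lambda a: Counter(wl_colors(a).values())
    return hist(adj1) != hist(adj2)
```

k-WL generalizes this by refining colorings of k-tuples of vertices, at a cost growing like n^k per round; hence the interest in implementations that are efficient in practice.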


by Harald Schilly ([email protected]) at April 24, 2018 03:36 PM

April 18, 2018

OpenDreamKit

Spring school in Experimental Mathematics (MathExp2018)

OpenDreamKit is organizing a spring school on experimental mathematics (“Mathématiques Expérimentales”). The first week will focus on learning tools (four courses given by experts in computer science and mathematics, as well as programming exercises). During the second week, each participant is asked to work on a programming project related to his or her research.

April 18, 2018 12:00 AM

March 19, 2018

March 15, 2018

OpenDreamKit

Toward versatile JupyterHub deployments, with the Binder and JupyterHub convergence

About this document

Nowadays, many institutions run a JupyterHub server, providing their members with easy access to Jupyter-based virtual environments (a.k.a. notebook servers), preinstalled with a stack of computational software tailored to the typical needs of the institution’s members. Meanwhile, for a few years now, Binder has let any user on the internet define, run, and share temporary virtual environments equipped with an arbitrary software stack (examples).

In Fall 2017, Binder was revamped as BinderHub, a lightweight layer on top of JupyterHub. The next step in this convergence is to bring together the best of both worlds: think of a persistent, authenticated Binder, or a repo2docker-enabled JupyterHub. For now, let’s call them versatile JupyterHub deployments.

This document brainstorms this convergence process: it sets up the ground with a scenario and assumptions for a typical institution-wide JupyterHub deployment, proposes specifications from a user perspective, and describes some typical use cases that would be enabled by such specifications. It further discusses security aspects and what remains to be implemented, before concluding with more advanced features and open questions.

This document started as a collection of private notes reflecting on the in-development JupyterHub deployments at Paris-Saclay and EGI respectively, with some additional contributions. They were largely informed by many discussions at March 2018’s JupyterHub coding sprint in Orsay, which involved dev-ops of those deployments and two of the main JupyterHub and BinderHub devs: Min Ragan-Kelley and Chris Holdgraf. It was also inspired by some of CoCalc’s features. The silly ideas reflected here are mine, the hard work is theirs; thank you all!!!

This document is meant for brainstorming; please hop in and edit.

Typical scenario

An institution – typically a university, a national lab, a transnational research infrastructure such as the European XFEL, or a transnational infrastructure provider like EGI – wishes to provide its members and users with a Jupyter service.

The service lets users spawn and access personal or collaborative virtual environments: namely, a web interface to a lightweight virtual machine, in which they can use Jupyter notebooks, run calculations, etc. In the remainder of this document we will use JupyterHub’s terminology and call such virtual environments notebook servers.

To cater for a large variety of use cases in teaching and research, the main aim of the upcoming specifications is to make the service as versatile as possible. In particular, it should empower the users to customize the service (available software stack, storage setup, …), without a need for administrator intervention.

Assumptions

The institution has access to:

  • An authentication service (Single Sign-On)

    Examples: Paris-Sud’s Adonis internal SSO, the federated “Recherche et Enseignement Supérieur” authentication service of Renater, EGI CheckIn, …

  • Computing resources

    Examples: a local cluster, access to an externalized cloud (GC, AWS, Azure, …)

  • A shared volume service using the above authentication service

    E.g. a local NextCloud service, or …

  • (Optional) a forge

    Examples: a local GitLab service, GitHub, … If private repositories are needed, the forge will presumably need to use the same authentication service.

Specifications / User Story

Main page of the service

After authentication, the user faces a page that is similar to binder’s main page:

  • A form to describe and launch the desired persistent notebook server.

    For the sake of simplicity, the form could optionally start hidden, and be replaced by two items: “Choose preconfigured notebook server” / “Create my own notebook server”.

  • Links to documentation

  • Warnings about potential security concerns, to inform the user choices.

    Alternatively, such warnings could be displayed in a later security confirmation dialog “with the given configuration, a malicious image/collaborator could …; do you trust it? Proceed/Edit Configuration/Cancel/Don’t ask again”

  • Institutional credits (service provided by …)

The form consists of (a hypothetical launch request assembled from these fields is sketched after this list):

  • The usual binder items:

    • the description of the computing environment: a repo2docker-style git repo+branch+…

    • the file/url to open on startup

    • a UI to get a URL/badge referencing the machine

  • Persistence and access options:

    • server_name: name to give to the server

      If server_name is not specified, create a random server name?

    • mount options: [[mount point, volume url], […]]

      This assumes that the user has appropriate credentials to access the given volumes through the authentication service

    • collaborators=[….]: (optional) a whitelist of other users of this JupyterHub that can access this server

    • a flag allowing public ‘temporary read-only’ access (meaning that the container and all changes are thrown away at the end of the session; and that any ‘mounted’ data sources are read-only during the session)

      alternatively

    • credentials: whether to pass the user credentials into the container (as environment variable, or file)

    • resource scaling (optional): memory, number of processors, duration (time, or time of inactivity, after which the server will be automatically stopped / destroyed)
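To make the above concrete, here is the hypothetical launch request assembled from those form fields; every field name below is purely illustrative, not an actual JupyterHub or BinderHub API:

```python
# Hypothetical payload behind the "Launch" button; names are made up.
launch_request = {
    # the usual Binder items
    "repo": "https://github.com/alice/analysis-env",  # repo2docker-style description
    "ref": "master",
    "open_file": "index.ipynb",
    # persistence and access options
    "server_name": "phd-experiments",
    "mounts": [["/home/user/data", "https://cloud.example.org/lab-data"]],
    "collaborators": ["bob"],
    "public_readonly": False,   # no throwaway public sessions
    "pass_credentials": False,  # keep user credentials out of the container
    "resources": {"memory": "4G", "cpus": 2, "idle_timeout": "2h"},
}
```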

Behavior upon clicking Launch:

  • If a notebook server with the given name already exists and the parameters are not changed (or not set): connect to that server, restarting it if needed

  • If the parameters have been changed, update the existing server when possible? Or should the user just delete the server?

  • Otherwise, create and launch

Behavior upon following a server URL/badge:

  • Display the authentication page (if not yet authenticated)

  • Display a security confirmation dialog as above (if origin is not within the jupyterhub), with a description of the configuration and origin.

  • As above after clicking “Launch”

Some use cases

Local binder (better name?)

Scenarios:

  • Luc, a researcher, discovered a nice computing environment on Binder. Maybe a notebook demonstrating an interesting workflow to analyze data. He wants to use it more intensively on his own data.

  • Lucy has found a notebook & binder environment published with a paper, and she wants to re-execute the notebook to reproduce the published results and start her research in the field. However, no binder (compute) resources are available in the cloud. The computation takes 20 minutes on a standard PC and she would like to run this calculation on her local server.

Setup:

They recreate the same environment on their local server (for example by just changing the server name in the binder URL).

More advanced scenario to explore: Lucy would like to use her Desktop PC because that resource is readily available and idles 99% of the time.

Easy sharing of computational environments

Scenarios:

  • Sarah, a power user, is using some specialized stack of software on a daily basis; maybe she authored some of it. She wants her colleagues in her lab to try out that software.

  • Paul organizes a training session for his colleagues.

  • Alice has authored notebooks that she wants to share with her colleagues. Maybe the notebooks automatize some complicated processes and present them in the form of interactive web applications built using Jupyter widgets (demo). Think of a wizard to setup parameters and data, run a computation, and visualize the results.

Setup:

They prepare a notebook server with the appropriate software stack installed and configured, and with access to the user’s shared home directory. Maybe they provide some documentation. They then just have to share the URL with their colleagues. No lengthy software installation: the colleagues can start working right away, in their own environment, using their own data, and saving their work in their home directory.

In all cases, the explicit description of the computing environment (and the use of open source software!) eases:

  • the publication of the same computational environment / notebooks elsewhere, e.g. on a public Binder;
  • the installation of the same software on the user’s personal computer.

Collaboration

Scenario: Alice and Bob want to collaborate on some data analysis.

Setup:

They create a shared volume. Then either:

  • They each set up their own notebook server, and let the two servers share the same volume.

  • Alice sets up a single server, with Bob as collaborator. Within the server, they are considered as the same user.

At this stage, they should not edit the same notebook simultaneously. However, the stable version of JupyterLab, due sometime in 2018, should enable real-time collaboration in both setups, thanks to a CRDT file format for notebooks.

Class management

Scenario: using the server for a class’ computer labs and assignments

Desired features:

  • Full customizability of the computing environment by the teacher;
  • Support for live manipulation of the class notes;
  • Support for submission, collection and auto-grading of assignments;
  • Access from the computer labs or outside (home, …);
  • Possibility to either use the server, needing only a web browser (no software installation required; supports phones, tablets, …), or install and run the software locally.

Prerequisites:

  • A JupyterHub instance, configured as above, accessible to the teachers and students;
  • A forge such as gitlab or github, accessible from JupyterHub
  • A shared drive service (e.g. NextCloud, NFS, …), serving home directories, and letting teachers set up shared volumes
  • A shared authentication (e.g. SSO), so that notebook servers in JupyterHub can access the shared drive.
  • Some web server

Procedure for the teacher(s):

  • Set up a shared volume for the whole class
  • Prepare a computing environment in a git repository on the forge.

    Typically includes: computational software, nbgrader + configuration, …

  • Prepare the course material typically in a git repository on the forge (the same one or another)
  • Use JupyterHub’s form UI to set up (and test) a full description of the students’ notebook servers, with mounting of the home directory (or a subdirectory thereof?) and of the shared volume. Possibly add the teacher(s) as collaborator(s)??? Get the corresponding URL.
  • Possibly prepare a variant thereof for teachers of the class.
  • Set up a web page for the class, with hyperlink(s) to the above URL.

    There can typically be a hyperlink for each session, pointing directly to the exercises for that particular session.

Fetching the class material:

  • Option 1: manually download from the web (wget in the terminal, or web upload, …) or shared volume
  • Option 2: use nbgitpuller from the command line
  • Option 3: use nbgrader, either from the command line or with the UI to get the files from the shared volume
  • Option 4: automatize the above using a notebook server extension such as that for nbgitpuller (a sketch of an nbgitpuller link is given below)
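As promised above, a sketch of what an nbgitpuller link could look like; the hub address and repository are made up, while the repo/branch/urlpath parameters follow nbgitpuller's documented URL scheme:

```python
from urllib.parse import urlencode

# Build a link that, when followed by a student, pulls the class
# material into their running notebook server and opens a notebook.
params = urlencode({
    "repo": "https://github.com/prof/class-material",  # hypothetical repo
    "branch": "master",
    "urlpath": "notebooks/session1.ipynb",
})
link = "https://jupyter.example.edu/hub/user-redirect/git-pull?" + params
print(link)
```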

Submitting assignments:

  • Use nbgrader, either from the command line or with the UI to push the files to the shared volume

To explore: integration with local e-learning platforms like Moodle, typically using LTI, in particular for class management and grades reporting. There already exists an LTI authenticator for Jupyter.

Security concerns

A malicious image description, image, or collaborator can:

  • Take any action within the image being built or within the notebook server

  • Waste computing resources (cpu/memory);

  • With internet: connect to any website, without specific privileges (e.g. participate in a DoS attack); abuse computing resources, e.g. for bitcoin mining. The image building almost certainly needs internet access.

  • With persistent storage and internet access: access and leak information from the storage; e.g. private information from the user;

  • With read-write persistent storage: corrupt the storage (e.g. the user’s home directory)

  • With credentials: take any action on behalf of the user in any service that uses the same authentication.

Implementation status

Most of the features are already there. Here are some of the missing steps:

  • Extending binder’s form as described above;

  • Implementing the logic to mount shared volumes;

  • Instructions / scripts for setting up a local docker registry;

    The current Binder installation tutorial assumes that a docker registry is already available, e.g. the one provided by Google cloud services.

    For a smaller setup using the same host for both building images and running notebook servers, no docker registry is needed. In this case, JupyterHub could just run repo2docker locally before launching the notebook server. repo2docker, however, does not implement image caching, so a simplified version of the image cache mechanism of Binder needs to be implemented.
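A minimal sketch of that single-host idea, assuming the jupyter-repo2docker and docker CLIs are installed; the caching layer (keyed e.g. on the resolved commit) is left out, and the function name is made up:

```python
import subprocess

def build_and_run(repo_url, image_name):
    """Hypothetical glue code: build an image from a repository with
    repo2docker, then run it locally instead of pulling from a registry."""
    # --no-run and --image-name are standard repo2docker flags.
    subprocess.run(
        ["jupyter-repo2docker", "--no-run", "--image-name", image_name, repo_url],
        check=True,
    )
    subprocess.run(
        ["docker", "run", "--rm", "-p", "8888:8888", image_name],
        check=True,
    )

build_and_run("https://github.com/alice/analysis-env", "analysis-env:latest")
```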

Alternatives

It should be noted that there is basically no coupling between JupyterHub/Binder and Jupyter. The former is merely a general-purpose web service for provisioning web-based virtual environments. For example, JupyterHub/Binder has also been used to serve RStudio-based virtual environments. Reciprocally, there are alternative services for doing such provisioning from which to get inspiration, like Simulagora.

Advanced features & open questions

Redirection

The main form could contain an additional item:

  • URL input / dropdown menu to choose another JupyterHub instance to redirect to.

Use cases:

  • A user finds a nice image on Binder; he wants to customize it and run it on his institution’s JupyterHub, possibly adding persistent storage to it. Or reciprocally: a user wants to share on Binder a local image of his.

  • An institution wants to advertise other JupyterHub instances; this could e.g. be used to provide a single entry point federating a collection of instances (e.g. all French academic JupyterHubs).

Marketplace of images

With the URL mechanism, any (power) user can prepare a dedicated image and share it with his collaborators. Images can be more or less specific: from just a computing environment to a fully specified machine, with mount points, …

Thanks to the above, there is no need for a tightly coupled marketplace. Nevertheless, it may be useful to have one location (or more) for collecting and publicizing links to popular images. Some minimal coupling may be needed if one wanted to sort the images according to their popularity.

Note: at this stage, a user cannot produce an image by setting up a machine “by hand” and save its state. The construction must be fully scripted. On the plus side, this encourages users to script their images, making them more reproducible.

National and international initiatives such as the European Open Science Cloud may help provide such a catalog of relevant Jupyter notebooks/images.

Default volume configuration

  • Choose good defaults, if at all possible compatible with Binder. Main question: where should the files provided by the Binder environment be copied? Into a subdirectory of the persistent home? Into the home directory, with the persistent home being mounted in a subdirectory thereof?

Intensive use and resource management / accounting

The above has been written with casual use in mind. For intensive use, some form of accounting and control of the resources used would be needed. For example, for LAL’s cloud we may want some form of bridge between the OpenStack dashboard and the hub; the UI remains to be designed. Could the user provision a machine using the dashboard, and then specify on JupyterHub that the container shall be run on that machine?

March 15, 2018 12:00 AM

March 07, 2018

OpenDreamKit

OpenDreamKit at the RSE conference

OpenDreamKit presence at the second RSE conference

The Manchester Museum of Science and Industry (MOSI) saw the second Research Software Engineering (RSE) conference on September 7-8th, 2017. Over 200 attendees gathered to discuss ways to better support and advance science via software, innovative computational tools, and policy making. There were more than 40 talks and 15 workshops covering a diverse range of topics, from community building to imposter syndrome, data visualization, and High-Performance Computing.

Attending the conference was an excellent opportunity to integrate within the international RSE community and appreciate how much it has grown over the last few years, thanks to the great work done by RSEs within their institutions and their efforts to make software a valuable research asset. It will certainly be interesting to see how this community continues to grow and evolve as policy changes take place and more research councils, funding bodies, and research institutions acknowledge the importance of research software and its scientific impact.

OpenDreamKit member Tania Allard ran a hands-on workshop on Jupyter notebooks for reproducible research. This workshop focused on the use of Jupyter notebooks as a means to disseminate reproducible analysis workflows, and on how this can be leveraged using tools such as nbdime and nbval. Both nbdime and nbval were developed by members of the OpenDreamKit project in response to the growing popularity of Jupyter notebooks and the lack of native integration between these technologies and existing version control and validation/testing tools.

An exceptional win was that this workshop was, in fact, one of the most popular events of the conference: we were asked to run it twice, as it was massively oversubscribed. This reflects, on one hand, the popularity of Jupyter notebooks due to the boom of literate programming and its focus on human-readable code, allowing researchers to share their findings, and the code they used along the way, in a compelling narrative. On the other hand, it demonstrates the importance of reproducible science and the need for tools that help RSEs and researchers achieve this goal, which aligns perfectly with the goals of OpenDreamKit.

The workshop revolved around 3 main topics:

  1. Version control of the Jupyter notebooks
  2. Notebook validation
  3. The basics of reproducible software practices.

The main focus was on how tools like nbdime and nbval can support people who already use Jupyter notebooks but have struggled to integrate them with software development best practices, due to a number of limitations of the existing tools. We then followed on with other actions that can be taken to ensure that their data analysis workflows are reproducible and sustainable. This led to a number of interesting discussions about the topic, and allowed the attendees to share their previous experiences regarding reproducibility, and/or the lack thereof, in different research areas.
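For readers who were not in the room, here is roughly what the two tools do, as a hedged sketch (assuming pip-installed nbval and nbdime, and a hypothetical notebook analysis.ipynb with stored outputs):

```python
import subprocess

# nbval is a pytest plugin: re-execute the notebook and compare the new
# outputs against the ones stored in the file.
subprocess.run(["pytest", "--nbval", "analysis.ipynb"], check=True)

# nbdime performs content-aware diffing of notebooks, instead of the
# unreadable raw-JSON diffs produced by plain `git diff`.
import nbformat
from nbdime import diff_notebooks

old = nbformat.read("analysis_v1.ipynb", as_version=4)
new = nbformat.read("analysis_v2.ipynb", as_version=4)
print(diff_notebooks(old, new))   # structured diff of cells and outputs
```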

We plan to run a set of workshops around reproducibility over the duration of the ODK project and we’ll make sure to report on them here too. Finally, all the materials are licensed under CC-BY and can be found in this GitHub repository.

March 07, 2018 12:00 AM

February 21, 2018

OpenDreamKit

Research Software Engineer position opening at Université Paris-Sud (tentatively filled)

This is an announcement for a research software engineer position opening at Université Paris-Sud, working on web-based user interfaces and semantic interoperability layers for mathematical computational systems and databases.

Time line

Interviews in March 2018 for a recruitment as soon as possible.

Update (March 27th): after the interviews on March 21st, we selected and ranked two candidates and made an offer to the first one. Pending administrative formalities, they will take the position.

Salary

For a full-time position, and depending on the applicant’s past experience, between 2000€ and 3000€ of monthly “salaire net” (salary after non-wage labour costs but before income tax). Equivalently, this salary corresponds to a “salaire brut” of up to 46200€ yearly. We have secured funding until the end of the project (August 2019).

Location

The research software engineer will work at the Laboratoire de Recherche en Informatique of Université Paris Sud, in the Orsay-Bures-Gif-Saclay campus, 25 km South-West of Paris city centre.

Mission and activities

Paris Sud is the leading site of OpenDreamKit, with eight participants involved in all the work packages. The research software engineer will join that team and support its efforts in WP4 and WP6, targeting respectively Jupyter-based user interfaces and interoperability for mathematical computational systems and databases. A common theme is how to best exploit the mathematical knowledge embedded in the systems. For some context, see e.g. the recent publications describing the Math-In-The-Middle approach.

More specifically, a successful candidate will be expected to contribute significantly to some of the following tasks (see also OpenDreamKit’s proposal):

  • Dynamic documentation and exploration system (Task 4.5)

    Introspection has become a critical tool in interactive computation, allowing users to explore, on the fly, the properties and capabilities of the objects under manipulation. This challenge becomes particularly acute in systems like Sage, where large parts of the class hierarchy are built dynamically, and static documentation builders like Sphinx can no longer render all the available information.

    In this task, we will investigate how to further enhance the user experience. This will include:

    • On the fly generation of Javadoc style documentation, through introspection, allowing e.g. the exploration of the class hierarchy, available methods, etc.

    • Widgets based on the HTML5 and web component standards to display graphical views of the results of SPARQL queries, as well as populating data structures with the results of such queries,

    • D4.16: Exploratory support for semantic-aware interactive Jupyter widgets providing views on objects of the underlying computational or database components. Preliminary steps are demonstrated in the Larch Environment project (see demo videos) and sage-explorer. The ultimate aim would be to automatically generate LMFDB-style interfaces.

    Whenever possible, those features will be implemented generically for any computation kernel by extending the Jupyter protocol with introspection and documentation queries.

  • Memoisation and production of new data (Task 6.9)

    Many CAS users run large and intensive computations, for which they want to collect the results while simultaneously working on software improvements. GAP retains computed attribute values of objects within a session; Sage currently has a limited cached_method. Neither offers storage that is persistent across sessions or supports publication of the results or sharing within a collaboration. We will use, extend, and contribute back to an appropriate established persistent memoisation infrastructure, such as python-joblib, redis-simple-cache or dogpile.cache, adding features needed for the storage and use of results in mathematical research. We will design something that is simple to deploy and configure, and makes it easy to share results in a controlled manner, but provides enough assurance to enable the user to rely on the data, give proper credit to the original computation, and rerun the computation if they want to. (A minimal memoisation sketch appears after this task list.)

  • Knowledge-based code infrastructure (Task 6.5)

    Over the last decades, computational components, and in particular Axiom, MuPAD, GAP, or Sage, have embedded more and more mathematical knowledge directly inside the code, as a way to better structure it for expressiveness, flexibility, composability, documentation, and robustness. In this task we will review the various approaches taken in these systems (e.g. categories and dynamic class hierarchies) and in proof assistants like Coq (e.g. static type systems), and compare their respective strengths and weaknesses on concrete case studies. We will also explore whether paradigms offered by recent programming languages like Julia or Scala could enable a better implementation. Based on this we will suggest and experiment with design improvements, and explore challenges such as the compilation, verification, or interoperability of such code.
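As a taste of the memoisation task above, here is a minimal sketch using python-joblib, one of the candidate infrastructures named in the task; the cache directory and function are made up for illustration:

```python
from joblib import Memory

# On-disk cache that survives across sessions; in a collaboration it
# could live on a shared volume.
memory = Memory("/tmp/sage_memo", verbose=0)

@memory.cache
def expensive_invariant(n):
    """Stand-in for a long mathematical computation."""
    return sum(i**3 for i in range(n))

expensive_invariant(10**6)   # computed once...
expensive_invariant(10**6)   # ...then replayed from the persistent cache
```

The task's added value over such off-the-shelf caching would be the mathematics-specific features: provenance, credit to the original computation, and controlled sharing.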

Skills and background requirements

  • Degree in mathematics or computer science; PhD appreciated but not required;

  • Strong programming experience with languages such as Python, Scala, Javascript, etc; experience with web technologies in general and the Jupyter stack in particular appreciated;

  • Experience in software design and practical implementation in large software projects; experience with computational mathematics software (e.g. SageMath) appreciated;

  • Experience in open-source development (collaborative development tools, interaction with the community, …);

  • Strong communication skills;

  • Fluency in oral and written English; speaking French is not a prerequisite.

Context

The position will be funded by OpenDreamKit, a Horizon 2020 European Research Infrastructure project that will run for four years, starting from September 2015. This project brings together the open-source computational mathematics ecosystem – and in particular LinBox, MPIR, SageMath, GAP, PARI/GP, LMFDB, Singular, MathHub, and the IPython/Jupyter interactive computing environment – toward building a flexible toolkit for Virtual Research Environments for mathematics. Led by Université Paris-Sud, this project involves about 50 people spread over 15 sites in Europe, with a total budget of about 7.6 million euros.

Within this ecosystem, the applicant will work primarily on the free open-source mathematics software system SageMath. Based on the Python language and many existing open-source math libraries, SageMath has been developed over the last ten years by a worldwide community of 300 researchers, teachers and engineers, and has reached 1.5M lines of code.

The applicant will work within one of the largest teams of SageMath developers, composed essentially of researchers in mathematics and computer science, at the Laboratoire de Recherche en Informatique (LRI) and in nearby institutions. The LRI also hosts a strong team working on proof systems.

Applications

To apply for this position, please send an e-mail to upsud-recruitement-research-engineer at opendreamkit.org by March 10, with the following documents (in English) attached:

  • cover_letter.pdf: a cover letter explaining your interest in this particular position;

  • CV.pdf: a CV, highlighting among other things your skills and background and your contributions to open source software;

  • degree.pdf: a copy of your most recent degree, including (if applicable) the reviewers’ reports;

  • reference letters: files reference_letter_.pdf or contact information of potential referees.

Applications sent after March 10 will be considered until the position is filled.

February 21, 2018 12:00 AM

February 19, 2018

The Matroid Union

Google Summer of Code

As you might know, SageMath is a software system for mathematical computation. Built on Python, it has extensive libraries for numerous areas of mathematics. One of these areas is Matroid Theory, as has been exhibited several times on this blog.

Google Summer of Code is a program where Google pays students to work on open-source software during the summer.

Once again, SageMath has been selected as a mentoring organization for the Google Summer of Code. We’ve had students work on aspects of the Matroid Theory functionality for the past four years. Maybe this year, YOU can join those illustrious ranks! Check out the call for proposals and ideas list. Read the instructions on both pages carefully. Applications open on March 12, so it’s a good idea to start talking to potential mentors and begin writing your proposal!

by Stefan van Zwam at February 19, 2018 02:52 PM

February 06, 2018

OpenDreamKit

Remote project meeting

This is an online project meeting to review all achievements since March 2017.

February 06, 2018 12:00 AM

January 23, 2018

OpenDreamKit

Expérimentation mathématique et combinatoire avec Sage

Viviane Pons gave a two-hour lecture on mathematical experimentation, research, and open-source mathematical software development for a seminar organized by the students of the prestigious École Normale Supérieure de Lyon.

January 23, 2018 12:00 AM

January 01, 2018

William Stein

Low latency local CoCalc and SageMath on the Google Pixelbook: playing with Crouton, Gallium OS, Rkt, Docker

I just got CoCalc fully working locally on my Google Pixelbook Chromebook! I want this, since (1) I was inspired by a recent blog post about computer latency, and (2) I'm going to be traveling a lot next week (the JMM in San Diego -- come see me at the Sage booth), and may have times without Internet during which I want to work on CoCalc's development.


I first tried Termux, which is a "Linux in Android" userland that runs on the Pixelbook (via Android), but there were way, way too many problems for CoCalc, which is a very complicated application, so this was out. The only option was to enable ChromeOS dev mode.

I next considered partitioning the hard drive, installing Linux natively (in addition to ChromeOS), and dual booting. However, it seems the canonical option is Gallium OS, and nobody has got that to work with the Pixelbook yet (?). In fact, it appears that Gallium OS development may have stopped a year ago (?). Bummer. So I gave up on that approach...

The next option was to try Crouton + Docker, since we have a CoCalc Docker image. Unfortunately, it seems currently impossible to use Docker with the standard ChromeOS kernel.  The next thing I considered was to use Crouton + Rkt, since there are blog posts claiming Rkt can run vanilla Docker containers on Crouton.

I set up Crouton, installed the cli-extra chroot, and easily installed Rkt. I learned how Rkt differs from Docker, and tried a bunch of simple standard Docker containers, which worked. However, when I tried running the (huge) CoCalc Docker container, I hit major performance issues, and things broke down. If I had the 16GB Chromebook and more patience, maybe this would have worked. But with only 8GB RAM, it really wasn't feasible.

The next idea was to just use Crouton Linux directly (so no containers), and fix whatever issues arose. I did this, and it worked well, with CoCalc giving me a very nice local browser-based interface to my Crouton environment. Also, since we've spent so much time optimizing CoCalc to be fast over the web, it feels REALLY fast when used locally. I made some changes to the CoCalc sources and added a directory, to hopefully make this easier if anybody else tries. This is definitely not a 1-click solution.

Finally, for SageMath I first tried the Ubuntu PPA, but realized it is hopelessly out of date. I then downloaded and extracted the Ubuntu 16.04 binary and it worked fine. Of course, I'm also building Sage from source (I'm the founder of SageMath after all), but that takes a long time...


Anyway, Crouton works really, really well on the Pixelbook, especially if you do not need to run Docker containers.

by William Stein ([email protected]) at January 01, 2018 10:29 PM

December 06, 2017

OpenDreamKit

State of the European Open Science Cloud and strategy of OpenDreamKit (draft)

During the last week of November, two events related to the European Open Science Cloud (EOSC) took place in Brussels: the EOSC stakeholder forum on 28-29 November, and the 2017 edition of DI4R (Digital Infrastructures for Research). These two events were closely related with regard to the pivotal 2018-2020 period for the scientific community and the advancement of open and digital science.

The vision

The European Open Science Cloud (EOSC) is currently a process to build a digital platform that is inspired by the F.A.I.R. principle. F.A.I.R. stands for: Findable, Accessible, Interoperable and Reusable.

FAIR is the idea behind the recent actions taken by the European Commission and various funding agencies in favour of open data and open science. The EOSC platform, also called the EOSC-Hub, ultimately intends to make possible, for the whole European scientific community and beyond: the exchange of data, easy access to knowledge, and access to all useful infrastructures for all scientific disciplines.

The word “Open” in EOSC must be understood in every possible way. It means (not exhaustively) that the platform will:

  • promote open source software and open access culture
  • make data as open as possible but as closed as necessary with regard to confidentiality and IPR (Intellectual Property Rights) concerns,
  • be a mix of paid and free services and/or software
  • be open to all users: research centres, public bodies, companies, citizens
  • be accessible worldwide
  • gather all infrastructures and technologies that are of help for scientists from all possible disciplines

Technically speaking, the general idea is to federate existing and yet-to-appear services, and to make these services interoperable with one another. Services and software can concern, for example, data and computation hosting, authentication, indexing, collaborative tools, data and service catalogues, etc. Thanks to this interoperability, researchers will be able to discover, navigate, use and re-use data, and combine them using a variety of infrastructures. The interpretation of data is (or at least should be, in our reading) quite broad here: it includes metadata and provenance, data models, tools, and generally the knowledge required to make sense of the data.

EOSC photo

The core role of EOSC-Hub is to coordinate, foster interoperability, develop glue services, and generally speaking steer the efforts toward the needs of researchers. The outcome of the process is still unknown but the ambition is grand. Indeed, an EU official compared the EOSC to the internet. The WWW has been and still is a process, and it has changed the face of the Earth. The ambition of the EOSC is to change the way we do Science. An explanatory video is available here.

The EOSC will eventually be linked to the future pan-European HPC infrastructure, as well as to the future European Data Infrastructure (EDI), which are funded alongside the EOSC by the European Commission. The EDI and the EOSC may merge at the end of the day, since the difference of purpose between the two is not yet fully clear.

EOSC sketch

Political support and public funding

The EOSC is part of two strategies of the European Commission: the Digital Single Market on one hand and the European Research Area on the other hand.

The EOSC is the combination of the two, as it aims at creating a single research community without country or technical barriers (interoperability between software and services).

In its Work Programme 2018-2020 for European Research Infrastructures, the European Commission has put 375 million euros on the table for “implementing the European Open Science Cloud”. It was initially planned to open a topic of 79M€ for adding other services and infrastructures to the EOSC, but this was postponed until after 2020, for the 9th Framework Programme for Research and Innovation. According to Augusto Burgueño Arjon, Head of the Unit “eInfrastructure & Science Cloud” at the European Commission’s DG Communications Networks, Content and Technology, it is not yet decided what this fund will be used for, as it will depend on the evolution of the EOSC-Hub. As, according to the Commission and the main supporters, there is no room for failure, the next 3 years will be crucial. If the EOSC implementation is a success, the first half of FP9 will focus on aggregating the remaining infrastructures and services, while the second half of FP9 will focus on sustaining and scaling the EOSC.

Member States are pushing the EOSC forward as well. Germany and the Netherlands issued a joint position paper on the EOSC in May 2017, which France officially joined on 1 December 2017. Furthermore, 13 Member States have to date signed an agreement to start a pan-European HPC programme. This new infrastructure will follow PRACE and will be closely connected to the EOSC.

Infrastructures involved

Outside of public institutions, there are six major players pushing the EOSC forward. Together they built a consortium of 74 partners, coordinated by EGI, in order to answer the call H2020-EINFRA-12-2017(a) within the Work Programme 2018-2020. This project, named the “EOSC-Hub project”, will receive 30M€ from the Commission for the 2018-2020 period. Beneficiaries include Research Infrastructures, national e-Infrastructure providers, SMEs and academic institutions. This consortium will be focused on addressing issues of the EOSC such as interoperability, adoption of open standards and protocols, governance structure, etc.

Indeed, our impression is that, while the vision is clear, the design and realisation of this vision has no clear shape yet: many ideas and components are floating around, some fully or half realised, but simultaneously many questions remain unanswered or have not even been asked yet. This state of affairs is not surprising: the vision is grand and has the potential to disrupt (positively) the way in which research is carried out today; the actors are human beings, scientists, institutes, funding bodies and states, all with their own priorities and constraints. Furthermore, there are real technical challenges in putting this “Cloud” together, and there are also cultural challenges in moving more and more research activities towards Open Science. Existing metrics for academics and research institutions do not generally incentivise open science, which makes a change of behaviour difficult. The biggest challenge for the EOSC is then maybe the challenge of skills and habits because, as an EU official put it: “if we do all this and no one is using it, it will be worthless”.

Some infrastructures and services were presented during the DI4R conference as “EOSC building blocks”. The presentations are all available following this link.

Two presentations attracted our attention:

  • Hubzero presentation: open source platform for scientific and educational collaboration
  • Presentation introducing FAIR: nothing new, but it explains the principles underlying the EOSC

Contributions are open on Github to help develop FAIR metrics for EOSC.

OpenDreamKit and the EOSC

It seems that no e-infrastructure or service presented at the events specifically targets math-based research and teaching, so there could be some room for components of the OpenDreamKit VRE.

Partnership with EGI

Indeed, there is an existing collaboration between EGI and OpenDreamKit for the deployment of JupyterHub in EGI services. This collaboration will become official in the coming weeks with the signature of a Memorandum of Understanding between the two parties. Depending on the success of this joint work and on the needs of the EOSC post-2020, the collaboration could be extended.

Lobby

OpenDreamKit as a consortium promoting Open Source software and Open Data in the name of large communities can take lobbying actions to have an impact on the shape of the future EOSC. The following actions can and will be taken:

  • Become a stakeholder of the EOSC:

1) Endorse the principles of the EOSC declaration by sending an official statement by mail

2) Commit to take some of the specific actions forward

Endorsement and commitment must be sent at [email protected].

  • Contact experts involved in the EOSC:
    • From the External Board of the EOSC working for the Commission: several experts, including Jean-François ABRAMATIC (INRIA) and the chairperson John Womersley (European Spallation Source)

    • From the External Advisory Board of the EOSC-pilot project (launched in preparation of EOSC-Hub project): the list is available on their website. One expert, Françoise GENOVA, is also member of the OpenDreamKit Advisory Board.

  • The European Commission Project Officer for OpenDreamKit is also deeply involved in the talks on the EOSC implementation, and the Coordinator will be in close contact with her in the coming months and years.

Reference: Hans Fangohr’s blog post on his group page

December 06, 2017 12:00 AM

November 15, 2017

OpenDreamKit

Subgroups and lattices of Lie groups

OpenDreamKit is hosting a workshop on “Subgroups and lattices of Lie groups”, to take place at the Faber residency in Olot, Spain, from Monday February 19th to Saturday March 3rd, 2018. The aim is to bring together experts in geometry, algebra and combinatorics with software developers, in order to improve the algorithms and functionality of open-source packages concerning Lie groups and their subgroups.

The organization page of the event is https://wiki.sagemath.org/days93.

November 15, 2017 12:00 AM

November 02, 2017

OpenDreamKit

Publishing reproducible logbooks

Scenario

Jane has written a (math) paper based on experimentations. She would like anyone to be able to reproduce her calculations.

a binder logbook screenshot

Suggestion of solution

  1. Describe the experimentation as Jupyter notebooks, mixing prose, code, and outputs (think of them as logbooks);

  2. Publish them on a public repository (e.g. on GitHub);

  3. Make that repository Binder-ready by describing the software stack required; for details, see the Binder documentation;

  4. Bonus: make the paper itself active

    To do: explore using e.g. latexml + thebe?

Some instances

To do

  • Estimate the number of such instances;
  • Provide a template.

Time and expertise required

Assuming Jane is familiar with version control and Jupyter (basic lab skills taught at Software Carpentry), that the experiments were prepared as notebooks, and that the required software is packaged (conda, Debian, Docker container, …), the publishing part could take two hours the first time, and half an hour later on.

What’s new since OpenDreamKit started

  • Emergence of Binder;
  • Expansion of the Jupyter technology;
  • Better packaging and interfacing of math software.

OpenDreamKit contribution

  • Development of and contributions to Jupyter interfaces (kernels) for math software (GAP, Pari/GP, SageMath, Singular) and C++; see D4.7.
  • Contributions to the packaging of math software (GAP, Pari/GP, SageMath, Singular, …); see D3.1 and D3.10;
  • Early adoption of Binder;
  • Contributions to the deployment of new Binder instances;
  • Advertising, training, providing a template (TODO!), …

November 02, 2017 12:00 AM

Post doc position opening at Université Paris-Sud

This is an announcement for a postdoc position opening at Université Paris-Sud, working on the interplay between Data, Knowledge, and Software in Mathematics, and in particular the exploitation of mathematical knowledge for increased interoperability across computational mathematics software and mathematical databases.

Time line

Interviews in early December, for a recruitment from early 2018 to Fall 2019. Since we have a strong candidate for a half-time position, we will also consider candidates interested in a half-time or shorter-duration position.

Salary

For a full-time month of work, and depending on the applicant’s past experience, between 2000€ and 3000€ of monthly “salaire net” (salary after non-wage labour costs but before income tax).

Equivalently, this salary corresponds to a “salaire brut” of up to 46200€ yearly (for a full-time position).

Location

The postdoc will work at the Laboratoire de Recherche en Informatique of Université Paris Sud, in the Orsay-Bures-Gif-Saclay campus, 25 km South-West of Paris city centre.

Mission and activities

OpenDreamKit’s Work Package 6 explores the interplay between Data, Knowledge and Software in Mathematics. In particular, it aims at exploiting mathematical knowledge for increased interoperability across computational mathematics software and mathematical databases (known as the Math-in-the-Middle approach). See e.g. the recent publications on that topic, and Section 3.1.6 “Workpackage Description” of the OpenDreamKit proposal.

A successful candidate will be expected to make significant progress, in close collaboration with the other OpenDreamKit participants and the community, on some of the tasks of this Work Package:

  • D6.8: Curated Math-in-the-Middle Ontology and Alignments for GAP / Sage / LMFDB

  • T6.5: Knowledge-based code infrastructure

    Over the last decades, computational components, and in particular Axiom, MuPAD, GAP, or Sage, have embedded more and more mathematical knowledge directly inside the code, as a way to better structure it for expressiveness, flexibility, composability, documentation, and robustness. In this task we will review the various approaches taken in these systems (e.g. categories and dynamic class hierarchies) and in proof assistants like Coq (e.g. static type systems), and compare their respective strengths and weaknesses on concrete case studies. We will also explore whether paradigms offered by recent programming languages like Julia or Scala could enable a better implementation. Based on this we will suggest and experiment with design improvements, and explore challenges such as the compilation, verification, or interoperability of such code.

The candidate will be welcome to work on closely related though more technical tasks:

  • T4.5: Dynamic documentation and exploration system

    Introspection has become a critical tool in interactive computation, allowing users to explore, on the fly, the properties and capabilities of the objects under manipulation. This challenge becomes particularly acute in systems like Sage, where large parts of the class hierarchy are built dynamically, and static documentation builders like Sphinx can no longer render all the available information. (A toy introspection sketch appears after this task list.)

    In this task, we will investigate how to further enhance the user experience. This will include:

    • On the fly generation of Javadoc style documentation, through introspection, allowing e.g. the exploration of the class hierarchy, available methods, etc.

    • Widgets based on the HTML5 and web component standards to display graphical views of the results of SPARQL queries, as well as populating data structures with the results of such queries,

    • D4.16: Exploratory support for semantic-aware interactive widgets providing views on objects of the underlying computational or database components. Preliminary steps are demonstrated in the Larch Environment project (see demo videos) and sage-explorer. The ultimate aim would be to automatically generate LMFDB-style interfaces.

    Whenever possible, those features will be implemented generically for any computation kernel by extending the Jupyter protocol with introspection and documentation queries.

  • T6.9: Memoisation and production of new data

    Many CAS users run large and intensive computations, for which they want to collect the results while simultaneously working on software improvements. GAP retains computed attribute values of objects within a session; Sage currently has a limited cached_method. Neither offers storage that is persistent across sessions or supports publication of the results or sharing within a collaboration. We will use, extend, and contribute back to an appropriate established persistent memoisation infrastructure, such as python-joblib, redis-simple-cache or dogpile.cache, adding features needed for the storage and use of results in mathematical research. We will design something that is simple to deploy and configure, and makes it easy to share results in a controlled manner, but provides enough assurance to enable the user to rely on the data, give proper credit to the original computation, and rerun the computation if they want to.
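As referenced in the dynamic-documentation task above, here is a toy sketch of on-the-fly documentation generation via Python introspection; the function name is made up, and real support would extend the Jupyter protocol with such queries:

```python
import inspect

def live_doc(obj):
    """Render a minimal "Javadoc-style" page for an object on the fly:
    its class hierarchy plus the first docstring line of each public method."""
    lines = ["# " + type(obj).__name__,
             "Hierarchy: " + " -> ".join(c.__name__ for c in type(obj).__mro__),
             "", "## Methods"]
    for name, member in inspect.getmembers(obj, callable):
        if not name.startswith("_"):
            doc = (inspect.getdoc(member) or "").splitlines()
            lines.append("- %s: %s" % (name, doc[0] if doc else ""))
    return "\n".join(lines)

print(live_doc([1, 2, 3]))   # lists append, count, sort, ...
```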

Skills and background requirements

  • Strong experience in the design and practical implementation of mathematics software: computational mathematics software (e.g. SageMath), knowledge management systems, or proof systems;

  • PhD in mathematics or computer science;

  • Experience in open-source development (collaborative development tools, interaction with the community, …);

  • Fluency in programming languages such as Scala, Python, Julia, etc. appreciated;

  • Strong communication skills;

  • Fluency in oral and written English; speaking French is not a prerequisite.

Context

The position will be funded by OpenDreamKit, a Horizon 2020 European Research Infrastructure project that will run for four years, starting from September 2015. This project brings together the open-source computational mathematics ecosystem – and in particular LinBox, MPIR, SageMath, GAP, PARI/GP, LMFDB, Singular, MathHub, and the IPython/Jupyter interactive computing environment – toward building a flexible toolkit for Virtual Research Environments for mathematics. Led by Université Paris-Sud, this project involves about 50 people spread over 15 sites in Europe, with a total budget of about 7.6 million euros.

Within this ecosystem, the developer will work primarily on the free open-source mathematics software system SageMath. Based on the Python language and many existing open-source math libraries, SageMath has been developed over the last ten years by a worldwide community of 300 researchers, teachers and engineers, and has reached 1.5M lines of code.

The developer will work within one of the largest teams of SageMath developers, composed essentially of researchers in mathematics and computer science, at the Laboratoire de Recherche en Informatique (LRI) and in nearby institutions. The LRI also hosts a strong team working on proof systems.

Applications

To apply for this position, please send an e-mail to Nicolas.Thiery at u-psud.fr before December 1st, with the following documents attached:

  • cover_letter.pdf: a cover letter, in English (why are you interested in this particular position);

  • CV.pdf: a CV, highlighting among other things your skills and background and your contributions to open source software;

  • phd_reports.pdf: PhD reports (when applicable);

  • reference letters (each named reference_letter_.pdf), or alternatively reference contact information.

Applications sent after December 1st will be considered until the position is filled.

November 02, 2017 12:00 AM

October 19, 2017

OpenDreamKit

Presentation of OpenDreamKit

Viviane Pons presented the OpenDreamKit project and its impact on teaching to the Netmath community.

October 19, 2017 12:00 AM

October 15, 2017

OpenDreamKit

WP6 Math-in-the-Middle Integration Use Case to be Published at MACIS-2017 (two papers)

OpenDreamKit WP6 (Data/Knowledge/Software-Bases) has reported on its first use cases in two papers to be published at MACIS 2017.

October 15, 2017 12:00 AM

October 11, 2017

OpenDreamKit

Release: SageMath for Windows

Introduction

One of the main tasks for OpenDreamKit (T3.1) is improving the portability of mathematical software across hardware platforms and operating systems.

One particular such challenge, which has dogged the SageMath project practically since its inception, is getting a fully working port of Sage on Windows (and by extension, working Windows versions of all the CASes and other software Sage depends on, such as GAP, Singular, etc.).

This is particularly challenging, not so much because of the Sage Python library (which has some, but relatively little system-specific code). Rather, the challenge is in porting all of Sage’s 150+ standard dependencies, and ensuring that they integrate well on Windows, with a passing test suite.

Although UNIX-like systems are popular among open source software developers and some academics, the desktop and laptop market share of Windows computers is estimated to be more than 75% and is an important source of potential users, especially students.

However, for most of its existence, the only way to “install” Sage on Windows was to run a Linux virtual machine that came pre-installed with Sage, which is made available on Sage’s downloads page. This is clumsy and onerous for users–it forces them to work within an unfamiliar OS, and it can be difficult and confusing to connect files and directories in their host OS to files and directories inside the VM, and likewise for web-based applications like the notebook. Because of this, Windows users can feel like second-class citizens in the Sage ecosystem, and this may turn them away from Sage.

Attempts at Windows support are almost as old as Sage itself (the initial Sage release was in 2005). Microsoft offered funding to work on a Windows version as far back as 2007, but it was far too little for the amount of effort needed.

Additional work was done off and on through 2012, and partial support was possible at times. This included admirable work to try to support building with the native Windows development toolchain (e.g. MSVC). There was even, at one time, an earlier version of a Sage installer for Windows, but it has long since been abandoned.

However, Sage development (and more importantly Sage’s dependencies) continued to advance faster than there were resources for the work on Windows support to keep up, and work mostly stalled after 2013. OpenDreamKit has provided a unique opportunity to fund the kind of sustained effort needed for Sage’s Windows support to catch up.

Sage for Windows overview

As of SageMath version 8.0, Sage will be available for 64-bit versions of Windows 7 and up. It can be downloaded through the SageMath website, and up-to-date installation instructions are being developed at the SageMath wiki. A 32-bit version had been planned as well, but is on hold due to technical limitations that will be discussed later.

The installer contains all software and documentation making up the standard Sage distribution, all libraries needed for Cygwin support, a bash shell, numerous standard UNIX command-line utilities, and the Mintty terminal emulator, which is generally more user-friendly and better suited for Cygwin software than the standard Windows console.

It is distributed in the form of a single-file executable installer, with a familiar install wizard interface (built with the venerable InnoSetup). The installer comes in at just under a gigabyte, but unpacks to more than 4.5 GB in version 8.0.

Sage for Windows Installer

Because of the large number of files comprising the complete SageMath distribution, and the heavy compression of the installer, installation can take a fair amount of time even on a recent system. On my Intel i7 laptop it takes about ten minutes, but results will vary. Fortunately, this has not yet been a source of complaints–beta testers have been content to run the installer in the background while doing other work–on a modern multi-core machine the installer itself does not consume excessive resources.

If you don’t like it, there’s also a standard uninstall:

Sage for Windows Uninstaller

The installer includes three desktop and/or start menu shortcuts:

Sage for Windows start menu shortcuts

The shortcut titled just “SageMath 8.0” launches the standard Sage command prompt in a text-based console. In general it integrates well enough with the Windows shell to launch files with the default viewer for those file types. For example, plots are saved to files and displayed automatically with the default image viewer registered on the computer.

Sage for Windows console

(Because Mintty supports SIXEL mode graphics, it may also be possible to embed plots and equations directly in the console, but this has not been made to work yet with Sage.)

“SageMath Shell” runs a bash shell with the environment set up to run software in the Sage distribution. This is useful for more advanced users, or users who wish to directly use other software included in the Sage distribution (e.g. GAP, Singular) without going through the Sage interface. Finally, “SageMath Notebook” starts a Jupyter Notebook server with Sage configured as the default kernel and, where possible, opens the Notebook interface in the user’s browser.

In principle this could also be used as a development environment for doing development of Sage and/or Sage extensions on Windows, but the current installer is geared primarily just for users.

Rationale for Cygwin and possible alternatives

There are a few possible routes to supporting Sage on Windows, of which Cygwin is just one. For example, before restarting work on the Cygwin port I experimented with a solution that would run Sage on Windows using Docker. I built an installer for Sage that would install Docker for Windows if it was not already installed, install and configure a pre-built Sage image for Docker, and install some desktop shortcuts that attempted to launch Sage in Docker as transparently as possible to the user. That is, it would ensure that Docker was running, that a container for the Sage image was running, and then would redirect I/O to the Docker container.

This approach “worked”, but was still fairly clumsy and error-prone. In order to make the experience as transparent as possible a fair amount of automation of Docker was needed. This could get particularly tricky in cases where the user also uses Docker directly, and accidentally interferes with the Sage Docker installation. Handling issues like file system and network port mapping, while possible, was even more complicated. What’s worse, running Linux images in Docker for Windows still requires virtualization. On older versions this meant running VirtualBox in the background, while newer versions require the Hyper-V hypervisor (which is not available on all versions of Windows–particularly “Home” versions). Furthermore, this requires hardware-assisted virtualization (HAV) to be enabled in the user’s BIOS. This typically does not come enabled by default on home PCs, and users must manually enable it in their BIOS menu. We did not consider this a reasonable step to ask of users merely to “install Sage”.

Another approach, which was looked at in the early efforts to port Sage to Windows, would be to get Sage and all its dependencies building with the standard Microsoft toolchain (MSVC, etc.). This would mean both porting the code to work natively on Windows, using the MSVC runtime, as well as developing build systems compatible with MSVC. There was a time when, remarkably, many of Sage’s dependencies did meet these requirements. But since then the number of dependencies has grown so much, and Sage itself has become so dependent on the GNU toolchain, that this would now be an almost impossible undertaking.

A middle ground between MSVC and Cygwin would be to build Sage using the MinGW toolchain, which is a port of GNU build tools (including binutils, gcc, make, autoconf, etc.) as well as some other common UNIX tools like the bash shell to Windows. Unlike Cygwin, MinGW does not provide emulation of POSIX or Linux system APIs–it just provides a Windows-native port of the development tools. Many of Sage’s dependencies would still need to be updated in order to work natively on Windows, but at the very least their build systems would require relatively little updating–not much more than is required for Cygwin. This would actually be my preferred approach, and with enough time and resources it could probably work. However, it would still require a significant amount of work to port some of Sage’s more non-trivial dependencies, such as GAP and Singular, to work on Windows without some POSIX emulation.

So Cygwin is the path of least resistance. Although bugs and shortcomings in Cygwin itself occasionally require some effort to work around (as a developer–users should not have to think about it), for the most part it just works with software written for UNIX-like systems. It also has the advantage of providing a full UNIX-like shell experience, so shell scripts and scripts that use UNIX shell tools will work even on Windows. And since it works directly on the native filesystem, there is less opportunity for confusion regarding where files and folders are saved. In fact, Cygwin supports both Windows-style paths (starting with C:\\) and UNIX-style paths (in this case starting with /cygdrive/c/).
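
As a rough sketch of this path duality (assuming Cygwin's standard cygpath utility is on the PATH; the /cygdrive prefix depends on the local mount configuration):

# Translate between the two path spellings using Cygwin's cygpath tool.
import subprocess

def to_unix(path):
    # Windows-style -> UNIX-style, e.g. 'C:\\Windows' -> '/cygdrive/c/Windows'
    return subprocess.check_output(["cygpath", "-u", path]).decode().strip()

def to_windows(path):
    # UNIX-style -> Windows-style, e.g. '/cygdrive/c/Windows' -> 'C:\\Windows'
    return subprocess.check_output(["cygpath", "-w", path]).decode().strip()

print(to_unix("C:\\Windows"))
print(to_windows("/cygdrive/c/Windows"))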

Finally, a note on the Windows Subsystem for Linux (WSL), which debuted shortly after I began my Cygwin porting efforts, as I often get asked about this: “Why not ‘just’ use the ‘bash for Windows’?” The WSL is a new effort by Microsoft to allow running executables built for Linux directly on Windows, with full support from the Windows kernel for emulation of Linux system calls (including ones like fork()). Basically, it aims to provide all the functionality of Cygwin, but with full support from the kernel, and the ability to run Linux binaries directly, without having to recompile them. This is great of course. So the question is asked if Sage can run in this environment, and experiments suggest that it works pretty well (although the WSL is still under active development and has room for improvement).

I wrote more about the WSL in a blog post last year, which also addresses why we can’t “just” use it for Sage for Windows. But in short: 1) The WSL is currently only intended as a developer tool: There’s no way to package Windows software for end users such that it uses the WSL transparently. And 2) It’s only available on recent updates of Windows 10–it will never be available on older Windows versions. So to reach the most users, and provide the most hassle-free user experience, the WSL is not currently a solution. However, it may still prove useful for developers as a way to do Sage development on Windows. And in the future it may be the easiest way to install UNIX-based software on Windows as well, especially if Microsoft ever expands its scope.

Development challenges

The main challenge with porting Sage to Windows/Cygwin has relatively little to do with the Sage library itself, which is written almost entirely in Python/Cython and involves relatively few system interfaces (a notable exception to this is the advanced signal handling provided by Cysignals, but this has been found to work almost flawlessly on Cygwin thanks to the Cygwin developers’ heroic efforts in emulating POSIX signal handling on Windows). Rather, most of the effort has gone into build and portability issues with Sage’s more than 150 dependencies.

The majority of issues have been build-related. Runtime issues are less common, as many of Sage’s dependencies are primarily mathematical, numerical code–mostly CPU-bound algorithms that make little use of platform-specific APIs. Another reason is that, although there are some anomalous cases, Cygwin’s emulation of POSIX (and some Linux) interfaces is good enough that most existing code just works as-is. However, because applications built in Cygwin are native Windows applications and DLLs, there are Windows-specific subtleties that come up when building some non-trivial software. So most of the challenge has been getting all of Sage’s dependencies building cleanly on Cygwin, and then maintaining that support (as the maintainers of most of these dependencies are not themselves testing against Cygwin regularly).

In fact, maintenance was the most difficult aspect of the Cygwin port (and this is one of the main reasons past efforts failed–without a sustained effort it was not possible to keep up with the pace of Sage development). I had a snapshot of Sage that was fully working on Cygwin, with all tests passing, as early as the end of summer 2016. That is, I started with one version of Sage and added to it all the fixes needed for that version to work. However, by the time that work was done, there were many new developments to Sage that I had to redo my work on top of, and there were many new issues to fix. This cycle repeated itself a number of times.

Continuous integration

The critical component that was missing for creating a sustainable Cygwin port of Sage was a patchbot for Cygwin. The Sage developers maintain a (volunteer) army of patchbots–computers running a number of different OS and hardware platforms that perform continuous integration testing of all proposed software changes to Sage. The patchbots are able, ideally, to catch changes that break Sage–possibly only on specific platforms–before they are merged into the main development branch. Without a patchbot testing changes on Cygwin, there was no way to stop changes from being merged that broke Cygwin. With some effort I managed to get a Windows VM with Cygwin running reliably on UPSud’s OpenStack infrastructure, which could run a Cygwin patchbot for Sage. By continuing to monitor this patchbot the Sage community can now receive prior warning if/when a change will break the Cygwin port. I expect this will impact only a small number of changes–in particular those that update one of Sage’s dependencies.

In so doing we are, indirectly, providing continuous integration on Cygwin for Sage’s many dependencies–something most of those projects do not have the resources to do on their own. So this should be considered a service to the open source software community at large. (I am also planning to piggyback on the work I did for Sage to provide a Cygwin buildbot for Python–this will be important moving forward, as the official Python source tree has been broken on Cygwin for some time, yet Python is one of the most critical dependencies for Sage.)

Runtime bugs

All that said, a few of the runtime bugs that come up are non-trivial as well. One particular source of bugs is subtle synchronization issues in multi-process code, that arise primarily due to the large overhead of creating, destroying, and signalling processes on Cygwin, as compared to most UNIXes. Other problems arise in areas of behavior that are not specified by the POSIX standard, and assumptions are made that might hold on, say, Linux, but that do not hold on Cygwin (but that are still POSIX-compliant!) For example, a difference in (undocumented, in both cases) memory management between Linux and Cygwin made for a particularly challenging bug in PARI. Another interesting bug came up in a test that invoked a stack overflow bug in Python, which only came up on Cygwin due to the smaller default stack size of programs compiled for Windows. There are also occasional bugs due to small differences in numerical results, due to the different implementation of the standard C math routines on Cygwin, versus GNU libc. So one should not come away with the impression that porting software as complex as Sage and its dependencies to Cygwin is completely trivial, nor that similar bugs might not arise in the future.

Challenges with 32-bit Windows/Cygwin

The original work of porting Sage to Cygwin focused on the 32-bit version of Cygwin. In fact, at the time that was the only version of Cygwin–the first release of the 64-bit version of Cygwin was not until 2013. When I picked up work on this again I focused on 64-bit Cygwin–most software developers today are working primarily on 64-bit systems, and from the many projects I’ve worked on in the past, my experience has been that they are more stable on 64-bit systems. I figured this would likely be true for Sage and its dependencies as well.

In fact, after getting Sage working on 64-bit Cygwin, when it came time to test on 32-bit Cygwin I hit some significant snags. Without going into too many technical details, the main problem is that 32-bit Windows applications have a user address space limited to just 2 GB (or 3 GB with a special boot flag). This is in fact not enough to fit all of Sage into memory at once. The good news is that for most cases one would never try to use all of Sage at once–this is only an issue if one tries to load every library in both Sage, and all its dependencies, into the same address space. In practical use this is rare, though this limit can be hit while running the Sage test suite.

With some care, such as reserving address space for the most likely to be used (especially simultaneously) libraries in Sage, we can work around this problem for the average user. But the result may still not be 100% stable.

It becomes a valid question whether it’s worth the effort. There are unfortunately few publicly available statistics on the current market share of 64-bit versus 32-bit Windows versions among desktop users. Very few new desktops and laptops sold to the consumer market include 32-bit OSes anymore, but 32-bit Windows is still not too uncommon on some older, lower-end laptops. In particular, some laptops sold not too long ago with Windows 7 were 32-bit. According to Net Market Share, as of writing Windows 7 still makes up nearly 50% of all desktop operating system installations. This still does not tell us about 32-bit versus 64-bit. The popular (12.5 million concurrent users) Steam PC gaming platform publishes the results of its usage statistics survey, which as of writing shows barely over 5% of users with 32-bit versions of Windows. However, computer gamers are not likely to be representative of the overall market, being more likely to upgrade their software and hardware.

So until some specific demand for a 32-bit version of SageMath for Windows is heard, we will not likely invest more effort into it.

Conclusion and future work

Focusing on Cygwin for porting Sage to Windows was definitely the right way to go. It took me only a few months in the summer of 2016 to get the vast majority of the work done. The rest was just a question of keeping up with changes to Sage and fixing more bugs (this required enough constant effort that it’s no wonder nobody managed to quite do it before). Now, however, enough issues have been addressed that the Windows version has remained fairly stable, even in the face of ongoing updates to Sage.

Porting more of Sage’s dependencies to build with MinGW and without Cygwin might still be a worthwhile effort, as Cygwin adds some overhead in a few areas, but if we had started with that it would have been too much effort.

In the near future, however, the priority needs to be improvements to the user experience of the Windows installer. In particular, a better solution is needed for installing Sage’s optional packages on Windows (preferably without needing to compile them). And an improved experience for using Sage in the Jupyter Notebook, such that the Notebook server can run in the background as a Windows service, would be nice. This feature would not be specific to Sage either, and could benefit all users of the Jupyter Notebook on Windows.

Finally, I need to better document the process of doing Sage development on Cygwin, including the typical kinds of problems that arise. I also need to better document how to set up and maintain the Cygwin patchbot, and how to build releases of the Sage for Windows installer, so that its maintenance does not fall solely on my shoulders.

October 11, 2017 12:00 AM

September 22, 2017

William Stein

DataDog's pricing: don't make the same mistake I made

(I wrote a 1-year followup post here.)

I stupidly made a mistake recently by choosing to use DataDog for monitoring the infrastructure for my startup (SageMathCloud).

I got bit by their pricing UI design that looks similar to many other sites, but is different in a way that caused me to spend far more money than I expected.

I'm writing this post so that you won't make the same mistake I did.  As a product, DataDog is of course a lot of hard work to create, and they can try to charge whatever they want. However, my problem is that what they are going to charge was confusing and misleading to me.

I wanted to see some nice web-based data about my new autoscaled Kubernetes cluster, so I looked around at options. DataDog looked like a new and awesomely-priced service for seeing live logging. And when I looked (not carefully enough) at the pricing, it looked like only $15/month to monitor a bunch of machines. I'm naive about the cost of cloud monitoring -- I've been using Stackdriver on Google cloud platform for years, which is completely free (for now, though that will change), and I've also used self-hosted open source solutions, and some quite nice solutions I've written myself. So my expectations were way out of whack.

Ever busy, I signed up for the "$15/month plan":


One of the people on my team spent a little time and installed DataDog on all the VMs in our cluster, and also made DataDog automatically start running on any nodes in our Kubernetes cluster. That's a lot of machines.

Today I got the first monthly bill, which is for the month that just happened. The cost was $639.19 USD charged to my credit card. I was really confused for a while, wondering if I had bought a year subscription.



After a while I realized that the cost is per host! When I looked at the pricing page the first time, I had just seen in big letters "$15", and "$18 month-to-month" and "up to 500 hosts". I completely missed the "Per Host" line, because I was so naive that I didn't think the price could possibly be that high.

I tried immediately to delete my credit card and cancel my plan, but the "Remove Card" button is greyed out, and it says you can "modify your subscription by contacting us at [email protected]":



So I wrote to [email protected]:

Dear Datadog,

Everybody on my team was completely mislead by your
horrible pricing description.

Please cancel the subscription for wstein immediately
and remove my credit card from your system.

This is the first time I've wasted this much money
by being misled by a website in my life.

I'm also very unhappy that I can't delete my credit
card or cancel my subscription via your website. It's
like one more stripe API call to remove the credit card
(I know -- I implemented this same feature for my site).


And they responded:

Thanks for reaching out. If you'd like to cancel your
Datadog subscription, you're able to do so by going into
the platform under 'Plan and Usage' and choose the option
downgrade to 'Lite', that will insure your credit card
will not be charged in the future. Please be sure to
reduce your host count down to the (5) allowed under
the 'Lite' plan - those are the maximum allowed for
the free plan.

Also, please note you'll be charged for the hosts
monitored through this month. Please take a look at
our billing FAQ.


They were right -- I was able to uninstall the daemons, downgrade to Lite, remove my card, etc. all through the website without manual intervention.

When people have been confused with billing for my site, I have apologized, immediately refunded their money, and opened a ticket to make the UI clearer.  DataDog didn't do any of that.

I wish DataDog would at least clearly state that when you use their service you are potentially on the hook for an arbitrarily large charge for any month. Yes, if they had made that clear, they wouldn't have had me as a customer, so they are not incentivized to do so.

A fool and their money are soon parted. I hope this post reduces the chances you'll be a fool like me.  If you choose to use DataDog -- and their monitoring tools are very impressive -- I hope you'll be aware of the cost.


ADDED:

On Hacker News somebody asked: "How could their pricing page be clearer? It says per host in fairly large letters underneath it. I'm asking because I will be designing a similar page soon (that's also billed per host) and I'd like to avoid the same mistakes."  My answer:

[EDIT: This pricing page by the top poster in this thread is way better than I suggest below -- https://www.serverdensity.com/pricing/]

1. VERY clearly state that when you sign up for the service, then you are on the hook for up to $18*500 = $9000 + tax in charges for any month. Even Google compute engine (and Amazon) don't create such a trap, and have a clear explicit quota increase process.
2. Instead of "HUGE $15" newline "(small light) per host", put "HUGE $18 per host" all on the same line. It would easily fit. I don't even know how the $15/host datadog discount could ever really work, given that the number of hosts might constantly change and there is no prepayment.
3. Inform users clearly in the UI at any time how much they are going to owe for that month (so far), rather than surprising them at the end. Again, Google Cloud Platform has a very clear running total in their billing section, and any time you create a new VM it gives the exact amount that VM will cost per month.
4. If one works with a team, point 3 is especially important. The reason that I had monitors on 50+ machines is that another person working on the project, who never looked at pricing or anything, just thought -- hey, I'll just set this up everywhere. He had no idea there was a per-machine fee.

by William Stein ([email protected]) at September 22, 2017 01:03 PM

DataDog: Don't make the same mistake I did -- a followup and thoughts about very unhappy customers

This is a followup to my previous blog post about DataDog billing.

TL;DR:
- I don't recommend DataDog,
- dealing with unhappy customers is hard,
- monitoring for data science nerds?

Hacker News Comments

DataDog at Google Cloud Summit

I was recently at the Seattle Google Cloud Summit and DataDog was well represented, with the biggest booth and top vendor billing during the keynote. Clearly they are doing something right. I had a past unpleasant experience with them, and I had just been auditing my records and discovered that last year DataDog had actually charged me a lot more than I thought, so I was kind of annoyed. Nonetheless, they kept coming up and talking to me, server monitoring is life-and-death important to me, and their actual software is very impressive in some ways.

Nick Parisi started talking with me about DataDog. He sincerely wanted to know about my past experience with monitoring and the DataDog product, which he was clearly very enthusiastic about. So I told him a bit, and he encouraged me to tell him more, explaining that he did not work at DataDog last year, and that he would like to know what happened. So he gave me his email, and he was very genuinely concerned and helpful, so I sent him an email with a link to my post, etc. I didn't receive a response, so a week later I asked why, and received a followup email from Jay Robichau, who is DataDog's Director of Sales.

Conference Call with DataDog

Jay set up a conference call with me today at 10am (September 22, 2017). Before the call, I sent him a summary of my blog post, and also requested a refund, especially for the surprise bill they sent me nearly 6 weeks after my post.

During the call, Jay explained that he was "protecting" Nick from me, and that I would mostly talk with Michelle Danis, who is in charge of customer success. My expectation for the call was that we would find some common ground, and that they would at least appreciate the chance to make things right and talk with an unhappy customer. I was also curious about how a successful startup company addresses the concerns of an unhappy customer (me).

I expected the conversation to be difficult but go well, with me writing a post singing the praises of the charming DataDog sales and customer success people. A few weeks ago CoCalc.com (my employer) had a very unhappy customer who got (rightfully) angry over a miscommunication, told us he would no longer use our product, and would definitely not recommend it to anybody else. I wrote to him wanting to at least continue the discussion and help, but he was completely gone. I would do absolutely anything I could to ensure he was satisfied, if only he would give me the chance. Also, there was a recent blog post from somebody unhappy with using CoCalc/Sage for graphics, and I reached out to them as best I could to at least clarify things...

In any case, here's what DataDog charged us as a result of us running their daemon on a few dozen containers in our Kubernetes cluster (a contractor who is not a native English speaker actually set up these monitors for us):

07/22/2016  449215JWJH87S8N4  DATADOG 866-329-4466 NY  $639.19
08/29/2016 2449215L2JH87V8WZ DATADOG 866-329-4466 NY $927.22

I was shocked by the 07/22 bill, which spurred my post, and discovered the 08/29 one only later. We canceled our subscription on July 22 (cancelling was difficult in itself).

Michelle started the conference call by explaining that the 08/29 bill was for charges incurred before 07/22, and that their billing system has over a month lag (and it does even today, unlike Google Cloud Platform, say). Then Michelle explained at length many of the changes that DataDog has made to their software to address the issues I (and others?) have pointed out with their pricing description and billing. She was very interested in whether I would be a DataDog customer in the future, and when I said no, she explained that they would not refund my money since the bill was not a mistake.

I asked if they now provide a periodic summary of the upcoming bill, as Google cloud platform (say) does. She said that today they don't, though they are working on it. They do now provide a summary of usage so far in an admin page.

Finally, I explained in no uncertain terms that I felt misled by their pricing. I expected that they would understand, and pointed out that they had just described to me many ways in which they were addressing this very problem. Very surprisingly, Michelle's response was that she absolutely would not agree that there was any problem with their pricing description a year ago, and they definitely would not refund my money. She kept bringing up the terms of service. I agreed that I didn't think legally they were in the wrong, given what she had explained, just that -- as they had just pointed out -- their pricing and billing was unclear in various ways. They would not agree at all.

I can't recommend doing business with DataDog. I had very much hoped to write the opposite in this updated post. Unfortunately, their pricing and terms are still confusing today compared to competitors, and they are unforgiving of mistakes. This dog bites.    

(Disclaimer: I took notes during the call, but most of the above is from memory, and I probably misheard or misunderstood something. I invite comments from DataDog to set the record straight.)

Also, for what it is worth, I definitely do recommend Google Cloud Platform.  They put in the effort to do many things right regarding clear billing.

How do Startups Deal with Unhappy Customers?

I am very curious about how other startups deal with unhappy customers. At CoCalc we have had very few "major incidents" yet... but I want to be as prepared as possible. At the Google Cloud Summit, I went to an amazing "war stories by SREs" session in which they talked about situations they had been in years ago in which their decisions meant the difference between whether they would have a company or job tomorrow or not. These guys clearly have amazing instincts for when a problem was do-or-die serious and when it wasn't. And their deep preparation "in depth" was WHY they were on that stage, and a big reason why older companies like Google are still around. Having a strategy for addressing very angry customers is surely just as important.


Google SRE's: these guys are serious.

I mentioned my DataDog story to a long-time Google employee there (16 years!) and he said he had recently been involved in a similar situation with Google's Stackdriver monitoring, where the bill to a customer was $85K in a month just for Stackdriver. I asked what Google did, and he said they refunded the money, then worked with the customer to better use their tools.

There is of course no way to please all of the people all of the time. However, I genuinely feel that I was ripped off and misled by DataDog, but I have the impression that Jay and Michelle honestly view me as some jerk trying to rip them off for $1500.   And they probably hate me for telling you about my experiences.

So far, with CoCalc we charge customers in advance for any service we provide, so fewer people are surprised by a bill.  Sometimes there are problems with recurring subscriptions: when a person is charged for the upcoming month of a subscription and doesn't want to continue (e.g., because their course is over), we always fully refund the charge. What does your company do? Why? I do worry that our billing model means that we miss out on potential revenue.

We all know what successful huge consumer companies like Amazon and Wal-Mart do.

Monitoring for Data Science Nerds?

I wonder if there is interest in a service like DataDog, but targeted at Data Science Nerds, built on CoCalc, which provides hosted collaborative Jupyter notebooks with Pandas, R, etc., pre-installed. This talk at PrometheusCon 2017 mostly discussed the friction that people face moving data from Prometheus to analyze using data science tools (e.g., R, Python, Jupyter). CoCalc provides a collaborative data science environment, so if we were to smooth over those points of friction, perhaps it could be useful to certain people. And much more efficient...

by William Stein ([email protected]) at September 22, 2017 12:32 PM

August 31, 2017

OpenDreamKit

Introduction to SageMath

Viviane Pons gave a Sage introduction talk at the international combinatorics conference Eurocomb 2017.

Slides

August 31, 2017 12:00 AM

August 26, 2017

OpenDreamKit

Workshop on live structured documents

OpenDreamKit is hosting a workshop on live structured documents to take place at Simula Research Laboratory in Oslo, Norway, from Monday, October 16 to Friday, October 20.

The workshop is dedicated to various aspects of live documents, including:

  • Authoring language and tools (ReST, XML, latex, …),
  • Converters (Sphinx, pandoc, docbook, …),
  • Infrastructure for rendering documents,
  • Infrastructure for interacting with the computing backends (e.g. Thebe based on the Jupyter protocol, sympy live, …),
  • Online services providing computing backends (e.g. tmpnb, sagecell, binder, …),
  • Integrated collaborative authoring environments (sharelatex, SMC, texmacs, …)

Participants can register via eventbrite.

August 26, 2017 12:00 AM

July 29, 2017

OpenDreamKit

OpenDreamKit at Groups St Andrews in Birmingham

Groups St Andrews is a conference series with a conference every four years. This year’s Groups St Andrews will be in Birmingham, and I will attend, bring a poster, give a contributed talk about computing in permutation groups, and teach a course on GAP.

This post serves the main purpose of providing a page on the OpenDreamKit website that can hold all the links that will appear on the poster, and possible further information.

July 29, 2017 12:00 AM

June 09, 2017

OpenDreamKit

Sphinx documentation of Cython code using "binding=True"

One of the deliverables (D4.13) of the OpenDreamKit project is refactoring the documentation system of SageMath. The SageMath documentation is built using a heavily customized Sphinx. Many of the customizations are necessary to support autodoc (automatically generated documentation from docstrings) for Cython files.

Thanks to some changes I made to Sphinx, autodoc for Cython now works provided that:

  1. You use Sphinx version 1.6 or later.

  2. The Cython code is compiled with the binding=True directive. See How to set directives in the Cython documentation.

  3. A small monkey-patch is applied to inspect.isfunction. You can put this in your Sphinx conf.py for example:

     # Treat any object whose type defines __code__ as a function:
     # this matches user-defined functions and cyfunctions, but not
     # built-in functions (see the rationale below).
     def isfunction(obj):
         return hasattr(type(obj), "__code__")

     import inspect
     inspect.isfunction = isfunction
    

This was used successfully for the documentation of cysignals and fpylll. There is ongoing work to do the same for SageMath.

Implementation of functions in Python

To understand why items 2 and 3 on the above list are needed, we need to look at how Python implements functions. In Python, there are two kinds of functions (we really mean functions here, not methods or other callables):

  1. User-defined functions, defined with def or lambda:

     >>> def foo(): pass
     >>> type(foo)
     <class 'function'>
     >>> type(lambda x: x)
     <class 'function'>
    
  2. Built-in functions such as len, repr or isinstance:

     >>> type(len)
     <class 'builtin_function_or_method'>
    
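
Both kinds are also exposed by the standard library's types module, which gives a quick way to check the distinction (a doctest-style sketch on plain CPython):

>>> import types
>>> isinstance(len, types.BuiltinFunctionType)
True
>>> isinstance(lambda x: x, types.FunctionType)
True
>>> types.FunctionType is types.BuiltinFunctionType
False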

In the CPython implementation, these are completely independent classes with different behaviours.

User-defined functions binding as methods

Just to give one example, built-in functions do not have a __get__ method, which means that they do not become methods when used in a class.

Let’s consider this class:

class X(object):
    def printme(self):
        return repr(self)

This is essentially equivalent to

class X(object):
    printme = (lambda self: repr(self))

>>> X().printme()
'<__main__.X object at 0x7fb342f960b8>'

However, directly putting the built-in function repr in the class does not work as expected:

class Y(object):
    printme = repr

>>> Y().printme()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: repr() takes exactly one argument (0 given)

This is simply something that built-in functions do not support.
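
The difference is easy to check directly: user-defined functions are descriptors (they define __get__), while built-in functions are not (again a doctest-style check on plain CPython):

>>> hasattr(lambda self: repr(self), '__get__')
True
>>> hasattr(repr, '__get__')
False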

User-defined vs. built-in functions

Here is a list of the main differences between user-defined and built-in functions:

  • User-defined functions are implemented in Python, built-in functions are implemented in C.

  • Only user-defined functions support __get__ and can become methods (see above).

  • Only user-defined functions support introspection such as inspect.getargspec() and inspect.getsourcefile().

  • CPython has specific optimizations for calling built-in functions.

  • The inspect module and profiling make a difference between the two kinds of functions.
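
For instance, the inspect module itself classifies the two kinds of functions differently (a doctest-style sketch on plain CPython):

>>> import inspect
>>> def add(a, b=1):
...     return a + b
>>> inspect.isfunction(add), inspect.isbuiltin(add)
(True, False)
>>> inspect.isfunction(len), inspect.isbuiltin(len)
(False, True)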

Cython generates C code, so Cython functions must be built-in functions. This has unfortunate disadvantages, such as the lack of introspection support, which is particularly important for Sphinx.

The Cython function type: cyfunction

Luckily, the Cython developers came up with a solution: they invented a completely new function type (called cyfunction), which is implemented like built-in functions but which behaves as much as possible like user-defined functions.

By default, functions in Cython are built-in functions. With the directive binding=True, functions in Cython become cyfunctions. Since cyfunctions are not specifically optimized by CPython, this comes with a performance penalty. More precisely, calling cyfunctions from Python is slower than calling built-in functions from Python. The slowdown can be significant for simple functions. Within Cython, cyfunctions are as fast as built-in functions.

Since a cyfunction is neither a built-in function nor a user-defined function (those two types are not subclassable), the inspect module (and hence Sphinx) does not recognize it as being a function. So, to have full inspect support for Cython functions, we need to change inspect.isfunction. After various attempts, I came up with hasattr(type(obj), "__code__") to test whether the object obj is a function (for introspection purposes). This will match user-defined functions and cyfunctions but not built-in functions, nor any other Python type that I know of.
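
A doctest-style check of this predicate on plain CPython (a cyfunction compiled with binding=True would behave like the lambda here):

>>> def isfunction(obj):
...     return hasattr(type(obj), "__code__")
>>> isfunction(lambda x: x)    # user-defined function
True
>>> isfunction(len)            # built-in function
False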

The future: a PEP to change the function types?

I have some vague plans for a Python Enhancement Proposal (PEP) to change the implementation of the Python function types. The goal is that Cython functions can be implemented on top of some standard Python function type, with all features that cyfunctions currently have, the performance of built-in functions and introspection support of user-defined functions.

At this point, it is too early to say anything about the implementation of this hypothetical future Python function type. If anything happens, I will surely post an update.

June 09, 2017 12:00 AM

June 01, 2017

William Stein

RethinkDB must relicense NOW

What is RethinkDB?

UPDATE:  Several months after I wrote this post, RethinkDB was relicensed.  For the CoCalc project, it was too late, and by then we had already switched to PostgreSQL.


RethinkDB is an INCREDIBLE, high quality, polished open source realtime database that is easy to deploy, shard, and replicate, and supports a reactive client programming model, which is useful for collaborative web-based applications. Shockingly, the 7-year-old company that created RethinkDB has just shut down. I am the CEO of a company, SageMath, Inc., that uses RethinkDB very heavily, so I have a strong interest in RethinkDB surviving as an independent open source project.

Three Types of Open Source Projects

There are many types of open source projects. RethinkDB was the type of open source project where most of the work was full-time, focused work done by employees of the RethinkDB company. RethinkDB is licensed under the AGPL, but the company promised to make the software available to customers under other licenses.

Academia: I started the SageMath open source math software project in 2005, which has over 500 contributors, and a relatively healthy volunteer ecosystem, with about a hundred contributors to each release, and many releases each year. These are mostly volunteer contributions by academics: usually grad students, postdocs, and math professors. They contribute because SageMath is directly relevant to their research, and they often contribute state of the art code that implements algorithms they have created or refined as part of their research. Sage is licensed under the GPL, and that license has worked extremely well for us. Academics sometimes even get significant grants from the NSF or the EU to support Sage development.

Companies: I also started the Cython compiler project in 2007, which has had dozens of contributors and is now the de facto standard for writing or wrapping fast code for use by Python. The developers of Cython mostly work at companies (e.g., Google) as a side project in their spare time. (Here's a message today about a new release from a Cython developer, who works at Google.) Cython is licensed under the Apache License.

What RethinkDB Will Become

RethinkDB will no longer be an open source project whose development is sponsored by a single company dedicated to the project. Will it be an academic project, a company-supported project, or dead?

A friend of mine at Oxford University surveyed his academic CS colleagues about RethinkDB, and they said they had zero interest in it. Indeed, from an academic research point of view, I agree that there is nothing interesting about RethinkDB. I myself am a college professor, and understand these people! Academic volunteer open source contributors are definitely not going to come to RethinkDB's rescue. The value in RethinkDB is not in the innovative new algorithms or ideas, but in the high quality carefully debugged implementations of standard algorithms (largely the work of bad ass German programmer Daniel Mewes). The RethinkDB devs had to carefully tune each parameter in those algorithms based on extensive automated testing, user feedback, the Jepsen tests, etc.

That leaves companies. Whether or not you like or agree with this, many companies will not touch AGPL licensed code:
"Google open source guru Chris DiBona says that the web giant continues to ban the lightning-rod AGPL open source license within the company because doing so "saves engineering time" and because most AGPL projects are of no use to the company."
This is just the way it is -- it's psychology and culture, so deal with it. In contrast, companies very frequently embrace open source code that is licensed under the Apache or BSD licenses, and they keep such projects alive. The extremely popular PostgreSQL database is licensed under an almost-BSD license. MySQL is freely licensed under the GPL, but there are good reasons why people buy a commercial MySQL license (from Oracle) for MySQL. Like RethinkDB, MongoDB is AGPL licensed, but they are happy to sell a different license to companies.

With RethinkDB today, the only option is AGPL. This very strongly discourages use by the only possible group of users and developers that has any chance to keep RethinkDB from death. If this situation is not resolved as soon as possible, I am extremely afraid that it never will be resolved. Ever. If you care about RethinkDB, you should be afraid too. Ignoring the landscape and culture of volunteer open source projects is dangerous.

A Proposal

I don't know who can make the decision to relicense RethinkDB. I don't know what is going on with investors or who is in control. I am an outsider. Here is a proposal that might provide a way out today:

PROPOSAL: Dear RethinkDB, sell me an Apache (or BSD) license to the RethinkDB source code. Make this the last thing your company sells before it shuts down. Just do it.


Hacker News Discussion

by William Stein ([email protected]) at June 01, 2017 01:25 PM