December 12, 2018

Sébastien Labbé

Comparison of Wang tiling solvers

During the last year, I have written a Python module to deal with Wang tiles containing about 4K lines of code including doctests and documentation.

It can be installed like this:

sage -pip install slabbe

It can be used like this:

sage: from slabbe import WangTileSet
sage: tiles = [(2,4,2,1), (2,2,2,0), (1,1,3,1), (1,2,3,2), (3,1,3,3),
....: (0,1,3,1), (0,0,0,1), (3,1,0,2), (0,2,1,2), (1,2,1,4), (3,3,1,2)]
sage: T0 = WangTileSet([map(str,t) for t in tiles])
sage: T0.tikz(ncolumns=11).pdf()
/Files/2018/T0_tiles.svg

The module on wang tiles contains a class WangTileSolver which contains three reductions of the Wang tiling problem the first using MILP solvers, the second using SAT solvers and the third using Knuth's dancing links.

Here is one example of a tiling found using the dancing links reduction:

sage: %time tiling = T0.solver(10,10).solve(solver='dancing_links')
CPU times: user 36 ms, sys: 12 ms, total: 48 ms
Wall time: 65.5 ms
sage: tiling.tikz().pdf()
/Files/2018/T0_10x10tiling.svg

All these reductions now allow me to compare the efficiency of various types of solvers restricted to the Wang tiling type of problems. Here is the list of solvers that I often use.

List of solvers
Solver Description
'Gurobi' MILP solver
'GLPK' MILP solver
'PPL' MILP solver
'LP' a SAT solver using a reduction to LP
'cryptominisat' SAT solver
'picosat' SAT solver
'glucose' SAT solver
'dancing_links' Knuth's algorihm

In this recent work on the substitutive structure of Jeandel-Rao tilings, I introduced various Wang tile sets \(T_i\) for \(i\in\{0,1,\dots,12\}\). In this blog post, we will concentrate on the 11 Wang tile set \(T_0\) introduced by Jeandel and Rao as well as \(T_2\) containing 20 tiles and \(T_3\) containing 24 tiles.

Tiling a n x n square

The most natural question to ask is to find valid Wang tilings of \(n\times n\) square with given Wang tiles. Below is the time spent by each mentionned solvers to find a valid tiling of a \(n\times n\) square in less than 10 seconds for each of the three wang tile sets \(T_0\), \(T_2\) and \(T_3\).

/Files/2018/T0_square_tilings.svg /Files/2018/T2_square_tilings.svg /Files/2018/T3_square_tilings.svg

We remark that MILP solvers are slower. Dancing links can solve 20x20 squares with Jeandel Rao tiles \(T_0\) and SAT solvers are performing very well with Glucose being the best as it can find a 55x55 tiling with Jeandel-Rao tiles \(T_0\) in less than 10 seconds.

Finding all dominoes allowing a surrounding of given radius

One thing that is often needed in my research is to enumerate all horizontal and vertical dominoes that allow a given surrounding radius. This is a difficult question in general as deciding if a given tile set admits a tiling of the infinite plane is undecidable. But in some cases, the information we get from the dominoes admitting a surrounding of radius 1, 2, 3 or 4 is enough to conclude that the tiling can be desubstituted for instance. This is why we need to answer this question as fast as possible.

Below is the comparison in the time taken by each solver to compute all vertical and horizontal dominoes allowing a surrounding of radius 1, 2 and 3 (in less than 1000 seconds for each execution).

/Files/2018/T0_dominoes_surrounding.svg /Files/2018/T2_dominoes_surrounding.svg /Files/2018/T3_dominoes_surrounding.svg

What is surprising at first is that the solvers that performed well in the first \(n\times n\) square experience are not the best in the second experiment computing valid dominoes. Dancing links and the MILP solver Gurobi are now the best algorithms to compute all dominoes. They are followed by picosat and cryptominisat and then glucose.

The source code of the above comparisons

The source code of the above comparison can be found in this Jupyter notebook. Note that it depends on the use of Glucose as a Sage optional package (#26361) and on the most recent development version of slabbe optional Sage Package.

by Sébastien Labbé at December 12, 2018 03:24 PM

September 07, 2018

Sébastien Labbé

Wooden laser-cut Jeandel-Rao tiles

I have been working on Jeandel-Rao tiles lately.

/Files/2018/article2_T0_tiles.svg

Before the conference Model Sets and Aperiodic Order held in Durham UK (Sep 3-7 2018), I thought it would be a good idea to bring some real tiles at the conference. So I first decided of some conventions to represent the above tiles as topologically closed disk basically using the representation of integers in base 1:

/Files/2018/T0_shapes.svg

With these shapes, I created a 33 x 19 patch. With 3cm on each side, the patch takes 99cm x 57cm just within the capacity of the laser cut machine (1m x 60 cm):

/Files/2018/33x19_A_scale3.svg

With the help of David Renault from LaBRI, we went at Coh@bit, the FabLab of Bordeaux University and we laser cut two 3mm thick plywood for a total of 1282 Wang tiles. This is the result:

/Files/2018/laser_cut_8x8.jpg

One may recreate the 33 x 19 tiling as follows (note that I am using Cartesian-like coordinates, so the first list data[0] actually is the first column from bottom to top):

sage: data = [[10, 4, 6, 1, 3, 3, 7, 0, 9, 7, 2, 6, 1, 3, 8, 7, 0, 9, 7],
....:  [4, 5, 6, 1, 8, 10, 4, 0, 9, 3, 8, 7, 0, 9, 7, 5, 0, 9, 3],
....:  [3, 7, 6, 1, 7, 2, 5, 0, 9, 8, 7, 5, 0, 9, 3, 7, 0, 9, 10],
....:  [10, 4, 6, 1, 3, 8, 7, 0, 9, 7, 5, 6, 1, 8, 10, 4, 0, 9, 3],
....:  [2, 5, 6, 1, 8, 7, 5, 0, 9, 3, 7, 6, 1, 7, 2, 5, 0, 9, 8],
....:  [8, 7, 6, 1, 7, 5, 6, 1, 8, 10, 4, 6, 1, 3, 8, 7, 0, 9, 7],
....:  [7, 5, 6, 1, 3, 7, 6, 1, 7, 2, 5, 6, 1, 8, 7, 5, 0, 9, 3],
....:  [3, 7, 6, 1, 10, 4, 6, 1, 3, 8, 7, 6, 1, 7, 5, 6, 1, 8, 10],
....:  [10, 4, 6, 1, 3, 3, 7, 0, 9, 7, 5, 6, 1, 3, 7, 6, 1, 7, 2],
....:  [2, 5, 6, 1, 8, 10, 4, 0, 9, 3, 7, 6, 1, 10, 4, 6, 1, 3, 8],
....:  [8, 7, 6, 1, 7, 5, 5, 0, 9, 10, 4, 6, 1, 3, 3, 7, 0, 9, 7],
....:  [7, 5, 6, 1, 3, 7, 6, 1, 10, 4, 5, 6, 1, 8, 10, 4, 0, 9, 3],
....:  [3, 7, 6, 1, 10, 4, 6, 1, 3, 3, 7, 6, 1, 7, 2, 5, 0, 9, 8],
....:  [10, 4, 6, 1, 3, 3, 7, 0, 9, 10, 4, 6, 1, 3, 8, 7, 0, 9, 7],
....:  [4, 5, 6, 1, 8, 10, 4, 0, 9, 3, 3, 7, 0, 9, 7, 5, 0, 9, 3],
....:  [3, 7, 6, 1, 7, 2, 5, 0, 9, 8, 10, 4, 0, 9, 3, 7, 0, 9, 10],
....:  [10, 4, 6, 1, 3, 8, 7, 0, 9, 7, 5, 5, 0, 9, 10, 4, 0, 9, 3],
....:  [2, 5, 6, 1, 8, 7, 5, 0, 9, 3, 7, 6, 1, 10, 4, 5, 0, 9, 8],
....:  [8, 7, 6, 1, 7, 5, 6, 1, 8, 10, 4, 6, 1, 3, 3, 7, 0, 9, 7],
....:  [7, 5, 6, 1, 3, 7, 6, 1, 7, 2, 5, 6, 1, 8, 10, 4, 0, 9, 3],
....:  [3, 7, 6, 1, 10, 4, 6, 1, 3, 8, 7, 6, 1, 7, 2, 5, 0, 9, 8],
....:  [10, 4, 6, 1, 3, 3, 7, 0, 9, 7, 2, 6, 1, 3, 8, 7, 0, 9, 7],
....:  [4, 5, 6, 1, 8, 10, 4, 0, 9, 3, 8, 7, 0, 9, 7, 5, 0, 9, 3],
....:  [3, 7, 6, 1, 7, 2, 5, 0, 9, 8, 7, 5, 0, 9, 3, 7, 0, 9, 10],
....:  [10, 4, 6, 1, 3, 8, 7, 0, 9, 7, 5, 6, 1, 8, 10, 4, 0, 9, 3],
....:  [3, 3, 7, 0, 9, 7, 5, 0, 9, 3, 7, 6, 1, 7, 2, 5, 0, 9, 8],
....:  [8, 10, 4, 0, 9, 3, 7, 0, 9, 10, 4, 6, 1, 3, 8, 7, 0, 9, 7],
....:  [7, 5, 5, 0, 9, 10, 4, 0, 9, 3, 3, 7, 0, 9, 7, 5, 0, 9, 3],
....:  [3, 7, 6, 1, 10, 4, 5, 0, 9, 8, 10, 4, 0, 9, 3, 7, 0, 9, 10],
....:  [10, 4, 6, 1, 3, 3, 7, 0, 9, 7, 5, 5, 0, 9, 10, 4, 0, 9, 3],
....:  [2, 5, 6, 1, 8, 10, 4, 0, 9, 3, 7, 6, 1, 10, 4, 5, 0, 9, 8],
....:  [8, 7, 6, 1, 7, 5, 5, 0, 9, 10, 4, 6, 1, 3, 3, 7, 0, 9, 7],
....:  [7, 5, 6, 1, 3, 7, 6, 1, 10, 4, 5, 6, 1, 8, 10, 4, 0, 9, 3]]

The above patch have been chosen among 1000 other randomly generated as the closest to the asymptotic frequencies of the tiles in Jeandel-Rao tilings (or at least in the minimal subshift that I describe in the preprint):

sage: from collections import Counter
sage: c = Counter(flatten(data))
sage: tile_count = [c[i] for i in range(11)]

The asymptotic frequencies:

sage: phi = golden_ratio.n()
sage: Linv = [2*phi + 6, 2*phi + 6, 18*phi + 10, 2*phi + 6, 8*phi + 2,
....:      5*phi + 4, 2*phi + 6, 12/5*phi + 14/5, 8*phi + 2,
....:      2*phi + 6, 8*phi + 2]
sage: perfect_proportions = vector([1/a for a in Linv])

Comparison of the number of tiles of each type with the expected frequency:

sage: header_row = ['tile id', 'Asymptotic frequency', 'Expected nb of copies',
....:               'Nb copies in the 33x19 patch']
sage: columns = [range(11), perfect_proportions, vector(perfect_proportions)*33*19, tile_count]
sage: table(columns=columns, header_row=header_row)
  tile id   Asymptotic frequency   Expected nb of copies   Nb copies in the 33x19 patch
+---------+----------------------+-----------------------+------------------------------+
  0         0.108271182329550      67.8860313206280        67
  1         0.108271182329550      67.8860313206280        65
  2         0.0255593590340479     16.0257181143480        16
  3         0.108271182329550      67.8860313206280        71
  4         0.0669152706817991     41.9558747174880        42
  5         0.0827118232955023     51.8603132062800        51
  6         0.108271182329550      67.8860313206280        65
  7         0.149627093977301      93.8161879237680        95
  8         0.0669152706817991     41.9558747174880        44
  9         0.108271182329550      67.8860313206280        67
  10        0.0669152706817991     41.9558747174880        44

I brought the \(33\times19=641\) tiles at the conference and offered to the first 7 persons to find a \(7\times 7\) tiling the opportunity to keep the 49 tiles they used. 49 is a good number since the frequency of the lowest tile (with id 2) is about 2% which allows to have at least one copy of each tile in a subset of 49 tiles allowing a solution.

A natural question to ask is how many such \(7\times 7\) tilings does there exist? With ticket #25125 that was merged in Sage 8.3 this Spring, it is possible to enumerate and count solutions in parallel with Knuth dancing links algorithm. After the installation of the Sage Optional package slabbe (sage -pip install slabbe), one may compute that there are 152244 solutions.

sage: from slabbe import WangTileSet
sage: tiles = [(2,4,2,1), (2,2,2,0), (1,1,3,1), (1,2,3,2), (3,1,3,3),
....: (0,1,3,1), (0,0,0,1), (3,1,0,2), (0,2,1,2), (1,2,1,4), (3,3,1,2)]
sage: T0 = WangTileSet(tiles)
sage: T0_solver = T0.solver(7,7)
sage: %time T0_solver.number_of_solutions(ncpus=8)
CPU times: user 16 ms, sys: 82.3 ms, total: 98.3 ms
Wall time: 388 ms
152244

One may also get the list of all solutions and print one of them:

sage: %time L = T0_solver.all_solutions(); print(len(L))
152244
CPU times: user 6.46 s, sys: 344 ms, total: 6.8 s
Wall time: 6.82 s
sage: L[0]
A wang tiling of a 7 x 7 rectangle
sage: L[0].table()  # warning: the output is in Cartesian-like coordinates
[[1, 8, 10, 4, 5, 0, 9],
 [1, 7, 2, 5, 6, 1, 8],
 [1, 3, 8, 7, 6, 1, 7],
 [0, 9, 7, 5, 6, 1, 3],
 [0, 9, 3, 7, 6, 1, 8],
 [1, 8, 10, 4, 6, 1, 7],
 [1, 7, 2, 2, 6, 1, 3]]

This is the number of distinct sets of 49 tiles which admits a 7x7 solution:

sage: from collections import Counter
sage: def count_tiles(tiling):
....:     C = Counter(flatten(tiling.table()))
....:     return tuple(C.get(a,0) for a in range(11))
sage: Lfreq = map(count_tiles, L)
sage: Lfreq_count = Counter(Lfreq)
sage: len(Lfreq_count)
83258

Number of other solutions with the same set of 49 tiles:

sage: Counter(Lfreq_count.values())
Counter({1: 49076, 2: 19849, 3: 6313, 4: 3664, 6: 1410, 5: 1341, 7: 705, 8:
293, 9: 159, 14: 116, 10: 104, 12: 97, 18: 44, 11: 26, 15: 24, 13: 10, 17: 8,
22: 6, 32: 6, 16: 3, 28: 2, 19: 1, 21: 1})

How the number of \(k\times k\)-solutions grows for k from 0 to 9:

sage: [T0.solver(k,k).number_of_solutions() for k in range(10)]
[0, 11, 85, 444, 1723, 9172, 50638, 152244, 262019, 1641695]

Unfortunately, most of those \(k\times k\)-solutions are not extendable to a tiling of the whole plane. Indeed the number of \(k\times k\) patches in the language of the minimal aperiodic subshift that I am able to describe and which is a proper subset of Jeandel-Rao tilings seems, according to some heuristic, to be something like:

[1, 11, 49, 108, 184, 268, 367, 483]

I do not share my (ugly) code for this computation yet, as I will rather share clean code soon when times come. So among the 152244 about only 483 (0.32%) of them are prolongable into a uniformly recurrent tiling of the plane.

by Sébastien Labbé at September 07, 2018 09:16 AM

May 02, 2018

OpenDreamKit

Free Computational Mathematics

This is an announcement for a large one week dissemination conference organized by OpenDreamKit at the CIRM premises near Marseille, France. With OpenDreamKit approaching its end (august 2019), this will our main public closing event.

Webpage on the CIRM website for registrations

May 02, 2018 11:01 PM

Jupyter receives the ACM Software System Award!

Project Jupyter has been awarded the 2017 ACM Software System Award, joining an illustrious list of projects that contains major highlights of computing history, including Unix, TeX, S (R’s predecessor), the Web, Mosaic, Java, INGRES (modern databases) and more. We are delighted of this strong recognition of a major piece of the ecosystem OpenDreamKit is based upon and contributing to.

Congratulations to the Jupyter team, including our fellows Min Ragan-Kelley and Thomas Kluyver!

May 02, 2018 12:00 AM

April 24, 2018

Harald Schilly

SageMath GSoC 2018 Projects

We're happy to announce our list of Google Summer of Code projects for 2018! Thank you to everyone involved, all students, mentors and of course, Google! 


Sai Harsh Tondomker (David Coudert and Dima Pasechnik):

Addition of SPQR-tree to graph module

In this project, our goal is to extend the graph theory library in Sage by implementing functionality to find triconnected subgraphs, which will partition an undirected graph into 3-connected components and the construction of the corresponding SPQR-tree. These modules can later be used as pre-processing for several graph problems such as Hamiltonian Cycle, traveling salesman problem.



Meghana M Reddy (David Coudert and Dima Pasechnik):

Addition of SPQR-trees to the graph module of Sage Math

The aim of the project is to code the linear time algorithm for partitioning a graph into 3-connected components and constructing the corresponding SPQR-tree of the graph. Further, this algorithm can be used as a subroutine for several other graph problems such as recognition of chordless graphs, hamiltonian cycle etc.



Filip Ion (Johan Rosenkilde):

Databases and bounds of codes

The following proposal detail some possible improvements of the coding theory component of SageMath. We aim to build databases of precomputed bounds and optimal examples of linear codes for each choice of parameters below a maximum range.



Raghukul Raman (Travis Scrimshaw and Benjamin Hutz):

Rational Point on Varieties

This project aims at implementing basic algorithms for finding rational points on varieties. Classically algebraic variety is defined as the set of solutions of polynomial equations over a number field. A rational point of an algebraic variety is a solution of set of equations in the given field (rational field if not mentioned). Much of the number theory can be viewed as the study of rational point of algebraic varieties. Some of the great achievements of number theory amount to determining the rational points on particular curves. For example, Fermat’s last theorem is equivalent to the statement that for an integer n ≥ 3, the only rational point of the curve xn+yn=zn in P2 over Qare the obvious ones. Common variants of these question include determining the set of all points of V(K) of height up to some bound. The aim of this project is to implement some basic rational point finding algorithms (sieving modulo prime and enumeration) and extend these to product_projective_space scheme.



Dario Asprone (Dima Pasechnik)

Checking graph isomorphism using the Weisfeiler-Lehman algorithm

Currently SageMath checks for graph isomorphism through an internal package with a corresponding method, called isomorphic and contained in sage.groups.perm_gps.partn_ref.refinement_graphs. This method, in addition to being quite convoluted and without much documentation about its inner workings, was last updated in a significant manner in 2011.
The project aims at creating a new package which implements an efficient in practice heuristic method to check if two graphs are isomorphic, using one of the members of the Weisfeiler-Lehman (WL) algorithm hierarchy. An attempt was made in the past at the same task (but with a narrower scope, limited to only implementing the second order version of the WL algorithm), but the code was found to contain a bug and was never implemented/removed from the codebase (see Trac #10482).
To the best of my knowledge, this would be the first working open-source implementation of k-WL, for k > 1.


by Harald Schilly ([email protected]) at April 24, 2018 04:40 PM

April 18, 2018

OpenDreamKit

Spring school in Experimental Mathematics (MathExp2018)

OpenDreamKit is organizing a spring school on experimental mathematics (“Mathématiques Expérimentales”). The first week will focus on learning tools (4 courses given by expert in computer science and mathematics as well as programming exercises). During the second week each participant is asked to work on a programming project related to his or her research.

April 18, 2018 12:00 AM

March 19, 2018

March 15, 2018

OpenDreamKit

Toward versatile JupyterHub deployments, with the Binder and JupyterHub convergence

About this document

Nowadays, many institutions run a JupyterHub server, providing their members with easy access to Jupyter-based virtual environments (a.k.a. notebook servers), preinstalled with a stack of computational software, tailored to the typical needs of the institution’s members. Meanwhile, since a few years ago, Binder lets any user on the internet define, run, and share temporary virtual environments equipped with an arbitrary software stack (examples).

In Fall 2017, Binder was revamped as BinderHub, a lightweight layer on top of JupyterHub. The next step in this convergence is to bring together the best of both worlds: think persistent authenticated Binder; or repo2docker enabled JupyterHub. For now, let’s call them versatile JupyterHub deployments.

This document brainstorms this convergence process: it sets up the ground with a scenario and assumptions for a typical institution-wide JupyterHub deployment, proposes specifications from a user perspective, and describes some typical use cases that would be enabled by such specifications. It further discusses security aspects and what remains to be implemented, before concluding with more advanced features and open questions.

This document started as a collection of private notes reflecting on in-development JupyterHub deployment at Paris-Saclay and EGI respectively, with some additional contributions. They were largely informed by many discussions at March 2018’s JupyterHub coding sprint in Orsay that involved dev-ops of those deployments and two of the main JupyterHub and BinderHub devs: Min Ragan Kelley and Chris Holdgraf. It was also inspired by some of cocalc features. Silly ideas reflecting here are mine, hard work is theirs; thank you all!!!

This document is meant for brainstorming; please hop in and edit.

Typical scenario

An institution – typically a university, a national lab, a transnational research infrastructure such as the European XFEL, or transational infrastructure provider like EGI – wishes to provide its members and users with a Jupyter service.

The service lets user spawn and access personal or collaborative virtual environments: namely a web interface to a light weight virtual machine, in which they can use Jupyter notebooks, run calculations, etc. In the remainder of this document we will use JupyterHub’s terminology and call such virtual environments notebook servers.

To cater for a large variety of use cases in teaching and research, the main aim of the upcoming specifications is to make the service as versatile as possible. In particular, it should empower the users to customize the service (available software stack, storage setup, …), without a need for administrator intervention.

Assumptions

The institution has access to:

  • An authentication service (Single Sign-On)

    Examples: Paris-Sud’ Adonis internal SSO, the federated “Recherche et Enseignement Supérieur” authentication service of Renater, EGI CheckIn, …

  • Computing resources

    Examples: a local cluster, access to a externalized cloud (GC, AWS, Azure, …)

  • A shared volume service using the above authentication service

    E.g. a local NextCloud service, or …

  • (Optional) a forge

    Examples: a local gitlab service, github, … if private repositories are needed, the forge presumably will need the same authentication service

Specifications / User Story

Main page of the service

After authentication, the user faces a page that is similar to binder’s main page:

  • A form to describe and launch the desired persistent notebook server.

    For the sake of simplicity, the form could optionally start hidden, and be replaced by two items: “Choose preconfigured notebook server” / “Create my own notebook server”.

  • Links to documentation

  • Warnings about potential security concerns, to inform the user choices.

    Alternatively, such warnings could be displayed in a later security confirmation dialog “with the given configuration, a malicious image/collaborator could …; do you trust it? Proceed/Edit Configuration/Cancel/Don’t ask again”

  • Institutional credits (service provided by …)

The form consists of:

  • The usual binder items:

    • the description of the computing environment: a repo-to-docker-style git repo+branch+…

    • the file/url to open on startup

    • a UI to get a URL/badge referencing the machine

  • Persistence and access options:

    • server_name: name to give to the server

      If server_name is not specified, create a random server name?

    • mount options: [[mount point, volume url], […]]

      This assumes that the user has appropriate credentials to access the given volumes through the authentication service

    • collaborators=[….]: (optional) a white list of other users of this jupyterhub that can access this server

    • a flag allowing public ‘temporary read-only’ access (meaning that the container and all changes are thrown away at the end of the session; and that any ‘mounted’ data sources are read-only during the session)

      alternatively

    • credentials: whether to pass the user credentials into the container (as environment variable, or file)

    • resources scaling (optional): memory, number of processors, duration (time or time of inactivity after which the server will be automatically stopped / destroyed)

Behavior upon clicking Launch:

  • If a notebook server with the given name already exists and the parameters are not changed (or not set): connect to that server, restarting it if needed

  • If the parameters have been changed, update the existing server when possible? Or should the user just delete the server?

  • Otherwise, create and launch

Behavior upon following a server URL/badge:

  • Display the authentication page (if not yet authenticated)

  • Display a security confirmation dialog as above (if origin is not within the jupyterhub), with a description of the configuration and origin.

  • As above after clicking “Launch”

Some use cases

Local binder (better name? [[email protected]?])

Scenarios:

  • Luc, a researcher, discovered a nice computing environment on Binder. Maybe a notebook demonstrating an interesting workflow to analyze data. He wants to use it more intensively on his own data.

  • Lucy has found a notebook & binder environment published with a paper, and she wants to re-execute the notebook to reproduce the published results and start her research in the field. However, no binder (compute) resources are available in the cloud. The computation takes 20 minutes on a standard PC and she would like to run this calculation on her local server.

Setup:

They recreate the same environment on their local server (for example by just changing the server name in the binder URL).

More advanced scenario to explore: Lucy would like to use her Desktop PC because that resource is readily available and idles 99% of the time.

Easy sharing of computational environments

Scenarios:

  • Sarah, a power user, is using some specialized stack of software on a daily basis; maybe she authored some of it. She wants her colleagues in her lab to try out that software.

  • Paul organizes a training session for his colleagues.

  • Alice has authored notebooks that she wants to share with her colleagues. Maybe the notebooks automatize some complicated processes and present them in the form of interactive web applications built using Jupyter widgets (demo). Think of a wizard to setup parameters and data, run a computation, and visualize the results.

Setup:

They prepare a notebook server with the appropriate software stack installed configured and access to the user’s shared home directory. Maybe they provide some document. They then just have to share the URL with their colleagues. No lengthy software installation. The colleagues can then start working right away, in their own environment, using their own data, saving their work in their home directory.

In all cases, the explicit description of the computing environment (and the use of open source software!) eases:

  • the publication of the same computational environment / notebooks elsewhere, e.g. on a public Binder;
  • the installation the same software on the user’s personal computer.

Collaboration

Scenario: Alice and Bob want to collaborate on some data analysis.

Setup:

They create a shared volume. Then either:

  • They set up each their own notebook server, and let them share the same volume.

  • Alice sets up a single server, with Bob as collaborator. Within the server, they are considered as the same user.

At this stage, they should not edit the same notebook simultaneously. However the stable version of JupyterLab, due sometime in 2018, should enable real-time collaboration in both setups, thanks to a CRDT file format for notebooks.

Class management

Scenario: using the server for a class’ computer labs and assignments

Desired features:

  • Full customizability of the computing environment by the teacher;
  • Support for live manipulation of the class notes;
  • Support for submission, collection and auto-grading of assignments;
  • Access from the computer labs or outside (home, …);
  • Possibility to either use the server, needing only a web browser (no software installation required; supports phones, tablets, …), or install and run the software locally.

Prerequisites:

  • A JupyterHub instance, configured as above, accessible from the teachers and students;
  • A forge such as gitlab or github, accessible from JupyterHub
  • A shared drive service (e.g. next cloud/nsf/…), serving home directories, and letting teachers setup shared volumes
  • A shared authentication (e.g. SSO), so that notebook servers in JupyterHub can access the shared drive.
  • Some web server

Procedure for the teacher(s):

  • Set up a shared volume for the whole class
  • Prepare a computing environment in a git repository on the forge.

    Typically includes: computational software, [nbgrader] + configuration, …

  • Prepare the course material typically in a git repository on the forge (the same one or another)
  • Use JupyterHub’s form UI to setup (and test) a full description of the student’s notebook servers, with mounting of the home directory (or subdirectory thereof?) and shared volume. Possibly add the teacher(s) as collaborator(s) ??? Get the corresponding URL.
  • Possibly prepare a variant thereof for teachers of the class.
  • Set up a web page for the class, with hyperlink(s) to the above URL.

    There can typically be an hyperlink for each session pointing directly to the exercises for that particular session.

Fetching the class material:

  • Option 1: manually download from the web (wget in the terminal, or web upload, …) or shared volume
  • Option 2: use nbgitpuller from the command line
  • Option 3: use nbgrader, either from the command line or with the UI to get the files from the shared volume
  • Option 4: automatize the above using a notebook server extension such as that for nbgitpuller

Submitting assignments:

  • Use nbgrader, either from the command line or with the UI to push the files to the shared volume

To explore: integration with local e-learning platforms like Moodle, typically using LTI, in particular for class management and grades reporting. There already exists an LTI authenticator for Jupyter.

Security concerns

A malicious image description, image, or collaborator can:

  • Take any action within the image being built or within the notebook server

  • Waste computing resources (cpu/memory);

  • With internet: connect to any website, without specific priviledges (e.g. participate to a DOS attack); abuse computing resources, e.g. for bitcoining. The image building almost certainly needs internet access.

  • With persistent storage and internet access: access and leak information from the storage; e.g. private information from the user;

  • With read-write persistent storage: corrupt the storage (e.g. the user’s home directory)

  • With credentials: take any action on behalf of the user in any service that use the same authentication.

Implementation status

Most of the features are already there. Here are some of the missing steps:

  • Extending binder’s form as described above;

  • Implementing the logic to mount shared volumes;

  • Instructions / scripts for setting up a local docker registry;

    The current Binder installation tutorial assumes that a docker registry is already available; e.g. that provided by google cloud services.

    For a smaller setup using the same host for both building images and running notebook servers, no docker registry is needed. In this case, JupyterHub could just run repo2docker locally before launching the notebook server. repo2docker however does not implement image caching; so a simplified version of the image cache mechanism of Binder needs to be implemented.

Alternatives

It should be noted that there is basically no coupling between JupyterHub/Binder and Jupyter. The former is merely a general purpose web service for provisioning web-based virtual environments. For example, JupyterHub/Binder has also been used to serve R-Studio based virtual environments. Reciprocally, there are alternative services to do such provisioning from which to get inspiration, like Simulagora.

Advanced features & open questions

Redirection

The main form could contain an additional item:

  • URL input / dropdown menu to choose another jupyterhub instances to redirect to.

Use cases:

  • A user finds a nice image on binder; he wants to customize it to run it on his institution’s jupyterhub; possibly adding persistent storage to it. Or reciprocally: a user wants to share on binder a local image of his.

  • An institution wants to advertise other jupyterhub instances; this could e.g. be used by a single entry point for federating a collection of instances (e.g. all French academic JupyterHub’s).

Marketplace of images

With the URL mechanism, any (power) user can prepare a dedicated image and share it with his collaborators. Images can be more or less specific: from just a computing environment to a fully specified machine, with mount points, …

Thanks to the above there is no need for a tightly coupled Marketplace. Nevertheless it may be useful to have one location (or more) for collecting and publicizing links to popular images. Some minimal coupling may be needed if one would want to sort the images according to their popularity.

Note: at this stage, a user cannot produce an image by setting up a machine “by hand” and save its state. The construction must be fully scripted. On the plus side, this encourages users to script their images, making them more reproducible.

National and international initiatives such as the European Open Science Cloud may help providing such a catalog of relevant Jupyter notebooks/images.

Default volume configuration

  • Choose good defaults, if at all possible compatible with binder. Main question: where should the files provided by the binder environment be copied? In a subdirectory of the persistent home? In the home directory, with the persistent home being mounted in a subdirectory thereof?

Intensive use and resource management / accounting

The above has been written with casual use in mind. For extensive use, some form of accounting and controlling of the resources used would be needed. For example, for LAL’s cloud we may want to have some form of bridge between the OpenStack dashboard and the hub. UI to be designed. Could the user provision a machine using the dashboard, and then specify on JupyterHub that the container shall be run on that machine?

March 15, 2018 12:00 AM

March 07, 2018

OpenDreamKit

OpenDreamKit at the RSE conference

Open Dream Kit presence at the second RSE conference

The Manchester Museum of Science and Industry (MOSI) saw the second Research Software Engineering (RSE) conference on September 7-8th, 2017. Over 200 attendees gathered to discuss ways to better support and advance science via software, innovative computational tools, and policy making. There were more than 40 talks and 15 workshops covering a diverse range of topics, from community building to imposter syndrome, data visualization, and High-Performance Computing.

Attending the conference is an excellent opportunity to integrate within the international RSE community and appreciate how much this has grown over the last few years. All thanks to the great work done by RSEs within their institutions and their efforts to make software a valuable research asset. It will be, for certain, interesting to see how this will continue to grow and evolve as policy changes take place and more research councils, funding bodies, and research institutions acknowledge the importance of research software and its scientific impact.

OpenDreamKit member, Tania Allard, ran a hands-on workshop on Jupyter notebooks for reproducible research. This workshop focused on the use of Jupyter notebooks as a means to disseminate reproducible analysis workflows and how this can be leveraged using tools such as nbdime and nbval. Both nbdime and nbval were developed by members of the OpenDreamKit project as a response to the growing popularity of the Jupyter notebooks and the lack of native integration between these technologies and existing version control and validation/testing tools.

An exceptional win was that this workshop was, in fact, one of the most popular events of the conference and we were asked to run it twice as it was massively oversubscribed. This reflects, on one hand, the popularity of Jupyter notebooks due to the boom of literate programming and its focus on human-readable code. Allowing researchers to share their findings and the code they used along the way in a compelling narrative. On the other hand, it demonstrates the importance of reproducible science and the need for tools that help RSE and researchers to achieve this goal, which aligns perfectly with the goals of OpenDreamKit.

The workshop revolved around 3 main topics:

  1. Version control of the Jupyter notebooks
  2. Notebooks validation
  3. The basics of reproducible software practices.

The main focus was on how tools like nbdime and nbval can support people already using Jupyter notebooks but have struggled to integrate these with software best development practices due to a number of limitations on the existing tools. Then, we followed on other actions that can be taken to ensure that their data analysis workflows were reproducible and sustainable. This lead to a number of interesting discussions about the topic and allowed for the attendees to share their previous experiences regarding reproducibility and/or the lack thereof in different research areas.

We plan to run a set or workshops around reproducibility over the duration of the ODK project and we’ll make sure to report on them here too. Finally, all the materials are licensed under CC-BY and can be found in this GitHub repository .

March 07, 2018 12:00 AM

February 21, 2018

OpenDreamKit

Research Software Engineer position opening at Université Paris-Sud (tentativelly filled)

This is an announcement for a research software engineer position opening at Université Paris-Sud, working on web-based user interfaces and semantic interoperability layers for mathematical computational systems and databases.

Time line

Interviews in March 2018 for a recruitment as soon as possible.

Update (March 27th): after the interviews on March 21st, we selected and ranked two candidates and made an offer to the first one. Pending administrative hoops, they will take the position.

Salary

For a full-time position, and depending on the applicant’s past experience, between 2000€ and 3000€ of monthly “salaire net” (salary after non-wage labour cost but before income tax). Equivalently, what this salary represents for is a “salaire brut” of up to 46200€ yearly. We have secured funding until the end of the project (August 2019).

Location

The research software engineer will work at the Laboratoire de Recherche en Informatique of Université Paris Sud, in the Orsay-Bures-Gif-Saclay campus, 25 km South-West of Paris city centre.

Mission and activities

Paris Sud is the leading site of OpenDreamKit, with eight participants involved in all the work packages. The research software engineer will join that team and support its efforts in WP4 and WP6, targeting respectively Jupyter-based user interfaces and interoperability for mathematical computational systems and databases. A common theme is how to best exploit the mathematical knowledge embedded in the systems. For some context, see e.g. the recent publications describing the Math-In-The-Middle approach.

More specifically, a successful candidate will be expected to contribute significantly to some of the following tasks (see also OpenDreamKit’s Proposal:

  • Dynamic documentation and exploration system (Task 4.5)

    Introspection has become a critical tool in interactive computation, allowing user to explore, on the fly, the properties and capabilities of the objects under manipulation. This challenge becomes particularly acute in systems like Sage where large parts of the class hierarchy is built dynamically, and static documentation builders like Sphinx cannot anymore render all the available information.

    In this task, we will investigate how to further enhance the user experience. This will include:

    • On the fly generation of Javadoc style documentation, through introspection, allowing e.g. the exploration of the class hierarchy, available methods, etc.

    • Widgets based on the HTML5 and web component standards to display graphical views of the results of SPARQL queries, as well as populating data structures with the results of such queries,

    • D4.16: Exploratory support for semantic-aware interactive Jupyter widgets providing views on objects of the underlying computational or database components. Preliminary steps are demonstrated in the Larch Environment project (see demo videos) and sage-explorer. The ultimate aim would be to automatically generate LMFDB-style interfaces.

    Whenever possible, those features will be implemented generically for any computation kernel by extending the Jupyter protocol with introspection and documentation queries.

  • Memoisation and production of new data (Task 6.9)

    Many CAS users run large and intensive computations, for which they want to collect the results while simultaneously working on software improvements. GAP retains computed attribute values of objects within a session; Sage currently has a limited cached_method. Neither offers storage that is persistent across sessions or supports publication of the result or sharing within a collaboration. We will use, extend and contribute back to, an appropriate established persistent memoisation infrastructure, such as python-joblib, redis-simple-cache or dogpile.cache, adding features needed for storage and use of results in mathematical research. We will design something that is simple to deploy and configure, and makes it easy to share results in a controlled manner, but provides enough assurance to enable the user to rely on the data, give proper credit to the original computation and rerun the computation if they want to.

  • Knowledge-based code infrastructure (Task 6.5)

    Over the last decades, computational components, and in particular Axiom, MuPAD, GAP, or Sage, have embedded more and more mathematical knowledge directly inside the code, as a way to better structure it for expressiveness, flexibility, composability, documentation, and robustness. In this task we will review the various approaches taken in these software (e.g. categories and dynamic class hierarchies) and in proof assistants like Coq (e.g. static type systems), and compare their respective strength and weaknesses on concrete case studies. We will also explore whether paradigms offered by recent programming languages like Julia or Scala could enable a better implementation. Based on this we will suggest and experiment with design improvements, and explore challenges such as the compilation, verification, or interoperability of such code.

Skills and background requirements

  • Degree in mathematics or computer science; PhD appreciated but not required;

  • Strong programming experience with languages such as Python, Scala, Javascript, etc; experience with web technologies in general and the Jupyter stack in particular appreciated;

  • Experience in software design and practical implementation in large software projects; experience with computational mathematics software (e.g. SageMath) appreciated;

  • Experience in open-source development (collaborative development tools, interaction with the community, …);

  • Strong communication skills;

  • Fluency in oral and written English; speaking French is not a prerequisite.

Context

The position will be funded by OpenDreamKit, a Horizon 2020 European Research Infrastructure project that will run for four years, starting from September

  1. This project brings together the open-source computational mathematics ecosystem – and in particular LinBox, MPIR, SageMath, GAP, PARI/GP, LMFDB, Singular, MathHub, and the IPython/Jupyter interactive computing environment. – toward building a flexible toolkit for Virtual Research Environments for mathematics. Lead by Université Paris-Sud, this project involves about 50 people spread over 15 sites in Europe, with a total budget of about 7.6 million euros.

Within this ecosystem, the applicant will work primarily on the free open-source mathematics software system Sagemath. Based on the Python language and many existing open-source math libraries, SageMath is developed since 10 years by a worldwide community of 300 researchers, teachers and engineers, and has reached 1.5M lines of code.

The applicant will work within one of the largest teams of SageMath developers, composed essentially of researchers in mathematics and computer science, at the Laboratoire de Recherche en Informatique (LRI) and in nearby institutions. The LRI also hosts a strong team working on proof systems.

Applications

To apply for this position, please send an e-mail to upsud-recruitement-research-engineer at opendreamkit.org by March 10, with the following documents (in English) attached:

  • cover_letter.pdf: a cover letter explaining your interest in this particular position;

  • CV.pdf: a CV, highlighting among other things your skills and background and your contributions to open source software;

  • degree.pdf: copy of your most recent degree including (if applicable) the reviewers reports;

  • reference letters: files reference_letter_.pdf or contact information of potential referees.

Applications sent after March 10 will be considered until the position is filled.

February 21, 2018 12:00 AM

February 19, 2018

The Matroid Union

Google Summer of Code

As you might know, SageMath is a software system for mathematical computation. Built on Python, it has extensive libraries for numerous areas of mathematics. One of these areas is Matroid Theory, as has been exhibited several times on this blog.

Google Summer of Code is a program where Google pays students to work on open-source software during the summer.

Once again, SageMath has been selected as a mentoring organization for the Google Summer of Code. We’ve had students work on aspects of the Matroid Theory functionality for the past four years. Maybe this year, YOU can join those illustrious ranks! Check out the call for proposals and ideas list. Read the instructions on both pages carefully. Applications open on March 12, so it’s a good idea to start talking to potential mentors and begin writing your proposal!

by Stefan van Zwam at February 19, 2018 02:52 PM

February 06, 2018

OpenDreamKit

Remote project meeting

This is an online project meeting to review all achievements since March 2017.

February 06, 2018 12:00 AM

January 23, 2018

OpenDreamKit

Expérimentation mathématique et combinatoire avec Sage

Viviane Pons gave a two hours lecture on mathematical experimentation, research and open-source mathematical software development for a seminar organized by the students of the prestigious school Ecole Normale supérieure de Lyon.

January 23, 2018 12:00 AM

January 01, 2018

William Stein

Low latency local CoCalc and SageMath on the Google Pixelbook: playing with Crouton, Gallium OS, Rkt, Docker

I just got CoCalc fully working locally on my Google Pixelbook Chromebook! I want this, since (1) I was inspired by a recent blog post about computer latency, and (2) I'm going to be traveling a lot next week (the JMM in San Diego -- come see me at the Sage booth), and may have times without Internet during which I want to work on CoCalc's development.


I first tried Termux, which is a "Linux in Android" userland that runs on the Pixelbook (via Android), but there were way, way too many problems for CoCalc, which is a very complicated application, so this was out. The only option was to enable ChromeOS dev mode.

I next considered partitioning the hard drive, installing Linux natively (in addition to ChromeOS), and dual booting. However, it seems the canonical option is Gallium OS and it nobody has got that to work with Pixelbook yet (?). In fact, it appears that Gallium OS development made have stopped a year ago (?). Bummer. So I gave up on that approach...

The next option was to try Crouton + Docker, since we have a CoCalc Docker image. Unfortunately, it seems currently impossible to use Docker with the standard ChromeOS kernel.  The next thing I considered was to use Crouton + Rkt, since there are blog posts claiming Rkt can run vanilla Docker containers on Crouton.

I setup Crouton, installed the cli-extra chroot, and easily installed Rkt. I learned how Rkt is different than Docker, and tried a bunch of simple standard Docker containers, which worked. However, when I tried running the (huge) CoCalc Docker container, I hit major performance issues, and things broke down. If I had the 16GB Chromebook and more patience, maybe this would have worked. But with only 8GB RAM, it really wasn't feasible.

The next idea was to just use Crouton Linux directly (so no containers), and fix whatever issues arose. I did this, and it worked well, with CoCalc giving me a very nice local browser-based interface to my Crouton environment. Also, since we've spent so much time optimizing CoCalc to be fast over the web, it feels REALLY fast when used locally. I made some changes to the CoCalc sources and added a directory, to hopefully make this easier if anybody else tries. This is definitely not a 1-click solution.

Finally, for SageMath I first tried the Ubuntu PPA, but realized it is hopelessly out of date. I then downloaded and extracted the Ubuntu 16.04 binary and it worked fine. Of course, I'm also building Sage from source (I'm the founder of SageMath after all), but that takes a long time...


Anyway, Crouton works really, really well on the Pixelbook, especially if you do not need to run Docker containers.

by William Stein ([email protected]) at January 01, 2018 10:29 PM