July 22, 2016

William Stein

DataDog's pricing: don't make the same mistake I made

I stupidly made a mistake recently by choosing to use DataDog for monitoring the infrastructure for my startup (SageMathCloud).

I got bit by their pricing UI design that looks similar to many other sites, but is different in a way that caused me to spend far more money than I expected.

I'm writing this post so that you won't make the same mistake I did.  As a product, DataDog is of course a lot of hard work to create, and they can try to charge whatever they want. However, my problem is that what they are going to charge was confusing and misleading to me.

I wanted to see some nice web-based data about my new autoscaled Kubernetes cluster, so I looked around at options. DataDog looked like a new and awesomely-priced service for seeing live logging. And when I looked (not carefully enough) at the pricing, it looked like only $15/month to monitor a bunch of machines. I'm naive about the cost of cloud monitoring -- I've been using Stackdriver on Google cloud platform for years, which is completely free (for now, though that will change), and I've also used self hosted open solutions, and some quite nice solutions I've written myself. So my expectations were way out of whack.

Ever busy, I signed up for the "$15/month plan":

One of the people on my team spent a little time and installed datadog on all the VM's in our cluster, and also made DataDog automatically start running on any nodes in our Kubernetes cluster. That's a lot of machines.

Today I got the first monthly bill, which is for the month that just happened. The cost was $639.19 USD charged to my credit card. I was really confused for a while, wondering if I had bought a year subscription.

After a while I realized that the cost is per host! When I looked at the pricing page the first time, I had just saw in big letters "$15", and "$18 month-to-month" and "up to 500 hosts". I completely missed the "Per Host" line, because I was so naive that I didn't think the price could possibly be that high.

I tried immediately to delete my credit card and cancel my plan, but the "Remove Card" button is greyed out, and it says you can "modify your subscription by contacting us at [email protected]":

So I wrote to [email protected]:

Dear Datadog,

Everybody on my team was completely mislead by your
horrible pricing description.

Please cancel the subscription for wstein immediately
and remove my credit card from your system.

This is the first time I've wasted this much money
by being misled by a website in my life.

I'm also very unhappy that I can't delete my credit
card or cancel my subscription via your website. It's
like one more stripe API call to remove the credit card
(I know -- I implemented this same feature for my site).

And they responded:

Thanks for reaching out. If you'd like to cancel your
Datadog subscription, you're able to do so by going into
the platform under 'Plan and Usage' and choose the option
downgrade to 'Lite', that will insure your credit card
will not be charged in the future. Please be sure to
reduce your host count down to the (5) allowed under
the 'Lite' plan - those are the maximum allowed for
the free plan.

Also, please note you'll be charged for the hosts
monitored through this month. Please take a look at
our billing FAQ.

They were right -- I was able to uninstall the daemons, downgrade to Lite, remove my card, etc. all through the website without manual intervention.

When people have been confused with billing for my site, I have apologized, immediately refunded their money, and opened a ticket to make the UI clearer.  DataDog didn't do any of that.

I wish DataDog would at least clearly state that when you use their service you are potentially on the hook for an arbitrarily large charge for any month. Yes, if they had made that clear, they wouldn't have had me as a customer, so they are not incentivized to do so.

A fool and their money are soon parted. I hope this post reduces the chances you'll be a fool like me.  If you chose to use DataDog, and their monitoring tools are very impressive, I hope you'll be aware of the cost.


On Hacker News somebody asked: "How could their pricing page be clearer? It says per host in fairly large letters underneath it. I'm asking because I will be designing a similar page soon (that's also billed per host) and I'd like to avoid the same mistakes."  My answer:

[EDIT: This pricing page by the top poster in this thread is way better than I suggest below -- https://www.serverdensity.com/pricing/]

1. VERY clearly state that when you sign up for the service, then you are on the hook for up to $18*500 = $9000 + tax in charges for any month. Even Google compute engine (and Amazon) don't create such a trap, and have a clear explicit quota increase process.
2. Instead of "HUGE $15" newline "(small light) per host", put "HUGE $18 per host" all on the same line. It would easily fit. I don't even know how the $15/host datadog discount could ever really work, given that the number of hosts might constantly change and there is no prepayment.
3. Inform users clearly in the UI at any time how much they are going to owe for that month (so far), rather than surprising them at the end. Again, Google Cloud Platform has a very clear running total in their billing section, and any time you create a new VM it gives the exact amount that VM will cost per month.
4. If one works with a team, 3 is especially important. The reason that I had monitors on 50+ machines is that another person working on the project, who never looked at pricing or anything, just thought -- he I'll just set this up everywhere. He had no idea there was a per-machine fee.

by William Stein ([email protected]) at July 22, 2016 02:17 PM

June 27, 2016

Extending Matroid Functionality Google Summer of Code 2016

Midterm ish

My summer of code is broken up into several projects. There were a lot of small ones, a couple medium ones, and one large one. Right now, I'm in the midst of working on the large project. Basically, we want to feed Sage a collection of subsets of an edge set E, and have Sage tell us if there is a graph that has cycles which correspond to the subsets of E, and if so, to give a corresponding. This boils down to asking if a matroid is graphic, and asking for a graph that realizes the matroid.

For instance, if we give have E = {1, 2, 3, 4}, and our collection of sets is any three element subset of E, then we can't get an appropriate graph. To see this, we start constructing a graph. Our first cycle is {1, 2, 3}, There is only one graph on three elements that has this cycle, namely a triangle. To add the edge 4, we need to have a cycle {1, 2, 4}. But this means that we have to add 4 in parallel to the edge 3. This is a problem, because then {1, 3, 4}, in particular, is not a cycle of our graph.

This example illustrates a key idea of the algorithm. The set {1, 2} is a maximal set that is not contained in a cylce, so we skipped over those elements, and started with 3. We then added 3 and any needed elements of {1, 2} to our partial graph. And we kept adding elements till we either had a problem, or till we added all of the elements.

In our case, we didn't get so complicated of a graph that we had a choice about which graph to use for our partial graph. In general, this is not the case. It would be troublesome to check if we could add the new element to every graphs that realizes the already added elements, so we use a decomposition made possible by Whitney's 2 isomorphism theorem to check all of the graphs options at once. This of course makes the code more complicated. The algorithm that we are following comes from a paper by Ronald Bixby and Donald Wagner.

The tricky part, so far, has been trying to get information in and out of graphs. graph theorists care a lot about the vertices of a graph and much less about the edges of the graph. That is, they store their edges as a list of the two vertices that they are incident with, and a possible label. matroid theorists, however, care a lot more about the edges of a graph. This is true in general, and is true in particular for this project.

by [email protected] (Tara) at June 27, 2016 05:41 PM

June 23, 2016

Lauren Devitt

GSOC Midterm Update

One of the great joys in life that not all will be able to experience is when your code builds. When you have fixed all the syntax errors you build it, you test it, and you see ---------------------------------------------------------------------- All tests passed! ---------------------------------------------------------------------- Having to wait for your code to build is agonizing. You pray to some God, magic force, or Steve Jobs to help you in this time of need. Then it works and you feel

June 23, 2016 12:00 AM

June 09, 2016

Lauren Devitt

GSOC Week 2 Update

Two weeks into GSOC 2016 I’m so grateful that this is how I get to spend my summer. Using my coding muscles has made me stronger and more confident with my code. I am ready to create my first ticket in SAGE relating to GSOC. Sage uses Trac, which is an open source project managing software. It allows SAGE to track what people are currently working on for SAGE. It does this by are giving a ticket number to each piece of functionality pushed to the server. Peers, who pull them from the server,

June 09, 2016 12:00 AM

June 03, 2016

Extending Matroid Functionality Google Summer of Code 2016

First Week or so

Before coding started, I spent some time on code academy getting more familiar with the syntax of Python. I was impressed with the setup that they had (I would recommend it to my mom), and it helped me to learn python in a systematic way.

Since the 23rd I've been working on adding certificated (proof that we gave the right answer to a yes-no question) to some of the functions in the matroid part of Sage. For the first two days, I spent a lot of time trying to get Sage to compile. For a while, the problem was an error in a new release, and then I had some type of trouble on my end. I've also spent a good amount of time figuring out the ins and outs of documentation practices.

by [email protected] (Tara) at June 03, 2016 12:30 AM

May 21, 2016

Lauren Devitt

GSOC 2016

What I love about programing is it is akin to solving a logic puzzle. You have all the pieces you need to solve it, you're allowed to search the internet for assistance, but the internet will not give you the answer, you still have to find it on you own. If you have misread one letter or decoded one word wrong, you will not find the answer. You can spend twice as much time trying to solve the problem than actually solving the puzzle. That is why in coding, as with logic puzzles,

May 21, 2016 12:00 AM

May 20, 2016

Lauren Devitt

For love of numbers...

My love of mathematics started with a love of numbers. I enjoyed finding all possible ways I could add, subtract, and multiply different numbers in order to find a specific number, say twelve.  Twelve was my favorite number; I loved twelve. We become attached to numbers that have a deeper meaning to us; numbers that make us feel, make us remember. Sesame Street had the “Pinball Song”; this song consisted of counting to twelve with a catchy jingle every child could remember.

May 20, 2016 12:00 AM

May 09, 2016

Extending Matroid Functionality Google Summer of Code 2016

Getting Started

I first heard about Google Summer of Code a little over a year ago. It was something that I wanted to do for several reasons. I only had a chance to take a couple of programing classes in undergrad. (I didn't realized that I liked it till part way through my Junior year.) Since then, I've wanted to grow the length and complexity of projects that I was capable of successfully working on. Secondly, I like the idea of open source resources, because its free, and that lets poor college students use cool resources.

My project is building and expanding tools in Sage to be used by people studying matroid theory. A matroid is a notion of independence that generalizes the independence structure that is found in vector spaces and that comes from looking at cycleless subgraphs of graphs. Sage already has a lot of tools that let people work with matroids, mostly created by Stefan van Zwam and Rudi Pendavingh. My project focuses on a small collection of new tools.

I'll be working with Stefan and Michael on this project.

by [email protected] (Tara) at May 09, 2016 05:29 PM

March 10, 2016

William Stein

Open source is now ready to compete with Mathematica for use in the classroom

When I think about what makes SageMath different, one of the most fundamental things is that it was created by people who use it every day.  It was created by people doing research math, by people teaching math at universities, and by computer programmers and engineers using it for research.  It was created by people who really understand computational problems because we live them.  We understand the needs of math research, teaching courses, and managing an open source project that users can contribute to and customize to work for their own unique needs.

The tools we were using, like Mathematica, are clunky, very expensive, and just don't do everything we need.  And worst of all, they are closed source software, meaning that you can't even see how they work, and can't modify them to do what you really need.  For teaching math, professors get bogged down scheduling computer labs and arranging for their students to buy and install expensive software.

So I started SageMath as an open source project at Harvard in 2004, to solve the problem that other math software is expensive, closed source, and limited in functionality, and to create a powerful tool for the students in my classes.  It wasn't a project that was intended initially as something to be used by hundred of thousands of people.  But as I got into the project and as more professors and students started contributing to the project, I could clearly see that these weren't just problems that pissed me off, they were problems that made everyone angry.

The scope of SageMath rapidly expanded.  Our mission evolved to create a free open source serious competitor to Mathematica and similar closed software that the mathematics community was collective spending hundreds of millions of dollars on every year. After a decade of work by over 500 contributors, we made huge progress.

But installing SageMath was more difficult than ever.  It was at that point that I decided I needed to do something so that this groundbreaking software that people desperately needed could be shared with the world.

So I created SageMathCloud, which is an extremely powerful web-based collaborative way for people to easily use SageMath and other open source software such as LaTeX, R, and Jupyter notebooks easily in their teaching  and research.   I created SageMathCloud based on nearly two decades of experience using math software in the classroom and online, at Harvard, UC San Diego, and University of Washington.

SageMathCloud is commercial grade, hosted in Google's cloud, and very large classes are using it heavily right now.  It solves the installation problem by avoiding it altogether.  It is entirely open source.

Open source is now ready to directly compete with Mathematica for use in the classroom.  They told us we could never make something good enough for mass adoption, but we have made something even better.  For the first time, we're making it possible for you to easily use Python and R in your teaching instead of Mathematica; these are industry standard mainstream open source programming languages with strong support from Google, Microsoft and other industry leaders.   For the first time, we're making it possible for you to collaborate in real time and manage your course online using the same cutting edge software used by elite mathematicians at the best universities in the world.

A huge community in academia and in industry are all working together to make open source math software better at a breathtaking pace, and the traditional closed development model just can't keep up.

by William Stein ([email protected]) at March 10, 2016 08:02 AM

March 03, 2016

Liang Ze

Noncommutative Algebras in Sage

In this post, I’ll demonstrate 3 ways to define non-commutative rings in Sage. They’re essentially different ways of expressing the non-commutative relations in the ring:

  1. Via g_algebra: define the relations directly
  2. Via NCPolynomialRing_plural: define a pair of structural matrices
  3. Via a quotient of a letterplace ring: define the ideal generated by the relations (only works for homogeneous relations)

As far as I know, all 3 methods rely on Sage’s interface with Singular and its non-commutative extension Plural.

In addition to all the documentation linked above, I also relied heavily on Greuel and Pfister’s A Singular Introduction to Commutative Algebra. Despite the title, it does have a pretty substantial section (1.9) devoted to non-commutative $G$-algebras.

$U(\mathfrak{sl}_2)$ and its homogenization

The running example throughout this post will be the universal enveloping algebra $U(\mathfrak{sl}_2)$ over $\mathbb{Q}$.

We’ll define this to be the (non-commutative) $\mathbb{Q}$-algebra $U$ with generators $e,f,h$ subject to the relations

If we set $e,f,h$ to have degree 1, these relations are not homogeneous. Their left-hand sides only have degree 2 terms, while their right-hand sides have degree 1 terms as well. This is fine with the first two methods, but won’t work for method 3 (which requires homogeneous relations).

To demonstrate the third method, we’ll define the $\mathbb{Q}$-algebra $H$ with generators $e,f,h,t$ subject to the homogeneous relations

We can obtain $U$ both as a quotient and a localization of $H$:


Using the g_algebra method of Sage’s FreeAlgebra class, we can simply plug our noncommutative relations in, and get our non-commutative ring. This is about as easy as it gets:

Let’s unravel what’s going on here.

Monomial orderings and PBW basis

Most algorithms for commutative and non-commutative rings require an ordering on the generators. In our case, let’s use the ordering

This is implicitly stated in our code: we wrote F.<e,f,h> instead of F.<h,e,f>, for example.

A standard word is a monomial of the form

In the polynomial ring $\mathbb{Q}[e,f,h]$, every monomial can be expressed in this form, so the set of standard words forms a $\mathbb{Q}$-basis for $\mathbb{Q}[e,f,h]$.

In a non-commutative ring, whether or not the standard words form a basis depends on what relations we have. Such a basis, if it exists, is called a PBW basis.

The free algebra $F = \mathbb{Q}\langle e,f,h\rangle$ has no relations, so does not have a PBW basis. Fortunately, our algebra $U$ does have a PBW basis.

This means that we can always express a non-standard monomial (e.g. $fe$) as a sum of standard monomials (e.g. $ef - h$). The non-commutative relations that define $U$ can thus be thought of as an algorithm for turning non-standard words into sums of standard words.

To do this in Sage, we define a dictionary whose keys are non-standard words and values are the standard words they become.

In the above example, our dictionary was short enough to fit into one line, but we could also define a dictionary separately and pass it into g_algebra:

It’s very important that the keys are non-standard words and the values are sums of standard words. Mathematically, the relation $fe = ef - h$ is the same as $ef = fe + h$, but if we replace f*e : e*f - h with e*f : f*e + h in the code, we’ll get an error (try it!).

What are $G$-algebras?

The reason why $U$ has a PBW basis is because it is a $G$-algebra. Briefly, $G$-algebras are algebras whose relations satisfy certain non-degeneracy conditions that make the algebra nice to work with.

For a full definition of $G$-algebras, refer to A Singular Introduction to Commutative Algebra or the Plural manual.

If $A$ is a $G$-algebra, then it has a PBW basis, is left and right Noetherian, and is an integral domain. More importantly (for this site at least!), it means that we can define $A$ in Singular/Plural, and hence in Sage.

Structural matrices for a $G$-algebra

Another way of writing our non-commutative relations is

where $ * $ denotes element-wise multiplication (so there isn’t any linear algebra going on here; we’re just using matrices to organize the information). Let $N,C,S,D$ be the matrices above, in that order, so that $N = C*S + D$.

If we let $x_1 = e, x_2 = f, x_3 = h$ (so that $x_i \leq x_j$ if $i \leq j$) then for $i < j$

In other words, $N$ contains the non-standard words that we’re trying to express in terms of the standard words in $S$.

The matrices $C$ and $D$ are called the structural matrices of the $G$-algebra, and their entries are such that our relations may be written

with zeros everywhere else ($i \geq j$). If $C = D = 0$, the resulting algebra will be commutative.

We can use the structural matrices $C$ and $D$ to define our algebra via Sage’s NCPolynomialRing_plural function (note that Python uses zero-indexing for matrices):

Note that R is a commutative polynomial ring. In fact, up till the point where we call NCPolynomialRing_plural, even the variables e,f,h are treated as commutative variables.

This method of defining $U$ is considerably longer and more prone to mistakes than using g_algebra. As stated in the documentation, this is not intended for use! I’m including it here because this is essentially how one would go about defining a $G$-algebra in Singular. In fact, the Sage method g_algebra calls NCPolynomialRing_plural, which in turn calls Singular.

Quotients of letterplace rings

Our final method for defining non-commutative rings makes use of Sage’s implementation of Singular’s letterplace rings.

As mentioned at the start of this post, this method requires the relations to be homogeneous, so we’ll work with $H$ instead of $U$.

Let $\mathbb{Q}\langle e,f,h,t \rangle$ be the free algebra on 4 variables. Consider the two-sided ideal $I$ generated by the relations for $H$:


This can be expressed Sage-ly:

The expression F*I*F is the two-sided ideal generated by elements in the list I.

Although $U$ cannot be defined using this method, $H$ can be defined using all three methods. As a (fun?) exercise, try defining $H$ using the other two methods.


These methods can be used to define many non-commutative algebras such as the Weyl algebra and various enveloping algebras of Lie algebras. One can also define these algebras over fields other than $\mathbb{Q}$, such as $\mathbb{C}$ or $\mathbb{F}_p$.

However, we cannot define algebras over $\mathbb{Q}(q)$, the fraction field of $\mathbb{Q}[q]$:

This is a problem if we want to define rings with relations such as

Such relations occur frequently when studying quantum groups, for example.

This is suprising, because one can easily define $\mathbb{Q}(q)$ and non-commutative $\mathbb{Q}(q)$-algebras in Singular/Plural, which is what Sage is using. It seems that the problem is in Sage’s wrapper for Singular/Plural, because Sage can’t even pass the ring $\mathbb{Q}(q)$ to Singular.

There’s a trac ticket for this problem, but until it gets resolved, we’ll just have to define such rings directly in Singular/Plural. Thanks to the amazing capabilities of the Sage Cell Server, we’ll do this in the next post!

March 03, 2016 12:00 AM

February 25, 2016

William Stein

"If you were new faculty, would you start something like SageMathCloud sooner?"

I was recently asked by a young academic: "If you were a new faculty member again, would you start something like SageMathCloud sooner or simply leave for industry?" The academic goes on to say "I am increasingly frustrated by continual evidence that it is more valuable to publish a litany of computational papers with no source code than to do the thankless task of developing a niche open source library; deep mathematical software is not appreciated by either mathematicians or the public."

I wanted to answer that "things have gotten better" since back in 2000 when I started as an academic who does computation. Unfortunately, I think they have gotten worse. I do not understand why. In fact, this evening I just received the most recent in a long string of rejections by the NSF.

Regarding a company versus taking a job in industry, for me personally there is no point in starting a company unless you have a goal that can only be accomplished via a company, since building a business from scratch is extremely hard and has little to do with math or research. I do have such a goal: "create a viable open source alternative to Mathematica, etc...". I was very clearly told by Michael Monagan (co-founder of Maplesoft) in 2006 that this goal could not be accomplished in academia, and I spent the last 10 years trying to prove him wrong.

On the other hand, leaving for a job in industry means that your focus will switch from "pure" research to solving concrete problems that make products better for customers. That said, many of the mathematicians who work on open source math software do so because they care so much about making the experience of using math software much better for the math community. What often drives Sage developers is exactly the sort of passionate care for "consumer focus" and products that also makes one successful in industry. I'm sure you know exactly what I mean, since it probably partly motivates your work. It is sad that the math community turns its back on such people. If the community were to systematically embrace them, instead of losing all these $300K+/year engineers to mathematics entirely -- which is exactly what we do constantly -- the experience of doing mathematics could be massively improved into the future. But that is not what the community has chosen to do. We are shooting ourselves in the foot.

Now that I have seen how academia works from the inside over 15 years I'm starting to understand a little why these things change very slowly, if ever. In the mathematics department I'm at, there are a small handful of research areas in pure math, and due to how hiring works (voting system, culture, etc.) we have spent the last 10 years hiring in those areas little by little (to replace people who die/retire/leave). I imagine most mathematics departments are very similar. "Open source software" is not one of those traditional areas. Nobody will win a Fields Medal in it.

Overall, the mathematical community does not value open source mathematical software in proportion to its value, and doesn't understand its importance to mathematical research and education. I would like to say that things have got a lot better over the last decade, but I don't think they have. My personal experience is that much of the "next generation" of mathematicians who would have changed how the math community approaches open source software are now in industry, or soon will be, and hence they have no impact on academic mathematical culture. Every one of my Ph.D. students are now at Google/Facebook/etc.

We as a community overall would be better off if, when considering how we build departments, we put "mathematical software writers" on an equal footing with "algebraic geometers". We should systematically consider quality open source software contributions on a potentially equal footing with publications in journals.

To answer the original question, YES, knowing what I know now, I really wish I had started something like SageMathCloud sooner. In fact, here's the previously private discussion from eight years ago when I almost did.


- There is a community generated followup ...

by William Stein ([email protected]) at February 25, 2016 03:14 PM

February 24, 2016

William Stein

Elliptic curves: Magma versus Sage

Elliptic Curves

Elliptic curves are certain types of nonsingular plane cubic curves, e.g., y^2 = x^3 + ax +b, which are central to both number theory and cryptography (e.g., they are used to compute the hash in bitcoin).

Magma and Sage

If you want to do a wide range of explicit computations with elliptic curves, for research purposes, you will very likely use SageMath or Magma. If you're really serious, you'll use both.

Both Sage and Magma are far ahead of all other software (e.g., Mathematica, Maple and Matlab) for elliptic curves.

A Little History

When I started contributing to Magma in 1999, I remember that Magma was way, way behind Pari. I remember having lunch with John Cannon (founder of Magma), and telling him I would no longer have to use Pari if only Magma would have dramatically faster code for computing point counts on elliptic curves.

A few years later, John wisely hired Mark Watkins to work fulltime on Magma, and Mark has been working there for over a decade. Mark is definitely one of the top people in the world at implementing (and using) computational number theory algorithms, and he's ensured that Magma can do a lot. Some of that "do a lot" means catching up with (and surpassing!) what was in Pari and Sage for a long time (e.g., point counting, p-adic L-functions, etc.)

However, in addition, many people have visited Sydney and added extremely deep functionality for doing higher descents to Magma, which is not available in any open source software. Search for Magma in this paper to see how, even today, there seems to be no open source way to compute the rank of the curve y2 = x3 + 169304x + 25788938.  (The rank is 0.)

Two Codebases

There are several elliptic curves algorithms available only in Magma (e.g., higher descents) ... and some available only in Sage (L-function rank bounds, some overconvergent modular symbols, zeros of L-functions, images of Galois representations). I could be wrong about functionality not being in Magma, since almost anything can get implemented in a year...

The code bases are almost completely separate, which is a very good thing. Any time something gets implemented in one, it gets (or should get) tested via a big run on elliptic curves up to some bound in the other. This sometimes results in bugs being found. I remember refereeing the "integral points" code in Sage by running it against all curves up to some bound and comparing to what Magma output, and getting many discrepancies, which showed that there were bugs in both Sage and Magma.
Thus we would be way better off if Sage could do everything Magma does (and vice versa).

by William Stein ([email protected]) at February 24, 2016 12:07 PM

February 18, 2016

Sébastien Labbé

unsupported operand parent for *, Matrix over number field, vector over symbolic ring

Yesterday I received this email (in french):

avec Thomas on a une question bête:


J'aimerais multiplier une matrice avec des coefficients en x par un vecteur
contenant des variables a et b.  Il dit "unsupported operand parent for *,
Matrix over number field, vector over symbolic ring"

Est ce grave ?

Here is my answer. Indeed, in Sage, symbolic variables can't multiply with elements in an Number Field in x:

sage: x = var('x')
sage: K.<x> = NumberField(x*x-x-1)
sage: a = var('a')
sage: a*x
Traceback (most recent call last)
TypeError: unsupported operand parent(s) for '*': 'Symbolic Ring' and
'Number Field in x with defining polynomial x^2 - x - 1'

But, we can define a polynomial ring with variables in a,b and coefficients in the NumberField. Then, we are able to multiply a with x:

sage: x = var('x')
sage: K.<x> = NumberField(x*x-x-1)
sage: K
Number Field in x with defining polynomial x^2 - x - 1
sage: R.<a,b> = K['a','b']
sage: R
Multivariate Polynomial Ring in a, b over Number Field in x with
defining polynomial x^2 - x - 1
sage: a*x

With two square brackets, we obtain powers series:

sage: R.<a,b> = K[['a','b']]
sage: R
Multivariate Power Series Ring in a, b over Number Field in x with
defining polynomial x^2 - x - 1
sage: a*x*b

It works with matrices:

sage: MS = MatrixSpace(R,2,2)
sage: MS
Full MatrixSpace of 2 by 2 dense matrices over Multivariate Power
Series Ring in a, b over Number Field in x with defining polynomial
x^2 - x - 1
sage: MS([0,a,b,x])
[  0   a]
[  b (x)]
sage: m1 = MS([0,a,b,x])
sage: m2 = MS([0,a+x,b*b+x,x*x])
sage: m1 + m2 * m1
[              (x)*b + a*b       (x + 1) + (x + 1)*a]
[                (x + 2)*b (3*x + 1) + (x)*a + a*b^2]

by Sébastien Labbé at February 18, 2016 10:17 AM

February 17, 2016

Liang Ze

The Weyl Algebra and $\mathfrak{sl}_2$

I’ve been away from this blog for quite a while - almost a year, in fact! My excuses are my wedding and the prelims (a.k.a. quals), as well as all the preparation that had to go into them (although, to be honest, those things only occupied me till September last year!).

Looking back at my previous posts, I’ve realized that in attempting to teach both math and code, I probably ended up doing neither. This is really not the best place to learn representation theory (for example) - there are better books and blogs out there. Also, most of the code that I wrote to illustrate those posts feels contrived, and neither highlights Sage’s strengths nor reflects how I normally use Sage for my assignments and projects.

I’ve thus decided to write shorter posts with code that I actually use (on SageMathCloud), along with some explanations of the code. Lately, I’ve been writing code for non-commutative algebra and combinatorics, so today I’ll start with a simple example of a non-commutative algebra.

The Weyl Algebra

The $1$-dim. Weyl algebra is the (non-commutative) algebra generated by $x, \partial_x$ subject to the relations

If we treat $x$ as “multiplication by $x$” and $\partial_x$ as “differentiation w.r.t. $x$”, this relation is really just an application of the chain rule:

We can generalize to higher dimensions: the $n$-dim. Weyl algebra is the algebra generated by $x_1,\dots,x_n,\partial_{x_1},\dots,\partial_{x_n}$ quotiented by the relations that arise from treating them as the obvious operators on $\mathbb{F}[x_1,\dots,x_n]$.

Weyl algebras in Sage

It’s easy to define the Weyl algebra in Sage:

Calling inject_variables allows us to use the operators x,y,z,dx,dy,dz in subsequent code (where dx denotes $\partial_x$, etc).

One can do rather complicated computations:

By default, Sage chooses to represent monomials with x,y,z in front of dx,dy,dz:

Keep in mind that x does not refer to the polynomial $x \in \mathbb{F}[x]$, so one should not expect dx*x to be 1.

(For some reason show does not give the right output. Try show(x) or show(x*dx), for example.)

Representations of $\mathfrak{sl}_2$

It turns out that the $1$-dim. Weyl algebra gives a representation of $\mathfrak{sl}_2(\mathbb{F})$.

The Lie algebra $\mathfrak{sl}_2(\mathbb{F})$ is generated by $E,F,H$ subject to the relations

Define the following elements of the $1$-dim. Weyl algebra:

We can use Sage to quickly verify that these elements indeed satisfy the relations for $\mathfrak{sl}_2$ (using the commutator as the Lie bracket i.e. $[A,B] = AB - BA$):

Working over $\mathbb{C}$, this action of $\mathfrak{sl}_2(\mathbb{C})$ makes $\mathbb{C}[x]$ a Verma module of highest weight $0$.

In fact, we can make $\mathbb{C}[x]$ a Verma module of highest weight $c$ for any $c \in \mathbb{C}$ by using:

We verify this again in Sage:

In subsequent posts, I’ll talk more about defining other non-commutative algebras in Sage and Singular.

February 17, 2016 12:00 AM

January 15, 2016

William Stein

Thinking of using SageMathCloud in a college course?

SageMathCloud course subscriptions

"We are  college instructors of the calculus sequence and ODE’s.  If the college were to purchase one of the upgrades for us as we use Sage with our students, who gets the benefits of the upgrade?  Is is the individual students that are in an instructor’s Sage classroom or is it the  collaborators on an instructor’s project?"

If you were to purchase just the $7/month plan and apply the upgrades to *one* single project, then all collaborators on that one project would benefit from those upgrades while using that project.

If you were to purchase a course plan for say $399/semester, then you could apply the upgrades (network access and members only hosting) to 70 projects that you might create for a course.   When you create a course by clicking +New, then "Manage a Course", then add students, each student has their own project created automatically.  All instructors (anybody who is a collaborator on the project where you clicked "Manage a course") is also added to the student's project. In course settings you can easily apply the upgrades you purchase to all projects in the course.  

Also I'm currently working on a new feature where instructors may choose to require all students in their course to pay for the upgrade themselves.  There's a one time $9/course fee paid by the student and that's it.  At some colleges (in some places) this is ideal, and at other places it's not an option at all.   I anticipate releasing this very soon.

Getting started with SageMathCloud courses

You can fully use the SMC course functionality without paying anything in order to get familiar with it and test it out.  The main benefit of paying is that you get network access and all projects get moved to members only servers, which are much more robust; also, we greatly prioritize support for paying customers.   

This blog post is an overview of using SMC courses:


This has some screenshots and the second half is about courses:


Here are some video tutorials made by an instructor that used SMC with a large class in Iceland recently:


Note that the above videos show the basics of courses, then talk specifically about automated grading of Jupyter notebooks.  That might not be at all what you want to do -- many math courses use Sage worksheets, and probably don't automate the grading yet.

Regarding using Sage itself for teaching your courses, check out the free pdf book to "Sage for Undergraduates" here, which the American Mathematical Society just published (there is also a very nice print version for about $23):


by William Stein ([email protected]) at January 15, 2016 09:14 AM

January 08, 2016

William Stein

Mathematics Graduate School: preparation for non-academic employment

This is about my personal experience as a mathematics professor whose students all have non-academic jobs that they love. This is in preparation for a panel at the Joint Mathematics Meetings in Seattle.

My students and industry
My graduated Ph.D. students:
  • 3 at Google
  • 1 at Facebook
  • 1 at CCR
My graduating student (Hao Chen):
  • Applying for many postdocs
  • But just did summer internship at Microsoft Research with Kristin. (I’ve had four students do summer internships with Kristin)
All my students:
  • Have done a lot of Software development, maybe having little to do with math, e.g., “developing the Cython compiler”, “transition the entire Sage project to git”, etc.
  • Did a thesis squarely in number theory, with significant theoretical content.
  • Guilt (or guilty pleasure?) spending time on some programming tasks instead of doing what they are “supposed” to do as math grad students.

Me: academia and industry

  • Math Ph.D. from Berkeley in 2000; many students of my advisor (Lenstra) went to work at CCR after graduating…
  • Academia: I’m a tenured math professor (since 2005) – number theory.
  • Industry: I founded a Delaware C Corp (SageMath, Inc.) one year ago to “commercialize Sage” due to VERY intense frustration trying to get grant funding for Sage development. Things have got so bad, with so many painful stupid missed opportunities over so many years, that I’ve given up on academia as a place to build Sage.
Reality check: Academia values basic research, not products. Industry builds concrete valuable products. Not understanding this is a recipe for pain (at least it has been for me).

Advice for students from students

Robert Miller (Google)

My student Robert Miller’s post on Facebook yesterday: “I LOVE MY JOB”. Why: “Today I gave the first talk in a seminar I organized to discuss this result: ‘Graph Isomorphism in Quasipolynomial Time’. Dozens of people showed up, it was awesome!”
Background: When he was my number theory student, working on elliptic curves, he gave a talk about graph theory in Sage at a Sage Days (at IPAM). His interest there was mainly in helping an undergrad (Emily Kirkman) with a Sage dev project I hired her to work on. David Harvey asked: “what’s so hard about implementing graph isomorphism”, and Robert wanted to find out, so he spent months doing a full implementation of Brendan McKay’s algorithm (the only other one). This had absolutely nothing to do with his Ph.D. thesis work on the Birch and Swinnerton-Dyer conjecture, but I was very supportive.

Craig Citro (Google)

Craig Citro did a Ph.D. in number theory (with Hida), but also worked on Sage aLOT as a grad student and postdoc. He’s done a lot of hiring at Google. He says: “My main piece of advice to potential google applicants is ‘start writing as much code as you can, right now.’ Find out whether you’d actually enjoyworking for a company like Google, where a large chunk of your job may be coding in front of a screen. I’ve had several friends from math discover (the hard way) that they don’t really enjoy full-time programming (any more than they enjoy full-time teaching?).”
“Start throwing things on github now. Potential interviewers are going to check out your github profile; having some cool stuff at the top is great, but seeing a regular stream of commits is also a useful signal.”

Robert Bradshaw (Google)

“A lot of mathematicians are good at (and enjoy) programming. Many of them aren’t (and don’t). Find out. Being involved in Sage is significantly more than just having taken a suite of programming courses or hacking personal scripts on your own: code reviews, managing bugs, testing, large-scale design, working with others’ code, seeing projects through to completion, and collaborating with others, local and remote, on large, technical projects are all important. It demonstrates your passion.”

Rado Kirov (Google)

Robert Bradshaw said it before me, but I have to repeat. Large scale software development requires exposure to a lot of tooling and process beyond just writing code - version control, code reviews, bug tracking, code maintenance, release process, coordinating with collaborators. Contributing to an active open-source project with a large number of contributors like Sage, is a great way to experience all that and see if you would like to make it your profession. A lot of mathematicians write clever code for their research, but if less than 10 people see it and use it, it is not a realistic representation of what working as a software engineer feels like. 

The software industry is in large demand of developers and hiring straight from academia is very common. Before I got hired by Google, the only software development experience on my resume was the Sage graph editor. Along with solid understanding of algorithms and data structures that was enough to get in."

David Moulton (Google)

“Google hires mathematicians now as quantitative analysts = data engineers. Google is very flexible for a tech company about the backgrounds of its employees. We have a long-standing reading group on category theory, and we’re about to start one on Babai’s recent quasi- polynomial-time algorithm for graph isomorphism. And we have a math discussion group with lots of interesting math on it.”

My advice for math professors

Obviously, encourage your students to get involved in open source projects like Sage, even if it appears to be a waste of time or distraction from their thesis work (this will likely feel very counterintuitive you’ll hate it).
At Univ of Washington, a few years ago I taught a graduate-level course on Sage development. The department then refused to run it again as a grad course, which was frankly very frustrating to me. This is exactly the wrong thing to do if you want to increase the options of your Ph.D. students for industry jobs. Maybe quit trying to train our students to be only math professors, and instead give them a much wider range of options.

by William Stein ([email protected]) at January 08, 2016 10:25 PM

November 30, 2015

Sébastien Labbé

slabbe-0.2.spkg released

These is a summary of the functionalities present in slabbe-0.2.spkg optional Sage package. It works on version 6.8 of Sage but will work best with sage-6.10 (it is using the new code for cartesian_product merged the the betas of sage-6.10). It contains 7 new modules:

  • finite_word.py
  • language.py
  • lyapunov.py
  • matrix_cocycle.py
  • mult_cont_frac.pyx
  • ranking_scale.py
  • tikz_picture.py

Cheat Sheets

The best way to have a quick look at what can be computed with the optional Sage package slabbe-0.2.spkg is to look at the 3-dimensional Continued Fraction Algorithms Cheat Sheets available on the arXiv since today. It gathers a handful of informations on different 3-dimensional Continued Fraction Algorithms including well-known and old ones (Poincaré, Brun, Selmer, Fully Subtractive) and new ones (Arnoux-Rauzy-Poincaré, Reverse, Cassaigne).



sage -i http://www.slabbe.org/Sage/slabbe-0.2.spkg    # on sage 6.8
sage -p http://www.slabbe.org/Sage/slabbe-0.2.spkg    # on sage 6.9 or beyond


Computing the orbit of Brun algorithm on some input in \(\mathbb{R}^3_+\) including dual coordinates:

sage: from slabbe.mult_cont_frac import Brun
sage: algo = Brun()
sage: algo.cone_orbit_list((100, 87, 15), 4)
[(13.0, 87.0, 15.0, 1.0, 2.0, 1.0, 321),
 (13.0, 72.0, 15.0, 1.0, 2.0, 3.0, 132),
 (13.0, 57.0, 15.0, 1.0, 2.0, 5.0, 132),
 (13.0, 42.0, 15.0, 1.0, 2.0, 7.0, 132)]

Computing the invariant measure:

sage: fig = algo.invariant_measure_wireframe_plot(n_iterations=10^6, ndivs=30)
sage: fig.savefig('a.png')

Drawing the cylinders:

sage: cocycle = algo.matrix_cocycle()
sage: t = cocycle.tikz_n_cylinders(3, scale=3)
sage: t.png()

Computing the Lyapunov exponents of the 3-dimensional Brun algorithm:

sage: from slabbe.lyapunov import lyapunov_table
sage: lyapunov_table(algo, n_orbits=30, n_iterations=10^7)
  30 succesfull orbits    min       mean      max       std
  $\theta_1$              0.3026    0.3045    0.3051    0.00046
  $\theta_2$              -0.1125   -0.1122   -0.1115   0.00020
  $1-\theta_2/\theta_1$   1.3680    1.3684    1.3689    0.00024

Dealing with tikzpictures

Since I create lots of tikzpictures in my code and also because I was unhappy at how the view command of Sage handles them (a tikzpicture is not a math expression to put inside dollar signs), I decided to create a class for tikzpictures. I think this module could be usefull in Sage so I will propose its inclusion soon.

I am using the standalone document class which allows some configurations like the border:

sage: from slabbe import TikzPicture
sage: g = graphs.PetersenGraph()
sage: s = latex(g)
sage: t = TikzPicture(s, standalone_configs=["border=4mm"], packages=['tkz-graph'])

The repr method does not print all of the string since it is often very long. Though it shows how many lines are not printed:

sage: t
\useasboundingbox (0,0) rectangle (5.0cm,5.0cm);
... 68 lines not printed (3748 characters in total) ...

There is a method to generates a pdf and another for generating a png. Both opens the file in a viewer by default unless view=False:

sage: pathtofile = t.png(density=60, view=False)
sage: pathtofile = t.pdf()

Compare this with the output of view(s, tightpage=True) which does not allow to control the border and also creates a second empty page on some operating system (osx, only one page on ubuntu):

sage: view(s, tightpage=True)

One can also provide the filename where to save the file in which case the file is not open in a viewer:

sage: _ = t.pdf('petersen_graph.pdf')

Another example with polyhedron code taken from this Sage thematic tutorial Draw polytopes in LateX using TikZ:

sage: V = [[1,0,1],[1,0,0],[1,1,0],[0,0,-1],[0,1,0],[-1,0,0],[0,1,1],[0,0,1],[0,-1,0]]
sage: P = Polyhedron(vertices=V).polar()
sage: s = P.projection().tikz([674,108,-731],112)
sage: t = TikzPicture(s)
sage: t
        [x={(0.249656cm, -0.577639cm)},
        y={(0.777700cm, -0.358578cm)},
        z={(-0.576936cm, -0.733318cm)},
... 80 lines not printed (4889 characters in total) ...
\node[vertex] at (1.00000, 1.00000, -1.00000)     {};
\node[vertex] at (1.00000, 1.00000, 1.00000)     {};
sage: _ = t.pdf()

by Sébastien Labbé at November 30, 2015 11:53 AM

November 19, 2015

William Stein

"Prime Numbers and the Riemann Hypothesis", Cambridge University Press, and SageMathCloud


Barry Mazur and I spent over a decade writing a popular math book "Prime Numbers and the Riemann Hypothesis", which will be published by Cambridge Univeristy Press in 2016.  The book involves a large number of illustrations created using SageMath, and was mostly written using the LaTeX editor in SageMathCloud.

This post is meant to provide a glimpse into the writing process and also content of the book.

This is about making research math a little more accessible, about math education, and about technology.

Intended Audience: Research mathematicians! Though there is no mathematics at all in this post.

The book is here: http://wstein.org/rh/
Download a copy before we have to remove it from the web!

Goal: The goal of our book is simply to explain what the Riemann Hypothesis is really about. It is a book about mathematics by two mathematicians. The mathematics is front and center; we barely touch on people, history, or culture, since there are already numerous books that address the non-mathematical aspects of RH.  Our target audience is math-loving high school students, retired electrical engineers, and you.

Clay Mathematics Institute Lectures: 2005

The book started in May 2005 when the Clay Math Institute asked Barry Mazur to give a large lecture to a popular audience at MIT and he chose to talk about RH, with me helping with preparations. His talk was entitled "Are there still unsolved problems about the numbers 1, 2, 3, 4, ... ?"

See http://www.claymath.org/library/public_lectures/mazur_riemann_hypothesis.pdf

Barry Mazur receiving a prize:

Barry's talk went well, and we decided to try to expand on it in the form of a book. We had a long summer working session in a vacation house near an Atlantic beach, in which we greatly refined our presentation. (I remember that I also finally switched from Linux to OS X on my laptop when Ubuntu made a huge mistake pushing out a standard update that hosed X11 for everybody in the world.)

Classical Fourier Transform

Going beyond the original Clay Lecture, I kept pushing Barry to see if he could describe RH as much as possible in terms of the classical Fourier transform applied to a function that could be derived via a very simple process from the prime counting function pi(x). Of course, he could. This led to more questions than it answered, and interesting numerical observations that are more precise than analytic number theorists typically consider.

Our approach to writing the book was to try to reverse engineer how Riemann might have been inspired to come up with RH in the first place, given how Fourier analysis of periodic functions was in the air. This led us to some surprisingly subtle mathematical questions, some of which we plan to investigate in research papers. They also indirectly play a role in Simon Spicer's recent UW Ph.D. thesis. (The expert analytic number theorist Andrew Granville helped us out of many confusing thickets.)

In order to use Fourier series we naturally have to rely heavily on Dirac/Schwartz distributions.


University of Washington has a great program called SIMUW: "Summer Institute for Mathematics at Univ of Washington.'' It's for high school; admission is free and based on student merit, not rich parents, thanks to an anonymous wealthy donor!  I taught a SIMUW course one summer from the RH book.  I spent one very intense week on the RH book, and another on the Birch and Swinnerton-Dyer conjecture.

The first part of our book worked well for high school students. For example, we interactively worked with prime races, multiplicative parity, prime counting, etc., using Sage interacts. The students could also prove facts in number theory. They also looked at misleading data and tried to come up with conjectures. In algebraic number theory, usually the first few examples are a pretty good indication of what is true. In analytic number theory, in contrast, looking at the first few million examples is usually deeply misleading.

Reader feedback: "I dare you to find a typo!"

In early 2015, we posted drafts on Google+ daring anybody to find typos. We got massive feedback. I couldn't believe the typos people found. One person would find a subtle issue with half of a bibliography reference in German, and somebody else would find a different subtle mistake in the same reference. Best of all, highly critical and careful non-mathematicians read straight through the book and found a large number of typos and minor issues that were just plain confusing to them, but could be easily clarified.

Now the book is hopefully not riddled with errors. Thanks entirely to the amazingly generous feedback of these readers, when you flip to a random page of our book (go ahead and try), you are now unlikely to see a typo or, what's worse, some corrupted mathematics, e.g., a formula with an undefined symbol.

Designing the cover

Barry and Gretchen Mazur, Will Hearst, and I designed a cover that combined the main elements of the book: title, Riemann, zeta:

Then designers at CUP made our rough design more attractive according their tastes. As non-mathematician designers, they made it look prettier by messing with the Riemann Zeta function...

Publishing with Cambridge University Press

Over years, we talked with people from AMS, Springer-Verlag and Princeton Univ Press about publishing our book. I met CUP editor Kaitlin Leach at the Joint Mathematics Meetings in Baltimore, since the Cambridge University Press (CUP) booth was directly opposite the SageMath booth, which I was running. We decided, due to their enthusiasm, which lasted more than for the few minutes while talking to them (!), past good experience, and general frustration with other publishers, to publish with CUP.

What is was like for us working with CUP

The actual process with CUP has had its ups and downs, and the production process has been frustrating at times, being in some ways not quite professional enough and in other ways extremely professional. Traditional book publication is currently in a state of rapid change. Working with CUP has been unlike my experiences with other publishers.

For example, CUP was extremely diligent putting huge effort into tracking down permissions for every one of the images in our book. And they weren't satisfy with a statement on Wikipedia that "this image is public domain", if the link didn't work. They tracked down alternatives for all images for which they could get permissions (or in some cases have us partly pay for them). This is in sharp contrast to my experience with Springer-Verlag, which spent about one second on images, just making sure I signed a statement that all possible copyright infringement was my fault (not their's).

The CUP copyediting and typesetting appeared to all be outsourced to India, organized by people who seemed far more comfortable with Word than LaTeX. Communication with people that were being contracted out about our book's copyediting was surprisingly difficult, a problem that I haven't experienced before with Springer and AMS. That said, everything seems to have worked out fine so far.

On the other hand, our marketing contact at CUP mysteriously vanished for a long time; evidently, they had left to another job, and CUP was recruiting somebody else to take over. However, now there are new people and they seem extremely passionate!

The Future

I'm particularly excited to see if we can produce an electronic (Kindle) version of the book later in 2016, and eventually a fully interactive complete for-pay SageMathCloud version of the book, which could be a foundation for something much broader with publishers, which addresses the shortcoming of the Kindle format for interactive computational books. Things like electronic versions of books are the sort of things that AMS is frustratingly slow to get their heads around...


  1. Publishing a high quality book is a long and involved process.
  2. Working with CUP has been frustrating at times; however, they have recruited a very strong team this year that addresses most issues.
  3. I hope mathematicians will put more effort into making mathematics accessible to non-mathematicians.
  4. Hopefully, this talk will give provide a more glimpse into the book writing process and encourage others (and also suggest things to think about when choosing a publisher and before signing a book contract!)

by William Stein ([email protected]) at November 19, 2015 11:34 AM

October 21, 2015

The Matroid Union

Google Summer of Code 2015: outcomes

Guest post by Chao Xu

In the summer, I have extended the SAGE code base for matroids for Google Summer of Code. This post shows a few example of it’s new capabilities.


Let $M$ be a matroid with groundset $E$ and rank function $r$. A partition of the groundset $\{E_1,E_2\}$ is a $m$-separation if $|E_1|,|E_2|\geq m$ and $r(E_1)+r(E_2)-r(E)\leq m-1$. $M$ is called $k$-connected if there is no $m$-separation for any $m k$. The Fano matroid is an example of $3$-connected matroid.

The Fano matroid is not $4$-connected. Using the certificate=True field, we can also output a certificate that verify its not-$4$-connectness. The certificate is a $m$-separation where $m 4$. Since we know Fano matroid is $3$-connected, we know the output should be a $3$-separation.

We also have a method for deciding $k$-connectivity, and returning a certificate.

There are 3 algorithms for $3$-connectivity. One can pass it as a string to the algorithm field of is_3connected.

  1. "bridges": The $3$-connectivity algorithm Bixby and Cunningham. [BC79]
  2. "intersection": the matroid intersection based algorithm
  3. "shifting": the shifting algorithm. [Raj87]

The default algorithm is the bridges based algorithm.

The following is an example to compare the running time of each approach.

The new bridges based algorithm is much faster than the previous algorithm in SAGE.

For $4$-connectivity, we tried to use the shifting approach, which has an running time of $O(n^{4.5}\sqrt{\log n})$, where $n$ is the size of the groundset. The intuitive idea is fixing some elements and tries to grow a separator. In theory, the shifting algorithm should be fast if the graph is not $4$-connected, as we can be lucky and find a separator quickly. In practice, it is still slower than the optimized matroid intersection based algorithm, which have a worst case $O(n^5)$ running time. There might be two reasons: the matroid intersection actually avoids the worst case running time in practice, and the shifting algorithm is not well optimized.

Matroid intersection and union

There is a new implementation of matroid intersection algorithm based on Cunningham’s paper [Cun86]. For people who are familiar with blocking flow algorithms for maximum flows, this is the matroid version. The running time is $O(\sqrt{p}rn)$, where $p$ is the size of the maximum common independent set, $r$ is the rank, and $n$ is the size of the groundset. Here is an example of taking matroid intersection between two randomly generated linear matroids.

Using matroid intersection, we have preliminary support for matroid union and matroid sum. Both construction takes a list of matroids.

The matroid sum operation takes disjoint union of the groundsets. Hence the new ground set will have the first coordinate indicating which matroid it comes from, and second coordinate indicate the element in the matroid.

Here is an example of matroid union of two copies of uniform matroid $U(1,5)$ and $U(2,5)$. The output is isomorphic to $U(4,5)$.

One of the application of matroid union is matroid partitioning, which partitions the groundset of the matroid to minimum number of independent sets. Here is an example that partitions the edges of a graph to minimum number of forests.


I would like to thank my mentors Stefan van Zwam and Michael Welsh for helping me with the project. I also like to thank Rudi Pendavingh, who have made various valuable suggestions and implemented many optimizations himself.


[BC79] R.E Bixby, W.H Cunningham. Matroids, graphs and 3-connectivity, J.A Bondy, U.S.R Murty (Eds.), Graph Theory and Related Topics, Academic Press, New York (1979), pp. 91-103.

[Raj87] Rajan, A. (1987). Algorithmic applications of connectivity and related topics in matroid theory. Northwestern university.

[Cun86] William H Cunningham. 1986. Improved bounds for matroid partition and intersection algorithms. SIAM J. Comput. 15, 4 (November 1986), 948-957.

by Guest Contributor at October 21, 2015 03:55 PM

September 30, 2015

William Stein

What is SageMath's strategy?

Here is SageMath's strategy, or at least what my strategy toward SageMath has been for the last 5 years.

Diagnose the problem

Statement of problem: SageMath is not growing.


Facts: Growth in the number of active users [1] of SageMath has stalled since about 2011 (as defined by Google analytics on sagemath.org). From 2008 to 2011, year-on-year growth was about 50%, which isn't great. However, from 2011 to now, year-on-year growth is slightly less than 0%. It was maybe -10% from 2013 to 2014. Incidentally, number of monthly active users of sagemath.org is about 68,652 right now, but the raw number isn't as import as the year-to-year rate of change.

I set an overall mission statement for the Sage project at the outset, which was is to be a viable alternative to Magma, Maple, Mathematica and Matlab. Being a "viable alternative" is something that holds or doesn't for specific people. A useful measure of this mission then is whether or not people use Sage. This is a different metric than trying to argue from "first principles" by making a list of features of each system, comparing benchmarks, etc.

Guiding policies

Statement of policy: focus on undergraduate students in STEM courses (science, tech, engineering, math)


In order for Sage to start growing again, identify groups of people that are not using Sage. Then decide, for each of these groups, who might find value in using Sage, especially if we are able to put work into making it easier for them to benefit from Sage. This is something to re-evaluate periodically. In itself, this is very generic -- it's what any software project that wishes to grow should do. The interesting part is the details.
Some big groups of potential future users of Sage, who use Sage very little now, include
  • employees/engineers in various industries (from defense contractors, to finance, to health care to "data science").
  • researchers in area of mathematics where Sage is currently not popular
  • undergraduate students in STEM courses (science, tech, engineering, math)
I think by far the most promising group is "undergraduate students in STEM courses". In many cases they use no software at all or are unhappy with what they do use. They are extremely cost sensitive. Open source provides a unique advantage in education because it is less expensive than closed source software, and having access to source code is something that instructors consider valuable as part of the learning experience. Also, state of the art performance, which often requires enormous dedicated for-pay work, is frequently not a requirement.


  • (a) Make access to Sage as easy as possible.
  • (b) Encourage the creation of educational resources (books, tutorials, etc.) that make using Sage for particular courses as easy as possible.
  • (c) Implement missing functionality in Sage that is needed in support of undergraduate teaching.


Why don't more undergraduates use Sage? For the most part, students use what they are told to use by their instructors. So why don't instructors chose to use Sage? (a) Sage is not trivial to install (in fact it is incredibly hard to install), (b) There are limited resources (books, tutorials, course materials, etc.) for making using Sage really easy, (c) Sage is missing key functionality needed in support undergraduate teaching.

Regarding (c), in 2008 Sage was utterly useless for most STEM courses. However, over the years things changed for the better, due to the hard work of Rob Beezer, Karl Dieter, Burcin Erocal, and many others. Also, for quite a bit of STEM work, the numerical Python ecosystem (and/or R) provides much of what is needed, and both have evolved enormously in recent years. They are all usable from Sage, and making such use easier should be an extremely high priority. Related -- Bill Hart wrote "I recently sat down with some serious developers and we discussed symbolics in Sage (which I know nothing about). They argued that Sage is not a viable contender in that area, and we discussed some of the possible reasons for that. " The reason is that the symbolic functionality in Sage is motivated by making Sage useful for undergraduate teaching; it has nothing to do with what serious developers in symbolics would care about.

Regarding (b), an NSF (called "UTMOST") helped in this direction... Also, Gregory Bard wrote "Sage for undergraduates", which is exactly the sort of thing we should be very strongly encouraging. This is a book that is published by the AMS and is also freely available. And it squarely addresses exactly this audience. Similarly, the French book that Paul Zimmerman edited is fantastic for France. Let's make an order of magnitude similar resources along these lines! Let's make vastly more tutorials and reference manuals that are "for undergraduates".

Regarding (a), in my opinion the most viable option that fits with current trends in software is a full web application that provides access to Sage. SageMathCloud is what I've been doing in this direction, and it's been growing since 2013 at over 100% year on year, and much is in place so that it could scale up to more users. It still has a huge way to go regarding user friendliness, and it is still losing money every month. But it is a concrete action toward which nontrivial effort has been invested, and it has the potential to solve problem (a) for a large number of potential STEM users. College students very often have extremely good bandwidth coupled with cheap weak laptops, so a web application is the natural solution for them.

Though much has been done to make Sage easier to install on individual computers, it's exactly the sort of problem that money could help solve, but for which we have little money. I'm optimistic that OpenDreamKit will do something in this direction.

[I've made this post motivated by the discussion in this thread.  Also, I used the framework from this book.]

by William Stein ([email protected]) at September 30, 2015 09:56 PM

September 22, 2015

William Stein

SageMathCloud's poor user retention rate

Poor retention rate

Many people try SageMathCloud, but only a small percentage stick around.  I definitely don't know why. Recent SageMathCloud rates are below 4%:

Is it performance?

Question: Are the people who try SMC discouraged by performance issues?

I think it's unlikely many users are leaving due to hitting noticeable performance issues.  I think I would know, since there's a huge bold messages all over the site that say "Email [email protected]: in case of problems, do not hesitate to immediately email us. We want to know if anything is broken!"    In the past when there have been performance or availability issues -- which of course do happen sometimes due to bugs or whatever -- I quickly get a lot of emails.  I haven't got anything that mentioned performance recently.  And usage of SMC is at an all time high: in the last day there were 676 projects created and 3500 projects modified -- which is significantly higher than ever before since the site started.  It's also about 2.2x what we had exactly a year ago.    

Is it the user interface?

Question: Is the SMC user interface highly discouraging and difficult to use?

My current best guess is that the main reason for attrition of our users is that 
they do not understand how to actually use SageMathCloud (SMC), and the interface doesn't help at all.   I think a large number of users get massively confused and lost when trying to use SMC.    It's pretty obvious this happens if you just watch what they do...    In order to have a foundation on which to fix that, the plan I came up with in May was to at least fix the frontend implementation so that it would be  much easier to do development with -- by switching from a confusing  mess of jQuery soup, e.g., 2012-style single page app development -- to Facebook's new React.js approach.  This is basically half done and deployed, and I'm going to work very hard for a while to finish it.   Once it's done, it's going to be much easier to improve the UI to  make it more user friendly.

Is it the open source software?

Question: Is open source mathematical software not sufficiently user friendly?

Fixing the UI probably won't help so much with improving the underlying open 
source mathematical software to be friendly though.    This is a massive, deep, and very difficult problem, and might be why growth of Sage stopped in 2011:

SageMath (and maybe Numpy/Scipy/IPython/etc.) are not as user friendly as Mathematica/Matlab.   I think they could be even more user friendly, but it's highly unlikely as long as the developers are mostly working on SageMath in their spare time as part of advanced research projects (which have little to do with user friendliness).  

Analyzing data about mistakes, frustation, and issues people actually have with  real worksheets and notebooks could also help a lot with directing our  effort in improving Sage/Python/Numpy/etc to be more user friendly.

Is it support?

Question: Are users frustrated by lack of interactive support?

Having integrated high-quality support for users inside SMC, in which we help 
them write code, answer questions, etc., could help with retention.  

Why don't you use SageMathCloud?

I've been watching this  stuff closely for over a decade most waking moments, and everybody likes to complain to me.    Why don't you use SageMathCloud?   Tell me:  [email protected].

by William Stein ([email protected]) at September 22, 2015 07:52 PM

September 21, 2015

Sébastien Labbé

There are 13.366.431.646 solutions to the Quantumino game

Some years ago, I wrote code in Sage to solve the Quantumino puzzle. I also used it to make a one-minute video illustrating the Dancing links algorithm which I am proud to say it is now part of the Dancing links wikipedia page.


Let me recall that the goal of the Quantumino puzzle is to fill a \(2\times 5\times 8\) box with 16 out of 17 three-dimensional pentaminos. After writing the sage code to solve the puzzle, one question was left: how many solutions are there? Is the official website realist or very prudent when they say that there are over 10.000 potential solutions? Can it be computed in hours? days? months? years? The only thing I knew was that the following computation (letting the 0-th pentamino aside) never finished on my machine:

sage: from sage.games.quantumino import QuantuminoSolver
sage: QuantuminoSolver(0).number_of_solutions()   # long time :)

Since I spent already too much time on this side-project, I decided in 2012 to stop investing any more time on it and to really focus on finishing writing my thesis.

So before I finish writing my thesis, I knew that the computation was not going to take a light-year, since I was able to finish the computation of the number of solutions when the 0-th pentamino is put aside and when one pentamino is pre-positioned somewhere in the box. That computation completed in 4 hours on my old laptop and gave about 5 millions solutions. There are 17 choices of pentatminos to put aside, there are 360 distinct positions of that pentamino in the box, so I estimated the number of solution to be something like \(17\times 360\times 5000000 = 30 \times 10^9\). Most importantly, I estimated the computation to take \(17\times 360\times 4= 24480\) hours or 1020 days. Therefore, I knew I could not do it on my laptop.

But last year, I received an email from the designer of the Quantumino puzzle:

-------- Message transféré --------
Sujet : quantumino
Date : Tue, 09 Dec 2014 13:22:30 +0100
De : Nicolaas Neuwahl
Pour : Sebastien Labbe

hi sébastien labbé,

i'm the designer of the quantumino puzzle.
i'm not a mathematician, i'm an architect. i like mathematics.
i'm quite impressed to see the sage work on quantumino, also i have not the
knowledge for full understanding.

i have a question for you - can you tell me HOW MANY different quantumino-
solutions exist?

ty and bye

nicolaas neuwahl

This summer was a good timing to launch the computation on my beautiful Intel® Core™ i5-4590 CPU @ 3.30GHz × 4 at Université de Liège. First, I improved the Sage code to allow a parallel computation of number of solutions in the dancing links code (#18987, merged in a Sage 6.9.beta6). Secondly, we may remark that each tiling of the \(2\times 5\times 8\) box can be rotated in order to find 3 other solutions. It is possible to gain a factor 4 by avoiding to count 4 times the same solution up to rotations (#19107, still needs work from myself). Thanks to Vincent Delecroix for doing the review on both ticket. Dividing the estimated 1024 days of computation needed by a factor \(4\times 4=16\) gives an approximation of 64 days to complete the computation. Two months, just enough to be tractable!

With those two tickets (some previous version to be honest) on top of sage-6.8, I started the computation on August 4th and the computation finished last week on September 18th for a total of 45 days. The computation was stopped only once on September 8th (I forgot to close firefox and thunderbird that night...).

The number of solutions and computation time for each pentamino put aside together with the first solution found is shown in the table below. We remark that some values are equal when the aside pentaminoes are miror images (why!?:).

/Files/2015/b0.png /Files/2015/b1.png
634 900 493 solutions 634 900 493 solutions
2 days, 6:22:44.883358 2 days, 6:19:08.945691
/Files/2015/b2.png /Files/2015/b3.png
509 560 697 solutions 509 560 697 solutions
2 days, 0:01:36.844612 2 days, 0:41:59.447773
/Files/2015/b4.png /Files/2015/b5.png
628 384 422 solutions 628 384 422 solutions
2 days, 7:52:31.459247 2 days, 8:44:49.465672
/Files/2015/b6.png /Files/2015/b7.png
1 212 362 145 solutions 1 212 362 145 solutions
3 days, 17:25:00.346627 3 days, 19:10:02.353063
/Files/2015/b8.png /Files/2015/b9.png
197 325 298 solutions 556 534 800 solutions
22:51:54.439932 1 day, 19:05:23.908326
/Files/2015/b10.png /Files/2015/b11.png
664 820 756 solutions 468 206 736 solutions
2 days, 8:48:54.767662 1 day, 20:14:56.014557
/Files/2015/b12.png /Files/2015/b13.png
1 385 955 043 solutions 1 385 955 043 solutions
4 days, 1:40:30.270929 4 days, 4:44:05.399367
/Files/2015/b14.png /Files/2015/b15.png
694 998 374 solutions 694 998 374 solutions
2 days, 11:44:29.631 2 days, 6:01:57.946708
1 347 221 708 solutions  
3 days, 21:51:29.043459  

Therefore the total number of solutions up to rotations is 13 366 431 646 which is indeed more than 10000:)

sage: L = [634900493, 634900493, 509560697, 509560697, 628384422,
628384422, 1212362145, 1212362145, 197325298, 556534800, 664820756,
468206736, 1385955043, 1385955043, 694998374, 694998374, 1347221708]
sage: sum(L)
sage: factor(_)
2 * 23 * 271 * 1072231
The machine (4 cores) Intel® Core™ i5-4590 CPU @ 3.30GHz × 4 (Université de Liège)
Computation Time 45 days, (Aug 4th -- Sep 18th, 2015)
Number of solutions (up to rotations) 13 366 431 646
Number of solutions / cpu / second 859

My code will be available on github.

About the video on wikipedia.

I must say that the video is not perfect. On wikipedia, the file talk page of the video says that the Jerky camera movement is distracting. That is because I managed to make the video out of images created by .show(viewer='tachyon') which changes the coordinate system, hardcodes a lot of parameters, zoom properly, simplifies stuff to make sure the user don't see just a blank image. But, for making a movie, we need access to more parameters especially the placement of the camera (to avoid the jerky movement). I know that Tachyon allows all of that. It is still a project that I have to create a more versatile Graphics3D -> Tachyon conversion allowing to construct nice videos of evolving mathematical objects. That's another story.

by Sébastien Labbé at September 21, 2015 02:55 PM

September 10, 2015

William Stein

Funding Open Source Mathematical Software in the United States

I do not know how to get funding for open source mathematical software in the United States. However, I'm trying.

Why: Because Sage is Hobbling Along

Despite what we might think in our Sage-developer bubble, Sage is hobbling along, and without an infusion of financial support very soon, I think the project is going to fail in the next few years. I have access to Google analytics data for sagemath.org since 2007, and there has been no growth  in active users of the website since 2011:

Something that is Missing

The worse part of all for me, after ten years, is seeing things like this email today from John Palmieri, where he talks about writing slow but interesting algebraic topology code, and needing help from somebody who knows Cython to actually make his code fast.

I know from my three visits to the Magma group in Sydney that such assistance is precisely what having real financial support can provide. Such money makes it possible to have fulltime people who know the tools and how to optimize them well, and they work on this sort of speedup and integration -- this "devil is in the details" work -- for each major contribution (they are sort of like a highly skilled version of a journal copy editor and referee all in one). Doing this makes a massive difference, but also costs on the order of $1 million / year to have any real impact. 1 million is probably the Magma budget to support around 10 people and periodic visitors, and of course like 1% of the budget of Matlab/Mathematica. Magma has this support partly because Magma is closed source, and maintains tight control on who may use it.

Searching for a Funding Model

Sage is open source and freely available to all, so it is of potential huge value to the community by being owned by everybody and changeable. However, those who fund Magma (either directly or indirectly) haven't funded Sage at the same level for some reason. I can't make Sage closed source and copy that very successful funding model. I've tried everything I can think of given the time and resources I have, and the only model left that seems able to support open source is having a company that does something else well and makes money, then using some of the profit to fund open source (Intel is the biggest contributor to Linux).

SageMath, Inc.

Since I failed to find any companies that passionately care about Sage like Intel/Google/RedHat/etc. care about Linux, I started one. I've been working on SageMathCloud extremely hard for over 3 years now, with the hopes that at least it could be a way to fund Sage development.

by William Stein ([email protected]) at September 10, 2015 07:38 PM

September 05, 2015

William Stein

The Simons Foundation and Open Source Software

Jim Simons

Jim Simons is a mathematician who left academia to start a hedge fund that beat the stock market. He contributes back to the mathematical community through the Simons Foundation, which provides an enormous amount of support to mathematicians and physicists, and has many outreach programs.

SageMath is a large software package for mathematics that I started in 2005 with the goal of creating a free open source viable alternative to Magma, Mathematica, Maple, and Matlab. People frequently tell me I should approach the Simons Foundation for funding to support Sage. For example:
Jim Simons, after retiring from Renaissance Technologies with a cool 15 billion, has spent the last 10 years giving grants to people in the pure sciences. He's a true academic at heart [...] Anyways, he's very fond of academics and gives MacArthur-esque grants, especially to people who want to change the way mathematics is taught. Approach his fund. I'm 100% sure he'll give you a grant on the spot.

The National Science Foundation

Last month the http://sagemath.org website had 45,114 monthly active users. However, as far as I know, there is no NSF funding for Sage in the United States right now, and development is mostly done on a shoestring in spare time. We have recently failed to get several NSF grants for Sage, despite there being Sage-related grants in the past from NSF. I know that funding is random, and I will keep trying. I have two proposals for Sage funding submitted to NSF right now.

Several million dollars per year

I was incredibly excited in 2012 when David Eisenbud invited me to a meeting at the Simons Foundation headquarters in New York City with the following official description of their goals:
The purpose of this round table is to investigate what sorts of support would facilitate the development, deployment and maintenance of open-source software used for fundamental research in mathematics, statistics and theoretical physics. We hope that this group will consider what support is currently available, and whether there are projects that the Simons Foundation could undertake that would add significantly to the usefulness of computational tools for basic research. Modes of support that duplicate or marginally improve on support that is already available through the universities or the federal government will not be of interest to the foundation. Questions of software that is primarily educational in nature may be useful as a comparison, but are not of primary interest.  The scale of foundation support will depend upon what is needed and on the potential scientific benefit, but could be substantial, perhaps up to several million dollars per year.
Current modes of funding for research software in mathematics, statistics and physics differ very significantly. There may be correspondingly great differences in what the foundation might accomplish in these areas. We hope that the round table members will be able to help the foundation understand the current landscape  (what are the needs, what is available, whether it is useful, how it is supported) both in general and across the different disciplines, and will help us think creatively about new possibilities.
I flew across country to this the meeting, where we spent the day discussing ways in which "several million dollars per year" could revolutionize "the development, deployment and maintenance of open-source software used for fundamental research in mathematics...".

In the afternoon Jim Simons arrived, and shook our hands. He then lectured us with some anecdotes, didn't listen to what we had to say, and didn't seem to understand open source software. I was frustrated watching how he treated the other participants, so I didn't say a word to him. I feel bad for failing to express myself.

The Decision

In the backroom during a coffee break, David Eisenbud told me that it had already been decided that they were going to just fund Magma by making it freely available to all academics in North America. WTF? I explained to David that Magma is closed source and that not only does funding Magma not help open source software like Sage, it actively hurts it. A huge motivation for people to contribute to Sage is that they do not have access to Magma (which was very expensive).

I wandered out of that meeting in a daze; things had gone so differently than I had expected. How could a goal to "facilitate the development, deployment and maintenance of open-source software... perhaps up to several million dollars per year" result in a decision that would make things possibly much worse for open source software?

That day I started thinking about creating what would become SageMathCloud. The engineering work needed to make Sage accessible to a wider audience wasn't going to happen without substantial funding (I had put years of my life into this problem but it's really hard, and I couldn't do it by myself). At least I could try to make it so people don't have to install Sage (which is very difficult). I also hoped a commercial entity could provide a more sustainable source of funding for open source mathematics software. Three years later, the net result of me starting SageMathCloud and spending almost every waking moment on it is that I've gone from having many grants to not, and SageMathCloud itself is losing money. But I remain cautiously optimistic and forge on...

We will not fund Sage

Prompted by numerous messages recently from people, I wrote to David Eisenbud this week. He suggested I write to Yuri Schinkel, who is the current director of the Simons Foundation:
Dear William,
Before I joined the foundation, there was a meeting conducted by David Eisenbud to discuss possible projects in this area, including Sage.
After that meeting it was decided that the foundation would support Magma.
Please keep me in the loop regarding developments at Sage, but I regret that we will not fund Sage at this time.
Best regards, Yuri
The Simons Foundation, the NSF, or any other foundation does not owe the Sage project anything. Sage is used by a lot of people for free, who together have their research and teaching supported by hundreds of millions of dollars in NSF grants. Meanwhile the Sage project barely hobbles along. I meet people who have fantastic development or documentations projects for Sage that they can't do because they are far too busy with their fulltime teaching jobs. More funding would have a massive impact. It's only fair that the US mathematical community is at least aware of a missed opportunity.
Funding in Europe for open source math software is much better.

Hacker News discussion

by William Stein ([email protected]) at September 05, 2015 04:47 PM

September 01, 2015

William Stein

React, Flux, RethinkDB and SageMathCloud -- Summer 2015 update

I've been using databases and doing web development for over 20 years, and I've never really loved any database before and definitely didn't love any web development frameworks either. That all changed for me this summer...


SageMathCloud is a web application in which you collaboratively use Python, LaTeX, Markdown, Sage worksheets (sophisticated mathematics), task lists, R, Jupyter Notebooks, manage courses, write C programs, make chatrooms, and more. It is hosted on Google Compute Engine, but is also entirely open source and there is a pre-made Virtual Machine that you can download. A project in SMC is a Linux account, with resources constrained using cgroups and quotas. Many SMC users can collaborate on the same project, and have equal privileges in that project. Interaction with all file types (including Jupyter notebooks, task lists and course managements) is synchronized in realtime, like Google docs. There is also a global notifications feed that shows all editing activity on all files in all projects on which the user collaborates, which is a sort of highly technical version of Facebook's feed.

Rewrite motivation

I originally wrote the SageMathCloud frontend using progressive-refinement jQuery (no third-party framework beyond that) and the Cassandra database. These were reasonable choices when I started. There are much better approaches now, which are critical to dramatically improving the user experience with SMC, and also growing the developer base. So far SMC has had no nontrivial outside contributions, probably due to the difficulty of understanding the code. In fact, I think nobody besides me has ever even installed SMC, despite these install notes.

We (me, Jon Lee, Nicholas Ruhland) are currently completely rewriting the entire frontend of SMC using React.js, Flux, and RethinkDB. We started this rewrite in June 2015, with Jon being supported by Google Summer of Code (2015), Nich being supported some by NSF grants from Randy Leveque and Rekha Thomas, and with me being unemployed.

Terrible funding situation

I'm living on credit cards -- I have no NSF grant support anymore, and SageMathCloud is still losing a lot of money every month, and I'm unhappy about this situation. It was either completely quit working on SMC and instead teach or consult a lot, or lose tens of thousands of dollars. I am doing the latter right now. I was very caught off guard, since this is my first summer ever to not have NSF support since I got my Ph.D. in 2000, and I didn't expect to have my grant proposals all denied (which happened in June). There is some modest Angel investment in SageMath, Inc., but I can't bring myself to burn through that money on salary, since it would run out quickly, and I don't want to have to shut down the site due to not being able to pay the hosting bill. I've failed to get any significant free hosting, due to already getting free hosting in the past, and SageMath, Inc. not being in any incubators. For example, we tried very hard to get hosting from Google, but they flatly refused for these two reasons (they gave $60K in hosting to UW/Sage project in 2012). I'm clearly having trouble transitioning from an academic to an industry funding model. But if there are enough paying customers by January 2016, things will turn around.

Jon, Nich, and I have been working on this rewrite for three months, and hope to finish it by the end of September, when Jon and Nich will become busy with classes again. However, it seems unlikely we'll be able to finish at the current rate. Fortunately, I don't start teaching fulltime again until January, and we put a lot of work into doing a release in mid-August that fully uses RethinkDB and partly uses React.js, so that we can finish the second stage of the rewrite iteratively, without any major technical surprises.


Cassandra is an excellent database for many applications, but it is not the right database for SMC and I'm making no further use of Cassandra. SMC is a realtime application that does a lot more reading than writing to the database, and SMC greatly benefits from realtime push updates from the database. I've tried quite hard in the past to build an appropriate architecture for SMC on top of Cassandra, but it is the wrong tool for the job. RethinkDB scales up linearly (with sharding and replication), and has high availability and automatic failover as of version 2.1.2. See https://github.com/rethinkdb/rethinkdb/issues/4678 for my painful path to ensuring RethinkDB actually works for me (the RethinkDB developers are incredibly helpful!).


I learned about React.js first from some "random podcast", then got more interested in it when Chris Swenson gave a demo at a Sage Days workshop in San Diego in May 2015. React (+Flux) is a web development framework that actually has solid ideas behind it, backed by an implementation that has been optimized and tested by a highly nontrivial real world application: namely the Facebook website. Even if I were to have the idea of React, implementing in a way that is actually usable would be difficult. The key idea of React.js is that -- surprisingly -- it is possible to write efficient client-side code that describes how to render the application purely as a function of its state.

React is different than jQuery. With jQuery, you write lots of code explaining how to transform the user interface of your application from one complicated state (that you might never have anticipated happening) to another complicated state. When using React.js you don't write code about how your application's visible state changes -- instead you write code to answer the question: "given this state, what should the application look like". For me, it's a game changer. This is like what one does when writing video games; the innovation is that some people at Facebook figured out how to practically program this way in a client side web browser application, then tuned their implementation based on huge amounts of real world data (Facebook has users). Oh, and they open sourced the result and ran several conferences explaining React.

React.js reminds me of when Andrew Wiles proved Fermat's Last Theorem in the mid 1990s. Wiles (and Ken Ribet) had genuine new ideas, which dramatically reshaped the landscape of number theory. The best number theorists quickly realized this and adopted to the new world, pushing the envelope of Wiles work far beyond what I expected could happen. Other people pretended like Wiles didn't exist and continued studying Fibonnaci numbers. I browsed the web development section of Barnes and Noble last night and there were dozens of books on jQuery and zero on React.js. I feel for anybody who tries to learn client-side web development by reading books at Barnes and Noble.

IPython/Jupyter and PhosphorJS

I recently met with Fernando Perez, who founded IPython/Jupyter. He seemed to tell me that currently 9 people are working fulltime on rewriting the Jupyter web notebook using the PhosphorJS framework. I tried to understand PhosphorJS based on the github page, but couldn't, except to deduce that it is mostly the work of one person from Bloomberg/Continuum Analytics. Fernando told me that they chose PhosphorJS since it very fast, and that their main motivation is to (1) make Jupyter better use their huge high-resolution monitors on their new institute at Berkeley, and (2) make it easier for developers like me to integrate/extend Jupyter into their applications. I don't understand (2), because PhosphorJS is perhaps the least popular web framework I've ever heard of (is it a web framework -- I can't tell?). I pushed Fernando to explain why they made that design choice, but didn't really understand the answer, except that they had spent a lot of time investigating alternatives (like React first). I'm intimidated by their resources and concerned that I'm making the wrong choice; however, I just can't understand why they have made what seems to me to be the wrong choice. I hope to understand more at the joint Sage/Jupyter Days 70 that we are organizing together in Berkeley, CA in November. (Edit: see https://github.com/ipython/ipython/issues/8239 for a discussion of why IPython/Jupyter uses PhosphorJS.)

Tables and RethinkDB

Our rewrite of SMC is built on Tables, Flux and React. Tables are client-side technology I wrote inspired by Facebook's GraphQL/Relay technology (and Meteor, Firebase, etc.); they synchronize data between clients and the backend database in realtime. Tables are defined by a JSON schema file, which specifies the fields in the table, and explains what get and set queries are allowed. A table is a subset of a much larger table in the database, with the subset defined by conditions that are relative to the user making the query. For example, the projects table has one entry for each project that the user is a collaborator on.

Tables are automatically synchronized between the user and the database whenever the database changes, using RethinkDB changefeeds. RethinkDB's innovation is to build realtime updates -- triggered when the result of a query to the database changes -- directly into the database at the lowest level. Of course it is possible to build something that looks the same from the outside using either a message queue (say using RabbitMQ or ZeroMQ), or by watching the replication stream from the database and triggering actions based on that (like Meteor does using MongoDB). RethinkDB's approach seems better to me, putting the abstraction at the right level. That said, based on mailing list traffic, searches, etc., it seems that very, very few people get RethinkDB yet. Also, despite years of development, RethinkDB only became "production ready" a few months ago, and only got automatic failover a few weeks ago. That said, after ironing out some kinks, I'm now using it with heavy traffic in production and it works very well.


Once data is automatically synchronized between the database and web browsers in realtime, we can build everything else on top of this. Facebook also introduced an architecture pattern that they call Flux, which works well with React. It's very different than MVC-style two-way binding frameworks, where objects are directly linked to UI elements, with an object changing causing the UI element to change and vice versa. In SMC each major part of the system has two objects associated to it: Actions and Stores. We think of them in terms of the classical CQRS pattern -- command query responsibility segregation. Actions are commands -- they are Javascript "functions" that get stuff done, but they do not return values; instead, they impact the state of the store. The store has functions that allow one to query for the state of the store, but they do not change the state of the store. The store functions must only be functions of the internal state of the store and nothing else. They might cache their results and format their output to be very convenient for rendering. But that's it.

Actions usually cause the corresponding store (or stores) to change. When a store changes, it emit a change event, which causes any React components that depend on the store to be updated, which in many cases means they are re-rendered. There are optimizations one can introduce to reduce the amount of re-rendering, which if one isn't careful leads to subtle bugs; pretty much the only subtle React UI bugs one hits are caused by such optimizations. When the UI re-renders, the user sees their view of the world change. The user then clicks buttons, types, etc., which triggers actions, which in turn update stores (and tables, hence propogating changes to the ultimate source of truth, which is the RethinkDB database). As stores update, the UI again updates, etc.


So far, we have completely (re-)written the project listing, file manager, help/status page, new file page, project log, file finder, project settings, course management system, account settings, billing, project upgrade system, and file use notifications using React, Flux, and Tables, and the result works well. Bugs are much easier to fix, and it is easy (possible?) to understand the state of the system, since it is defined by the state of the database and the corresponding client-side stores. We've completely rethought everything about the UI in doing the rewrite of the above components, and it has taken several months. Also, as mentioned above, I completely rewrote most of the backend to use RethinkDB instead of Cassandra. There were also the weeks of misery for me after we made the switch over. Even after weeks of thinking/testing/wondering "what could go wrong?", we found out all kinds of surprising little things within hours of pushing everything into production, which took more than a week of sleep deprived days to sort out.

What's left? We have to rewrite the file editor tabs system, the project tabs system, and all the applications (except course management): editing text files using Codemirror, task lists (which are suprisingly complicated!), color xterm terminals, Jupyter notebooks (which will still use an iframe for the notebook itself), Sage worksheets (with complicated html output embedded in codemirror), compressed file de-archiver, the LaTeX editor, the wiki and markdown editors, and file chat. We hope to find a clean way to abstract away the various SMC applications as plugins, so that other people can easily write their own applications/plugins that will run inside of SMC. There will be a rich collection of example plugins to build on, namely the ones listed above, which are all driven by critical-to-us real world applications.

Discussion about this blog post on Hacker News.

by William Stein ([email protected]) at September 01, 2015 07:57 AM

August 22, 2015

Benjamin Hackl

Google Summer of Code 2015: Conclusion

The “Google Summer of Code 2015” program has ended yesterday, on the 21. of August at 19.00 UTC. This blog entry shall provide a short wrap-up of our GSoC project.

The aim of our project was to implement a basic framework that enables us to do computations with asymptotic expressions in SageMath — and I am very happy to say that we very much succeeded to do so. An overview of all our developments can be found at meta ticket #17601.

Although we did not really follow the timeline suggested in my original proposal (mainly because the implementation of the Asymptotic Ring took way longer than originally anticipated) we managed to implement the majority of ideas from my proposal — with the most important part being that our current prototype is stable. In particular, this means that we do not expect to make major design changes at this point. Every detail of our design is well-discussed and can be explained.

Of course, our “Asymptotic Expressions” project is far from finished, and we will continue to extend the functionality of our framework. For example, although working with exponential and logarithmic terms is currently possible, it is not very convenient because the $\log$, $\exp$, and power functions are not fully implemented. Furthermore, it would be interesting to investigate the performance-gain obtained by cythonizing the central parts of this framework (e.g. parts of the MutablePoset…) — and so on…

To conclude, I want to thank Daniel Krenn for his hard work and helpful advice als my mentor, as well as the SageMath community for giving me the opportunity to work on this project within the Google Summer of Code program! 🙂

by behackl at August 22, 2015 11:34 AM

August 17, 2015

Benjamin Hackl

Asymptotic Expressions: Current Developments

Since my last blog entry on the status of our implementation of Asymptotic Expressions in SageMath quite a lot of improvements have happened. Essentially, all the pieces required in order to have a basic working implementation of multivariate asymptotics are there. The remaining tasks within my Google Summer of Code project are:

  • Polish the documentation of our minimal prototype, which consists of #17716 and the respective dependencies. Afterwards, we will set this to needs_review.
  • Open a ticket for the multivariate asymptotic ring and put together everything that we have written so far there.

In this blog post I want to give some more examples of what can be done with our implementation right now and what we would like to be able to handle in the future.

Status Quo

After I wrote my last blog entry, we introduced a central idea/interface to our project: short notations. By using the short notation factory for growth groups (introduced in #18930) it becomes very simple to construct the desired growth group. Essentially, monomial growth groups (cf. #17600), i.e. groups that contain elements of the form

(for a fixed variable and powers from some base ring, e.g. the Integer Ring or even the Rational Field) are represented by
, where the base ring is also specified via its shortened name. The short notation factory then enables us to do the following:

sage: from sage.groups.asymptotic_growth_group import GrowthGroup
sage: G = GrowthGroup('x^ZZ'); G
Growth Group x^ZZ
sage: G.an_element()
sage: G = GrowthGroup('x^QQ'); G
Growth Group x^QQ
sage: G.an_element()

Naturally, this interface carries over to the generation of asymptotic rings: instead of the (slightly dubious)

keyword advertised in my last blog entry, we can now actually construct the growth group by specifying the respective growth group via its short representation:

sage: R.<x> = AsymptoticRing('x^ZZ', QQ); R
Asymptotic Ring <x^ZZ> over Rational Field
sage: (x^2 + O(x))^50
x^100 + O(x^99)

Recently, we also implemented another type of growth group: exponential growth groups (see #19028). These groups represent elements of the form

where the base is from some multiplicative group. For example, we could do the following:

sage: G = GrowthGroup('QQ^x'); G
Growth Group QQ^x
sage: G.an_element()
sage: G(2^x) * G(3^x)
sage: G(5^x) * G((1/7)^x)

Note: unfortunately, we did not implement a function that allows taking some element from some growth group (e.g.

from a monomial growth group) as the variable in an exponential growth group yet. Implementing some way to “change” between growth groups by taking the log or the exponential function is one of our next steps.

We also made this short notation a central interface for working with cartesian products. This is implemented in #18587. For example, this allows to construct growth groups containing elements like $2^x \sqrt[5]{x^2} \log(x)^2$:

sage: G = GrowthGroup('QQ^x * x^QQ * log(x)^ZZ'); G
Growth Group QQ^x * x^QQ * log(x)^ZZ
sage: G.an_element()
(1/2)^x * x^(1/2) * log(x)
sage: G(2^x * x^(2/5) * log(x)^2)
2^x * x^(2/5) * log(x)^2

Simple parsing from the symbolic ring (and from strings) is implemented. Like I have written above, operations like

are one of the next steps on our roadmap.

Further Steps

Of course, having an easy way to generate growth groups (and thus also asymptotic rings) is nice — however, it would be even better if the process of finding the correct parent would be even more automated. Unfortunately, this requires some non-trivial effort regarding the pushout construction — which will certainly not happen within the GSoC project.

As soon as we have an efficient way to “switch” between factors of a growth group (e.g. by taking the logarithm or the exponential function), this has to be carried over up to the asymptotic ring. Operations like

sage: 2^(x^2 + O(x))
2^(x^2) * 2^(O(x))

where the output could also be

2^(x^2) * O(x^g)
, where $g$ is determined by

Division of asymptotic expressions can be realized with just about the same idea, for example:

\[ \frac{1}{x^2 + O(x)} = \frac{1}{x^2} \frac{1}{1 + O(1/x)} = x^{-2} + O(x^{-3}), \]

and so on. If an infinite series occurs, it will have to be cut using an $O$-Term, most likely somehow depending on

as well.

Ultimately, we would also like to incorporate, for example, Stirling’s approximation of the factorial such that we could do something like

sage: n.factorial()
sqrt(2*pi) * e^(n*log(n)) * (1/e)^n * n^(1/2) + ...

which then can be used to obtain asymptotic expansions of binomial coefficients like $\binom{2n}{n}$:

sage: (2*n).factorial() / (n.factorial()^2)
1/sqrt(pi) * 4^n * n^(-1/2) + ...

As you can see, there is still a lot of work within our “Asymptotic Expressions” project — nevertheless, with the minimal working prototype and the ability to create cartesian products of growth groups, the fundament for all of this is already implemented! 😉

by behackl at August 17, 2015 07:15 AM

August 16, 2015

Michele Borassi

Conclusion of the Main Part of the Project

In this post, I will summarize the results obtained with the inclusion in Sage of Boost and igraph libraries. This was the main part of my Google Summer of Code project, and it was completed yesterday, when ticket 19003 was closed.

We have increased the number of graph algorithms available in Sage from 66 to 98 (according to the list used in the initial comparison of the graph libraries [1]). Furthermore, we decreased the running-time of several Sage algorithms: in some cases, we have been able to improve the asymptotic running-time, obtaining up to 10000x improvements in our tests. Finally, during the inclusion of external algorithms, we have refactored and cleaned some of Sage source code, like the shortest path routines: we have standardized the input and the output of 15 routines related to shortest paths, and we have removed duplicate code as much as possible.

More specifically, the first part of the project was the inclusion of Boost graph library: since the library is only available in C++, we had to develop an interface. This interface lets us convert easily a Sage graph into a Boost graph, and run algorithms on the converted graph. Then, we have written routines to re-translate the output into a Sage-readable format: this way, the complicated Boost library is "hidden", and users can interact with it as they do with Sage. In particular, we have interfaced the following algorithms:
  • Edge connectivity (trac.sagemath.org/ticket/18564);
  • Clustering coefficient (trac.sagemath.org/ticket/18811);
  • Cuthill-McKee and King vertex orderings (trac.sagemath.org/ticket/18876);
  • Minimum spanning tree (trac.sagemath.org/ticket/18910);
  • Dijkstra, Bellman-Ford, Johnson shortest paths (trac.sagemath.org/ticket/18931);
All these algorithms were either not available in Sage, or quite slow, compared to the Boost routines. As far as we know, Boost does not offer other algorithms that improve Sage algorithms: however, if such algorithms are developed in the future, it will be very easy to include them, using the new interface.

In the second part of the project, we included igraph: since this library already offers a Python interface, we decided to include it as an optional package (before it becomes a standard package, at least an year should pass [2]). To install the package, it is enough to type the following instruction from the Sage root folder:

sage -i igraph        # To install the igraph C core
sage -i python_igraph # To install the Python interface

Then, we can easily interact with igraph: for a list of available routines, it is enough to type "igraph." and click tab twice. To convert a Sage graph g_sage into an igraph graph it is enough to type g_igraph = g_sage.igraph_graph(), while a Sage graph can be instantiated from an igraph graph through g_sage=Graph(g_igraph) or g_sage=DiGraph(g_igraph). This way, all igraph algorithms are now available in Sage.

Furthermore, we have included the igraph maximum flow algoritm inside the Sage corresponding function, obtaining significant improvements (for more information and benchmarks, we refer to ticket 19003 [3]).

In conclusion, I think the project reached its main goal, the original plan was followed very closely, and we have been able to overcome all problems.

Before closing this post, I would like to thank many people that helped me with great advices, and who provided great solutions to all the problems I faced. First of all, my mentor David Coudert: he always answered very fast to all my queries, and he gave me great suggestions to improve the quality of the code I wrote. Then, a very big help came from Nathann Cohen, who often cooperated with David in reviewing my code and proposing new solutions. Moreover, I have to thank Martin Cross, who gave me good suggestions with Boost graph library, and Volker Braun, who closed all my ticket. Finally, I have to thank the whole Sage community for giving me this great opportunity!

[1] https://docs.google.com/spreadsheets/d/1Iu1hkQtRn9J-sgfZbQTu2RoXzyjoMEWP5-cm3nAwnWE/edit?usp=sharing
[2] http://doc.sagemath.org/html/en/developer/coding_in_other.html
[3] http://trac.sagemath.org/ticket/19003

by Michele Borassi ([email protected]) at August 16, 2015 06:59 AM