## March 10, 2016

### William Stein

#### Open source is now ready to compete with Mathematica for use in the classroom

When I think about what makes SageMath different, one of the most fundamental things is that it was created by people who use it every day.  It was created by people doing research math, by people teaching math at universities, and by computer programmers and engineers using it for research.  It was created by people who really understand computational problems because we live them.  We understand the needs of math research, teaching courses, and managing an open source project that users can contribute to and customize to work for their own unique needs.

The tools we were using, like Mathematica, are clunky, very expensive, and just don't do everything we need.  And worst of all, they are closed source software, meaning that you can't even see how they work, and can't modify them to do what you really need.  For teaching math, professors get bogged down scheduling computer labs and arranging for their students to buy and install expensive software.

So I started SageMath as an open source project at Harvard in 2004, to solve the problem that other math software is expensive, closed source, and limited in functionality, and to create a powerful tool for the students in my classes.  It wasn't a project that was intended initially as something to be used by hundreds of thousands of people.  But as I got into the project and as more professors and students started contributing to the project, I could clearly see that these weren't just problems that pissed me off, they were problems that made everyone angry.

The scope of SageMath rapidly expanded.  Our mission evolved to create a free open source serious competitor to Mathematica and similar closed software that the mathematics community was collectively spending hundreds of millions of dollars on every year. After a decade of work by over 500 contributors, we made huge progress.

But installing SageMath was more difficult than ever.  It was at that point that I decided I needed to do something so that this groundbreaking software that people desperately needed could be shared with the world.

So I created SageMathCloud, an extremely powerful web-based collaborative environment that lets people easily use SageMath and other open source software such as LaTeX, R, and Jupyter notebooks in their teaching and research.  I created SageMathCloud based on nearly two decades of experience using math software in the classroom and online, at Harvard, UC San Diego, and the University of Washington.

SageMathCloud is commercial grade, hosted in Google's cloud, and very large classes are using it heavily right now.  It solves the installation problem by avoiding it altogether.  It is entirely open source.

Open source is now ready to directly compete with Mathematica for use in the classroom.  They told us we could never make something good enough for mass adoption, but we have made something even better.  For the first time, we're making it possible for you to easily use Python and R in your teaching instead of Mathematica; these are industry standard mainstream open source programming languages with strong support from Google, Microsoft and other industry leaders.   For the first time, we're making it possible for you to collaborate in real time and manage your course online using the same cutting edge software used by elite mathematicians at the best universities in the world.

A huge community in academia and in industry is working together to make open source math software better at a breathtaking pace, and the traditional closed development model just can't keep up.

## March 03, 2016

### Liang Ze

#### Noncommutative Algebras in Sage

In this post, I’ll demonstrate 3 ways to define non-commutative rings in Sage. They’re essentially different ways of expressing the non-commutative relations in the ring:

1. Via g_algebra: define the relations directly
2. Via NCPolynomialRing_plural: define a pair of structural matrices
3. Via a quotient of a letterplace ring: define the ideal generated by the relations (only works for homogeneous relations)

As far as I know, all 3 methods rely on Sage’s interface with Singular and its non-commutative extension Plural.

In addition to all the documentation linked above, I also relied heavily on Greuel and Pfister’s A Singular Introduction to Commutative Algebra. Despite the title, it does have a pretty substantial section (1.9) devoted to non-commutative $G$-algebras.

## $U(\mathfrak{sl}_2)$ and its homogenization

The running example throughout this post will be the universal enveloping algebra $U(\mathfrak{sl}_2)$ over $\mathbb{Q}$.

We’ll define this to be the (non-commutative) $\mathbb{Q}$-algebra $U$ with generators $e,f,h$ subject to the relations

$$fe = ef - h, \qquad he = eh + 2e, \qquad hf = fh - 2f.$$

If we set $e,f,h$ to have degree 1, these relations are not homogeneous. Their left-hand sides only have degree 2 terms, while their right-hand sides have degree 1 terms as well. This is fine with the first two methods, but won’t work for method 3 (which requires homogeneous relations).

To demonstrate the third method, we’ll define the $\mathbb{Q}$-algebra $H$ with generators $e,f,h,t$ subject to the homogeneous relations

We can obtain $U$ both as a quotient and a localization of $H$:

## $G$-algebras

Using the g_algebra method of Sage’s FreeAlgebra class, we can simply plug our noncommutative relations in, and get our non-commutative ring. This is about as easy as it gets:

Let’s unravel what’s going on here.

### Monomial orderings and PBW basis

Most algorithms for commutative and non-commutative rings require an ordering on the generators. In our case, let’s use the ordering

$$e < f < h.$$

This is implicitly stated in our code: we wrote F.<e,f,h> instead of F.<h,e,f>, for example.

A standard word is a monomial of the form

$$e^a f^b h^c, \qquad a, b, c \geq 0.$$

In the polynomial ring $\mathbb{Q}[e,f,h]$, every monomial can be expressed in this form, so the set of standard words forms a $\mathbb{Q}$-basis for $\mathbb{Q}[e,f,h]$.

In a non-commutative ring, whether or not the standard words form a basis depends on what relations we have. Such a basis, if it exists, is called a PBW basis.

The free algebra $F = \mathbb{Q}\langle e,f,h\rangle$ has no relations, so does not have a PBW basis. Fortunately, our algebra $U$ does have a PBW basis.

This means that we can always express a non-standard monomial (e.g. $fe$) as a sum of standard monomials (e.g. $ef - h$). The non-commutative relations that define $U$ can thus be thought of as an algorithm for turning non-standard words into sums of standard words.

To do this in Sage, we define a dictionary whose keys are non-standard words and values are the standard words they become.

In the above example, our dictionary was short enough to fit into one line, but we could also define a dictionary separately and pass it into g_algebra:

It’s very important that the keys are non-standard words and the values are sums of standard words. Mathematically, the relation $fe = ef - h$ is the same as $ef = fe + h$, but if we replace f*e : e*f - h with e*f : f*e + h in the code, we’ll get an error (try it!).
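To make the "relations as a rewriting algorithm" idea concrete, here is a minimal plain-Python sketch (no Sage required, and not how Sage or Plural implement it internally). Words are tuples of generator names, and an element is a dictionary from words to coefficients. Only the rule for `f*e` is quoted in this post; the two rules involving `h` are the usual $\mathfrak{sl}_2$ brackets and are an assumption of this sketch, as are the names `ORDER`, `RULES` and `normal_form`.

```python
from collections import defaultdict

# Generator order e < f < h, matching F.<e,f,h> in the post.
ORDER = {"e": 0, "f": 1, "h": 2}

# Keys are non-standard pairs; values are sums of standard words,
# stored as {word: coefficient}.
# Assumption: the two rules involving h are the usual sl_2 brackets.
RULES = {
    ("f", "e"): {("e", "f"): 1, ("h",): -1},   # f*e = e*f - h
    ("h", "e"): {("e", "h"): 1, ("e",): 2},    # h*e = e*h + 2e
    ("h", "f"): {("f", "h"): 1, ("f",): -2},   # h*f = f*h - 2f
}

def normal_form(word, rules=RULES, order=ORDER):
    """Rewrite a word (tuple of generators) as a sum of standard words."""
    result = defaultdict(int)
    todo = [(tuple(word), 1)]              # (word, coefficient) pairs to process
    while todo:
        w, c = todo.pop()
        # Look for the first adjacent pair that is out of order.
        for i in range(len(w) - 1):
            if order[w[i]] > order[w[i + 1]]:
                # Replace that pair by the right-hand side of its rule.
                for middle, k in rules[(w[i], w[i + 1])].items():
                    todo.append((w[:i] + middle + w[i + 2:], c * k))
                break
        else:
            result[w] += c                 # no inversion left: w is standard
    return {w: c for w, c in result.items() if c != 0}
```

For example, `normal_form(("h", "f", "e"))` returns the dictionary representing $efh - h^2$. Note that the lookup `rules[(w[i], w[i + 1])]` is only ever made for out-of-order pairs, which is one way to see why the keys have to be the non-standard words.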

### What are $G$-algebras?

The reason why $U$ has a PBW basis is because it is a $G$-algebra. Briefly, $G$-algebras are algebras whose relations satisfy certain non-degeneracy conditions that make the algebra nice to work with.

For a full definition of $G$-algebras, refer to A Singular Introduction to Commutative Algebra or the Plural manual.

If $A$ is a $G$-algebra, then it has a PBW basis, is left and right Noetherian, and is an integral domain. More importantly (for this site at least!), it means that we can define $A$ in Singular/Plural, and hence in Sage.

## Structural matrices for a $G$-algebra

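Another way of writing our non-commutative relations is

$$\begin{pmatrix} 0 & fe & he \\ 0 & 0 & hf \\ 0 & 0 & 0 \end{pmatrix} = \begin{pmatrix} 0 & 1 & 1 \\ 0 & 0 & 1 \\ 0 & 0 & 0 \end{pmatrix} * \begin{pmatrix} 0 & ef & eh \\ 0 & 0 & fh \\ 0 & 0 & 0 \end{pmatrix} + \begin{pmatrix} 0 & -h & 2e \\ 0 & 0 & -2f \\ 0 & 0 & 0 \end{pmatrix}$$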

where $*$ denotes element-wise multiplication (so there isn’t any linear algebra going on here; we’re just using matrices to organize the information). Let $N,C,S,D$ be the matrices above, in that order, so that $N = C*S + D$.

If we let $x_1 = e, x_2 = f, x_3 = h$ (so that $x_i \leq x_j$ if $i \leq j$) then for $i < j$

$$N_{ij} = x_j x_i, \qquad S_{ij} = x_i x_j.$$

In other words, $N$ contains the non-standard words that we’re trying to express in terms of the standard words in $S$.

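The matrices $C$ and $D$ are called the structural matrices of the $G$-algebra, and their entries are such that our relations may be written

$$x_j x_i = c_{ij}\, x_i x_j + d_{ij} \qquad (i < j),$$

where $c_{ij}$ and $d_{ij}$ are the entries of $C$ and $D$, with zeros everywhere else ($i \geq j$). If $c_{ij} = 1$ for all $i < j$ and $D = 0$, the resulting algebra will be commutative.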

We can use the structural matrices $C$ and $D$ to define our algebra via Sage’s NCPolynomialRing_plural function (note that Python uses zero-indexing for matrices):

Note that R is a commutative polynomial ring. In fact, up till the point where we call NCPolynomialRing_plural, even the variables e,f,h are treated as commutative variables.

This method of defining $U$ is considerably longer and more prone to mistakes than using g_algebra. As stated in the documentation, this is not intended for use! I’m including it here because this is essentially how one would go about defining a $G$-algebra in Singular. In fact, the Sage method g_algebra calls NCPolynomialRing_plural, which in turn calls Singular.
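To see how the structural matrices encode the same information as the relations dictionary, here is a small plain-Python sketch (it does not call Sage or NCPolynomialRing_plural). It expands $C$ and $D$ entry by entry into rules of the form $x_j x_i = c_{ij} x_i x_j + d_{ij}$. The function name `rules_from_matrices`, the encoding of words as tuples, and the entries involving `h` (the usual $\mathfrak{sl}_2$ brackets) are assumptions made for illustration.

```python
gens = ["e", "f", "h"]

# Structural matrices, zero-indexed; only entries with i < j are used.
# C holds the coefficients c_ij, D holds the tails d_ij as {word: coefficient}.
C = [[0, 1, 1],
     [0, 0, 1],
     [0, 0, 0]]
D = [[{}, {("h",): -1}, {("e",): 2}],
     [{}, {},           {("f",): -2}],
     [{}, {},           {}]]

def rules_from_matrices(gens, C, D):
    """Expand x_j * x_i = c_ij * x_i * x_j + d_ij (i < j) into a rule dictionary."""
    rules = {}
    n = len(gens)
    for i in range(n):
        for j in range(i + 1, n):
            rhs = dict(D[i][j])                    # copy the tail d_ij
            standard = (gens[i], gens[j])          # the standard word x_i x_j
            rhs[standard] = rhs.get(standard, 0) + C[i][j]
            rules[(gens[j], gens[i])] = rhs        # key: non-standard word x_j x_i
    return rules
```

Calling `rules_from_matrices(gens, C, D)` produces one rule per pair $i < j$, keyed by the non-standard word, which is the same shape as the dictionary passed to g_algebra.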

## Quotients of letterplace rings

Our final method for defining non-commutative rings makes use of Sage’s implementation of Singular’s letterplace rings.

As mentioned at the start of this post, this method requires the relations to be homogeneous, so we’ll work with $H$ instead of $U$.
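The homogeneity requirement is easy to check by hand, but a short plain-Python sketch may help make it precise. A relation is written as a list of (coefficient, word) terms summing to zero, and it is homogeneous when every word has the same total degree. The relation $fe - ef + h = 0$ is the one quoted in this post; the version with $ht$ in place of $h$ is an assumed form of the homogenized relation, and the helper names are hypothetical.

```python
def degrees(relation, weight):
    """Set of total degrees of the words appearing in a relation.

    A relation is a list of (coefficient, word) terms understood to sum to zero.
    """
    return {sum(weight[g] for g in word) for coeff, word in relation if coeff != 0}

def is_homogeneous(relation, weight):
    """True when all terms of the relation have the same total degree."""
    return len(degrees(relation, weight)) <= 1

# Every generator has degree 1.
weight = {"e": 1, "f": 1, "h": 1, "t": 1}

# fe - ef + h = 0: the relation quoted in the post for U.
u_relation = [(1, ("f", "e")), (-1, ("e", "f")), (1, ("h",))]

# fe - ef + ht = 0: the degree-1 term multiplied by t
# (an assumed form of the corresponding relation for H).
h_relation = [(1, ("f", "e")), (-1, ("e", "f")), (1, ("h", "t"))]
```

Here `degrees(u_relation, weight)` contains both 1 and 2, so the relation for $U$ fails the check, while every term of `h_relation` has degree 2.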

Let $\mathbb{Q}\langle e,f,h,t \rangle$ be the free algebra on 4 variables. Consider the two-sided ideal $I$ generated by the relations for $H$:

Then

This can be expressed Sage-ly:

The expression F*I*F is the two-sided ideal generated by elements in the list I.

Although $U$ cannot be defined using this method, $H$ can be defined using all three methods. As a (fun?) exercise, try defining $H$ using the other two methods.

## Difficulties

These methods can be used to define many non-commutative algebras such as the Weyl algebra and various enveloping algebras of Lie algebras. One can also define these algebras over fields other than $\mathbb{Q}$, such as $\mathbb{C}$ or $\mathbb{F}_p$.

However, we cannot define algebras over $\mathbb{Q}(q)$, the fraction field of $\mathbb{Q}[q]$:

This is a problem if we want to define rings with relations such as

Such relations occur frequently when studying quantum groups, for example.

This is surprising, because one can easily define $\mathbb{Q}(q)$ and non-commutative $\mathbb{Q}(q)$-algebras in Singular/Plural, which is what Sage is using. It seems that the problem is in Sage’s wrapper for Singular/Plural, because Sage can’t even pass the ring $\mathbb{Q}(q)$ to Singular.

There’s a trac ticket for this problem, but until it gets resolved, we’ll just have to define such rings directly in Singular/Plural. Thanks to the amazing capabilities of the Sage Cell Server, we’ll do this in the next post!

## February 25, 2016

### William Stein

#### "If you were new faculty, would you start something like SageMathCloud sooner?"

I was recently asked by a young academic: "If you were a new faculty member again, would you start something like SageMathCloud sooner or simply leave for industry?" The academic goes on to say "I am increasingly frustrated by continual evidence that it is more valuable to publish a litany of computational papers with no source code than to do the thankless task of developing a niche open source library; deep mathematical software is not appreciated by either mathematicians or the public."

I wanted to answer that "things have gotten better" since back in 2000 when I started as an academic who does computation. Unfortunately, I think they have gotten worse. I do not understand why. In fact, this evening I just received the most recent in a long string of rejections by the NSF.

Regarding a company versus taking a job in industry, for me personally there is no point in starting a company unless you have a goal that can only be accomplished via a company, since building a business from scratch is extremely hard and has little to do with math or research. I do have such a goal: "create a viable open source alternative to Mathematica, etc...". I was very clearly told by Michael Monagan (co-founder of Maplesoft) in 2006 that this goal could not be accomplished in academia, and I spent the last 10 years trying to prove him wrong.

On the other hand, leaving for a job in industry means that your focus will switch from "pure" research to solving concrete problems that make products better for customers. That said, many of the mathematicians who work on open source math software do so because they care so much about making the experience of using math software much better for the math community. What often drives Sage developers is exactly the sort of passionate care for "consumer focus" and products that also makes one successful in industry. I'm sure you know exactly what I mean, since it probably partly motivates your work. It is sad that the math community turns its back on such people. If the community were to systematically embrace them, instead of losing all these $300K+/year engineers to mathematics entirely -- which is exactly what we do constantly -- the experience of doing mathematics could be massively improved into the future. But that is not what the community has chosen to do. We are shooting ourselves in the foot.

Now that I have seen how academia works from the inside over 15 years I'm starting to understand a little why these things change very slowly, if ever. In the mathematics department I'm at, there are a small handful of research areas in pure math, and due to how hiring works (voting system, culture, etc.) we have spent the last 10 years hiring in those areas little by little (to replace people who die/retire/leave). I imagine most mathematics departments are very similar. "Open source software" is not one of those traditional areas. Nobody will win a Fields Medal in it.

Overall, the mathematical community does not value open source mathematical software in proportion to its value, and doesn't understand its importance to mathematical research and education. I would like to say that things have got a lot better over the last decade, but I don't think they have.

My personal experience is that much of the "next generation" of mathematicians who would have changed how the math community approaches open source software are now in industry, or soon will be, and hence they have no impact on academic mathematical culture. Every one of my Ph.D. students is now at Google/Facebook/etc.

We as a community overall would be better off if, when considering how we build departments, we put "mathematical software writers" on an equal footing with "algebraic geometers". We should systematically consider quality open source software contributions on a potentially equal footing with publications in journals.

To answer the original question, YES, knowing what I know now, I really wish I had started something like SageMathCloud sooner. In fact, here's the previously private discussion from eight years ago when I almost did.

There is a community generated followup ...

## February 24, 2016

### William Stein

#### Elliptic curves: Magma versus Sage

### Elliptic Curves

Elliptic curves are certain types of nonsingular plane cubic curves, e.g., y^2 = x^3 + ax + b, which are central to both number theory and cryptography (e.g., they are used to compute the hash in bitcoin).

### Magma and Sage

If you want to do a wide range of explicit computations with elliptic curves, for research purposes, you will very likely use SageMath or Magma. If you're really serious, you'll use both. Both Sage and Magma are far ahead of all other software (e.g., Mathematica, Maple and Matlab) for elliptic curves.

### A Little History

When I started contributing to Magma in 1999, I remember that Magma was way, way behind Pari. I remember having lunch with John Cannon (founder of Magma), and telling him I would no longer have to use Pari if only Magma would have dramatically faster code for computing point counts on elliptic curves. A few years later, John wisely hired Mark Watkins to work fulltime on Magma, and Mark has been working there for over a decade.
Mark is definitely one of the top people in the world at implementing (and using) computational number theory algorithms, and he's ensured that Magma can do a lot. Some of that "do a lot" means catching up with (and surpassing!) what was in Pari and Sage for a long time (e.g., point counting, p-adic L-functions, etc.).

However, in addition, many people have visited Sydney and added extremely deep functionality for doing higher descents to Magma, which is not available in any open source software. Search for Magma in this paper to see how, even today, there seems to be no open source way to compute the rank of the curve y^2 = x^3 + 169304x + 25788938. (The rank is 0.)

### Two Codebases

There are several elliptic curves algorithms available only in Magma (e.g., higher descents) ... and some available only in Sage (L-function rank bounds, some overconvergent modular symbols, zeros of L-functions, images of Galois representations). I could be wrong about functionality not being in Magma, since almost anything can get implemented in a year...

The code bases are almost completely separate, which is a very good thing. Any time something gets implemented in one, it gets (or should get) tested via a big run on elliptic curves up to some bound in the other. This sometimes results in bugs being found. I remember refereeing the "integral points" code in Sage by running it against all curves up to some bound and comparing to what Magma output, and getting many discrepancies, which showed that there were bugs in both Sage and Magma. Thus we would be way better off if Sage could do everything Magma does (and vice versa).

## February 18, 2016

### Sébastien Labbé

#### unsupported operand parent for *, Matrix over number field, vector over symbolic ring

Yesterday I received this email (originally in French):

Hi, Thomas and I have a silly question:

    K.<x>=NumberField(x*x-x-1)

I would like to multiply a matrix with coefficients in x by a vector containing the variables a and b. It says "unsupported operand parent for *, Matrix over number field, vector over symbolic ring". Is that serious?

Here is my answer. Indeed, in Sage, symbolic variables can't be multiplied with elements of a number field in x:

    sage: x = var('x')
    sage: K.<x> = NumberField(x*x-x-1)
    sage: a = var('a')
    sage: a*x
    Traceback (most recent call last)
    ...
    TypeError: unsupported operand parent(s) for '*': 'Symbolic Ring' and 'Number Field in x with defining polynomial x^2 - x - 1'

But we can define a polynomial ring with variables a, b and coefficients in the number field. Then we are able to multiply a with x:

    sage: x = var('x')
    sage: K.<x> = NumberField(x*x-x-1)
    sage: K
    Number Field in x with defining polynomial x^2 - x - 1
    sage: R.<a,b> = K['a','b']
    sage: R
    Multivariate Polynomial Ring in a, b over Number Field in x with defining polynomial x^2 - x - 1
    sage: a*x
    (x)*a

With two square brackets, we obtain power series:

    sage: R.<a,b> = K[['a','b']]
    sage: R
    Multivariate Power Series Ring in a, b over Number Field in x with defining polynomial x^2 - x - 1
    sage: a*x*b
    (x)*a*b

It works with matrices:

    sage: MS = MatrixSpace(R,2,2)
    sage: MS
    Full MatrixSpace of 2 by 2 dense matrices over Multivariate Power Series Ring in a, b over Number Field in x with defining polynomial x^2 - x - 1
    sage: MS([0,a,b,x])
    [  0   a]
    [  b (x)]
    sage: m1 = MS([0,a,b,x])
    sage: m2 = MS([0,a+x,b*b+x,x*x])
    sage: m1 + m2 * m1
    [(x)*b + a*b       (x + 1) + (x + 1)*a]
    [  (x + 2)*b (3*x + 1) + (x)*a + a*b^2]

## February 17, 2016

### Liang Ze

#### The Weyl Algebra and $\mathfrak{sl}_2$

I’ve been away from this blog for quite a while - almost a year, in fact! My excuses are my wedding and the prelims (a.k.a. quals), as well as all the preparation that had to go into them (although, to be honest, those things only occupied me till September last year!).

Looking back at my previous posts, I’ve realized that in attempting to teach both math and code, I probably ended up doing neither.
This is really not the best place to learn representation theory (for example) - there are better books and blogs out there. Also, most of the code that I wrote to illustrate those posts feels contrived, and neither highlights Sage’s strengths nor reflects how I normally use Sage for my assignments and projects.

I’ve thus decided to write shorter posts with code that I actually use (on SageMathCloud), along with some explanations of the code. Lately, I’ve been writing code for non-commutative algebra and combinatorics, so today I’ll start with a simple example of a non-commutative algebra.

## The Weyl Algebra

The $1$-dim. Weyl algebra is the (non-commutative) algebra generated by $x, \partial_x$ subject to the relation

$$\partial_x x = x \partial_x + 1.$$

If we treat $x$ as “multiplication by $x$” and $\partial_x$ as “differentiation w.r.t. $x$”, this relation is really just an application of the product rule:

$$\partial_x (x f) = x\, \partial_x f + f.$$

We can generalize to higher dimensions: the $n$-dim. Weyl algebra is the algebra generated by $x_1,\dots,x_n,\partial_{x_1},\dots,\partial_{x_n}$ quotiented by the relations that arise from treating them as the obvious operators on $\mathbb{F}[x_1,\dots,x_n]$.

### Weyl algebras in Sage

It’s easy to define the Weyl algebra in Sage:

Calling inject_variables allows us to use the operators x,y,z,dx,dy,dz in subsequent code (where dx denotes $\partial_x$, etc).

One can do rather complicated computations:

By default, Sage chooses to represent monomials with x,y,z in front of dx,dy,dz:

Keep in mind that x does not refer to the polynomial $x \in \mathbb{F}[x]$, so one should not expect dx*x to be 1.

(For some reason show does not give the right output. Try show(x) or show(x*dx), for example.)

## Representations of $\mathfrak{sl}_2$

It turns out that the $1$-dim. Weyl algebra gives a representation of $\mathfrak{sl}_2(\mathbb{F})$.

The Lie algebra $\mathfrak{sl}_2(\mathbb{F})$ is generated by $E,F,H$ subject to the relations

$$[E,F] = H, \qquad [H,E] = 2E, \qquad [H,F] = -2F.$$

Define the following elements of the $1$-dim. Weyl algebra:

We can use Sage to quickly verify that these elements indeed satisfy the relations for $\mathfrak{sl}_2$ (using the commutator as the Lie bracket i.e. $[A,B] = AB - BA$):

Working over $\mathbb{C}$, this action of $\mathfrak{sl}_2(\mathbb{C})$ makes $\mathbb{C}[x]$ a Verma module of highest weight $0$.

In fact, we can make $\mathbb{C}[x]$ a Verma module of highest weight $c$ for any $c \in \mathbb{C}$ by using:

We verify this again in Sage:

In subsequent posts, I’ll talk more about defining other non-commutative algebras in Sage and Singular.

## January 15, 2016

### William Stein

#### Thinking of using SageMathCloud in a college course?

## SageMathCloud course subscriptions

"We are college instructors of the calculus sequence and ODEs. If the college were to purchase one of the upgrades for us as we use Sage with our students, who gets the benefits of the upgrade? Is it the individual students that are in an instructor’s Sage classroom or is it the collaborators on an instructor’s project?"

If you were to purchase just the $7/month plan and apply the upgrades to *one* single project, then all collaborators on that one project would benefit from those upgrades while using that project.

If you were to purchase a course plan for, say, $399/semester, then you could apply the upgrades (network access and members only hosting) to 70 projects that you might create for a course. When you create a course by clicking +New, then "Manage a Course", then add students, each student has their own project created automatically. All instructors (anybody who is a collaborator on the project where you clicked "Manage a course") are also added to the student's project. In course settings you can easily apply the upgrades you purchase to all projects in the course.

Also, I'm currently working on a new feature where instructors may choose to require all students in their course to pay for the upgrade themselves. There's a one-time $9/course fee paid by the student and that's it. At some colleges (in some places) this is ideal, and at other places it's not an option at all. I anticipate releasing this very soon.

## Getting started with SageMathCloud courses

You can fully use the SMC course functionality without paying anything in order to get familiar with it and test it out.  The main benefit of paying is that you get network access and all projects get moved to members only servers, which are much more robust; also, we greatly prioritize support for paying customers.

This blog post is an overview of using SMC courses:

This has some screenshots and the second half is about courses:

http://blog.ouseful.info/2015/11/24/course-management-and-collaborative-jupyter-notebooks-via-sagemathcloud/

Here are some video tutorials made by an instructor that used SMC with a large class in Iceland recently:

Note that the above videos show the basics of courses, then talk specifically about automated grading of Jupyter notebooks.  That might not be at all what you want to do -- many math courses use Sage worksheets, and probably don't automate the grading yet.

## Searching for a Funding Model

Sage is open source and freely available to all, so it is potentially of huge value to the community, since it is owned by everybody and can be changed by anybody. However, those who fund Magma (either directly or indirectly) haven't funded Sage at the same level for some reason. I can't make Sage closed source and copy that very successful funding model. I've tried everything I can think of given the time and resources I have, and the only model left that seems able to support open source is having a company that does something else well and makes money, then using some of the profit to fund open source (Intel is the biggest contributor to Linux).

## SageMath, Inc.

Since I failed to find any companies that passionately care about Sage like Intel/Google/RedHat/etc. care about Linux, I started one. I've been working on SageMathCloud extremely hard for over 3 years now, with the hopes that at least it could be a way to fund Sage development.

## Jim Simons

Jim Simons is a mathematician who left academia to start a hedge fund that beat the stock market. He contributes back to the mathematical community through the Simons Foundation, which provides an enormous amount of support to mathematicians and physicists, and has many outreach programs.

SageMath is a large software package for mathematics that I started in 2005 with the goal of creating a free open source viable alternative to Magma, Mathematica, Maple, and Matlab. People frequently tell me I should approach the Simons Foundation for funding to support Sage. For example:
Jim Simons, after retiring from Renaissance Technologies with a cool 15 billion, has spent the last 10 years giving grants to people in the pure sciences. He's a true academic at heart [...] Anyways, he's very fond of academics and gives MacArthur-esque grants, especially to people who want to change the way mathematics is taught. Approach his fund. I'm 100% sure he'll give you a grant on the spot.

## The National Science Foundation

Last month the http://sagemath.org website had 45,114 monthly active users. However, as far as I know, there is no NSF funding for Sage in the United States right now, and development is mostly done on a shoestring in spare time. We have recently failed to get several NSF grants for Sage, despite there being Sage-related grants in the past from NSF. I know that funding is random, and I will keep trying. I have two proposals for Sage funding submitted to NSF right now.

## Several million dollars per year

I was incredibly excited in 2012 when David Eisenbud invited me to a meeting at the Simons Foundation headquarters in New York City with the following official description of their goals:
The purpose of this round table is to investigate what sorts of support would facilitate the development, deployment and maintenance of open-source software used for fundamental research in mathematics, statistics and theoretical physics. We hope that this group will consider what support is currently available, and whether there are projects that the Simons Foundation could undertake that would add significantly to the usefulness of computational tools for basic research. Modes of support that duplicate or marginally improve on support that is already available through the universities or the federal government will not be of interest to the foundation. Questions of software that is primarily educational in nature may be useful as a comparison, but are not of primary interest.  The scale of foundation support will depend upon what is needed and on the potential scientific benefit, but could be substantial, perhaps up to several million dollars per year.
Current modes of funding for research software in mathematics, statistics and physics differ very significantly. There may be correspondingly great differences in what the foundation might accomplish in these areas. We hope that the round table members will be able to help the foundation understand the current landscape  (what are the needs, what is available, whether it is useful, how it is supported) both in general and across the different disciplines, and will help us think creatively about new possibilities.
I flew across the country to this meeting, where we spent the day discussing ways in which "several million dollars per year" could revolutionize "the development, deployment and maintenance of open-source software used for fundamental research in mathematics...".

In the afternoon Jim Simons arrived, and shook our hands. He then lectured us with some anecdotes, didn't listen to what we had to say, and didn't seem to understand open source software. I was frustrated watching how he treated the other participants, so I didn't say a word to him. I feel bad for failing to express myself.

## The Decision

In the backroom during a coffee break, David Eisenbud told me that it had already been decided that they were going to just fund Magma by making it freely available to all academics in North America. WTF? I explained to David that Magma is closed source and that not only does funding Magma not help open source software like Sage, it actively hurts it. A huge motivation for people to contribute to Sage is that they do not have access to Magma (which was very expensive).

I wandered out of that meeting in a daze; things had gone so differently than I had expected. How could a goal to "facilitate the development, deployment and maintenance of open-source software... perhaps up to several million dollars per year" result in a decision that would make things possibly much worse for open source software?

That day I started thinking about creating what would become SageMathCloud. The engineering work needed to make Sage accessible to a wider audience wasn't going to happen without substantial funding (I had put years of my life into this problem but it's really hard, and I couldn't do it by myself). At least I could try to make it so people don't have to install Sage (which is very difficult). I also hoped a commercial entity could provide a more sustainable source of funding for open source mathematics software. Three years later, the net result of me starting SageMathCloud and spending almost every waking moment on it is that I've gone from having many grants to not, and SageMathCloud itself is losing money. But I remain cautiously optimistic and forge on...

## We will not fund Sage

Prompted by numerous messages recently from people, I wrote to David Eisenbud this week. He suggested I write to Yuri Tschinkel, who is the current director of the Simons Foundation:
Dear William,
Before I joined the foundation, there was a meeting conducted by David Eisenbud to discuss possible projects in this area, including Sage.
After that meeting it was decided that the foundation would support Magma.
Please keep me in the loop regarding developments at Sage, but I regret that we will not fund Sage at this time.
Best regards, Yuri
The Simons Foundation, the NSF, or any other foundation does not owe the Sage project anything. Sage is used by a lot of people for free, who together have their research and teaching supported by hundreds of millions of dollars in NSF grants. Meanwhile the Sage project barely hobbles along. I meet people who have fantastic development or documentations projects for Sage that they can't do because they are far too busy with their fulltime teaching jobs. More funding would have a massive impact. It's only fair that the US mathematical community is at least aware of a missed opportunity.
Funding in Europe for open source math software is much better.

Hacker News discussion

## September 01, 2015

### William Stein

#### React, Flux, RethinkDB and SageMathCloud -- Summer 2015 update

I've been using databases and doing web development for over 20 years, and I've never really loved any database before and definitely didn't love any web development frameworks either. That all changed for me this summer...

### SageMathCloud

SageMathCloud is a web application in which you collaboratively use Python, LaTeX, Markdown, Sage worksheets (sophisticated mathematics), task lists, R, Jupyter Notebooks, manage courses, write C programs, make chatrooms, and more. It is hosted on Google Compute Engine, but is also entirely open source and there is a pre-made Virtual Machine that you can download. A project in SMC is a Linux account, with resources constrained using cgroups and quotas. Many SMC users can collaborate on the same project, and have equal privileges in that project. Interaction with all file types (including Jupyter notebooks, task lists and course management) is synchronized in realtime, like Google Docs. There is also a global notifications feed that shows all editing activity on all files in all projects on which the user collaborates, which is a sort of highly technical version of Facebook's feed.

### Rewrite motivation

I originally wrote the SageMathCloud frontend using progressive-refinement jQuery (no third-party framework beyond that) and the Cassandra database. These were reasonable choices when I started. There are much better approaches now, which are critical to dramatically improving the user experience with SMC, and also growing the developer base. So far SMC has had no nontrivial outside contributions, probably due to the difficulty of understanding the code. In fact, I think nobody besides me has ever even installed SMC, despite these install notes.

We (me, Jon Lee, Nicholas Ruhland) are currently completely rewriting the entire frontend of SMC using React.js, Flux, and RethinkDB. We started this rewrite in June 2015, with Jon being supported by Google Summer of Code (2015), Nich being partly supported by NSF grants from Randy Leveque and Rekha Thomas, and with me being unemployed.

### Terrible funding situation

I'm living on credit cards -- I have no NSF grant support anymore, and SageMathCloud is still losing a lot of money every month, and I'm unhappy about this situation. It was either completely quit working on SMC and instead teach or consult a lot, or lose tens of thousands of dollars. I am doing the latter right now. I was very caught off guard, since this is my first summer ever to not have NSF support since I got my Ph.D. in 2000, and I didn't expect to have my grant proposals all denied (which happened in June). There is some modest Angel investment in SageMath, Inc., but I can't bring myself to burn through that money on salary, since it would run out quickly, and I don't want to have to shut down the site due to not being able to pay the hosting bill. I've failed to get any significant free hosting, due to already getting free hosting in the past, and SageMath, Inc. not being in any incubators. For example, we tried very hard to get hosting from Google, but they flatly refused for these two reasons (they gave $60K in hosting to UW/Sage project in 2012). I'm clearly having trouble transitioning from an academic to an industry funding model. But if there are enough paying customers by January 2016, things will turn around.

Jon, Nich, and I have been working on this rewrite for three months, and hope to finish it by the end of September, when Jon and Nich will become busy with classes again. However, it seems unlikely we'll be able to finish at the current rate. Fortunately, I don't start teaching fulltime again until January, and we put a lot of work into doing a release in mid-August that fully uses RethinkDB and partly uses React.js, so that we can finish the second stage of the rewrite iteratively, without any major technical surprises.

### RethinkDB

Cassandra is an excellent database for many applications, but it is not the right database for SMC and I'm making no further use of Cassandra.
SMC is a realtime application that does a lot more reading than writing to the database, and SMC greatly benefits from realtime push updates from the database. I've tried quite hard in the past to build an appropriate architecture for SMC on top of Cassandra, but it is the wrong tool for the job. RethinkDB scales up linearly (with sharding and replication), and has high availability and automatic failover as of version 2.1.2. See https://github.com/rethinkdb/rethinkdb/issues/4678 for my painful path to ensuring RethinkDB actually works for me (the RethinkDB developers are incredibly helpful!).

### React.js

I learned about React.js first from some "random podcast", then got more interested in it when Chris Swenson gave a demo at a Sage Days workshop in San Diego in May 2015. React (+Flux) is a web development framework that actually has solid ideas behind it, backed by an implementation that has been optimized and tested by a highly nontrivial real world application: namely the Facebook website. Even if I were to have the idea of React, implementing it in a way that is actually usable would be difficult. The key idea of React.js is that -- surprisingly -- it is possible to write efficient client-side code that describes how to render the application purely as a function of its state.

React is different than jQuery. With jQuery, you write lots of code explaining how to transform the user interface of your application from one complicated state (that you might never have anticipated happening) to another complicated state. When using React.js you don't write code about how your application's visible state changes -- instead you write code to answer the question: "given this state, what should the application look like". For me, it's a game changer.
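To make the "function of state" idea concrete, here is a minimal sketch in Python rather than JavaScript. It does not use React's API: `render` and `changed_lines` are hypothetical names, the "UI" is just a list of text lines, and the line-by-line comparison is only a crude stand-in for the work React does to figure out what actually needs updating on the page.

```python
def render(state):
    """Describe the whole UI (a list of text lines) purely from the state."""
    lines = ["Projects ({})".format(len(state["projects"]))]
    for project in sorted(state["projects"], key=lambda p: p["title"]):
        marker = "*" if project["id"] == state["selected"] else " "
        lines.append("{} {}".format(marker, project["title"]))
    return lines


def changed_lines(old, new):
    """Indices of lines that differ between two rendered UIs."""
    longest = max(len(old), len(new))
    padded_old = old + [None] * (longest - len(old))
    padded_new = new + [None] * (longest - len(new))
    return [i for i in range(longest) if padded_old[i] != padded_new[i]]


projects = [{"id": 1, "title": "algebra"}, {"id": 2, "title": "number theory"}]
before = render({"projects": projects, "selected": 1})
after = render({"projects": projects, "selected": 2})
to_update = changed_lines(before, after)
```

Nothing in `render` says how to get from one selection to another; the transition falls out of comparing two complete descriptions.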
This is like what one does when writing video games; the innovation is that some people at Facebook figured out how to practically program this way in a client side web browser application, then tuned their implementation based on huge amounts of real world data (Facebook has users). Oh, and they open sourced the result and ran several conferences explaining React.

React.js reminds me of when Andrew Wiles proved Fermat's Last Theorem in the mid 1990s. Wiles (and Ken Ribet) had genuinely new ideas, which dramatically reshaped the landscape of number theory. The best number theorists quickly realized this and adapted to the new world, pushing the envelope of Wiles's work far beyond what I expected could happen. Other people pretended like Wiles didn't exist and continued studying Fibonacci numbers. I browsed the web development section of Barnes and Noble last night and there were dozens of books on jQuery and zero on React.js. I feel for anybody who tries to learn client-side web development by reading books at Barnes and Noble.

### IPython/Jupyter and PhosphorJS

I recently met with Fernando Perez, who founded IPython/Jupyter. He seemed to tell me that currently 9 people are working fulltime on rewriting the Jupyter web notebook using the PhosphorJS framework. I tried to understand PhosphorJS based on the github page, but couldn't, except to deduce that it is mostly the work of one person from Bloomberg/Continuum Analytics. Fernando told me that they chose PhosphorJS since it is very fast, and that their main motivation is to (1) make Jupyter better use their huge high-resolution monitors at their new institute at Berkeley, and (2) make it easier for developers like me to integrate/extend Jupyter into their applications. I don't understand (2), because PhosphorJS is perhaps the least popular web framework I've ever heard of (is it a web framework -- I can't tell?).
I pushed Fernando to explain why they made that design choice, but didn't really understand the answer, except that they had spent a lot of time investigating alternatives (like React first). I'm intimidated by their resources and concerned that I'm making the wrong choice; however, I just can't understand why they have made what seems to me to be the wrong choice. I hope to understand more at the joint Sage/Jupyter Days 70 that we are organizing together in Berkeley, CA in November. (Edit: see https://github.com/ipython/ipython/issues/8239 for a discussion of why IPython/Jupyter uses PhosphorJS.)

### Tables and RethinkDB

Our rewrite of SMC is built on Tables, Flux and React. Tables are client-side technology I wrote inspired by Facebook's GraphQL/Relay technology (and Meteor, Firebase, etc.); they synchronize data between clients and the backend database in realtime. Tables are defined by a JSON schema file, which specifies the fields in the table, and explains what get and set queries are allowed. A table is a subset of a much larger table in the database, with the subset defined by conditions that are relative to the user making the query. For example, the projects table has one entry for each project that the user is a collaborator on.

Tables are automatically synchronized between the user and the database whenever the database changes, using RethinkDB changefeeds. RethinkDB's innovation is to build realtime updates -- triggered when the result of a query to the database changes -- directly into the database at the lowest level. Of course it is possible to build something that looks the same from the outside using either a message queue (say using RabbitMQ or ZeroMQ), or by watching the replication stream from the database and triggering actions based on that (like Meteor does using MongoDB). RethinkDB's approach seems better to me, putting the abstraction at the right level.
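As a rough illustration of a table being a user-relative subset that the database keeps up to date, here is a small in-memory sketch in Python. It is not SMC's code and not the RethinkDB driver API: `Database`, `changefeed`, `write`, and `Table` are hypothetical names, and everything happens synchronously in one process.

```python
class Database:
    """Toy database that pushes changes to subscribers of a query."""

    def __init__(self):
        self._rows = {}    # primary key -> row
        self._feeds = []   # (predicate, callback) pairs

    def changefeed(self, predicate, callback):
        """Subscribe to rows matching predicate; return the current matches."""
        self._feeds.append((predicate, callback))
        return {k: r for k, r in self._rows.items() if predicate(r)}

    def write(self, key, row):
        """Insert or replace a row, or delete it when row is None."""
        old = self._rows.get(key)
        if row is None:
            self._rows.pop(key, None)
        else:
            self._rows[key] = row
        for predicate, callback in self._feeds:
            was_visible = old is not None and predicate(old)
            is_visible = row is not None and predicate(row)
            if was_visible or is_visible:
                # Only subscribers whose query result changed are notified.
                callback(key, row if is_visible else None)


class Table:
    """Client-side view: the projects on which `user` collaborates."""

    def __init__(self, db, user):
        self.rows = db.changefeed(
            lambda row: user in row["collaborators"], self._on_change)

    def _on_change(self, key, row):
        if row is None:
            self.rows.pop(key, None)
        else:
            self.rows[key] = row


db = Database()
db.write("p1", {"title": "algebra", "collaborators": ["alice"]})
alice = Table(db, "alice")
db.write("p2", {"title": "seminar", "collaborators": ["bob"]})
hidden_from_alice = "p2" not in alice.rows
db.write("p2", {"title": "seminar", "collaborators": ["bob", "alice"]})
db.write("p1", None)
```

The client never polls: its copy changes only when a write alters the result of its own query.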
That said, based on mailing list traffic, searches, etc., it seems that very, very few people get RethinkDB yet. Also, despite years of development, RethinkDB only became "production ready" a few months ago, and only got automatic failover a few weeks ago. That said, after ironing out some kinks, I'm now using it with heavy traffic in production and it works very well.

### Flux

Once data is automatically synchronized between the database and web browsers in realtime, we can build everything else on top of this. Facebook also introduced an architecture pattern that they call Flux, which works well with React. It's very different than MVC-style two-way binding frameworks, where objects are directly linked to UI elements, with an object changing causing the UI element to change and vice versa.

In SMC each major part of the system has two objects associated to it: Actions and Stores. We think of them in terms of the classical CQRS pattern -- command query responsibility segregation. Actions are commands -- they are Javascript "functions" that get stuff done, but they do not return values; instead, they impact the state of the store. The store has functions that allow one to query for the state of the store, but they do not change the state of the store. The store functions must only be functions of the internal state of the store and nothing else. They might cache their results and format their output to be very convenient for rendering. But that's it.

Actions usually cause the corresponding store (or stores) to change. When a store changes, it emits a change event, which causes any React components that depend on the store to be updated, which in many cases means they are re-rendered. There are optimizations one can introduce to reduce the amount of re-rendering, which if one isn't careful leads to subtle bugs; pretty much the only subtle React UI bugs one hits are caused by such optimizations. When the UI re-renders, the user sees their view of the world change.
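The Actions/Stores split can be sketched in a few lines of Python. This is an illustration of the pattern only, not SMC's actual classes: `ProjectStore`, `ProjectActions`, `set_title`, and `sorted_titles` are hypothetical names, and a list of rendered snapshots stands in for a React component.

```python
class ProjectStore:
    """Holds state, answers queries, and emits change events."""

    def __init__(self):
        self._titles = {}      # project_id -> title
        self._listeners = []

    def on_change(self, listener):
        self._listeners.append(listener)

    def _update(self, titles):
        # Intended to be called by actions only.
        self._titles = titles
        for listener in self._listeners:
            listener()

    def sorted_titles(self):
        # A query: depends only on internal state and never modifies it.
        return sorted(self._titles.values())


class ProjectActions:
    """Commands: they change the store and return nothing."""

    def __init__(self, store):
        self._store = store

    def set_title(self, project_id, title):
        titles = dict(self._store._titles)
        titles[project_id] = title
        self._store._update(titles)


store = ProjectStore()
actions = ProjectActions(store)

rendered = []  # each entry is one "re-render" of a component
store.on_change(lambda: rendered.append(store.sorted_titles()))

actions.set_title("p1", "number theory")
actions.set_title("p2", "algebra")
```

Data flows one way: action, then store, then change event, then render. The "component" never writes to the store directly.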
The user then clicks buttons, types, etc., which triggers actions, which in turn update stores (and tables, hence propagating changes to the ultimate source of truth, which is the RethinkDB database). As stores update, the UI again updates, etc.

### Status

So far, we have completely (re-)written the project listing, file manager, help/status page, new file page, project log, file finder, project settings, course management system, account settings, billing, project upgrade system, and file use notifications using React, Flux, and Tables, and the result works well. Bugs are much easier to fix, and it is easy (possible?) to understand the state of the system, since it is defined by the state of the database and the corresponding client-side stores. We've completely rethought everything about the UI in doing the rewrite of the above components, and it has taken several months. Also, as mentioned above, I completely rewrote most of the backend to use RethinkDB instead of Cassandra. There were also the weeks of misery for me after we made the switch over. Even after weeks of thinking/testing/wondering "what could go wrong?", we found out all kinds of surprising little things within hours of pushing everything into production, which took more than a week of sleep deprived days to sort out.

What's left? We have to rewrite the file editor tabs system, the project tabs system, and all the applications (except course management): editing text files using Codemirror, task lists (which are surprisingly complicated!), color xterm terminals, Jupyter notebooks (which will still use an iframe for the notebook itself), Sage worksheets (with complicated html output embedded in codemirror), compressed file de-archiver, the LaTeX editor, the wiki and markdown editors, and file chat. We hope to find a clean way to abstract away the various SMC applications as plugins, so that other people can easily write their own applications/plugins that will run inside of SMC.
There will be a rich collection of example plugins to build on, namely the ones listed above, which are all driven by critical-to-us real world applications.

Discussion about this blog post on Hacker News.

## August 22, 2015

### Benjamin Hackl

#### Google Summer of Code 2015: Conclusion

The “Google Summer of Code 2015” program ended yesterday, on August 21 at 19:00 UTC. This blog entry provides a short wrap-up of our GSoC project.

The aim of our project was to implement a basic framework that enables us to do computations with asymptotic expressions in SageMath, and I am very happy to say that we very much succeeded in doing so. An overview of all our developments can be found at meta ticket #17601.

Although we did not really follow the timeline suggested in my original proposal (mainly because the implementation of the Asymptotic Ring took way longer than originally anticipated), we managed to implement the majority of ideas from my proposal, with the most important part being that our current prototype is stable. In particular, this means that we do not expect to make major design changes at this point. Every detail of our design is well-discussed and can be explained.

Of course, our “Asymptotic Expressions” project is far from finished, and we will continue to extend the functionality of our framework. For example, although working with exponential and logarithmic terms is currently possible, it is not very convenient because the $\log$, $\exp$, and power functions are not fully implemented. Furthermore, it would be interesting to investigate the performance gain obtained by cythonizing the central parts of this framework (e.g. parts of the MutablePoset…), and so on…

To conclude, I want to thank Daniel Krenn for his hard work and helpful advice as my mentor, as well as the SageMath community for giving me the opportunity to work on this project within the Google Summer of Code program!
## August 17, 2015

### Benjamin Hackl

#### Asymptotic Expressions: Current Developments

Since my last blog entry on the status of our implementation of Asymptotic Expressions in SageMath quite a lot of improvements have happened. Essentially, all the pieces required in order to have a basic working implementation of multivariate asymptotics are there. The remaining tasks within my Google Summer of Code project are:

• Polish the documentation of our minimal prototype, which consists of #17716 and the respective dependencies. Afterwards, we will set this to needs_review.

• Open a ticket for the multivariate asymptotic ring and put together everything that we have written so far there.

In this blog post I want to give some more examples of what can be done with our implementation right now and what we would like to be able to handle in the future.

### Status Quo

After I wrote my last blog entry, we introduced a central idea/interface to our project: short notations. By using the short notation factory for growth groups (introduced in #18930) it becomes very simple to construct the desired growth group. Essentially, monomial growth groups (cf. #17600), i.e. groups that contain elements of the form variable^power (for a fixed variable and powers from some base ring, e.g. the Integer Ring or even the Rational Field) are represented by variable^base, where the base ring is also specified via its shortened name.
The short notation factory then enables us to do the following:

    sage: from sage.groups.asymptotic_growth_group import GrowthGroup
    sage: G = GrowthGroup('x^ZZ'); G
    Growth Group x^ZZ
    sage: G.an_element()
    x
    sage: G = GrowthGroup('x^QQ'); G
    Growth Group x^QQ
    sage: G.an_element()
    x^(1/2)

Naturally, this interface carries over to the generation of asymptotic rings: instead of the (slightly dubious) "monomial" keyword advertised in my last blog entry, we can now construct the asymptotic ring by specifying the respective growth group via its short representation:

    sage: R.<x> = AsymptoticRing('x^ZZ', QQ); R
    Asymptotic Ring <x^ZZ> over Rational Field
    sage: (x^2 + O(x))^50
    x^100 + O(x^99)

Recently, we also implemented another type of growth group: exponential growth groups (see #19028). These groups represent elements of the form base^variable where the base is from some multiplicative group. For example, we could do the following:

    sage: G = GrowthGroup('QQ^x'); G
    Growth Group QQ^x
    sage: G.an_element()
    (1/2)^x
    sage: G(2^x) * G(3^x)
    6^x
    sage: G(5^x) * G((1/7)^x)
    (5/7)^x

Note: unfortunately, we have not yet implemented a function that allows taking some element from some growth group (e.g. x from a monomial growth group) as the variable in an exponential growth group. Implementing some way to “change” between growth groups by taking the log or the exponential function is one of our next steps.

We also made this short notation a central interface for working with cartesian products. This is implemented in #18587. For example, this allows us to construct growth groups containing elements like $2^x \sqrt[5]{x^2} \log(x)^2$:

    sage: G = GrowthGroup('QQ^x * x^QQ * log(x)^ZZ'); G
    Growth Group QQ^x * x^QQ * log(x)^ZZ
    sage: G.an_element()
    (1/2)^x * x^(1/2) * log(x)
    sage: G(2^x * x^(2/5) * log(x)^2)
    2^x * x^(2/5) * log(x)^2

Simple parsing from the symbolic ring (and from strings) is implemented.
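The arithmetic behind such products is componentwise: bases multiply in the exponential factor, while exponents add in the monomial and logarithmic factors. The following plain-Python sketch illustrates this for elements of the shape b^x * x^r * log(x)^k. It is not the Sage implementation; the `Growth` class and its field names are hypothetical.

```python
from dataclasses import dataclass
from fractions import Fraction


@dataclass(frozen=True)
class Growth:
    """Stand-in for an element b^x * x^r * log(x)^k of a product growth group."""
    base: Fraction = Fraction(1)    # b in b^x
    power: Fraction = Fraction(0)   # r in x^r
    log_power: int = 0              # k in log(x)^k

    def __mul__(self, other):
        # Each factor of the cartesian product is multiplied separately.
        return Growth(self.base * other.base,
                      self.power + other.power,
                      self.log_power + other.log_power)


six_to_x = Growth(base=Fraction(2)) * Growth(base=Fraction(3))
five_sevenths_to_x = Growth(base=Fraction(5)) * Growth(base=Fraction(1, 7))
mixed = (Growth(Fraction(2), Fraction(2, 5), 2)
         * Growth(Fraction(1, 2), Fraction(1, 2), 1))
```

The first two products mirror the session above: 2^x * 3^x gives 6^x, and 5^x * (1/7)^x gives (5/7)^x.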
As written above, operations like 2^G(x) or log(G(x)) are among the next steps on our roadmap.

### Further Steps

Of course, having an easy way to generate growth groups (and thus also asymptotic rings) is nice. However, it would be even better if the process of finding the correct parent were even more automated. Unfortunately, this requires some non-trivial effort regarding the pushout construction, which will certainly not happen within the GSoC project.

As soon as we have an efficient way to “switch” between factors of a growth group (e.g. by taking the logarithm or the exponential function), this has to be carried over up to the asymptotic ring. We would like to support operations like

    sage: 2^(x^2 + O(x))
    2^(x^2) * 2^(O(x))

where the output could also be 2^(x^2) * O(x^g), where $g$ is determined by series_precision().

Division of asymptotic expressions can be realized with just about the same idea, for example:

$$\frac{1}{x^2 + O(x)} = \frac{1}{x^2} \frac{1}{1 + O(1/x)} = x^{-2} + O(x^{-3}),$$

and so on. If an infinite series occurs, it will have to be cut using an $O$-term, most likely somehow depending on series_precision() as well.

Ultimately, we would also like to incorporate, for example, Stirling’s approximation of the factorial such that we could do something like

    sage: n.factorial()
    sqrt(2*pi) * e^(n*log(n)) * (1/e)^n * n^(1/2) + ...

which then can be used to obtain asymptotic expansions of binomial coefficients like $\binom{2n}{n}$:

    sage: (2*n).factorial() / (n.factorial()^2)
    1/sqrt(pi) * 4^n * n^(-1/2) + ...

As you can see, there is still a lot of work within our “Asymptotic Expressions” project. Nevertheless, with the minimal working prototype and the ability to create cartesian products of growth groups, the foundation for all of this is already implemented!

## August 16, 2015

### Michele Borassi

#### Conclusion of the Main Part of the Project

Hi! In this post, I will summarize the results obtained with the inclusion in Sage of the Boost and igraph libraries.
This was the main part of my Google Summer of Code project, and it was completed yesterday, when ticket 19003 was closed.

We have increased the number of graph algorithms available in Sage from 66 to 98 (according to the list used in the initial comparison of the graph libraries [1]). Furthermore, we decreased the running time of several Sage algorithms: in some cases, we have been able to improve the asymptotic running time, obtaining up to 10000x improvements in our tests. Finally, during the inclusion of external algorithms, we have refactored and cleaned some of the Sage source code, like the shortest path routines: we have standardized the input and the output of 15 routines related to shortest paths, and we have removed duplicate code as much as possible.

More specifically, the first part of the project was the inclusion of the Boost graph library: since the library is only available in C++, we had to develop an interface. This interface lets us easily convert a Sage graph into a Boost graph, and run algorithms on the converted graph. Then, we have written routines to translate the output back into a Sage-readable format: this way, the complicated Boost library is "hidden", and users can interact with it as they do with Sage. In particular, we have interfaced the following algorithms:

• Edge connectivity (trac.sagemath.org/ticket/18564);

• Clustering coefficient (trac.sagemath.org/ticket/18811);

• Cuthill-McKee and King vertex orderings (trac.sagemath.org/ticket/18876);

• Minimum spanning tree (trac.sagemath.org/ticket/18910);

• Dijkstra, Bellman-Ford, Johnson shortest paths (trac.sagemath.org/ticket/18931).

All these algorithms were either not available in Sage, or quite slow, compared to the Boost routines. As far as we know, Boost does not offer other algorithms that improve on Sage algorithms: however, if such algorithms are developed in the future, it will be very easy to include them, using the new interface.
In the second part of the project, we included igraph: since this library already offers a Python interface, we decided to include it as an optional package (before it becomes a standard package, at least a year should pass [2]). To install the package, it is enough to type the following instructions from the Sage root folder:

    sage -i igraph         # To install the igraph C core
    sage -i python_igraph  # To install the Python interface

Then, we can easily interact with igraph: for a list of available routines, it is enough to type "igraph." and press tab twice. To convert a Sage graph g_sage into an igraph graph it is enough to type g_igraph = g_sage.igraph_graph(), while a Sage graph can be instantiated from an igraph graph through g_sage=Graph(g_igraph) or g_sage=DiGraph(g_igraph). This way, all igraph algorithms are now available in Sage. Furthermore, we have included the igraph maximum flow algorithm inside the corresponding Sage function, obtaining significant improvements (for more information and benchmarks, we refer to ticket 19003 [3]).

In conclusion, I think the project reached its main goal, the original plan was followed very closely, and we have been able to overcome all problems.

Before closing this post, I would like to thank the many people who helped me with great advice, and who provided great solutions to all the problems I faced. First of all, my mentor David Coudert: he always answered all my queries very quickly, and he gave me great suggestions to improve the quality of the code I wrote. Then, a very big help came from Nathann Cohen, who often cooperated with David in reviewing my code and proposing new solutions. Moreover, I have to thank Martin Cross, who gave me good suggestions about the Boost graph library, and Volker Braun, who closed all my tickets. Finally, I have to thank the whole Sage community for giving me this great opportunity!
[1] https://docs.google.com/spreadsheets/d/1Iu1hkQtRn9J-sgfZbQTu2RoXzyjoMEWP5-cm3nAwnWE/edit?usp=sharing

[2] http://doc.sagemath.org/html/en/developer/coding_in_other.html

[3] http://trac.sagemath.org/ticket/19003

## July 27, 2015

### Michele Borassi

#### Including igraph Library

Hello! In this new blog post, I would like to discuss the inclusion of the igraph library inside Sage. Up to now, I have interfaced Sagemath with the Boost graph library, in order to run Boost algorithms inside Sage. Now, I want to do the same with igraph, the other major C++ graph library, which stands out because it contains 62 routines, 29 of which are not available in Sage. Moreover, the igraph library is very efficient, as shown in [1] and in the previous post on library comparison.

This inclusion of igraph in Sage is quite complicated, because we have to include a new external library [2] (while in the Boost case we already had the sources). We started this procedure through ticket 18929: unfortunately, after this ticket is closed, igraph will only be an optional package, and we will have to wait one year before it becomes standard. The disadvantage of optional packages is that they must be installed before they can be used; however, the installation is quite easy: it is enough to run Sage with option -i python_igraph.

After the installation, the usage of the igraph library is very simple, because igraph already provides a Python interface that can be used in Sage. To transform the Sagemath network g_sage into an igraph network g_igraph, it is enough to type g_igraph=g_sage.igraph_graph(), while to create a Sagemath network from an igraph network it is enough to type g_sage = Graph(g_igraph) or g_sage=DiGraph(g_igraph). After this conversion, we can use all routines offered by igraph!
For instance, if we want to create a graph through the preferential attachment model, we can do it with the Sagemath routine, or with the igraph routine:

    sage: import igraph
    sage: G = graphs.RandomBarabasiAlbert(100, 2)
    sage: G.num_verts()
    100
    sage: G = Graph(igraph.Graph.Barabasi(100, int(2)))
    sage: G.num_verts()
    100

The result is the same (apart from randomness), but the time is very different:

    sage: import igraph
    sage: %timeit G = Graph(igraph.Graph.Barabasi(10000000, int(2)))
    1 loops, best of 3: 46.2 s per loop
    sage: G = graphs.RandomBarabasiAlbert(10000000, 2)
    Stopped after 3 hours.

Otherwise, we may use igraph to generate graphs with the Forest-Fire algorithm, which is not available in Sagemath:

    sage: G = Graph(igraph.Graph.Forest_Fire(10, 0.1))
    sage: G.edges()
    [(0, 1, None), (0, 2, None), (1, 7, None), (2, 3, None), (2, 4, None), (3, 5, None), (3, 8, None), (4, 6, None), (8, 9, None)]

We may also do the converse: transform a Sage network into an igraph network and apply an igraph algorithm. For instance, we can use label propagation to find communities (a task which is not implemented in Sage):

    sage: G = graphs.CompleteGraph(5)+graphs.CompleteGraph(5)
    sage: G.add_edge(0,5)
    sage: com = G.igraph_graph().community_label_propagation()
    sage: len(com)
    2
    sage: com[0]
    [0, 1, 2, 3, 4]
    sage: com[1]
    [5, 6, 7, 8, 9]

The algorithm found the two initial cliques as communities.

I hope that these examples are enough to show the excellent possibilities offered by the igraph library, and that these features will soon be available in Sagemath!

[1] https://sites.google.com/a/imtlucca.it/borassi/unpublished-works/google-summer-of-code/library-comparison

[2] http://doc.sagemath.org/html/en/developer/packaging.html

## July 16, 2015

### Benjamin Hackl

#### Computing with Asymptotic Expressions

It has been quite some time since my last update on the progress of my Google Summer of Code project, for two reasons.
On the one hand, I have been busy because of the end of the semester, as well as because of the finalization of my Master’s thesis; on the other hand, it is not very interesting to write a post about discussing and implementing rather technical details. Nevertheless, Daniel Krenn and I have been quite busy bringing asymptotic expressions to SageMath. Fortunately, these efforts are starting to become quite fruitful.

In this post I want to discuss our current implementation roadmap (i.e. not only for the remaining Summer of Code, but also for the time afterwards), and give some examples for what we are currently able to do.

### Structure and Roadmap

An overview of the entire roadmap can be found here (trac #17601). Recall that the overall goal of this project is to bring asymptotic expressions like $2^n + n^2 \log n + O(n)$ to Sage.

Our implementation (which aims to be as general and expandable as possible) tackles this problem with a three-layer approach:

• GrowthGroups and GrowthElements (trac #17600). These elements and parents manage the growth (and just the growth!) of a summand in an asymptotic expression like above. The simplest cases are monomial and logarithmic growth groups. For example, their elements are given by $n^r$ and $\log(n)^r$ where the exponent $r$ is from some ordered ring like $\mathbb{Z}$ or $\mathbb{Q}$. Both cases (monomial and logarithmic growth groups) can be handled in the current implementation. However, growth elements like $n^2 \log n$ are intended to live in the cartesian product of a monomial and a logarithmic growth group (in the same variable). Parts of this infrastructure are already prepared (see trac #18587).

• AsymptoticTerms and TermMonoids (trac #17715). While GrowthElements only represent the growth, AsymptoticTerms have more information: basically, they represent a summand in an asymptotic expression. There are different classes for each type of asymptotic term (e.g. ExactTerm and OTerm, with more to come).
In addition to a growth element, some types of asymptotic terms (like exact terms) also possess a coefficient.

• AsymptoticExpression and AsymptoticRing (trac #17716). This is what we are currently working on, and we do have a running prototype! The version that can be found on trac is only missing some doctests and a bit of documentation. Asymptotic expressions are the central objects within this project, and essentially they are sums of several asymptotic terms.

In the background, we use a special data structure (“mutable posets”, trac #17693) in order to model the (partial) order induced by the various growth elements belonging to an asymptotic expression. This allows us to perform critical operations like absorption (when an $O$-term absorbs “weaker” terms) efficiently and in a simple way.

The resulting minimal prototype can, in some sense, be compared to Sage’s PowerSeriesRing: however, we also allow non-integer exponents, and extending this prototype to work with multivariate expressions should not be too hard now, as the necessary infrastructure is there.

Following the finalization of the minimal prototype, there are several improvements to be made. Here are some examples:

• Besides addition and multiplication, we also want to divide asymptotic expressions, and higher-order operations like exponentiation and taking the logarithm would be interesting as well.

• Also, conversion from, for example, the symbolic ring is important when it comes to usability of our tools. We will implement and enhance this conversion gradually.

### Examples

An asymptotic ring (over a monomial growth group with coefficients and exponents from the rational field) can be created with

    sage: R.<x> = AsymptoticRing('monomial', QQ); R
    Asymptotic Ring over Monomial Growth Group in x over Rational Field with coefficients from Rational Field

Note that we marked the code as experimental, meaning that you will see some warnings regarding the stability of the code.
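The absorption rule mentioned above can be illustrated with a small plain-Python sketch for the univariate monomial case as x tends to infinity. It is not the Sage implementation and uses no poset: an expression is simply a pair of a dict of exact terms (exponent to coefficient) and the exponent of a single $O$-term (or None). The names `absorb`, `multiply`, and `power` are hypothetical.

```python
from fractions import Fraction as F


def absorb(exact, o_exponents):
    """Keep the largest O-term; drop exact terms it dominates (x -> oo)."""
    if not o_exponents:
        return {e: c for e, c in exact.items() if c != 0}, None
    bound = max(o_exponents)
    kept = {e: c for e, c in exact.items() if c != 0 and e > bound}
    return kept, bound


def multiply(a, b):
    """Product of two expressions given as (exact_terms, o_exponent)."""
    exact_a, o_a = a
    exact_b, o_b = b
    exact = {}
    for ea, ca in exact_a.items():
        for eb, cb in exact_b.items():
            exact[ea + eb] = exact.get(ea + eb, 0) + ca * cb
    o_exponents = []
    if o_b is not None:
        o_exponents += [ea + o_b for ea in exact_a]   # exact * O
    if o_a is not None:
        o_exponents += [eb + o_a for eb in exact_b]   # O * exact
    if o_a is not None and o_b is not None:
        o_exponents.append(o_a + o_b)                 # O * O
    return absorb(exact, o_exponents)


def power(a, n):
    result = ({F(0): 1}, None)   # the constant 1
    for _ in range(n):
        result = multiply(result, a)
    return result


# (x^(3/7) + 2*x^(1/5)) * (x + O(x^0))
product = multiply(({F(3, 7): 1, F(1, 5): 2}, None), ({F(1): 1}, F(0)))
# (2*x^(1/2) + O(x^0))^15
fifteenth = power(({F(1, 2): 2}, F(0)), 15)
```

With a single variable the growth elements are totally ordered, so taking a maximum suffices; the mutable poset is what handles the general, only partially ordered case.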
Now that we have an asymptotic ring, we can do some calculations. For example, take $(2\sqrt{x} + O(1))^{15}$:

    sage: (2*x^(1/2) + O(x^0))^15
    O(x^7) + 32768*x^(15/2)

We can also have a look at the underlying structure:

    sage: expr = (x^(3/7) + 2*x^(1/5)) * (x + O(x^0)); expr
    O(x^(3/7)) + 2*x^(6/5) + 1*x^(10/7)
    sage: expr.poset
    poset(O(x^(3/7)), 2*x^(6/5), 1*x^(10/7))
    sage: print expr.poset.full_repr()
    poset(O(x^(3/7)), 2*x^(6/5), 1*x^(10/7))
    +-- null
    |   +-- no predecessors
    |   +-- successors: O(x^(3/7))
    +-- O(x^(3/7))
    |   +-- predecessors: null
    |   +-- successors: 2*x^(6/5)
    +-- 2*x^(6/5)
    |   +-- predecessors: O(x^(3/7))
    |   +-- successors: 1*x^(10/7)
    +-- 1*x^(10/7)
    |   +-- predecessors: 2*x^(6/5)
    |   +-- successors: oo
    +-- oo
    |   +-- predecessors: 1*x^(10/7)
    |   +-- no successors

As you might have noticed, the “O” constructor that is used for the PowerSeriesRing and related structures can also be used here. In particular, $O(\mathit{expr})$ acts exactly as expected:

    sage: expr
    O(x^(3/7)) + 2*x^(6/5) + 1*x^(10/7)
    sage: O(expr)
    O(x^(10/7))

Of course, the usual rules for computing with asymptotic expressions hold:

    sage: O(x) + O(x)
    O(x)
    sage: O(x) - O(x)
    O(x)

So far, so good. Our next step is making the multivariate growth groups usable for the AsymptoticRing and then improving the overall user interface of the ring.

## July 09, 2015

### Michele Borassi

#### New Boost Algorithms

Hello! My Google Summer of Code project is continuing, and I am currently trying to include more Boost algorithms in Sage. In this post, I will list the main algorithms I'm working on.

#### Clustering Coefficient

If two different people have a friend in common, there is a high chance that they will become friends: this is the property that the clustering coefficient tries to capture. For instance, if I pick two random people, they will very probably not know each other, but if I pick two of my acquaintances, they will very probably know each other.
In this setting, the clustering coefficient of a person is the probability that two random acquaintances of this person know each other.

In order to quantify this phenomenon, we can formalize everything in terms of graphs: people are nodes and two people are connected if they are acquaintances. Hence, we define the clustering coefficient of a vertex $v$ in a graph $G=(V,E)$ as:

$$\frac{2|\{(x,y) \in E:x,y \in N_v\}|}{\deg(v)(\deg(v)-1)}$$

where $N_v$ is the set of neighbors of $v$ and $\deg(v)$ is the number of neighbors of $v$. This is exactly the probability that two random neighbors of $v$ are linked with an edge.

My work has added to Sagemath the Boost algorithm to compute the clustering coefficient, which is more efficient than the previous algorithm, which was based on NetworkX:

    sage: g = graphs.RandomGNM(20000,100000)
    sage: %timeit g.clustering_coeff(implementation='boost')
    10 loops, best of 3: 258 ms per loop
    sage: %timeit g.clustering_coeff(implementation='networkx')
    1 loops, best of 3: 3.99 s per loop

But Nathann did better: he implemented a clustering coefficient algorithm from scratch, using Cython, and he managed to outperform the Boost algorithm, at least when the graph is dense. Congratulations, Nathann! However, when the graph is sparse, the Boost algorithm still seems to be faster.

#### Dominator tree

Let us consider a road network, that is, a graph where vertices are street intersections, and edges are streets. The question is: if I close an intersection, where am I still able to go, assuming I am at home?

The answer to this question can be summarized in a dominator tree. Assume that, in order to go from my home to my workplace, I can choose many different paths, but all these paths pass through the café, then they pass through the square (that is, if either the café or the square is closed, then there is no way I can go to work).
In this case, in the dominator tree, the father of my workplace is the square, the father of the square is the café, and the father of the café is my home, which is also the root of the tree.

More formally, given a graph $G$, the dominator tree of $G$ rooted at a vertex $v$ is defined by connecting each vertex $x$ with the last vertex $y \neq x$ that belongs to each path from $v$ to $x$ (note that this vertex always exists, because $v$ belongs to each path from $v$ to $x$).

Until now, Sagemath did not have a routine to compute the dominator tree: I have been able to include the Boost algorithm. Unfortunately, due to several suggestions and improvements in the code, the ticket is not closed yet. Hopefully, it will be closed very soon!

#### Cuthill-McKee ordering / King ordering

Let us consider a graph $G=(V,E)$: a matrix $M$ of size $|V|$ can be associated to this graph, where $M_{i,j}=1$ if and only if there is an edge between vertices $i$ and $j$. In some cases, this matrix can have specific properties that can be exploited for many purposes, like speeding up algorithms. One of these properties is bandwidth, which measures how far the matrix is from a diagonal matrix: it is defined as $\max_{M_{i,j} \neq 0}|i-j|$. A small bandwidth might help in computing several properties of the graph, like eigenvalues and eigenvectors.

Since the bandwidth depends on the order of vertices, we can try to permute them in order to obtain a smaller value: in Sage, we have a routine that performs this task. However, this routine is very slow, and it is prohibitive even for very small graphs (in any case, finding an optimal ordering is NP-hard). Hence, researchers have developed heuristics to compute good orderings: the most important ones are Cuthill-McKee ordering and King ordering. Boost contains both routines, but Sage does not: for this reason, I would like to insert these two functions.
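
To illustrate the bandwidth definition given above, here is a minimal plain-Python sketch that evaluates $\max_{M_{i,j} \neq 0}|i-j|$ for a fixed vertex ordering. The helper name `ordering_bandwidth` and the toy path graph are hypothetical and chosen only for illustration; this is neither the Sage routine nor the Boost heuristics mentioned in the text.

```python
def ordering_bandwidth(edges, order):
    """Bandwidth of the adjacency matrix when vertices are numbered as in `order`.

    Hypothetical helper for illustration only.
    """
    position = {v: i for i, v in enumerate(order)}
    # Each edge (u, v) is a nonzero entry M[i][j]; the bandwidth is the largest |i - j|.
    return max((abs(position[u] - position[v]) for u, v in edges), default=0)


# Toy example: the path a - b - c - d.
path_edges = [("a", "b"), ("b", "c"), ("c", "d")]

good = ordering_bandwidth(path_edges, ["a", "b", "c", "d"])  # natural order
bad = ordering_bandwidth(path_edges, ["a", "c", "b", "d"])   # shuffled order
print(good, bad)
```

The same graph gets a different bandwidth depending on how its vertices are numbered, which is why ordering heuristics are worth having.
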
The code is almost ready, but part of it depends on the code of the dominator tree: as soon as the dominator tree is reviewed, I will open a ticket on these two routines!

#### Dijkstra/Bellman-Ford/Johnson shortest paths

Let us consider again a road network. In this case, we are building GPS software, which has to compute the shortest path between the place where we are and the destination. The textbook algorithm that performs this task is Dijkstra's algorithm, which computes the distance between the starting point and any other reachable point (of course, there are more efficient algorithms involving preprocessing, but Dijkstra is the simplest, and its running time is asymptotically optimal). This algorithm is already implemented in Sagemath.

Let's spice things up: what if there are some streets with negative length? For instance, we like a street so much that we are willing to drive 100km more just to pass along that street, which is 50km long. It is as if that street were -50km long!

First of all, under these assumptions, a shortest path might not exist: if there is a cycle with negative length, we may drive along that cycle as many times as we want, decreasing the distance to the destination more and more. At the very least, we have to assume that no negative cycle exists. Even with this assumption, Dijkstra's algorithm does not work, and we have to use the Bellman-Ford algorithm, which is less efficient, but more general.

Now, assume that we want something more: we are trying to compute the distance between all possible pairs of vertices. The first possibility is to run the Bellman-Ford algorithm $n$ times, where $n$ is the number of nodes in the graph. But there is a better alternative: it is possible to run the Bellman-Ford algorithm only once, and then to modify the lengths of edges, so that all lengths are non-negative, and shortest paths are not changed. This way, we run Dijkstra's algorithm $n$ times on this modified graph, obtaining a better running time.
This is Johnson's algorithm.

Both the Bellman-Ford and Johnson algorithms are implemented in Boost and not in Sagemath. As soon as I manage to create weighted Boost graphs (that is, graphs where edges have a length), I will also include these two algorithms!

## June 25, 2015

### Michele Borassi

#### Edge Connectivity through Boost Graph Library

After two weeks, we have managed to interface Boost and Sagemath!

However, the interface was not as simple as it seemed. The main problem we found is the genericity of Boost: almost all Boost algorithms work with several graph implementations, which differ in the data structures used to store edges and vertices. For instance, the code that implements breadth-first search works if the adjacency list of a vertex v is a vector, a list, a set, etc. This result is accomplished by using templates [1]. Unfortunately, the only way to interface Sagemath with C++ code is Cython, which is not template-friendly yet. In particular, Cython provides genericity through fused types [2], whose support is still experimental, and which do not offer full integration with templates [3-5].

After a thorough discussion with David, Nathann, and Martin (thank you very much!), we have found a solution: for the input, we have defined a fused type "BoostGenGraph", including all Boost graph implementations, and all functions that interface Boost and Sagemath use this fused type. This way, for each algorithm, we may choose the most suitable graph implementation. For the output, whose type might depend on the input type, we use C++ to transform it into a "standard" type (vector, or struct).

We like this solution because it is very clean, and it allows us to exploit Boost genericity without any copy-paste.
Still, there are some drawbacks:
1. Cython fused types do not allow nested calls of generic functions;
2. Boost graphs cannot be converted to Python objects: they must be defined and deleted in the same Cython function;
3. no variable can have a generic type, apart from the arguments of generic functions.

These drawbacks will be overcome as soon as Cython makes templates and generic types interact: this way, we will be able to create a much stronger interface, by writing a graph backend based on Boost, so that the user might create, convert, and modify Boost graphs directly from Python. However, for the moment, we will implement all algorithms using the current interface, which already provides genericity, and which has no drawback if the only goal is to "steal" algorithms from Boost.

As a test, we have computed the edge connectivity of a graph through Boost: the code is available in ticket 18564 [6]. Since the algorithm provided by Sagemath is not optimal (it is based on linear programming), the difference in the running time is impressive, as shown by the following tests:

    sage: G = graphs.RandomGNM(100,1000)
    sage: %timeit G.edge_connectivity()
    100 loops, best of 3: 1.42 ms per loop
    sage: %timeit G.edge_connectivity(implementation="sage")
    1 loops, best of 3: 11.3 s per loop
    sage: G = graphs.RandomBarabasiAlbert(300,3)
    sage: %timeit G.edge_connectivity(implementation="sage")
    1 loops, best of 3: 9.96 s per loop
    sage: %timeit G.edge_connectivity()
    100 loops, best of 3: 3.33 ms per loop

Basically, on a random Erdos-Renyi graph with 100 vertices and 1000 edges, the new algorithm is 8,000 times faster, and on a random Barabasi-Albert graph with 300 nodes and average degree 3, the new algorithm is 3,000 times faster!
This way, we can compute the edge connectivity of much bigger graphs, like a random Erdos-Renyi graph with 5,000 vertices and 50,000 edges:

    sage: G = graphs.RandomGNM(5000, 50000)
    sage: %timeit G.edge_connectivity()
    1 loops, best of 3: 16.2 s per loop

The results obtained with this first algorithm are very promising: in the coming days, we plan to interface several other algorithms, in order to improve both the number of available routines and the speed of the Sagemath graph library!

[1] https://en.wikipedia.org/wiki/Template_%28C%2B%2B%29
[2] http://docs.cython.org/src/userguide/fusedtypes.html
[3] https://groups.google.com/forum/#!topic/cython-users/qQpMo3hGQqI
[4] https://groups.google.com/forum/#!searchin/cython-users/fused/cython-users/-7cHr6Iz00Y/Z8rS03P7-_4J
[5] https://groups.google.com/forum/#!searchin/cython-users/fused$20template/cython-users/-7cHr6Iz00Y/Z8rS03P7-_4J
[6] http://trac.sagemath.org/ticket/18564

## June 09, 2015

### Michele Borassi

#### Performance Comparison of Different Graph Libraries

As promised in the last post, I have compared the performances of several graph libraries, in order to choose which ones should be deployed with Sagemath. Here, I provide the main results of this analysis, while more details are available on my website (see also the links below).
The libraries chosen are the most famous graph libraries written in Python, C, or C++ (I have chosen these languages because they are easier to integrate in Sagemath, using Cython). Furthermore, I have excluded NetworkX, which is already deployed with Sagemath.
First of all, I have to stress that no graph library comparison can be completely fair, and this comparison too can be criticized, due to the large number of available routines, the constant evolution of libraries, and many small differences in the outputs (for instance, one library might compute the value of a maximum s-t flow, another library might actually compute the flow, and a third one might compute all maximum flows). Despite this, I have tried to be as fair as possible, through a deeper and more detailed analysis than previous comparisons (https://graph-tool.skewed.de/performance, http://www.programmershare.com/3210372/, http://arxiv.org/pdf/1403.3005.pdf).
The first comparison deals with the number of algorithms implemented. I have chosen a set of 107 possible algorithms, trying to cover all possible tasks that a graph library should perform (avoiding easy tasks that are common to all libraries, like outputting the number of nodes, the number of edges, the neighbors of a node, etc.). In some cases, two tasks were collapsed into one if the algorithms solving these tasks are very similar (for instance, computing a maximum flow and computing a minimum cut, computing vertex betweenness and edge betweenness, etc.).
The number of routines available for each library is plotted in the following chart, and a table containing all features is available in HTML or as a Google Sheet.

The results show that Sagemath has more routines than all competitors (66), closely followed by igraph (62). All other libraries are very close to each other, having about 30 routines each. Furthermore, Sagemath could be improved in the fields of neighbor similarity measures (assortativity, bibcoupling, cocitation, etc), community detection, and random graph generators. For instance, igraph contains 29 routines that are not available in Sagemath.

The second comparison analyzes the running-time of some of the algorithms implemented in the libraries. In particular, I have chosen 8 of the most common tasks in graph analysis: computing the diameter, computing the maximum flow between two vertices, finding connected components and strongly connected components, computing betweenness centrality, computing the clustering coefficient, computing the clique number, and generating a graph with the preferential attachment model. I have run each of these algorithms on 3 inputs, and I have considered the total execution time (excluding the time needed to load the graph). More details on this experiment are available here, and the results are also available in a Google Sheet.
In order to make the results more readable, I have plotted the ratio between the time needed by a given library and the minimum time needed by any library. If an algorithm was not implemented, or it needed more than 3 hours to complete, the corresponding bar is not shown.
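
The normalization described above can be sketched in a few lines of plain Python. The library names and timings below are hypothetical placeholders, not the measurements from this experiment, and `slowdown_ratios` is a helper name invented for this illustration.

```python
def slowdown_ratios(timings):
    """Map each library to (its time) / (fastest time), skipping missing entries."""
    measured = {lib: t for lib, t in timings.items() if t is not None}
    fastest = min(measured.values())
    return {lib: t / fastest for lib, t in measured.items()}


# Hypothetical timings in seconds for one task; None stands for
# "not implemented, or did not finish within the time limit".
example = {"library_a": 2.0, "library_b": 0.5, "library_c": None}
ratios = slowdown_ratios(example)
print(ratios)
```

The fastest library always gets ratio 1, so bars for different tasks become comparable even when the absolute running times differ by orders of magnitude.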

Overall, the results show that NetworKit is the fastest library, or one of the fastest, in all routines that are implemented (apart from the generation of preferential attachment graphs, where it is very slow). The Boost graph library is very close to NetworKit, and it also contains more routines. Sagemath is also quite efficient in all tasks, apart from the computation of strongly connected components and the generation of a preferential attachment graph, where it needed more than 3 hours. However, in the latter case, the main problem was not speed but memory consumption.

In conclusion, Sagemath can benefit greatly from the possibility of using algorithms from other libraries. First of all, it might increase the number of algorithms offered, especially by including igraph, and it might also improve its performance, by including Boost, NetworKit, or other fast graph libraries.

## June 04, 2015

### Michele Borassi

#### Comparison of Graph Libraries

Many times, people have asked me "Which is the best available graph library?", or "Which graph library should I use to compute this, or that?".
Well, personally I love to use Sage, but there are also several good alternatives. Then, the question becomes "How could we improve Sage, so that people will choose it?".

In my opinion, graph libraries are compared according to the following parameters:
1. simplicity and documentation: people have little time, and the faster they learn how to use the library, the better;
2. number of routines available;
3. speed: sometimes, the input is very big, and the algorithms take much time to finish, so that a fast implementation is fundamental.
While it is very difficult to measure the first point, the others can be compared and improved. For this reason, in order to outperform other libraries, we should implement new features, and improve existing ones. You don't say!

However, this answer is not satisfactory: in principle, we could add all features available in other libraries, but this is a huge amount of translation work, and while we are doing this work the other libraries will change, making this effort a never-ending story.

My project proposes an alternative: cooperating instead of competing. I will try to interface Sage with other libraries, and to use their algorithms when the Sage counterpart is not available, or less efficient. This way, with an affordable amount of work, we will be able to run all algorithms available in the best graph libraries!

As a first step, I have compared all the most famous C, C++, and Python graph libraries according to points 2 and 3, in order to choose which libraries should be included. The next posts will analyze the results of this comparison.

#### Google Summer of Code: let's start!

This blog will follow my Google Summer of Code project, entitled Performance Improvements for the Graph Module of Sagemath. The complete project is available here, and related documents with partial results will be available on the same website.
In this first post, I would like to thank my mentor David Coudert and Nathann Cohen, who helped me a lot in writing this project and understanding how the graph module of Sagemath works.
With their help, and with the help of the Sage community, I hope it will be a useful and fun project! Let's start!

## May 29, 2015

### Benjamin Hackl

#### Asymptotic Expressions: Motivation

$\def\R{\mathbb{R}}$So, as Google Summer of Code started on Monday, May 25th it is time to give a proper motivation for the project I have proposed. The working title of my project is (Multivariate) Asymptotic Expressions, and its overall goal is to bring asymptotic expressions to SageMath.

### What are Asymptotic Expressions?

A motivating answer for this question comes from the theory of Taylor series. Assume that we have a sufficiently nice (in this case meaning smooth) function $f : \R \to \R$ that we want to approximate in a neighborhood of some point $x_0 \in \R$. Taylor’s theorem allows us to write $f(x) = T_n(x) + R_n(x)$ where

$T_n(x) = \sum_{j=0}^n \frac{f^{(j)}(x_0)}{j!}\cdot (x-x_0)^j = f(x_0) + f'(x_0)\cdot (x-x_0) + \cdots + \frac{f^{(n)}(x_0)}{n!}\cdot (x-x_0)^n,$

and $R_n(x) = \frac{f^{(n+1)}(\xi)}{(n+1)!} \cdot (x-x_0)^{n+1}$, where $\xi$ lies in a neighborhood of $x_0$. Note that for $x\to x_0$, $R_n(x)$ “behaves like” $(x-x_0)^{n+1}$. In particular, we can certainly find a constant $C > 0$ such that $|R_n(x)| \leq C\cdot |x-x_0|^{n+1}$, or, in other words: for $x\to x_0$ the growth of the function $R_n(x)$ is bounded from above by the growth of $(x-x_0)^{n+1}$.

The idea of bounding the growth of a function by the growth of another function when the argument approaches some number (or $\infty$) is the central idea behind the big O notation. For functions $f, g : \R \to \R$ we write $f(x) = O(g(x))$ for $x\to x_0$ if there is a constant $C > 0$ such that $|f(x)| \leq C\cdot |g(x)|$ for all $x$ in some neighborhood of $x_0$.

A particularly important case is $x_0 = \infty$, that is, when we want to compare and/or characterize the behavior of some function for $x\to\infty$, which is also called the function's asymptotic behavior. For example, consider the functions $\log x$, $x^3$ and $e^x$. All of them grow without bound for $x\to\infty$; however, their asymptotic behavior differs. This can be seen by considering pairwise quotients of these functions: $\frac{x^3}{e^x} \to 0$ for $x\to\infty$, and therefore the asymptotic growth of $x^3$ can be bounded above by the growth of $e^x$, meaning $x^3 = O(e^x)$ for $x\to\infty$.
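
The quotient argument can be made tangible with a quick numerical experiment. The following plain-Python sketch (the sample points are an arbitrary choice) only illustrates the limit; it is of course not a proof.

```python
import math

# Evaluate x^3 / e^x at growing x; the values shrink towards 0.
xs = [10, 20, 40, 80]
quotients = [x**3 / math.exp(x) for x in xs]

for x, q in zip(xs, quotients):
    print(f"x = {x:3d}   x^3 / e^x = {q:.3e}")
```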

The analysis of a function's asymptotic behavior is important for many applications, for example when determining time and space complexity of algorithms in computer science, or for describing the growth of classes of combinatorial objects: take, for example, binary strings of length $2n$ that contain equally many zeros and ones. If $s_n$ denotes the number of such strings, then we have

$s_n = \binom{2n}{n} = \frac{4^n}{\sqrt{n\pi}} \left(1 + O\left(\frac{1}{n}\right)\right) \quad\text{ for } n\to\infty.$
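
As a sanity check of this formula, one can compare $\binom{2n}{n}$ with $4^n/\sqrt{n\pi}$ numerically. The sketch below uses only Python's standard library (`math.comb` requires Python 3.8 or newer); the helper name `relative_error` is made up for this illustration.

```python
import math


def relative_error(n):
    """Return binom(2n, n) / (4^n / sqrt(pi * n)) - 1."""
    # Exact big integers are divided first, so no intermediate float overflows.
    exact_over_4n = math.comb(2 * n, n) / 4**n
    return exact_over_4n * math.sqrt(math.pi * n) - 1


for n in (10, 100, 1000):
    err = relative_error(n)
    # If the error term is O(1/n), then n * err has to stay bounded.
    print(n, err, n * err)
```

The products `n * err` stay bounded (they settle near $-1/8$), which is exactly what the $O\left(\frac{1}{n}\right)$ term expresses.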

Expressions like these are asymptotic expressions. When we consider asymptotic expressions in only one variable, everything works out nicely as a total order is induced. But as soon as multiple variables are involved, we don’t have a total order any more. Consider, for example, $x^2 y$ and $xy^2$ when $x$ and $y$ approach $\infty$. These two elements cannot be compared to each other, which complicates computing with these expressions as they may contain multiple “irreducible” O-terms.
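
The loss of the total order can be sketched with a simplified model: represent a monomial $x^a y^b$ by its exponent pair $(a, b)$ and compare pairs componentwise. This is an assumption made for illustration (it covers only pure monomials with both variables tending to $\infty$), and `compare_growth` is a hypothetical helper, not part of the planned implementation.

```python
def compare_growth(a, b):
    """Compare x^a[0] * y^a[1] with x^b[0] * y^b[1] as x, y -> infinity.

    Returns "==", "<=", ">=", or None when the two growths are incomparable.
    """
    if a == b:
        return "=="
    if all(s <= t for s, t in zip(a, b)):
        return "<="
    if all(s >= t for s, t in zip(a, b)):
        return ">="
    return None


print(compare_growth((1, 1), (2, 1)))  # x*y   versus x^2*y
print(compare_growth((2, 1), (1, 2)))  # x^2*y versus x*y^2
```

The second comparison returns `None`: neither term can absorb the other, so both would have to be kept.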

The following univariate and multivariate examples demonstrate what computing with such expressions looks like (all variables are assumed to go to $\infty$):

$x + O(x) = O(x),\quad x^2 \cdot (x + O(1)) = x^3 + O(x^2),\quad O(x^2) \cdot O(x^3) = O(x^5),$

$x y + O(x^2 y) = O(x^2y),\quad (y \log y + O(y)) (x^2 y + O(4^x \sqrt{x})) = x^2 y^2 \log y + O(x^2 y^2) + O(4^x \sqrt{x} y \log y).$

Our plan is to provide an implementation based on which computations with these and more complicated expressions are possible.

### Planned Structure

There are four core concepts of our implementation.

• Asymptotic Growth Groups: These are multiplicative groups that contain growth elements like $x^2$, $\log x$, $2^x \cdot x \cdot \log x$. For starters, only univariate power growth groups will be implemented.
• Asymptotic Term Monoids: These monoids contain asymptotic terms; in essence, these are the summands of asymptotic expressions. Apart from exact term monoids (growth elements with a coefficient), we will also implement O-term monoids as well as a term monoid for a deviation of O-terms. Asymptotic terms have (in addition to their monoid operation, multiplication) absorption as an additional operation: for example, O-terms are able to absorb all asymptotically “smaller” elements.
• Mutable Poset: As we have mentioned above, due to the fact that multivariate asymptotic expressions do not have a total order with respect to their growth, we need a partially ordered set (“Poset”) that deals with this structure such that operations like absorbing terms can be performed efficiently. The mutable poset is the central data structure that asymptotic expressions are built upon.
• Asymptotic Ring: This is our top-level structure which is also supposed to be the main interaction object for users. The asymptotic ring contains the asymptotic expressions, i.e. intelligently managed sums of asymptotic terms. All common operations shall be possible here. Furthermore, the interface should be intelligent enough such that admissible expressions from the symbolic ring can be directly converted into elements of the asymptotic ring.
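
To make the absorption idea concrete, here is a minimal univariate sketch in plain Python. It is not the planned Sage implementation: the tuple representation and the helper names `simplify` and `multiply` are assumptions made purely for illustration, and a sorted list stands in for the mutable poset because a single variable gives a total order.

```python
def simplify(terms):
    """Simplify a sum of univariate terms for x -> infinity.

    Each term is ("exact", exponent, coefficient) or ("O", exponent, None).
    """
    o_exponents = [e for kind, e, _ in terms if kind == "O"]
    result = []
    bound = max(o_exponents) if o_exponents else None
    if bound is not None:
        # The largest O-term absorbs every term that grows at most as fast.
        result.append(("O", bound, None))
    exact = {}
    for kind, e, c in terms:
        if kind == "exact" and (bound is None or e > bound):
            exact[e] = exact.get(e, 0) + c  # collect terms of equal growth
    result.extend(("exact", e, c) for e, c in sorted(exact.items()) if c != 0)
    return result


def multiply(left, right):
    """Multiply two sums term by term, then simplify the result."""
    products = []
    for kind1, e1, c1 in left:
        for kind2, e2, c2 in right:
            if kind1 == "exact" and kind2 == "exact":
                products.append(("exact", e1 + e2, c1 * c2))
            else:
                # An O-term times a (nonzero) term is again an O-term.
                products.append(("O", e1 + e2, None))
    return simplify(products)


x = [("exact", 1, 1)]
print(simplify(x + [("O", 1, None)]))                 # x + O(x)
print(multiply([("exact", 2, 1)], x + [("O", 0, None)]))  # x^2 * (x + O(1))
print(multiply([("O", 2, None)], [("O", 3, None)]))   # O(x^2) * O(x^3)
```

The three calls reproduce the univariate examples above: $O(x)$, then $O(x^2) + x^3$, then $O(x^5)$. In the multivariate case a single `max` no longer suffices, which is where the poset comes in.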

Obviously, this “planned structure” is rather superficial. However, this is only to supplement the motivation for my project with some ideas on the implementation. I’ll go a lot more into the details of what I am currently implementing in the next few blog posts!

## May 27, 2015

### William Stein

#### Guiding principles for SageMath, Inc.

In February of this year (2015), I founded a Delaware C Corporation called "SageMath, Inc.".  This is a first stab at the guiding principles for the company.    It should help clarify the relationship between the company, the Sage project, and other projects like OpenDreamKit and Jupyter/IPython.

### Company mission statement:

Make open source mathematical software ubiquitous.
This involves both creating the SageMathCloud website and supporting the development and distribution of the SageMath and other software, including Jupyter, Octave, Scilab, etc. Anything open source.

### Company principles:

• Absolutely all company funded software must be open source, under a GPLv3 compatible license. We are a 100% open source company.
• Company independence and self-determination is far more important than money. A core principle is that SMI is not for sale at any price, and will not participate in any partnership (for cost) that would restrict our freedom. This means:
  • reject any offers from corp development from big companies to purchase or partner,
  • do not take any investment money unless absolutely necessary, and then only from the highest quality investors
  • do not take venture capital ever
• Be as open as possible about everything involving the company. What should not be open (since it is dangerous):
  • finances (which could attract trolls)
  • private user data
  What should be open:
  • aggregate usage data, e.g., number of users.
  • aggregate data that could help other open source projects improve their development, e.g., common problems we observe with Jupyter notebooks should be provided to their team.
  • guiding principles