April 13, 2020

William Stein

DataDog: Don't make the same mistake I did -- a followup and thoughts about very unhappy customers

This is a followup to my previous blog post about DataDog billing.

TL;DR:
- I don't recommend DataDog,
- dealing with unhappy customers is hard,
- monitoring for data science nerds?

DataDog at Google Cloud Summit

I was recently at the Seattle Google Cloud Summit, and DataDog was well represented, with the biggest booth and top vendor billing during the keynote. Clearly they are doing something right. I had a past unpleasant experience with them, and I had just been auditing my records and discovered that last year DataDog had actually charged me a lot more than I thought, so I was kind of annoyed. Nonetheless, they kept coming up and talking to me, server monitoring is life-and-death important to me, and their actual software is very impressive in some ways.

Nick Parisi started talking with me about DataDog. He sincerely wanted to know about my past experience with monitoring and with the DataDog product, which he was clearly very enthusiastic about. So I told him a bit, and he encouraged me to tell him more, explaining that he had not worked at DataDog last year and would like to know what had happened. He was genuinely concerned and helpful, and gave me his email address, so I sent him an email with a link to my post. I didn't receive a response, so a week later I asked why, and received a followup email from Jay Robichau, DataDog's Director of Sales.

Conference Call with DataDog

Jay set up a conference call with me today at 10am (September 22, 2017). Before the call, I sent him a summary of my blog post, and also requested a refund, especially for the surprise bill they sent me nearly six weeks after my post.

During the call, Jay explained that he was "protecting" Nick from me, and that I would mostly talk with Michelle Danis, who is in charge of customer success. My expectation for the call was that we would find some common ground, and that they would at least appreciate the chance to make things right and talk with an unhappy customer. I was also curious about how a successful startup company addresses the concerns of an unhappy customer (me).

I expected the conversation to be difficult but go well, with me ending up writing a post singing the praises of the charming DataDog sales and customer success people. A few weeks ago CoCalc.com (my employer) had a very unhappy customer who got (rightfully) angry over a miscommunication, told us he would no longer use our product, and said he would definitely not recommend it to anybody else. I wrote to him wanting to at least continue the discussion and help, but he was completely gone. I would have done absolutely anything I could to ensure he was satisfied, if only he had given me the chance. Likewise, when a recent blog post complained about using CoCalc/Sage for graphics, I reached out to the author as best I could to at least clarify things...

In any case, here's what DataDog charged us as a result of us running their daemon on a few dozen containers in our Kubernetes cluster (a contractor who is not a native English speaker actually set up these monitors for us):

07/22/2016   449215JWJH87S8N4    DATADOG 866-329-4466 NY   $639.19
08/29/2016   2449215L2JH87V8WZ   DATADOG 866-329-4466 NY   $927.22

I was shocked by the 07/22 bill, which spurred my post, and discovered the 08/29 one only later. We canceled our subscription on July 22 (canceling was difficult in itself).

Michelle started the conference call by explaining that the 08/29 bill was for charges incurred before 07/22, and that their billing system has more than a month of lag (it still does today, unlike, say, Google Cloud Platform). Then Michelle explained at length many of the changes that DataDog has made to their software to address the issues I (and others?) have pointed out with their pricing description and billing. She was very interested in whether I would be a DataDog customer in the future, and when I said no, she explained that they would not refund my money since the bill was not a mistake.

I asked if they now provide a periodic summary of the upcoming bill, as Google Cloud Platform (say) does. She said that they don't today, though they are working on it. They do now provide a summary of usage so far on an admin page.

Finally, I explained in no uncertain terms that I felt misled by their pricing. I expected them to understand, and pointed out that they had just described to me many ways in which they were addressing this very problem. Very surprisingly, Michelle's response was that she absolutely would not agree that there had been any problem with their pricing description a year ago, and that they definitely would not refund my money. She kept bringing up the terms of service. I agreed that I didn't think they were legally in the wrong, given what she had explained; my point was only that, as they themselves had just described, their pricing and billing were unclear in various ways. They would not agree at all.

I can't recommend doing business with DataDog. I had very much hoped to write the opposite in this updated post. Unfortunately, their pricing and terms are still confusing today compared to competitors, and they are unforgiving of mistakes. This dog bites.    

(Disclaimer: I took notes during the call, but most of the above is from memory, and I probably misheard or misunderstood something. I invite comments from DataDog to set the record straight.)

Also, for what it is worth, I definitely do recommend Google Cloud Platform.  They put in the effort to do many things right regarding clear billing.

How do Startups Deal with Unhappy Customers?

I am very curious about how other startups deal with unhappy customers. At CoCalc we have had very few "major incidents" yet... but I want to be as prepared as possible. At the Google Cloud Summit, I went to an amazing "war stories by SREs" session in which they talked about situations from years ago in which their decisions meant the difference between whether they would have a company or a job tomorrow. These people clearly have amazing instincts for when a problem is do-or-die serious and when it isn't. That deep preparation is why they were on that stage, and a big reason why older companies like Google are still around. Having a strategy for addressing very angry customers is surely just as important.


Google SREs: these guys are serious.

I mentioned my DataDog story to a long-time Google employee there (16 years!) and he said he had recently been involved in a similar situation with Google's Stackdriver monitoring, where the bill to a customer was $85K in a month just for Stackdriver. I asked what Google did, and he said they refunded the money, then worked with the customer to better use their tools.

There is of course no way to please all of the people all of the time. Still, I genuinely feel that I was ripped off and misled by DataDog, while I have the impression that Jay and Michelle honestly view me as some jerk trying to rip them off for $1500. And they probably hate me for telling you about my experiences.

So far, with CoCalc we charge customers in advance for any service we provide, so fewer people are surprised by a bill. Sometimes there are problems with recurring subscriptions: when a person is charged for the upcoming month of a subscription they don't want to continue (e.g., because their course is over), we always fully refund the charge. What does your company do? Why? I do worry that our billing model means that we miss out on potential revenue.

We all know what successful huge consumer companies like Amazon and Wal-Mart do.

Monitoring for Data Science Nerds?

I wonder if there is interest in a service like DataDog, but targeted at data science nerds and built on CoCalc, which provides hosted collaborative Jupyter notebooks with Pandas, R, etc., pre-installed. This talk at PrometheusCon 2017 mostly discussed the friction people face moving data from Prometheus into data science tools (e.g., R, Python, Jupyter) for analysis. CoCalc provides a collaborative data science environment, so if we were to smooth over those points of friction, perhaps it could be useful to certain people, and much more efficient...
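
As a rough illustration of the friction involved, here is a minimal sketch (my own, not from the talk) of pulling a Prometheus range query into a pandas DataFrame inside a Jupyter notebook; the server URL and the example query are placeholders to adapt to your own setup:

import requests
import pandas as pd

PROM = "http://localhost:9090"   # hypothetical Prometheus server

def prom_range(query, start, end, step="60s"):
    # Hit the standard Prometheus HTTP API endpoint /api/v1/query_range,
    # which returns one time series per matching label set.
    r = requests.get(PROM + "/api/v1/query_range",
                     params={"query": query, "start": start,
                             "end": end, "step": step})
    r.raise_for_status()
    columns = {}
    for series in r.json()["data"]["result"]:
        ts, vals = zip(*series["values"])   # pairs (unix time, string value)
        columns[str(series["metric"])] = pd.Series(
            [float(v) for v in vals], index=pd.to_datetime(ts, unit="s"))
    return pd.DataFrame(columns)

# e.g.: df = prom_range("rate(http_requests_total[5m])",
#                       start=1505000000, end=1505003600)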


---



by William Stein ([email protected]) at April 13, 2020 01:38 PM

March 27, 2020

Sébastien Labbé

Computer experiments for the Lyapunov exponent for MCF algorithms when dimension is larger than 3

In November 2015, I wanted to share the intuitions I had developed on the behavior of various distinct Multidimensional Continued Fraction algorithms, obtained from various kinds of experiments performed with them, often involving combinatorics and digital geometry, but also including the computation of their first two Lyapunov exponents.

As continued fractions are deeply related to the combinatorics of Sturmian sequences, which can be seen as digitizations of straight lines in the grid \(\mathbb{Z}^2\), multidimensional continued fraction algorithms are related to digitizations of straight lines and hyperplanes in \(\mathbb{Z}^d\).
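
For instance (a minimal illustration of my own, not taken from the Cheat Sheets), the digitization of the line \(y=\alpha x\) is coded by the Sturmian word \(s_n=\lfloor (n+1)\alpha\rfloor-\lfloor n\alpha\rfloor\), which records whether the discrete line steps up between consecutive integer abscissas:

sage: alpha = (sqrt(5) - 1) / 2     # an irrational slope
sage: [floor((n+1)*alpha) - floor(n*alpha) for n in range(20)]
[0, 1, 0, 1, 1, 0, 1, 0, 1, 1, 0, 1, 1, 0, 1, 0, 1, 1, 0, 1]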

This is why I shared those experiments in what I called the 3-dimensional Continued Fraction Algorithms Cheat Sheets, whose format was inspired by typical cheat sheets found on the web. All of the experiments can be reproduced using the optional SageMath package slabbe, where I share my research code. People asked me whether I was going to try to publish those Cheat Sheets, but I was afraid that the publication format would change the organization of the information and data on each page, so, in the end, I never submitted the Cheat Sheets anywhere.

Here I should say that \(d\) stands for the dimension of the vector space on which the involved matrices act and \(d-1\) is the dimension of the projective space on which the algorithm acts.

One of the consequences of the Cheat Sheets is that they made us realize that the algorithm proposed by Julien Cassaigne has the same first two Lyapunov exponents as the Selmer algorithm (the first 3 significant digits are the same). Julien then discovered the explanation: his algorithm is conjugate to some semi-sorted version of the Selmer algorithm. This result was shared during the WORDS 2017 conference. Julien Leroy, Julien Cassaigne and I are still working on the extended version of the paper. It is taking longer mainly through my fault, because I was working hard on aperiodic Wang tilings for the previous two years.

During July 2019, Wolfgang, Valérie and Jörg asked me to compute the first two Lyapunov exponents of \(d\)-dimensional Multidimensional Continued Fraction algorithms for \(d\) larger than 3. The main question of interest is whether the second Lyapunov exponent remains negative as the dimension increases. This property is related to the notion of strong convergence almost everywhere of the simultaneous Diophantine approximations provided by the algorithm to a fixed vector of real numbers. It did not take me too long to update my package, since I had started to generalize my code to higher dimensions during Fall 2017. It turns out that, as the dimension increases, all known MCF algorithms eventually have a positive second Lyapunov exponent. My computations thus confirmed what they eventually published in their preprint in November 2019.

My motivation for sharing the results now is the conference Multidimensional Continued Fractions and Euclidean Dynamics held this week (it was supposed to be held at the Lorentz Center, March 23-27, 2020, but was cancelled because of the coronavirus), where some discussions during video meetings are related to this subject.

The computations performed below can be summarized in one graphic showing the values of \(1-\theta_2/\theta_1\) with respect to \(d\) for various \(d\)-dimensional MCF algorithms. It seems that \(\theta_2\) is negative up to dimension 10 for Brun, up to dimension 4 for Selmer and up to dimension 5 for ARP.

[Figure lyapunov_exponent_comparison.png: comparison of \(1-\theta_2/\theta_1\) as a function of \(d\) for the Brun, Selmer and ARP algorithms]

I have to say that I was disappointed by the results, because the Arnoux-Rauzy-Poincaré (ARP) algorithm that Valérie and I introduced does not perform so well: its second Lyapunov exponent seems to become positive for dimension \(d\geq 6\). I had high expectations for ARP because it reaches the highest value of \(1-\theta_2/\theta_1\) in the computations performed in the Cheat Sheets when \(d=3\), better than Brun and better than Selmer.

The algorithm for the computation of the first two Lyapunov exponents was provided to me by Vincent Delecroix. It applies the map \((v,w)\mapsto(M^{-1}v,M^T w)\) millions of times. The evolution of the size of the vector \(v\) gives the first Lyapunov exponent. The evolution of the size of the vector \(w\) gives the second Lyapunov exponent. Since the computation is performed with 64-bit double floating-point numbers, there are numerical issues to deal with. This is why a Gram-Schmidt operation is performed on the vector \(w\) each time the vectors are renormalized, to keep \(w\) orthogonal to \(v\). Otherwise, the numerical errors accumulate and the computed value of \(\theta_2\) becomes the same as \(\theta_1\). You can look at the algorithm online starting at line 1723 of the file mult_cont_frac_pyx.pyx from my optional package.
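
To give the idea, here is a minimal self-contained sketch of the standard two-vector Gram-Schmidt method (my own generic version, not the optimized Cython code from the package). For an actual MCF algorithm the matrix applied at each step is determined by the current orbit rather than drawn at random, and one applies \(M^{-1}\) to \(v\) and \(M^T\) to \(w\) as described above, but the renormalization and orthogonalization bookkeeping is the same:

import numpy as np

def top_two_lyapunov(sample_matrix, dim, n_iter=10**6, seed=0):
    # Push an orthonormal pair (v, w) through the matrix cocycle.
    # The average log-growth of v estimates theta_1; the average
    # log-growth of the part of w orthogonal to v estimates theta_2.
    rng = np.random.default_rng(seed)
    v = rng.random(dim)
    v /= np.linalg.norm(v)
    w = rng.random(dim)
    w -= (w @ v) * v
    w /= np.linalg.norm(w)
    s1 = s2 = 0.0
    for _ in range(n_iter):
        M = sample_matrix(rng)
        v, w = M @ v, M @ w
        n1 = np.linalg.norm(v)
        v /= n1
        w -= (w @ v) * v              # Gram-Schmidt: keep w orthogonal to v
        n2 = np.linalg.norm(w)
        w /= n2
        s1 += np.log(n1)
        s2 += np.log(n2)
    return s1 / n_iter, s2 / n_iter

# Sanity check with products of two determinant-1 matrices,
# for which theta_1 + theta_2 = 0:
A = np.array([[1., 1.], [0., 1.]])
B = np.array([[1., 0.], [1., 1.]])
print(top_two_lyapunov(lambda rng: A if rng.random() < 0.5 else B, dim=2))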

I do not know where Vincent took that algorithm from. So, I do not know how exact it is, or whether there exist any proofs of lower and upper bounds on the computations being performed. What I can say is that it is quite reliable, in the sense that it returns the same values over and over again (by that I mean the 3 most significant digits agree) for any fixed input (number of iterations).

Below, I show the code illustrating how to reproduce the results.

Version 0.6 (November 2019) of my package slabbe includes the necessary code to deal with some \(d\)-dimensional Multidimensional Continued Fraction (MCF) algorithms. Its documentation is available online. It is a pip package, so it can be installed like this:

sage -pip install slabbe

Recall that the dimension \(d\) below is the linear one and \(d-1\) is the dimension of the space for the corresponding projective algorithm.

Import the Brun, Selmer and Arnoux-Rauzy-Poincaré MCF algorithms from the optional package:

sage: from slabbe.mult_cont_frac import Brun, Selmer, ARP

The computation of the first two Lyapunov exponents performed on one single orbit:

sage: Brun(dim=3).lyapunov_exponents(n_iterations=10^7)
(0.30473782969922547, -0.11220958022368056, 1.3682167728713919)

The starting point is taken randomly, but the results, given as a 3-tuple \((\theta_1,\theta_2,1-\theta_2/\theta_1)\), are about the same:

sage: Brun(dim=3).lyapunov_exponents(n_iterations=10^7)
(0.30345018206132324, -0.11171509867725296, 1.3681497170915415)

Increasing the dimension \(d\) yields:

sage: Brun(dim=4).lyapunov_exponents(n_iterations=10^7)
(0.32639514522732005, -0.07191456560115839, 1.2203297648654456)
sage: Brun(dim=5).lyapunov_exponents(n_iterations=10^7)
(0.30918877340506756, -0.0463930802132972, 1.1500477514185734)

It computes an orbit of length \(10^7\) in about 0.5 seconds, of length \(10^8\) in about 5 seconds, and of length \(10^9\) in about 50 seconds:

sage: %time Brun(dim=3).lyapunov_exponents(n_iterations=10^7)
CPU times: user 540 ms, sys: 0 ns, total: 540 ms
Wall time: 539 ms
(0.30488799356325225, -0.11234354880132114, 1.3684748208296182)
sage: %time Brun(dim=3).lyapunov_exponents(n_iterations=10^8)
CPU times: user 5.09 s, sys: 0 ns, total: 5.09 s
Wall time: 5.08 s
(0.30455473631148755, -0.11217550411862384, 1.3683262505689446)
sage: %time Brun(dim=3).lyapunov_exponents(n_iterations=10^9)
CPU times: user 51.2 s, sys: 0 ns, total: 51.2 s
Wall time: 51.2 s
(0.30438755982577026, -0.11211562816821799, 1.368331834035505)

For what follows, I must admit that I needed to make a small fix to my package, so the code below will not work in version 0.6. I will update the package in the coming days so that the computations below can be reproduced:

sage: from slabbe.lyapunov import lyapunov_comparison_table

For each \(3\leq d\leq 20\), I compute 30 orbits and I show the most significant digits and the standard deviation of the 30 values computed.

For Brun algorithm:

sage: algos = [Brun(d) for d in range(3,21)]
sage: %time lyapunov_comparison_table(algos, n_orbits=30, n_iterations=10^7, ncpus=8)
CPU times: user 190 ms, sys: 2.8 s, total: 2.99 s
Wall time: 6min 31s
  Algorithm     #Orbits    $\theta_1$ (std)     $\theta_2$ (std)      $1-\theta_2/\theta_1$ (std)
+-------------+----------+--------------------+---------------------+-----------------------------+
  Brun (d=3)    30         0.3045 (0.00040)     -0.1122 (0.00017)     1.3683 (0.00022)
  Brun (d=4)    30         0.32632 (0.000055)   -0.07188 (0.000051)   1.2203 (0.00014)
  Brun (d=5)    30         0.30919 (0.000032)   -0.04647 (0.000041)   1.1503 (0.00013)
  Brun (d=6)    30         0.28626 (0.000027)   -0.03043 (0.000035)   1.1063 (0.00012)
  Brun (d=7)    30         0.26441 (0.000024)   -0.01966 (0.000027)   1.0743 (0.00010)
  Brun (d=8)    30         0.24504 (0.000027)   -0.01207 (0.000024)   1.04926 (0.000096)
  Brun (d=9)    30         0.22824 (0.000021)   -0.00649 (0.000026)   1.0284 (0.00012)
  Brun (d=10)   30         0.2138 (0.00098)     -0.0022 (0.00015)     1.0104 (0.00074)
  Brun (d=11)   30         0.20085 (0.000015)   0.00106 (0.000022)    0.9947 (0.00011)
  Brun (d=12)   30         0.18962 (0.000017)   0.00368 (0.000021)    0.9806 (0.00011)
  Brun (d=13)   30         0.17967 (0.000011)   0.00580 (0.000020)    0.9677 (0.00011)
  Brun (d=14)   30         0.17077 (0.000011)   0.00755 (0.000021)    0.9558 (0.00012)
  Brun (d=15)   30         0.16278 (0.000012)   0.00900 (0.000017)    0.9447 (0.00010)
  Brun (d=16)   30         0.15556 (0.000011)   0.01022 (0.000013)    0.93433 (0.000086)
  Brun (d=17)   30         0.149002 (9.5e-6)    0.01124 (0.000015)    0.9246 (0.00010)
  Brun (d=18)   30         0.14303 (0.000010)   0.01211 (0.000019)    0.9153 (0.00014)
  Brun (d=19)   30         0.13755 (0.000012)   0.01285 (0.000018)    0.9065 (0.00013)
  Brun (d=20)   30         0.13251 (0.000011)   0.01349 (0.000019)    0.8982 (0.00014)

For Selmer algorithm:

sage: algos = [Selmer(d) for d in range(3,21)]
sage: %time lyapunov_comparison_table(algos, n_orbits=30, n_iterations=10^7, ncpus=8)
CPU times: user 203 ms, sys: 2.78 s, total: 2.98 s
Wall time: 6min 27s
  Algorithm       #Orbits    $\theta_1$ (std)     $\theta_2$ (std)      $1-\theta_2/\theta_1$ (std)
+---------------+----------+--------------------+---------------------+-----------------------------+
  Selmer (d=3)    30         0.1827 (0.00041)     -0.0707 (0.00017)     1.3871 (0.00029)
  Selmer (d=4)    30         0.15808 (0.000058)   -0.02282 (0.000036)   1.1444 (0.00023)
  Selmer (d=5)    30         0.13199 (0.000033)   0.00176 (0.000034)    0.9866 (0.00026)
  Selmer (d=6)    30         0.11205 (0.000017)   0.01595 (0.000036)    0.8577 (0.00031)
  Selmer (d=7)    30         0.09697 (0.000012)   0.02481 (0.000030)    0.7442 (0.00032)
  Selmer (d=8)    30         0.085340 (8.5e-6)    0.03041 (0.000032)    0.6437 (0.00036)
  Selmer (d=9)    30         0.076136 (5.9e-6)    0.03379 (0.000032)    0.5561 (0.00041)
  Selmer (d=10)   30         0.068690 (5.5e-6)    0.03565 (0.000023)    0.4810 (0.00032)
  Selmer (d=11)   30         0.062557 (4.4e-6)    0.03646 (0.000021)    0.4172 (0.00031)
  Selmer (d=12)   30         0.057417 (3.6e-6)    0.03654 (0.000017)    0.3636 (0.00028)
  Selmer (d=13)   30         0.05305 (0.000011)   0.03615 (0.000018)    0.3186 (0.00032)
  Selmer (d=14)   30         0.04928 (0.000060)   0.03546 (0.000051)    0.2804 (0.00040)
  Selmer (d=15)   30         0.046040 (2.0e-6)    0.03462 (0.000013)    0.2482 (0.00027)
  Selmer (d=16)   30         0.04318 (0.000011)   0.03365 (0.000014)    0.2208 (0.00028)
  Selmer (d=17)   30         0.040658 (3.3e-6)    0.03263 (0.000013)    0.1974 (0.00030)
  Selmer (d=18)   30         0.038411 (2.7e-6)    0.031596 (9.8e-6)     0.1774 (0.00022)
  Selmer (d=19)   30         0.036399 (2.2e-6)    0.030571 (8.0e-6)     0.1601 (0.00019)
  Selmer (d=20)   30         0.0346 (0.00011)     0.02955 (0.000093)    0.1452 (0.00019)

For Arnoux-Rauzy-Poincaré algorithm:

sage: algos = [ARP(d) for d in range(3,21)]
sage: %time lyapunov_comparison_table(algos, n_orbits=30, n_iterations=10^7, ncpus=8)
CPU times: user 226 ms, sys: 2.76 s, total: 2.99 s
Wall time: 13min 20s
  Algorithm                       #Orbits   $\theta_1$ (std)     $\theta_2$ (std)      $1-\theta_2/\theta_1$ (std)
+--------------------------------+----------+--------------------+---------------------+-----------------------------+
  Arnoux-Rauzy-Poincaré (d=3)      30         0.4428 (0.00056)     -0.1722 (0.00025)     1.3888 (0.00016)
  Arnoux-Rauzy-Poincaré (d=4)      30         0.6811 (0.00020)     -0.16480 (0.000085)   1.24198 (0.000093)
  Arnoux-Rauzy-Poincaré (d=5)      30         0.7982 (0.00012)     -0.0776 (0.00010)     1.0972 (0.00013)
  Arnoux-Rauzy-Poincaré (d=6)      30         0.83563 (0.000091)   0.0475 (0.00010)      0.9432 (0.00012)
  Arnoux-Rauzy-Poincaré (d=7)      30         0.8363 (0.00011)     0.1802 (0.00016)      0.7845 (0.00020)
  Arnoux-Rauzy-Poincaré (d=8)      30         0.8213 (0.00013)     0.3074 (0.00023)      0.6257 (0.00028)
  Arnoux-Rauzy-Poincaré (d=9)      30         0.8030 (0.00012)     0.4205 (0.00017)      0.4763 (0.00022)
  Arnoux-Rauzy-Poincaré (d=10)     30         0.7899 (0.00011)     0.5160 (0.00016)      0.3467 (0.00020)
  Arnoux-Rauzy-Poincaré (d=11)     30         0.7856 (0.00014)     0.5924 (0.00020)      0.2459 (0.00022)
  Arnoux-Rauzy-Poincaré (d=12)     30         0.7883 (0.00010)     0.6497 (0.00012)      0.1759 (0.00014)
  Arnoux-Rauzy-Poincaré (d=13)     30         0.7930 (0.00010)     0.6892 (0.00014)      0.1309 (0.00014)
  Arnoux-Rauzy-Poincaré (d=14)     30         0.7962 (0.00012)     0.7147 (0.00015)      0.10239 (0.000077)
  Arnoux-Rauzy-Poincaré (d=15)     30         0.7974 (0.00012)     0.7309 (0.00014)      0.08340 (0.000074)
  Arnoux-Rauzy-Poincaré (d=16)     30         0.7969 (0.00015)     0.7411 (0.00014)      0.07010 (0.000048)
  Arnoux-Rauzy-Poincaré (d=17)     30         0.7960 (0.00014)     0.7482 (0.00014)      0.06005 (0.000050)
  Arnoux-Rauzy-Poincaré (d=18)     30         0.7952 (0.00013)     0.7537 (0.00014)      0.05218 (0.000046)
  Arnoux-Rauzy-Poincaré (d=19)     30         0.7949 (0.00012)     0.7584 (0.00013)      0.04582 (0.000035)
  Arnoux-Rauzy-Poincaré (d=20)     30         0.7948 (0.00014)     0.7626 (0.00013)      0.04058 (0.000025)

The computation of the figure shown above is done with the code below:

sage: brun_list = [1.3683, 1.2203, 1.1503, 1.1063, 1.0743, 1.04926, 1.0284, 1.0104, 0.9947, 0.9806, 0.9677, 0.9558, 0.9447, 0.93433, 0.9246, 0.9153, 0.9065, 0.8982]
sage: selmer_list = [ 1.3871, 1.1444, 0.9866, 0.8577, 0.7442, 0.6437, 0.5561, 0.4810, 0.4172, 0.3636, 0.3186, 0.2804, 0.2482, 0.2208, 0.1974, 0.1774, 0.1601, 0.1452]
sage: arp_list = [1.3888, 1.24198, 1.0972, 0.9432, 0.7845, 0.6257, 0.4763, 0.3467, 0.2459, 0.1759, 0.1309, 0.10239, 0.08340, 0.07010, 0.06005, 0.05218, 0.04582, 0.04058]
sage: brun_points = list(enumerate(brun_list, start=3))
sage: selmer_points = list(enumerate(selmer_list, start=3))
sage: arp_points = list(enumerate(arp_list, start=3))
sage: G = Graphics()
sage: G += plot(1+1/(x-1), x, 3, 20, legend_label='Optimal algo:$1+1/(d-1)$', linestyle='dashed', color='blue', thickness=3)
sage: G += line([(3,1), (20,1)], color='black', legend_label='Strong convergence threshold', linestyle='dotted', thickness=2)
sage: G += line(brun_points, legend_label='Brun', color='cyan', thickness=3)
sage: G += line(selmer_points, legend_label='Selmer', color='green', thickness=3)
sage: G += line(arp_points, legend_label='ARP', color='red', thickness=3)
sage: G.ymin(0)
sage: G.axes_labels(['$d$',''])
sage: G.show(title='Computation of first 2 Lyapunov Exponents: comparison of the value $1-\\theta_2/\\theta_1$\n for $d$-dimensional MCF algorithms Brun, Selmer and ARP for $3\\leq d\\leq 20$')

by Sébastien Labbé at March 27, 2020 01:00 PM

May 09, 2019

William Stein

Should I Resign from My Full Professor Job to Work Fulltime on Cocalc?

Nearly 3 years ago, I gave a talk at a Harvard mathematics conference announcing that "I am leaving academia to build a company". What I really did was go on unpaid leave from my tenured Full Professor position for three years. No further extensions of that leave are possible, so I finally have to decide whether to go back to academia or resign.

How did I get here?

Nearly two decades ago, as a recently minted Berkeley math Ph.D., I was hired as a non-tenure-track faculty member in the mathematics department at Harvard. I spent five years at Harvard, then I applied for jobs, and accepted a tenured Associate Professor position in the mathematics department at UC San Diego. The mathematics community was very supportive of my number theory research; I skipped tenure track, and landed a tier-1 tenured position by the time I was 30 years old. In 2006, I moved from UCSD to a tenured Associate Professor position at the University of Washington (UW) mathematics department, primarily because my wife was a graduate student there, UW has strong research in number theory and algebraic geometry, and they have a good culture supporting undergraduate research.

Before I left Harvard, I started the SageMath open source software project, initially with the long-term goal of creating a viable free open source alternative to Mathematica, Maple, Matlab and Magma. As a result, in addition to publishing dozens of research mathematics papers and some books, I started spending a lot of my time writing software and organizing Sage Days workshops.

Recruiting at UW Mathematics

At UW, I recruited an amazing team of undergraduates and grad students who had a major impact on the development of Sage. I was blown away by the quality of the students (both undergrad and grad) that I was able to get involved in Sage development. I fully expected that in the next few years I would have the resources to hire some of these students to work fulltime on Sage. They had written the first versions of much of the core functionality of Sage (e.g., graph theory, symbolic calculus, matrices, and much more).

I was surprised when my application for Full Professor at UW was delayed for one year because – I was told – I wasn’t publishing enough research papers. This was because I was working very hard on building Sage, which was going extremely well at the time. I took the feedback seriously, and put more time into traditional research and publishing; this was the first time in my life that I did research mathematics for reasons other than just because I loved doing it.

I tried very hard to hire Bill Hart as a tenure-track faculty member at UW. However, I was told that his publication count was "a bit light", and I did not succeed in hiring him. If you printed out the source code of the software he has written, it would be a tall stack of paper. In any case, I totally failed at the politics needed to make his case and was left dispirited, realizing that my personal shortcomings at department politics meant I probably could not hire the sort of colleagues I desperately needed.

UW was also very supportive of me teaching an undergrad course on open source math software (it evolved into this). I taught a similar course at the graduate level once; it went extremely well and was, in my mind, the best course I ever taught at UW. I was extremely surprised when my application to teach that grad course again was denied, and I was told that grad students should just go to my undergraduate course. I thought "this is really strange", instead of lobbying to teach the course and presenting my case better.

To be clear, I do not mean to criticize the mathematics department. The UW math department has thought very hard and systematically about its priorities and how it fits into UW. It is a traditional pure mathematics department, generally ranked around 25th in the country, with a particular set of strengths. There is a separate applied math department on campus, several stats departments, and a massive School of Computer Science. Maybe I was in the wrong place to try to hire somebody whose main qualification is being world class at writing mathematical software. This blog post is about the question of whether the UW math department is the right place for me or not.

Outside Grant Support?

My number theory research received incredible support from the NSF: I was the PI on six NSF grants. Also, Magma (which is similar to Sage, but closed source) had managed to find sufficient government funding, so I remained optimistic. Maybe I could fund people to build Sage via grants, and even start an institute! I applied for grants to support work on SageMath at a larger scale, and had some initial success (half of a postdoc, some workshops, etc.).

Why is grant funding so important for Sage? The goal of the SageMath project is to create free open source software that is a viable alternative to Mathematica, Maple, Matlab, and Magma – software produced by companies with a combined thousands of fulltime employees. Though initial progress was encouraging, it was clear that I desperately needed significant money to genuinely compete. For example, one Sage developer had a fantastic Sage development project he wanted about 20K to work fulltime on during a summer, and I could not find the money; as a result he quit working on Sage. This project involved implementing some deep algorithms that are needed to more directly compete with Mathematica for solving symbolic inequalities. This sort of thing happened over and over again, and it began to really frustrate me. I could get plenty of funding for 1-week workshops (just travel expenses – everybody works for free), but there’s only so much you can do at such sprints.

I kept hearing that there would be a big one-in-10-years NSF institutes competition sometime in the “next year or two”. People hinted to me that this would be a good thing to watch out for, and I dreamed that I could found such an institute, with the mission to make it so the mathematics community finally owned the deep software on which teaching and research are based. This institute would bring the same openness and robustness to computational mathematics that rigorous proof had brought to mathematics itself a century earlier.

Alas, this did not happen. I remember the moment I found out about the actual NSF institutes competition. Joe Silverman was standing behind me at a coffee break at the Arizona Winter School 2010, telling people how his proposal for ICERM had just won the NSF institutes competition. I spun around and congratulated him as I listened to how much work it had been to put together the application during the previous year; internally, my heart sank. Not only did I not win, I didn't even know the competition had happened! I guess I was too busy working on Sage. In any case, my fantasy of creating an NSF-funded institute died at that moment. Of course, ICERM has turned out to be a fantastic institute, and it has hosted several workshops that support the development of open source math software.

Around this time, I also started having my grant proposals denied for reasons I do not understand. This was confusing to me, after having received so many NSF grants before. In 2012, the Simons Foundation put out a call for something that potentially addressed what I had hoped to accomplish via an NSF-funded institute. I was very excited again, but that did not turn out as I had hoped. So next I tried something I never thought I would ever do in a million years…

Commercialization at UW

For various reasons, I failed to get the NSF or other foundations to fund Sage at the level I needed, so in 2013, I decided to try to sell a commercial product, and use the profits to fund Sage development. I first tried to do this at University of Washington, by working with the commercialization office (C4C) to sell access to Sage online. As long as the business and product were merely abstract ideas (e.g., let’s make up a name and trademark it! let’s write some terms of service!) things went fine. However, when things became much more concrete, working with C4C got strange and frustrating for me. I was clearly missing something.

For example, the first thing C4C told me on the very first day we sat down together was that they would not work with me if I made the software I wrote for this open source, and that the university would own the software. Given that there was no software at all yet, and that I imagined I would just whip out a quick modern web-based frontend to Sage and make boatloads of money that would go straight into a UW account to fund Sage, this seemed fine to me. However, I had a nagging feeling that a purely closed-source approach to this problem was impossible, and that not having that flexibility would come back to haunt me.

Naively optimistic, I found myself working fulltime at UW and at the same time trying to get a sophisticated web application off the ground by myself, with many important early users depending on it for their classes. This was stressful and took an enormous amount of time and attention. I felt like I was just part of the software, often getting warnings that things were broken or breaking, and manually fixing them. The toil was high, and only got worse as more people used the software. I would get woken up all night. I couldn’t travel since things were constantly breaking.

Every time I fought through some really difficult problem with the web application instead of just giving up, I came out far more determined not to quit.

The web application described above evolved over six years into what is now https://CoCalc.com; the functionality has been pretty similar from day 1, but quality and scalability have come a long way. CoCalc lets you collaboratively use LaTeX, Sage, Terminals, Jupyter Notebooks, etc., for teaching and research.

In 2014, I went on sabbatical and worked fulltime developing this web application, and the feedback loop I described above only grew more intense: fix things, fight through difficult problems, be even more determined not to give up. Fortunately, I had some leftover NSF grant funds and was able to use them to hire several students to help with development. I failed to find students whom I could hire to do the backend work (and be available at any time, day or night), which meant that much of the stress of keeping the site running continued to fall squarely on my shoulders. And as the site grew in popularity (and functionality), the stress got worse.

My sabbatical ended, and I was required to return to UW fulltime for one year, or return all the money I was paid during my sabbatical. CoCalc had kept growing in popularity, but I had not been allowed by the "commercialization office" to actually commercialize it, so it was still a free site.

I taught at UW while at the same time being the main person trying to run this very complicated and painful production web application. Based on user feedback, I was also highly motivated to improve CoCalc. I would typically sleep a few hours, get up at 3am and write code until 8am, then prepare to teach, hope not to have any site issues right before class, and so on. One day CoCalc got hit by a massive DDoS attack minutes before a class I was teaching, while I was talking with a prospective donor to the math department.

I am the sort of person who does well focusing on exactly one thing at a time. Given the chance to fully focus on one thing for extended periods of time, I sometimes even do things that really matter and have an impact. I am not great at doing many different things at once.

In the meantime, Sage itself was growing and receiving funding, though this had nothing to do with me. For example, Gregg Musiker was putting together a big program at the IMA, in the form of a ton of Sage Days workshops. The huge ODK project, a European Union grant proposal to support open source math software, would be fully funded. And closer to home, Moore and Sloan funded a major new initiative that could potentially also have supported work on Sage. I was invited to workshops and events involving these and other grants, but I often either said no or canceled at the last minute because of the toil needed just to keep CoCalc running. Also, I believed that if I could start charging customers, I would have a lot more money and could hire more help.

I met with more senior people at UW's C4C to finally actually charge people to use CoCalc. They wanted me to do some integration with their license management system and sell "express" software licenses. It didn't make any sense to me, and we went around in circles. I then asked about actually starting a separate company (a legal entity) that the university would have some ownership in, so that the company could take payments, etc. This is when things got really weird: they would not talk with me about creating the company due to "conflict of interest".

I searched for other UW faculty that had commercialized remotely similar products, and found one. He told me how it went, and said it was the worst experience of his life. UW owned 50% of the company, and all of the software of the company, which they licensed under onerous terms. They refused to negotiate anything with him, instead requiring his spinoff company to hire an outside negotiator. As a result of all this, I educated myself as much as possible about relevant rules and laws, and consulted with a lawyer.

It turns out that the NSF grants I used to fund work on CoCalc explicitly stipulated that code funded by those grants had to be GPL licensed. This meant all the code for CoCalc had to be open sourced. Later the university even agreed in writing to release a snapshot of all the CoCalc code under the BSD license, and I haven’t been paid a penny by UW since the date of that release, so there is no possible claim that the company can’t use the code.

Building a company

A colleague of mine from my Harvard days was in town for a day, and we met for coffee. He expected we would talk about Sage and number theory, but instead I told him about CoCalc and my attempts at commercialization and starting a company. He immediately suggested a solution to my problems: talk with a friend of his who had both extensive experience working with companies and deep connections with mathematics. I was confident that in the worst case I could quit my job at UW and rewrite all the software from scratch, so I took him up on the offer.

In 2015 I formed a corporation and received some outside investment, and used that (and dramatically cutting my already-small academic income) to "leave academia". More precisely, in 2016 (after working fulltime for a year at UW), I finally went on 100% unpaid leave from UW in order to completely focus on CoCalc development and getting a business off the ground. There was no good reason to quit a tenured Full Professor job when I could go on leave; moreover, CoCalc supports teaching in math departments, so it is closely related to my academic job. The only academic responsibilities I had were to my two Ph.D. students, whom I meet with one-on-one at least once a week. At the end of two years, I requested a third year of unpaid leave, which UW granted (this is not routine). Throughout all this, the UW mathematics department was very supportive.

During these three years on unpaid leave, I hired three other people who work fulltime on CoCalc. Together we have massively improved the software and built a business with thousands of paying customers. The company is still not profitable, though the future looks very bright if we continue on our current path. CoCalc has become a platform on which an increasing number of scalable products (such as this) are being built, and there is enormous growth potential in the next year.

At this point, it rightfully appears to the community that I have left SageMath development to focus fulltime on building CoCalc as an independent business. Indeed, I do not spend any significant time contributing to Sage, and I have even switched to getting daily digests of the sage-devel mailing list.

On the other hand, as mentioned above, CoCalc is going well by many metrics (quality, feature development, customer love, market position, etc.). Most importantly, the other three people who work fulltime on CoCalc and I really, really love this job and its potential to have a significant impact. I still don't know whether CoCalc will ever be wildly profitable and massively fund Sage development. If I were to obsess over only that goal, I would have to quit working on CoCalc (since it is taking far too long) and pursue other opportunities for funding Sage.

In retrospect, my idea from 7 years ago to start a web-based software company from scratch and build it into a successful profitable business has so far completely failed to fund Sage.

It would be far easier to work fulltime writing grants to foundations, and leveraging the acknowledged success of Sage so far. I made the wrong move, given my original goal. The surprise is that I really enjoy what I’m doing right now!

My unpaid leave is up – what am I going to do?

My third year of unpaid leave from UW is up. I have to decide whether to return to UW or resign. If I return, I would have to have at least a 50% appointment. I currently have 50% of one year of teaching in "credits", which means I wouldn't be required to teach during my first year back on a 50% appointment. Moreover, the current department chair (John Palmieri) understands and appreciates Sage: he is among the top 10 all-time contributors to the Sage source code!

I have decided to resign. I'm worried about issues of intellectual property: it would be extremely unfair to my employees, investors and customers if I took a 50% UW position and then later got sued by UW as a result. Having a 50% paid appointment at UW subjects one to a lot of legal jeopardy, which is precisely why I have been on 100% unpaid leave for the last three years. But more importantly, I feel very good about continuing to focus 100% on the development of CoCalc, which is going to have an incredible year, and I genuinely love building this (non-VC-funded) company.

by William Stein ([email protected]) at May 09, 2019 02:21 PM