Working with large datasets in Python

How can you work with large datasets in Python — millions, billions, or more records?

Here is a great answer to that question:

https://www.quora.com/Can-Python-Pandas-handle-10-million-rows-What-are-some-useful-techniques-to-work-with-the-large-data-frames

  1. Pandas is memory hungry; you may need 8-16GB of memory or more to load your dataset and work with it efficiently.

You can use large / extra large AWS cloud systems for temporary access. Being able to spin up cloud platforms on demand is only one part of the equation. You also need to get your data in and out of the cloud platform. So persistent storage and on-demand compute is a likely strategy.

  2. Work in stages, and save each stage.

You will also want to be able to save your intermediate states. If you process CSV or JSON, or perform filter, map, reduce, and similar functions, you’ll want those to be atomic processes.

And you’ll want to persist work as you go. If you process 100 million rows of data and something happens on row 99 million, you don’t want to have to re-do the whole process to get a clean data transformation. Especially if it takes several minutes or hours.

Better to save each stage iteratively and incur the IO cost in your ETL or processing loop than to lose your work — or have your data corrupted.
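
Here is a minimal sketch of what saving each stage could look like, using only the standard library. The function names, the `part-00000.csv` file naming, and the default batch size are assumptions made for illustration. Each batch is written to a temporary file and then renamed into place, so a crash never leaves a half-written stage behind, and a re-run skips the batches that already exist.

```python
import csv
import os
import tempfile


def write_batch_atomically(path, rows):
    """Write rows to a temp file, then rename it over `path` in one step."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w", newline="") as tmp:
        csv.writer(tmp).writerows(rows)
    os.replace(tmp_path, path)  # atomic when both paths are on one filesystem


def batched(rows, size):
    """Yield lists of at most `size` rows from any iterable of rows."""
    batch = []
    for row in rows:
        batch.append(row)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:  # final partial batch
        yield batch


def process_in_stages(src, out_dir, transform, batch_size=1000):
    """Transform a CSV batch by batch, skipping batches saved by earlier runs.

    Returns the number of batches written during this run.
    """
    os.makedirs(out_dir, exist_ok=True)
    written = 0
    with open(src, newline="") as fin:
        for number, batch in enumerate(batched(csv.reader(fin), batch_size)):
            part = os.path.join(out_dir, f"part-{number:05d}.csv")
            if os.path.exists(part):
                continue  # this stage was completed by an earlier run
            write_batch_atomically(part, [transform(row) for row in batch])
            written += 1
    return written
```

Running `process_in_stages` a second time on the same input does no transform work for stages that are already on disk. It still has to read the source file to find its place, which is the IO cost mentioned above.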

  3. Take advantage of multiprocessing

Break work into batches that can be performed in parallel. If you have multiple CPUs, take advantage of them.

Python doesn’t do this by default, and Pandas normally works on a single thread. Dask or Vaex can work in parallel where Pandas itself cannot.
You might also consider using a distributed processing engine such as Apache Spark instead of doing all your processing in a single DataFrame.
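
As a rough sketch of batching work across CPUs with the standard library's `concurrent.futures`: the sum-of-squares workload and the names `batches`, `process_batch`, and `run_parallel` are hypothetical stand-ins for whatever per-batch processing you actually need.

```python
from concurrent.futures import ProcessPoolExecutor


def batches(items, size):
    """Yield consecutive slices of `items`, each at most `size` long."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def process_batch(batch):
    """Hypothetical CPU-bound work: the sum of squares for one batch."""
    return sum(value * value for value in batch)


def run_parallel(items, batch_size, workers=None):
    """Process batches in separate worker processes and combine the results.

    `workers=None` lets the executor choose a worker count for this machine.
    """
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(process_batch, batches(items, batch_size)))
```

Call `run_parallel` from under an `if __name__ == "__main__":` guard in a script, because worker processes may re-import the main module on some platforms. Each batch is copied to a worker process, so batches should be big enough that the work outweighs the cost of the copy.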

  4. Use efficient functions and data objects

Earlier I talked about saving incrementally within your processing loop. But don’t go overboard.

You do not want to open and close files every iteration in your inner loop millions of times. Make sure you fetch the data you need only once. Don’t reinitialize objects. Even something as simple as calling len(data) can really add up: the length lookup on a list is cheap, but the overhead of making that call millions of times inside an inner loop still accumulates.

Also, consider when you want to use a list vs a numpy array, etc.
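
A small sketch of keeping the inner loop lean: the file is opened once by the caller, the write method is looked up once, and the row count is tracked in a local variable instead of being recomputed. The function name `write_squares` and the squaring step are placeholders for illustration.

```python
import io


def write_squares(values, out):
    """Write one line per value to an already-open file object.

    Returns the number of rows written. `values` can be any iterable,
    including a generator, so the full dataset never has to be in memory.
    """
    write = out.write  # look the method up once, not once per row
    count = 0          # running total instead of calling len() in the loop
    for value in values:
        write(f"{value * value}\n")
        count += 1
    return count


# io.StringIO stands in here for a real file opened once with open(...).
buffer = io.StringIO()
rows_written = write_squares(range(5), buffer)
```

With a real file you would wrap the call in a single `with open(path, "w") as out:` block around the whole loop, rather than opening the file for each row.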

Test Automation isn’t for everyone

I once knew a guy who was a talented craftsman. He could make beautiful handcrafted furniture. His business became popular and demand increased. He got investment and built a large shop, hired a couple of assistants, and bought additional machinery to help meet the demand.

The business was a big success and grew even more. He was almost able to pay back his loan in just a couple years, but then sold his business (which is still successful today under new owners) and went back to working alone on individual orders. He still does quite well and with the proceeds from the sale has a comfortable life, but nothing like it could have been if he’d kept the company.

It turned out he wasn’t as interested in running a manufacturing business as working with his hands, alone, in his little shop in the woods.

Testers (and their bosses) should consider this before thinking about switching from manual QA to automation. It takes different skills and a different mindset, which can be learned, but may not be what you enjoy.

I happened to enjoy the change myself, but I can sympathize with those who don’t.

Challenges inheriting an existing test framework

This post started as a comment on a discussion on LinkedIn.

Inheriting a framework can be challenging.

First of all, a real world framework is more complex than something just created. There are bound to be exceptional cases and technical debt included.

Secondly, an existing test framework was built by people on deadlines with limited knowledge. There are bound to be mistakes, hacks, and workarounds. Not to mention input from multiple people with different skillsets and opinions.

In this case, the best way to understand your framework is to understand your tests. Your tests exercise the framework. Pick one of low-moderate complexity and work your way through it, understanding the configuration, data, and architecture.

Don’t be afraid to break things. After all, you have all these tests that will let you know if you broke the framework.

Lastly, having a checklist of things to look for (good and bad practices) will help you understand the framework better, and help you know what quality of framework you’re dealing with. Is this mystery function really clever or just a bad idea?

Look for standard patterns. OOP, Page Objects, etc. Also look for common problems – hard coded values, repetitious code, etc.

Test Automation Can’t Catch Everything

I remember a time, years ago, when I was working at a company at which I learned a lot about my craft.

Selenium was fairly new and I was one of the early adopters. I’d developed a pattern for structuring tests that I shared with the community and found that several others had independently developed ideas similar to my own “Page Objects.”

Agile was just beginning to penetrate mainstream development, and was at the same time attracting some healthy skepticism. Pair programming and test driven development were considered “extreme” and other patterns like Scrum were thought of as either common sense or a cargo cult.

Continuous Integration was still a novel idea. My first job at Microsoft was essentially as part of a manual continuous integration process, because no tools yet existed to accomplish it. Now there were open source projects like Cruise Control and Hudson (which became Jenkins) coming out.

And while I’d been involved in and an advocate of each of these — Selenium, Agile, and Continuous Integration, I’d yet to see them widely adopted and successfully implemented within an organization.

But about the time I came back from Fiji and stepped back into software development, all these things were starting to coalesce. At least they were for me in the mid 2000s.

We had a cross functional team of (mainly) developers, testers, sysadmins, designers, analysts, all working together. We had business & customers giving feedback and acceptance criteria. We wrote user stories and assigned points. We wrote tests first and testers paired with developers. We sat with customers and took usability notes. We talked design patterns and had self organizing teams. We had fun and I learned a lot. I still keep in contact with some of my co-workers from more than a dozen years ago.

It was about a year and a half into it that we started to hit the wall. Tests were taking too long to execute. Complexity was slowing us down. We kept plugging along and using technology to fight the friction.

Parallel builds, restructuring tests, virtualized development environments. We fought technical debt with technical solutions and beat it back down.

I thought we were winning. But I had a manager, an old school QA guy, who knew that something was off. I think the rest of us were too busy churning out code, delivering features, spinning our wheels to see it.

But he saw it. He tried to alert upper management, he tried to get through to developers. We had lots of automated tests. We were delivering features every two weeks. Occasionally we’d stop and do a spike to experiment, or take some time off to reduce tech debt, refactor code, or rewrite tests. But we still had decent momentum.

Finally he got together a group of stakeholders. Give me 15 minutes, he said, and I’ll break it. It took him maybe five. And it wasn’t that hard. And it was something users could be expected to do.

The moral of the story is: You can’t catch everything with automation. Automation is good: it frees up time to do other things, it allows feedback to occur faster, and it helps testers concentrate on finding issues instead of verifying functionality. It helps developers to think about how they’re going to write their code, to reduce complexity, and to define clean interfaces.

Test Automation is like guard rails. It can help you from falling off the edge, or it can give you something to hold onto as you traverse a tricky slope.

It can catch obvious problems, much like the railing that helps you to not fall off the ledge (you’re not planning to do that anyway). Anyone can walk along a narrow path and not fall off a cliff; test automation makes it so that you can run. But it won’t stop you from going off if you go past it.

In order for tests to be effective you need to have clear requirements, and then you need to explore beyond them. Test automation isn’t good at exploring. It’s good at adding guard rails to a known path.

And even if you catch all the functional problems, you need to be able to check performance, ensure security, and test usability.

If you have testers churning out tests, both automated and exploratory, but no one else is involved or hears their feedback, it’s not going to matter much. It’s like having guard rails and warning signs that everyone just hops over and walks right past. It’s no good finding bugs if they don’t get fixed. And eventually, if you have a bug backlog that’s too big, people will just ignore it.

You need test automation, but you also need exploratory, human testing. And you need everyone, not just testers, to be concerned with quality: quality code, environments, user experience, and requirements.

So use test automation for what it does best — providing quick feedback for repetitive, known functionality. Don’t try to get it to replace a complete quality assurance process.

There are two types of tests

Here’s a great question from a fellow test automation expert:

What are your thoughts on doing test automation by developers vs testers?

I think there’s a place for both. (That’s the politic answer.)

But in general, I categorize tests into two main types: developer focused tests, and user focused tests.

The goal of developer focused testing is to help the developer keep their code organized, to verify what they write does what they intended, and to allow them to mitigate technical debt.

Unit tests are the obvious example here. A unit test shows that function x returns y given condition z. That gives developers confidence that they did what was expected and accounted for exceptions and edge cases. It also helps establish clean interfaces and aids in refactoring.
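
To make “function x returns y given condition z” concrete, here is a small sketch using Python’s built-in `unittest` module. The `shipping_cost` function and its pricing rules are invented purely for illustration. Each test checks one condition, including the exception case.

```python
import unittest


def shipping_cost(weight_kg):
    """Hypothetical function under test: flat rate up to 1 kg, then per kg."""
    if weight_kg <= 0:
        raise ValueError("weight must be positive")
    if weight_kg <= 1:
        return 5.0
    return 5.0 + 2.0 * (weight_kg - 1)


class ShippingCostTest(unittest.TestCase):
    def test_flat_rate_up_to_one_kg(self):
        self.assertEqual(shipping_cost(1), 5.0)

    def test_extra_weight_is_charged_per_kg(self):
        self.assertEqual(shipping_cost(3), 9.0)

    def test_non_positive_weight_is_rejected(self):
        with self.assertRaises(ValueError):
            shipping_cost(0)


def run_tests():
    """Run the test case above and return the unittest result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(ShippingCostTest)
    return unittest.TextTestRunner(verbosity=0).run(suite)
```

In a real project you would normally let a test runner such as `python -m unittest` discover the tests; `run_tests` is only here to keep the sketch self-contained.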

But there is also a lot more developer logic in front ends these days, so they may write some UI tests — particularly if they are front end developers and they are primarily concerned with the UI. Which usually requires data and user events. It’s great if they can isolate and mock this to test UI components individually, but they also need to be tested for correct integration.

So developers writing UI test automation does make sense.

But…

The other type of testing, user testing / acceptance testing / functional testing / end-to-end testing / regression testing / exploratory testing / quality assurance, whatever you want to call it (and these are overlapping buckets, not synonyms), is based on the principle that you can’t know what you don’t know.

A developer can’t write a test that checks that a requirement has been met beyond what he understands the requirement to be. You can’t see your blind spots.

And a developer’s task is primarily creative — I want to get the computer to do this. They aren’t thinking in the mindset of what else could go wrong, or what about this other scenario.

It’s like having an editor look at your writing (I could use an editor). You’re too close to it to look at it objectively.

That’s not to say you can’t go back to it with fresh eyes later, take off your “developer” hat and put on your “tester” hat. But it’s likely that someone else will see different things. And looking at it from an objective (or at least different) perspective is likely to identify different issues.

Some people are naturally (or trained to be) better at finding those exceptions and edge cases. My wife calls that type of person “critical” or “pessimistic” — I prefer the term tester.

Regardless, the second type of test — the big picture test, that looks at things from a user’s perspective, not a developer’s perspective, is — I think — critical.

And the industry has assumed that for decades. How that is accomplished, and how valuable that is, has always been up for debate, but I think the general principles are:

  1. That there are two types of tests: those designed to help the developer do his work, and those that are designed to check what he has done.
  2. That there are two different roles: the creative process of making something work (development) and the exploratory process of making sure something doesn’t go wrong (testing).

Anyway, that’s my (overly) long take, summed up. I could go on about this for hours.

On Customer Development

I was recently asked about Customer Development as a process. I looked it up to see what the formal definition is, and concluded that I don’t know too much about the official “Customer Development” process, but I understand and practice the general principles of Customer Development in my own business.

Here is my reply:

Do you know how to do customer development?

Not in a formal way. But I have two strategies I use for finding & acquiring customers.

  1. In areas where I have expertise (such as software testing & delivery) I’ve built a strong network and reputation by publishing articles & tutorials and offering training videos and meetups. This works great.
  2. In areas where I am not an expert but have built products on a larger scale, I have relied on intuition and research to discover needs. I then build a customer base as I develop the product, through social media and direct contact “growth hacking” with a freemium model.

With Resumelink, for example, I contacted individuals I thought would benefit and offered the service before it was built. I manually did the steps before having a product, and got feedback on features. I initially built it for myself, but with some keyword search I realized the market had a hole.

So I think it takes an initial spark of inspiration to identify a need, with research to see if there is a market niche that can be filled and to determine its profitability. You follow that with a proof of concept that is more manual process than product, and then target your perfect customer to identify ways to refine and improve the product, developing it by small feature increments as the needs become apparent (and keeping other ideas on the backlog). Then comes growth hacking through social media, content delivery, and community outreach.

Checking state in Selenium Test Automation

I wrote a medium length novel in response to Nikolay Advolodkin’s post about a common Selenium automation pattern. He advocates:

STOP CHECKING IF PAGE IS LOADED IN AUTOMATED UI TEST

You can read his article on his site, Ultimate QA: https://ultimateqa.com/stop-checking-if-page-is-loaded-in-automated-ui-test/

Or follow the discussion on his LinkedIn post: https://www.linkedin.com/posts/nikolayadvolodkin_testautomation-java-selenium-activity-6674278585743761408-Ud_p

Nikolay is a good friend, so please don’t take this as an attack. We have had many long discussions like this.

Here is my response in full:

I go the other direction on this. I use element checks to verify that the page is loaded.

My version of the Page Object pattern includes an isLoaded() method, which can be overridden with custom checks as needed. This is to try to keep things synchronized, even though it means extra steps. In this case, I value stability over performance.

I can understand someone making another decision however, especially when speed is important and latency between steps with a remote driver makes this more costly.

In practical terms, you could just check if the element you want to interact with is available and fail faster if it is not. The result of both success and failure would be the same, and you’d get there slightly faster — perhaps significantly faster if you have a long sequence of many page loads. But having such long test flows is a pattern I try to avoid, unless I’m explicitly testing the long flow through the UI.

Adding the sanity check helps me when it is time to debug the test or analyze the test failure. Knowing that my page is not loaded — or that I’m not on the page I expected helps me to understand the true cause of failure, the page is not loaded, rather than an element is not clicked.

However, I would not call isLoaded() repeatedly, only once, automatically when a page object is initialized, or explicitly if I have a logical reason to think that it is not — some state change.

Selenium tests (and UI tests in general) are brittle, and determining state before attempting to perform an action is one of the biggest challenges.

The challenge here is that an HTTP 200 status code doesn’t really mean a page is loaded anymore. With dynamic page creation, JavaScript frameworks, single page apps, and prefetch, this is hard to tell. Pages can load in chunks, dynamic elements can be added, and sometimes the concept of a “page” doesn’t even make sense.

Checking status codes or XHR ready state is meaningless (or at least misleading) in many modern web applications. But you see people trying to do this to figure out the state of their app, so they can reliably automate it. This usually doesn’t work. So checking the state of the element you need to interact with makes more sense, as well as saving time.

The WebDriver team dropped the ball on this, or at least punted. Selenium used to check that a page was loaded (using the methods above) but the team decided the problem was too complex and left it up to the user. I think this was an abdication of responsibility, but don’t tell Jim or Simon that. It’s a less discussed detail of their least favorite topic.

Validating state is hard, and most of the time, leaving it up to the user results in bugs. It’s even harder with mobile apps and the Appium team has had to make many difficult decisions about this, and sometimes a framework gets it wrong, or makes things unnecessarily slow.

So like most things there is a trade off between speed and reliability, and we all need to make our own decisions.

When you adopt the “page object” pattern for use on components that are only on part of a page, or may appear on multiple pages, having the explicit check that is user defined starts making even more sense, because widget.isLoaded() is a state check that can happen more than once, and not just a sanity check.

But when you have a (reasonably) static page, an initial check that the page is loaded, rather than checking each element that you can safely assume *should* be there if the page is loaded, can actually be more performant, as well as providing a clearer stack trace when things aren’t as expected.

Repeatedly checking if a page is loaded before performing any action is a bad idea in any case. 

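import org.openqa.selenium.By;
import org.openqa.selenium.NoSuchElementException;
import org.openqa.selenium.WebDriver;

public class MyPage {
  private final WebDriver driver;
  private final String url;
  private final By myElement; // locator for an element that only exists on this page

  public MyPage(WebDriver driver, String url, By myElement) {
    this.driver = driver; this.url = url; this.myElement = myElement;
  }

  // Load the page and check once that it loaded; isLoaded() throws if it did not.
  public MyPage open() {
    driver.get(url);
    isLoaded();
    return this;
  }

  // MyException is assumed to be a custom unchecked exception defined elsewhere.
  public boolean isLoaded() {
    try { driver.findElement(myElement); return true; }
    catch (NoSuchElementException e) { throw new MyException("page isn't loaded", e); }
  }
}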
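
The distinction drawn earlier, between a page that is checked once when it opens and a component whose state can be checked repeatedly, could be sketched in Python roughly as follows. Everything here is illustrative: `FakeDriver` and `ElementMissing` are stand-ins so the sketch runs without a browser, and the class names, URL, and locators are made up. With real Selenium you would pass in a WebDriver and catch `NoSuchElementException` from `selenium.common.exceptions` instead.

```python
class NotLoadedError(Exception):
    """Raised when a page is not in the expected state after opening."""


class ElementMissing(Exception):
    """Stand-in for Selenium's NoSuchElementException."""


class Loadable:
    """Anything with a marker element that tells us whether it is present."""

    marker = None  # (by, value) locator for an element unique to this object

    def __init__(self, driver):
        self.driver = driver

    def is_loaded(self):
        """State check; safe to call whenever the state may have changed."""
        try:
            self.driver.find_element(*self.marker)
            return True
        except ElementMissing:
            return False


class Page(Loadable):
    """A page runs its loaded check once, as part of open()."""

    url = None

    def open(self):
        self.driver.get(self.url)
        if not self.is_loaded():
            raise NotLoadedError(f"{type(self).__name__} did not load")
        return self


class LoginPage(Page):
    url = "https://example.test/login"
    marker = ("id", "login-form")


class SearchWidget(Loadable):
    """A component that may appear, disappear, or live on several pages."""

    marker = ("css selector", "#search")


class FakeDriver:
    """Hypothetical in-memory driver: maps each URL to its locators."""

    def __init__(self, pages):
        self.pages = pages
        self.present = set()

    def get(self, url):
        self.present = set(self.pages.get(url, ()))

    def find_element(self, by, value):
        if (by, value) not in self.present:
            raise ElementMissing(f"{by}={value}")
        return (by, value)
```

A failed `open()` reports that the page did not load, which is a clearer cause than a later “element not found” on some unrelated click.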

The value of test automation

The real value of test automation doesn’t come from finding bugs. At least not primarily. Automation is not good at that, except in one case which we’ll discuss later.

On a consultation with a client the other day I was talking about this.

Our conversation was interrupted when my daughter came running in, crying, and told me that my 5-year-old son had been trampled by horses. Earlier that day, our neighbor’s horses had gotten into our yard and I’d held him up to pet a horse’s mane and show him that it wasn’t scary.

Later, while the two of them were playing in our yard, picking new spring leaves from an Aspen tree, something spooked the horses and they bolted. We were lucky (and blessed) that he only received scrapes and bruises and no serious injuries and he was fine after a couple days. My boy chafed at the restrictions placed on him, but my wife is still shaken.

It got me thinking about a couple things. First, that horses are big and powerful, and while not malicious, they can be dangerous. Second that fences make good neighbors. Perhaps not coincidentally, I’ve been meaning to put up a stronger fence (mainly to keep our goats and dog in.)

Although I welcomed the horses’ visit, I wasn’t prepared for consequences. And now I have to listen to my wife’s calls for caution more.

To trivialize the incident — and draw a tenuous parallel — tests are like fences. And like other fences, they need gates. Even with a strong fence, I would have let the horses in the yard. And even with good tests that catch bugs, you can still let bugs into production. You may do this intentionally, or not.

If you don’t trust your test results, if there are always “random” errors, if tests take too long to run, or are out of date, or don’t actually check what they purport to be checking, bugs can slip through, and the consequences can be severe.

But despite what you test, there are sometimes what you deem “acceptable risks” and there is always the unknown. You can’t test everything, especially what you don’t know. So you need to be able to accept that testing isn’t the solution for preventing bugs.

So what is the value of testing? Is it just snake oil? Or is it good up to a point, and not beyond that?

My belief is that test automation does have value, but that finding bugs is not the primary value. My first posit is that test automation is good at preventing some bugs — often the easy obvious issues. Like a fence.

The act of writing a test, like planning, forces you to think about a problem, and to think about problems in general as you try to anticipate them. That in itself is good at preventing bugs, by avoiding them before they are even created.

Next, tests document features, and describe what they are intended to do, at least as understood by the author. But, like writing, writing tests forces you to articulate your understanding, and then allows you to communicate that understanding to others, who can then share in your understanding, correct you if it is mistaken, or expand upon it.

When you write a test, it should be written primarily to communicate: to communicate an assumption about the software that is being tested. A test forces you to describe *specifically* what that assumption is. If the assumption is wrong, it can be corrected. If it is correct and the code is wrong, the code can be fixed, and then tested again against that assumption.

By its existence, a test (especially an automated test) enables you to repeat that process. And to objectively check if the software does what is expected. Or if it changes. That helps to prevent regressions.

A test verifies that requirements are met, or at least that features are exercised.

Knowing what features are being tested and which requirements (assumptions) are being made is the second powerful value of testing, and the first explicit benefit, after the implicit value (which should not be dismissed) of deliberately thinking about the problem.

Because tests are repeatable (even manual tests), they help to prevent regressions. And that will speed up development velocity. Developers with good tests in place will need to spend less time reasoning about the impact of their changes on the system, and not have to worry about whether a change here will affect a (seemingly) unrelated outcome over there.

Software systems very quickly become too large and complex for people to reason about all at once. Testing helps to break that down. And helps to make sure that you don’t forget a check.

Repetition of test execution, like anything else, means you should get better at it. Your tests will improve over time if they are exercised (and fixed) enough to become more robust, more precise, clearer, faster, etc. So the third key benefit of testing comes from repeatedly executing your tests.

The up front cost of creating a test is non-trivial. The benefit of executing it over time increases.

Getting back to my fence analogy, like building a fence, a lot of the initial work is in surveying the area, digging and setting the posts. This is most of the work in setting up a fence, and at the end of it, you just have a few posts in the ground, and it doesn’t prevent anything from getting through.

This is the situation I was in. In the real world, you have to spend time building infrastructure, defining architecture, and laying down the foundation upon which you will build your fence. This is also true in software, but a lot of the work can be done for you with a good test framework. But you don’t have to have a full framework in place to start writing tests that provide some value.

If I had the money, I could’ve rented a machine that pounds the posts in the ground instead of digging holes with a shovel and then filling them in by hand. A few seconds with a powerful hydraulic tool, and the post is set firmly in the ground.

A good foundational framework can help with the setup of tests too. But it takes investment in knowing how to use it. And the best framework in the world won’t help you if you’re setting up your fence in the wrong place. I mean, if your tests aren’t covering the features you need to test most.

Test automation allows testers to focus on creative work, like actively finding bugs, instead of checking for regressions around the perimeter (or in the middle) of your system.

Writing a test that can execute again without you having to think about it or do anything, is a huge time saver. This is the third value of test automation. Once it’s written, it can provide residual value over and over again. As long as you keep executing it, and keep it maintained.

But test automation is notoriously hard to maintain. UI changes, for example, can break automation that makes assumptions about the user interface. Even if they were correct at one time.

So your goal when writing automation should be to make as few assumptions as necessary. Test only one thing. Don’t depend on the UI to validate something unless you have to. That way you spend less time fixing tests — or making sure that they are testing the right thing, and more time creating new tests, testing new features, and making sure that changes don’t negatively impact the desired (and expected) behavior of the system.

Good tests written this way will help when changes need to be made later. They can act as documentation for others when they need to understand the system to fix it or add to it. This could be a new developer coming onto the project, or it could be the same person, coming back to code they wrote — and knew was working correctly — days, months, or years later.

This is the fourth value of test automation, and possibly the biggest. Not only does it verify that the system continues to work as expected over time, but it allows you to be confident that when you make one change — it doesn’t change other parts of the system without your knowledge.

Test automation can help you find bugs that are introduced when changes are made to a working system that is well tested. Initially your tests helped you prevent bugs by thinking clearly and precisely about the system, anticipating bugs, and coding around the problem. Because you wrote tests, you documented the behavior and set up checks that you can perform repeatably, repeatedly, to make sure that changes do not contradict those assumptions you made when you initially wrote the system — or they alert you to the fact that those assumptions are wrong, or the circumstances around them have changed, so a previous assumption that may have been valid in the past is no longer valid.

Tests provide their greatest value over time. There is an up front cost, which is hopefully defrayed by the initial planning and checking that prevents bugs from being added in the first place.

Unless maintaining or adapting tests costs more than the benefits they give.

Which is why, although I am generally a test first advocate, when you are doing new development on a blue sky project, testing may actually slow you down, and end up not providing enough value to justify its expense.

A lot of the assumptions you make early on may not be valid. And by codifying those assumptions in tests, you may be making the system too rigid and resistant to change.

Often, the first version of software is written with the assumption it will be thrown away and rewritten from scratch once you have clearer understanding of the problem and how you are going to address it. In that case, tests will be a waste of time, and any residual value they provide will never be realized.

But just as often, the quick and dirty, one-off or proof of concept project becomes production code. And it often ends up being too brittle, not scalable, or too tightly coupled to expand. And then, you may be in a situation where you need to rewrite it, but you can’t, because real world business processes depend on it or external forces demand that it stay running.

In this case, tests can again provide value. With reliable tests you can then dissect or migrate the system with confidence, even if the authors of the original software system (and their domain knowledge) are gone.

That’s where the fifth (and perhaps final) value of testing comes in. If you have tests in place on a working system, you can then make changes to subsystems, even rewriting or removing them, if your tests can verify that your replacements don’t negatively impact other parts of the system, or the system as a whole.

In this case the tests act as a fence (or railing) for safety. To prevent you from going over the precipice or out of bounds of the system. The tests can act as a crutch or scaffold that helps protect the system from falling to pieces while you work on fixing or updating it.

But in order to do so, you need to have good tests. And by “good tests” I mean tests that are clear about what they are testing. They test only one thing, so you know what is broken if the test fails.

The tests need to be easy to modify, to adapt to a changing system, and to be able to be changed, eliminated, or replaced, when the assumptions about the system change, or the way they can be tested changes.

Good tests need to be flexible. They need to not be brittle, and they need to be able to work under different conditions. They should not be tied to the UI, environment, or specific data, except when they need to test that specific area of the UI, depend on some aspect of that environment, or when that specific data illustrates the conditions of the test.

Above all, good tests need to be reliable. They can’t break over every minor change to the system, or for random, indeterminate reasons. Flaky tests might be worse than no tests, because they reduce your confidence in testing. So they should be robust and adaptable to changes in the system.

A good test framework should help you focus on simplicity, help you to write reliable, robust, specific tests, and help you to keep them organized, and make reporting clear, meaningful, and concise.

You should be able to run your tests at least every day, ideally for every change. And someone should care about the results. In order for people to care about tests, they need to pass reliably, with no false positives or intermittent failures.

In summary, testing provides value at 5 different times:

  1. The act of writing tests forces you to think about the system and plan for possible issues.
  2. Writing tests while developing software prevents bugs from appearing because you anticipate or catch them while writing tests. It also documents your assumptions about how the system should work.
  3. Executing tests exercises the system and informs you that the requirements are being met. Repeatedly executing tests and keeping them up to date will make tests more robust and adaptable.
  4. Testing provides value over time as the system grows. It prevents regressions and allows you to reason about the system without being slowed down.
  5. Tests act as checks that allow you to refactor the system and make changes without breaking functionality, even when part or the whole of the system has become a black box.

Testing isn’t just for testers

I originally prepared this talk and published it as an article on LinkedIn:

https://www.linkedin.com/pulse/testing-isnt-just-testers-aaron-evans

A quick look at Eleventy and Static Site Generators

Eleventy is a Static Site Generator.

What does that mean?

Benefits of an SSG:
Speed — Faster load times — no parsing of server side scripts
Simplicity — Easy to deploy — no server configuration, just upload
Security — Can lock down — nothing exposed on website to exploit server side
Savings — Can be run on cheap shared hosting, even free services like Netlify

Let’s take a look at Eleventy:

https://11ty.dev

The first claim — “Eleventy is a simpler static site generator”
Simpler compared to what?

Some popular alternative SSGs include:

Jekyll – Ruby, you’ll need to install a Ruby development environment — and keep it up to date
Hugo – Go language, but you don’t need to have Go installed because it is a static binary — a command line application

On the Javascript side, as with everything else, rather than a community coalescing around a single solution, there are many alternatives. That may be good, since it results in experimentation with different methods.

You can see an exhaustive list of SSGs at:

https://www.staticgen.com/

Javascript SSGs tend to coalesce around client-side frameworks, React, Vue, etc.

Popular React based frameworks:

Gatsby – React based
Next.js – with a server side component

React frameworks allow you to use React components, compose them with JSX, and then use those components to create a static site. The twist is that they then “rehydrate” the static pages into a rich client side application based on the framework (React).

Vue based SSGs include:

Nuxt.js — inspired by Next.js
Gridsome — inspired by Gatsby
VuePress — designed for generating documentation based sites, developed by Evan You, the creator of Vue.js, and used for the Vue.js documentation

These Javascript frameworks are more heavyweight clients. While they are “static” in the sense that they don’t have a server side component, they’re really more of a “JAMstack” application.

JAM means “Javascript and Microservices”, or rather “Javascript, APIs, and Markup”, meaning that a web application is more like a mobile app: the presentation logic is all handled in the browser, which builds the app using the browser’s Javascript engine, then fetches data by calling APIs and changes the presentation by adding and removing components from the DOM.

The DOM is the “Document Object Model”, which is how the browser keeps track of your HTML programmatically and renders it. When you click a button, the browser fires an event that maps to the DOM element. Client side code changes the DOM by adding and removing HTML elements dynamically, and by attaching events such as “onclick” to elements to trigger those changes or to make AJAX calls to fetch data from the server.

AJAX is “Asynchronous Javascript and XML”.
XMLHttpRequest is the browser API originally used to make AJAX calls. It lets a page fetch data from a web server, like loading a page, but without reloading the page.

That’s the basis of Javascript client-side frameworks like React, Angular, and Vue. A bunch of logic, written in Javascript, executes in the browser to add and remove elements, fetch data, and update status with the server, all without doing full page loads.

The benefit of doing this, at least theoretically, is that you don’t have to send as much information back and forth between the server and the client on every request. Rather than sending all the HTML tags and so forth, as well as all the other elements that don’t change (including images, JavaScript, and CSS), you just send the data that changes, and then update the DOM accordingly.

In practice, well… often the load isn’t that heavy, and static assets like images & libraries are cached on the browser anyway. And updating the DOM via Javascript events can take more time and resources than just re-rendering HTML. But still, client side frameworks give people a way to organize the logic of complex sites, and compose them programmatically.

[Aside] I’m somewhat ambivalent on client side frameworks, but that may be because I haven’t explored them deeply enough, or that I just haven’t found one that’s implemented in a way I like.

Eleventy is also a Javascript based SSG and it would be familiar to someone using Jekyll or Hugo. It was inspired by Jekyll. It doesn’t try to be a full stack Javascript solution, and that’s where the “simpler” claim comes in.

The other claim that Eleventy makes is flexibility. It is agnostic about the template language you use, for instance. You can use Liquid (the default template system for Jekyll and GitHub Pages), but you can swap that out for several other Javascript based template languages, such as Nunjucks (very similar to Liquid), Handlebars, Mustache, HAML, Pug, or EJS. You can even use plain Javascript objects to “render” content, so you can use programmatic logic and composition to build elements, pages, or partial pages, much the same as you would when building Vue or React components. It may even be possible to process React components to generate a static site with Eleventy, although I don’t know if anyone has done it.

Most people who choose Eleventy do so for simplicity and flexibility, so getting tied into a complex framework isn’t their goal. If you prefer one of those, you’re probably better off using one of the other SSGs geared specifically for your framework.

Eleventy is designed to work with static HTML files, and then add simple logic —
conditionals: “if this is true, then display it”
iteration: “render an element for each item in a list”
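To make those two kinds of logic concrete, here is a small sketch in Python. This is not Eleventy’s template syntax or API. It only models the same idea in plain code, and the function name, field names, and sample posts are all made up for illustration.

```python
from html import escape


def render_post_list(posts, show_drafts=False):
    """Render a list of posts as an HTML <ul>, skipping drafts by default."""
    items = []
    for post in posts:  # iteration: one element per item in the list
        if post["draft"] and not show_drafts:  # conditional: display only if true
            continue
        items.append(
            '  <li><a href="{}">{}</a></li>'.format(
                escape(post["url"], quote=True), escape(post["title"])
            )
        )
    return "<ul>\n" + "\n".join(items) + "\n</ul>"


if __name__ == "__main__":
    # Hypothetical sample data.
    posts = [
        {"title": "Hello", "url": "/hello/", "draft": False},
        {"title": "Work in progress", "url": "/wip/", "draft": True},
    ]
    print(render_post_list(posts))
```

In a template language such as Liquid or Nunjucks, the loop and the condition would live in the HTML file itself as template tags rather than in a separate function.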

Another feature of Eleventy (and other SSGs like Jekyll) is that it allows you to use plain text files, actually Markdown, and generate HTML pages from them. That way you can write a blog, or documentation, or whatever, by composing a simple text file with a bit of simplified markup for headings, lists, and so forth. The generator then adds a header, footer, sidebar, etc. and displays the result as web pages, without you having to add all the tags. And here’s the key: you don’t have to use a content management system (like WordPress), edit all your content in a fancy textarea, and save it to a database.
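As a toy illustration of that idea, the Python sketch below converts a tiny subset of Markdown (headings, “-” lists, and paragraphs) to HTML and wraps it in a layout with a header and footer. It is not how Eleventy is implemented, and a real site would use a full Markdown parser. The layout text and function names are assumptions made up for this example.

```python
from html import escape
from string import Template

# Hypothetical layout: the parts every page shares.
LAYOUT = Template("""<!DOCTYPE html>
<html>
<head><title>$title</title></head>
<body>
<header>My Site</header>
$content
<footer>Thanks for reading</footer>
</body>
</html>""")


def markdown_to_html(text):
    """Convert a tiny Markdown subset: '#' headings, '- ' lists, paragraphs."""
    html, in_list = [], False
    for raw in text.splitlines():
        line = raw.strip()
        is_item = line.startswith("- ")
        if in_list and not is_item:
            html.append("</ul>")  # any non-item line closes the open list
            in_list = False
        if not line:
            continue  # blank lines only separate blocks
        if is_item:
            if not in_list:
                html.append("<ul>")
                in_list = True
            html.append("<li>" + escape(line[2:]) + "</li>")
        elif line.startswith("#"):
            hashes = len(line) - len(line.lstrip("#"))
            level = min(hashes, 6)  # HTML only defines h1 through h6
            heading = escape(line[hashes:].strip())
            html.append("<h{0}>{1}</h{0}>".format(level, heading))
        else:
            html.append("<p>" + escape(line) + "</p>")
    if in_list:
        html.append("</ul>")
    return "\n".join(html)


def build_page(title, markdown_text):
    """Wrap the converted content in the shared layout."""
    return LAYOUT.substitute(
        title=escape(title), content=markdown_to_html(markdown_text)
    )


if __name__ == "__main__":
    print(build_page("Notes", "# Notes\n\nA first post.\n\n- one\n- two"))
```

The writer only touches the text file. The header, footer, and tags come from the layout, which is the division of labor an SSG gives you.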

In the next post, we’ll go ahead and jump in and see how that works.