People sometimes ask how a former history major wound up spending decades inside large enterprise technology organizations. Sometimes I answer with a joke, saying that big history firms weren’t hiring in the early 1990s.
A more honest answer involves Professor Litchfield’s class on The Industrial Revolution in Early Modern England.
Using spinning jennies and mechanical looms in factories to make textiles changed everything in England. It moved people from the countryside to the city. It allowed ordinary Englishmen to own a second suit of clothes. It demolished artisans’ livelihoods. It empowered a rising class of bourgeoisie, and undermined the aristocracy. It created the surplus that underwrote an empire. Software applications and computer hardware were the spinning jennies and mechanical looms of my own era. All of this is especially relevant as we debate the social, political and economic implications of AI.
Professor Litchfield liked to say that “Political Science has the theories. In the History Department we are custodians of the facts.” I like to know what happened; only facts can tell you that. But the facts of history are imperfect. Much of the work in formulating a business or technology strategy depends on work that feels like history — trying to make sense of incomplete, contradictory, subjective, and sometimes unreliable information.
Bob Pasker co-founded WebLogic, where he led development of the first independent J2EE application server. BEA later bought WebLogic and Oracle bought BEA. [1] Now he’s pursuing a PhD in history at the CUNY Graduate Center. To support his research, he developed a machine-learning tool he calls Roscoe to discern new insights in American legal history.
Today’s interview excites me for two reasons. It’s always good to spend time with a kindred spirit passionate about both history and technology. More importantly, using AI to discover more facts intrigues me no end. In an academic environment it allows us to understand ourselves and our world better. In a corporate environment it allows us to make better decisions and pursue better strategies.
Here are the takeaways:
Wall Street, early internet experimentation, WebLogic and J2EE, post-acquisition chapters, then a PhD in history — technology innovation and academic aspirations can reinforce each other!
Enterprise stacks still echo mainframe-era problems, but the internet forced looser transaction models, distributed-systems humility, and resilience design given shared infrastructure and unreliable networks.
Mass digitization of historical documents means you can ask new questions at scale, but you have to cut through the “silence of abundance.”
Roscoe is semantic retrieval across collections — embeddings, ETL, metadata, re-ranking — aimed at evidence that keyword search will not find.
The hard problems ahead are precision, recall and cultural acceptance, richer analysis of hits, multimodal corpora, and partnerships with archives. Yes, the interpretive payoff is substantive, but attachment to existing methods bedevils the academy no less than the enterprise.
1. From Wall Street and the VAX to WebLogic — and back to graduate school
Bob traces Wall Street transaction systems, the Java and Usenet milieu, co-founding WebLogic and shipping the first J2EE implementation, later ventures and CTO-for-hire work, and his return to history for a PhD at CUNY.
James Kaplan: James Kaplan here with the Prosaic Times podcast. I’m very pleased to have Bob Pasker with us. He played an important role in the development of modern software over the past couple of decades, and he’s now doing exciting work as a history PhD candidate. Bob, give us a little bit of your background as a tech leader and entrepreneur, and what drove you to pursue a PhD in history.
Bob Pasker: That’s correct. Thank you very much for having me on, James.
I started way back on Wall Street, building transaction processing systems on the good old VAX, if you remember that. I got interested in moving to San Francisco because I wanted to be part of the tech community in Silicon Valley. I moved to San Francisco and started working for one of the database companies back then. It wasn’t a great company and I didn’t stay very long, so after about ten years in the computer industry I decided to go back to college.
I went back as a history student, got an undergraduate degree, and decided I was going to be a college professor. I went out to graduate school as a history student. Graduate school didn’t really agree with me at the time, so I came back to San Francisco just about when Java was coming on the scene.
I decided to do some experimentation. I built an SNMP stack in Java and was able to browse public servers on the internet completely undetected, because nobody had Cloudflare or anything like that at the time. I was active on Usenet, if you remember that — before Friendster and MySpace. It’s how people communicated with each other over the internet. I met other people who were interested in enterprise Java, and together we co-founded a company called WebLogic.
Our first products were JDBC drivers for accessing Sybase and Oracle. My interest, though, was building a transaction processing system in Java. I spent eighteen months building the WebLogic application server, which is what it became known as. That kind of launched the whole enterprise Java thing. Sun adopted our model, if you will, of having all these different services available, and they called it J2EE — and we had the only working implementation out of the gate.
The WebLogic company was acquired by BEA in 1998, and BEA was acquired by Oracle in 2008. That’s how it became part of Oracle’s technology stack. After that I started another company to do what we called the real-time internet. This was around 2000. The idea was server push instead of request-response, which is something we now take for granted. It was fairly early technology, and we didn’t make much progress because we got caught up in the dot-com bubble.
Over the next ten years I spent time at various venture firms, including Accel Partners, and I was kind of a CTO / chief architect for hire at VC- and PE-backed companies. I also spent a year at Expedia rebuilding their enterprise architecture.
Getting into the COVID years, as my kids started to get older I had more free time, and I decided to go back for my PhD in history. That’s how I wound up at the CUNY Graduate Center as a PhD student in American history.
James Kaplan: As someone who’s passionate about both history and technology, I applaud your varied career.
Bob Pasker: Thank you. I’ve gotten to do a lot of different things. Sometimes I feel like I’ve had five or eight different careers instead of just one.
James Kaplan: It’s a hell of a lot easier to understand the monumental changes happening now if one has a bit of historical sensibility and a historical mindset.
Bob Pasker: For sure.
2. Enterprise architecture: what changed, what didn’t, and why resilience still wins
VAX-era patterns persist, but networks are more decoupled; ACID loosens over the public internet, fallacies of distributed computing still apply, and major outages remind us nobody is immune.
James Kaplan: Before we dive into some of the research you’re doing now, any reflections on the evolution of enterprise architectures over the past twenty years? Obviously the app server was a tremendous advance. We’ve moved on to containers and so forth — but any reflections on that arc?
Bob Pasker: I think the biggest thing is that the more things change, the more they stay the same. A lot of what we were doing back on the VAX a long time ago still gets done today in enterprise systems. But I think the biggest change has been that networks and computer systems are designed much more robustly and to be much more decoupled.
We always thought about ACID transactions and the problem of doing ACID-style transactions over long distances. It was very easy to do inside the data center, but banks and other companies required that either both sides of the transaction completed or neither did.
Now we have much more flexible ideas about how that happens, and a lot of it has to happen over the internet, with the unreliability that implies. So we’ve really taken a new look at ACID transaction ideas and relaxed them enough to make it all work over the internet.
James Kaplan: I’m old enough to remember when many people didn’t believe servers could be physically remote from one another — a combination of the fragility of the application architecture and the fragility of the network. If the servers weren’t in the same facility, they didn’t have confidence in the ability to transact in a robust way.
Bob Pasker: Absolutely. I used to send people the list of the fallacies of distributed computing. One of them is that the network is reliable — and it’s not. Another is that they’re all managed by the same person — and they’re not. Whoever wrote that list was very prescient about what we needed to do to build reliable systems on the internet.
James Kaplan: You’re making an important point that belongs under “the more things change, the more they stay the same”: it’s critical to design for resiliency. You can’t assume any given component will be perfect, so you have to design a system that’s robust in the face of degradation.
Bob Pasker: Absolutely — that’s basically where we are today. We still see it every day: status.x.com and all the other status pages keeping track of what’s working or not on the internet, and we’ve taken that to heart. The biggest catastrophes are when major pieces of infrastructure go down. We’ve seen it with Cloudflare and Amazon and all sorts of companies — nobody is immune.
James Kaplan: I remember spending a lot of time on geo-resilient architectures starting around 2010 — application and system architectures that would be resilient in the face of network failure or infrastructure downtime. Let’s pivot to the research you’re doing now.
Bob Pasker: Sure.
3. Legal history, massive digitized corpora, and the dissertation problem Roscoe was built to solve
Early spreadsheet-era legal history meets an eight-million-case digitized corpus — motivating Roscoe and a dissertation on nineteenth-century courts mediating ideals of liberty against laws of slavery, suffrage, and Native policy at scale.
James Kaplan: I asked you to join because I’m fascinated by digital humanities and the extent to which we can use AI and other digital techniques to enhance historical understanding. Could you tell us a little about the historical research you’re pursuing now and what got you interested in that topic?
Bob Pasker: When I was an undergraduate and in my first attempt as a graduate student, I became interested in legal history. For me it’s a unique field. At the time there were many untapped sources of legal documents that historians had never or rarely used. Personally, law had been a longstanding interest of mine, and I like to do things that are a little off the beaten path — that’s what I wound up doing.
I wrote papers on legal history: kinship and the way people left money to their children in the eighteenth century, based on published wills; and a paper on sex crimes in Providence, Rhode Island, in the late eighteenth and early nineteenth centuries. This was all done with word processors and spreadsheets — so I was doing digital history even then, taking the documents I was reviewing, putting them into spreadsheets, tabulating, and so on.
Fast-forward to 2023: Harvard Law Library has transcribed the entire corpus of American appellate case law — about eight million cases — and digitized them, so you can download text files of all of those cases. I decided to combine my two fields, history and computers, build a conceptual search engine for that case law, and use that search engine for my dissertation research.
The system is called Roscoe. It’s named after the first legal historian, Roscoe Pound, who lived from 1870 to 1964 — he really started the field. My dissertation is a study of how, in the nineteenth century, American courts became the venue for working out conflicts between our constitutional ideals of freedom and liberty and the actual law that permitted slavery, denied women’s suffrage, and affected Native Americans.
A lot of historians have studied these topics, but they’ve only used case law as evidence. Nobody has used case law to study the court system as an institution itself, on par with the other branches of government, religious institutions, and industries. The reason nobody could do that is scale: there are 226,000 cases up through 1860, and that’s the basis of my dissertation research.
4. How Roscoe works — embeddings, collections, ingestion — and how historians react
Semantic search over multiple public-domain collections via embeddings, vector indexes, relational metadata, and re-ranking — plus the human story of academic uptake, the ninety-three-case find, and “silences” created by bad retrieval, not missing archives.
James Kaplan: Tell us a bit about Roscoe. How does it work? Take us under the hood a little.
Bob Pasker: Roscoe is a semantic search engine. The idea is to replace arcane keyword and Boolean searches, which is how most archives still work. If you want to find case law on a particular topic, you have to know the exact words they used back in the nineteenth century — and the words they used in Georgia versus New Hampshire.
If you’re interested in a concept like canal building, you’d have to look up locks, canals, waterways, and so on to surface all the relevant documents. With Roscoe, you type something like “disputes over canals,” and it surfaces documents related to that concept.
The fundamental technology is embeddings and a vector database. An embedding takes a piece of text and turns it into a high-dimensional vector. That vector can be stored in a vector database; you embed the query, look it up in the database, find the k nearest neighbors, use those neighbors to look up the specific cases in Roscoe, and hopefully those cases are conceptually similar to your query.
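Editor’s sketch: the lookup Bob describes (embed the query, then find the k nearest neighbors) can be illustrated in a few lines. This is a minimal sketch, not Roscoe’s code. The three-dimensional vectors, the document ids, and the function names are all made up for illustration; a real system would get its vectors from an embedding model and store them in a vector database rather than a Python dictionary.

```python
import math

# Hypothetical, hand-made vectors standing in for real embeddings.
INDEX = {
    "canal_lock_dispute": [0.9, 0.1, 0.0],
    "waterway_toll_case": [0.8, 0.3, 0.1],
    "will_and_inheritance": [0.0, 0.2, 0.9],
}

# Hypothetical embedding of a query such as "disputes over canals".
QUERY_VECTOR = [1.0, 0.2, 0.0]


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def k_nearest(query_vector, index, k):
    """Return the ids of the k stored vectors most similar to the query."""
    scored = [
        (cosine_similarity(query_vector, vector), doc_id)
        for doc_id, vector in index.items()
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc_id for _, doc_id in scored[:k]]


nearest = k_nearest(QUERY_VECTOR, INDEX, 2)
```

With these toy vectors the two canal-related ids rank ahead of the inheritance case, even though no keyword matching is involved. The brute-force scan is only for clarity; vector databases typically use approximate nearest-neighbor indexes to avoid comparing against every stored vector.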
I’ve organized Roscoe by collections — each collection has its own vector database. The first collection was those 226,000 cases. I’ve extended it to another collection called Chronicling America, which is millions of nineteenth-century newspaper articles. I have another collection with the papers of the founders, and also the Congressional Record. These are all public domain, and each collection is available inside Roscoe.
What makes Roscoe different is that you’re not searching one database at a time with arcane keywords — you’re searching across all of them at the same time. The key is the ingestion process: an ETL layer — extract, transform, load — that takes data as it comes from the archive, tests different chunking algorithms and embedding strategies, creates an index in a vector database, and cross-references that with a relational database that holds document metadata — names, dates, location — used for filtering and re-ranking. That’s basically how it works underneath.
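Editor’s sketch: two pieces of the ingestion pipeline described above lend themselves to a short illustration, namely chunking text before embedding and keeping document metadata in a relational store for filtering. The code below is a sketch under stated assumptions, not Roscoe’s implementation. The overlapping word-window chunker is one common strategy among many, and the table layout, column names, and sample rows are hypothetical.

```python
import sqlite3


def chunk_words(text, size, overlap):
    """Split text into windows of `size` words; consecutive windows share `overlap` words."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    words = text.split()
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # the last window already reached the end of the text
    return chunks


def build_metadata_store(documents):
    """Load (doc_id, title, year, state) rows into an in-memory SQLite table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE documents ("
        "doc_id TEXT PRIMARY KEY, title TEXT, year INTEGER, state TEXT)"
    )
    conn.executemany("INSERT INTO documents VALUES (?, ?, ?, ?)", documents)
    conn.commit()
    return conn


def filter_by_year(conn, candidate_ids, max_year):
    """Keep candidates dated on or before max_year, preserving their order."""
    if not candidate_ids:
        return []
    placeholders = ",".join("?" for _ in candidate_ids)
    rows = conn.execute(
        f"SELECT doc_id FROM documents WHERE year <= ? AND doc_id IN ({placeholders})",
        [max_year, *candidate_ids],
    ).fetchall()
    allowed = {row[0] for row in rows}
    return [doc_id for doc_id in candidate_ids if doc_id in allowed]


# Hypothetical sample rows, for illustration only.
SAMPLE_DOCUMENTS = [
    ("case_1", "Hypothetical canal case", 1823, "NY"),
    ("case_2", "Hypothetical railroad case", 1871, "GA"),
]
store = build_metadata_store(SAMPLE_DOCUMENTS)
antebellum = filter_by_year(store, ["case_2", "case_1"], 1860)
```

In this arrangement the vector index answers “what is conceptually close to the query,” and the relational table answers “which of those hits fall in the period or place I care about.” The candidate ids passed to `filter_by_year` would come from the nearest-neighbor lookup.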
Version one had a very simple user interface that produced a result table with metadata. Version two has multiple collections, searches across collections, and does unified re-ranking: it takes results from the different collections and re-ranks them against each other so the most relevant results rise to the top, regardless of which collection they came from. That’s basically how version two of Roscoe works right now.
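Editor’s sketch: one way to picture unified re-ranking is to pool the candidates from every collection and score them all with the same function, so that results compete on one scale. The interview does not say how Roscoe scores candidates, so everything below is an assumption: the function names, the sample texts, and especially `overlap_score`, which is a deliberately crude word-overlap placeholder standing in for whatever relevance model a real system would use.

```python
def overlap_score(query, text):
    """Placeholder scorer: fraction of query words that also appear in the text."""
    query_words = set(query.lower().split())
    text_words = set(text.lower().split())
    if not query_words:
        return 0.0
    return len(query_words & text_words) / len(query_words)


def unified_rerank(query, results_by_collection, score_fn, top_n):
    """Pool hits from all collections, score them on one scale, return the best top_n."""
    pooled = []
    for collection, hits in results_by_collection.items():
        for doc_id, text in hits:
            pooled.append((score_fn(query, text), collection, doc_id))
    # Sort by score only; ties keep their pooled order because the sort is stable.
    pooled.sort(key=lambda item: item[0], reverse=True)
    return [(collection, doc_id) for _, collection, doc_id in pooled[:top_n]]


# Hypothetical per-collection results, for illustration only.
SAMPLE_RESULTS = {
    "case_law": [
        ("case_7", "a dispute over the canal lock"),
        ("case_9", "a will leaving land to children"),
    ],
    "newspapers": [
        ("news_3", "canal opens to traffic"),
    ],
}
top_hits = unified_rerank("canal dispute", SAMPLE_RESULTS, overlap_score, 2)
```

The design point is that the scoring function is shared. If each collection ranked its own hits with its own scores, a weak case-law hit and a strong newspaper hit could not be compared directly.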
James Kaplan: What’s been the reaction from people you interact with in academic history? I ask because some academics I know are incredibly excited about what AI can do for research, and others push back — anything involving quantification, or “that’s a science way of thinking, not a humanities way.” What’s the balance of enthusiasm versus skepticism?
Bob Pasker: It’s similar to the experience I had trying to get people to use WebLogic. There’s a whole lot of people who couldn’t care less, and a very few who are really interested and see the value. So there’s a huge evangelization process — different from a startup, but still a big thing.
I’ve had professors who, when I’m writing a paper using Roscoe, say: I don’t want anything in the paper about technology — I just want a history paper. I’ve had others who are extremely helpful and excited — but to be honest they don’t really understand it. They can conceptualize the benefit, but until it becomes a public utility they can try out, with enough collections for their own work, it’s mostly curiosity rather than adoption.
Right now I’m trying to write some papers using Roscoe. I’m working on a paper about how to explain Roscoe to the community of historians, which turns out to be fairly difficult — but I’m making progress, and I hope to publish it as an independent research paper. It’s not meant to be pure evangelism; it’s meant to ground Roscoe in historiography, the process of doing history, and archival science — what it means for both disciplines.
James Kaplan: It strikes me as historiographically important. A professor described to us how certain historians were paging through records to find birth and death dates to understand lifespans in early nineteenth-century England and how the industrial revolution changed mortality — whether it increased or decreased mortality in different places. Your approach is a way to vastly increase the datasets available to historians without sending grad students to page through bound volumes by hand.
Bob Pasker: Yes — and in a sense that’s a slightly different kind of digital history: it’s tabular. There’s been a lot of work since the late fifties on tabular analysis of data, the way an economist might do. I’ve been interested in that too; I did it in those earlier papers.
Roscoe is very different. It’s for finding documents that already exist in archives but are impossible to find. My paper last year was on whether Black people could testify in courts before the Civil War in the nineteenth century. The laws were basically against it, and we don’t have much conception that it was still a possibility. Using Roscoe I found ninety-three cases out of 226,000 — about four in ten thousand — in the appellate court records that would otherwise have been impossible to find. They span from the 1790s to the 1860s across eleven different territories and states. There was no keyword search in the world that was going to find those ninety-three.
Archivists have this idea of archival silences: what archivists admit into their archives. For the most part archives contain documents they consider important and leave out what they thought marginal or uninteresting — they have to curate; we can’t save everything.
I have a different kind of silence in mind: documents that are in existing archives, useful to historians’ research, but that they can’t find because they can’t come up with the right keyword search in the user interface. My paper argues that Roscoe makes it possible to find those — that there are interesting documents that have been, in a sense, silenced by arcane interfaces. That’s what I’m trying to create: a system that surfaces many more interesting documents than a historian would otherwise find.
5. What’s next: precision and recall, multimodal search, partnerships — then evidence, interpretation
Roadmap: recall versus precision, deeper per-hit explanation, map-level multimodal search — then partnerships and “index not copy” for archives, historiography of evidence, reading the ninety-three cases, and why he prefers “machine learning” to “AI.”
James Kaplan: To push one level further — and this is a little about where Roscoe might go — ninety-three cases you could read yourself, but you can imagine a search that surfaces a thousand or fifteen hundred cases. To what extent do you think the state of the art will advance so you can use analysis to identify trends in legal thinking? Could some of these documents go into a graph so you can see how legal thinking in one set of cases influenced another? What comes next after archival search? Does that make sense?
Bob Pasker: Yeah, it does.
Version three of Roscoe, which I’m already working on, will address some of this. First, on returning too many results — that’s well known in information science: recall versus precision. You want enough cases in your result set that you see everything useful, but you don’t want false positives — things returned that aren’t useful. Search engines have dealt with that for a long time. You also want precision: the cases most relevant to you should rise to the top, and the useless ones should drop out. You don’t want to leave useful cases outside the result set, and you don’t want useless cases inside it. I work on that constantly: refining the system for better precision and recall.
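Editor’s sketch: precision and recall have standard definitions in information retrieval. Precision is the share of returned documents that are relevant; recall is the share of relevant documents that were returned. The short example below computes both for a made-up result set. The document ids are hypothetical, and in practice the hard part is knowing the full set of relevant documents in the first place.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a list of retrieved ids and a list of relevant ids."""
    retrieved_set = set(retrieved)
    relevant_set = set(relevant)
    true_positives = len(retrieved_set & relevant_set)
    precision = true_positives / len(retrieved_set) if retrieved_set else 0.0
    recall = true_positives / len(relevant_set) if relevant_set else 0.0
    return precision, recall


# Hypothetical example: four cases returned, three cases actually relevant.
retrieved_ids = ["case_a", "case_b", "case_c", "case_d"]
relevant_ids = ["case_a", "case_b", "case_e"]
precision, recall = precision_recall(retrieved_ids, relevant_ids)
```

Here two of the four returned cases are relevant, so precision is one half, and two of the three relevant cases were found, so recall is two thirds. Returning more results tends to raise recall and lower precision, which is the tension Bob describes.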
Second, I want deeper analysis of how each case relates to the query. In version one, as results returned, the system re-ranked them and analyzed cases more deeply to identify exactly how each case related to the query. Once you have a large result set, you can go through it more deeply with machine learning to pull out the cases specific to what you’re looking for and leave out the rest.
Another direction I’ve experimented with is visual search — my experiments have been with maps. Old maps are crude line drawings with handwritten type. In the archive you get: “Here’s the Smith map of New York City from 1823” — and that’s all; it doesn’t tell you what’s on the map until you open it. I’ve used machine learning to read the maps, identify places written on them, get latitude and longitude, overlay them on modern mapping systems, and identify features — waterways, canals, mountains, farms. That information goes into a vector database so it can be searched semantically.
So when someone searches “disputes over canals,” you get not only case law, debates in the Congressional Record, and newspaper articles, but maps where those disputes actually took place — spatial context as well as temporal context from the dates. I think you can do that for other artifacts too: paintings, sculpture, textiles — so people doing research on material culture could search catalogs, say at the Museum of Natural History or the Museum of Modern Art, and find artifacts related to their topics.
James Kaplan: What’s the toughest thing technologically — where is the technology there, and where is it harder?
Bob Pasker: I guess I’m an optimist: I think I can build something really fantastic here. The hardest part isn’t really the technology. It’s twofold: one, making it useful to historians in a way that comports with historiography and archival science; two, building relationships and partnerships with libraries and archives so they’re interested in doing this without feeling they’re giving up their walled gardens around these materials.
The good thing about how Roscoe works is it doesn’t duplicate the archive — it creates an index, the way a card catalog is an index, not the contents of everything. Those are really human difficulties more than technological ones. We’ll keep wrestling with precision and recall and the right way to visualize and display what’s useful. At this point I don’t see anything I can’t get out of the technology.
James Kaplan: It’s potentially disruptive within the history profession in the sense that, over time, techniques like this could make history even more of an empirical than a theoretical discipline — ground it more tightly in the historical record by accessing a broader set of documents easily.
Bob Pasker: How historians use evidence is itself a historiographic topic — it goes back to ancient Greece and the way Thucydides used evidence to describe what happened. That had a transformation in the nineteenth century as people became more interested in an evidentiary basis for history rather than only the stories they had told. The rational, evidence-based side of history has been developing for a couple of hundred years.
I think this extends the same trajectory as building archives, electronic card catalogs, transcriptions, photocopies, seeing old documents as images on the web. I’m a novice historian; others could expound on this much more. But I think Roscoe is doing what needs to happen given the scale of digitization and transcription at the archive level — making huge corpuses visible. I don’t think it’s disruptive; I think it’s enabling.
James Kaplan: Hearing what you just said, it’s a continuation — the next evolution in a long series of transitions over the past couple of hundred years, increasing the dataset available to historians as they do history.
Bob Pasker: Yes — and that’s what my paper this semester is about: what this deluge of information means for historians and how Roscoe will help. I call it the silence of abundance: what’s hidden in this great abundance of historical records.
James Kaplan: As you looked at those ninety-three antebellum cases, was there anything especially insightful that wouldn’t have been available if you hadn’t found them?
Bob Pasker: The fact that those cases exist at all. For the most part we “know” that Black people were not allowed to testify in court — not as witnesses, they couldn’t give evidence. But now we see: wait, that’s not completely true, even given what the laws say.
I had hoped I’d find justices who really wanted to give people an opportunity to testify on their own behalf or on behalf of something they had seen — in a positive, rights-expanding sense. That’s not really why they were allowed to testify. They were allowed to testify because nineteenth-century justices had a very specific concept of justice. It wasn’t liberty and freedom in the abstract; justice was the process of adjudicating cases.
So you had very specific situations in these ninety-three cases: someone was injured, the only witness was a Black man, everyone knew the person was injured and the defendant was guilty — but there was no witness except this one Black man. The only way for the justice system to maintain its reputation as an institution that could adjudicate cases was to let that witness testify. Otherwise it would be as if nobody had seen it, the defendant would go free, and that would violate their notion of procedural justice. It was more about maintaining institutional coherence than about a grander sense of justice. That was my conclusion.
James Kaplan: Very helpful. Anything I neglected to ask — anything else you’d like to cover?
Bob Pasker: As I’ve talked to historians — classmates, people in my department — and looked at what historians’ associations have said about artificial intelligence: by the way, I don’t use the term “artificial intelligence” because I find it unhelpful. I use “machine learning,” the technology I use, without the generative piece.
The resistance to something like Roscoe — what’s often lumped as “AI” — comes down to three concerns. One is hallucinations, which we’ve discussed. Two is teaching — how this affects pedagogy. Three is the human aspect of writing history: history is a process conducted by humans, not machines, because history gives us a sense of who we are, where we came from, the story of our path — and humans should own that, not computers.
I’m hoping something like Roscoe, which uses the same underlying technology in a different way, will have a positive impact — people will understand it and find it useful in their research. It may take ten or twenty years, another generation of scholars, before that really bears fruit. I’m enjoying being at the forefront and I’m proud of what I’ve done so far.
James Kaplan: Congratulations. Thank you so much.
Bob Pasker: Thank you. I really appreciate it, James. This is something I’ve wanted to talk about for a long time.
Footnotes
[1] For our younger readers: Before containers, app servers provided transaction management, connection pooling and a runtime environment for J2EE applications.









