Alex Kolchinski

Announcing TalkToSales, which makes websites voice-native

Today, I’m proud to announce the launch of TalkToSales (www.talktosales.com), which pioneers a new way people can interact with computers.

When you watch science fiction movies like Star Trek, people are often shown talking to computers rather than typing to them. And when visuals like maps and diagrams are useful for the conversation, the sci-fi computers show them exactly when they’re relevant.

This makes a lot more sense than the way we currently interact with software.

As humans, we’re wired to exchange information by talking and listening, not by typing and reading. But our interactions with computers to date have largely been limited to typing and reading, because until last year, computers weren’t able to have spoken conversations with us well enough that it felt natural. So, we’ve had to build software that relied on interactions with mice, keyboards, and touchscreens, instead of more natural spoken conversations.

These constraints were at their most extreme a few decades ago, when underpowered computers and displays forced us to interact with software through command-line interfaces (CLIs). But as computers and screens got better, most software quickly moved to easier-to-use graphical user interfaces (GUIs). GUIs are a lot better than CLIs, but they can still be awkward — navigating nested menus with hundreds of buttons is hard and slow. This is especially the case on phones, where small screens make it even harder to interact with complex software.

But now, just like the graphical part of the Star Trek computer was unlocked a few decades ago, making navigation through GUIs possible, the voice part was unlocked last year, with AI finally able to have natural spoken conversations with people. And now that the “Star Trek computer” is finally possible, we have the opportunity to reimagine how software should work — both how we can use it by talking to it, and how the graphical parts of it should look now that they can be focused on displaying information instead of showing buttons and other things for users to click on and type into.

But so far, we’ve barely begun exploring what has become possible with voice-controlled software.

The AI apps themselves — primarily ChatGPT — have allowed people to experience talking naturally to AI. And a number of companies have begun automating business phone calls with AI systems as well. But very little has been done so far in integrating rich, graphical software with voice AI in ways that realize the vision of the “Star Trek computer”.

We believe one major place this needs to happen is the Web.

Of course, desktop software still exists and would be far easier to use with voice controls, and smartphones would be much nicer to use with smart voice control as well. (Hurry up, Siri 2.0!) But ever since the Internet and cloud software revolutions happened, much of humanity’s interactions with software — and our interactions with each other, when mediated by software — happen through the Web.

So, the question we set out to answer with TalkToSales has been — “How should the Web work now that it’s possible to talk to computers?” This has led us down some paths that were quickly obvious, and down others that we only uncovered along the way.

Initially, like many entrepreneurs today, we were focused on automating business phone calls. This is a niche where voice AI can create value very quickly by replacing “press 2 for customer support, press 3 for returns” type systems and handling repetitive calls that human call center workers would otherwise have to spend time on.

But we realized that in many cases, these phone calls are only happening in the first place because people aren’t able to get their needs met through a website. This could look like anything from an e-commerce customer with a question about a product needing to dial a support number, to a businessperson shopping for software needing to schedule a sales call to learn enough to evaluate a product. In any case, this kind of “wait on hold/schedule a call” flow is awkward, time-consuming, and frustrating.

And in the case of a website telling visitors to call a phone number to accomplish something, it’s also bizarre to see a 1990s technology (websites) have to redirect users to an 1800s technology (telephones) to get something done.

All too often, the hassle of getting on a call with a person – and the knowledge that it’ll be impolite to leave early even if the call is a waste of time – deters people from getting on a call at all, leading to unfulfilled needs for them and lost sales for businesses.

We realized something here wasn’t right — why were we working on using AI to automate the phone calls that people were often placing when a website couldn’t meet their needs, when we could work on making the phone call unnecessary in the first place?

Why shouldn’t we instead turn the website INTO the call?

Thus, TalkToSales (T2S) was born.

What we are launching today is the first product that can make a website have a real conversation with visitors. The conversation can use any AI voice you want, and even a photorealistic video avatar (our demo uses my face and voice), an animated character, or no video representation at all if you want to keep it simple. T2S can also be used in text input and/or output mode for users who are in quiet places or want privacy.

T2S works with desktop browsers as well as with mobile ones — where it can be particularly useful in helping users have a rich interaction with a website despite the limitation of a small screen.

T2S bolts on seamlessly to existing web pages and can give users live tours of a page during a conversation, scrolling and navigating around to show content as it becomes relevant. It can even navigate around complex web apps while talking to users, allowing for live customer support, onboarding, etc. — or other use cases like helping users comparison shop on an e-commerce page.

As well as navigating around websites, T2S can show off dynamic slides, including visual assets like images, videos or even things like animated 3D diagrams, as they become relevant to the conversation. The materials available as visual assets can be customized to each use case.

T2S is also configurable with custom knowledge bases, style guides, and more, so that the AI can execute faithfully on any company’s goals and stay on-brand. This includes the degree of “improvisation” allowed, so that companies in sensitive industries can ensure the AI doesn’t make anything up, but less sensitive use cases can allow more freedom for a wider-ranging conversation.

T2S also includes the ability to book conversions right inside a conversation. For a software product, a conversion might be a prospective customer booking a follow-up call with a salesperson. For e-commerce, it might look like an immediate purchase. In any of these cases, a customer can commit to a call, purchase, or other conversion event without even leaving a T2S conversation. Time kills deals, and T2S ensures that appropriate conversions are offered proactively to customers before they have a chance to leave the page and forget to follow through.

T2S also supports qualifying leads right through the conversations it has with visitors. It would be counterproductive to encourage every single visitor to book a sales call, and waste a sales team’s time with low-intent or poor-fit leads. It would also be counterproductive to talk to a prospective investor as though they were a prospective customer, or to talk to a prospective recruit as though they were a journalist.

T2S supports asking visitors relevant questions to learn what kind of interaction it makes sense to have with them, and driving them to the conversion event that’s most appropriate for them, whether it’s an immediate purchase, a follow-up call, a mailing-list signup, or nothing at all. Websites are one-size-fits-all, but a T2S interaction – including the dynamic content as well as the conversation itself – can be completely customized to each visitor.

Moreover, the dynamic, flexible nature of a T2S conversation doesn’t just make for a better experience for web visitors and more conversions — it can also be hugely beneficial from an analytics perspective. Traditional web analytics are extremely limited by the constrained nature of a user’s interactions with a traditional website. When a user uses a GUI web app, the main trace they leave is the pages they visit, the time they spend on them, and where and when they scroll and click. That can only tell you so much about what people are actually hoping to get from your website, and if they’re getting it or not.

The advantage of T2S is that having a real conversation with someone is inherently much more informative than recording how they read a “brochure” style web page. And T2S offers full transcripts of users’ conversations – updated live as they happen – so it’s possible to see exactly how visitors are interacting with a page in real-time. By seeing what visitors ask about, seeing what answers and content are offered to them in exchange, seeing if they express satisfaction or frustration, and why, it’s possible to get a deep, rich view into customer preferences and needs, including which needs are being unmet.

T2S AI is also able to automatically identify trends in users’ conversations and surface problems, unanswered questions, and unmet needs, offering a constant finger on the pulse of a business. In this way, T2S essentially becomes an always-on user research interview with every single user who opts into the conversational experience on a business’s website, yielding insights for the business’s leaders that are comprehensive, nuanced, and always up-to-date.

In addition, T2S offers one other feature that we believe will change how people interact with each other over the Internet: when a web visitor is having a conversation with T2S, a representative from the company behind the web site reading the live transcript can choose to “call” the visitor. If the visitor accepts (in voice-only mode, or with video on), the company’s representative replaces the AI avatar and is able to have an audio or video call with the visitor instantly. In this way, the company’s founders, salespeople, etc. can actually jump in live and interact with users as they browse the website.

This is exactly what physical-world shopkeepers have always done, but it has never before been possible with online businesses.

Now, customers will be able to make an instant connection with the people behind a T2S-powered website, which we expect will hugely increase the number of customer conversations businesses can have. For businesses for whom the rate at which web visits convert to conversations is an important factor in the performance of their sales funnel, we expect this functionality will significantly increase the number of web visitors that make it to a conversation, then all the way down the funnel to a purchase.

Among the features of TalkToSales, we see the AI-powered conversation functionality and the ability to transition to a human conversation as being complementary to each other. Web visitors are often deterred from attempting to talk to a real person because getting through to someone is often hard, and even when it isn’t, it’s a big mental jump to go from passively browsing a web page to interacting with a real live person who you have to maintain a certain degree of composure in front of. It’s also intimidating to get on a call with a person that you don’t know you’ll want to stay on, since it’s impolite to hang up suddenly.

Talking to an AI is much easier to initiate, and has much lower social barriers, since there are lower stakes to the interaction and it’s not impolite to hang up suddenly. But if an AI conversation goes well, it can be much more natural to transition to talking about the same thing with a human, compared to going straight from browsing the web to a human conversation. In this way, T2S can create a smooth on-ramp from web browsing to the human conversations that drive closed deals.

In fact, the benefits go both ways: web visitors can get a better sense of a business’s offerings by talking to the AI before committing to talking to a person, and the company’s representatives can use the live transcripts of AI conversations as a filter to decide which web visitors are worth allocating a human to talk to. In this way, we see T2S as not just an AI guide, but also as a way to let people connect with each other to do business over the Internet more smoothly and efficiently.

We think TalkToSales can be useful for a number of use cases out of the box.

The one we’ve focused on the most to date — and which you can see in our demo — is for websites of businesses that are selling something, where the website is a significant touchpoint in customers’ purchasing decisions. This could be a software business, where T2S could help users understand the software product and give them a live tour of it — just like what you see in our own demo. Or it could be an e-commerce site, where T2S could help users comparison shop — imagine a Home Depot man in an orange apron helping you shop on their website for specific tools or parts. Or it could be an insurance webpage, where T2S could help users compare policies and get to a purchase — maybe the avatar could be Flo for Progressive or the gecko for Geico. It could even be something else entirely, like an apartment building’s website, where users could take virtual tours of different apartments and ask questions, while being “shown around” by a T2S avatar.

T2S could also be useful for post-purchase situations — one we’ve talked a lot about is using it as a guide for complex software products. Imagine Microsoft Clippy (remember that?), but that actually works, and can show users how to use Excel, or Salesforce, etc., and even take actions for the user based on spoken requests — “Can you please make this row bold?”, etc.

The sky is the limit!

Today, you can try a fairly simple example for yourself on our website – www.talktosales.com

Our demo uses a video and audio clone of me, Alex. It qualifies you as a lead for Talk to Sales, and directs you to an appropriate conversion event depending on how it qualifies you. It shows you around our simple landing page as product details become relevant to the conversation, and it shows dynamic slides with images and video sourced both from our library and from Internet stock photos, as relevant to the conversation. For real use cases, we’d expect to be using company-approved libraries of assets rather than stock photos, of course, but bear with us for the demo :).

We’ve been working on this demo for a few months now, partially as a way to prove to ourselves that our vision was really possible with today’s technology. It was very hard to pull off with today’s tech, but it’s working, and it’s quite a compelling experience if you ask me — see for yourself! What else is encouraging is that all of the underlying AI technology is improving at a blistering rate, so what you see today is the worst T2S is ever going to be.

Now that we’ve proven that this approach works, we’re moving on to commercializing it. Specifically, we’re now looking for three VIP early customers with whom we can deploy TalkToSales on their landing pages (or in a specific flow on their site, e.g. a checkout experience). Our goal is to choose three companies for whom there’s high potential of increasing revenues with a richer, more personalized web experience, and a smoother on-ramp from web visit to human conversation and/or purchase.

Since T2S is a brand-new product, we have no data on revenue lift yet — so we’re extremely motivated to make our first three customers insanely successful by lifting revenues by as much as possible.

We’re expecting these first implementations to be very hands-on — we’re willing to customize T2S to your specific use case to a significant degree, and we’re happy to handle all the technical dirty work ourselves. We’ll also make our initial implementations low-risk from an economic point of view (let’s talk details if you’re interested) and from a technical one as well — for example, we can show the T2S module to just a small percentage of your web visitors initially, and ramp up from there. And if T2S ever goes down, visitors will still be able to use your website exactly how they could before.

If you’re interested in exploring being one of our first VIP customers, please reach out at alex@talktosales.com or book a call at http://book.kolch.in

And if you or your company aren’t the right fit, but you know someone who might be interested or even just curious, we’d really appreciate it if you show them this post and/or our demo!

We believe that augmenting websites with natural, conversational experiences that include smooth on-ramps to reaching a human or making a purchase will measurably lift revenues for many companies that do business online.

We’re looking forward to proving it with our first customers.

March 28, 2025
The “strategic reserve” exposes crypto as the scam it always was

Today, President Trump announced that the US Government would begin using taxpayer dollars to systematically buy up a variety of cryptocurrencies. Crypto prices shot up on the news.

This is revealing, as crypto boosters have argued for years that cryptocurrency has legitimate economic value as a payment system outside of the government’s purview.

Instead, those same crypto boosters are now tapping the White House for money — in US Dollars, coming from US taxpayers.

Why?

Crypto has been one of the biggest speculative bubbles of all time, maybe the single biggest ever. Millions of retail investors have piled into crypto assets in the hope and expectation that prices will continue to go up. (Notice how much of the chatter around crypto is always around prices, as opposed to non-speculative uses.)

However, every bubble bursts once it runs out of gamblers to put new money in, and it may be that the crypto community believes that that time is near for crypto, as they are now turning to the biggest buyer in the world — the US Government — for help.

This shows that all the claims that crypto leaders have made for years about crypto’s value as a currency outside of government control have been self-serving lies all along: the people who have most prominently argued that position are now begging the White House to hand them USD for their crypto.

It also reveals how much crypto has turned into a cancer on our entire society.

In previous Ponzi schemes, the government has often stepped in to defuse bubbles and protect retail investors from being taken in by scammers.

But in this wave, not only has the government not stepped in to stop the scam, it has now been captured by people with a vested interest in keeping it going as long as possible.

Our president and a number of members of his inner circles hold large amounts of cryptocurrency and have a vested interested in seeing its value rise — Trump’s personal memecoin being a particularly notable example. And many other people in the corridors of power in Washington and Silicon Valley are in the same boat. “It is difficult to get a man to understand something, when his salary depends on his not understanding it”, and so some of the most prominent people in the country are now prepared to make any argument and implement any policy decision to boost the value of their crypto holdings.

How does this end?

Once the US taxpayer is tapped out, there’s not going to be any remaining larger pool of demand to keep crypto prices up, and in every previous speculative bubble, once confidence evaporates, prices will fall, probably precipitously. Unfortunately, as millions of people now have significant crypto holdings, and stablecoins have entangled crypto with fiat currency, the damage to the economy may be widespread.

The end of the crypto frenzy would, in the end, be a good thing. Cryptocurrency has a few legitimate uses, like helping citizens of repressive regimes avoid currency controls and reducing fees on remittances. But it has also enabled vast evil in the world. Diverting trillions of dollars away from productive investments into gambling is bad enough, but the untraceability of crypto has also enabled terrorist organizations, criminal networks, and rogue states like North Korea to fund themselves far more effectively than ever before. I’ve been hearing from my friends in the finance world that North Korea now generates a significant fraction, if not a majority, of its revenues by running crypto scams on Westerners, and that the scale of scams overall has grown by a factor of 10 since crypto became widely used (why do you think you’re getting so many calls and texts from scammers lately?)

I hope that the end of this frenzy of gambling and fraud comes soon. But in the meantime, let’s hope that not too much of our tax money goes to paying the scammers, and that when the collapse comes it doesn’t take down our entire economy with it.

Thanks to Alec Bell for helping edit this essay.

March 3, 2025
I’m looking for a cofounder
Summary

I’m looking for a cofounder for my next company. I want to work on AI-powered B2B workflow automation software, but I’m not committed to a specific direction yet.

I’m currently working on a workflow automation product in the insurance space that just crossed $10K/mo in revenue. In the past, I’ve been the CEO of a YC-backed startup, a PhD student at the Stanford AI Lab, an APM at Google, and a software engineer.

I’m looking to either stay CEO and join forces with a technical CTO, or to become the CTO to an exceptional CEO. For more details, read on!

If you’re interested, or know someone I should talk to, please reach out — I’m at alex@kolch.in

My Background

I grew up programming, “turning pro” when I sold a Flash game in high school. I worked full-time as a software engineer for a year between high school and college, then earned a BS/MS in computer science at UChicago, concentrating on AI.

After college, I worked at Google as an APM, then went back to Stanford for a PhD program. Initially, I was planning to focus my research on AI-powered tutoring software, but came to the conclusion that the underlying AI technology wasn’t powerful enough yet to build what I wanted to build. So, I pivoted to doing AI research in natural language processing and generative models, publishing three papers.

I then got excited about startups generally and automating food service specifically, and dropped out of Stanford to co-found Mezli, which I led as CEO. We launched a popular autonomous restaurant, but went out of business after our Series A fundraise fell through. I raised $4M from investors including Y Combinator (but failed to raise the additional ~$10M we needed), led a team of ~30, and ran several functions including finance and marketing. Google Mezli for news, reviews, etc., and you can see a video of the tech here – https://www.youtube.com/watch?v=DV2I9XwcEZE

I spent the latter half of 2023 shopping around Mezli’s IP and shutting down the company. While doing that, I used the newfound free time on my hands to build and launch a couple of products solo. One is a B2C utility app (www.readtome-app.com); the other is a workflow automation product in the insurance space that just crossed $10K/mo in revenue.

I’m now evaluating whether to keep doubling down on this product or to pivot to a different niche that might be a faster place to grow to $1M+ in annual revenue. As I start that discovery process, I’m also on the hunt for a cofounder.

My Skills

While the previous section probably gives you an idea of what I can do, here’s a more specific breakdown of my skills:
- Entrepreneurship: I’ve launched multiple products as CEO or sole founder, one reaching $20K in monthly revenue and another reaching $10K/mo. Between those experiences, several other exploratory projects, and spending a lot of time in the startup ecosystem, I’ve developed a pretty good sense for what it takes to grow a company from an idea to meaningful revenue — and more importantly, how to quickly discard the many ideas that prove unviable.
- Software engineering: I’ve built a wide variety of software over the last 22 years (time flies…) and I’m very good at picking up new technologies quickly and getting things shipped. This includes the latest wave of AI tech — I’ve leaned heavily on modern generative AI for my last two products. However, I’m very much not a VP Eng who institutes best practices in a large team — I’m more on the “incur tech debt to get to product-market fit” side of the spectrum than the “pay down technical debt to make a product long-term maintainable” side of it.
- AI research: I spent three years of my life largely focused on AI research at Stanford. While I haven’t trained a custom model in a few years, I still have a good understanding of what today’s AI can do, and of what’s likely to be possible soon. I actually don’t think most startups should be in the business of conducting AI research or even of commercializing techniques directly from research papers, but if that changes, I have the relevant background!
- Sales, fundraising, and recruiting: I group these together because in a sense, they’re all manifestations of sales. I’ve run a sales or sales-adjacent process many times, including pitching hundreds of investors to raise $4M for Mezli, recruiting a number of Mezli’s team members, and now selling B2B software in the insurance space.
- Marketing and PR: I’ve run marketing campaigns for Mezli’s robotic restaurant and more recently for the ReadToMe app. The Mezli campaign included local and national media hits and drove ~1M views and ~10K purchases for our brand. I’ve also had success reaching a broad audience with my blog, including multiple front-page posts on Hacker News and ranking #1 on high-volume Google search terms.
- Finance: I have a good understanding of how the numbers work that make businesses tick, and of how the financial markets work as well. This has come from a variety of classes in college and grad school, an internship on Wall Street, and running the financial side of Mezli.
What I want to work on

In short: AI-powered B2B workflow automation software.

Like many others in Silicon Valley, I think that the current wave of progress in AI is a generational shift — we’re likely at the beginning of a change as big as the advent of the Internet.

Until now, computers have largely been able to transmit and process information only in rigidly-defined ways: humans have had to define exactly how data could be input into programs, stored and processed by them, and given back to users.

That’s now changing in a spectacular fashion. Thanks to transformer-style models, many tasks involving unstructured data (natural language, images, video, audio, etc.) that were previously the exclusive domain of humans can now be partially or fully done by software.

This includes a broad swathe of repetitive white-collar work that accounts for many trillions of dollars of value created per year. With AI tooling, humans in the affected industries — and that includes most industries — will be made vastly more productive. An analogy might be the transition from paper spreadsheets to electronic spreadsheets a few decades ago, or paper mail to email more recently.

That a lot of this automation is about to happen, and that a lot of economic value is about to be created, I haven’t heard anyone seriously contest. The more uncertain question is which companies are going to create that value and capture a share of the value they create.

Some important sub-questions:
- How much of the action is going to be captured by incumbents vs. startups?
- Which parts, if any, of the new landscape are going to be owned by a fragmented assortment of companies, vs. a small number of huge players?
- How will the domains of different companies be carved up — by industries served, type of technology used, both, or neither?
- Will some companies playing in the new AI-enabled landscape have significantly better economics than others, which might in turn look more like services businesses? How will that be determined?
The answers to most of these questions are currently unknowable, so my inclination is to jump in, get to $1M+ with a narrow “wedge” product that’s quick to sell, then grow outwards from there, responding to inevitable shifts in the competitive landscape of the software industry, including advances in the capabilities of AI, as they transpire.

I’m inclined to start out by solving a fairly narrow problem in a specific industry to begin with, as opposed to building tools for other companies to use “horizontally” across industries. This is for a few reasons:
- A well-constrained customer profile can make the sales learning curve faster early on, allowing for a faster ramp-up in revenue.
- I’m seeing way more founders jumping into the tooling layer right now than into the application layer. This likely implies less competition in at least some vertical-specific product areas.
- The landscape of what’s possible with AI, and which tools are needed to do it, is changing at a very fast pace. Many tooling companies whose products are useful and popular today might be completely obsolete tomorrow. A company whose product is solving an application-layer need is more likely to be able to make use of improved AI tooling seamlessly and continue serving its customers instead of being replaced.
I’m now on the hunt for the right vertical, and the right niche in that vertical, to get started. It’s possible that my current product in the insurance space will be that initial product; it’s also possible that I’ll find something else that I like better.

As I start that discovery process, I’m on the hunt for someone to do it with, so that we can shape the product direction of the company together.

Who I want to work with

I’m in the relatively unusual position of having a background in both the business and product side of entrepreneurship as well as in the technical side, both in software engineering generally and AI specifically.

However, rather than continuing as a solo founder, I’m looking for a cofounder for a few reasons. With the right cofounder,
- Starting and running a company together is more fun than doing it alone.
- Two heads are better than one.
- Many hands make light work.
As far as who I’m looking to work with, I can see one of two arrangements working well:
- I stay CEO and a CTO joins me who has a history of shipping software quickly. I focus on selling; the CTO focuses on building and I help build when appropriate. In this arrangement, we’d go through the discovery process together before committing to a direction for the company.
- I become CTO to an extraordinary CEO. The CEO sells, I build. Because I’ve been the CEO of a startup that had some temporary success, I have a pretty high bar for this one: the CEO would either have to have had a previous exit as a startup CEO, or be a veteran of an industry that they can immediately start making sales in — or both.
Either way, personal and professional compatibility are key — we should get along famously, and working together should feel like much more than the sum of its parts.

Other preferences

There are a few other things I should mention about my preferences:
- I feel very strongly about working in-person together, most days of the week, most weeks of the year, in or near San Francisco. I want to stay in-person forever, and I want to stay in the Bay Area indefinitely, with the possible exception of if we end up serving a customer base that’s heavily concentrated in a different city.
- Founding a startup requires a level of intensity that’s alien to most people — when done right, it doesn’t leave room in your life for much else that takes proactive effort. You need to be prepared for this. However, some founders grind to the point of negative marginal returns, and that’s not a good idea either. I strive for, and expect from cofounders, a level of intensity that’s far beyond a 9-5 job, but that does (at least periodically) leave room for recovery and perspective.
- At this stage of the game I’d be looking to split equity equally, with one extra share to the CEO to break ties. I also prefer a longer vesting schedule than 4 years to align founder incentives for the long-term.
Interested, or know someone I should talk to? Please reach out — I’m at alex@kolch.in
February 27, 2024
Announcing ReadToMe
Today, I’m announcing the public launch of the ReadToMe app, which turns paper books and other printed text into high-quality audio.

I originally built the app as a present to my fiancée, who has a reading disability but loves books. Often, she listens to audiobooks while following along in the same book on paper, but some books, especially older and less popular ones, are only available in paper form and don’t come in audiobook or even e-book form.

We looked for a way for her to turn paper books into audio, but all of the apps we found didn’t do a very good job. Many of them were very good at turning e-books and other digital text into high-quality audio, but made many mistakes when scanning paper books. Common issues included inserting page numbers and footnotes into the middle of sentences and getting words wrong or missing them completely. Overall, there ended up being so many mistakes that the books were very hard to listen to — and that was for the better apps.

As a Christmas present, I wrote an early version of ReadToMe for my fiancée. When she found it useful, and I had spare time on my hands while shutting down my last company, I built out the app into its current version — which is what you see here.

The app lets you scan up to 20 pages at a time and turns them into high-quality audio, with very few mistakes. The app costs $9.99/month for up to 250 pages/month — sorry it can’t be free; I estimate the $9.99/month will cover the costs of the pretty expensive AI technology it uses on the back end.

Known issues that I’ll be working on fixing if enough people end up using the app include:
- Scans can take a few minutes to come back as audio, especially when scanning multiple pages at once.
- Rarely, scans will fail to come back as audio at all and have to be retried.
- The AI I’m using on the back end will sometimes “correct” the wording of a book, especially when an author deliberately uses incorrect grammar.
I expect the app might be useful for a couple of types of user, including:
- People with reading disabilities or just age-related far-sightedness that makes reading hard.
- People who want to seamlessly bounce between reading something on paper and listening to a few pages, e.g. while driving.
I’m also looking forward to seeing who else might find it useful.

If that’s you, or you know someone who might find ReadToMe useful, please give it a try and/or let them know! And I’d appreciate any feedback on things that the app does well or poorly — you can reach me at alex@yaksoft.net
February 4, 2024
2023 Reflections
This year, I had to shut down my autonomous restaurant startup Mezli — despite a successful launch — after I was unable to raise the money we needed to scale up.

This year, I’ve also re-entered the world of AI, which I’ve been away from since leaving Stanford in 2020. I’m now working on a new workflow automation startup and have gotten very excited about the opportunities created by recent advances in AI.

However, the emergence of true artificial intelligence is also shaping up to be the fastest-ever large economic and social change in human history, and tracing how it may reshape business and society is both intellectually interesting and strategically important for entrepreneurs (and just about everyone else, for that matter).

While chewing on the lessons I’ve learned from shutting down a hardware startup and re-entering the world of AI, I’ve summarized my thoughts into three essays, which I hope can be useful to the startup community and spark an interesting conversation:
A summary of my arguments is as follows:

It’s likely that AI is now only a small leap away from becoming as good as humans at most intellectual tasks. Even if that’s not the case, its current capabilities are already good enough to do broad swathes of knowledge work that are currently done by humans.

So, we’re likely entering an era of knowledge work automation that may parallel the last 200 years of physical work automation, but it’ll likely happen much faster this time. We’ll likely see far fewer people doing analytical work over the next ~20 years, just as fewer and fewer people did agricultural and other manual work over the past ~200.

This automation of knowledge work is likely to reduce the software industry to a shadow of its former self, as programming will become largely automated.

However, there are many opportunities to build software startups right now, and software presents a uniquely good opportunity for first-time founders to build companies that reward good execution and provide quick feedback.

Hardware companies, on the other hand, are very difficult for first-time founders because their unavoidable capital requirements make them vulnerable to running out of money, and because their slow cycle times make for a slow education in entrepreneurship.

So, I highly encourage today’s aspiring entrepreneurs to focus on software, especially because we may be entering the last era when the software industry, and software entrepreneurship, exists in its present form.

That said, given the rapid changes in AI capabilities, I also think it’s especially important for today’s AI founders to keep an eye on the long-term competitive advantages their companies develop, as technology built on top of today’s AI may become infinitely easier to build on top of tomorrow’s AI, negating the technical defensibility of much of the work being done by startups today. Other forms of defensibility are likely to be more important in the future.

I welcome all comments, especially those that point out things I missed or refute a claim I’ve made. Hopefully we can all learn from the debate.
December 11, 2023
Founders, Beware Hardware
This essay is part of a 3-part series:
Hardware startups are sexy. Building flying cars, nuclear reactors, or electric cars is more tangible and in many ways more interesting than writing software. It’s also possible to tackle a wider range of important problems with hardware than with software alone, from global warming to food security.

For these reasons, many founders and would-be founders dream of starting hardware companies, and some of them actually do. Occasionally, that decision works out reasonably well, but most of the time, it ends in tears, especially for founders without previous successful outcomes. This is due to factors inherent to hardware companies, which the startup community is somewhat aware of but perhaps not enough. In this essay, I’m hoping to make those factors clear, with the hopes of saving other founders future pain.

This warning comes out of my own story.

I’m from a software background — I started programming as a teenager, studied computer science and AI in college and in grad school, and have worked in different parts of the software industry since high school. But in grad school, while studying AI, I became enamored with the idea of automating food service.

I realized that automating fast-casual food service (think Chipotle, Sweetgreen, etc.) could bring down the costs of building and operating restaurants by so much that high-quality meals could be sold at half the price they’re sold for today. To realize this vision, I co-founded Mezli with two friends from Stanford, and a year and a half later, we successfully launched a fully-robotic restaurant. However, despite selling thousands of meals at near-100% uptime and significantly better unit economics than human-powered restaurants, we were unable to raise more money and were forced to shut down.

Some of this was due to the downturn in the funding market — it was much harder to raise money in 2022 than it had been even a year or two earlier. But in hindsight, just being a hardware company made us much more fragile than an equivalent software startup would have been.

One crucial factor that I underestimated at the beginning of Mezli was the mandatory requirement for raising escalating amounts of capital to keep making progress with a hardware startup. People sometimes talk about hardware startups being “capital intensive”, but this is only part of the picture — plenty of software companies also invest hundreds of millions of dollars into engineering, sales, and marketing before they become profitable. The difference is that most software companies can adjust the amount of capital they invest in product development and growth as conditions change. If investor money becomes unavailable, a software company can usually lay people off and coast on revenues until the economy improves — or never fundraise again.

This is simply not possible for most early-stage hardware companies. Hardware products typically require substantial scale to reach profitability, which means huge up-front investments are necessary before turning a profit becomes possible. This means needing to raise numerous rounds of external funding, with each being do-or-die. The immediate implications of this dynamic are bad enough, but there are second-order impacts as well — even if an investor likes a hardware company’s prospects based on its technology and economics, well-founded concern that a single missed fundraise at any point in the future would be enough to kill the company can easily be a significant-enough factor to torpedo the current fundraise as well — a sort of “Keynesian beauty contest” dynamic that sounds academic but is very real.

Even a hardware success story like Tesla suffered from this dynamic and would have folded if not for repeated bailouts by Elon Musk. And for every Tesla, there have been countless hardware companies with equally-promising products but without the deep-pocketed backers needed to see them through lean times.

A second crucial drawback of hardware startups, which I think is often underappreciated, is the glacial speed of iteration compared to software. A software company can launch and update its product(s) and go-to-market motion near-instantaneously, allowing for very fast iteration. This is a huge advantage to a startup, allowing for fast contact with the market and evolution of its product(s) in the direction of customer pull — essentially the essence of the “Lean Startup” philosophy. Many software companies have found product-market fit and grown large on the back of this approach.

But the value of the ability to iterate quickly is not only to the benefit of the startup itself; it’s also to the benefit of its founders personally. When it’s possible to experiment on a daily cadence, to see efforts succeed or fail, and then to try again the next morning, the rate at which it’s possible to become a more-skillful entrepreneur is very high.

Hardware companies present a completely different picture than this when it comes to the speed of iteration. Hardware products typically take months or even years to design, manufacture, and ship, so that much more has to be guessed up-front that is only proven right or wrong by the market a long time later. And with only one or several iterations possible before the company proves a success or failure, the founders have limited opportunities to learn from the experience of bringing a product into contact with the real world. This slower learning curve is especially bad for first-time founders, who need to spend years learning lessons they would have learned in weeks with a software product.

And one final disadvantage of hardware is that even if a hardware product proves a huge success, the slow cycle times of hardware mean that scaling up that success and enjoying its fruits takes far longer than it would for a typical software company.

Of course, there are some exceptions to these rules — hardware companies where mandatory requirements for capital and long cycle times are less of a factor. A common example are companies that use off-the-shelf hardware to deliver a product differentiated by software. This includes companies that use off-the-shelf drones accompanied by computer vision models to inspect bridges and pipelines, companies that use AI models to get robot arms to pack boxes or weld metal, and many such others. By not having to design and build their own hardware, these companies are subject to fewer — but still many — of the issues that dog hardware startups.

Notably, both of the factors that make hardware startups a risky move for first-time entrepreneurs — capital requirements and slow learning curve — are less of a problem for repeat entrepreneurs or industry veterans. People who have amassed capital and experience by starting successful software companies or playing significant roles in existing hardware companies are often the best-positioned to tackle a hardware problem with a new startup.

And one-in-a-billion technical experts whose niche knowledge presents a necessary edge in a “hard tech” category may also find that it makes sense for them to start a hardware company. Such cases are especially common in biotech, but can also be seen in fields like nuclear energy and aerospace. For someone who’s spent twenty years becoming the world’s leading expert in a space like gene therapy or nuclear fusion, starting a company in that space can make sense — though that company will still be subject to the risks of high capital requirements and slow cycle times.

But for first-time generalist entrepreneurs without experience and capital, it’s hard to beat the advantages of software startups. There’s a reason why Silicon Valley startup culture came into being with the rise of the software industry, and why the vast majority of founders who go from nobodies to big successes make it big initially with software companies. (Dalton and Michael from YC made a great video about this.)

The software industry, and the startup ecosystem that’s part of it, are going to undergo big changes in the coming decades under the influence of AI, and may soon present fewer and fewer opportunities for entrepreneurs to make a mark. But until that happens, my advice to aspiring entrepreneurs who are looking for the most promising problems to work on is to strongly prefer founding a software company over a hardware company.
December 11, 2023
The End of the Software Industry
This essay is part of a 3-part series:
The AI revolution now taking place is poised to change the software industry in fundamental ways, and likely, to shrink the number of people employed in it. Historically, our industry has consisted of smart people painstakingly telling computers how to do specific tasks in the arcane languages that computers understand. We call this process writing software. AI is upending that model.

Already, generative AI tools have substantially increased the productivity of programmers, allowing for more niche or lower value software to be written than previously would have been justifiable economically. But even bigger changes are afoot.

We’re moving to a future where AI can conduct arbitrary information-processing tasks based on natural-language instructions from any reasonably intelligent human who understands the problem they’re trying to solve. Manually instructing the computers exactly what to do will no longer be necessary. Essentially, the aims of the no-code movement are coming into focus, though in a different way than might have been anticipated a few years ago — general-purpose AI models are making hand-coded Lego-style no-code tooling obsolete for many use cases.

What this means is that there are likely to be a lot fewer programmers soon than there used to be. There may, however, be an increasing number of people doing work along the lines of what product managers and salespeople currently do — understanding and solving customer problems, but with the details of the solutions being realized by the computers themselves instead of by teams of programmers.

This parallels the trajectory experienced by many other revolutionary industries, like railroads — after drawing in millions of people in an initial boom, it’s not uncommon for an industry to settle into a relatively stable long-term economic position while relying on fewer and fewer workers to sustain its position.

If the software industry experiences the same dynamic due to more and more of its work being done by AI, this development will also have significant implications for the startup ecosystem. The ability of smart, often young, people with little or no money to start businesses that grow to huge revenues in a decade or less has only really ever been possible in the software industry, which in turn has only existed in its current form for about 50 years. That era is likely to be coming to an end in the near future.

This is because a software startup is an organization in which a small group of smart people create new value in the market by building and selling new software — that is, by manually telling computers how to solve a specific human problem. As building software manually becomes less and less necessary, software startups are going to either change in fundamental ways or go extinct entirely. This extinction event, if it happens, will also take the venture capital industry down with the startup ecosystem that feeds it.

Of course, how long it takes until AI technology advances to the point where manual software engineering is as antiquated as programming in machine code is today remains to be seen — it could be next year; it could be 50 years from now. But it’s almost certainly coming this century, and even until the full extinction event is complete, the nature of what software is and what it does is going to be constantly changing.

This leaves software entrepreneurs, myself included, in a precarious position now. There is more value to be created with software right now than at any point in recent memory, perhaps ever. Trillions of dollars worth of business processes can now be automated, creating huge efficiencies, and trillions more will likely be possible in the next few years. Building and selling software to meet those needs still requires teams of highly-skilled people manually writing code, so there’s a bonanza taking shape in the startup ecosystem — many entrepreneurs are racking up eye-watering revenues with AI products this year, and I expect that to grow substantially in the near future.

But many of those startups are likely to be made obsolete almost as quickly as they came into existence. If AI technology advances to the point that a business need can be solved with little to no work on top of general-purpose AI models, startups’ products painstakingly built with manual effort on top of earlier generations of AI technology will quickly lose their value, and those companies will quickly bleed revenue and wither if they haven’t accumulated other advantages in the meantime.

Of course, it’s entirely possible that some companies founded in the current AI wave will build more durable assets than quickly-obsolete technology, like owning valuable data or setting themselves up as hard-to-circumvent intermediaries between other companies. But many others are likely to earn huge revenues for a few years by building and selling extremely valuable software, then see those revenues quickly dwindle to zero as their products become trivial to replace due to further advancements in AI.

I believe it’s wise for today’s AI entrepreneurs to take this possibility into account. In particular, the possibility of huge revenues up front, followed by a rapid decline, introduces a new dynamic into the cost/benefit calculus of fundraising. In the past, fundraising was usually necessary to get a software product off the ground, because of the significant amount of up-front engineering work normally required to build something that could be sold for meaningful revenue. However, once that revenue was achieved, it was generally durable, because differentiated and valuable business software does not usually become obsolete overnight. Thus, VC funding has been an appropriate source of capital for software companies, which have required risky up-front investment but which have then generated large and durable revenue streams, and commensurate liquidity events, if successful — providing payoffs down the road for both investors and founders.

This dynamic may be flipped for many of today’s AI companies. Many previously infeasible or impossible to solve high-value needs in business software are now suddenly solvable in weeks to months with a good team, thanks to recent advancements in AI, and those solutions can quickly generate hundreds of thousands or even millions of dollars in revenue. But that revenue is likely to only last for a few years until the underlying product becomes trivial to replace, unless the company selling the product creates a long-term competitive advantage in ways that are more resilient to rapid advancements in AI.

Unfortunately, that dynamic presents significant risks for founders taking venture capital. A founding team able to generate tens or hundreds of millions of dollars in revenue over 5-10 years with a product that then vanishes in a puff of smoke will see very different outcomes depending on whether or not it takes venture capital. Without external investment, those revenues, net of costs, will be the founders’ to keep. With external investment, the founders will be able to take a relatively meager salary, but will then be expected to plow those revenues into future growth, which may not be available if the product direction is obsolete. In this way, raising even small amounts of venture capital can turn a multimillion-dollar outcome into a zero for the founders involved.

My conclusion is that for teams building in today’s AI landscape, it may be wise to forego fundraising entirely, at least at first, and instead to focus on generating as much revenue as possible, as early as possible. In many cases, it may be possible to get to millions of dollars in revenues with a small, self-funded team, then swing for the fences and attempt to create a huge and durable company from there. But if that proves not to be possible, at least the founders of such a company will have a sizable, if temporary, profit stream to hedge the risk of a dead end.

And for the software founders involved, being able to guarantee personal financial security may be more important now than ever, as our ability to generate economic value is likely to decline or even vanish in the decades to come as AI becomes better at building software than we are. Now is the time to gather our rosebuds while we may — and to reduce the risk of a zero outcome if possible.
December 11, 2023
The End of Knowledge Work
This essay is part of a 3-part series:
For almost all of human history, the vast majority of people had to rely on their brains and brawn to survive, through hunting and gathering and then agriculture. Animal power provided some assistance, but things started to change much more fundamentally once people learned to harness non-biological energy: wind, water, and then most impactfully steam.

With the advent of more plentiful energy and the development of more capable mechanical technology, the world changed. Jobs that used to be done with muscle power became easier to do with machines, and the roles of people in society changed as well.

Up until the 19th century, most workers’ jobs were simply to make food. But over the following century, agriculture became highly mechanized, and the fraction of people in industrialized countries that worked as full-time farmers plummeted:

At first, this agricultural revolution freed people to become other kinds of manual workers — especially in manufacturing and transportation, where they made the wide variety of physical implements needed for the new industrialized economy, and moved them to where they were needed.

But over time, increasingly capable technology has caused the non-agricultural blue-collar part of the economy to require less and less labor as well. A factory that took 1000 people to run might now run with 30 technicians keeping an eye on the machines doing most of the work; a train that used to take 10 people to operate might now run with one.

Instead, in the industrialized world, people have increasingly become economically useful for their brains rather than for their brawn. A majority of US workers are now in white-collar fields, where they are paid for their knowledge and intellectual skills rather than for their strength and dexterity.

This process has had a wide range of effects on society. Physical strength and skill used to be a necessity for survival for most people. Now, an unusually strong American is as likely to be a hobbyist weightlifter as a manual laborer. Muscles used to be a tool; increasingly, they’ve become an ornament.

Instead, in developed countries like America, most people’s brains have become their most economically valuable assets. We’re becoming an economy where our interactions with the physical world are carried out by increasingly capable tools that require less and less physical effort from us — but where the management of an increasingly complicated world rests more and more on highly capable human brains.

Even our identities, to a significant degree, are tied up with our intelligence. We are homo sapiens, the wise human — we ascribe our preeminence as a species to our ability to understand and analyze the world around us.

It is that aspect of our existence that is now starting to change in the same way that our physical interactions with the world have been changing for the last 200 years.

Much like with draft animals before the Industrial Revolution, we’ve had some external assistance for our minds before now. Technology for communicating knowledge, from books to the Internet, has become more and more advanced over the centuries, giving us a longer and longer lever for our intellectual efforts. And computational technology, from abaci to today’s computers, has become more and more capable of doing routine but voluminous informational work quickly for us.

This has already changed the composition of the white-collar labor force: human “calculators” have been replaced by spreadsheets, and draftsmen by CAD programs. But bigger changes are now afoot.

We’ve reached a point in the development of artificial intelligence where its capabilities are extending into previously unimaginable territory. Computers used to need precise instructions for every task they could carry out, and they could only accept those instructions in their own arcane languages. Over the past year, this has changed — computers can now accept instructions in human language, written or spoken, and carry out those instructions successfully for even quite complex tasks.

Those capabilities are still far from perfect, but even if the abilities of AI were to be frozen at today’s level, white-collar work being performed by the full-time equivalent of many millions of people can, and will, be automated in the years to come using the technology that’s already available.

And future advances are inevitable as well. To date, most improvements in AI capabilities have come from deploying increasingly large amounts of computational power to train AI models on increasingly large amounts of data. Roughly speaking, every meaningful increase in AI capabilities has resulted from multiplying the amount of data and computational power used to train AI models by a factor of 10.

It’s likely that this approach will continue to bear fruit for at least one more “10x” cycle, which will likely take one to several more years. After that, data availability is likely to become a bottleneck — at that point, the latest AI models will have been trained on most extant codified human knowledge (books, the Internet, etc.) Feeding another “10x” cycle by synthesizing fake but realistic data, or gathering much more data from the real world, appears possible, but will likely be much slower than the current approach of using data that’s already available.

But in theory, it should also be possible to massively improve the data efficiency of training AI systems. Human brains have about 100 billion neurons, and humans can acquire a respectable education by learning information equivalent to the contents of a few hundred to a few thousand books. But state-of-the-art AI models, still far less capable than human brains, require many GPUs to train, and each GPU contains roughly as many transistors as the human brain contains neurons. And those state-of-the art models are being fed the equivalent of millions or even billions of books, far more than any human could ever read.

Thus, it seems likely that advances are possible both in the architectures and training methods we use to make AI models, as well as in the silicon hardware we train them on.

So, if and when significant advances in training efficiency happen, we’re going to be able to train much more capable AI models using the data, and likely the hardware, that’s already available. And soon enough — whether it’s in one year or twenty — the capabilities of AI in all analytical tasks are going to exceed those of most or all humans.

The consequences of this revolution — even extrapolating the capabilities of the technology that’s already available today, let alone that of more advanced systems — are going to be enormous, and are going to happen much more quickly than the Industrial Revolution. Unlike physical machines, AI technology can be deployed instantly to the whole world, meaning each advancement can be adopted as quickly as people can figure out how to use it, rather than being limited by the speed at which physical machines can be built and deployed.

What this implies is that much of the white-collar work that currently consists of analytical tasks — essentially, manipulating information — is going to be automated in the blink of an eye. A million people might work in data entry today — next year, that number might be zero. In a few years, when the capabilities of the technology advance, chemists or programmers might be similarly affected, and the process will continue until humans are only doing a small fraction of the analytical work that we do today.

In reality, in most lines of work, the AI won’t fully automate an entire profession — rather, it’ll reduce the amount of human effort required in that line of work by 10x, or 100x, or 1000x, just like what physical automation achieved in factories.

As a consequence, we’re going to see the raw information-processing abilities of human brains become less and less economically valuable. Technical specialists like programmers and traders, who work with self-contained purely-informational tasks, are going to see some of the biggest changes as soon as AI’s abilities exceed theirs. But most white-collar jobs contain a significant component of information processing, and are going to see that quickly handed over to the machines.

Interestingly, it seems that interpersonal relationship-oriented work is likely to be much less affected. Jobs ranging from bartender to bond salesperson rely heavily on interpersonal relationship-building, and much of the value people create in those jobs is dependent on them being people and not machines. In addition, high-end jobs that have as much to do with brokering ownership of valuable assets, power, and influence — think politicians, investment bankers, and influencers — are going to continue to be done by the people who care about accumulating that ownership, power, and influence, although those people are going to be increasingly assisted in their work by powerful AI, just as they are currently assisted by human workers.

Physical work will also be initially unaffected by the current revolution in AI — rescuing someone from a burning building or making a bed is still only possible with human hands. But the current AI revolution is increasingly looking like it’s going to unlock advances in robotics that will have a significant impact on the automation of physical work as well.

Currently, most physical-work automation is done by highly-specialized machines — walk into a reasonably-modern factory and you’ll see a number of large and expensive machines custom-tailored for jobs like bending rods or forming cans. This kind of special-purpose automation is only feasible for tasks where huge scale can be achieved in a single place — easier for tasks like manufacturing and agriculture than for tasks like house-cleaning and food-service, the need for which is naturally dispersed throughout the physical world and cannot be performed in a single place.

That said, mechanical technology is already advanced enough to make it possible to automate many of these highly-dispersed physical tasks that are still being done mostly by humans. The missing factor standing in the way of automating those tasks has rather been the intelligence available to the machines — robots have just not been smart enough to be able to do things like reliably navigate a house and manipulate a wide variety of objects.

That is all likely to change soon as the capabilities of AI continue to advance, and while the consequences will take longer to be realized in physical-world automation than with knowledge work, as they involve the manufacture and deployment of physical robots, the process will likely be faster than people expect.

This is because AI is going to unlock the wide use of a new type of automation: automation done by general-purpose hardware, whether that hardware takes the shape of humanoid robots, robot arms mounted on quadrupedal platforms, or something else entirely. And when one-size-fits-all robotic platforms powered by AI become broadly capable, they will be much cheaper and faster to manufacture and deploy than a heterogenous variety of specialized machines. This is exactly the same process thanks to which cars are now so cheap and ubiquitous — when you’re building a billion of something, you get a lot of efficiencies of scale. And when you’re building a billion general-purpose robots, which can do a wide variety of work thanks to AI, you can automate a lot of that work very quickly.

In this way, much of the work being done by people today — physical as well as analytical — is going to be increasingly done by machines. This process has already started, and is likely to accelerate over the next decade or two.

Where that leaves us humans in the economy is another question. It’s possible that a parallel process to the last 200 years may take place. Since industrialization, the physical abilities of people have become steadily less in-demand as machines have taken our places in manipulating the physical world, but through that process, the demand for our brains has only increased, causing more and more people to make a living by knowledge work as fewer and fewer make a living by physical work.

In a similar way, we may see less and less economic demand for our brains as analytical machines over the next decade or two, but to see more and more demand for our ability to relate to each other. In that kind of world, most of us will have jobs talking to each other about our needs and looking for ways to solve them — everyone a salesperson or therapist. In a world like that, brainpower will still be important, but more as a way to relate to each other than as a tool to solve complex puzzles. Incidentally, one theory of human evolution states that we evolved big brains as a result of social selection pressures that favored the survival and reproduction of those of us who were better at relating to other people — after a few hundred years of using our brains in more-analytical ways than they evolved for, we may return to a world in which our brains become useful primarily for the interpersonal tasks that they were honed by evolution to do.

But it’s also possible that AI will do such a good job at facilitating and even replacing human interactions and relationships in economic contexts (sales, customer support, etc.) that the number of humans needed for that kind of work will be far less than the number of people who will need to earn a living. In one sense, that’s a scary world — most of us will no longer be able to sustain a decent standard of living through the work we can do. But if that’s the way the world develops, the only politically feasible outcome I foresee, whether in democracies or dictatorships, is a welfare state where the work done to fulfill human needs is mostly done autonomously, and people are able to enjoy a high standard of living without working. Instead, most people would be able to spend their time on things that might not have paid the bills before, whether taking care of their families, competing at sports, traveling, painting, or any number of other things that mostly exist outside of the market economy today.

Which of those two scenarios comes to pass, or whether another one entirely will transpire, only time will tell. And whether the whittling-down of the need for knowledge work by humans takes half a decade or half a century will remain to be seen as well. But our need to work with our minds is about to go through the same winnowing process as our need to work with our bodies has experienced for the past 200 years — and it remains to be seen what the consequences will be.
December 11, 2023
How to talk to ChatGPT through Siri

Recently, I wrote the post How to: Talk to GPT-3 Through Siri, describing how to significantly upgrade Siri using OpenAI’s recent davinci-003 model.

The iOS shortcut in that post is already a big upgrade to Siri, but davinci-003 isn’t as capable as OpenAI’s latest models, which are also what is powering ChatGPT behind the scenes.

Until now, those models weren’t available for API access, but today, OpenAI opened API access to their gpt-3.5-turbo model, and I’ve updated the shortcut so you can now talk to the equivalent of ChatGPT directly through Siri.

gpt-3.5-turbo is about 10x cheaper to use than davinci-003, and seems to return better answers for some questions, but worse answers for others. Your call on which to use! I’ve switched to gpt-3.5-turbo, personally.

You can download the shortcut here – to start using it, you’ll just need to input your OpenAI API key. Then, talk to ChatGPT by saying “Hey Siri, GPT Mode” (or whatever you rename the shortcut to), then your question.

If you want to have Siri consistently read the responses out loud, it’s also best to change Siri’s settings in Settings->Accessibility->Siri->Spoken Responses to “Prefer Spoken Responses”:

If you want more detailed instructions on how to get this working, please see my previous post.

PS – No pressure, but if you’ve found this shortcut useful, I’d appreciate it if you buy me a coffee!

March 1, 2023
How To: Talk to GPT-3 through Siri
Note: it’s now possible to talk to a newer OpenAI model (gpt-3.5-turbo) through Siri – if you want to use the newer, and much cheaper, version, see my updated post here. The new version seems to do better on some questions but worse on others.

Like many others, I’ve been incredibly impressed with OpenAI’s ChatGPT and how far language models have come since I was working on natural language processing research a few years ago.

But, also like many others, I’ve been regularly frustrated with Apple’s Siri and how it often fails to give useful answers to even the most basic of questions. This week, after a few too many unsatisfying Siri answers in a row, I started to wonder if it would be possible to solve the problem once and for all by querying ChatGPT directly through Siri.

It turned out that while the ChatGPT API isn’t officially available yet to pose questions to programmatically, the recent GPT-3 model is, and it’s also very powerful.

It also turned out that several people have written Siri shortcuts to interface with GPT-3, but I had a couple of issues with them when I tried them out:
- Siri wouldn’t always read the answers out loud
- The answers often started with a stray question mark, which was distracting when Siri did read the answer out loud, starting with “Question mark…”
I tried fixing the first issue by making a shortcut that included steps to explicitly read the answer out loud, but it turned out that a far simpler solution was just to change Siri’s settings in Settings->Accessibility->Siri->Spoken Responses to “Prefer Spoken Responses”:

Make sure you turn on “Prefer Spoken Responses” if you want to use this shortcut and have Siri read the answers out loud to you!

The stray leading question marks were an easier fix – I just modified an existing shortcut with a step that strips them out.

With those two fixes, the shortcut started working very seamlessly – I can now tell my phone “Hey Siri, GPT Mode”, then a question, and quickly get a response from GPT-3 read back to me by Siri.

You can download the Siri shortcut here to add it to your phone (and you can see the original shortcut that I modified it from here).

The shortcut itself is free to use; you’ll just need to create an OpenAI account here, create an OpenAI API key and paste it into the text field in the shortcut that says “Replace this with your OpenAI API key!” You can see more detailed instructions for how to do this in this article explaining how to use a similar shortcut. Right now, OpenAI is offering three months of API credits for free when you sign up for an account, and they don’t charge much once you run out of your free credits.

Once you’ve installed the shortcut onto your phone and pasted your OpenAI API key into it, it’s ready to use! You can use it under the name I gave it (“GPT Mode”) or rename it to your taste, to anything that doesn’t conflict with Siri’s existing commands.

To use the shortcut, say “Hey Siri”, then “GPT Mode” (or whatever you renamed the shortcut to), then say whatever you want to ask GPT-3. Siri will then, after a delay, display the answer and read it out loud.

Much like with ChatGPT, the answers aren’t always right, but they usually are, and I’ve found that the ability to ask my phone complex questions and get reasonable answers back is incredibly useful as long as I keep the imperfections of GPT-3 in mind.

Hopefully, the day is coming soon when Siri will have this kind of functionality built in natively! In the meantime, there’s this Siri shortcut.

PS – No pressure, but if you’ve found this shortcut useful, I’d appreciate it if you buy me a coffee!
February 3, 2023