Introduction
Honestly, ever since Whisper first came out and people realized you could do proper and really good voice transcription, there has been no shortage of audio apps on the web. They all claim to do the same thing: think out loud, speak whatever you want for as long as you want, and eventually get a clean result, whatever "clean" might mean.
None of them have particularly caught my eye, except for this app called Voicenotes.
So, a few months ago I said this.
I usually avoid reviewing or using AI apps until they've been out for six to ten months. Many either go defunct or get acquired, causing data loss and changes in UX. They also claim to be in beta and frequently update, making early reviews seem unfair. However, this app appeared polished at launch, and as you can see from the title, I have mildly positive reviews for it. With its lifetime offer running, it seemed worth reviewing now.
Onboarding
The first thing it gets right is that when you open the app, you can directly record a note. It displays all the basic features right there. You can record notes, create summaries, or modify specific notes or groups of notes. We'll discuss that later. Additionally, you can use the AI features, which include intelligent suggestions. However, some of these points are a bit vague and don't really show up in the initial walkthrough or usage of the app.
Press the record button and start speaking. When done, press stop. A new pop-up with a play button will appear, indicating your voice is being transcribed. The interface is clean and free of intrusive pop-ups or CTAs, something I'd highly recommend to other apps.
Using the app
Code-switched recordings
Here's something to know about me: I'm mostly bilingual and kind of trilingual. I primarily think in English and Hindi. For work, I think in English, but when reminiscing about my life or day, I often switch to Hindi. This is called code-switching.
There's something unique about Hindi and English compared to, say, French and English. There's a secondary language called Hinglish, a romanization of Hindi using the Latin alphabet. Hinglish is even available on the Facebook app. It's a blend of Hindi and English, commonly used for chatting online because English keyboards are more prevalent.
Example: "I drink water" in Hindi is मैं पानी पीता हूँ, which in Hinglish is written as "main paani peeta/peeti hoon".
With that context, you'll understand how I'm testing this app. You cannot have separate workspaces here. Since I can't separate my personal Hinglish thoughts into a different workspace, I tried 3 code-switched recordings. Each recording is transcribed differently, making it messier than I had hoped.
In one recording, everything is transcribed to Hindi, including my English speech and English words in Devanagari characters. In the second recording, there's a mix of Hindi and English characters, but it inconsistently uses English characters for English words, which is odd and not very useful. In the third recording, everything I said in Hindi was translated into English.
As a speech and language researcher working with LLMs and speech models, I know this isn't an issue with Voicenotes specifically but is inherent to Whisper models. However, there are ways around it. Users could be allowed to choose the languages they speak, selecting up to three for transcription. The main option they’ve provided is to always have transcriptions in a certain language, which is often messy and inaccurate, contrary to the claims in the FAQ.
Note Features
Overview
Clicking on a note gives you around 10 options. You can add to it as a thread, a recently released feature, or edit the transcript. Tagging notes, which we'll discuss later, is also possible. You can create a version with specific instructions or use default versions. Sharing the note publicly makes it appear in the sidebar for quick access. Additionally, you can regenerate the title and transcript since the audio is stored and can be replayed.
Add a threaded note
The app now lets you add to a thread, linking one note to another, but I'm unsure of its actual functionality beyond seeing them in a delve view, like "Oh, I was thinking about X and Y." I'm uncertain how that helps solve a problem. One specific thing to note is Voicenotes' Related Notes feature, which we'll discuss later. It seems to either malfunction when adding to a thread or inherit the parent’s Related Notes.
The "Add to thread" feature has a bug where any new note shows a popup saying "Adding to XYZ thread," even if it's not being added. You need to click the "X" button to close out of the popup.
The not-good tagging system
Remember how I mentioned you can apply tags to these notes? It's true—you can define your own tags, and they show up in the sidebar, as you can see in the screenshot here. However, tags are probably the weakest point of this app experience.
The founders claim tags are obsolete because all necessary information is in the transcription modality. I disagree. People still value organization, glanceable views, and not having to chat with an AI. An AI-run system struggles with tens of thousands of notes. With tags, you know you're accessing all the information you've created in the app.
My disappointment with tags in this application stems from two main issues: the lack of auto-tagging based on predefined tags and the inability to scope based on tags. You can't click on a tag to ask questions solely based on that tag or a collection of tags, nor can you perform boolean queries on tags for searching or chatting. This feature would integrate with the AI mindset while giving users control over filtering. It would also mitigate the absence of a workspace component, allowing tags for personal, work, or multi-use purposes. Nested tags would enhance organization, especially if added automatically, while still letting you decide the scope of tags for different situations.
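As a rough sketch of the kind of tag scoping described above, here is what a boolean tag filter could look like. Everything in it (the `Note` class, `filter_notes`, and the sample tags) is hypothetical and made up for illustration; it is not how Voicenotes stores or queries notes.

```python
from dataclasses import dataclass, field


@dataclass
class Note:
    # Hypothetical note record: just a title and a set of user-defined tags.
    title: str
    tags: set[str] = field(default_factory=set)


def filter_notes(notes, all_of=(), any_of=(), none_of=()):
    """Return notes matching a simple boolean tag query.

    all_of  -> every listed tag must be present (AND)
    any_of  -> at least one listed tag must be present (OR); ignored when empty
    none_of -> none of the listed tags may be present (NOT)
    """
    all_of, any_of, none_of = set(all_of), set(any_of), set(none_of)
    matches = []
    for note in notes:
        if not all_of <= note.tags:
            continue  # missing a required tag
        if any_of and not (any_of & note.tags):
            continue  # none of the optional tags present
        if none_of & note.tags:
            continue  # carries an excluded tag
        matches.append(note)
    return matches


notes = [
    Note("Standup recap", {"work", "meetings"}),
    Note("Paper reading: speech models", {"work", "research"}),
    Note("Weekend thoughts", {"personal", "hinglish"}),
]

# "work AND NOT meetings": the scoped set a search or chat would then run over.
scoped = filter_notes(notes, all_of={"work"}, none_of={"meetings"})
print([n.title for n in scoped])
```

The filtered list is what would be handed to search or to the chat feature, so the AI only sees notes inside the scope you picked.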
Create something from your note
In the video below, I'll demonstrate the create option's capabilities: generating summaries, extracting main points, a handy copy feature, creating tweets, and following custom instructions.
The custom instructions option isn't highlighted in the note’s create menu. You need to go to the toolbar at the bottom to find it, which isn't obvious. I only discovered you could add custom options by toggling all the buttons during this review.
Using the bottom toolbar to create something is a cool feature that lets you select multiple notes at once, perfect for creating a to-do list, for example. However, you'll notice the absence of filtering and sorting options due to the lack of a tag ecosystem, which I sorely missed during testing.
Search
The search functionality is not semantic, which is disappointing, especially since you already have representations stored for related notes. I expected a hybrid search mechanism to bring up related notes. For example, when I searched for "headache," a note mentioning "migraine" didn't appear in the results. I was expecting some kind of semantic search or an option to choose between keyword versus semantic search. The default expected option is for people to use "Ask My AI," but it doesn't feel like a UX I want to use. I'm still accustomed to using search boxes and entering short queries rather than formulating questions for a chatbot.
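To make the "headache" versus "migraine" example concrete, here is a minimal sketch of a hybrid search that blends a keyword score with a vector similarity score. The three-dimensional vectors are hand-made stand-ins for real embeddings, and the function names and the `alpha` weight are assumptions for illustration; nothing here reflects how Voicenotes search is built.

```python
import math

# Hypothetical toy "embeddings"; a real system would get these from an embedding model.
NOTE_VECTORS = {
    "Woke up with a migraine again": [0.9, 0.1, 0.0],
    "Grocery list for the week": [0.0, 0.2, 0.9],
    "Headache after the long meeting": [0.8, 0.3, 0.1],
}
QUERY_VECTORS = {"headache": [0.85, 0.2, 0.05]}


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def keyword_score(query, text):
    """Fraction of query words that literally appear in the note."""
    words = query.lower().split()
    text_words = set(text.lower().split())
    return sum(w in text_words for w in words) / len(words)


def hybrid_search(query, alpha=0.5):
    """Blend keyword and vector scores.

    alpha=1.0 is keyword-only, alpha=0.0 is semantic-only.
    Returns (score, note) pairs, best match first.
    """
    qvec = QUERY_VECTORS[query]
    scored = []
    for text, vec in NOTE_VECTORS.items():
        score = alpha * keyword_score(query, text) + (1 - alpha) * cosine(qvec, vec)
        scored.append((round(score, 3), text))
    return sorted(scored, reverse=True)


for score, text in hybrid_search("headache"):
    print(score, text)
```

With `alpha=1.0` the migraine note scores zero, which is the behavior I ran into; with a blended score it lands right behind the literal "headache" match and well ahead of the unrelated note.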
AI Features
Related Notes
The related notes option is quick, which is great. One thing to test over time is how well it works with over 500 or 600 notes, especially since you can't use tags to refine the model's understanding. Some way to customize what counts as "related" would be useful; as it stands, I can't tell whether the gap is a missing customization option or a more general problem with the related notes architecture.
I feel the app relies too much on basic cosine similarity over entire content or asking a smaller model to pick relations, neglecting hybrid or more nuanced matches. Semantic similarity isn't effective for longer texts, which they should improve. Additionally, the related notes feature fails with bilingual notes, not matching or calculating relations correctly. They should consider using an English vector representation for all notes to enhance this feature. Hopefully, this will improve in the coming months.
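Here is a small sketch of why a single vector per note can hurt longer notes. It assumes, purely for illustration, that a whole-note vector behaves like the average of its chunk vectors, and it uses hand-made two-dimensional vectors instead of real embeddings. The function names are hypothetical, and I don't know what Voicenotes actually does internally.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def mean_vector(chunks):
    """Collapse chunk vectors into one whole-note vector by averaging."""
    dims = len(chunks[0])
    return [sum(c[i] for c in chunks) / len(chunks) for i in range(dims)]


def whole_note_similarity(chunks_a, chunks_b):
    """Compare two notes using one averaged vector per note."""
    return cosine(mean_vector(chunks_a), mean_vector(chunks_b))


def best_chunk_similarity(chunks_a, chunks_b):
    """Compare two notes by their single most similar pair of chunks."""
    return max(cosine(a, b) for a in chunks_a for b in chunks_b)


# Hypothetical chunk vectors: axis 0 = "migraines", axis 1 = "project planning".
# The long note mentions migraines once and then talks about planning.
long_note = [[1.0, 0.0], [0.0, 1.0], [0.0, 1.0], [0.0, 1.0]]
short_note = [[1.0, 0.0]]

print(round(whole_note_similarity(long_note, short_note), 2))
print(round(best_chunk_similarity(long_note, short_note), 2))
```

The whole-note score comes out far lower than the best-chunk score, because the one relevant passage gets diluted by everything else in the long note. Chunk-level matching is one of several ways to get more nuanced relations.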
Ask Me Something
The "Ask me something" call to action at the top of the navigation toolbar is meant for journaling, where the app asks you a question to answer as a recorded note. However, this isn't clear within the app. It would be better to replace it with a pop-up prompt like "Suggest something for me to talk about." The current phrasing is too similar to asking the app something and may not be perceived as the app asking the user a question.
Ask my AI
The Ask AI feature is the main selling point, similar to Google Notes but with AI capabilities. Unlike Notion's AI, you can dictate notes using audio instead of typing. It works well for English notes but struggles with bilingual or code-switched notes, posing a significant issue for tasks involving two languages simultaneously.
When the "Ask My AI" feature works, it's cool because it shows the sources it used. However, the sources are often scattered, highlighting weaknesses in the related notes feature and the overall retrieval and re-ranking processes. These basic issues need fixing to make the app more viable, contributing to my negative experience.
If you ask a question through voice, you have 20 seconds, and the answer is spoken back to you, which is a cool interaction I'd like to see in more apps. If you type something, you read it in text; if you speak something, you might just want to hear it back, and I really like that. However, the answers are pretty generic and cover many irrelevant sources.
The Ask AI feature includes a chat history similar to Notion's Ask AI. While this history isn't searchable using AI or semantic search, which is a bit of a miss, it does allow you to revisit past conversations. One notable omission is the inability to save the model's response as a note in a database, especially since notes are meant to be voice notes. It would be beneficial to add context to a thread or use the new "add to a thread" feature to include text notes, similar to how Create adds content to a note. I wish you could also attach AI responses to those notes.
Settings
The language setting option lets you choose a default language for transcription.
The setting option lets you modify AI responses. While I'm unsure if it affects suggested questions, the “what would you like your AI to know about you” option influences persona-based answers, like “Suggest me something to do for the week.” The "names to remember" feature improves transcription accuracy by recognizing specific terminology or names, as shown in the screenshot.
Commentary on FAQs and Release Notes
FAQs
The FAQs offer valuable insights into how the founders envision users engaging with the app.
The shorter, the better. Shorter notes tend to focus on one subject and are often easier and more enjoyable to rediscover in the future.
See here
They say "the shorter, the better," which confuses me, especially since one benefit of subscribing or opting for a believer plan is the ability to record notes longer than one minute. I would expect them to appreciate or have a way to process longer notes rather than condensing them into a single vector.
Are my notes really private?
- Yes. My wife and I use the app for our most personal notes, so we designed the platform to be private knowing we'll have to hire team members in the future.
- All notes are secured on the cloud, not used for AI training, and only retrieved upon authenticated user requests.
I'm not a fan of how they've written this privacy policy. It tries to be conversational, but privacy, especially with AI, needs to be addressed in specifics. They assure users that notes won't be used to train AI, but they should mention that this data isn't processed using a secure cloud-hosted Whisper. Instead, it's sent to OpenAI for transcription. They later discuss which models they use, so the data is transmitted to those models, which they haven't included in their privacy promises.
What AI models do you use?
- We use the best-in-class (and therefore most expensive) models such as GPT-4 and Claude Opus for most tasks. We sometimes use a faster model like Claude Haiku for features such as Related Notes which do not require high intelligence.
See here
This highlights why the related note features are weak. Claude Haiku isn't effective for matching across multiple long notes, especially in different languages. They'd need a Cartesian product of all notes to determine related metrics. Using Gemini Flash or a vector store with better embedding models and a hybrid search would be more beneficial. Or decomposing into important topics and adding auto tags would significantly improve the related note section.
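A quick back-of-the-envelope sketch of the scaling concern: if relations were found by asking a model to compare notes against each other, the number of comparisons grows roughly with the square of the note count, whereas precomputed vectors need one embedding per note. The function names are hypothetical, and this is only arithmetic, not a description of how Voicenotes works.

```python
def pairwise_comparisons(n_notes):
    """Unordered note pairs an all-against-all comparison has to look at."""
    return n_notes * (n_notes - 1) // 2


def embedding_calls(n_notes):
    """With precomputed vectors, each note only needs to be embedded once."""
    return n_notes


for n in (100, 600, 10_000):
    print(n, pairwise_comparisons(n), embedding_calls(n))
```

At 600 notes that is already 179,700 pairs versus 600 embeddings, which is why a vector store plus hybrid search looks like the more practical route as a library grows.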
How does Voicenotes compare with apps like X and Y?
- The honest answer is that we don’t know. I’m sure there are some great products for specific use-cases, but Aleesha and I set out to build Voicenotes because we couldn’t find anything that we would use ourselves. And our use-case was very simple: take quick notes and make sense of it when we need them with minimal effort.
- Our goal is to build the most delightful product for our daily use.
I don't like this answer. Apps should always be compared with others in the same genre to help users decide why they should choose yours. Highlight the unique features your app offers. Sindre does this well, using FAQs to address competitors directly. For example, they could say, "Our competitor is AudioPen, which doesn't offer semantic search and is just for transformation," or "Your Android speech recorder doesn't do XYZ or have a state-of-the-art Whisper model." While it's challenging to keep up with evolving software, you're already tracking competitors, so make this information visible to your users.
Release Notes
They've made some cool updates since the first release. The UI is better, and they finally added dark mode, so you don't blind your eyes at night. The ability to pause recordings is nice. However, I don't see the point of keyboard shortcuts on the web app; I'd prefer a menu bar app since you still have to click on the webpage to use keybindings. They added custom instructions in April, which is a nice touch. Related notes were released in May, and I hope this feature gets more refined as the app matures.
They released Android and iOS apps with an offline feature, but it's unreliable. When I used them, they lost multiple voice recordings and couldn't transcribe them, no matter how many times I tried. This feature feels very much in the alpha phase and isn't worth buying the app for.
Comparison to Other Apps
AudioNote apps are a dime a dozen, and your phone likely has built-in functionality for this. Major non-native recording apps like AudioPen or TalkNote focus on quickly capturing thoughts and transforming them into readable formats like text blogs, quotes, or tweets. Voicenotes stands out with its integration of personal knowledge management (PKM) features, such as related notes and Ask My AI, offering enhanced audio notes with an inbuilt retrieval system.
The main difference compared to general recording apps like Google Recorder is that Voicenotes lacks multi-speaker diarization options, so you can't have speaker labels. This is fine since it's designed for personal use rather than multi-speaker scenarios. It doesn't handle transcriptions well, especially for closed captions, so you won't get accurate phrases like "Oh, noise about this." Scrolling through a transcription can be messy, which ties back to the founders' preference for shorter pieces. However, I believe transcriptions should be as long as you need, as speaking out loud should capture all your thoughts rather than limiting them.
While Google Recorder struggles with accents and code-switching, Whisper handles them better. Google Recorder also has a default language limitation, whereas Voice Memos currently lacks transcription until the new iOS version in September 2024. Voice Memos' keyword search feature is a bit unreliable, but Google Recorder excels in this area. These are the main differences you'll notice compared to the usual recording apps on your phone. Additionally, the PKM-like retrieval in Voicenotes is a nice bonus.
Conclusion
Even after two months, I keep coming back to this app because I genuinely enjoy using it. The UX is fantastic. I'm tempted to buy the believer plan, but I'm unsure if I'd use it enough to justify the $50 cost. I love how easy it is to use and the helpful features throughout the platform, like sources in AskMyAI and specific instruction for transformations.
I feel it's not for me or many others. If your needs are simple and you do well with keyword-based searches, the Google Recorder app or Voice Memos will suffice. However, if you seek something more powerful and enhanced, it lacks enough features for me to recommend it widely.
For example, there's no auto-tagging, and the related notes feature isn't very useful. I tested it with specific paper readings and tried to find notes or audio notes. It's quite generic, so you might be better off using a recording app on your phone, exporting the transcriptions to Notion or Google Drive, and then using NotebookLM or Gemini with extensions to search or reason over your notes.
This app nails the UX but leaves me uncertain about its overall appeal. However, it’s a hit on Twitter, especially among those who write newsletters or tweet to build communities. For journaling or research ideation, it feels both underpowered and overpowered, yet still not enough for me. However, it hits all the right spots in taste and design, which is a bummer because I love the design. I'm not usually a design person, but the UX feels snappy and well thought out for this application.
I do app reviews for fun and would never put them behind a paywall. If this helped you in any way, please consider buying me a coffee here: