Knowing about Knowing, for Fun and Profit
In an information environment that is somewhere between actively hostile and indifferent to providing you with knowledge that is true and useful, you have to take a defensive posture.
This is the first post in my new series/theme about knowledge, knowledge management, and related topics.
In my previous post, I introduced the new reading and writing theme that I'm taking on over the coming year-ish. I've already read a couple of books related to this theme (Knowing What We Know by Simon Winchester and We Have No Idea by Jorge Cham and Daniel Whiteson). This post is the first of around half a dozen that I have planned for the new series. In it, I want to provide motivation for engaging with the rest of the series; basically, what can be some of the practical benefits of reflecting on how we know things and what we can do with knowledge? Well, as I tried to make clear in the segue from the Industry series to this one, acquiring, disseminating, and applying knowledge is integral to our ability to make stuff. This is true not only at the individual level but at the societal level too: from patents to periodicals, there's a lot of social technology aimed at promoting knowledge transfer. In this post, I aim to explore some other practical considerations around Knowing: ways to learn things, the current information landscape that we have to deal with, and how to keep track of knowledge.
Within each section below, you'll find some brainstorming on these practical aspects of Knowing. This is the beginning of exploring this topic, not the last word. I hope it sparks some helpful ideas of your own, and illustrates why I felt like this was a worthwhile and timely theme to take on.
Acquiring Knowledge
This series is about taking a deliberate approach to how we think about knowledge. Something that is very empowering as an adult who is no longer in formal education (modulo workplace trainings and maybe an occasional conference) is taking a deliberate approach to learning. Therefore learning seems like a great place to start this post.
When setting out to learn something, you should consider what kinds of information are available about it, as well as which learning style (visual, auditory/verbal, hands-on) works best for you for that sort of thing.
Reading is often a good place to start, because there's more information available (and findable in a search) in written form than pretty much any other format. A lecture or demonstration might be better but those are few and far between compared to books and blogs and such. Not only books: browsing a magazine or forum can be a great starting point for getting into new hobbies. While reading about a subject, if a source gets mentioned in multiple things you read, try to read it too. You also shouldn't rely on just one book/source; consulting a few different ones can give you a sense of the points of agreement and active debate within a field.
Here are a couple of Tweets that I think are insightful about "doing the reading":
I've found throughout my entire life that two or three hours of intense research in a domain isn't enough to displace an expert – I couldn't step into his job tomorrow – but it's more than enough to get the correct answer about a narrow, specific problem you're interested in
if you're deliberate about "doing the reading", whatever that means for any domain, within a year it's possible to be in the top ~5% in the scene. This is because 95% of people don’t do the reading
One thing that has become very useful for learning things (especially skills with a physical component) in recent years is YouTube. Samo Burja (I'm working on a post about his ideas as part of this series, by the way) has a good article about this:
Before video became available at scale, tacit knowledge had to be transmitted in person, so that the learner could closely observe the knowledge in action and learn in real time — skilled metalworking, for example, is impossible to teach from a textbook. Because of this intensely local nature, it presents a uniquely strong succession problem: if a master woodworker fails to transmit his tacit knowledge to the few apprentices in his shop, the knowledge is lost forever, even if he’s written books about it. Further, tacit knowledge serves as an obstacle to centralization, as its local transmission provides an advantage for decentralized players that can’t be replicated by a central authority. The center cannot appropriate what it cannot access: there will never be a state monopoly on plumbing or dentistry, for example.
Another approach to learning new skills is simply to pick an entry-level project and dive in. You'll make mistakes, for sure, but when you consult videos or books later on for advice, the instructions will make a lot more sense because you'll be able to relate them to personal experience. This is the approach I took when trying to learn how to use fibreglass, for example. Another way to provide conceptual anchors for learning things is by travelling, as Tyler Cowen has pointed out.
Getting into more formal and higher-effort forms of learning, you can try attending lectures and presentations about the topic of interest. Or seek out mentoring/apprenticeship relationships. Discussion groups (ideally in-person, but online can be an option too) are pretty great too. The famous Oxford Tutorial is a time-tested format for high-quality education and it incorporates these features:
- Weekly
- 1-4 students plus the tutor
- "Tutorials normally last about an hour, during which the tutor will give you feedback on prepared work on a particular topic [which was set in the previous tutorial]."
- In between tutorials, students do readings (assigned list, plus finding other relevant material in a library) and write an essay of ~5-10 pages
- For STEM subjects there are problem sets of equivalent length
- The tutorial involves presenting and defending your position and getting feedback and constructive criticism
- Aside from tutorials, students also have seminars, lectures, and labs
Finally, it's frequently remarked that one of the best ways to really learn something is to teach it to someone else. Closely related to this is writing about it. Paul Graham has an essay expressing concern that the rise of LLMs will lead to fewer people writing; he sees writing as being intricately connected to thinking deeply and clearly:
So a world divided into writes and write-nots is more dangerous than it sounds. It will be a world of thinks and think-nots. I know which half I want to be in, and I bet you do too.
This situation is not unprecedented. In preindustrial times most people's jobs made them strong. Now if you want to be strong, you work out. So there are still strong people, but only those who choose to be.
It will be the same with writing. There will still be smart people, but only those who choose to be.
Certainly one of my major reasons for blogging is that I see this benefit. Writing helps me learn about and think through various topics I'm interested in.
The Current Information Landscape
There are two features of the current information landscape that I want to draw your attention to. The first is the sheer volume of information we have access to: an unprecedented firehose at our fingertips of facts, claims, trivia, news, debate, and entertainment; text, pictures, audio, video. The second is that almost all of this content comes from sources and/or across channels that are out to get you—they want your attention, your engagement, to get you to buy something or at least watch an ad, to recruit you to their ideology, to turn you into a vector for their ideas. This is not unprecedented; there's a long history of newspapers having political alignments or even affiliations, and of authors writing books and articles to persuade, not merely inform—I wouldn't even call this sort of thing illegitimate. However, because of the first feature, this is all happening at a very large scale, and with very efficient tools, often without deliberate human action (e.g. automated A/B testing and algorithmic fine-tuning of feeds that often doesn't care about the actual content, just about optimizing certain engagement metrics).
(When you read my blog, am I out to get you? Well, as described in the other sections of this post, writing a blog is part of my process for learning and for knowledge management, so I'm doing it in part out of a personal motivation and having readers is a bonus. I don't monetize this blog in any way: not ads, nor subscribers, nor affiliate links. But I do hope my writing is persuasive about things I believe and perspectives I hold, along with being informative, of course. So you shouldn't accept everything you read here uncritically, but you should be able to expect it to be free of clickbait and AI slop.)
In an information environment that is somewhere between actively hostile and indifferent to providing you with knowledge that is true and useful, you have to take a defensive posture. But I believe you can still extract signal from the noise. In this section I'll share some ideas on how to do so (both my own and links to advice I've come across), along with discussion and examples. I believe this is one of the crucial issues of our day.
First, the firehose. This video about what it would take to print out Wikipedia is a good illustration:
Some pages on Wikipedia that are good examples of the vastness of information that is readily accessible to us now are the Outline of Knowledge and List of Lists of Lists. And of course Wikipedia is just one website; there are plenty of other sources of information both online and offline. One that has exploded in popularity over the past few years is generative AI, especially large language models (LLMs). People are increasingly getting their information from LLMs. This is very much in my mind as I write this series. LLMs aren't a bad source of information, but they represent a big change and need to be used with an awareness of their strengths and weaknesses.
It's still early days for this new technology and I don't think anyone really knows the full impact it will have on our relationship with information and on society more broadly. Revolutions in communication technologies can have a huge impact, so generative AI could plausibly be as much of a disruption to the status quo as the printing press or television. Here are a few things I think we should watch out for:
- AI hallucinations/confabulations. A good example of this was a newspaper that published a list of books to read this summer, except only 5 out of 15 were real.
- Deepfakes: alongside honest mistakes, AI-related deceptions also include deliberately false output.
- Whoever can alter the prompt can alter the output (example).
- Feedback loops where LLM-output starts getting scraped and used in future sets of training data.
- "Segmentation" and the "semantic apocalypse" are side effects worth pondering.
- On the positive side, they make searching for information much more powerful. Personally, I've recently been able to find old half-remembered cartoons and quotes from books that I was able to describe in a fuzzy way but not with the precision that conventional search engines required. You do need to know that something exists, but LLMs enable retrieving it with much more tenuous keys.
Aside from the new element of LLMs, another notable feature of the current information landscape is that it exists in the context of a very polarized political landscape. So what are some good strategies for gleaning true and useful information from media that is trying hard to score points for their side?
I think this article about Bounded Distrust and a series of responses and follow-ups (and responses to follow-ups) to it are a good place to start. The core idea—with debate on how to operationalize it and on where the lines now are—is that while most of the media is trying to push a narrative, they still play by rules. If you have a good sense of what those rules are, you can still learn something from biased news stories.
The point is, there are rules to the "being a biased media source" game. There are lines you can cross, and all that will happen is a bunch of people who complain about you all the time anyway will complain about you more. And there are other lines you don't cross, or else you'll be the center of a giant scandal and maybe get shut down. I don't want to claim those lines are objectively reasonable. But we all know where they are.
As the third article linked in the previous paragraph clarifies,
The point is: the media rarely lies explicitly and directly. Reporters rarely say specific things they know to be false. When the media misinforms people, it does so by misinterpreting things, excluding context, or signal-boosting some events while ignoring others
The general approach is that things can be slanted or framed in a misleading way, but statements that can be checked won't be bald-faced lies. For example, a common phrasing like "an anonymous source claimed {X}" means that someone said {X}, but carries no commitment as to whether {X} is true. An attributed quotation is better, in that there's someone who can publicly confirm or deny whether they said it, but it still can't be taken to the bank as to whether the contents of the quotation are true; journalists have a lot of degrees of freedom in composing their stories, and whose quotes they share is an important one. Basic objective details, however (who, what, where, when), have much less wiggle-room. There are famously two (or more!) sides to each story. The distrust part of "bounded distrust" means that you don't have to accept the side that a news article subtly (or not so subtly) promotes in how it presents things; the bounded part of "bounded distrust" means that the event in question probably did happen and did involve any named individuals.
Here are some other heuristics that I try to use:
- An admission against interest (where a source acknowledges facts that are inconvenient for their side) is more likely to be true. Sometimes when I'm not sure if a story is fake, I look to see if it has been reported in an outlet with the opposite partisan alignment to where I heard it first.
- Looking for multiple independent sources is useful regardless of alignment, as long as they're truly independent from each other, not merely multiple repeats of the same claim from Twitter or an AP wire reprint.
- The concept of "Skin in the game". I've noticed that the financial press (Bloomberg, Wall Street Journal, etc.) is often a notch above other segments of the media. This is because their readers are often making high-dollar-value decisions on the basis of the information they're taking in, and thus need accurate information more than they need their own biases flattered.
- Going to international sources provides a different bias, maybe orthogonal to political dividing lines in your own culture. For example, we like watching Palki Sharma sometimes.
- Be aware of Gell-Mann Amnesia.
Another useful tool for making sense of a firehose of information of varying levels of reliability/truthfulness is keeping track of predictions. Someone who is consistently wrong in their predictions, or badly wrong on important questions, either doesn't have good sources of information themselves, or has decided to be a poor source of information for you. Applying this in a practical way involves quite a lag period, but it's better than nothing. One example that's still a work in progress is pundits who've said that if Trump won the election it would be the end of American democracy. If the elections in 2026 and 2028 occur normally, you should take these people less seriously in the future; conversely, if those elections are cancelled or have a lot of irregularities, you should take them more seriously.
Here's another example (I had originally written this as a forum post, but I think it's a good example of checking how predictions panned out and judging the credibility of the source based on that): Back in 2020 I was staying in an Airbnb in Nova Scotia and there were some books there for guests to read. One of them was The Maritime Book of Climate Change, by Richard Zurawski. It was published in 2008, and made some predictions for 10 years, 25 years, and more distant times in the future. Since the first milestone has already passed (and we're two-thirds of the way to the second), I thought it would be handy to see how the predictions panned out. The author considered two scenarios: Good and Bad/Ugly.

In the Good scenario, he expected that CO2 would have risen to 430 ppm by 2018, and that the sea level would have risen by 12 to 18 cm. The rate of ice melting in Greenland would have risen to 1,000 cubic km annually (quintupled from its 2008 rate) and the northeastern seaboard of North America would be, paradoxically, cooling as the Gulf Stream grew sluggish. He also made qualitative predictions, like more frequent forest fires and pressure to accept climate refugees. 25 years out (i.e. 2033), but still in the Good scenario, he predicted the sea level will have risen 30 - 60 cm and, "Charlottetown and other cities with softer, low-incline shores, have now introduced a policy of resettlement." Also, atmospheric CO2 is now at 750 ppm under this scenario.

In the Bad/Ugly scenario, the author thought 2018 CO2 levels would have reached 450 ppm and the ocean would have risen 30 cm from 2008 levels. Outside of eastern Canada, the Great Lakes region would be baking and the Great Lakes would be at their lowest levels in recorded history. In 2033 in the Bad/Ugly scenario, there are 10 billion people in the world and conflict over diminishing resources is intensifying. Russia has the rest of Europe at its mercy due to their dependence on it for food and energy. The global average temperature has risen 5 degrees Celsius and geoengineering is starting to look pretty good.

So what actually happened in 2018? Atmospheric CO2 levels were at 408 ppm and the sea level in the Maritimes was around 10 cm higher (based on the Bedford NS gauge). The latest numbers (with 2033 less than a decade away) are 421 ppm (as of 2023) and around a 15 cm sea level increase (as of 2024). I haven't checked how typical Mr. Zurawski's predictions from 2008 were. He did state that he believed the IPCC report was watered down due to political pressure, though I don't think he would have been alone in that view in the discourse of the time. But doing this exercise makes people with less alarmist views look like they're more worth listening to going forward.
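The scorekeeping exercise above is simple enough to sketch in code. Here's a minimal illustration (my own, not from the book) using the figures discussed above; the scenario and observed values are the ones quoted there, and the script just measures how far each scenario's 2018 predictions landed from reality:

```python
# Compare the book's 2018 scenario predictions against rough observed values.
# All figures are taken from the discussion above.

predictions_2018 = {
    "Good":     {"co2_ppm": 430, "sea_level_rise_cm": (12, 18)},
    "Bad/Ugly": {"co2_ppm": 450, "sea_level_rise_cm": (30, 30)},
}
observed_2018 = {"co2_ppm": 408, "sea_level_rise_cm": 10}

for scenario, pred in predictions_2018.items():
    co2_err = pred["co2_ppm"] - observed_2018["co2_ppm"]
    lo, hi = pred["sea_level_rise_cm"]
    sea_ok = lo <= observed_2018["sea_level_rise_cm"] <= hi
    print(f"{scenario}: CO2 overshot by {co2_err} ppm; "
          f"sea-level range {'contained' if sea_ok else 'missed'} the observation")
```

Even this crude check shows both scenarios overshooting (by 22 and 42 ppm of CO2 respectively), which is the kind of track record you can carry forward when weighing a source's future claims.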
Having discussed AI and extracting useful information from biased media, I next want to cover misinformation and censorship. I'll start with a couple of examples from abroad.
The first one is "Baghdad Bob" proclaiming Iraqi victories, except the victories he announced occurred ever closer to the heart of Baghdad. In his case, you can see how censorship leads to worse decision-making (emphasis added):
US intelligence analysts later concluded that Al-Sahhaf confidently made false statements because he genuinely believed in what he was saying. As the American forces approached Baghdad, the Iraqi army falsely reported that they had successfully counterattacked US forces, destroying numerous tanks and killing hundreds of American troops. Army Col. Steve Boltz, the deputy chief of intelligence for V Corps, expressed that they held the belief that Al-Sahhaf sincerely held the information he reported to be true. Boltz theorized that because Saddam's regime was known for frequently punishing those who delivered bad news, military officers would fabricate reports about the battlefield situation. This systemic self-deception within the Iraqi hierarchy led to a surprising lack of awareness when the Americans entered the capital, with some captured Iraqi officers later bewilderingly admitting that they had no idea that the US forces had been so close.
In China, they apparently have internal reports called neican (pronounced NAY-tsahn) in an effort to keep the leadership better informed than ordinary citizens. However, a culture of censorship leads to even these internal party communication channels becoming filled with yes-men and no longer being as informative:
Chinese journalists and researchers file secret bulletins to top officials, ensuring they get the information needed to govern, even when it’s censored.
But this internal system is struggling to give frank assessments as Chinese leader Xi Jinping consolidates his power, making it risky for anyone to directly question the party line even in confidential reports ...
the risk is ill-informed decision-making with less feedback from below, on everything from China’s stance on Russia’s invasion of Ukraine to its approach to the coronavirus.
“Powerful leaders become hostages,” said Dali Yang, an expert on Chinese politics at the University of Chicago. “They actually are living in cocoons: protected, but also shielded from information that they should be open to.”
To continue on the discussion of misinformation and censorship, I'll share something I wrote on Facebook (lightly edited here):
Facebook has announced it's rolling back some censorship it was doing. This blockage of outbound links (to be clear, I don't blame FB for this; they are responding logically to a dumb government policy) isn't exactly censorship (it's content-neutral and only affects a certain type of post, i.e. those with outbound links to anything that counts as "news") but it has a similar effect. It also has similar work-arounds: obliquely referencing instead of directly citing, sharing articles as screenshots, etc. We're at a critical juncture as a country. On issues such as our relationship with the United States, but also housing, immigration, defence, and more, what happens in 2025 could shape things for a long time. So this would be a great time for robust conversations at all levels of society about what we want the future to look like. Instead, haha nope, you can't share news articles and related links on one of the most widely-used communications platforms. Now, no one owes you a platform or an audience (and I have a blog that I pay for the hosting of in part because I think it's valuable to be the customer and not the product on at least some of one's communication channels) but I think Canadians overall are poorly served when the public square or its digital progeny are overly constrained in what you can say. Especially in times of crisis or upheaval.
At the start of the year, I referred to the "Scylla and Charybdis of misinformation and censorship". This allusion was deliberately chosen. In the Odyssey, these are two hazards of navigation in a strait that is too narrow to avoid them both. Scylla is a sea monster that eats sailors and Charybdis is a whirlpool that can swallow whole ships. Odysseus chooses to sail closer to Scylla; he loses six of his men, but the rest of them make it through. Similarly, misinformation can eat people alive (I mean this almost literally). People have had their power disconnected because they believed "decrees" from "Queen" Romana. Others have gotten mental illnesses via social contagion or iatrogenic vectors. Still others have sterilized themselves thinking it will fight climate change or fascism or whatever. Nonetheless, while we can and do lose people to misinformation, censorship will sink the whole ship. No system can stay on track for long without feedback. When information channels are too constrained they cease to provide useful feedback. Democracy and free speech are messy but they have a demonstrated ability to out-survive authoritarianisms, and I believe this is a big part of the reason why. When the latter face new challenges, they don't receive useful feedback about {citizen sentiment, conditions on the ground, outside-the-box ideas for solutions, etc.} and eventually the system gets so imbalanced it collapses.
I have a few more assorted thoughts to close out this section. These aren't a conclusion, as this series is just getting started and I expect to think and write more on this important theme.
- I think we'll have to get back to treating trustworthy eyewitnesses as the gold standard of evidence (over pictures or videos, which will be susceptible to fakery).
- Consider the aphorism that, "If you are not paying for it, you're not the customer; you're the product" in the communication channels you use.
- Epistemology is a contact sport now.
- Hard copies (or at least off-line copies) can't be retroactively altered.
Finally, there are times when you just have to accept that you'll have a fuzzy knowledge (at best!) of what's going on. When it's about things that you don't have to make any decisions about, this is fine. The nice thing is that your ability to cross-check/verify new information is strongest in issues or events that are closest to you. So focus on the integrity of information that affects decisions you need to make, and worry less about the rest. For instance, you'll never know what's going on in a war on another continent where you don't speak the language of any of the belligerents with the same clarity that you can—with a little effort—know about the outcome of a bylaw vote in your own city. And that's okay. The latter is something that will probably affect your life more, and is certainly more within your ability to influence. For things that are far away and that you can't affect the outcome of one whit nor even really validate the truth of what you're hearing, I think it's healthy to hold your opinions with low or moderate confidence.
Knowledge Management
After you've managed to acquire knowledge, sifting through the chaff in the current information landscape along the way, you next need a way to keep track of it so you can retrieve or apply it as needed.
I've started using personal knowledge management (PKM) software, OneNote at work and Obsidian at home. I haven't yet gotten to the point of having copious links between notes, which is supposed to be where these tools really shine. But even having your notes be so easily searchable is helpful. Of course, writing a blog has a lot of these features too. I have frequently gone back to old blog posts when I'm trying to remember a good quote from a book I've read or find some estimates from Vaclav Smil for back-of-the-envelope calculations.
Offline, there are also lots of ways to keep track of knowledge and ideas. Personally, I keep notes in several different ways; it would be more organized if I had a single unified system, but it takes a lot of discipline and mental overhead to set something like that up and stick with it, and the sort-search tradeoff discussed below suggests that striving for perfect organization can be overkill. My mix looks something like this:
- A notebook that I use to keep track of goals, plans, and notable events and occasions. This is mostly chronological, aside from a section at the front with longer-term goals and ideas. It's handy to flip back through when I'm trying to remember when something happened.
- Other notebooks that I've taken on trips, or to conferences I've attended, or for notes from papers I read for continuing education/professional development.
- Single sheets of paper with notes from lunch-and-learns or other things I wanted to take notes about (or just think about and jot down some ideas). These often get stored in a mail tray organizer, but sometimes get sorted into binders (binders allow for after-the-fact topical sorting whereas notebooks are almost always in chronological order).
- Single sheets of paper with short-term (up to a ~year) plans (e.g. an outline for a writing series like this one on my blog). I try to purge these annually once they've served their purpose.
- Notes written in the margins of books. Also highlights and margin notes in scientific papers I've printed off to read.
- A filing cabinet. This is used more to organize paper documents I receive (e.g. information about loyalty accounts, quotes for work on my house, etc.) than notes I write myself, but does have some room in it for the latter.
These paper-based methods of knowledge management have some different strengths and weaknesses compared to digital ones. I find that things like handwriting, and sticky tabs at a certain place in a book (that itself sits at a certain place on your bookshelf), stick in your memory better. Digital notes have the advantage that they can be searched more easily, and relevant segments can be copied with ease into something you're writing. They can also be filed (or tagged, or linked) in more than one place.
While I find highlights in ebooks and notes that are in a text file or PKM software are more streamlined with my writing workflow, there's definitely an aesthetic appeal to knowledge management from the physical organization of a home bookshelf/library. This video has a really nice example:
In our modern world that is going more and more digital, I find it enlightening to think about older ways of organizing knowledge via the physical arrangement of stuff. Libraries and filing cabinets are the most obvious examples, but a lot of analog tools/instruments embodied a lot of knowledge in their design. A smartphone is the opposite of this: it's an omni-device but all the various functions require using the right software/app. Analog measurement and calculating tools, such as scale balances, slide rules, or sundials, in contrast, do only one thing, but they physically encode knowledge about the quantities they deal with. And they build intuition for those quantities as a natural side-effect of learning to use them.
Digital organization of data started off following the same format as organization of records in the physical world: files inside of folders. However, over time these digital systems took more advantage of not having the constraints of a physical filing cabinet and have embraced features like tagging and linking. Now it's becoming possible to do fuzzy AI-enabled searches even on your own files; as long as you can remember that something exists, it's getting easier to find it even if it isn't well organized. I appreciated this description of the tradeoff between sorting and searching from Algorithms to Live By (and also appreciated their tip about applying this to sheets of paper by keeping a single stack in "last touched" order) in my mini-review of that book:
Sorting: it "involves steep economies of scale". The Mergesort algorithm is discussed. Something I found interesting and useful from this chapter was the Sort-search tradeoff: "err on the side of messiness. Sorting something you will never search is a complete waste"; An application of this is that searching beats sorting for email ("as the cost of searching drops, sorting becomes less valuable") and indeed in the past five years or so I've found it's more productive not to consume time sorting my email as I can almost always find what I'm looking for with the right search terms. Another interesting thing from this chapter was comparing sorting algorithms to tournament structure (e.g. comparison counting sort :: round-robin). They make the point that sorting is easier with measures than merely ranks which reminded me of what Pascal had to say on the value of legible hierarchies.
Caching: "in the practical use of our intellect, forgetting is as important a function as remembering" = key idea. A suggested algorithm was to drop the Least Recently Used (LRU) item from short-term memory/storage. A practical tip was the Noguchi Filing System: Keep files/stuff in a pile and always return most recently used to the top.
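The LRU idea from the caching chapter is simple enough to sketch in a few lines. Here's a minimal illustration (my own, not from the book) using Python's OrderedDict; it's essentially the Noguchi pile in code, where touching an item moves it to the top and the item that has gone longest untouched is the first to be dropped when space runs out:

```python
from collections import OrderedDict

class LRUCache:
    """Minimal least-recently-used cache: the Noguchi pile in code."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._items = OrderedDict()

    def get(self, key):
        if key not in self._items:
            return None
        self._items.move_to_end(key)   # touched -> moves to top of the pile
        return self._items[key]

    def put(self, key, value):
        if key in self._items:
            self._items.move_to_end(key)
        self._items[key] = value
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)  # evict least recently used

cache = LRUCache(2)
cache.put("a", 1)
cache.put("b", 2)
cache.get("a")      # "a" is now most recently used
cache.put("c", 3)   # capacity exceeded: "b" is evicted
```

The same last-touched discipline works for the stack of paper on a desk: always return what you just used to the top, and whatever sits at the bottom is the safest candidate to archive or toss.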
Finally, because of the firehose of information that I spent some time discussing in the previous section, I think a key aspect of knowledge management going forward is going to be curation. Discarding the chaff and highlighting the gems is going to matter as much as or more than just acquiring information.
That brings us to the end of this first post in my new series. Yeah, there's a lot here, and this post is all over the place. This is a broad topic and I'm trying to get my ideas on paper as part of thinking through them more deliberately. Hopefully as I continue to read some books and write some other posts I have planned on this topic, I'll get some further ideas and be able to refine some of the ones I've shared already.