Show HN: Chat with 19 years of HN
app.camelai.com
Hey HN
We loaded a BigQuery dataset of all of Hacker News, every comment, story and user, into camelAI.
You can ask questions like:
• “When does dang tend to comment during the day?”
• “Which domains have gained the most submissions since 2015, year-over-year?”
• “How has average comment length changed each January since 2007?”
• “Top five users who link to arXiv papers the most.”
It's behind a log-in to prevent abuse but free to use for 10 messages. No payment info required. We use OpenAI o3 or Claude Sonnet 3.7 for the agent, which can be really expensive.
Would love feedback, especially around graph/chart quality and o3 vs Sonnet.
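For readers curious what a question like the first one above boils down to, here is a minimal Python sketch that buckets one user's comment timestamps by hour of day (UTC). The sample records are made up for illustration, the `by` and `time` field names follow the item format of the public HN API, and this is not necessarily how the camelAI agent computes its answer.

```python
from collections import Counter
from datetime import datetime, timezone

# Made-up sample comments; `by` is the author, `time` is Unix seconds (UTC).
SAMPLE_COMMENTS = [
    {"by": "dang", "time": 1700000000},
    {"by": "dang", "time": 1700003600},
    {"by": "dang", "time": 1700086400},
    {"by": "someone_else", "time": 1700000000},
]


def comments_by_hour(comments, author):
    """Count one author's comments per UTC hour of day (0-23)."""
    hours = Counter()
    for comment in comments:
        if comment["by"] == author:
            hour = datetime.fromtimestamp(comment["time"], tz=timezone.utc).hour
            hours[hour] += 1
    return hours


if __name__ == "__main__":
    # Print a tiny text histogram, one line per hour that has comments.
    for hour, count in sorted(comments_by_hour(SAMPLE_COMMENTS, "dang").items()):
        print(f"{hour:02d}:00 UTC  {'#' * count}")
```

Against the real dataset the same grouping would presumably be a single SQL `GROUP BY` on the extracted hour; the Python version just makes the logic explicit.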
“Favourite” Programming Languages on Hacker News - Key take-aways
Rust is the most talked-about language
• 2 327 stories – the highest volume
• 57 212 total points – the highest aggregate karma
• Go comes a very close second in volume (2 259 stories) and total score (45 511).
• Python and JavaScript still dominate discussion but are edged out by Rust & Go this year.
Smaller but passionate followings
• Lua & Erlang generate the highest average score per story, indicating highly-engaged niche audiences.
• Swift and Elixir also punch above their weight on a per-story basis.
• Classic staples (C++, Java, Ruby, PHP) remain active but draw less relative excitement.
Quick ranking by story count
• Rust – 2 327
• Go – 2 259
• Python – 2 029
• JavaScript – 1 927
Highest average karma per story
• Lua – 51.8
• Erlang – 36.5
• Swift – 29.3
• Elixir – 25.9
• Rust – 24.6
Interpretation: Rust and Go are currently the “favourite” languages on Hacker News by sheer attention and total karma, while Lua and Erlang have smaller but very enthusiastic communities
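The three metrics in that summary (story count, total points, average points per story) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the sample stories are made up, and matching language names against titles with a naive whole-word regex is my guess at the approach, not a confirmed description of what the tool ran.

```python
import re
from collections import defaultdict

# Made-up sample stories as (title, score); not real HN data.
SAMPLE_STORIES = [
    ("Show HN: A terminal emulator written in Rust", 120),
    ("Rust 2024 edition notes", 80),
    ("Why we moved our backend to Go", 95),
    ("Embedding Lua in a game engine", 150),
]

# Naive whole-word patterns; real titles need more care (e.g. "Go" the verb).
LANGUAGE_PATTERNS = {
    "Rust": re.compile(r"\bRust\b"),
    "Go": re.compile(r"\b(Go|Golang)\b"),
    "Lua": re.compile(r"\bLua\b"),
}


def language_stats(stories):
    """Return {language: (story_count, total_score, average_score)}."""
    counts = defaultdict(int)
    totals = defaultdict(int)
    for title, score in stories:
        for lang, pattern in LANGUAGE_PATTERNS.items():
            if pattern.search(title):
                counts[lang] += 1
                totals[lang] += score
    return {
        lang: (counts[lang], totals[lang], totals[lang] / counts[lang])
        for lang in counts
    }


if __name__ == "__main__":
    for lang, (count, total, avg) in language_stats(SAMPLE_STORIES).items():
        print(f"{lang}: {count} stories, {total} points, {avg:.1f} avg")
```

Note that a ranking by count and a ranking by average can disagree sharply: a language with a handful of high-scoring stories tops the average while sitting near the bottom by volume.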
Next time a Rust supporter tells you that Rust is not popular on HN, that Ada gets mentioned a lot, or that Zig gets similar attention to Rust, you may point them to this post.
> Rust and Go are currently the “favourite” languages on Hacker News by sheer attention and total karma
Of course, the statement must be consumed with a few NaCl because frequency of discussion (especially within an obsessive subgroup) does not represent effective implementation. Even less so do "attention and karma".
By actual work being done and bills paid and new, non-trivial projects begun, some ordering of Python, ECMAscript (JS), Java, C, C++, C# would be good Family Feud-style ranked bets.
One thing I noticed is that projects written in Rust always mention it in the title (there’s one on the front page right now), compared to other languages that don’t. That probably adds to the numbers.
Lowest average, yet it ranks so high, which means it gets helped by some secret algorithm ;)
if you browse HN daily, you start to notice patterns, there is a _real_ bias towards rust, even more obvious when you dig at the YC companies and what they seem to promote
I suspect something went wrong here with Typescript not being mentioned as a favourite. My own recollection is that when discussions of favourite programming languages come up, Typescript is often one of the top contenders, and it's extremely rare for people to prefer Javascript of all languages.
Perhaps this is folding Javascript in with Typescript.
People don’t talk about typescript. They’re busy getting shit done.
I say this as someone who likes Rust very much and gets paid for Typescript.
> It's behind a log-in to prevent abuse ... Would love feedback ...
Use a captcha instead of a log-in wall?
This is impressive! Some interesting (and seemingly accurate) insights on my own behaviours :-)
Caveat: I didn't try this on desktop. On mobile (DDG Browser) I couldn't actually see any charts on the questions I asked. Whilst the display of the tables (dataframes?) is nice, my suspicion is a general user would prefer a graph or table _by default_. I needed to prompt specifically to get the workflow to output a graph for me.
Thanks for the feedback! We've noticed o3 doesn't tend to make graphs when it should but sonnet makes too many graphs... We'll have to keep tweaking this. Mobile definitely needs some work but I'm glad it worked for you.
Haha, i was able to dox myself by asking "what is the real name of user mnky9800n". TBF, i don't hide my real name from this username. but still, it just churned until it decided it was me.
These privacy policies and terms of service for all these AI sites give me such a gross feeling. It is opportunism at its max, likely due to our business ecosystem, but regardless. I don't want to engage in any serious manner. I don't think it's good for society at all.
Or just nonsensical:
“1.5. Prohibited Uses:
Without limiting Section 1.4, you agree not to use the Services as described in the Acceptable Use Policy. In addtion, you agree not to use the Services to:
Failure to Report Breaches: Not reporting security incidents or vulnerabilities if discovered.”
Maybe just collect the answers to all those interesting questions and publish them as a blog post?
Good idea. Will do that
"What do you think about user XYZ?" or "What do you think about the comments of user XYZ?"
It starts a whole lot of SQL queries that find and aggregate data & statistics.
It must have a very interesting and well-written system prompt for this type of question.
(gives me second thoughts about my personal approach to privacy)
> "What do you think about the comments of user XYZ"
Wow that is really scary. Never did I ever think someone would actually go through all my old comments, analyze them in detail and then judge me based on them (my real account, not this throwaway).
Yes I knew it would be theoretically possible, but you'd have to be a total stalker and real creep to actually do it. Now anyone with an LLM can just do it without a second thought.
And it'll only get worse from here on. I'm sure there is at least 1 comment somewhere on the internet by me where I wasn't too nice, or a like / upvote on a questionable opinion or something.
If it's in any way connectable to me future AI tech is going to find it. Probably even across accounts, matching writing styles and whatnot.
I seriously think I'm going to stop posting on the internet for good.
Wouldn’t surprise me if some throwaways could be linked to real accounts, and if real accounts could be linked to other real accounts, both ones on HN and elsewhere on the internet, from Reddit to Usenet.
I suspect doxing with AI would be quite easy too: judge the way accounts talk, in the same way things like gait recognition can work, link the accounts, narrow down the person, build a profile. Suddenly it becomes: user abc123 is linked to (list of 30 accounts from Discord to FlyerTalk), and based on these posts about flying on US Airways a lot in 2015, these posts about Las Vegas, these about a morning flight, and this picture from a linked Twitter account, the person worked in this industry, lived in this location from this time to that time, and is likely this person on LinkedIn.
Anonymity is dead. Historically as well as in the future. But HN still thinks government is the problem and the GDPR is bad because it disincentivises holding onto data.
I don’t see where the data set is. I logged in and it asked me to connect stores. I skipped that and then I only saw three data sets. None of them are about Hacker News.
Go to app.camelai.com/hn/ or click the title link!
It would be great if we could use our own OpenAI/Claude accounts and pay a smaller subscription... this may be cool, but it's too expensive, I'd just like to play around...
It doesn’t even surprise me anymore that someone is charging a subscription fee to use an off the shelf LLM with scraped data from a public site. The gold rush can’t be over soon enough.
The problem is that you can't host a free LLM based service the way you can host a website, without being exposed to cost spikes the moment it becomes popular (or misused). Lots of smaller apps need a better cost pass-through mechanism; this is even more of a problem for hobbyists/non-profit projects than for commercial ones. We can't keep going with a free trial (costs eaten by developer) + subscription for every little thing.
A better solution is to allow the user to provide their own API key if they want to use it without limits (and really the needed solution is authentication and authorization that provides access to the appropriate API accounts without manually passing around a key). Subscriptions are a tool to generate revenue, not to purely pass on costs.
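A rough Python sketch of that bring-your-own-key idea, to make the cost pass-through concrete. Everything here is hypothetical (the `User` fields, `pick_api_key`, and the shared-key fallback are illustrative names, not any provider's or camelAI's API); the only detail taken from the thread is the 10-message free tier.

```python
from dataclasses import dataclass
from typing import Optional

FREE_MESSAGE_LIMIT = 10  # the free tier size mentioned in the post


@dataclass
class User:
    name: str
    messages_used: int = 0
    own_api_key: Optional[str] = None  # hypothetical user-supplied provider key


def pick_api_key(user: User, shared_key: str) -> Optional[str]:
    """Return the key this message should be billed to, or None if out of quota."""
    if user.own_api_key:
        # The user pays their provider directly, so no limit is enforced here.
        return user.own_api_key
    if user.messages_used < FREE_MESSAGE_LIMIT:
        # Free tier: the developer's shared key absorbs the cost.
        user.messages_used += 1
        return shared_key
    return None
```

With this shape the developer's exposure is capped at the free quota per account, while heavy users carry their own usage costs. Storing user keys safely is a separate problem this sketch does not address.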
Not scraped. HN itself publishes a live dataset to BigQuery. The product is meant to connect to your own database but I thought this was fun to connect to HN.
Yeah, it would be great if OpenAI implemented some sort of "Login with ChatGPT" for frictionless API billing.
You should make a page with some browsable pre-generated pages and post again to spark interest
Nice.
Ask it about the estimated capabilities of the NSA according to all posts/comments.
Very enjoyable discussion history graph.
Should anyone from camelAI be present, a quick note: the page at https://camelai.com/data-sources currently renders blank—none of the data sources appear.
EDIT: oh, I guess that'd be you, vercantez :)
Thanks for catching that! Should be fixed once the cache invalidates
I hit my free limit. It was fun while it lasted.
I've always wondered what the copyright status is for comments on Hacker News.
They say this:
Commercial Use: Unless otherwise expressly authorized herein or in the Site, you agree not to display, distribute, license, perform, publish, reproduce, duplicate, copy, create derivative works from, modify, sell, resell, exploit, transfer or upload for any commercial purposes, any portion of the Site, use of the Site, or access to the Site. The buying, exchanging, selling and/or promotion (commercial or otherwise) of upvotes, comments, submissions, accounts (or any aspect of your account or any other account), karma, and/or content is strictly prohibited, constitutes a material breach of these Terms of Use, and could result in legal liability.
My comments are under CC-BY-SA for humans, but any incorporation of my comments into an AI model entitles me to 5% of your company's common stock.
very impressive
Uh, how do I opt out from AI impersonating myself? Is the goal of these products that I stop contributing to the Internet?
Also I find quite distasteful that you get free data without explicit approval and try to sell it back to the same audience.
Hacker News publishes this dataset freely to the bigquery data marketplace
https://console.cloud.google.com/marketplace/product/y-combi...
This product was built to connect to your own database but I thought it was fun to connect to the HN dataset
This is not relevant to your point but I want to say that's an entirely third party project and we didn't even know about it for a long time. We don't publish data to them except in the sense that we publish it to everybody: https://github.com/HackerNews/API.
I think their page gives a misleading impression that the project is somehow official, when it's not (https://news.ycombinator.com/item?id=43850991).
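For anyone who wants to pull data from the official API directly, a minimal Python sketch follows. It assumes the Firebase-hosted `v0` item endpoint described in the HackerNews/API README; the helper names `item_url` and `fetch_item` are mine, and error handling and rate limiting are left out.

```python
import json
from typing import Optional
from urllib.request import urlopen

# Base URL of the official HN API, per the HackerNews/API README.
BASE_URL = "https://hacker-news.firebaseio.com/v0"


def item_url(item_id: int) -> str:
    """Build the URL for a single item (story, comment, job, poll)."""
    return f"{BASE_URL}/item/{item_id}.json"


def fetch_item(item_id: int, timeout: float = 10.0) -> Optional[dict]:
    """Fetch one item as a dict; the API returns null (None) for unknown ids."""
    with urlopen(item_url(item_id), timeout=timeout) as response:
        return json.load(response)


# Example (needs network access):
#   item = fetch_item(43850991)
#   print(item.get("by"), item.get("type"))
```

Items are fetched one at a time, so building a full copy of the site this way means walking the id range.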
Thanks for the clarification dang. I was misled by the listing which lists the author/publisher as "Y Combinator". Thanks for offering the official API.
Data is unable to regurgitate a comment in my style and pass it for something I have written. If a person were to do that, that'd be quite rude, but if it's AI it's perfectly fine? I do not think so.
In public you have no control over how someone uses a picture of your likeness
That's not true in the case of impersonating someone based on public recording of them. You'll quickly run afoul of Right of Publicity laws. It's one thing to simply record people in public where they don't have an expectation of privacy. It's quite another to impersonate them.
Never do anything publicly that you don’t want to be public.
And don't mistreat people's public data and expect them to like or support you.
Unfortunately “mistreat” is highly subjective. No matter what you do someone will be angry at you. I once got yelled at for taking a picture on a public beach where there was some family picnicking maybe 50 meters away. I think I was reasonable, the gentleman most emphatically did not.
Control what you can control. If you object to being a small data point in someone else’s documentation of a public experience, don’t put yourself in that situation.
Don't disagree, just saying it goes both ways. In your case you didn't care (not judging, I probably wouldn't either) about the opinions of the randos at the beach. However, in business, reputation does matter.
I'm fine with public records of stuff I have posted. I'd not be fine if you were impersonating me, nor am I if it's a piece of software doing that.
Can we take the ethics of AI seriously? I feel it's about time.
Agreed, role playing as real people is unacceptable for both AI and other real people.
“Tell me what Bernie Sanders might say about…” is fine, so long as the response is in the form “based on his past statements”. “Pretend to be Bernie Sanders and talk about” is not ok to prompt, nor ok for the model to respond to with an impersonation.
>Can we take the ethics of AI seriously?
If you're not suggesting a law to do so, then no. 35+ years of using the internet tells me that ethics is not included, nor at this point in the game should be expected.
You can ask ChatGPT its opinion about you as an HN user, and you will see they trained on the whole content of HN.
I had another experience; even though it roasted me pretty thoroughly, that was only until I turned on web search. Before that, it was just pulling random, plausible shit out of its ass.
https://chatgpt.com/share/682a275e-0cb8-8013-8365-b896bfa171...