News

Tech

  • Hacker News 21

  • Hacker News: Best Comments 10

    • New comment by 0x3f in "We have a 99% email reputation. Gmail disagrees"
      I'm a Font Awesome subscriber and yes, for the record, they spam me with annoying marketing and probably deserve their Gmail woes. They also use that silly dark pattern where they alternate sending out marketing emails from {David,Harry,Sam,Janet,every other person at the company}@fontawesome.com.
    • New comment by Youden in "We have a 99% email reputation. Gmail disagrees"
      How do you get email addresses? Do people freely and explicitly choose to sign up to your mailing list, or is it baggage that you're forcing on them without their consent? I notice that when I go to https://fontawesome.com/ and click "Start for Free", I'm asked for my email address. This isn't necessary for me to use the icons. I just need a page that tells me to add the necessary tags for cdnjs [0]. I think your problem is dissonance between what you think your users want and what they actually want. If I had to sign up for a mailing list in order to use every frontend development library I've ever used, and their emails actually made it past my spam filter, I'd never see anything else. I think Google's doing the right thing here. You need to separate your newsletter and product updates from people who just want to set up the icons and move on with their lives. [0]: https://cdnjs.com/libraries/font-awesome
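      For context, the "necessary tags" the commenter mentions amount to a single stylesheet link. A minimal sketch, assuming the cdnjs-hosted Font Awesome build; the version number and icon class below are illustrative, so check the cdnjs page for the current release:

      ```html
      <!-- Font Awesome from cdnjs: one <link> in <head>, no signup or email.
           Version shown is illustrative; use the latest listed on cdnjs. -->
      <link rel="stylesheet"
            href="https://cdnjs.cloudflare.com/ajax/libs/font-awesome/6.5.1/css/all.min.css">

      <!-- Then icons work anywhere in the page -->
      <i class="fa-solid fa-user"></i>
      ```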
    • New comment by Avicebron in "AI Will Be Met with Violence, and Nothing Good Will Come of It"
      I feel like if people keep using AI as a blanket term for "inequality" and "inequality accelerants", then yeah, it's "AI"'s fault, when in reality the whole thing needs to be decoupled. "Gleefully taking away people's livelihoods will be met with violence, and nothing good will come of it." - fixed.
    • New comment by sunaurus in "Anthropic downgraded cache TTL on March 6th"
      Has anybody else noticed a pretty significant shift in sentiment when discussing Claude/Codex with other engineers since even just a few months ago? Specifically because of the secret/hidden nature of these changes. I keep getting the sense that people feel like they have no idea if they are getting the product that they originally paid for, or something much weaker, and this sentiment seems to be constantly spreading. Like when I hear Anthropic mentioned in the past few weeks, it's almost always in some negative context.
    • New comment by senko in "I run multiple $10K MRR companies on a $20/month tech stack"
      If this sounds like basic advice, consider that there are a lot of people out there who believe they have to start with serverless, Kubernetes, fleets of servers, planet-scale databases, multi-zone high-availability setups, and many other "best practices". Saying "you can just run things on a cheap VPS" sounds amateurish: people are immediately out with "Yeah but scaling", "Yeah but high availability", "Yeah but backups", "Yeah but now you have to maintain it" arguments that are basically regurgitated sales pitches for various cloud platforms. It's learned helplessness.
    • New comment by ggillas in "How We Broke Top AI Agent Benchmarks: And What Comes Next"
      This is a phenomenal paper on exploits and hopefully changes the way benchmarking is done. From the paper: "We achieved near-perfect scores on all of them without solving a single task. The exploits range from the embarrassingly simple (sending {} to FieldWorkArena) to the technically involved (trojanizing binary wrappers in Terminal-Bench), but they all share a common thread: the evaluation was not designed to resist a system that optimizes for the score rather than the task."
    • New comment by kilpikaarna in "Small models also found the vulnerabilities that Mythos found"
      Wasn't the scaffolding for the Mythos run basically a line of bash that loops through every file of the codebase and prompts the model to find vulnerabilities in it? That sounds pretty close to "any gold there?" to me, only automated. Have Anthropic actually said anything about the amount of false positives Mythos turned up? FWIW, I saw some talk on Xitter (so grain of salt) about people replicating their result with other (public) SotA models, but each turned up only a subset of the ones Mythos found. I'd say that sounds plausible from the perspective of Mythos being an incremental (though perhaps an unusually large increment) improvement over previous models, but one that also brings with it a correspondingly significant increase in complexity. So the angle they chose for presenting it and the subsequent buzz is at least partly hype -- saying "it's too powerful to release publicly" sounds a lot cooler than "it costs $20,000 to run over your codebase, so we're going to offer this directly to enterprise customers (and a few token open source projects for marketing)". Keep in mind that the examples in Nicholas Carlini's presentation were using Opus, so security is clearly something they've been working on for a while (as they should, because it's a huge risk). They didn't just suddenly find themselves having accidentally created a super hacker.
    • New comment by tptacek in "Small models also found the vulnerabilities that Mythos found"
      If you cut out the vulnerable code from Heartbleed and just put it in front of a C programmer, they will immediately flag it. It's obvious. But it took Neel Mehta to discover it. What's difficult about finding vulnerabilities isn't properly identifying whether code is mishandling buffers or holding references after freeing something; it's spotting that in the context of a large, complex program, and working out how attacker-controlled data hits that code. It's weird that Aisle wrote this.
    • New comment by johnfn in "Small models also found the vulnerabilities that Mythos found"
      The Anthropic writeup addresses this explicitly: > This was the most critical vulnerability we discovered in OpenBSD with Mythos Preview after a thousand runs through our scaffold. Across a thousand runs through our scaffold, the total cost was under $20,000, and we turned up several dozen more findings. While the specific run that found the bug above cost under $50, that number only makes sense with full hindsight. Like any search process, we can't know in advance which run will succeed. Mythos scoured the entire continent for gold and found some. For these small models, the authors pointed at a particular acre of land and said "any gold there? eh? eh?" while waggling their eyebrows suggestively. For a true apples-to-apples comparison, let's see it sweep the entire FreeBSD codebase. I hypothesize it will find the exploit, but it will also turn up so much irrelevant nonsense that it won't matter.
    • New comment by epistasis in "Small models also found the vulnerabilities that Mythos found"
      > We took the specific vulnerabilities Anthropic showcases in their announcement, isolated the relevant code, and ran them through small, cheap, open-weights models. Those models recovered much of the same analysis. Eight out of eight models detected Mythos's flagship FreeBSD exploit, including one with only 3.6 billion active parameters costing $0.11 per million tokens. Impressive, and very valuable work, but isolating the relevant code changes the situation so much that I'm not sure it's much of the same use case. Being able to dump an entire code base and have the model scan it is the type of situation that opens up vulnerability scans to a far larger class of people.

Weather

  • Wetterochs Feed 1

    • Weather - air mass boundary with many clouds and isolated rainfall through Tuesday
      Hello! On Sunday an air mass boundary lies over our region. Up to an altitude of 2000 m, cool air flows in on weak northerly to northeasterly winds. At 2000 to 2500 m, milder air from the southwest glides up over this cool air. This creates a closed cloud deck from which rain falls now and then. There isn't much dynamism, so the rainfall isn't particularly heavy (5 mm). Temperatures hover around 10 degrees. On Monday the air mass boundary drifts somewhat westward, meaning we move further into the warm air. The cloud cover remains mostly closed, with light rain here and there. Temperatures rise to 14 degrees. The wind is very weak. On Tuesday comes the counter-move: the front shifts back east. With northwesterly winds, fresh in gusts, the cooler air returns and temperatures drop to 10 degrees. Rain may be heavier, with totals up to 10 mm. Exactly when the rain band will cross us is still unclear (first or second half of the day, depending on the weather model). From Wednesday to Friday, pressure contrasts are weak and slight high-pressure influence probably dominates (vertical air motion in the atmosphere tending toward subsidence). That points to changeable, cloudy weather with sunny intervals. Isolated rain showers and thunderstorms form, mainly in the afternoon. Daytime highs are just under 20 degrees. Near showers the wind freshens; otherwise it is very weak. Nighttime lows are around 7 degrees. No frost is expected within the forecast period. A brief look back: Saturday afternoon was cloudless, so the weather models were right. According to the lidar measurements, the Saharan dust arrived at altitudes of 3000 to 4000 m, far too low for Saharan-dust veil clouds to form. OK, beforehand I had only looked at the charts from the University of Athens, which give the total amount of dust but not its vertical distribution - hence the misjudgment. The German ICON-ART model now actually offers better ways to estimate this. It would of course be even simpler if those results fed directly into the regular weather models; then I wouldn't have to rack my brain over it myself. But the time isn't ripe for that yet. From the sparse information on the subject, I gather that at present, incorporating the dust forecasts would on balance degrade rather than improve the results of the regular weather models. Wetterochs Please support the Wetterochs weather mail with a donation!

Development

  • Adam Argyle 1

    • Why AI Sucks At Front End
      AI is a sycophantic dev wannabe that skimmed a shitload of tutorials. You get the results of a probabilistic guess based on patterns it saw during training. What did it train on? Ancient solutions, unoriginal UI patterns, and watered-down junk. I'm about to rant about how this is both useful and lame.
      # AI loves the boring stuff.
      It thrives on mediocrity. If you want some gloriously unoriginal UI, it has your back 😜
      Scaffolding: Generic regurgitation of patterns it's seen, done.
      Tokens: Migrating tokens or mapping them out? It eats this tedious garbage for breakfast.
      Outlining features: Generic lists ✅
      Lying to your face: Confident hot garbage on a silver platter. It'll hand you a snippet, dust off its digital hands, and tell you it finished the work. It did not finish the work.
      Aka: if it's a well-worn pattern, AI is there to help you copy-paste faster. Which, for a lot of programming, is totally the case. I'm genuinely finding a lot of helpful stuff in this department.
      # Pixel perfection & bespoke solutions… what are those?
      The exact second you step off the paved road of unoriginality, it faceplants.
      Bespoke solutions & custom interactions: Try asking it for some scroll-driven animations or custom micro-interactions. It will invent a CSS syntax that hasn't existed since IE6. (The real syntax is sketched after this post.)
      Layout & spacing: Predicting intrinsic/extrinsic page properties? It's already bad at math; how could it get this ridiculously dynamic calculation correct? Spacing? Ha, it seems reasonable to expect symmetry, but it's terrible at the math.
      Combined states: Pinpointing where to edit a complex component state makes it cry.
      Accessibility: It throws aria-hidden="true" at a wall and hopes it sticks.
      Performance: It will give you the heaviest, jankiest solution unless you explicitly ask it for a specific (apparently "indie") performance solution.
      Tests: Writing good tests? Good, no. A lot, yes.
      And the absolute best part? The more complex the component gets, the slower and dumber the front-end help becomes. Incredible how it can one-shot a totally decent front-end design or component, then choke on a follow-up request. Speaks to what it's good at.
      # It lacks modern training data.
      It has an excessive reliance on standard templates because that's what the internet is full of. Modern CSS? It's barely aware of it.
      # It's an LLM, not a rendering engine!
      It's notoriously bad at math, and throwing screenshots at it means very little. It's stabbing in the dark. This leads to the classic UI interaction:
      AI: "I'm done! Here is your perfectly crafted UI."
      Me: "There's a gaping hole where the icon should be, fix the missing icon."
      AI: "You're absolutely right. Let me fix that for you."
      # It doesn't understand the "why" behind our architectural decisions.
      SDD, BDD, or state machines might help guide it, but the models weren't exactly trained on those paired with stellar solutions. We're asking a giant text-predictor to make new connections on the fly. We can get it there, but there's so much to consider that we have to spell it out before it starts making the connections we want.
      # It doesn't control where the code lives.
      It can write annoyingly amazing Rust, TypeScript, or Python, but those have the distinct advantage of a predictable (pinnable!!! like v14.4) environment the code executes in. That's not how HTML or CSS work: there is no pinning the browser type, browser window size, browser version, the user's input type (keyboard, mouse, touch, voice), their user preferences, etc. That's complex end-environment shit. The list goes on, too: scenarios, contexts, and variables the rendering engine juggles before resolving the final output. The LLM doesn't control these, so it ignores them until you make them relevant. Even logical properties: you have to explicitly ask for this kind of CSS (an example follows this post). These should be table-stakes CSS output from LLMs, but they're not. And even when you ask for it, or provide documentation that spells it out, it's not guaranteed to work. The place where HTML and CSS have to render is chaotic. It's a browser, with a million different versions, a million different ways to render, a million different ways to interact with it, and a million different ways to break it. It's a moving target, and LLMs are terrible at moving targets.
      # We're an LLM combinatorial explosion.
      We're wildly unpredictable targets. We change our minds, we switch viewports, we change theme preferences, we change devices, we change browsers, we change browser versions, we switch inputs, we change our everything. We're not a static target. We're not a pattern that can be learned. There is a "human mainstream" of behaviors, preferences, and expectations where LLMs can be genuinely helpful; but our "full potential" matrix will be exploding LLM output patterns for a long time to come. IMO at least. Unless we Borg.
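      On the scroll-driven animations point above: a minimal sketch of the real, modern syntax the post says LLMs tend to invent alternatives to. The class name and styling are illustrative, and browser support for animation-timeline still varies:

      ```css
      /* A reading-progress bar driven by page scroll, no JavaScript.
         animation-timeline: scroll() ties animation progress to the
         root scroller's position instead of to the clock. */
      @keyframes grow {
        from { transform: scaleX(0); }
        to   { transform: scaleX(1); }
      }

      .progress {
        position: fixed;
        inset-block-start: 0;
        inset-inline: 0;
        block-size: 4px;
        background: hotpink;
        transform-origin: 0 50%;
        animation: grow linear both;
        /* must come after the shorthand, which would otherwise reset it */
        animation-timeline: scroll(root block);
      }
      ```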
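      And the logical-properties example referenced above: the same card styled with physical properties versus their logical equivalents. The selectors are illustrative; the point is that the logical versions adapt automatically to RTL scripts and vertical writing modes:

      ```css
      /* Physical properties: hard-wired to left-to-right, horizontal text. */
      .card--physical {
        width: 40ch;
        margin-left: 1rem;
        padding-top: 0.5rem;
        text-align: left;
      }

      /* Logical equivalents: identical layout in English, but they flip
         correctly under direction: rtl and vertical writing-mode values. */
      .card--logical {
        inline-size: 40ch;
        margin-inline-start: 1rem;
        padding-block-start: 0.5rem;
        text-align: start;
      }
      ```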
  • Ben Nadel's Web Development and User Experience Feed @ BenNadel.com 1

AI