Goldman Sachs Hires AI Expert to Enhance Developer Skills

Last week, Goldman Sachs made a move that was quiet in the press but loud for anyone who builds software for a living: it signaled that it’s serious about using AI to level up developers, not just to automate tickets but to improve how engineers learn and ship. And honestly? That’s the part I care about. “AI for productivity” is everywhere. “AI that actually helps devs get better” is rarer.

Below I’m breaking down what’s being reported, what it likely means for a team of ~12,000 developers, and a set of AI tools I’d actually consider using (with real workflows and the stuff that can go wrong).

📢 BREAKING NEWS

Here’s what caught my attention this week—focused on concrete details you can verify, not vague “AI is coming” talk.

AI Colleague, Not Replacement
Who/what: Goldman Sachs is rolling out an AI “hybrid worker” concept by bringing in Devin, the AI coding agent built by Cognition (an AI tool, not a human hire), to support its developers.
Scale: The reporting frames it as support for Goldman’s ~12,000 developers.
Core promise: It’s positioned as a way to help developers improve their skills, not to replace them outright.
Why it matters to builders: If this is done well, the “win” isn’t that AI writes code end-to-end—it’s that it shortens the feedback loop. In practice, that means faster debugging, better code review suggestions, and quicker onboarding for new engineers.
AI Agent Markets Are Heating Up
When: The reporting gives July 15 as the key date.
Who’s involved: AWS is partnering with Anthropic.
What’s changing: It’s described as an AI agent marketplace designed to help startups access “enterprise wallets” more easily—so more companies can buy and deploy agentic workflows without rebuilding everything from scratch.
Why it matters: We’re moving from “cool demos” to “purchasable capabilities.” If you’re a developer, that affects how you evaluate vendors, how you integrate agents into existing systems, and how you manage permissions and audit trails.
Robinhood Founder Strikes Gold (Again)
Who: Vlad Tenev, Robinhood’s co-founder and CEO.
What’s new: He started an AI company called Harmonic.
Valuation signal: The piece pegs it near $875 million (with an “approaching a billion-dollar valuation” framing).
Focus area: Advanced math tools aimed at AI models’ reasoning errors (the “misunderstandings” usually called hallucinations).
Why it matters for engineers: The more AI gets embedded into real workflows, the more you’ll see demand for tools that reduce errors—especially “confidence without correctness” failure modes.

🤖 BEST NEW AI TOOLS

I’m not going to pretend every new app is amazing. What I look for is boring-but-important stuff: what inputs it accepts, what it can’t do, how fast it is, and whether the output is “useful enough to ship” or “cool enough to post.”

Stacks
Best for: keeping notes, links, and references in one place without turning your brain into a bookmark graveyard.
Inputs it supports: links + text notes (typically what you’d paste from docs, tweets, or issue trackers).
What to expect: faster retrieval when you’re writing code or docs, especially when you’re reusing prior research.
Quick workflow: paste a link → add 2-3 bullet “why it matters” notes → tag it by project → when you’re stuck, search tags instead of hunting through browser history.
YouTube to Doc
Best for: turning long videos into structured docs you can actually skim and cite.
Inputs it supports: YouTube links (and often transcripts, depending on the setup).
Limitations I’d watch: if the transcript is messy, the summary can inherit the mess. I usually check key claims against the video at the relevant timestamps before using the doc as a “source.”
Quick workflow: paste URL → generate outline → export as a doc → add timestamps for the sections you’ll quote.
MicroTWT
Best for: turning rough thoughts into cleaner posts for X, without staring at a blank page for 40 minutes.
Inputs it supports: short notes or drafts you write yourself.
Accuracy limits: it can improve phrasing, but it won’t magically make your facts correct. If you’re referencing a library version or a benchmark number, double-check.
Quick workflow: paste your thought → generate 3 tweet variations → pick one → tighten the “claim” sentence → add a link or screenshot reference.
SJinn
Best for: generating visuals from descriptions when you need something “good enough” for a landing page or prototype.
Inputs it supports: text prompts that describe scenes, products, or concepts.
What I notice: the output quality depends heavily on prompt specificity (subject, lighting, style, and composition). Vague prompts = generic images.
Quick workflow: write a prompt with 4-6 details (subject + style + setting + color) → generate options → choose the closest one → iterate on composition before you export.
Smith.ai
Best for: handling “front desk” conversations like FAQs and scheduling, so you don’t lose time to repetitive calls.
Inputs it supports: inbound questions and appointment intents (usually via a chat or phone integration).
Real limitation: it can mis-route edge cases (refund policy exceptions, unusual scheduling constraints). I always recommend giving it a clear “when to escalate to a human” rule.
Quick workflow: define your FAQ answers → set scheduling rules (time zones + buffers) → add escalation triggers → test with 10 real scenarios before going live.
Socialaf.ai
Best for: creating AI character-style content that places your products into different contexts.
Inputs it supports: product details/images + a prompt describing the scenario.
Accuracy limits: it can nail the vibe, but it may “interpret” product appearance. If brand consistency matters, you’ll want a style guide and a review step.
Quick workflow: upload product photo → describe 2-3 scenes (e.g., “cozy desk setup,” “outdoor lifestyle,” “night mode aesthetic”) → generate → pick the best 1-2 → export for your next campaign.
Model Playground AI
Best for: comparing model behavior quickly when you’re choosing an AI for a specific task.
Inputs it supports: prompts you can reuse across models to compare outputs side-by-side.
Why it’s useful: you stop guessing. You test. You see which model is better at summarization, code generation, or classification for your exact prompt style.
Quick workflow: create 5 prompts (your real use cases) → run them across models → score outputs for correctness + clarity → keep a short “best model for task X” list.
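
Here’s a rough sketch of that last step, the “best model for task X” list. It’s plain Python with no vendor API calls: I’m assuming you’ve already run your prompts and scored each output by hand on a 1-5 scale for correctness and clarity. The model names, task names, and the equal weighting of the two scores are all made up for illustration.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical hand-scored results: (task, model, correctness 1-5, clarity 1-5).
SCORES = [
    ("summarization", "model-a", 4, 5),
    ("summarization", "model-b", 3, 4),
    ("code generation", "model-a", 3, 4),
    ("code generation", "model-b", 5, 4),
    ("classification", "model-a", 4, 3),
    ("classification", "model-b", 4, 4),
]


def best_model_per_task(scores):
    """Return {task: (model, average score)}, averaging correctness and clarity."""
    per_task = defaultdict(lambda: defaultdict(list))
    for task, model, correctness, clarity in scores:
        # One combined score per prompt run; equal weighting is an assumption.
        per_task[task][model].append(mean([correctness, clarity]))

    best = {}
    for task, by_model in per_task.items():
        averages = {model: mean(vals) for model, vals in by_model.items()}
        # On a tie, max() keeps whichever model was seen first.
        winner = max(averages, key=averages.get)
        best[task] = (winner, averages[winner])
    return best


if __name__ == "__main__":
    for task, (model, score) in best_model_per_task(SCORES).items():
        print(f"{task}: {model} (avg {score:.1f})")
```

If correctness matters more than clarity for your use case, swap the plain mean for a weighted one. The point is just to write the scores down so the “best model” call isn’t a vibe.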

📝 PROMPT OF THE DAY

Instead of a template prompt that never really gets used, here’s a prompt I’d actually run for a developer workflow—plus what I’d measure after.

Worked example prompt (copy/paste):

Act as a senior software engineer. I’m building a small internal tool that turns customer support emails into structured bug reports for our Jira project.

Task: Propose a practical workflow using an LLM. Include:
1) The exact input format I should send (example email → extracted fields like summary, steps to reproduce, expected vs actual, environment, severity).
2) A JSON schema for the output that I can validate automatically.
3) A “human-in-the-loop” review step: when the model should ask follow-up questions vs when it should proceed.
4) A short prompt strategy (system + user prompt) that reduces hallucinations.
5) A mini test plan with 10 sample emails (I’ll paste them later). Tell me what success looks like and how I’ll score it (accuracy, missing fields rate, time saved).

Output requirements:
- Provide a ready-to-implement approach, not theory.
- Mention at least 3 failure modes and how to mitigate each.
- End with a checklist I can follow before deploying this to production.
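
Point 2 in that prompt asks for output you can validate automatically. As a rough sketch of what that check could look like, here’s a validator that only uses the Python standard library. The field names and severity levels are my assumptions, pulled from the field list in the prompt, not a real Jira schema; swap in whatever your project actually requires.

```python
import json

# Assumed required fields (hypothetical names, based on the prompt's field list).
REQUIRED_FIELDS = {
    "summary": str,
    "steps_to_reproduce": list,
    "expected": str,
    "actual": str,
    "environment": str,
    "severity": str,
}
# Assumed severity levels; not taken from any real Jira configuration.
ALLOWED_SEVERITIES = {"low", "medium", "high", "critical"}


def validate_bug_report(raw: str) -> list[str]:
    """Return a list of problems in the model's JSON output (empty means valid)."""
    try:
        report = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc.msg}"]
    if not isinstance(report, dict):
        return ["top-level value must be a JSON object"]

    problems = []
    for field, expected_type in REQUIRED_FIELDS.items():
        value = report.get(field)
        if value is None:
            problems.append(f"missing field: {field}")
        elif not isinstance(value, expected_type):
            problems.append(f"wrong type for {field}: expected {expected_type.__name__}")
        elif len(value) == 0:
            problems.append(f"empty field: {field}")

    severity = report.get("severity")
    if isinstance(severity, str) and severity and severity not in ALLOWED_SEVERITIES:
        problems.append(f"unknown severity: {severity}")
    return problems


if __name__ == "__main__":
    sample = json.dumps({
        "summary": "Checkout button does nothing",
        "steps_to_reproduce": ["Add item to cart", "Click checkout"],
        "expected": "Payment page opens",
        "actual": "Nothing happens",
        "environment": "",
        "severity": "urgent",
    })
    for problem in validate_bug_report(sample):
        print(problem)
```

A missing or empty field is a natural trigger for the follow-up-question path in point 3, rather than letting the model guess.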

How I’d measure success (so it’s not just “the model sounds smart”):

Field completeness rate: % of bug reports with all required fields populated (e.g., steps, expected/actual, environment).
Escalation accuracy: % of times the model correctly asks follow-up questions instead of guessing.
Reviewer time: average minutes per report before vs after (I’d aim for a 30–50% drop).
Jira rework rate: how often a report is rejected or needs major edits.
Hallucination checks: count of fabricated facts (versions, browser details, dates) caught by reviewers.
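
To keep these from staying abstract, here’s a minimal sketch of how the first three could be computed from a reviewer log. The `ReviewedReport` record and its field names are hypothetical. I’m also reading escalation accuracy as simple agreement between the model’s ask-or-proceed choice and the reviewer’s judgment, which is one interpretation, not the only one.

```python
from dataclasses import dataclass


@dataclass
class ReviewedReport:
    """One reviewed bug report (hypothetical record; field names are assumptions)."""
    all_fields_present: bool      # every required field was populated
    needed_follow_up: bool        # reviewer's judgment: the email lacked key info
    model_asked_follow_up: bool   # the model asked instead of guessing
    review_minutes: float         # reviewer time spent on this report


def field_completeness_rate(reports):
    """Percent of reports with all required fields populated."""
    return 100 * sum(r.all_fields_present for r in reports) / len(reports)


def escalation_accuracy(reports):
    """Percent of reports where the model's ask-or-proceed choice matched the reviewer's."""
    correct = sum(r.needed_follow_up == r.model_asked_follow_up for r in reports)
    return 100 * correct / len(reports)


def reviewer_time_drop(baseline_minutes, reports):
    """Percent reduction in average review time versus a pre-LLM baseline."""
    after = sum(r.review_minutes for r in reports) / len(reports)
    return 100 * (baseline_minutes - after) / baseline_minutes


if __name__ == "__main__":
    # Made-up sample data, only to show the arithmetic.
    sample = [
        ReviewedReport(True, False, False, 4.0),
        ReviewedReport(True, True, True, 6.0),
        ReviewedReport(False, True, False, 9.0),
        ReviewedReport(True, False, False, 5.0),
    ]
    print(f"Field completeness: {field_completeness_rate(sample):.0f}%")
    print(f"Escalation accuracy: {escalation_accuracy(sample):.0f}%")
    print(f"Reviewer time drop: {reviewer_time_drop(10.0, sample):.0f}%")
```

With a 10-minute baseline and a 6-minute average after, the sample lands at a 40% drop, inside the 30–50% range I’d aim for. Rework rate and hallucination counts would follow the same pattern once you log them per report.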

If you want, tell me your niche (devtooling, marketing ops, customer support, recruiting, etc.) and I’ll tailor the prompt + metrics to your exact use case.