$22M With Public Data And AI?

Disclaimer: These notes discuss business concepts and the ethical organization of publicly available information only. Business involves risk. Results are not typical. Most people who try to make money online make nothing. Always respect privacy laws, robots.txt files, and website terms of service. Educational only.

Today we are diving into a business that is doing over $20 million a year selling data that is pretty much available online anyway, if you know where to look. It is a very straightforward business model, and it is almost as if it was built for AI.

The whole game is this: find specific data, package it as a digital product or software, sell it for a premium. Forget endless content writing. We are using AI to find data that already exists, identify the people who would benefit from having it organized, and package it up.

TL;DR. The Whole Video In 10 Points.

  1. BuiltWith makes $22.6M a year with 4 employees and zero outside funding. They sell publicly visible website tech-stack data. That is the model in one sentence.
  2. The internet is one giant pile of disorganized public data. The money is in finding it, cleaning it, and selling it to people who need it.
  3. You don’t need to be a programmer. AI now does in minutes what used to take a team of engineers weeks.
  4. This business is older than the internet. The Yellow Pages, the SRDS, the InfoUSA CD-ROMs in 1995. Same model. Different delivery.
  5. Google is already your free database. Operators like site:, intitle:, inurl:, filetype: turn the search bar into a query language.
  6. Stack the operators. One operator finds noise. Three stacked operators isolate exactly the data nobody else collected.
  7. Hand the search to AI. Tell AI to run the search, visit each result, extract specific fields, return a clean spreadsheet.
  8. There are 4 ways to monetize the data. One-off product. Membership. SaaS tool. Or content engine for ads and affiliates.
  9. Niche down hard. You will never beat ZoomInfo. You can absolutely beat the blank space they ignore.
  10. Be ethical or don’t bother. Public data only. Just because AI can find something doesn’t mean AI should find it.

Part 1. The Big Idea: Data Is Everywhere

Right now, millions of people are trying to make money online by writing blog posts, launching Shopify stores, or becoming influencers. They are all competing in the same crowded spaces.

Here is what most people miss: the internet is one giant, messy pile of public data. Every business that builds a website leaves digital footprints. Clues about what software they use, whether their site is broken, what ads they run, and whether they even have a website at all.

This information is not hidden. It is sitting in plain sight. The problem is that it is completely disorganized, and that is exactly where the money is. Google did not invent new content. They organized the world’s existing information, and now they are worth trillions.

Aha moment. You don’t have to be creative. You don’t have to be a genius. You just have to find data that someone already wants, and structure it. That is the entire business. Tiny teams are doing this for tens and hundreds of millions a year.

The Major Players In The Data Business

This is not a clever idea waiting to be tested. It is a proven, multi-billion-dollar industry. Some are public companies worth billions. Some are 4-person bootstrapped shops printing eight figures from a laptop.

Spotlight: BuiltWith. $22.6M a year. 4 employees. $0 funding.

BuiltWith scans websites and records what software is on them. That is it. They take publicly visible information, store it in a database, and let salespeople search it for $295 to $995 a month. Four people. Twenty-two million dollars. From organizing data anybody could see.

Company | Revenue | Employees | Funding | Notable
ZoomInfo | $1.214 billion | ~3,500 | Public (NASDAQ) | IPO 2020
Semrush | $376.8M | ~1,000 | Public (NYSE) | First profitable year in 2024
Similarweb | $282.6M | ~900 | Public (NYSE) | Bought SimilarTech for $1,500
Ahrefs | $149.1M | ~171 | $0 (bootstrapped) | Crawls 8 billion pages a day
Wappalyzer | ~$18M | small | $0 | Sold for €65M in 2024
BuiltWith | $22.6M | 4 | $0 | Most efficient business in the industry
Bombora | ~$50 to 80M | ~200 | VC-backed | Tracks buyer intent
SpyFu | $2.3M | 21 | $0 | PPC and SEO competitive intelligence
PublicWWW | ~$1 to 5M | 1 to 2 | $0 | Indexes 514M pages of source code

Look down that funding column. Five of the nine took zero outside money. They were started by regular people who saw data nobody had bothered to organize, and organized it.

You are not competing with ZoomInfo’s billion-dollar machine. You are competing with the blank spot in the market they will never bother covering. The niche too small for them, too specific for them, or too new for them. That is your whole opening.

Solo And Tiny-Team Operators

The companies above can be intimidating. Here are the ones doing the same thing at solo creator scale. These are the realistic models.

  • Nomad List. Pieter Levels built a single spreadsheet of cities scored by cost of living, wifi, weather, and safety. Charges remote workers $30 a month. Public data, one organized place.
  • Starter Story. Pat Walls finds founders, structures their stories, shares the actual numbers. Started by hunting public startup data on Google.
  • GetLatka. Ranks for “OpenAI revenue”, “Ahrefs revenue”, “SaaS companies”. Public data about company financials, organized as a directory and database. Free tier plus bulk data upsell.
  • Trends VC. Curated reports on emerging niches and AI trends. About 1,500 paying members. Roughly $500K a year.
  • Numbeo. One-person site collecting public cost-of-living data. Millions of monthly visitors. Six figures a year purely from display ads.
  • StatMuse. Public sports stats turned into searchable, shareable content. Acquired for tens of millions. (Be careful with sports data licensing.)
  • VC Sheet. Newsletter with surveys, salary benchmarks, and project manager data. Aggregated public info that operators in the field actually pay to access.

The pattern in every single one is the same: pick one specific audience, find data they need scattered in public, organize it once, charge them monthly. None of these required a developer team. None required funding.

How I First Realized Data Was A Business

When I first realized data was a business, I was a beginner marketer back in the day. I got a CD-ROM of all the businesses online. It was like a Yellow Pages, but for your computer. The thing was slow because we didn’t have fancy computers like now.

Having that data, I was able to build a business. That is how I built my SEO company. I would contact limousine companies and other local businesses, pitch them, and make it work. That data was very valuable to me. I first started selling data about websites and earnings in 2002, and I have been selling data ever since.

Part 2. How AI Makes This Possible For You

Five years ago, finding and organizing this data required you to be a master computer programmer. You had to write complex code to scan websites and sort the data.

Today, AI has leveled the playing field. You don’t need to write code. You just need to give the AI instructions. You can tell an AI: “Go look at these 100 local dentist websites. Read the pages and tell me which ones have broken links, and which ones don’t have online booking.” The AI does it in minutes.

You are not replacing your brain with AI. You are replacing your hands. You decide what data is valuable. The AI does the heavy lifting.

This Is Not New. The Yellow Pages Taught Us Everything.

Before the internet existed, there was already a booming industry built entirely around selling organized data about businesses. The Yellow Pages was not just a phone book. It was a database. And once it existed, smart entrepreneurs realized they could take it, reorganize it, and sell it in new ways to people who needed it.

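  • 1883 to 1886. The Yellow Pages is born. In 1883 a printer in Cheyenne, Wyoming runs out of white paper and prints a business directory on yellow paper. In 1886 Reuben H. Donnelley publishes the first official Yellow Pages directory.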
  • 1972. Vin Gupta starts manually typing the contents of every Yellow Pages directory in America into a computer. He starts in his garage.
  • 1984. His company, American Business Information, releases the first electronic product. Floppy disks and CD-ROMs sold at Best Buy for $19 to $99 each.
  • 1990s. The CD-ROM data gold rush. PhoneDisc, SelectPhone, ProPhone, BigYellow flood in. ABI alone hits 500,000 customers and nearly $100M in revenue from data already public in every phone book.
  • 1995 to 2000. Niche data wins. ABI releases “517,000 Physicians and Surgeons” and “1.1 Million Professionals” targeting luxury goods sellers. Niche data commands premium prices.
  • 2000s. The internet kills the CD, not the business. The same data now lives on a website. InfoUSA, ZoomInfo, BuiltWith move the model online.
  • Today. AI puts this in your hands. What Vin Gupta did by hand for years, AI can now do in minutes.

Vin Gupta took free public data from phone books, organized it, and built a $100M company. He did it with computers and phone calls. You can do the same kind of collecting and organizing today, with AI, in an afternoon.

How To Find The Data: Google Is Already Your Database

You don’t need scrapers. You don’t need software. You don’t need a budget. Google already has it all. Every business, every blog, every product page, every PDF list someone uploaded by accident. The whole public internet, indexed and searchable, for free.

Almost everybody uses Google like a tourist. We are going to use it like a data miner.

Step 1. Type like a tourist.

Search: pickleball

A billion results. News articles, ads, Wikipedia, big brand sites. Useless for collecting data.

Step 2. Add a city.

Search: pickleball clubs Orlando

One small change. You just narrowed the entire internet down to every pickleball club in Orlando. That is already a list. That is already data.

Step 3. Use Google’s hidden filters.

Google has commands you can type right into the search bar that act exactly like database filters. They are free. They are built in. Almost nobody uses them.

  • site: Only show pages from one specific website. Example: site:flippa.com pickleball
  • intitle: Only show pages with a specific word in the page title. Example: intitle:"income report"
  • allintitle: ALL words must be in the title. Example: allintitle: income report drop shipping
  • inurl: Only show pages with a word in the URL. Example: inurl:reviews pickleball
  • "exact phrase" Quotes force Google to match exact wording. Example: "pickleball coach near me"
  • filetype: Only show specific file types. Example: filetype:pdf "real estate investor list"
  • -word Exclude something. Example: pickleball -wikipedia
  • OR Either of two things. Example: "hair salon" OR "barber shop" Orlando

Step 4. Stack them. This is the trick.

The real power is stacking operators. Each one is another filter on the dataset. The more you stack, the tighter and more valuable your list.

  • site:linkedin.com "real estate investor" "Florida" → targeted lead list of Florida real estate investors with public LinkedIn profiles.
  • site:flippa.com inurl:listings "monthly revenue" → every business for sale on Flippa with revenue listed publicly. Free market intelligence.
  • allintitle: income report drop shipping → every income-report blog post about drop shipping. Build a “1,000 verified drop shipping income reports” data product from this.
  • filetype:pdf "marketing plan" "small business" → real marketing plans people uploaded to the internet as PDFs.

You just turned Google’s search bar into a query language. The same way a database engineer pulls records from a table, you are pulling records from the entire indexed internet. Free. No tools. No code.
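If you want to keep your stacked searches in one place and reuse them, here is a minimal sketch in Python. The helper names (stack_query, exact, search_url) are made up for this example. All it does is glue operator fragments together and build a link you can paste into a browser. It does not run the search for you.

```python
from urllib.parse import quote_plus


def exact(phrase: str) -> str:
    """Wrap a phrase in double quotes so Google matches the exact wording."""
    return f'"{phrase}"'


def stack_query(*parts: str) -> str:
    """Join operator fragments into one search string, skipping empty ones."""
    return " ".join(p.strip() for p in parts if p and p.strip())


def search_url(query: str) -> str:
    """Build a Google search link for the query (for pasting into a browser)."""
    return "https://www.google.com/search?q=" + quote_plus(query)


# Three stacked filters: one site, one URL word, one exact phrase.
query = stack_query("site:flippa.com", "inurl:listings", exact("monthly revenue"))
print(query)
print(search_url(query))
```

Treat each fragment as one filter. Add or remove fragments and check the first page of results by hand before you hand the query to AI.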

Step 5. Hand It To AI.

Old way: run the search, open each result, copy the relevant info, paste into a spreadsheet, repeat 500 times. Weeks of work.

New way: hand the AI your search query and your goal in one prompt. AI runs the search, visits each result, pulls exactly the fields you asked for, returns a clean spreadsheet. Minutes.

The Master Prompt Pattern

Run this Google search: [YOUR STACKED OPERATORS]

Visit every result on the first [N] pages.

From each result, extract:
  - [FIELD 1]
  - [FIELD 2]
  - [FIELD 3]

Return a clean spreadsheet sorted by [SORT FIELD], [ASCENDING/DESCENDING].

Real Working Example

Run this Google search: site:flippa.com inurl:listings "monthly revenue" pickleball

Visit every result on the first 5 pages. From each listing, extract:
  - Title
  - Asking price
  - Monthly revenue
  - Niche
  - URL

Return a clean spreadsheet sorted by monthly revenue, highest first.

The key is that you are telling AI to go do a task, not to recall something from memory. That cuts down on hallucination because the AI is fetching live data, not remembering. Still, spot-check a handful of rows against the source pages before you sell anything.

Your job stops being “do the work.” Your job becomes “ask the right question.” That is the entire skill of this whole business.
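If you end up writing the master prompt over and over for different niches, you can fill in the blanks with a few lines of Python. This is only a sketch. The function name build_master_prompt and its parameters are made up here, and the output is plain text that you paste into whatever AI agent you use.

```python
def build_master_prompt(query, pages, fields, sort_field, descending=True):
    """Fill in the master prompt pattern and return it as one string."""
    order = "highest first" if descending else "lowest first"
    field_lines = "\n".join(f"  - {name}" for name in fields)
    return (
        f"Run this Google search: {query}\n\n"
        f"Visit every result on the first {pages} pages.\n\n"
        f"From each result, extract:\n{field_lines}\n\n"
        f"Return a clean spreadsheet sorted by {sort_field}, {order}."
    )


# Rebuilds the Flippa example from these notes.
prompt = build_master_prompt(
    query='site:flippa.com inurl:listings "monthly revenue" pickleball',
    pages=5,
    fields=["Title", "Asking price", "Monthly revenue", "Niche", "URL"],
    sort_field="monthly revenue",
)
print(prompt)
```

Swap the query and the field list and you have a new prompt for a new niche in seconds.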

How I Used Manus AI To Find 44,000 Sites

I just used this same approach with Manus AI to build a real product. We started with about 5,100 websites that AI found in a couple of hours. Now we are up to 44,000 websites.

This is the first month I have spent close to $1,000 on Manus AI, but it is absolutely worth it. My outsourced employees cost more than that a week, and they cannot work that fast. Manus does the heavy data gathering, then I hand it to my team.

The cool part is what I add on top. Instead of just listing the sites, I get the niche, the platforms they use (whether they are running Taboola, what ad networks, what tech stack), and then I break the niche down even further. That extra layer is what makes the data valuable.
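To give a feel for how that extra layer can be built, here is a rough sketch of tagging a page by looking for telltale strings in its HTML. This is not how Manus or BuiltWith actually does it. The marker strings are a small illustrative set I am assuming for the example, and the sample page is made up. Real detection needs many more signals.

```python
# Illustrative markers only (an assumption for this sketch, not a complete list).
FINGERPRINTS = {
    "WordPress": ("wp-content/", "wp-includes/"),
    "Shopify": ("cdn.shopify.com",),
    "Taboola": ("cdn.taboola.com",),
    "Google AdSense": ("pagead2.googlesyndication.com",),
}


def detect_stack(html: str) -> list[str]:
    """Return the names of every platform whose marker appears in the HTML."""
    lowered = html.lower()
    return sorted(
        name
        for name, markers in FINGERPRINTS.items()
        if any(marker in lowered for marker in markers)
    )


# Hypothetical page source, saved from a public home page.
sample_html = """
<html><head>
<link rel="stylesheet" href="/wp-content/themes/demo/style.css">
<script src="https://cdn.taboola.com/libtrc/demo/loader.js"></script>
</head><body>Hello</body></html>
"""

print(detect_stack(sample_html))
```

Each tag becomes one more column in your spreadsheet, and each extra column is one more reason for a buyer to pay.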

Part 3. The 4 Ways To Monetize Organized Data

Model 1. The One-Off Digital Product. $50 to $500 per sale.

Easiest way to start. Use AI to compile a specific list, package as a PDF or spreadsheet, sell as a one-time download. Example: 500 local businesses with Facebook pages but no website. Package as “500 Warm Web Design Leads” for $99.

People on Gumroad and Twitter sell lists of “1,000 Angel Investors” or “500 Podcasts Accepting Guests” for $50 to $150 a pop. Build the list once, sell it hundreds of times.

Model 2. The Membership Site. $29 to $99 a month, per user.

Provide fresh data every week or month. Charge a subscription. Predictable, recurring income. Example: real estate flippers paying $99 a month for a fresh weekly list of motivated sellers (abandoned, probate, etc.) pulled from public records.

Nomad List charges $30 a month for organized remote-work city data. The data is public. The organization is worth a monthly fee.

Model 3. The SaaS Tool. $99 to $1,000+ a month, per user.

Don’t just sell a list. Build a simple online tool where people can search the data themselves. With AI coding assistants today, you can build these tools without writing code yourself.

BuiltWith does exactly this. They scan the internet to see what tech websites use. 4 employees, zero outside funding, $22M+ a year selling search access to salespeople.

Model 4. Data As A Content Engine. $500 to $10,000+ a month.

Use the data to create content that attracts an audience. Monetize with ads, affiliate links, sponsorships. Example: pull public salary data for every job title at every major tech company, turn it into articles like “What Google Pays Its Engineers vs. What Amazon Pays”. People search for this. You run ads.

Numbeo and StatMuse use this exact approach at scale, and thousands of local “best of” sites earn $1,000 to $5,000 a month with it.

I Buy Data From BuiltWith All The Time

I know the BuiltWith business well because I am a customer. I go through and get all kinds of information from them. I will go to BuiltWith, find lists of sites that use WordPress, then compile that data and look at the ones I want to use for my own affiliate sites or for client work.

I also buy from Spamzilla. Same idea. They show you expired domains. I have been buying their data for years, and I have actually sold a lot of Spamzilla subscriptions through my affiliate link, because it is a product I genuinely use. There is real money in buying data from these services and using it as the input to your business.

Real Data Niches That Make Serious Money

Niche 1. Startup Data. Who Just Got Funded.

Sales teams, recruiters, PR agencies, software vendors. $50 to $500 a month. Funded startups start spending immediately. Vendors want to know who got money so they can reach out first. Data is in press releases, SEC filings, Y Combinator listings.

Crunchbase charges $29 to $99 a month and does tens of millions in revenue, mostly from public announcements. Pick a vertical (funded health-tech, funded e-commerce). Set AI to monitor weekly. Sell access for $49 a month. 100 subscribers = $4,900 a month.
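The subscriber math is simple, but it helps to run it both ways before you pick a price. A quick sketch (the function names are made up, and it ignores churn, refunds, and payment fees):

```python
import math


def monthly_revenue(subscribers: int, price: float) -> float:
    """Gross monthly revenue: subscribers times monthly price."""
    return subscribers * price


def subscribers_needed(target: float, price: float) -> int:
    """Smallest whole number of subscribers that reaches the target."""
    return math.ceil(target / price)


print(monthly_revenue(100, 49))       # the $49 example from these notes
print(subscribers_needed(5000, 49))   # how many subscribers to clear $5,000
```

Run the second function with a few different prices. It shows you fast whether your niche has enough buyers to hit your target.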

Niche 2. Sites And Apps That Already Sold

Aspiring online business buyers, investors, brokers. $29 to $199 a month. Transaction data is publicly listed on Flippa, Acquire.com, Empire Flippers. Every completed sale shows niche, revenue, asking price, sale price.

The opportunity: nobody is selling a clean weekly digest of “the 20 most interesting online businesses that just sold this week.” Use AI to pull listings, summarize key stats, write the digest in minutes.

Niche 3. Side Hustle Income Data

$19 to $99 a month. Millions search “what side hustles actually make money.” They want real income numbers. Use AI to monitor public communities and extract posts where people share actual numbers. Organize by niche, method, and income range. Publish weekly.

Niche 4. Local Business Data. The Invisible Opportunity.

$99 to $500 per list. Web designers, marketing agencies, software companies, franchise consultants. Local businesses with no website, broken sites, no online booking, outdated info. Examples:

  • The No-Website List. Scan Google Maps for restaurants in Dallas with no website link. Sell “200 Restaurants in Dallas With No Website” to a web design agency for $299.
  • The Broken Site List. Scan dentist websites in a metro area. Flag broken contact forms, missing SSL, no mobile optimization. Sell to a dental web developer.
  • The No-Booking List. Hair salons, massage therapists, personal trainers with a site but no online booking. Sell to booking software companies as warm leads at $50 to $200 per qualified lead.
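Here is what the No-Website List looks like once you have the raw rows. This sketch assumes you already have a list of business records with a website field. The sample rows, field names, and helper names are all hypothetical. It filters for rows with a blank website and writes them out as CSV text.

```python
import csv
import io

# Hypothetical sample rows; in practice these come from your own collection step.
businesses = [
    {"name": "Example Taqueria", "city": "Dallas", "phone": "555-0101", "website": ""},
    {"name": "Sample Bistro", "city": "Dallas", "phone": "555-0102",
     "website": "https://samplebistro.example"},
    {"name": "Demo Diner", "city": "Dallas", "phone": "555-0103", "website": None},
]


def no_website(rows):
    """Keep only the rows whose website field is missing or blank."""
    return [row for row in rows if not (row.get("website") or "").strip()]


def to_csv(rows, columns):
    """Render the chosen columns as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=columns, extrasaction="ignore")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


leads = no_website(businesses)
print(to_csv(leads, ["name", "city", "phone"]))
```

The same filter idea works for the other two lists. Swap the website check for a "has booking link" or "form works" field.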

Niche 5. Competitor Ad And Tech Intelligence

Bloggers, affiliate marketers, ad agencies, SaaS companies. $49 to $299 a month. New bloggers want to know: what ad network should I apply to? What are the top sites in my niche using? Track 1,000 personal finance blogs. Sell a monthly “Personal Finance Blog Tech Report” for $29 a month. 200 subscribers = $5,800 a month.

Niche 6. Franchise Opportunity Data

$97 to $497 a month. SpyFu shows advertisers paying $9+ per click on “best franchise to buy” and “franchise loans”. The traffic is worth a lot. Compile data on which franchise locations in a region are underperforming. Sell to franchise consultants and aspiring buyers.

I’m Building A Local Orlando Version Of This

I am actually making a local version of my business for Orlando companies. We are featuring Orlando companies, talking about businesses in Orlando that make money, finding the data on what they do well, and packaging it. I am also part of a startup group here in Orlando.

This is the beauty of the local angle. Pick your city. There is so much public info out there. I could talk about real estate trends in New Smyrna, sell that data to local realtors, then branch out to a bunch of other places. Sell it not as just a list of data, but as a website where they can explore the info, maybe with AI that helps them use the data to get customers.

The Ethics Rules. Read This Twice.

Just because AI can find it doesn’t mean you should sell it. Data is sensitive. There is a lot of bad advice out there telling people to scrape stuff they should not scrape. Don’t do it. You are on the hook for what you do with AI. OpenAI is facing multiple lawsuits right now over how it gathered its data.

Stick to surface-level public info only:

  • Lists of websites that use a specific platform (WordPress, Shopify, etc.)
  • Lists of products available for drop shipping
  • Lists of web hosting companies
  • Public income reports that bloggers chose to publish
  • Listings on Flippa, Acquire.com, Empire Flippers (sellers chose to publish)
  • SEC filings, press releases, Y Combinator listings
  • Cost of living, salary surveys, and job benchmarks already published

Stay completely away from:

  • Personal info that requires login or authentication
  • Anything behind a paywall, even if you can technically get past it
  • Email lists you didn’t earn through opt-in
  • Health, financial, or legal records
  • Anything covered by GDPR or CCPA without compliance
  • Any site that says no in robots.txt or its terms of service
  • Sports data, where leagues often hold tight licensing

If you would not want someone to do this with your data, don’t do it with theirs. The whole point of this business is that the data is already public. If you have to bend a rule to get it, you are in the wrong niche.
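One part of this you can check automatically is robots.txt. Here is a small sketch using Python’s built-in urllib.robotparser. The robots.txt text, the bot name, and the URLs are made up for the example. It only covers robots.txt. The terms of service you still have to read yourself.

```python
from urllib.robotparser import RobotFileParser


def allowed_by_robots(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text lets this user agent fetch the URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical robots.txt content for an example site.
sample_robots = """\
User-agent: *
Disallow: /private/
"""

print(allowed_by_robots(sample_robots, "my-research-bot", "https://example.com/listings/1"))
print(allowed_by_robots(sample_robots, "my-research-bot", "https://example.com/private/x"))
```

If the answer is False, skip the page. That is the rule.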

Marcus’s 7-Step Action Plan

If you watched the video and want to actually do something with what you learned, here is the order of operations. Most people make nothing because they skip steps. Don’t skip.

  1. Pick one specific audience. Not “people who want to make money.” Pick something tighter. Realtors in Florida. Pickleball coaches. Funded health-tech startups. The narrower, the better.
  2. Figure out what data they pay to know. What do they wake up wanting? Realtors want lists of motivated sellers. Bloggers want to know what ad networks the top sites use. Sales reps want lists of just-funded companies.
  3. Build the stacked Google search. Use the operators. Stack at least three. Test in Google first. If the first 20 results are exactly what your buyer wants, you have a good search.
  4. Hand the search to AI with the master prompt. Tell AI to run, visit, and extract specific fields. This forces a task instead of recall, which dramatically reduces hallucination.
  5. Add a layer on top. This is the move that separates pros from beginners. Anyone can list 1,000 sites. You list 1,000 sites plus what platform they use, plus what ad network they monetize with, plus a niche tag, plus traffic estimate. The extra layer is the entire reason they pay you.
  6. Pick your monetization model. One-off product. Membership. SaaS tool. Or content site monetized by ads and affiliates. Pick one. Don’t try to do all four.
  7. Get in front of the buyer. The hardest part is not making the data. It is finding the buyer. Run ads in newsletters they read. Post in communities they live in. Distribution is the business.

The Big Takeaway

Stop trying to compete in crowded spaces where everyone is doing the exact same thing. Step back, look at the ecosystem, and ask yourself one question:

“What information do these people desperately need to succeed, and where is it hiding in plain sight?”

The data is already out there. It is sitting on public websites, completely disorganized, waiting for someone to find it and structure it. Your job is to let AI find it, organize it, and then sell it to the people who need it most.

Whether you sell it as a $50 digital product, a $99-a-month membership, or a $500-a-month software tool, you are in the best business on the internet. The Shovel Business. While everyone else is panning for gold, you are selling them the shovels.

Final aha: You don’t need to be a programmer. You don’t need funding. You need to know what data is valuable, know how to find it, and know who will pay for it. AI handles the rest.

Get The Full Notes, Prompts, And Training

If you want the full prompts, the search operator templates, the niche directory, and weekly calls to help you actually build this, head to JoinMarcus.com. Inside AI Profit Scoop Elite we have the list of websites we found with Manus, the search operator tool with over 1,000 built-in operators, the niche directory, and weekly group calls.

→ Join us at JoinMarcus.com

Earnings disclaimer: The results discussed in this video and these notes are not typical. Your results will vary based on your effort, education, business model, and market forces beyond our control. We make no earnings claims or return on investment claims. Business involves risk. Most people who try to make money online make nothing. Always conduct your own due diligence before starting any business.
