Natural Language API Basics for Machine Interpretation

What this teaches

How to choose PLAIN_TEXT vs HTML input for copy-clarity vs rendered-page analysis
The six core methods: analyzeSentiment, analyzeEntities, analyzeEntitySentiment, analyzeSyntax, classifyText, annotateText
Why entity salience is the primary semantic SEO signal in NL output
How to frame audits around machine interpretation, not keyword presence

Why it matters for retrieval and machine interpretation

Retrieval systems and answer engines do not read your brand intention. They read strings, entities, and statistical patterns. If Natural Language returns the wrong dominant entities, downstream systems may classify the page under the wrong category, cite the wrong concept, or skip the page entirely during grounding. The operator question is always: What does the machine think this page is about?

Core concepts

Entity salience

A score from 0.0 to 1.0 indicating how central an entity is within the document. High salience means the machine treats that entity as a primary subject, not a passing mention.

PLAIN_TEXT

Best for testing whether copy alone communicates intended meaning, stripped of layout and navigation noise.

HTML input

Useful when you need to analyze rendered page text as extracted from the DOM, which is closer to what some crawlers see after parsing.

content vs gcsContentUri

Pass text directly via content for audits; use gcsContentUri when analyzing files already stored in Google Cloud Storage.
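The input choices above map onto the document object of a request body. The following is a minimal sketch, assuming the v1 REST field names (type, content, gcsContentUri). The helper build_document and the bucket path are hypothetical, and no network call is made.

```python
from typing import Optional


def build_document(
    content: Optional[str] = None,
    gcs_content_uri: Optional[str] = None,
    doc_type: str = "PLAIN_TEXT",
) -> dict:
    """Build the `document` part of a Natural Language request body.

    Hypothetical helper: exactly one of `content` or `gcs_content_uri`
    must be given, mirroring the content vs gcsContentUri choice.
    """
    if doc_type not in ("PLAIN_TEXT", "HTML"):
        raise ValueError("doc_type must be 'PLAIN_TEXT' or 'HTML'")
    if (content is None) == (gcs_content_uri is None):
        raise ValueError("pass exactly one of content or gcs_content_uri")
    document = {"type": doc_type}
    if content is not None:
        document["content"] = content
    else:
        document["gcsContentUri"] = gcs_content_uri
    return document


# Copy-clarity audit: text passed inline as PLAIN_TEXT.
hero_doc = build_document(content="Search our inventory of premium domains.")
# Markup audit: the same field, but typed as HTML.
page_doc = build_document(content="<h1>Premium domain brokerage</h1>", doc_type="HTML")
# Stored file: placeholder bucket path, not a real object.
stored_doc = build_document(gcs_content_uri="gs://example-bucket/page.txt")
```

Rejecting requests that set both or neither source keeps an audit script from silently analyzing the wrong input.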

UTF-8 encoding

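The encodingType request parameter (UTF8, UTF16, or UTF32) determines how character offsets are calculated in entity mentions and syntax analysis; if it is omitted, offsets are returned as -1. Set it to match how your own code indexes strings. This is critical when auditing non-ASCII or mixed-language copy.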
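To see why the encoding choice matters, the sketch below measures the same word's start offset in the three units the API's encodingType values correspond to (bytes, 16-bit code units, code points). It runs locally and calls no API; offsets_by_encoding is a hypothetical helper and the sample string is invented for illustration.

```python
def offsets_by_encoding(text: str, needle: str) -> dict:
    """Return the start offset of `needle` measured in three different units."""
    index = text.index(needle)  # Python str indexes by code point
    prefix = text[:index]
    return {
        "UTF8": len(prefix.encode("utf-8")),            # bytes
        "UTF16": len(prefix.encode("utf-16-le")) // 2,  # 16-bit code units
        "UTF32": len(prefix),                           # code points
    }


# "Café", a rocket emoji, then "domain": one 2-byte and one 4-byte character.
sample = "Caf\u00e9 \U0001F680 domain"
print(offsets_by_encoding(sample, "domain"))
# {'UTF8': 11, 'UTF16': 8, 'UTF32': 7}
```

Because Python slices strings by code point, UTF32 offsets can be used directly as string indexes in Python; other languages may need UTF16 or UTF8.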

annotateText

Runs multiple analyses in one request — the standard operator workflow for page-level briefings.
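A minimal sketch of an annotateText request body, assuming the v1 REST feature flags (extractEntities, extractDocumentSentiment, classifyText, extractSyntax, extractEntitySentiment). The helper name and the choice of which features to enable are illustrative assumptions; the body would be sent as JSON to the documents:annotateText endpoint with your own credentials, which this sketch does not do.

```python
import json


def build_annotate_request(text: str, doc_type: str = "PLAIN_TEXT") -> dict:
    """Build a request body for a page-level briefing (hypothetical helper)."""
    return {
        "document": {"type": doc_type, "content": text},
        "features": {
            # Assumed baseline for a briefing: entities, sentiment, category.
            "extractEntities": True,
            "extractDocumentSentiment": True,
            "classifyText": True,
            # Left off here; enable for syntax or entity-sentiment passes.
            "extractSyntax": False,
            "extractEntitySentiment": False,
        },
        # UTF32 so returned offsets line up with Python string indexes.
        "encodingType": "UTF32",
    }


body = build_annotate_request("Search our inventory of premium domains.")
print(json.dumps(body, indent=2))
```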

EXAMPLE INPUT

Example: premium domain brokerage homepage (anonymized pattern)

A boutique brokerage page targeting high-intent buyers. Human copy emphasizes “premium domain brokerage” and “advisory acquisition.”

Search our inventory of premium domains. Browse listings, compare prices, and make an offer instantly. Our marketplace connects buyers and sellers with thousands of verified domain names. Filter by category, TLD, and price. Start your domain search today.

MACHINE SIGNAL

Illustrative analyzeEntities output (pattern — not a live API dump)

Scores are representative of audit patterns NRLC has observed on similar page types.

Entity	Type	Salience
marketplace	OTHER	0.41
domain name	OTHER	0.28
search	OTHER	0.19
listing	OTHER	0.12
brokerage	OTHER	0.06

Despite human intent (“brokerage”), machine salience ranks marketplace, domain name, search, and listing above brokerage. To the API, the page reads as a marketplace, not a high-touch advisory service.
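The comparison in the table can be sketched as a rank lookup. The entity list below copies the illustrative scores above (not a live API response), shaped like the entities array in an analyzeEntities response; salience_rank is a hypothetical helper.

```python
# Illustrative scores copied from the table above, not live API output.
ILLUSTRATIVE_ENTITIES = [
    {"name": "marketplace", "type": "OTHER", "salience": 0.41},
    {"name": "domain name", "type": "OTHER", "salience": 0.28},
    {"name": "search", "type": "OTHER", "salience": 0.19},
    {"name": "listing", "type": "OTHER", "salience": 0.12},
    {"name": "brokerage", "type": "OTHER", "salience": 0.06},
]


def salience_rank(entities: list, target: str) -> tuple:
    """Return (1-based rank, salience) for `target`, or (None, 0.0) if absent."""
    ranked = sorted(entities, key=lambda e: e["salience"], reverse=True)
    for position, entity in enumerate(ranked, start=1):
        if entity["name"].lower() == target.lower():
            return position, entity["salience"]
    return None, 0.0


rank, salience = salience_rank(ILLUSTRATIVE_ENTITIES, "brokerage")
leader = max(ILLUSTRATIVE_ENTITIES, key=lambda e: e["salience"])
print(f"target rank: {rank}, target salience: {salience:.2f}, leader: {leader['name']}")
```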

Operational application

Run PLAIN_TEXT on hero + first 500 words before touching schema — isolate copy clarity from template noise.
If salience leaders do not match target entities, rewrite headings and first paragraphs before adding JSON-LD.
Compare PLAIN_TEXT vs HTML extraction when navigation or footer boilerplate dominates entity signals.
Use annotateText for baseline briefings; escalate to entity-only passes when salience drift is the primary failure.
Document target entities (Organization, Service, Place) and required salience ordering before implementation work.
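One way to isolate copy from template noise before a PLAIN_TEXT run is sketched below using only Python's standard html.parser. The set of skipped tags and the sample markup are assumptions; real templates may mark navigation and footers differently (for example with div elements and class names), in which case this filter would need adjusting.

```python
from html.parser import HTMLParser

# Assumption: boilerplate lives in these elements on the audited template.
SKIPPED_TAGS = {"nav", "footer", "script", "style"}


class BodyCopyExtractor(HTMLParser):
    """Collect visible text, ignoring anything nested inside SKIPPED_TAGS."""

    def __init__(self):
        super().__init__()
        self.skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIPPED_TAGS:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in SKIPPED_TAGS and self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        if self.skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def hero_plain_text(html: str, max_words: int = 500) -> str:
    """Return the first `max_words` words of body copy as plain text."""
    parser = BodyCopyExtractor()
    parser.feed(html)
    parser.close()
    words = " ".join(parser.chunks).split()
    return " ".join(words[:max_words])


page = (
    "<nav>Home Pricing Login</nav>"
    "<h1>Premium domain brokerage</h1>"
    "<p>Advisory acquisition for high-intent buyers.</p>"
    "<footer>Privacy Terms</footer>"
)
print(hero_plain_text(page))
```

Running the same page through both this filtered text and the unfiltered markup gives the PLAIN_TEXT vs HTML comparison described above.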

Additional audit patterns

AI consultancy homepage

Copy mentions “AI visibility” repeatedly but salience ranks generic “marketing agency” and “SEO services” above proprietary methodology entities.

Local service page

City name and “plumber” have low salience; “coupon,” “discount,” and “call now” dominate, so the machine reads a promotional landing page, not a local service entity.

SaaS landing page

Feature bullets drive salience toward “software” and “tool” while the product name and category entity stay below 0.08.

Common mistakes

Assuming keywords in copy guarantee entity salience in API output.
Adding Organization schema while body copy still signals a different business model (marketplace vs consultancy).
Auditing full HTML when the failure is in hero copy — boilerplate drowns signal.
Treating sentiment score as a ranking metric instead of a reputation-risk indicator.
Skipping classification — a page can have correct entities but wrong topical category.
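The classification check in the last point can be sketched as follows, assuming a classifyText-style response with a categories array of name and confidence pairs. The category paths here are placeholders, not entries from the API's real category list, and both helpers are hypothetical.

```python
def top_category(response: dict):
    """Return the highest-confidence category dict, or None if there is none."""
    categories = response.get("categories", [])
    if not categories:
        return None
    return max(categories, key=lambda c: c["confidence"])


def category_mismatch(response: dict, expected_prefix: str) -> bool:
    """True when the top category is missing or outside the expected branch."""
    top = top_category(response)
    return top is None or not top["name"].startswith(expected_prefix)


# Placeholder response: category names are invented for illustration.
sample_response = {
    "categories": [
        {"name": "/Placeholder/Marketplaces", "confidence": 0.81},
        {"name": "/Placeholder/Advisory Services", "confidence": 0.34},
    ]
}
print(category_mismatch(sample_response, "/Placeholder/Advisory"))
```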

Practical exercise

Select one money page (service, product, or homepage).
Extract PLAIN_TEXT from the hero, H1, and first two paragraphs only.
Run analyzeEntities (or annotateText with the extractEntities feature enabled).
List top 5 entities by salience. Circle any that conflict with your intended business entity.
Write one sentence: “The machine thinks this page is about ___.”
If that sentence is wrong, draft three copy changes before opening Schema Markup or Search Console.
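The exercise steps can be sketched as a small report function. The analyzer is injected as a callable so the sketch runs without credentials; in practice it would wrap a real analyzeEntities call. The names briefing and fake_analyzer are hypothetical, the scores reuse the illustrative pattern from the example earlier in this briefing, and treating every non-target entity as a conflict is a crude stand-in for the human judgment the exercise asks for.

```python
from typing import Callable, Iterable


def briefing(
    text: str,
    targets: Iterable[str],
    analyze_entities: Callable[[str], list],
) -> dict:
    """Summarize the top five entities and flag those outside the target set."""
    entities = analyze_entities(text)
    top5 = sorted(entities, key=lambda e: e["salience"], reverse=True)[:5]
    wanted = {t.lower() for t in targets}
    # Crude proxy for "circle any that conflict": anything not a target.
    conflicts = [e["name"] for e in top5 if e["name"].lower() not in wanted]
    leader = top5[0]["name"] if top5 else "nothing identifiable"
    return {
        "top5": [e["name"] for e in top5],
        "conflicts": conflicts,
        "sentence": f"The machine thinks this page is about {leader}.",
        "needs_copy_changes": leader.lower() not in wanted,
    }


def fake_analyzer(text: str) -> list:
    """Stand-in for a real API call; returns illustrative scores only."""
    return [
        {"name": "marketplace", "salience": 0.41},
        {"name": "domain name", "salience": 0.28},
        {"name": "search", "salience": 0.19},
        {"name": "listing", "salience": 0.12},
        {"name": "brokerage", "salience": 0.06},
    ]


report = briefing("Search our inventory of premium domains.", ["brokerage"], fake_analyzer)
print(report["sentence"])
```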

Briefing quiz

Five questions.

1 What does entity salience measure?

How many times a keyword appears in the document.
How central or important an entity is within the document.
The sentiment polarity of a brand mention.
Whether an entity has a Wikipedia link.

2 When should you use PLAIN_TEXT?

When you need to include full navigation and footer boilerplate.
When testing whether the copy itself communicates the intended meaning clearly.
Only when the page has no HTML version.
When uploading files to Google Cloud Storage.

3 Why can a page that says “premium domain brokerage” still be interpreted as a marketplace?

Because Google Natural Language does not read English.
Because other dominant entities like search, inventory, listings, and make offer may outweigh the brokerage entity.
Because brokerage is always classified as marketplace in the API.
Because PLAIN_TEXT mode ignores nouns.

4 Which method extracts dominant entities from a page?

analyzeSentiment
analyzeSyntax
analyzeEntities
classifyText

5 What is annotateText useful for?

Generating JSON-LD automatically.
Running multiple Natural Language analyses in one request.
Submitting URLs directly to Google Search.
Training custom LLM models on site content.
