FIELD LESSON · MACHINE UNDERSTANDING

Natural Language API Basics

Google Cloud Natural Language API is an audit instrument, not a ranking button. It shows what machines extract from the text you feed it — which may differ sharply from what humans assume the page communicates. This briefing teaches the API surface area operators need before running entity salience audits on live copy.

LESSON 1 · OPERATOR BRIEF

What this teaches

  • How to choose PLAIN_TEXT vs HTML input for copy-clarity vs rendered-page analysis
  • The six core methods: analyzeSentiment, analyzeEntities, analyzeEntitySentiment, analyzeSyntax, classifyText, annotateText
  • Why entity salience is the primary semantic SEO signal in NL output
  • How to frame audits around machine interpretation, not keyword presence

Why it matters for retrieval and machine interpretation

Retrieval systems and answer engines do not read your brand intention. They read strings, entities, and statistical patterns. If Natural Language returns the wrong dominant entities, downstream systems may classify the page under the wrong category, cite the wrong concept, or skip the page entirely during grounding. The operator question is always: What does the machine think this page is about?

CORE CONCEPTS

Entity salience

A score from 0 to 1 indicating how central an entity is to the document as a whole. High salience means the machine treats that entity as a primary subject, not a passing mention.

PLAIN_TEXT

Best for testing whether copy alone communicates intended meaning, stripped of layout and navigation noise.

HTML input

Useful when you need to analyze rendered page text as extracted from the DOM, which is closer to what some crawlers see after parsing.

content vs gcsContentUri

Pass text directly via content for audits; use gcsContentUri when analyzing files already stored in Google Cloud Storage.
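
As a minimal sketch of the two input routes, the helper below builds the Document portion of a v1 REST request body. The function name build_document and the bucket path are hypothetical; the field names type, content, and gcsContentUri follow the v1 REST Document resource. The helper enforces that exactly one source field is set.

```python
from typing import Optional

VALID_TYPES = {"PLAIN_TEXT", "HTML"}


def build_document(content: Optional[str] = None,
                   gcs_uri: Optional[str] = None,
                   doc_type: str = "PLAIN_TEXT") -> dict:
    """Return a Document payload with exactly one source field set.

    Hypothetical helper for illustration; not part of any Google client library.
    """
    if doc_type not in VALID_TYPES:
        raise ValueError(f"unsupported document type: {doc_type}")
    # Exactly one of content / gcs_uri must be provided.
    if (content is None) == (gcs_uri is None):
        raise ValueError("set exactly one of content or gcs_uri")
    document = {"type": doc_type}
    if content is not None:
        document["content"] = content
    else:
        document["gcsContentUri"] = gcs_uri
    return document


# Inline copy for a quick audit.
inline_doc = build_document(content="Search our inventory of premium domains.")

# A file already stored in Cloud Storage (hypothetical bucket and object).
stored_doc = build_document(gcs_uri="gs://example-audit-bucket/home.html",
                            doc_type="HTML")
```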

UTF-8 encoding

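Set the request's encodingType (UTF8, UTF16, or UTF32, matching how your code indexes strings) so the API returns usable beginOffset values; if it is left as NONE, offsets come back as -1. This affects entity mentions as well as syntax tokens, and matters most when auditing non-ASCII or mixed-language copy.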
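
To make the offset issue concrete, here is a small sketch. It assumes offsets reported under UTF-8 count bytes, while Python indexes strings by code point, so the two diverge as soon as the copy contains a non-ASCII character. The helper name utf8_offset_to_index and the sample string are illustrative only.

```python
def utf8_offset_to_index(text: str, byte_offset: int) -> int:
    """Convert a UTF-8 byte offset into a Python string index.

    Assumes byte_offset falls on a character boundary; an offset that lands
    in the middle of a multi-byte character raises UnicodeDecodeError.
    """
    return len(text.encode("utf-8")[:byte_offset].decode("utf-8"))


text = "Café domains"

# Where "domains" starts when counting UTF-8 bytes ("é" takes two bytes).
byte_offset = text.encode("utf-8").find(b"domains")

# The same position as a Python string index.
index = utf8_offset_to_index(text, byte_offset)

print(byte_offset, index)        # 6 5
print(text[index:index + 7])     # domains
```

Slicing the string with the raw byte offset would return "omains" here, which is the kind of silent misalignment that corrupts mention-level audits.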

annotateText

Runs multiple analyses in one request — the standard operator workflow for page-level briefings.
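
A minimal sketch of an annotateText request body, assuming the v1 REST endpoint and its features flags (extractEntities, extractDocumentSentiment, classifyText). The helper name build_annotate_request is hypothetical, and the code only builds and serializes the payload; authentication and the HTTP call are left out.

```python
import json

# v1 REST endpoint for the combined call.
ANNOTATE_URL = "https://language.googleapis.com/v1/documents:annotateText"


def build_annotate_request(text: str, doc_type: str = "PLAIN_TEXT") -> dict:
    """Build a request body that asks for several analyses at once.

    Hypothetical helper; which features to enable is an audit decision.
    """
    return {
        "document": {"type": doc_type, "content": text},
        "features": {
            "extractEntities": True,           # entities with salience
            "extractDocumentSentiment": True,  # document-level sentiment
            "classifyText": True,              # topical categories
        },
        # UTF32 is assumed here because Python indexes strings by code point.
        "encodingType": "UTF32",
    }


sample_copy = (
    "Search our inventory of premium domains. Browse listings, compare "
    "prices, and make an offer instantly."
)
body = json.dumps(build_annotate_request(sample_copy))
```

The serialized body would be sent as a POST to ANNOTATE_URL with valid credentials. Classification may reject very short inputs, so an entity-only pass is the safer choice for hero snippets.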

EXAMPLE INPUT

Example: premium domain brokerage homepage (anonymized pattern)

A boutique brokerage page targeting high-intent buyers. Human copy emphasizes “premium domain brokerage” and “advisory acquisition.”

Search our inventory of premium domains. Browse listings, compare prices, and make an offer instantly. Our marketplace connects buyers and sellers with thousands of verified domain names. Filter by category, TLD, and price. Start your domain search today.

MACHINE SIGNAL

Illustrative analyzeEntities output (pattern — not a live API dump)

Scores are representative of audit patterns NRLC has observed on similar page types.

Entity        Type    Salience
marketplace   OTHER   0.41
domain name   OTHER   0.28
search        OTHER   0.19
listing       OTHER   0.12
brokerage     OTHER   0.06

Despite the human intent (“brokerage”), machine salience ranks marketplace, domain name, search, and listing above brokerage. To the API, the page reads as a marketplace, not a high-touch advisory service.
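
The same reading can be reproduced in code. The sketch below takes a dictionary shaped like an analyzeEntities response, filled with the illustrative values from the table above (not live API output), sorts it by salience, and reports where the intended entity lands. The helper names are hypothetical.

```python
from typing import Optional

# Illustrative values copied from the table above, deliberately unsorted.
response = {
    "entities": [
        {"name": "search", "type": "OTHER", "salience": 0.19},
        {"name": "brokerage", "type": "OTHER", "salience": 0.06},
        {"name": "marketplace", "type": "OTHER", "salience": 0.41},
        {"name": "listing", "type": "OTHER", "salience": 0.12},
        {"name": "domain name", "type": "OTHER", "salience": 0.28},
    ]
}


def rank_by_salience(resp: dict) -> list:
    """Return entities sorted from most to least salient."""
    return sorted(resp.get("entities", []),
                  key=lambda e: e.get("salience", 0.0),
                  reverse=True)


def salience_rank(resp: dict, name: str) -> Optional[int]:
    """Return the 1-based salience rank of an entity, or None if absent."""
    for position, entity in enumerate(rank_by_salience(resp), start=1):
        if entity["name"].lower() == name.lower():
            return position
    return None


leader = rank_by_salience(response)[0]["name"]
target_rank = salience_rank(response, "brokerage")
print(leader, target_rank)  # marketplace 5
```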

Operational application

  • Run PLAIN_TEXT on hero + first 500 words before touching schema — isolate copy clarity from template noise.
  • If salience leaders do not match target entities, rewrite headings and first paragraphs before adding JSON-LD.
  • Compare PLAIN_TEXT vs HTML extraction when navigation or footer boilerplate dominates entity signals.
  • Use annotateText for baseline briefings; escalate to entity-only passes when salience drift is the primary failure.
  • Document target entities (Organization, Service, Place) and required salience ordering before implementation work.
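
One way to make the documented ordering checkable is sketched below. It assumes the target entities are written down as a list from most to least important, and that entity results are available as name and salience pairs. The function name check_salience_order is hypothetical, and the sample scores are the illustrative ones from the brokerage example.

```python
def check_salience_order(entities: list, required_order: list) -> list:
    """Return human-readable problems; an empty list means the order holds.

    If an entity name appears more than once, the last score wins.
    """
    scores = {e["name"].lower(): e["salience"] for e in entities}
    problems = [f"missing: {name}" for name in required_order
                if name.lower() not in scores]
    present = [name for name in required_order if name.lower() in scores]
    # Each target must strictly outrank the next one in the documented order.
    for higher, lower in zip(present, present[1:]):
        if scores[higher.lower()] <= scores[lower.lower()]:
            problems.append(f"{higher} should outrank {lower}")
    return problems


observed = [
    {"name": "marketplace", "salience": 0.41},
    {"name": "domain name", "salience": 0.28},
    {"name": "brokerage", "salience": 0.06},
]

problems = check_salience_order(observed, ["brokerage", "advisory", "domain name"])
for line in problems:
    print(line)
```

A non-empty result is the signal to rewrite headings and opening paragraphs before any schema work begins.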

Additional audit patterns

AI consultancy homepage

Copy mentions “AI visibility” repeatedly but salience ranks generic “marketing agency” and “SEO services” above proprietary methodology entities.

Local service page

The city name and “plumber” score low on salience; “coupon,” “discount,” and “call now” dominate, so the machine reads a promotional landing page, not a local service entity.

SaaS landing page

Feature bullets drive salience toward “software” and “tool” while the product name and category entity stay below 0.08.

Common mistakes

  • Assuming keywords in copy guarantee entity salience in API output.
  • Adding Organization schema while body copy still signals a different business model (marketplace vs consultancy).
  • Auditing full HTML when the failure is in hero copy — boilerplate drowns signal.
  • Treating sentiment score as a ranking metric instead of a reputation-risk indicator.
  • Skipping classification — a page can have correct entities but wrong topical category.

PRACTICAL EXERCISE

  1. Select one money page (service, product, or homepage).
  2. Extract PLAIN_TEXT from the hero, H1, and first two paragraphs only.
  3. Run analyzeEntities (or annotateText with the extractEntities feature enabled).
  4. List top 5 entities by salience. Circle any that conflict with your intended business entity.
  5. Write one sentence: “The machine thinks this page is about ___.”
  6. If that sentence is wrong, draft three copy changes before opening Schema Markup or Search Console.
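
Step 2 can be sketched with the standard library alone. The extractor below keeps the H1 and the first two paragraphs it encounters and ignores everything else. It assumes the hero paragraphs are the first p elements in the markup, which will not hold for every template, and the class name HeroTextExtractor and the sample HTML are hypothetical.

```python
from html.parser import HTMLParser


class HeroTextExtractor(HTMLParser):
    """Collect text from <h1> and the first few <p> elements."""

    def __init__(self, max_paragraphs: int = 2):
        super().__init__()
        self.max_paragraphs = max_paragraphs
        self.blocks = []       # extracted text blocks, in document order
        self._tag = None       # the h1/p element currently being captured
        self._buffer = []
        self._paragraphs = 0

    def handle_starttag(self, tag, attrs):
        wanted_p = tag == "p" and self._paragraphs < self.max_paragraphs
        if tag == "h1" or wanted_p:
            self._tag = tag
            self._buffer = []

    def handle_data(self, data):
        # Inline children such as <strong> still contribute their text.
        if self._tag is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == self._tag:
            text = " ".join("".join(self._buffer).split())
            if text:
                self.blocks.append(text)
                if tag == "p":
                    self._paragraphs += 1
            self._tag = None


sample_html = """<nav><a href="/">Home</a></nav>
<h1>Premium domain brokerage</h1>
<p>Search our <strong>inventory</strong> of premium domains.</p>
<p>Browse listings, compare prices, and make an offer instantly.</p>
<p>Start your domain search today.</p>"""

parser = HeroTextExtractor()
parser.feed(sample_html)
parser.close()
hero_text = "\n".join(parser.blocks)
```

The resulting hero_text is what would be submitted as PLAIN_TEXT content in step 3.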

QUIZ

Briefing quiz

Five questions.

  1. What does entity salience measure?
  2. When should you use PLAIN_TEXT?
  3. Why can a page that says “premium domain brokerage” still be interpreted as a marketplace?
  4. Which method extracts dominant entities from a page?
  5. What is annotateText useful for?
