Technical Checks
Technical Checks evaluate the foundational infrastructure of your website that determines whether AI systems can access, crawl, and understand your content. Even the best-optimized content is invisible to AI if technical barriers prevent crawlers from reaching it.
These checks focus on accessibility, security, and AI-specific standards that are becoming increasingly important as AI-powered search grows.
HTTPS
What Is Checked
Whether your website is served over a secure HTTPS connection with a valid SSL/TLS certificate.
Why It Matters for GEO
HTTPS is a baseline trust signal for both traditional search engines and AI systems. AI crawlers may deprioritize or skip websites served over insecure HTTP connections. A valid SSL certificate confirms that the website is legitimate and that data transmitted between the server and client is encrypted. Without HTTPS, AI systems may flag your content as less trustworthy.
Possible Results
| Status | Meaning |
|---|---|
| 🟢 Pass | Site is served over HTTPS with a valid certificate |
| 🔴 Fail | Site uses HTTP or has an invalid/expired certificate |
How to Fix
- Obtain an SSL/TLS certificate (free via Let's Encrypt or through your hosting provider)
- Configure your web server to redirect all HTTP traffic to HTTPS
- Update all internal links and resources to use HTTPS URLs
- Ensure your certificate is set to auto-renew
- Test your SSL configuration using SSL Labs
robots.txt — AI Crawler Access
What Is Checked
Your robots.txt file is analyzed for rules that affect 6 major AI crawlers:
| Crawler | Operator | Purpose |
|---|---|---|
| GPTBot | OpenAI | Training data and browsing for ChatGPT |
| ClaudeBot | Anthropic | Training data for Claude |
| Google-Extended | Google | Training data for Gemini and AI features |
| Amazonbot | Amazon | Training data for Alexa and AI services |
| FacebookBot | Meta | Content understanding for Meta AI |
| Bytespider | ByteDance | Training data for TikTok and AI services |
Why It Matters for GEO
Your robots.txt file is the primary mechanism for controlling which AI systems can access your content. Blocking AI crawlers means your content will not be included in the AI's knowledge base — making it impossible for those AI systems to recommend or cite your business. This is the most critical technical check for GEO: if AI cannot crawl your site, nothing else matters.
Possible Results
| Status | Meaning |
|---|---|
| 🟢 All allowed | All 6 AI crawlers have access |
| 🟡 Partial | Some crawlers are blocked, others are allowed |
| 🔴 Blocked | Most or all AI crawlers are blocked |
How to Fix
Review your robots.txt file (located at `https://yourdomain.com/robots.txt`) and ensure it does not block the AI crawlers you want to grant access. A GEO-friendly robots.txt looks like:
```
User-agent: *
Allow: /

# Explicitly allow AI crawlers
User-agent: GPTBot
Allow: /

User-agent: ClaudeBot
Allow: /

User-agent: Google-Extended
Allow: /

User-agent: Amazonbot
Allow: /

User-agent: FacebookBot
Allow: /

User-agent: Bytespider
Allow: /
```
Important: Some CMS platforms and security plugins add blanket bot-blocking rules. Check whether your CMS has added restrictive rules without your knowledge.
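To confirm what your rules actually permit, a short sketch with the standard library's `urllib.robotparser` can report access for the six crawlers in the table above. The crawler list and test URL here are illustrative; for a live check, fetch your own robots.txt content first:

```python
from urllib.robotparser import RobotFileParser

# The six AI crawlers evaluated by this check
AI_CRAWLERS = ["GPTBot", "ClaudeBot", "Google-Extended",
               "Amazonbot", "FacebookBot", "Bytespider"]

def crawler_access(robots_txt: str, url: str = "https://www.example.com/") -> dict[str, bool]:
    """Map each AI crawler to whether it may fetch `url` under these rules."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {bot: parser.can_fetch(bot, url) for bot in AI_CRAWLERS}
```

Running this against your own robots.txt content quickly surfaces blanket `Disallow: /` rules that a security plugin may have added silently.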
llms.txt
What Is Checked
Whether your website hosts a valid llms.txt file — a new standard for providing AI systems with a structured, human- and machine-readable summary of your website.
Why It Matters for GEO
The llms.txt file is an emerging standard designed specifically for the AI era. It provides language models with a concise overview of your website's content, purpose, and structure — acting as a "cover letter" for AI systems. While still new, early adoption signals AI-awareness and ensures that AI systems that support this standard can quickly understand your site.
Possible Results
| Status | Meaning |
|---|---|
| 🟢 Pass | Valid llms.txt file found at /llms.txt |
| 🟡 Partial | File exists but has formatting or content issues |
| 🔴 Fail | No llms.txt file found |
How to Fix
Create an llms.txt file in your website's root directory. The file should follow the emerging standard format:
```markdown
# Your Company Name

> Brief one-line description of your company or website.

## About

A paragraph describing your organization, what you do, and who you serve.

## Services

- Service 1: Brief description
- Service 2: Brief description
- Service 3: Brief description

## Key Pages

- [About Us](https://www.example.com/about): Learn about our company
- [Services](https://www.example.com/services): Our full service offering
- [Blog](https://www.example.com/blog): Industry insights and guides
- [Contact](https://www.example.com/contact): Get in touch
```
Best practices for llms.txt:
- Keep it concise — this is a summary, not a full sitemap
- Include your most important pages with brief descriptions
- Update it when you add major new sections or pages
- Use clear, descriptive language that AI can parse easily
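Because llms.txt is still an emerging, informally specified format, there is no official validator; a minimal sketch like the one below can at least check for the commonly described shape (an H1 title, a blockquote summary, and H2 sections). The specific checks are assumptions based on the example format above:

```python
def check_llms_txt(text: str) -> list[str]:
    """Return a list of problems with an llms.txt document; empty means it looks OK."""
    issues = []
    lines = [line for line in text.splitlines() if line.strip()]
    if not lines or not lines[0].startswith("# "):
        issues.append("first line should be an H1 with the site or company name")
    if not any(line.startswith("> ") for line in lines[:3]):
        issues.append("missing '> ' blockquote summary near the top")
    if not any(line.startswith("## ") for line in lines):
        issues.append("no '## ' sections found")
    return issues
```

Running this in a deploy pipeline helps catch a truncated or mis-edited llms.txt before it ships.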
sitemap.xml
What Is Checked
Whether your website has a valid sitemap.xml file and how many URLs it contains.
Why It Matters for GEO
A sitemap tells AI crawlers about all the pages on your website and when they were last updated. Without a sitemap, AI crawlers must discover pages by following links — potentially missing important content that is poorly linked. The sitemap also communicates update frequency, helping AI systems prioritize crawling recently changed content.
Possible Results
| Status | Meaning |
|---|---|
| 🟢 Pass | Valid sitemap found with URLs listed |
| 🟡 Partial | Sitemap exists but has issues (empty, malformed, or very few URLs) |
| 🔴 Fail | No sitemap found |
How to Fix
- Generate a sitemap using your CMS (most CMS platforms have built-in sitemap generation)
- Ensure the sitemap includes all important pages (not just blog posts)
- Add `<lastmod>` dates to help crawlers identify recently updated content
- Reference the sitemap in your robots.txt: `Sitemap: https://www.example.com/sitemap.xml`
- Keep the sitemap under 50,000 URLs (use sitemap index files for larger sites)
- Exclude pages you do not want indexed (e.g., admin pages, duplicate content)
- Validate your sitemap using an XML validator
IndexNow
What Is Checked
Whether your website supports the IndexNow protocol for instant URL submission to search engines.
Why It Matters for GEO
IndexNow allows you to proactively notify search engines (including Bing, Yandex, and participating AI systems) when content is created or updated. Instead of waiting for crawlers to discover changes, IndexNow pushes updates instantly. This is particularly valuable for GEO because it ensures AI systems have access to your latest content as quickly as possible.
Possible Results
| Status | Meaning |
|---|---|
| 🟢 Pass | IndexNow support detected (API key or integration found) |
| 🔴 Fail | No IndexNow support detected |
How to Fix
- Register for an IndexNow API key at indexnow.org
- Place the API key file in your website's root directory
- Integrate IndexNow into your CMS or publishing workflow to automatically submit URLs when content changes
- Many CMS platforms offer IndexNow plugins (e.g., for WordPress, use the IndexNow plugin)
- For custom implementations, submit a POST request to the IndexNow API whenever you publish or update a page:
```shell
curl -X POST "https://api.indexnow.org/indexnow" \
  -H "Content-Type: application/json" \
  -d '{
        "host": "www.example.com",
        "key": "your-api-key",
        "urlList": [
          "https://www.example.com/updated-page"
        ]
      }'
```
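The same submission can be scripted from a publishing workflow. This sketch builds (but does not send) the request corresponding to the curl example; host, key, and URL values are placeholders, and actually submitting via `urllib.request.urlopen(request)` requires network access:

```python
import json
import urllib.request

def build_indexnow_request(host: str, key: str, urls: list[str],
                           endpoint: str = "https://api.indexnow.org/indexnow") -> urllib.request.Request:
    """Build the IndexNow POST request for a batch of changed URLs."""
    payload = json.dumps({"host": host, "key": key, "urlList": urls}).encode("utf-8")
    return urllib.request.Request(
        endpoint,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

# To submit: urllib.request.urlopen(build_indexnow_request(...))  -- network required
```

Hooking this into your CMS's post-publish event means AI-facing search indexes learn about new content within seconds rather than waiting for the next crawl.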
AI Crawler Access — Overall Assessment
What Is Checked
An overall assessment of how accessible your website is to AI crawlers, combining results from the robots.txt check, HTTPS status, sitemap availability, and other technical factors.
Why It Matters for GEO
This is the summary check that tells you whether AI systems can effectively discover and process your content. Even if your content is perfectly optimized for GEO, technical barriers at the crawler level nullify all that effort. This assessment provides a holistic view of your technical GEO readiness.
Possible Results
| Status | Meaning |
|---|---|
| 🟢 Fully accessible | AI crawlers can discover and access all content |
| 🟡 Partially accessible | Some barriers exist that may limit AI crawler access |
| 🔴 Largely inaccessible | Significant technical barriers preventing AI access |
How to Fix
Address the individual technical checks above in this priority order:
1. HTTPS — Fix certificate issues first (foundational trust requirement)
2. robots.txt — Unblock AI crawlers (most common barrier)
3. sitemap.xml — Create or fix your sitemap (ensures content discovery)
4. llms.txt — Add an llms.txt file (emerging best practice)
5. IndexNow — Implement IndexNow (proactive indexing)
General Technical Best Practices for GEO
Page Speed
While not a scored check, page speed affects crawler behavior. Slow-loading pages may be abandoned by crawlers with time limits. Aim for a page load time under 3 seconds.
Canonical Tags
Use canonical tags to indicate the preferred version of duplicate or similar pages. This prevents AI from being confused by multiple versions of the same content.
Structured Data
Ensure your JSON-LD Schema.org markup is valid and free of errors. Malformed structured data is worse than no structured data — it can mislead AI about your content.
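A basic sanity check for JSON-LD can run in CI: extract each `application/ld+json` script block and confirm it parses and declares an `@context`. This is a minimal sketch using a regex — a production validator should use a real HTML parser and a full Schema.org checker:

```python
import json
import re

# Matches <script type="application/ld+json"> ... </script> blocks
LDJSON_RE = re.compile(
    r'<script[^>]*type=["\']application/ld\+json["\'][^>]*>(.*?)</script>',
    re.IGNORECASE | re.DOTALL,
)

def jsonld_errors(html: str) -> list[str]:
    """Report JSON-LD blocks that fail to parse or lack an @context."""
    errors = []
    for i, match in enumerate(LDJSON_RE.finditer(html)):
        try:
            data = json.loads(match.group(1))
        except json.JSONDecodeError as exc:
            errors.append(f"block {i}: invalid JSON ({exc.msg})")
            continue
        if isinstance(data, dict) and "@context" not in data:
            errors.append(f"block {i}: missing @context")
    return errors
```

Catching a stray trailing comma or unescaped quote this way is exactly the kind of malformed markup that would otherwise mislead AI systems about your content.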
Crawl Budget
If your site has thousands of pages, be strategic about which pages are accessible. Use robots.txt and noindex tags to direct crawlers toward your most important content.
Server Reliability
Ensure your server responds consistently with proper HTTP status codes. Frequent 500 errors or timeouts will cause AI crawlers to reduce their crawl frequency for your site.