Showing posts with label HTML. Show all posts
Showing posts with label HTML. Show all posts

11 May 2026

🤖〽️Prompt Engineering: Copilot Unabridged (Part 185: How Input Sanitization Can Be Used to Counter AI Invisible Prompt Injection)

Prompt Engineering Series
Prompt Engineering Series

Prompt: "write a post of 600 words on how input sanitization can be used AI invisible prompt injection"

Introduction

Invisible prompt injection is one of the most subtle and disruptive vulnerabilities in modern AI systems. It exploits the fact that large language models treat nearly all incoming text as potentially meaningful instructions. When hidden commands are embedded inside documents, images, or metadata, the model may follow them without the user ever noticing. This creates a dangerous gap between what the user thinks they are asking and what the AI is actually responding to. Among the available defenses, input sanitization stands out as one of the most practical and foundational. It does not solve the problem entirely, but it dramatically reduces the attack surface by filtering, normalizing, and constraining the content that reaches the model’s interpretive layer.

The first way input sanitization helps is by removing hidden characters and invisible control sequences. Many prompt injection attacks rely on zero‑width characters, Unicode tricks, or formatting markers that humans cannot see but the model interprets as part of the prompt. These characters can smuggle instructions into otherwise harmless text. Sanitization routines that strip or normalize these characters prevent the model from reading them as meaningful input. This is similar to how web applications sanitize user input to prevent hidden SQL commands from being executed. By reducing the 'invisible' portion of the input, sanitization makes it harder for attackers to hide instructions in plain sight.

A second benefit comes from filtering out hidden markup and metadata. Invisible prompt injection often hides inside HTML comments, alt‑text, EXIF metadata, or other fields that users rarely inspect. When an AI system ingests a webpage, document, or image, it may treat these hidden fields as part of the prompt. Sanitization can remove or neutralize these elements before they reach the model. For example, stripping HTML tags, flattening markup, or removing metadata ensures that only the visible, user‑intended content is passed to the AI. This prevents attackers from embedding instructions in places that humans cannot easily detect.

Another important role of sanitization is normalizing the structure of the input. Many prompt injection attacks rely on breaking the expected structure of the prompt - introducing unexpected delimiters, injecting new instruction blocks, or manipulating formatting to confuse the model. Sanitization can enforce a consistent structure by collapsing whitespace, removing unusual delimiters, or reformatting the input into a predictable template. This reduces the model’s exposure to structural manipulation and makes it harder for attackers to smuggle in new instruction boundaries.

Input sanitization also supports context isolation, a broader architectural strategy. By sanitizing external content before it is combined with user instructions, systems can ensure that only the user’s explicit prompt influences the model’s behavior. For example, if a user uploads a document for summarization, sanitization can remove any embedded instructions before the document is passed to the model. This prevents the document from overriding the user’s intent. Sanitization becomes a gatekeeper that separates trusted instructions from untrusted content.

A further advantage is reducing ambiguity, which is often exploited in invisible prompt injection. When input is messy, inconsistent, or contains mixed signals, the model may latch onto the wrong part of the text. Sanitization that clarifies formatting, removes noise, and enforces consistency helps the model focus on the intended content rather than on accidental or malicious artifacts. Cleaner input leads to more predictable behavior.

Finally, input sanitization is valuable because it is scalable and proactive. It does not require detecting every possible attack pattern. Instead, it reduces the overall complexity of the input space, making it harder for attackers to exploit obscure or unexpected pathways. While sanitization cannot eliminate invisible prompt injection entirely, it forms a crucial first line of defense - one that strengthens other safeguards such as retrieval grounding, context isolation, and self‑critique mechanisms.

Invisible prompt injection is a structural challenge, but input sanitization offers a practical, effective way to reduce its impact. By filtering, normalizing, and constraining the content that reaches AI systems, we can build more resilient models that remain aligned with user intent - even when confronted with hidden manipulation.

Disclaimer: The whole text was generated by Copilot (under Windows 11) at the first attempt. This is just an experiment to evaluate feature's ability to answer standard general questions, independently on whether they are correctly or incorrectly posed. Moreover, the answers may reflect hallucinations and other types of inconsistent or incorrect reasoning.

Previous Post <<||>> Next Post

13 January 2021

🔏MS Office: Excel for SQL Developers V (Formatting Output to HTML)

Some years back I found a tool to format the SQL and VB.Net code in posts (see hilite.me), tool which made blogging much easier, as I didn't had to format the code manually myself. However, showing the output of the queries into blog posts resumed mainly to making screenshots, which is unproductive and wastes space from my quota. Therefore, today I took the time to create a small Excel macro which allows formatting an MS Excel range to a HTML table. The macro is pretty basic, looping though range's cells:

'formats a range as html table
Public Function GetTable(rng As Range)
  Dim retval As String
  Dim i, j As Long
  
  retval = "<table style=""width: 90%;color:black;border-color:black;font-size:10px;"" border=""0"" cellpadding=""1"">"
  
  For i = 1 To rng.Rows.Count()
    retval = retval & "<tr style=""background-color:" & IIf(i = 1, "#b0c4de", "white") & ";font-weight:" & IIf(i = 1, "bold", "normal") & """>"
    
    For j = 1 To rng.Columns.Count()
       retval = retval & "<td align=""" & IIf(IsNumeric(rng.Cells(i, j)), "right", "left") & """>" & rng.Cells(i, j) & "</td>"
    Next
    
    retval = retval & "</tr>" & vbCrLf
  Next
  
  retval = retval & "</table>"
  
  GetTable = retval
End Function

Just copy the GetTable macro into a new module in Excel and provide the range with data as parameter.

 Unfortunately, copying macro's output to a text file introduces two double quotes where just one was supposed to be:

This requires as intermediary step to replace the two double quotes with one in Notepad (e.g. via the Replace functionality), respectively to remove manually the first and last double quotes. 

Notes:
1. Feel free to use and improve the macro. 
2. Further formatting can be added afterwards as seems fit. 

Happy coding!

29 August 2009

🛢DBMS: Extensible Markup Language (Definitions)

"A standard for a markup language, similar to HTML, that allows tags to be defined to describe any kind of data you have, making it very popular as a format for data feeds." (Mike Moran & Bill Hunt , "Search Engine Marketing, Inc", 2005)

"Facilitates the assignment of meaningful structures and definitions of data and services for use by multiple systems. XML simplifies the ability to transmit and share data." (Jill Dyché & Evan Levy, "Customer Data Integration: Reaching a Single Version of the Truth", 2006)

"Simple and flexible text format used to represent data. XML was designed by the World Wide Web Consortium (W3C)." (Sara Morganand & Tobias Thernstrom , "MCITP Self-Paced Training Kit : Designing and Optimizing Data Access by Using Microsoft SQL Server 2005 - Exam 70-442", 2007)

"separates content from format, thus letting the browser decide how and where content gets displayed. XML is not a language, but a system for defining other languages so that they understand their vocabulary." (Craig F Smith & H Peter Alesso, "Thinking on the Web: Berners-Lee, Gödel and Turing", 2008)

"A platform-independent markup language for specifying the structure of data in a text document used for both data storage and the transfer of data." (Jan L Harrington, "Relational Database Design and Implementation" 3rd Ed., 2009)

"A way of representing data and data relationships in text files, typically for data exchange between software of different types." (Jan L Harrington, "SQL Clearly Explained" 3rd Ed. , 2010)

"A metalanguage used to represent and manipulate data elements. Unlike other markup languages, XML permits the manipulation of a document’s data elements. XML is designed to facilitate the exchange of structured documents such as orders and invoices over the Internet." (Carlos Coronel et al, "Database Systems: Design, Implementation, and Management" 9th Ed., 2011)

"A specification for creating text files that contain hierarchical data." (Rod Stephens, "Start Here!™ Fundamentals of Microsoft® .NET Programming", 2011)

"Has been created to overcome some difficulties proper to HTML (Hypertext Markup Language) that – developed as a means for instructing the Web browsers how to display a given Web page – is a ‘presentation-oriented’ markup tool. XML is called ‘extensible’ because, at the difference of HTML, is not characterized by a fixed format, but it lets the user design its own customized markup languages (using, e.g., a specific DTD, Document Type Description) for limitless different types of documents; XML is then a ‘content-oriented’ markup tool." (Gian P Zarri, "RDF and OWL for Knowledge Management", 2011)

"A set of rules for encoding documents electronically. XML was chosen as the standard message format because of its widespread use and open source development efforts." (Mike Harwood, "Internet Security: How to Defend Against Attackers on the Web" 2nd Ed., 2015)

"A standard metalanguage for defining markup languages that is based on Standard Generalized Markup Language (SGML)." (Sybase, "Open Server Server-Library/C Reference Manual", 2019)

"Extensible markup language (XML) is a simple, very flexible text format derived from SGML (standard generalized markup language). While XML was originally designed to meet the challenges of large-scale electronic publishing, it plays an increasingly significant role in the exchange of a wide variety of data on the web." (Kamalendu Pal, "Integrating Heterogeneous Enterprise Data Using Ontology in Supply Chain Management", 2019)

"A universal markup language for text and data, using nested tags to add structure and meta-information to the content." (Daniel Leuck et al, "Learning Java" 5th Ed., 2020)

"A 'best practices' subset of SGML that has been designed by the Worldwide Web Consortium (W3C) for use on the Internet." (Microfocus)

"A notation in which you describe the structure of information in a text document by enclosing information in user-defined tags that define the syntactic elements. A flexible way to create common information formats and share both the format and the data on the World Wide Web, intranets, and elsewhere. J2EE deployment descriptors are expressed in XML." (Microfocus)

Related Posts Plugin for WordPress, Blogger...

About Me

My photo
Koeln, NRW, Germany
IT Professional with more than 25 years experience in IT in the area of full life-cycle of Web/Desktop/Database Applications Development, Software Engineering, Consultancy, Data Management, Data Quality, Data Migrations, Reporting, ERP implementations & support, Team/Project/IT Management, etc.