Initial commit

This commit is contained in:
2025-03-07 19:22:02 +01:00
commit 4a98255d83
55743 changed files with 5280367 additions and 0 deletions
+22
@@ -0,0 +1,22 @@
Copyright (c) 2010-2019 Juriy "kangax" Zaytsev
Permission is hereby granted, free of charge, to any person
obtaining a copy of this software and associated documentation
files (the "Software"), to deal in the Software without
restriction, including without limitation the rights to use,
copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the
Software is furnished to do so, subject to the following
conditions:
The above copyright notice and this permission notice shall be
included in all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES
OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT
HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY,
WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
OTHER DEALINGS IN THE SOFTWARE.
+168
@@ -0,0 +1,168 @@
# HTMLMinifier
[![NPM version](https://img.shields.io/npm/v/html-minifier-terser.svg)](https://www.npmjs.com/package/html-minifier-terser)
[![Build Status](https://github.com/terser/html-minifier-terser/workflows/CI/badge.svg)](https://github.com/terser/html-minifier-terser/actions?workflow=CI)
[HTMLMinifier](https://terser.org/html-minifier-terser/) is a highly **configurable**, **well-tested**, JavaScript-based HTML minifier.
## Installation
From NPM for use as a command line app:
```shell
npm install html-minifier-terser -g
```
From NPM for programmatic use:
```shell
npm install html-minifier-terser
```
## Usage
**Note** that almost all options are disabled by default. Experiment and find what works best for you and your project.
For command line usage please see `html-minifier-terser --help` for a list of available options.
**Sample command line:**
```bash
html-minifier-terser --collapse-whitespace --remove-comments --minify-js true
```
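The flag names follow from the option names: the CLI converts each camelCase option to kebab-case, and the two options that default to `true` (`html5` and `includeAutoGeneratedTags`) are exposed as `--no-` flags. The sketch below illustrates that mapping. `toFlag` and `DEFAULT_ON` are hypothetical names for this example only; the actual CLI relies on the `param-case` package and special-cases `minifyURLs`.

```javascript
// Hypothetical helper: converts a camelCase option name to a CLI flag.
// Simplified stand-in for the param-case package used by the real CLI.
const DEFAULT_ON = ['html5', 'includeAutoGeneratedTags'];

function toFlag(optionName) {
  const kebab = optionName
    .replace(/([a-z0-9])([A-Z])/g, '$1-$2') // insert "-" at lower-to-upper boundaries
    .toLowerCase();
  // Options that are enabled by default can only be switched off.
  return (DEFAULT_ON.includes(optionName) ? '--no-' : '--') + kebab;
}

console.log(toFlag('collapseWhitespace')); // --collapse-whitespace
console.log(toFlag('minifyJS'));           // --minify-js
console.log(toFlag('minifyURLs'));         // --minify-urls
console.log(toFlag('html5'));              // --no-html5
```

When in doubt, `html-minifier-terser --help` remains the authoritative list of flags.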
### Node.js
```js
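const { minify } = require('html-minifier-terser');

// `minify` returns a promise; top-level `await` is not available in CommonJS,
// so the call is wrapped in an async function.
(async () => {
  const result = await minify('<p title="blah" id="moo">foo</p>', {
    removeAttributeQuotes: true,
  });
  console.log(result); // '<p title=blah id=moo>foo</p>'
})();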
```
See [corresponding blog post](http://perfectionkills.com/experimenting-with-html-minifier) for all the gory details of [how it works](http://perfectionkills.com/experimenting-with-html-minifier#how_it_works), [description of each option](http://perfectionkills.com/experimenting-with-html-minifier#options), [testing results](http://perfectionkills.com/experimenting-with-html-minifier#field_testing) and [conclusions](http://perfectionkills.com/experimenting-with-html-minifier#cost_and_benefits).
Also see corresponding [Ruby wrapper](https://github.com/stereobooster/html_minifier), and for Node.js, [Grunt plugin](https://github.com/gruntjs/grunt-contrib-htmlmin), [Gulp plugin](https://github.com/pioug/gulp-html-minifier-terser), [Koa middleware wrapper](https://github.com/koajs/html-minifier) and [Express middleware wrapper](https://github.com/melonmanchan/express-minify-html).
For lint-like capabilities take a look at [HTMLLint](https://github.com/kangax/html-lint).
## Minification comparison
How does HTMLMinifier compare to other solutions — [HTML Minifier from Will Peavy](http://www.willpeavy.com/minifier/) (1st result in [Google search for "html minifier"](https://www.google.com/#q=html+minifier)) as well as [htmlcompressor.com](http://htmlcompressor.com) and [minimize](https://github.com/Swaagie/minimize)?
| Site | Original size *(KB)* | HTMLMinifier | minimize | Will Peavy | htmlcompressor.com |
| ---------------------------------------------------------------------------- |:--------------------:| ------------:| --------:| ----------:| ------------------:|
| [Google](https://www.google.com/) | 52 | **48** | 52 | 54 | n/a |
| [Stack Overflow](https://stackoverflow.com/) | 177 | **143** | 154 | 154 | n/a |
| [HTMLMinifier](https://github.com/kangax/html-minifier) | 252 | **171** | 230 | 250 | n/a |
| [Bootstrap CSS](https://getbootstrap.com/docs/3.3/css/) | 271 | **260** | 269 | 229 | n/a |
| [BBC](https://www.bbc.co.uk/) | 355 | **324** | 353 | 344 | n/a |
| [Amazon](https://www.amazon.co.uk/) | 466 | **430** | 456 | 474 | n/a |
| [Twitter](https://twitter.com/) | 469 | **394** | 462 | 513 | n/a |
| [Wikipedia](https://en.wikipedia.org/wiki/President_of_the_United_States) | 703 | **569** | 682 | 708 | n/a |
| [Eloquent Javascript](https://eloquentjavascript.net/1st_edition/print.html) | 870 | **815** | 840 | 864 | n/a |
| [NBC](https://www.nbc.com/) | 1701 | **1566** | 1689 | 1705 | n/a |
| [New York Times](https://www.nytimes.com/) | 1731 | **1583** | 1726 | 1680 | n/a |
| [ES draft](https://tc39.github.io/ecma262/) | 6296 | **5538** | 5733 | n/a | n/a |
## Options Quick Reference
Most of the options are disabled by default.
| Option | Description | Default |
|--------------------------------|-----------------|---------|
| `caseSensitive` | Treat attributes in a case-sensitive manner (useful for custom HTML tags) | `false` |
| `collapseBooleanAttributes` | [Omit attribute values from boolean attributes](http://perfectionkills.com/experimenting-with-html-minifier#collapse_boolean_attributes) | `false` |
| `collapseInlineTagWhitespace` | Don't leave any spaces between `display:inline;` elements when collapsing. Must be used in conjunction with `collapseWhitespace=true` | `false` |
| `collapseWhitespace` | [Collapse white space that contributes to text nodes in a document tree](http://perfectionkills.com/experimenting-with-html-minifier#collapse_whitespace) | `false` |
| `conservativeCollapse` | Always collapse to 1 space (never remove it entirely). Must be used in conjunction with `collapseWhitespace=true` | `false` |
| `continueOnParseError` | [Handle parse errors](https://html.spec.whatwg.org/multipage/parsing.html#parse-errors) instead of aborting. | `false` |
| `customAttrAssign` | Array of regexes that enable support for custom attribute assignment expressions (e.g. `'<div flex?="{{mode != cover}}"></div>'`) | `[ ]` |
| `customAttrCollapse` | Regex that specifies which custom attributes to strip newlines from (e.g. `/ng-class/`) | |
| `customAttrSurround` | Array of regexes that enable support for custom attribute surround expressions (e.g. `<input {{#if value}}checked="checked"{{/if}}>`) | `[ ]` |
| `customEventAttributes` | Array of regexes that enable support for custom event attributes for `minifyJS` (e.g. `ng-click`) | `[ /^on[a-z]{3,}$/ ]` |
| `decodeEntities` | Use direct Unicode characters whenever possible | `false` |
| `html5` | Parse input according to HTML5 specifications | `true` |
| `ignoreCustomComments` | Array of regexes; comments matching any of them are ignored | `[ /^!/, /^\s*#/ ]` |
| `ignoreCustomFragments` | Array of regexes; fragments matching any of them are ignored (e.g. `<?php ... ?>`, `{{ ... }}`, etc.) | `[ /<%[\s\S]*?%>/, /<\?[\s\S]*?\?>/ ]` |
| `includeAutoGeneratedTags` | Insert tags generated by HTML parser | `true` |
| `keepClosingSlash` | Keep the trailing slash on singleton elements | `false` |
| `maxLineLength` | Specify a maximum line length. Compressed output will be split by newlines at valid HTML split-points | |
| `minifyCSS` | Minify CSS in style elements and style attributes (uses [clean-css](https://github.com/jakubpawlowicz/clean-css)) | `false` (could be `true`, `Object`, `Function(text, type)`) |
| `minifyJS` | Minify JavaScript in script elements and event attributes (uses [Terser](https://github.com/terser/terser)) | `false` (could be `true`, `Object`, `Function(text, inline)`) |
| `minifyURLs` | Minify URLs in various attributes (uses [relateurl](https://github.com/stevenvachon/relateurl)) | `false` (could be `String`, `Object`, `Function(text)`) |
| `noNewlinesBeforeTagClose` | Never add a newline before a tag that closes an element | `false` |
| `preserveLineBreaks` | Always collapse to 1 line break (never remove it entirely) when whitespace between tags includes a line break. Must be used in conjunction with `collapseWhitespace=true` | `false` |
| `preventAttributesEscaping` | Prevents the escaping of the values of attributes | `false` |
| `processConditionalComments` | Process contents of conditional comments through minifier | `false` |
| `processScripts` | Array of strings corresponding to types of script elements to process through minifier (e.g. `text/ng-template`, `text/x-handlebars-template`, etc.) | `[ ]` |
| `quoteCharacter` | Type of quote to use for attribute values (' or ") | |
| `removeAttributeQuotes` | [Remove quotes around attributes when possible](http://perfectionkills.com/experimenting-with-html-minifier#remove_attribute_quotes) | `false` |
| `removeComments` | [Strip HTML comments](http://perfectionkills.com/experimenting-with-html-minifier#remove_comments) | `false` |
| `removeEmptyAttributes` | [Remove all attributes with whitespace-only values](http://perfectionkills.com/experimenting-with-html-minifier#remove_empty_or_blank_attributes) | `false` (could be `true`, `Function(attrName, tag)`) |
| `removeEmptyElements` | [Remove all elements with empty contents](http://perfectionkills.com/experimenting-with-html-minifier#remove_empty_elements) | `false` |
| `removeOptionalTags` | [Remove optional tags](http://perfectionkills.com/experimenting-with-html-minifier#remove_optional_tags) | `false` |
| `removeRedundantAttributes` | [Remove attributes when value matches default.](http://perfectionkills.com/experimenting-with-html-minifier#remove_redundant_attributes) | `false` |
| `removeScriptTypeAttributes` | Remove `type="text/javascript"` from `script` tags. Other `type` attribute values are left intact | `false` |
| `removeStyleLinkTypeAttributes`| Remove `type="text/css"` from `style` and `link` tags. Other `type` attribute values are left intact | `false` |
| `removeTagWhitespace` | Remove space between attributes whenever possible. **Note that this will result in invalid HTML!** | `false` |
| `sortAttributes` | [Sort attributes by frequency](#sorting-attributes--style-classes) | `false` |
| `sortClassName` | [Sort style classes by frequency](#sorting-attributes--style-classes) | `false` |
| `trimCustomFragments` | Trim white space around `ignoreCustomFragments`. | `false` |
| `useShortDoctype` | [Replaces the `doctype` with the short (HTML5) doctype](http://perfectionkills.com/experimenting-with-html-minifier#use_short_doctype) | `false` |
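To get a feel for the regex-valued defaults listed above, the following sketch tests them against a few sample strings. The patterns are copied from the table; `matchesAny` is a hypothetical helper, and the assumption that comment patterns are matched against the comment text without its `<!--` and `-->` delimiters is ours.

```javascript
// Default patterns copied from the options table above.
const ignoreCustomFragments = [/<%[\s\S]*?%>/, /<\?[\s\S]*?\?>/];
const ignoreCustomComments = [/^!/, /^\s*#/];
const customEventAttributes = [/^on[a-z]{3,}$/];

// Hypothetical helper: true if any pattern in the list matches the text.
const matchesAny = (patterns, text) => patterns.some((re) => re.test(text));

console.log(matchesAny(ignoreCustomFragments, '<?php echo $title; ?>')); // true
console.log(matchesAny(ignoreCustomFragments, '<% if (x) { %>'));        // true
console.log(matchesAny(ignoreCustomFragments, '{{ title }}'));           // false, needs a custom pattern

console.log(matchesAny(ignoreCustomComments, '! keep this'));            // true
console.log(matchesAny(ignoreCustomComments, ' regular comment '));      // false

console.log(matchesAny(customEventAttributes, 'onclick'));               // true
console.log(matchesAny(customEventAttributes, 'ng-click'));              // false, needs a custom pattern
```

The two `false` results for `{{ title }}` and `ng-click` show why those cases are called out in the table as examples for custom patterns.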
### Sorting attributes / style classes
Minifier options like `sortAttributes` and `sortClassName` won't impact the plain-text size of the output. However, they form long repetitive chains of characters that should improve the compression ratio of gzip as used in HTTP compression.
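The idea can be illustrated with a small, self-contained sketch. `sortClassesByFrequency` is a hypothetical helper written for this example and is not the library's actual algorithm; it only shows that reordering leaves the plain-text length unchanged while making the lists more uniform. The gzip sizes are printed rather than asserted, since the gain depends on the input.

```javascript
const zlib = require('zlib');

// Hypothetical helper: reorder every class list so the most frequently
// used class names come first (ties broken alphabetically).
function sortClassesByFrequency(classLists) {
  const counts = new Map();
  for (const list of classLists) {
    for (const name of list.split(' ')) {
      counts.set(name, (counts.get(name) || 0) + 1);
    }
  }
  return classLists.map((list) =>
    list
      .split(' ')
      .sort((a, b) => counts.get(b) - counts.get(a) || (a < b ? -1 : a > b ? 1 : 0))
      .join(' ')
  );
}

const original = ['btn large primary', 'primary btn', 'large btn primary', 'btn small'];
const sorted = sortClassesByFrequency(original);

console.log(sorted);
// Plain-text size is identical before and after sorting.
console.log(original.join('\n').length, sorted.join('\n').length);
// Compressed sizes, for comparison on your own data.
console.log(zlib.gzipSync(original.join('\n')).length, zlib.gzipSync(sorted.join('\n')).length);
```

On real pages with many repeated attribute and class combinations, the sorted form tends to give gzip longer repeated runs to work with.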
## Special cases
### Ignoring chunks of markup
If you have chunks of markup you would like preserved, you can wrap them in `<!-- htmlmin:ignore -->` comments, with one marker before the chunk and one after it.
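One way to picture how paired markers can protect a region is the stash-and-restore sketch below. It is an illustration under our own assumptions, not the library's code: `minifyExceptIgnored`, the `stash` placeholder format, and the whitespace-collapsing `collapse` stand-in are all hypothetical, and in this sketch the markers themselves are dropped from the output.

```javascript
const MARKER = '<!-- htmlmin:ignore -->';

// Hypothetical sketch: stash every region wrapped in a pair of markers,
// run `transform` on the rest, then put the stashed regions back untouched.
function minifyExceptIgnored(html, transform) {
  const stash = [];
  const pattern = new RegExp(MARKER + '([\\s\\S]*?)' + MARKER, 'g');
  const withPlaceholders = html.replace(pattern, (match, chunk) => {
    stash.push(chunk);
    return '<!--stash:' + (stash.length - 1) + '-->';
  });
  return transform(withPlaceholders).replace(
    /<!--stash:(\d+)-->/g,
    (match, index) => stash[Number(index)]
  );
}

// Stand-in for real minification: collapse runs of whitespace.
const collapse = (s) => s.replace(/\s+/g, ' ');

const input = '<p>a   b</p>' + MARKER + '<pre>x   y</pre>' + MARKER + '<p>c   d</p>';
console.log(minifyExceptIgnored(input, collapse));
// <p>a b</p><pre>x   y</pre><p>c d</p>
```

The whitespace inside the wrapped `<pre>` survives, while the surrounding paragraphs are collapsed.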
### Minifying JSON-LD
You can minify script tags with JSON-LD by setting the option `{ processScripts: ['application/ld+json'] }`. Note that this minification is very rudimentary; it is mainly useful for removing newlines and excessive whitespace.
### Preserving SVG tags
SVG tags are automatically recognized, and when they are minified, both case-sensitivity and closing-slashes are preserved, regardless of the minification settings used for the rest of the file.
### Working with invalid markup
HTMLMinifier **can't work with invalid or partial chunks of markup**. This is because it parses markup into a tree structure, modifies that tree (removing anything that was specified for removal, ignoring anything that was specified to be ignored, etc.), and then serializes the tree back into markup and returns it:
1. Input markup (e.g. `<p id="">foo`)
2. Internal representation of the markup in the form of a tree (e.g. `{ tag: "p", attr: "id", children: ["foo"] }`)
3. Transformation of the internal representation (e.g. removal of the `id` attribute)
4. Output of the resulting markup (e.g. `<p>foo</p>`)
HTMLMinifier can't know that the original markup was only half of a tree; it does its best to parse it as a full tree, and in doing so loses the information that the tree was malformed or partial to begin with. As a result, it can't produce a partial or malformed tree at output time.
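The transformation and output steps described above can be sketched as follows. This is a toy model for illustration only: the tree shape (`tag`, `attrs`, `children`) and the helpers `removeEmptyAttributes` and `serialize` are hypothetical and do not mirror the library's internals.

```javascript
// Hypothetical tree shape for the partial input `<p id="">foo`.
const tree = { tag: 'p', attrs: { id: '' }, children: ['foo'] };

// Transformation step: drop attributes whose value is empty or whitespace-only.
function removeEmptyAttributes(node) {
  const attrs = Object.fromEntries(
    Object.entries(node.attrs).filter(([, value]) => value.trim() !== '')
  );
  return { ...node, attrs };
}

// Output step: the serializer only sees the tree, so it always emits a
// complete element, whether or not the input had a closing tag.
function serialize(node) {
  if (typeof node === 'string') return node;
  const attrs = Object.entries(node.attrs)
    .map(([name, value]) => ` ${name}="${value}"`)
    .join('');
  return `<${node.tag}${attrs}>${node.children.map(serialize).join('')}</${node.tag}>`;
}

console.log(serialize(removeEmptyAttributes(tree))); // <p>foo</p>
```

Nothing in the tree records that the original `<p>` was never closed, which is why the output cannot reproduce a partial fragment.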
## Running benchmarks
Benchmarks for minified HTML:
```shell
cd benchmarks
npm install
npm run benchmark
```
## Running local server
```shell
npm run serve
```
+308
@@ -0,0 +1,308 @@
#!/usr/bin/env node
/**
* html-minifier-terser CLI tool
*
* The MIT License (MIT)
*
* Copyright (c) 2014-2016 Zoltan Frombach
*
* Permission is hereby granted, free of charge, to any person obtaining a copy of
* this software and associated documentation files (the "Software"), to deal in
* the Software without restriction, including without limitation the rights to
* use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of
* the Software, and to permit persons to whom the Software is furnished to do so,
* subject to the following conditions:
*
* The above copyright notice and this permission notice shall be included in all
* copies or substantial portions of the Software.
*
* THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
* IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS
* FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR
* COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER
* IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
* CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
*
*/
import fs from 'fs';
import path from 'path';
import { createRequire } from 'module';
import { camelCase } from 'camel-case';
import { paramCase } from 'param-case';
import { Command } from 'commander';
import { minify } from './src/htmlminifier.js';
const require = createRequire(import.meta.url);
const pkg = require('./package.json');
const program = new Command();
program.name(pkg.name);
program.version(pkg.version);
function fatal(message) {
console.error(message);
process.exit(1);
}
/**
* JSON does not support regexes, so, e.g., JSON.parse() will not create
* a RegExp from the JSON value `[ "/matchString/" ]`, which is
* technically just an array containing a string that begins and ends with
* a forward slash. To get a RegExp from a JSON string, it must be
* constructed explicitly in JavaScript.
*
* The likelihood of actually wanting to match text that is enclosed in
* forward slashes is probably quite rare, so if forward slashes were
* included in an argument that requires a regex, the user most likely
* thought they were part of the syntax for specifying a regex.
*
* In the unlikely case that forward slashes are indeed desired in the
* search string, the user would need to enclose the expression in a
* second set of slashes:
*
* --custom-attr-surround "[\"//matchString//\"]"
*/
function parseRegExp(value) {
if (value) {
return new RegExp(value.replace(/^\/(.*)\/$/, '$1'));
}
}
function parseJSON(value) {
if (value) {
try {
return JSON.parse(value);
} catch (e) {
if (/^{/.test(value)) {
fatal('Could not parse JSON value \'' + value + '\'');
}
return value;
}
}
}
function parseJSONArray(value) {
if (value) {
value = parseJSON(value);
return Array.isArray(value) ? value : [value];
}
}
function parseJSONRegExpArray(value) {
value = parseJSONArray(value);
return value && value.map(parseRegExp);
}
function parseString(value) {
return value;
}
const mainOptions = {
caseSensitive: 'Treat attributes in case sensitive manner (useful for SVG; e.g. viewBox)',
collapseBooleanAttributes: 'Omit attribute values from boolean attributes',
collapseInlineTagWhitespace: 'Collapse white space around inline tag',
collapseWhitespace: 'Collapse white space that contributes to text nodes in a document tree.',
conservativeCollapse: 'Always collapse to 1 space (never remove it entirely)',
continueOnParseError: 'Handle parse errors instead of aborting',
customAttrAssign: ['Arrays of regex\'es that allow to support custom attribute assign expressions (e.g. \'<div flex?="{{mode != cover}}"></div>\')', parseJSONRegExpArray],
customAttrCollapse: ['Regex that specifies custom attribute to strip newlines from (e.g. /ng-class/)', parseRegExp],
customAttrSurround: ['Arrays of regex\'es that allow to support custom attribute surround expressions (e.g. <input {{#if value}}checked="checked"{{/if}}>)', parseJSONRegExpArray],
customEventAttributes: ['Arrays of regex\'es that allow to support custom event attributes for minifyJS (e.g. ng-click)', parseJSONRegExpArray],
decodeEntities: 'Use direct Unicode characters whenever possible',
html5: 'Parse input according to HTML5 specifications',
ignoreCustomComments: ['Array of regex\'es that allow to ignore certain comments, when matched', parseJSONRegExpArray],
ignoreCustomFragments: ['Array of regex\'es that allow to ignore certain fragments, when matched (e.g. <?php ... ?>, {{ ... }})', parseJSONRegExpArray],
includeAutoGeneratedTags: 'Insert tags generated by HTML parser',
keepClosingSlash: 'Keep the trailing slash on singleton elements',
maxLineLength: ['Max line length', parseInt],
minifyCSS: ['Minify CSS in style elements and style attributes (uses clean-css)', parseJSON],
minifyJS: ['Minify Javascript in script elements and on* attributes (uses terser)', parseJSON],
minifyURLs: ['Minify URLs in various attributes (uses relateurl)', parseJSON],
noNewlinesBeforeTagClose: 'Never add a newline before a tag that closes an element',
preserveLineBreaks: 'Always collapse to 1 line break (never remove it entirely) when whitespace between tags include a line break.',
preventAttributesEscaping: 'Prevents the escaping of the values of attributes.',
processConditionalComments: 'Process contents of conditional comments through minifier',
processScripts: ['Array of strings corresponding to types of script elements to process through minifier (e.g. "text/ng-template", "text/x-handlebars-template", etc.)', parseJSONArray],
quoteCharacter: ['Type of quote to use for attribute values (\' or ")', parseString],
removeAttributeQuotes: 'Remove quotes around attributes when possible.',
removeComments: 'Strip HTML comments',
removeEmptyAttributes: 'Remove all attributes with whitespace-only values',
removeEmptyElements: 'Remove all elements with empty contents',
removeOptionalTags: 'Remove unrequired tags',
removeRedundantAttributes: 'Remove attributes when value matches default.',
removeScriptTypeAttributes: 'Removes the following attributes from script tags: text/javascript, text/ecmascript, text/jscript, application/javascript, application/x-javascript, application/ecmascript. Other type attribute values are left intact',
removeStyleLinkTypeAttributes: 'Remove type="text/css" from style and link tags. Other type attribute values are left intact.',
removeTagWhitespace: 'Remove space between attributes whenever possible',
sortAttributes: 'Sort attributes by frequency',
sortClassName: 'Sort style classes by frequency',
trimCustomFragments: 'Trim white space around ignoreCustomFragments.',
useShortDoctype: 'Replaces the doctype with the short (HTML5) doctype'
};
// configure commandline flags
const mainOptionKeys = Object.keys(mainOptions);
mainOptionKeys.forEach(function (key) {
const option = mainOptions[key];
if (Array.isArray(option)) {
key = key === 'minifyURLs' ? '--minify-urls' : '--' + paramCase(key);
key += option[1] === parseJSON ? ' [value]' : ' <value>';
program.option(key, option[0], option[1]);
} else if (['html5', 'includeAutoGeneratedTags'].includes(key)) {
program.option('--no-' + paramCase(key), option);
} else {
program.option('--' + paramCase(key), option);
}
});
program.option('-o --output <file>', 'Specify output file (if not specified STDOUT will be used for output)');
function readFile(file) {
try {
return fs.readFileSync(file, { encoding: 'utf8' });
} catch (e) {
fatal('Cannot read ' + file + '\n' + e.message);
}
}
let config = {};
program.option('-c --config-file <file>', 'Use config file', function (configPath) {
const data = readFile(configPath);
try {
config = JSON.parse(data);
} catch (je) {
try {
config = require(path.resolve(configPath));
} catch (ne) {
fatal('Cannot read the specified config file.\nAs JSON: ' + je.message + '\nAs module: ' + ne.message);
}
}
mainOptionKeys.forEach(function (key) {
if (key in config) {
const option = mainOptions[key];
if (Array.isArray(option)) {
const value = config[key];
config[key] = option[1](typeof value === 'string' ? value : JSON.stringify(value));
}
}
});
});
program.option('--input-dir <dir>', 'Specify an input directory');
program.option('--output-dir <dir>', 'Specify an output directory');
program.option('--file-ext <text>', 'Specify an extension to be read, ex: html');
let content;
program.arguments('[files...]').action(function (files) {
content = files.map(readFile).join('');
}).parse(process.argv);
const programOptions = program.opts();
function createOptions() {
const options = {};
mainOptionKeys.forEach(function (key) {
const param = programOptions[key === 'minifyURLs' ? 'minifyUrls' : camelCase(key)];
if (typeof param !== 'undefined') {
options[key] = param;
} else if (key in config) {
options[key] = config[key];
}
});
return options;
}
function mkdir(outputDir, callback) {
fs.mkdir(outputDir, { recursive: true }, function (err) {
if (err) {
fatal('Cannot create directory ' + outputDir + '\n' + err.message);
}
callback();
});
}
function processFile(inputFile, outputFile) {
fs.readFile(inputFile, { encoding: 'utf8' }, async function (err, data) {
if (err) {
fatal('Cannot read ' + inputFile + '\n' + err.message);
}
let minified;
try {
minified = await minify(data, createOptions());
} catch (e) {
fatal('Minification error on ' + inputFile + '\n' + e.message);
}
fs.writeFile(outputFile, minified, { encoding: 'utf8' }, function (err) {
if (err) {
fatal('Cannot write ' + outputFile + '\n' + err.message);
}
});
});
}
function processDirectory(inputDir, outputDir, fileExt) {
fs.readdir(inputDir, function (err, files) {
if (err) {
fatal('Cannot read directory ' + inputDir + '\n' + err.message);
}
files.forEach(function (file) {
const inputFile = path.join(inputDir, file);
const outputFile = path.join(outputDir, file);
fs.stat(inputFile, function (err, stat) {
if (err) {
fatal('Cannot read ' + inputFile + '\n' + err.message);
} else if (stat.isDirectory()) {
processDirectory(inputFile, outputFile, fileExt);
} else if (!fileExt || path.extname(file) === '.' + fileExt) {
mkdir(outputDir, function () {
processFile(inputFile, outputFile);
});
}
});
});
});
}
const writeMinify = async () => {
const minifierOptions = createOptions();
let minified;
try {
minified = await minify(content, minifierOptions);
} catch (e) {
fatal('Minification error:\n' + e.message);
}
let stream = process.stdout;
if (programOptions.output) {
stream = fs.createWriteStream(programOptions.output)
.on('error', (e) => {
fatal('Cannot write ' + programOptions.output + '\n' + e.message);
});
}
stream.write(minified);
};
const { inputDir, outputDir, fileExt } = programOptions;
if (inputDir || outputDir) {
if (!inputDir) {
fatal('The option output-dir needs to be used with the option input-dir. If you are working with a single file, use -o.');
} else if (!outputDir) {
fatal('You need to specify where to write the output files with the option --output-dir');
}
processDirectory(inputDir, outputDir, fileExt);
} else if (content) { // Minifying one or more files specified on the CMD line
writeMinify();
} else { // Minifying input coming from STDIN
content = '';
process.stdin.setEncoding('utf8');
process.stdin.on('data', function (data) {
content += data;
}).on('end', writeMinify);
}
File diff suppressed because it is too large Load Diff
File diff suppressed because one or more lines are too long
File diff suppressed because one or more lines are too long
File diff suppressed because one or more lines are too long
+105
@@ -0,0 +1,105 @@
{
"name": "html-minifier-terser",
"description": "Highly configurable, well-tested, JavaScript-based HTML minifier.",
"version": "7.2.0",
"license": "MIT",
"repository": "https://github.com/terser/html-minifier-terser.git",
"bugs": "https://github.com/terser/html-minifier-terser/issues",
"homepage": "https://terser.org/html-minifier-terser/",
"author": "Daniel Ruf",
"maintainers": [
"Daniel Ruf <daniel@daniel-ruf.de",
"Alex Lam <alexlamsl@gmail.com>",
"Juriy Zaytsev <kangax@gmail.com> (http://perfectionkills.com/)",
"Sibiraj <sibiraj.dev>"
],
"contributors": [
"Gilmore Davidson (https://github.com/gilmoreorless)",
"Hugo Wetterberg <hugo@wetterberg.nu>",
"Zoltan Frombach <tssajo@gmail.com>"
],
"keywords": [
"cli",
"compress",
"compressor",
"css",
"html",
"htmlmin",
"javascript",
"min",
"minification",
"minifier",
"minify",
"optimize",
"optimizer",
"pack",
"packer",
"parse",
"parser",
"terser",
"uglifier",
"uglify"
],
"engines": {
"node": "^14.13.1 || >=16.0.0"
},
"type": "module",
"main": "./dist/htmlminifier.cjs",
"module": "./src/htmlminifier.js",
"exports": {
".": {
"require": "./dist/htmlminifier.cjs",
"import": "./src/htmlminifier.js"
},
"./dist/*": "./dist/*.js",
"./package.json": "./package.json"
},
"bin": {
"html-minifier-terser": "./cli.js"
},
"files": [
"dist/",
"src/",
"cli.js"
],
"scripts": {
"build": "rollup -c",
"test:node": "NODE_OPTIONS=--experimental-vm-modules jest --verbose",
"test:web": "NODE_OPTIONS=--experimental-vm-modules jest --verbose --environment=jsdom",
"test:watch": "NODE_OPTIONS=--experimental-vm-modules jest verbose --watch",
"test": "npm run test:node",
"serve": "vite",
"build:docs": "vite build --base /html-minifier-terser/",
"lint": "eslint . --ignore-path .gitignore",
"prepare": "is-ci || husky install"
},
"dependencies": {
"camel-case": "^4.1.2",
"clean-css": "~5.3.2",
"commander": "^10.0.0",
"entities": "^4.4.0",
"param-case": "^3.0.4",
"relateurl": "^0.2.7",
"terser": "^5.15.1"
},
"devDependencies": {
"@commitlint/cli": "^17.5.1",
"@jest/globals": "^29.5.0",
"@rollup/plugin-commonjs": "^24.0.1",
"@rollup/plugin-json": "^6.0.0",
"@rollup/plugin-node-resolve": "^15.0.2",
"@rollup/plugin-terser": "^0.4.1",
"alpinejs": "^3.12.0",
"commitlint-config-non-conventional": "^1.0.1",
"eslint": "^8.38.0",
"eslint-config-standard": "^17.0.0",
"husky": "^8.0.3",
"is-ci": "^3.0.1",
"jest": "^29.5.0",
"jest-environment-jsdom": "^29.5.0",
"lint-staged": "^13.2.1",
"rollup": "^3.20.2",
"rollup-plugin-polyfill-node": "^0.12.0",
"vite": "^4.2.1"
}
}
File diff suppressed because it is too large Load Diff
+565
@@ -0,0 +1,565 @@
/*!
* HTML Parser By John Resig (ejohn.org)
* Modified by Juriy "kangax" Zaytsev
* Original code by Erik Arvidsson, Mozilla Public License
* http://erik.eae.net/simplehtmlparser/simplehtmlparser.js
*/
/*
* // Use like so:
* HTMLParser(htmlString, {
* start: function(tag, attrs, unary) {},
* end: function(tag) {},
* chars: function(text) {},
* comment: function(text) {}
* });
*
* // or to get an XML string:
* HTMLtoXML(htmlString);
*
* // or to get an XML DOM Document
* HTMLtoDOM(htmlString);
*
* // or to inject into an existing document/DOM node
* HTMLtoDOM(htmlString, document);
* HTMLtoDOM(htmlString, document.body);
*
*/
/* global ActiveXObject, DOMDocument */
import { replaceAsync } from './utils.js';
class CaseInsensitiveSet extends Set {
has(str) {
return super.has(str.toLowerCase());
}
}
// Regular Expressions for parsing tags and attributes
const singleAttrIdentifier = /([^\s"'<>/=]+)/;
const singleAttrAssigns = [/=/];
const singleAttrValues = [
// attr value double quotes
/"([^"]*)"+/.source,
// attr value, single quotes
/'([^']*)'+/.source,
// attr value, no quotes
/([^ \t\n\f\r"'`=<>]+)/.source
];
// https://www.w3.org/TR/1999/REC-xml-names-19990114/#NT-QName
const qnameCapture = (function () {
// based on https://www.npmjs.com/package/ncname
const combiningChar = '\\u0300-\\u0345\\u0360\\u0361\\u0483-\\u0486\\u0591-\\u05A1\\u05A3-\\u05B9\\u05BB-\\u05BD\\u05BF\\u05C1\\u05C2\\u05C4\\u064B-\\u0652\\u0670\\u06D6-\\u06E4\\u06E7\\u06E8\\u06EA-\\u06ED\\u0901-\\u0903\\u093C\\u093E-\\u094D\\u0951-\\u0954\\u0962\\u0963\\u0981-\\u0983\\u09BC\\u09BE-\\u09C4\\u09C7\\u09C8\\u09CB-\\u09CD\\u09D7\\u09E2\\u09E3\\u0A02\\u0A3C\\u0A3E-\\u0A42\\u0A47\\u0A48\\u0A4B-\\u0A4D\\u0A70\\u0A71\\u0A81-\\u0A83\\u0ABC\\u0ABE-\\u0AC5\\u0AC7-\\u0AC9\\u0ACB-\\u0ACD\\u0B01-\\u0B03\\u0B3C\\u0B3E-\\u0B43\\u0B47\\u0B48\\u0B4B-\\u0B4D\\u0B56\\u0B57\\u0B82\\u0B83\\u0BBE-\\u0BC2\\u0BC6-\\u0BC8\\u0BCA-\\u0BCD\\u0BD7\\u0C01-\\u0C03\\u0C3E-\\u0C44\\u0C46-\\u0C48\\u0C4A-\\u0C4D\\u0C55\\u0C56\\u0C82\\u0C83\\u0CBE-\\u0CC4\\u0CC6-\\u0CC8\\u0CCA-\\u0CCD\\u0CD5\\u0CD6\\u0D02\\u0D03\\u0D3E-\\u0D43\\u0D46-\\u0D48\\u0D4A-\\u0D4D\\u0D57\\u0E31\\u0E34-\\u0E3A\\u0E47-\\u0E4E\\u0EB1\\u0EB4-\\u0EB9\\u0EBB\\u0EBC\\u0EC8-\\u0ECD\\u0F18\\u0F19\\u0F35\\u0F37\\u0F39\\u0F3E\\u0F3F\\u0F71-\\u0F84\\u0F86-\\u0F8B\\u0F90-\\u0F95\\u0F97\\u0F99-\\u0FAD\\u0FB1-\\u0FB7\\u0FB9\\u20D0-\\u20DC\\u20E1\\u302A-\\u302F\\u3099\\u309A';
const digit = '0-9\\u0660-\\u0669\\u06F0-\\u06F9\\u0966-\\u096F\\u09E6-\\u09EF\\u0A66-\\u0A6F\\u0AE6-\\u0AEF\\u0B66-\\u0B6F\\u0BE7-\\u0BEF\\u0C66-\\u0C6F\\u0CE6-\\u0CEF\\u0D66-\\u0D6F\\u0E50-\\u0E59\\u0ED0-\\u0ED9\\u0F20-\\u0F29';
const extender = '\\xB7\\u02D0\\u02D1\\u0387\\u0640\\u0E46\\u0EC6\\u3005\\u3031-\\u3035\\u309D\\u309E\\u30FC-\\u30FE';
const letter = 'A-Za-z\\xC0-\\xD6\\xD8-\\xF6\\xF8-\\u0131\\u0134-\\u013E\\u0141-\\u0148\\u014A-\\u017E\\u0180-\\u01C3\\u01CD-\\u01F0\\u01F4\\u01F5\\u01FA-\\u0217\\u0250-\\u02A8\\u02BB-\\u02C1\\u0386\\u0388-\\u038A\\u038C\\u038E-\\u03A1\\u03A3-\\u03CE\\u03D0-\\u03D6\\u03DA\\u03DC\\u03DE\\u03E0\\u03E2-\\u03F3\\u0401-\\u040C\\u040E-\\u044F\\u0451-\\u045C\\u045E-\\u0481\\u0490-\\u04C4\\u04C7\\u04C8\\u04CB\\u04CC\\u04D0-\\u04EB\\u04EE-\\u04F5\\u04F8\\u04F9\\u0531-\\u0556\\u0559\\u0561-\\u0586\\u05D0-\\u05EA\\u05F0-\\u05F2\\u0621-\\u063A\\u0641-\\u064A\\u0671-\\u06B7\\u06BA-\\u06BE\\u06C0-\\u06CE\\u06D0-\\u06D3\\u06D5\\u06E5\\u06E6\\u0905-\\u0939\\u093D\\u0958-\\u0961\\u0985-\\u098C\\u098F\\u0990\\u0993-\\u09A8\\u09AA-\\u09B0\\u09B2\\u09B6-\\u09B9\\u09DC\\u09DD\\u09DF-\\u09E1\\u09F0\\u09F1\\u0A05-\\u0A0A\\u0A0F\\u0A10\\u0A13-\\u0A28\\u0A2A-\\u0A30\\u0A32\\u0A33\\u0A35\\u0A36\\u0A38\\u0A39\\u0A59-\\u0A5C\\u0A5E\\u0A72-\\u0A74\\u0A85-\\u0A8B\\u0A8D\\u0A8F-\\u0A91\\u0A93-\\u0AA8\\u0AAA-\\u0AB0\\u0AB2\\u0AB3\\u0AB5-\\u0AB9\\u0ABD\\u0AE0\\u0B05-\\u0B0C\\u0B0F\\u0B10\\u0B13-\\u0B28\\u0B2A-\\u0B30\\u0B32\\u0B33\\u0B36-\\u0B39\\u0B3D\\u0B5C\\u0B5D\\u0B5F-\\u0B61\\u0B85-\\u0B8A\\u0B8E-\\u0B90\\u0B92-\\u0B95\\u0B99\\u0B9A\\u0B9C\\u0B9E\\u0B9F\\u0BA3\\u0BA4\\u0BA8-\\u0BAA\\u0BAE-\\u0BB5\\u0BB7-\\u0BB9\\u0C05-\\u0C0C\\u0C0E-\\u0C10\\u0C12-\\u0C28\\u0C2A-\\u0C33\\u0C35-\\u0C39\\u0C60\\u0C61\\u0C85-\\u0C8C\\u0C8E-\\u0C90\\u0C92-\\u0CA8\\u0CAA-\\u0CB3\\u0CB5-\\u0CB9\\u0CDE\\u0CE0\\u0CE1\\u0D05-\\u0D0C\\u0D0E-\\u0D10\\u0D12-\\u0D28\\u0D2A-\\u0D39\\u0D60\\u0D61\\u0E01-\\u0E2E\\u0E30\\u0E32\\u0E33\\u0E40-\\u0E45\\u0E81\\u0E82\\u0E84\\u0E87\\u0E88\\u0E8A\\u0E8D\\u0E94-\\u0E97\\u0E99-\\u0E9F\\u0EA1-\\u0EA3\\u0EA5\\u0EA7\\u0EAA\\u0EAB\\u0EAD\\u0EAE\\u0EB0\\u0EB2\\u0EB3\\u0EBD\\u0EC0-\\u0EC4\\u0F40-\\u0F47\\u0F49-\\u0F69\\u10A0-\\u10C5\\u10D0-\\u10F6\\u1100\\u1102\\u1103\\u1105-\\u1107\\u1109\\u110B\\u110C\\u110E-\\u1112\\u113C\\u113E\\u1140\\u114C\\u114E\\u1150\\u1154\\u1155\\u1159\\u115F-\\u1161\\u1163\\u1165\\u1167\\u1169\\u116D\\u116E\\u1172\\u1173\\u1175\\u119E\\u11A8\\u11AB\\u11AE\\u11AF\\u11B7\\u11B8\\u11BA\\u11BC-\\u11C2\\u11EB\\u11F0\\u11F9\\u1E00-\\u1E9B\\u1EA0-\\u1EF9\\u1F00-\\u1F15\\u1F18-\\u1F1D\\u1F20-\\u1F45\\u1F48-\\u1F4D\\u1F50-\\u1F57\\u1F59\\u1F5B\\u1F5D\\u1F5F-\\u1F7D\\u1F80-\\u1FB4\\u1FB6-\\u1FBC\\u1FBE\\u1FC2-\\u1FC4\\u1FC6-\\u1FCC\\u1FD0-\\u1FD3\\u1FD6-\\u1FDB\\u1FE0-\\u1FEC\\u1FF2-\\u1FF4\\u1FF6-\\u1FFC\\u2126\\u212A\\u212B\\u212E\\u2180-\\u2182\\u3007\\u3021-\\u3029\\u3041-\\u3094\\u30A1-\\u30FA\\u3105-\\u312C\\u4E00-\\u9FA5\\uAC00-\\uD7A3';
const ncname = '[' + letter + '_][' + letter + digit + '\\.\\-_' + combiningChar + extender + ']*';
return '((?:' + ncname + '\\:)?' + ncname + ')';
})();
const startTagOpen = new RegExp('^<' + qnameCapture);
const startTagClose = /^\s*(\/?)>/;
export const endTag = new RegExp('^<\\/' + qnameCapture + '[^>]*>');
const doctype = /^<!DOCTYPE\s?[^>]+>/i;
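// Detect engines that report an unmatched optional capture group as ''
// instead of undefined (see the Firefox bug referenced in handleStartTag).
let IS_REGEX_CAPTURING_BROKEN = false;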
'x'.replace(/x(.)?/g, function (m, g) {
IS_REGEX_CAPTURING_BROKEN = g === '';
});
// Empty Elements
const empty = new CaseInsensitiveSet(['area', 'base', 'basefont', 'br', 'col', 'embed', 'frame', 'hr', 'img', 'input', 'isindex', 'keygen', 'link', 'meta', 'param', 'source', 'track', 'wbr']);
// Inline Elements
const inline = new CaseInsensitiveSet(['a', 'abbr', 'acronym', 'applet', 'b', 'basefont', 'bdo', 'big', 'br', 'button', 'cite', 'code', 'del', 'dfn', 'em', 'font', 'i', 'iframe', 'img', 'input', 'ins', 'kbd', 'label', 'map', 'noscript', 'object', 'q', 's', 'samp', 'script', 'select', 'small', 'span', 'strike', 'strong', 'sub', 'sup', 'svg', 'textarea', 'tt', 'u', 'var']);
// Elements that you can, intentionally, leave open
// (and which close themselves)
const closeSelf = new CaseInsensitiveSet(['colgroup', 'dd', 'dt', 'li', 'option', 'p', 'td', 'tfoot', 'th', 'thead', 'tr', 'source']);
// Boolean attributes whose omitted value is filled in with the attribute name (e.g. disabled="disabled")
const fillAttrs = new CaseInsensitiveSet(['checked', 'compact', 'declare', 'defer', 'disabled', 'ismap', 'multiple', 'nohref', 'noresize', 'noshade', 'nowrap', 'readonly', 'selected']);
// Special Elements (can contain anything)
const special = new CaseInsensitiveSet(['script', 'style']);
// HTML5 tags https://html.spec.whatwg.org/multipage/indices.html#elements-3
// Phrasing Content https://html.spec.whatwg.org/multipage/dom.html#phrasing-content
const nonPhrasing = new CaseInsensitiveSet(['address', 'article', 'aside', 'base', 'blockquote', 'body', 'caption', 'col', 'colgroup', 'dd', 'details', 'dialog', 'div', 'dl', 'dt', 'fieldset', 'figcaption', 'figure', 'footer', 'form', 'h1', 'h2', 'h3', 'h4', 'h5', 'h6', 'head', 'header', 'hgroup', 'hr', 'html', 'legend', 'li', 'menuitem', 'meta', 'ol', 'optgroup', 'option', 'param', 'rp', 'rt', 'source', 'style', 'summary', 'tbody', 'td', 'tfoot', 'th', 'thead', 'title', 'tr', 'track', 'ul']);
const reCache = {};
function attrForHandler(handler) {
let pattern = singleAttrIdentifier.source +
'(?:\\s*(' + joinSingleAttrAssigns(handler) + ')' +
'[ \\t\\n\\f\\r]*(?:' + singleAttrValues.join('|') + '))?';
if (handler.customAttrSurround) {
const attrClauses = [];
for (let i = handler.customAttrSurround.length - 1; i >= 0; i--) {
attrClauses[i] = '(?:' +
'(' + handler.customAttrSurround[i][0].source + ')\\s*' +
pattern +
'\\s*(' + handler.customAttrSurround[i][1].source + ')' +
')';
}
attrClauses.push('(?:' + pattern + ')');
pattern = '(?:' + attrClauses.join('|') + ')';
}
return new RegExp('^\\s*' + pattern);
}
function joinSingleAttrAssigns(handler) {
return singleAttrAssigns.concat(
handler.customAttrAssign || []
).map(function (assign) {
return '(?:' + assign.source + ')';
}).join('|');
}
export class HTMLParser {
constructor(html, handler) {
this.html = html;
this.handler = handler;
}
async parse() {
let html = this.html;
const handler = this.handler;
const stack = []; let lastTag;
const attribute = attrForHandler(handler);
let last, prevTag, nextTag;
while (html) {
last = html;
// Make sure we're not in a script or style element
if (!lastTag || !special.has(lastTag)) {
let textEnd = html.indexOf('<');
if (textEnd === 0) {
// Comment:
if (/^<!--/.test(html)) {
const commentEnd = html.indexOf('-->');
if (commentEnd >= 0) {
if (handler.comment) {
await handler.comment(html.substring(4, commentEnd));
}
html = html.substring(commentEnd + 3);
prevTag = '';
continue;
}
}
// https://en.wikipedia.org/wiki/Conditional_comment#Downlevel-revealed_conditional_comment
if (/^<!\[/.test(html)) {
const conditionalEnd = html.indexOf(']>');
if (conditionalEnd >= 0) {
if (handler.comment) {
await handler.comment(html.substring(2, conditionalEnd + 1), true /* non-standard */);
}
html = html.substring(conditionalEnd + 2);
prevTag = '';
continue;
}
}
// Doctype:
const doctypeMatch = html.match(doctype);
if (doctypeMatch) {
if (handler.doctype) {
await handler.doctype(doctypeMatch[0]);
}
html = html.substring(doctypeMatch[0].length);
prevTag = '';
continue;
}
// End tag:
const endTagMatch = html.match(endTag);
if (endTagMatch) {
html = html.substring(endTagMatch[0].length);
await replaceAsync(endTagMatch[0], endTag, parseEndTag);
prevTag = '/' + endTagMatch[1].toLowerCase();
continue;
}
// Start tag:
const startTagMatch = parseStartTag(html);
if (startTagMatch) {
html = startTagMatch.rest;
await handleStartTag(startTagMatch);
prevTag = startTagMatch.tagName.toLowerCase();
continue;
}
// Treat `<` as text
if (handler.continueOnParseError) {
textEnd = html.indexOf('<', 1);
}
}
let text;
if (textEnd >= 0) {
text = html.substring(0, textEnd);
html = html.substring(textEnd);
} else {
text = html;
html = '';
}
// next tag
let nextTagMatch = parseStartTag(html);
if (nextTagMatch) {
nextTag = nextTagMatch.tagName;
} else {
nextTagMatch = html.match(endTag);
if (nextTagMatch) {
nextTag = '/' + nextTagMatch[1];
} else {
nextTag = '';
}
}
if (handler.chars) {
await handler.chars(text, prevTag, nextTag);
}
prevTag = '';
} else {
const stackedTag = lastTag.toLowerCase();
const reStackedTag = reCache[stackedTag] || (reCache[stackedTag] = new RegExp('([\\s\\S]*?)</' + stackedTag + '[^>]*>', 'i'));
html = await replaceAsync(html, reStackedTag, async (_, text) => {
if (stackedTag !== 'script' && stackedTag !== 'style' && stackedTag !== 'noscript') {
text = text
.replace(/<!--([\s\S]*?)-->/g, '$1')
.replace(/<!\[CDATA\[([\s\S]*?)]]>/g, '$1');
}
if (handler.chars) {
await handler.chars(text);
}
return '';
});
await parseEndTag('</' + stackedTag + '>', stackedTag);
}
if (html === last) {
throw new Error('Parse Error: ' + html);
}
}
if (!handler.partialMarkup) {
// Clean up any remaining tags
await parseEndTag();
}
function parseStartTag(input) {
const start = input.match(startTagOpen);
if (start) {
const match = {
tagName: start[1],
attrs: []
};
input = input.slice(start[0].length);
let end, attr;
while (!(end = input.match(startTagClose)) && (attr = input.match(attribute))) {
input = input.slice(attr[0].length);
match.attrs.push(attr);
}
if (end) {
match.unarySlash = end[1];
match.rest = input.slice(end[0].length);
return match;
}
}
}
async function closeIfFound(tagName) {
if (findTag(tagName) >= 0) {
await parseEndTag('', tagName);
return true;
}
}
async function handleStartTag(match) {
const tagName = match.tagName;
let unarySlash = match.unarySlash;
if (handler.html5) {
if (lastTag === 'p' && nonPhrasing.has(tagName)) {
await parseEndTag('', lastTag);
} else if (tagName === 'tbody') {
await closeIfFound('thead');
} else if (tagName === 'tfoot') {
if (!await closeIfFound('tbody')) {
await closeIfFound('thead');
}
}
if (tagName === 'col' && findTag('colgroup') < 0) {
lastTag = 'colgroup';
stack.push({ tag: lastTag, attrs: [] });
if (handler.start) {
await handler.start(lastTag, [], false, '');
}
}
}
if (!handler.html5 && !inline.has(tagName)) {
while (lastTag && inline.has(lastTag)) {
await parseEndTag('', lastTag);
}
}
if (closeSelf.has(tagName) && lastTag === tagName) {
await parseEndTag('', tagName);
}
const unary = empty.has(tagName) || (tagName === 'html' && lastTag === 'head') || !!unarySlash;
const attrs = match.attrs.map(function (args) {
let name, value, customOpen, customClose, customAssign, quote;
const ncp = 7; // number of captured parts, scalar
// hackish work around FF bug https://bugzilla.mozilla.org/show_bug.cgi?id=369778
if (IS_REGEX_CAPTURING_BROKEN && args[0].indexOf('""') === -1) {
if (args[3] === '') { delete args[3]; }
if (args[4] === '') { delete args[4]; }
if (args[5] === '') { delete args[5]; }
}
function populate(index) {
customAssign = args[index];
value = args[index + 1];
if (typeof value !== 'undefined') {
return '"';
}
value = args[index + 2];
if (typeof value !== 'undefined') {
return '\'';
}
value = args[index + 3];
if (typeof value === 'undefined' && fillAttrs.has(name)) {
value = name;
}
return '';
}
let j = 1;
if (handler.customAttrSurround) {
for (let i = 0, l = handler.customAttrSurround.length; i < l; i++, j += ncp) {
name = args[j + 1];
if (name) {
quote = populate(j + 2);
customOpen = args[j];
customClose = args[j + 6];
break;
}
}
}
if (!name && (name = args[j])) {
quote = populate(j + 1);
}
return {
name,
value,
customAssign: customAssign || '=',
customOpen: customOpen || '',
customClose: customClose || '',
quote: quote || ''
};
});
if (!unary) {
stack.push({ tag: tagName, attrs });
lastTag = tagName;
unarySlash = '';
}
if (handler.start) {
await handler.start(tagName, attrs, unary, unarySlash);
}
}
function findTag(tagName) {
let pos;
const needle = tagName.toLowerCase();
for (pos = stack.length - 1; pos >= 0; pos--) {
if (stack[pos].tag.toLowerCase() === needle) {
break;
}
}
return pos;
}
async function parseEndTag(tag, tagName) {
let pos;
// Find the closest opened tag of the same type
if (tagName) {
pos = findTag(tagName);
} else { // If no tag name is provided, close every open element
pos = 0;
}
if (pos >= 0) {
// Close all the open elements, up the stack
for (let i = stack.length - 1; i >= pos; i--) {
if (handler.end) {
await handler.end(stack[i].tag, stack[i].attrs, i > pos || !tag);
}
}
// Remove the open elements from the stack
stack.length = pos;
lastTag = pos && stack[pos - 1].tag;
} else if (tagName.toLowerCase() === 'br') {
if (handler.start) {
await handler.start(tagName, [], true, '');
}
} else if (tagName.toLowerCase() === 'p') {
if (handler.start) {
await handler.start(tagName, [], false, '', true);
}
if (handler.end) {
await handler.end(tagName, []);
}
}
}
}
}
// Serialize HTML to XML-style markup. parse() is asynchronous, so this
// resolves with the output string only once parsing has finished.
export const HTMLtoXML = async (html) => {
  let results = '';
  const parser = new HTMLParser(html, {
    start: function (tag, attrs, unary) {
      results += '<' + tag;
      for (let i = 0, len = attrs.length; i < len; i++) {
        results += ' ' + attrs[i].name + '="' + (attrs[i].value || '').replace(/"/g, '&#34;') + '"';
      }
      results += (unary ? '/' : '') + '>';
    },
    end: function (tag) {
      results += '</' + tag + '>';
    },
    chars: function (text) {
      results += text;
    },
    comment: function (text) {
      results += '<!--' + text + '-->';
    },
    ignore: function (text) {
      results += text;
    }
  });
  await parser.parse();
  return results;
};
export const HTMLtoDOM = (html, doc) => {
// There can be only one of these elements
const one = {
html: true,
head: true,
body: true,
title: true
};
// Enforce a structure for the document
const structure = {
link: 'head',
base: 'head'
};
if (doc) {
doc = doc.ownerDocument || (doc.getOwnerDocument && doc.getOwnerDocument()) || doc;
} else if (typeof DOMDocument !== 'undefined') {
doc = new DOMDocument();
} else if (typeof document !== 'undefined' && document.implementation && document.implementation.createDocument) {
doc = document.implementation.createDocument('', '', null);
} else if (typeof ActiveXObject !== 'undefined') {
doc = new ActiveXObject('Msxml.DOMDocument');
}
const elems = [];
const documentElement = doc.documentElement || (doc.getDocumentElement && doc.getDocumentElement());
// If we're dealing with an empty document then we
// need to pre-populate it with the HTML document structure
if (!documentElement && doc.createElement) {
(function () {
const html = doc.createElement('html');
const head = doc.createElement('head');
head.appendChild(doc.createElement('title'));
html.appendChild(head);
html.appendChild(doc.createElement('body'));
doc.appendChild(html);
})();
}
// Find all the unique elements
if (doc.getElementsByTagName) {
for (const i in one) {
one[i] = doc.getElementsByTagName(i)[0];
}
}
// If we're working with a document, inject contents into
// the body element
let curParentNode = one.body;
const parser = new HTMLParser(html, {
start: function (tagName, attrs, unary) {
// If it's a pre-built element, then we can ignore
// its construction
if (one[tagName]) {
curParentNode = one[tagName];
return;
}
const elem = doc.createElement(tagName);
for (const attr of attrs) {
elem.setAttribute(attr.name, attr.value);
}
if (structure[tagName] && typeof one[structure[tagName]] !== 'boolean') {
one[structure[tagName]].appendChild(elem);
} else if (curParentNode && curParentNode.appendChild) {
curParentNode.appendChild(elem);
}
if (!unary) {
elems.push(elem);
curParentNode = elem;
}
},
end: function (/* tag */) {
elems.length -= 1;
// Init the new parentNode
curParentNode = elems[elems.length - 1];
},
chars: function (text) {
curParentNode.appendChild(doc.createTextNode(text));
},
comment: function (/* text */) {
// create comment node
},
ignore: function (/* text */) {
// What to do here?
}
});
// parse() is async: resolve with the document once parsing has finished
return parser.parse().then(() => doc);
};
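The parser above reports what it finds through optional `start`, `end`, `chars`, and `comment` callbacks, and awaits them, so callers have to wait for the parse to settle before reading anything the callbacks accumulated. The block below is a much-reduced sketch of that callback-driven pattern only. `miniParse` and `collectEvents` are hypothetical names, and the stand-in tokenizer handles nothing beyond attribute-free `<tag>`, `</tag>`, and text.

```javascript
// Tiny stand-in tokenizer (NOT the HTMLParser above): it recognizes only
// attribute-free start tags, end tags, and runs of text.
async function miniParse(html, handler) {
  const token = /<\/([a-z][a-z0-9]*)>|<([a-z][a-z0-9]*)>|[^<]+/gi;
  for (const m of html.matchAll(token)) {
    if (m[1]) {
      // End tag: group 1 holds the tag name
      if (handler.end) await handler.end(m[1]);
    } else if (m[2]) {
      // Start tag: group 2 holds the tag name; no attributes, not unary
      if (handler.start) await handler.start(m[2], [], false);
    } else if (handler.chars) {
      // Anything else matched is plain text
      await handler.chars(m[0]);
    }
  }
}

// Hypothetical consumer: records each callback as a string. The events
// array is only complete after the awaited parse has finished.
async function collectEvents(html) {
  const events = [];
  await miniParse(html, {
    start: (tag) => { events.push('start:' + tag); },
    end: (tag) => { events.push('end:' + tag); },
    chars: (text) => { events.push('chars:' + text); }
  });
  return events;
}
```

Calling `collectEvents('<p>Hi <b>there</b></p>')` resolves to `['start:p', 'chars:Hi ', 'start:b', 'chars:there', 'end:b', 'end:p']`. Reading `events` without awaiting would see a partial list as soon as any callback yields.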
+68
View File
@@ -0,0 +1,68 @@
// Reorders a token list so that tokens shared by many lists come first.
// `keys` and the per-key child sorters are attached by TokenChain#createSorter.
class Sorter {
  sort(tokens, fromIndex = 0) {
    for (let i = 0, len = this.keys.length; i < len; i++) {
      const key = this.keys[i];
      const token = key.slice(1); // strip the '$' prefix
      let index = tokens.indexOf(token, fromIndex);
      if (index !== -1) {
        // Move every occurrence of the token to the front of the unsorted range
        do {
          if (index !== fromIndex) {
            tokens.splice(index, 1);
            tokens.splice(fromIndex, 0, token);
          }
          fromIndex++;
        } while ((index = tokens.indexOf(token, fromIndex)) !== -1);
        // Order the remaining tokens with the child sorter for this key
        return this[key].sort(tokens, fromIndex);
      }
    }
    return tokens;
  }
}

// Records, under the key '$' + token, every token list a token appears in.
class TokenChain {
  add(tokens) {
    tokens.forEach((token) => {
      const key = '$' + token;
      if (!this[key]) {
        this[key] = [];
        this[key].processed = 0;
      }
      this[key].push(tokens);
    });
  }

  createSorter() {
    const sorter = new Sorter();
    // Keys used by the most lists come first; ties are broken by key order
    sorter.keys = Object.keys(this).sort((j, k) => {
      const m = this[j].length;
      const n = this[k].length;
      return m < n ? 1 : m > n ? -1 : j < k ? -1 : j > k ? 1 : 0;
    }).filter((key) => {
      // Skip keys whose lists were all consumed by an earlier key
      if (this[key].processed < this[key].length) {
        const token = key.slice(1);
        const chain = new TokenChain();
        this[key].forEach((tokens) => {
          // Remove this token from the list, then chain the remaining tokens
          let index;
          while ((index = tokens.indexOf(token)) !== -1) {
            tokens.splice(index, 1);
          }
          tokens.forEach((token) => {
            this['$' + token].processed++;
          });
          chain.add(tokens.slice(0));
        });
        sorter[key] = chain.createSorter();
        return true;
      }
      return false;
    });
    return sorter;
  }
}

export default TokenChain;
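`TokenChain` ranks tokens by how many lists contain them and recurses into the tokens that remain. The block below is a deliberately simplified stand-in for that idea, not the class itself: it sorts each list once by global frequency, with no recursion. `countTokens`, `sortByFrequency`, and the sample class lists are hypothetical.

```javascript
// Count how many lists contain each token (duplicates in one list count once).
function countTokens(lists) {
  const counts = new Map();
  for (const tokens of lists) {
    for (const token of new Set(tokens)) {
      counts.set(token, (counts.get(token) || 0) + 1);
    }
  }
  return counts;
}

// Return a sorted copy of one list: tokens used by more lists come first,
// and ties are broken alphabetically so the result is deterministic.
function sortByFrequency(tokens, counts) {
  return [...tokens].sort((a, b) => {
    const diff = (counts.get(b) || 0) - (counts.get(a) || 0);
    if (diff !== 0) return diff;
    return a < b ? -1 : a > b ? 1 : 0;
  });
}

// Hypothetical input: class attribute values split into tokens.
const classLists = [['btn', 'red'], ['large', 'btn'], ['red', 'btn', 'large']];
const tokenCounts = countTokens(classLists);
const sortedLists = classLists.map((tokens) => sortByFrequency(tokens, tokenCounts));
```

Here `btn` appears in three lists and `red` and `large` in two each, so `sortedLists` is `[['btn', 'red'], ['btn', 'large'], ['btn', 'large', 'red']]`. Putting shared tokens in a consistent order tends to make repeated attribute values more uniform.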
+11
View File
@@ -0,0 +1,11 @@
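// Async counterpart of String.prototype.replace: `asyncFn` is called for each
// match, all returned promises are awaited, and the resolved values are
// substituted in match order.
export async function replaceAsync(str, regex, asyncFn) {
  const promises = [];
  // First pass: collect one promise per match (the replaced string is discarded)
  str.replace(regex, (match, ...args) => {
    const promise = asyncFn(match, ...args);
    promises.push(promise);
  });
  const data = await Promise.all(promises);
  // Second pass: substitute the resolved values in the same order
  return str.replace(regex, () => data.shift());
}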
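A minimal usage sketch for the helper above. To keep the block runnable on its own, it carries a local copy under the hypothetical name `replaceAsyncSketch`. The `upperLater` callback and `demoReplaceAsync` are likewise illustrative assumptions.

```javascript
// Local copy of the two-pass helper so this sketch is self-contained.
async function replaceAsyncSketch(str, regex, asyncFn) {
  const promises = [];
  // Pass 1: start one async call per match
  str.replace(regex, (match, ...args) => {
    promises.push(asyncFn(match, ...args));
  });
  const data = await Promise.all(promises);
  // Pass 2: substitute the resolved values in match order
  return str.replace(regex, () => data.shift());
}

// Hypothetical async callback: resolves with the upper-cased match
// after a timer tick, standing in for real asynchronous work.
const upperLater = (match) =>
  new Promise((resolve) => setTimeout(() => resolve(match.toUpperCase()), 0));

// With a global regex every match is replaced; without the g flag only the
// first match is, mirroring String.prototype.replace.
async function demoReplaceAsync() {
  const all = await replaceAsyncSketch('a-b-c', /[a-c]/g, upperLater);
  const first = await replaceAsyncSketch('a-b-c', /b/, upperLater);
  return { all, first };
}
```

`demoReplaceAsync()` resolves to `{ all: 'A-B-C', first: 'a-B-c' }`. All callbacks are started before any is awaited, so they run concurrently rather than one after another.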