Improve markdown syntax highlighting in kate #174

Merged
MicheleC merged 2 commits from feat/markdown_mods into master 2 years ago
Ray-V commented 2 years ago
Collaborator

I've made some changes to markdown.xml which work well for me.

Some, like the diff block, I think will be acceptable; others, like the colouring, are obviously personal preferences that might not be, so this is posted as a WIP.

I haven't tried to be too clever with the regexes - they work but could probably do with some improvement.

I think the original author probably used defStyleNum definitions to show the highlighting in their default colours [which seem to be set in katehighlight.cpp]. My preference is to use defStyleNum="dsNormal" and define the required colours with color="" and/or backgroundColor="". That makes it easier for users to choose their own colours and set their own values for boolean options.


The changes are:

Add an 'idlink' to distinguish between an inline link and a link to an 'id' reference within the same page

+			 <!ENTITY idlinkregex '\[[^\]\^]+\]\s*\(#[^\(]*\)'>
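To sanity-check the idlink pattern outside Kate, it can be run through Python's `re` module. This is only an approximation: Kate in TDE uses its own regex engine, so edge cases may behave differently. The sample strings and the is_idlink() helper are invented for illustration.

```python
import re

# idlinkregex from the entity above: [link text], optional whitespace,
# then (#anchor). '^' is excluded from the link text.
IDLINK = re.compile(r'\[[^\]\^]+\]\s*\(#[^\(]*\)')

def is_idlink(text):
    """True if text contains a link to an in-page #id anchor."""
    return IDLINK.search(text) is not None

print(is_idlink("[see later](#prediff)"))        # True: in-page anchor
print(is_idlink("[site](https://example.com)"))  # False: ordinary inline link
print(is_idlink("[^1](#fn1)"))                   # False: '^' not allowed
```

Because '^' is in the excluded set, footnote-style references such as [^1](#fn1) are not treated as id links.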

The \s? is necessary because a ruler can be char-space-char-space-char as well as three or more contiguous characters.

+			 <!ENTITY rulerregex '^\s*([\*\-_]\s?){3,}\s*$'>
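A quick comparison of the ruler regex before this PR (the version shown in the review context further down) and the one proposed here, using Python's `re` as a rough stand-in for Kate's engine:

```python
import re

OLD_RULER = re.compile(r'^\s*([\*\-_]){3,}\s*$')     # before this PR
NEW_RULER = re.compile(r'^\s*([\*\-_]\s?){3,}\s*$')  # proposed in this PR

for line in ["***", "* * *", "- - -", "*  *  *"]:
    # prints: the line, matched by old regex, matched by new regex
    print(repr(line), bool(OLD_RULER.match(line)), bool(NEW_RULER.match(line)))
```

Only the new regex accepts single spaces between markers; the last case, with two spaces between markers, is matched by neither version.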

Strike through only the string and the two tildes at each end; any other characters attached won't be struck out.

+			 <!ENTITY strikeoutregex "~~[^~].*~~">

Set up a prediff context to display diffs (see later)
For a diff, any line starting with a '-', '+', '--- ', '+++ ' or '@@', or 'diff ' will be displayed as set by <itemData name="diff..
Separate diffheader1 and diffheader2 so that lines starting with '--- ' can be set as bold

+			 <!ENTITY difflineremoveregex "^-[^-].*$">
+			 <!ENTITY difflineaddregex "^\+[^\+].*$">
+			 <!ENTITY diffheader1regex "^-{3} .*$">
+			 <!ENTITY diffheader2regex "^\+{3} .*$|^@@.*$">
+			 <!ENTITY diffheaderdiffregex "^diff .*$">

Don't display files with a .text extension as markdown

+		extensions="*.md;*.mmd;*.mdwn"

Bullet and numbered lists can be indented by tabs or multiples of 4 spaces, so they are given priority over code to avoid code highlighting on indented items

+				<RegExpr context="bullet" String="^[,\t, {4}]*[\*\+\-]\s" />
+				<RegExpr context="numlist" String="^[,\t, {4}]*[\d]+\.\s" />
				<RegExpr attribute="code" String="^([\s]{4,}|\t+).*$" />

Add bold+italics option to blockquote, bullet lists, and num lists

+				<RegExpr attribute="bq-strongemphasis" String="&strongemphasisregex;" />
+				<RegExpr attribute="bl-strongemphasis" String="&strongemphasisregex;" />
+				<RegExpr attribute="nl-strongemphasis" String="&strongemphasisregex;" />

Add a context for a fenced code block enclosed between triple backticks - anything not a diff
Copied from <context attribute="comment" ... - not sure of the significance of all the values, but it works!

+			<context attribute="pre" lineEndContext="#stay" name="pre" >
+				<RegExpr String="```$" attribute="pre" context="#pop" endRegion="pre"/>
+			</context>


Add a context for a diff block
Expand on the above to display diffs with colours and bold as appropriate

+			<context attribute="prediff" lineEndContext="#stay" name="prediff" >
+				<RegExpr String="```$" attribute="prediff" context="#pop" endRegion="prediff"/>
+				<RegExpr attribute="difflineremove" String="&difflineremoveregex;" />
+				<RegExpr attribute="difflineadd" String="&difflineaddregex;" />
+				<RegExpr attribute="diffheader1" String="&diffheader1regex;" />
+				<RegExpr attribute="diffheader2" String="&diffheader2regex;" />
+				<RegExpr attribute="diffheaderdiff" String="&diffheaderdiffregex;" />
+			</context>
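The order of the rules in the prediff context matters, because the first rule that matches a line wins. The sketch below reproduces that first-match behaviour with Python's `re` (an approximation of Kate's engine) so individual diff lines can be checked; classify() and the sample lines are illustrative only.

```python
import re

# Entities from the PR, in the same order as the rules in "prediff".
DIFF_RULES = [
    ("difflineremove", re.compile(r"^-[^-].*$")),
    ("difflineadd",    re.compile(r"^\+[^\+].*$")),
    ("diffheader1",    re.compile(r"^-{3} .*$")),
    ("diffheader2",    re.compile(r"^\+{3} .*$|^@@.*$")),
    ("diffheaderdiff", re.compile(r"^diff .*$")),
]

def classify(line):
    """Return the itemData name the line would get, or None for plain text."""
    for name, rule in DIFF_RULES:
        if rule.match(line):
            return name
    return None

sample = [
    "diff --git a/markdown.xml b/markdown.xml",
    "--- a/markdown.xml",
    "+++ b/markdown.xml",
    "@@ -1,3 +1,3 @@",
    "-old line",
    "+new line",
    " context line",
    "--option removed",  # a removed line whose own text starts with '-'
]
for line in sample:
    print(f"{classify(line)!s:15} {line}")
```

One consequence visible here: a removed line whose own text starts with '-' gets no diff colour, because ^-[^-] rejects it and it is not a '--- ' header either.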

Regex for the fenced code blocks pre and prediff - 'prediff' has precedence over 'pre' to enable diff highlighting
The gap between ``` and diff can be empty, one or more spaces, or tabs; the same applies to other languages, for example ``` shell

+				<RegExpr context="prediff" String="```\s{0,}diff" beginRegion="prediff" />
+				<RegExpr context="pre" String="```.*" beginRegion="pre" />
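The two begin rules can be checked the same way. Since the first matching rule wins, prediff has to be tried before pre, otherwise ```.* would also match ```diff. A Python sketch, with `re` standing in for Kate's engine and fence_context() as a hypothetical helper:

```python
import re

# Begin rules from the PR, in priority order.
FENCE_RULES = [
    ("prediff", re.compile(r"```\s{0,}diff")),
    ("pre",     re.compile(r"```.*")),
]

def fence_context(line):
    """Name of the context the line would open, or None."""
    for name, rule in FENCE_RULES:
        if rule.match(line):
            return name
    return None

for line in ["```diff", "```  diff", "```\tdiff", "``` shell", "```"]:
    print(repr(line), fence_context(line))
```

Because the prediff pattern is not anchored after "diff", an info string such as ```diffstat would also open a diff block.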

Also display inline code such as `this is code`, i.e. text on one line enclosed in single backticks, but not if the backtick has been escaped

+				<RegExpr attribute="code" String="[?!=\]`[^`].*[^\\]`" />

For a link to an internal 'id' reference

+				<RegExpr attribute="idlink" String="&idlinkregex;"/>

Colour choices and some bold for diff - based on *.diff/*.patch highlighting

+			<itemData name="diffheader1" defStyleNum="dsNormal" backgroundColor="#f5f5f5" color="#800000" bold="true" />
+			<itemData name="diffheader2" defStyleNum="dsNormal" backgroundColor="#f5f5f5" color="#800000" />
+			<itemData name="diffheaderdiff" defStyleNum="dsNormal" backgroundColor="#f5f5f5" color="#000000" bold="true" />
+			<itemData name="difflineremove" defStyleNum="dsNormal" backgroundColor="#f5f5f5" color="red" />
+			<itemData name="difflineadd" defStyleNum="dsNormal" backgroundColor="#f5f5f5" color="blue" />

Ruler colour

+			<itemData name="ruler" defStyleNum="dsNormal" color="#e200e2" bold="true" />

Add a background colour to strikeout - shows better than just the line through

+			<itemData name="strikeout" defStyleNum="dsNormal" strikeOut="true" backgroundColor="#fafaaf" />

Make the double-space line break more visible: remove the underline and add a background colour

+			<itemData name="linebreak" defStyleNum="dsNormal" backgroundColor="#F8E0FF" />

Colours for blockquote, bullet lists, and num lists text

+			<itemData name="blockquote" defStyleNum="dsNormal" color="black" />
to
+			<itemData name="nl-strongemphasis" defStyleNum="dsNormal" color="#b700b7" bold="true" italic="true" />

Add light grey background to fenced code blocks as with some markdown viewers

+			<itemData name="pre"  defStyleNum="dsNormal" backgroundColor="#f5f5f5" />
+			<itemData name="prediff"  defStyleNum="dsNormal" backgroundColor="#f5f5f5" />

Add light grey background and set color with color="" rather than through defStyleNum="dsBaseN"

+			<itemData name="code" defStyleNum="dsNormal" color="darkcyan" backgroundColor="#f5f5f5" />

Colour links blue, and show a link to an internal id in italics

+			<itemData name="reflink" defStyleNum="dsOthers" color="blue"  />
+			<itemData name="idlink" defStyleNum="dsOthers" color="blue" italic="true" />
+			<itemData name="inlinelink" defStyleNum="dsOthers" color="blue" />

The italic and bold values of false are the defaults [as per language.dtd], so they don't need to be set

+			<itemData name="reflinktarget" defStyleNum="dsOthers" />

Colour image links blue, and tone down the default background colour, which was for defStyleNum="dsAlert"

+			<itemData name="inlineimage" defStyleNum="dsNormal" backgroundColor="#fff8f8" color="blue" />
+			<itemData name="refimage" defStyleNum="dsNormal" backgroundColor="#fff8f8" color="blue" />
Ray-V added the PR/wip label 2 years ago
Ray-V changed title from WIP: improve markdown syntax highlighting in kate to improve markdown syntax highlighting in kate 2 years ago
Ray-V removed the PR/wip label 2 years ago
Owner

Hi @Ray-V, thanks for this. I will take a look during the weekend.

MicheleC reviewed 2 years ago
<!ENTITY autolinkregex '&lt;(https?|ftp):[^\"&gt;\s]+&gt;'>
<!ENTITY mailtolinkregex '&lt;(?:mailto:)?([-.\w]+\@[-a-z0-9]+(\.[-a-z0-9]+)*\.[a-z]+)&gt;'>
<!ENTITY rulerregex '^\s*([\*\-_]){3,}\s*$'>
<!ENTITY rulerregex '^\s*([\*\-_]\s?){3,}\s*$'>
Owner

This may actually need to be

<!ENTITY rulerregex '^\s*([\*\-_]\s*){3,}\s*$'>

since it seems multiple separators are allowed in between *-_

MicheleC marked this conversation as resolved
MicheleC reviewed 2 years ago
<!-- two spaces at end of line generates linebreak -->
<!ENTITY linebreakregex " $">
<!ENTITY strikeoutregex "[~]{2}[^~].*[^~][~]{2}"> <!-- pandoc style -->
<!ENTITY strikeoutregex "~~[^~].*~~">
Owner

<!ENTITY strikeoutregex "~~[^~].*~~">
IMO, this should be:
<!ENTITY strikeoutregex "~~[^~]+~~">
We don't want .* to eat the tildes in a long run of tildes.
What do you think?

EDIT: this seems to work fine already with the current regex, although the new proposed one is shorter. Did you have any specific problem that caused you to modify the original regex?
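With a Perl-style engine such as Python's `re`, the greedy .* spans from the first ~~ to the last one when a line contains two strikeouts, while [^~]+ stops at the first closing pair. Kate's engine may not behave identically, so this is only a sketch for comparison; the sample line is invented.

```python
import re

PR_STRIKE = re.compile(r"~~[^~].*~~")   # regex in this PR
ALT_STRIKE = re.compile(r"~~[^~]+~~")   # alternative suggested above

line = "keep ~~this~~ and ~~that~~ text"
# Greedy .* runs on to the last "~~" on the line
print(PR_STRIKE.search(line).group())                   # ~~this~~ and ~~that~~
# [^~]+ cannot cross a tilde, so each strikeout is found separately
print([m.group() for m in ALT_STRIKE.finditer(line)])   # ['~~this~~', '~~that~~']
```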

MicheleC marked this conversation as resolved
MicheleC reviewed 2 years ago
<RegExpr attribute="emphasis" String="&emphasisregex;" />
<RegExpr attribute="ruler" String="&rulerregex;" />
<RegExpr context="bullet" String="^[,\t, {4}]*[\*\+\-]\s" />
<RegExpr context="numlist" String="^[,\t, {4}]*[\d]+\.\s" />
Owner

[,\t, {4}]*
This part of the regex does not seem right to me. It matches any line starting with any mix of commas, tabs, spaces, open braces, the digit 4 and close braces, followed by *, + or -, a separator and the rest of the list item.
These lines match:

4444+ Create a list by starting a line with `+`, `-`, or `*`
,,,,+ Create a list by starting a line with `+`, `-`, or `*`
{{{{+ Create a list by starting a line with `+`, `-`, or `*`
}}}}+ Create a list by starting a line with `+`, `-`, or `*`

These lines don't match:

....+ Create a list by starting a line with `+`, `-`, or `*`
aaaa+ Create a list by starting a line with `+`, `-`, or `*`

Probably what we need here is:

<RegExpr context="bullet" String="^(\t|\s)*[\*\+\-]\s" />
<RegExpr context="numlist" String="^(\t|\s)*[\d]+\.\s" />
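The character-class problem is easy to reproduce with Python's `re` (used here as an approximation of Kate's engine). Inside [...] the {4} is not a repeat count but literal characters, so the class is just comma, tab, space, {, 4 and }:

```python
import re

PR_BULLET = re.compile(r"^[,\t, {4}]*[\*\+\-]\s")   # class: , \t space { 4 }
REVIEW_BULLET = re.compile(r"^(\t|\s)*[\*\+\-]\s")   # suggested replacement

lines = [
    "4444+ item", ",,,,+ item", "{{{{+ item", "}}}}+ item",  # match PR regex
    "....+ item", "aaaa+ item",                              # do not match
    "    + item", "\t+ item",                                # real indentation
]
for line in lines:
    # prints: the line, matched by PR regex, matched by suggested regex
    print(repr(line), bool(PR_BULLET.match(line)), bool(REVIEW_BULLET.match(line)))
```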
MicheleC marked this conversation as resolved
MicheleC reviewed 2 years ago
</context>
<context attribute="pre" lineEndContext="#stay" name="pre" >
<RegExpr String="```$" attribute="pre" context="#pop" endRegion="pre"/>
Owner

We need String="```" without end-of-line $ to handle inline code sections too.

MicheleC marked this conversation as resolved
MicheleC reviewed 2 years ago
<RegExpr attribute="code" String="`[^`]+`" />
<RegExpr context="comment" String="&lt;!--" beginRegion="comment" />
<RegExpr context="prediff" String="```\s{0,}diff" beginRegion="prediff" />
<RegExpr context="pre" String="```.*" beginRegion="pre" />
Owner

Just String="```" without .* will do.

EDIT: actually we need to distinguish between the cases where ``` is used for inline code and where it starts a fenced code block. Probably more changes are needed around here. We also need to be able to handle fenced blocks surrounded by ~~~
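One possible way to tell the two apart, shown only as a hypothetical sketch and not what the PR implements, is to treat ``` as a fence only at the start of a line; CommonMark allows up to three spaces of indentation before a fence. Kate roughly tries each rule at every position in the line, modelled here with search(); Python's `re` stands in for Kate's engine.

```python
import re

PR_PRE_BEGIN = re.compile(r"```.*")       # begin rule as reviewed (unanchored)
FENCE_START = re.compile(r"^ {0,3}```")   # hypothetical: line start only

def opens_fence(line):
    """True if the line would start a fenced block under the anchored rule."""
    return FENCE_START.match(line) is not None

inline = "Use ```code``` inline here"
print(PR_PRE_BEGIN.search(inline) is not None)   # True: fires mid-line
print(opens_fence(inline), opens_fence("```python"), opens_fence("   ```"))
```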

MicheleC marked this conversation as resolved
Owner

@Ray-V I am going through your work and commenting as I go. Once I am done, I will probably push a commit on top of yours with some of the changes I suggested, so that you can test them as well. In the end when everything is ready, we will squash them into a single commit and push to master.

MicheleC reviewed 2 years ago
<RegExpr context="comment" String="&lt;!--" beginRegion="comment" />
<RegExpr context="prediff" String="```\s{0,}diff" beginRegion="prediff" />
<RegExpr context="pre" String="```.*" beginRegion="pre" />
<RegExpr attribute="code" String="`[^`].*[^\\]`" />
Owner

As mentioned above, we also need a rule to handle inline code

MicheleC marked this conversation as resolved
MicheleC reviewed 2 years ago
<itemData name="diffheader2" defStyleNum="dsNormal" backgroundColor="#eeeeee" color="#800000" />
<itemData name="diffheaderdiff" defStyleNum="dsNormal" backgroundColor="#eeeeee" color="#000000" bold="true" />
<itemData name="difflineremove" defStyleNum="dsNormal" backgroundColor="#eeeeee" color="red" />
<itemData name="difflineadd" defStyleNum="dsNormal" backgroundColor="#eeeeee" color="blue" />
Owner

blue and red are not bad, but it may be worth considering red and green backgrounds, as in many online diff tools.

MicheleC marked this conversation as resolved
MicheleC reviewed 2 years ago
<itemData name="code" defStyleNum="dsNormal" color="darkcyan" backgroundColor="#eeeeee" />
<itemData name="reflink" defStyleNum="dsOthers" color="blue" />
<itemData name="idlink" defStyleNum="dsOthers" color="blue" italic="true" />
<itemData name="inlinelink" defStyleNum="dsOthers" color="blue" />
Owner

I think it is quite normal for links to be underlined.
Of course it is a very subjective and personal thing.

@SlavekB @blu.256: what do you think? Should we keep links underlined or not?

Owner

Discussed with Slavek. Since this is not really a clickable link in katepart, it makes more sense to leave the item not underlined.

MicheleC marked this conversation as resolved
Owner

NOTE: I will be pushing a commit later today. I am marking issues 'resolved' as I work through the changes locally. This helps me track the resolution of the various issues.

MicheleC force-pushed feat/markdown_mods from 3861f22b30 to c5f768e605 2 years ago
Owner

@Ray-V I have added a commit on top of yours (and rebased on top of current master). Could you please test whether everything works fine for you and give feedback?

EDIT: I tested using the test file found here (https://markdown-it.github.io/) plus some extra changes I made myself.
Owner

@Ray-V
a reminder: could you have a look at the additional changes pushed on top of your commits? Thank you :-)

Ray-V commented 2 years ago
Poster
Collaborator

> since it seems multiple separators are allowed in between *-_

Gitea allows a zero to three space lead-in with multiple separators, so let's base the regex on that.


Bullet and Num lists should be indented by tabs or by groups of spaces matching tab_width == 4 spaces.
Other indentation is still accepted, for example an indentation of 3 spaces would still show in the list, but it would be better if the Kate highlighting didn't apply to any list item that is out of line.

The original markdown spec states:

> List markers must be followed by one or more spaces or a tab.

although Markdown.pl allows for one or more tabs.

Gitea only supports one tab or 1-4 spaces, so let's base the regex on that.

The code regex needs modifying, otherwise indentation by any other number of spaces shows as code.


The regex for the code between single backticks doesn't work where there is more than one block in a line.
The whole string between the first and last backtick is highlighted where it should only be the individual code blocks that are highlighted.
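The greedy behaviour can be reproduced with Python's `re` (an approximation of Kate's engine): the reviewed rule runs from the first backtick to the last one on the line, while the rule it replaced, String="`[^`]+`", stops at each closing backtick. The sample line is invented.

```python
import re

PR_CODE = re.compile(r"`[^`].*[^\\]`")   # rule as reviewed
OLD_CODE = re.compile(r"`[^`]+`")        # rule it replaced

line = "a `bc` d `ef` g"
print(PR_CODE.search(line).group())   # `bc` d `ef`  (one span)
print(OLD_CODE.findall(line))         # ['`bc`', '`ef`']
```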

AND

Re: the addition of inlinecoderegex - this needs to be for any number of backticks, which gitea supports.

Let's reserve the attribute 'inlinecode' for a single line code block, and 'code' for whatever is indented by tab[s] or 4x spaces.

The inlinecode string seems to cover these two scenarios, except it doesn't provide a check of whether the ending number of backticks equals the number at the start. I would think though that in 99.99% of cases a single backtick will be used.
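For comparison only: in an engine that supports backreferences, lazy quantifiers and lookbehind, such as Python's `re` (the TDE engine may not), a check that the closing run of backticks has the same length as the opening run could look like this. It is a sketch, not a proposal for markdown.xml, and it ignores other CommonMark details such as spans crossing lines.

```python
import re

# An opening run of N backticks closes only at the next run of exactly N.
# (?<!`) and (?!`) stop a run from being split; \1 repeats the opening run.
CODE_SPAN = re.compile(r"(?<!`)(`+)(?!`)(.+?)(?<!`)\1(?!`)")

def code_spans(line):
    return [m.group(0) for m in CODE_SPAN.finditer(line)]

print(code_spans("a `` `foo` `` bar"))   # ['`` `foo` ``']
print(code_spans("a `bc` d `ef` g"))     # ['`bc`', '`ef`']
print(code_spans("`a``"))                # []: run lengths differ
```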


> blue and red are not bad, but it may be worth considering red and green backgrounds, as in many online diff tools.

This is syntax highlighting for kate and I think it would be better to colour the text, to be consistent with diff.xml.
The addition of the grey background highlights the block and the extent of the diff as there are other elements in a markdown document which aren't present in a diff/patch.


For the strikeout, any string containing a tilde, whether escaped or not, won't be struck out.
And there should be no tab or space after the starting tilde pair.
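Those two rules (no tilde anywhere inside, escaped or not, and no whitespace right after the opening pair) can be expressed as below. The regex is a hypothetical illustration, not the one that was merged, and it is checked here with Python's `re` rather than Kate's engine.

```python
import re

# Hypothetical: the first char after "~~" must be neither "~" nor whitespace,
# and no "~" may appear before the closing "~~".
STRIKE = re.compile(r"~~[^~\s][^~]*~~")

def struck(text):
    return [m.group() for m in STRIKE.finditer(text)]

print(struck("~~gone~~ and ~~also gone~~"))   # ['~~gone~~', '~~also gone~~']
print(struck("~~ not struck~~"))              # []: space after opening pair
print(struck("~~a\\~b~~"))                    # []: contains an escaped tilde
```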


The 'tilde' regexes don't work for a ~~~diff ... ~~~ block - it just shows as normal text.
Add tildes to the pre* regexes - not strictly correct because start and end must be the same character, but who's not going to use matching sets at each end?
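A sketch of what adding tildes could look like, using a [`~]{3} character class in hypothetical variants of the begin and end rules, checked with Python's `re` as a stand-in for Kate's engine. It also shows the trade-off mentioned above: a character class cannot require the closing marker to repeat the opening one.

```python
import re

# Hypothetical variants of the PR's rules accepting ``` or ~~~.
PREDIFF_BEGIN = re.compile(r"[`~]{3}\s*diff")
PRE_BEGIN = re.compile(r"[`~]{3}.*")
FENCE_END = re.compile(r"[`~]{3}")

def begin_context(line):
    """Context a line would open; prediff is tried before pre."""
    if PREDIFF_BEGIN.match(line):
        return "prediff"
    if PRE_BEGIN.match(line):
        return "pre"
    return None

print(begin_context("~~~diff"), begin_context("```diff"), begin_context("~~~ shell"))
# Mixed markers are accepted too, since the class cannot enforce matching.
print(bool(PRE_BEGIN.match("`~`")), bool(FENCE_END.match("~~~")))
```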


I think there's more that can be added.
Gitea, for example, supports the commonmark use of a backslash for a line-break.

Owner

Please, is there anything waiting to move forward? It would be good to finish it soon to be part of the upcoming R14.0.13.

Owner

> Please, is there anything waiting to move forward? It would be good to finish it soon to be part of the upcoming R14.0.13.

Nothing holding this up other than me reviewing the latest comments from @Ray-V. I will have a look through the weekend or next week and feedback. Definitely something to include in R14.0.13.

Owner

> > Please, is there anything waiting to move forward? It would be good to finish it soon to be part of the upcoming R14.0.13.
>
> Nothing holding this up other than me reviewing the latest comments from @Ray-V. I will have a look through the weekend or next week and feedback. Definitely something to include in R14.0.13.

Well, thank you.

Owner

Sorry, had a busy week and so I haven't looked at this yet. Will do this week though, so we can add this to R14.0.13.

Owner

Ok, finally got some time to dedicate to this.
Comments on the first two points, more to follow later. In the end I will upload a commit with the new proposed changes.

> Gitea allows a zero to three space lead-in with multiple separators, so let's base the regex on that.

Thanks for the correction on the leading spaces. I had not noticed that if there are more than 3 leading spaces, the line should not be considered a horizontal line at all. So we do need to limit the leading spaces to a max of 3.
I don't see the point of replacing \s with [ \t] though. Although technically \s matches characters that should not be considered (\n, \r, \f), \s is used extensively in most of the other regexes in use.
[ \t] makes the regex harder to read without any real benefit.

The newly proposed regex is:
<!ENTITY rulerregex '^ {,3}([\*\-_]\s*){3,}\s*$'>
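The proposed ruler regex can be checked against a few cases with Python's `re`, which accepts the same {,3} shorthand for {0,3}; Kate's own engine may still differ in other details. The expected values follow the behaviour described in this comment.

```python
import re

RULER = re.compile(r"^ {,3}([\*\-_]\s*){3,}\s*$")   # proposed above

expected = {
    "***": True,
    "* * *": True,
    "*  *  *": True,    # several spaces between markers
    "   - - -": True,   # three leading spaces
    "    ***": False,   # four leading spaces: not a ruler
    "**": False,        # fewer than three markers
}
for line, want in expected.items():
    assert bool(RULER.match(line)) == want, line
print("all ruler cases behave as listed")
```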


> Bullet and Num lists should be...

Good point to bring up. According to the original markdown spec:

> List markers typically start at the left margin, but may be indented by up to three spaces. List markers must be followed by one or more spaces or a tab.

gitea/github and some online websites follow that, so we should limit the leading spaces to a max of 3.
For the spaces following the marker, gitea/github and online websites seem to behave differently if there are 5 or more spaces between the marker and the rest of the line. I propose we follow the spec and don't put a limit on the number of spaces after the marker.
The newly proposed regexes are:

<RegExpr context="bullet"  String="^ {,3}[\*\+\-]\s" />
<RegExpr context="numlist" String="^ {,3}[\d]+\.\s" />

With this, no change to the code regex is required since there is no overlapping between the rules for bullet, numlist and code.
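A small Python sketch shows how the proposed list rules and the existing code rule split up indented lines. Python's `re` approximates Kate's engine, and kind() is a hypothetical helper that mimics trying the rules in the PR's order, lists before code.

```python
import re

BULLET = re.compile(r"^ {,3}[\*\+\-]\s")
NUMLIST = re.compile(r"^ {,3}[\d]+\.\s")
CODE = re.compile(r"^([\s]{4,}|\t+).*$")   # unchanged code rule

def kind(line):
    """First rule that matches, tried in the PR's order."""
    if BULLET.match(line):
        return "bullet"
    if NUMLIST.match(line):
        return "numlist"
    if CODE.match(line):
        return "code"
    return None

for line in ["- item", "   * item", "12. item", "    - item", "\t1. item"]:
    print(repr(line), kind(line))
```

A list line can start with at most three spaces before its marker, while the code rule needs four whitespace characters or a leading tab, so the two sets of lines cannot overlap.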

Owner

> The regex for the code between single backticks doesn't work where there is more than one block in a line.

Good catch, it was indeed wrong. But the proposed solution is also wrong, because it can't handle single backticks within code sections, most notably the example from the original Markdown documentation:
``There is a literal backtick (`) here.``
I have tested this at length, and it seems the regex engine currently used in TDE supports neither non-greedy matching nor lookahead with backreferences. Therefore I had to come up with a bit of a trick, but it seems to handle all the cases correctly.
The proposed regexes are:

<RegExpr attribute="code" String="``.*``" />
<RegExpr attribute="code" String="`[^`]*`" />

This handles the cases where code starts with 1 or 2 backticks. Code starting with 3 backticks is already handled as a fenced code block.

I don't think we need to introduce an additional inlinecode itemData, since that would create the risk of code and inlinecode ending up rendered differently.

The proposed solution seems to handle the following cases correctly:

``There is a literal backtick (`) here.` df` ``
``There is a literal backtick (`) here.` df` `
a `` `foo` `` bar
``Use `code` in your Markdown file.``

a `bc` d `ef` g
a `bc` d `ef g
a ``bc` d `ef g
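Several of the cases listed above can be checked with a simplified model in Python: at each column the double-backtick rule is tried first, then the single-backtick rule, and the first match is consumed. The scanning loop and `code_spans` are assumptions meant to approximate how Kate tries a context's rules at the current position; they are not Kate's actual implementation.

```python
import re

# The two proposed rules, in the same order as in the syntax file.
RULES = [re.compile(r"``.*``"), re.compile(r"`[^`]*`")]


def code_spans(line):
    """Simplified model (assumption): at each column try the rules in order,
    record the first match and continue after it; otherwise advance one column."""
    spans = []
    pos = 0
    while pos < len(line):
        for rule in RULES:
            m = rule.match(line, pos)  # anchored at the current column
            if m:
                spans.append(m.group())
                pos = m.end()
                break
        else:
            pos += 1  # no rule matched here, move to the next column
    return spans


if __name__ == "__main__":
    print(code_spans("a `bc` d `ef` g"))  # ['`bc`', '`ef`']
```

Because `.*` is greedy, the first rule extends to the last double backtick on the line, so single backticks nested inside a double-backtick span stay part of that span.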

Nevertheless, the original name inlinecode that I had proposed was confusing (a better name would have been fencedcodeblock). I also noticed that such blocks are rendered differently from code blocks indented by 4 spaces or a tab. I will investigate further and write back once it is fixed.

Owner

I have fixed the problem with the visualization of pre and pretilde blocks: their content is now shown in the same way as code.
Please note that we need to keep the pre and pretilde contexts separate, because 3 tildes will not terminate a fenced block started with 3 backticks and vice versa. So the change in commit e6abb13e to unify the two regexes into one is not correct.
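Why the two contexts cannot be merged can be illustrated with a small state machine that remembers which character opened the fence and only closes on the same one. This Python sketch is a simplification (it ignores the fence-length and info-string rules of CommonMark), and `fence_mask` is a hypothetical name.

```python
import re

# A fence is three or more backticks or three or more tildes at the line start.
FENCE = re.compile(r"^(`{3,}|~{3,})")


def fence_mask(lines):
    """Return one bool per line: True if the line opens, closes or is inside a
    fenced block. Only a fence of the same character closes the open block."""
    mask = []
    opener = None  # '`' or '~' while a block is open, else None
    for line in lines:
        m = FENCE.match(line)
        if opener is None:
            if m:
                opener = m.group(1)[0]
            mask.append(opener is not None)
        else:
            mask.append(True)
            if m and m.group(1)[0] == opener:
                opener = None
    return mask
```

With a single context whose end rule accepts either fence, the `~~~` line in `["```", "~~~", "still code", "```"]` would end the block early and `still code` would be highlighted as normal text.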

@Ray-V @SlavekB The question is: do we want code blocks (created by 4 space indentation) and fenced blocks (created with 3 backticks/tildes) to look different? Often we use 3 backticks to show code, so do you think it is better to show code and fenced blocks with the same colors?


> This is syntax highlighting for kate and I think it would be better to colour the text, to be consistent with diff.xml.

I take the point about having something consistent with what we do for diff.xml, and in that case I am OK with it, as long as we do exactly the same (no gray background).
On the other hand, I also propose: why don't we update diff.xml to use green/red diffs in gitea/github style? @Ray-V @SlavekB what do you think?


> For the strikeout, any string containing a tilde, whether escaped or not, won't be struck out.
> And there should be no tab or space after the starting tilde pair.

Good point about not having a separator after ~~. And from testing, the same applies to the character before the ending ~~.
New proposed regex is:

 <!ENTITY strikeoutregex "~~[^~\s].*[^~\s]~~">
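The behaviour of the new strikeout pattern can be spot-checked with Python's `re`, assumed here to behave like TDE's engine for this pattern; `struck` is a hypothetical helper.

```python
import re

# Proposed strikeout pattern (the XML entity value without its quotes).
STRIKEOUT = re.compile(r"~~[^~\s].*[^~\s]~~")


def struck(text):
    """Return the first substring the pattern would mark, or None."""
    m = STRIKEOUT.search(text)
    return m.group() if m else None
```

The two bracketed classes require at least two characters between the tilde pairs, so a one-character span such as `~~a~~` is not matched, while an opening `~~~diff` fence is not mistaken for strikeout.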

For linebreak, good suggestion to add the backslash character. The regex can be simplified as follows, though:

 <!ENTITY linebreakregex "(  |\\)$">
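The simplified pattern matches either two trailing spaces or a trailing backslash; XML does not treat the backslash specially, so `\\` reaches the regex engine as an escaped literal backslash, just as in a Python raw string. A minimal check, with `ends_with_break` as a hypothetical helper and lines passed without their newline:

```python
import re

# Proposed line-break pattern: two trailing spaces or a trailing backslash.
LINEBREAK = re.compile(r"(  |\\)$")


def ends_with_break(line):
    """True if the line (without its newline) ends with a hard line break."""
    return LINEBREAK.search(line) is not None
```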

> The 'tilde' regexes don't work for a ~~~diff ... ~~~ block - it just shows as normal text.

Good catch. Fixed that.
@Ray-V @SlavekB currently the filenames in diffs are shown with the same color. It would be good to differentiate them in red/green or red/blue as gitea/github do. What do you think?

MicheleC referenced this issue from a commit 2 years ago
MicheleC force-pushed feat/markdown_mods from e6abb13e34 to 653f20b8c5 2 years ago
Owner

I have rebased the branch on top of master and added a commit based on the explanation above. Please review and comment.
Once we are all happy with the final version, all those commits can be squashed into one before merging.

Owner

> I have fixed the problem with the visualization of pre and pretilde blocks: their content is now shown in the same way as code.
> Please note that we need to keep the pre and pretilde contexts separate, because 3 tildes will not terminate a fenced block started with 3 backticks and vice versa. So the change in commit e6abb13e to unify the two regexes into one is not correct.
>
> @Ray-V @SlavekB The question is: do we want code blocks (created by 4 space indentation) and fenced blocks (created with 3 backticks/tildes) to look different? Often we use 3 backticks to show code, so do you think it is better to show code and fenced blocks with the same colors?

Yes, we commonly use three backticks (a fenced block) to show code, so it seems a good idea to use the same colors for both.


> > This is syntax highlighting for kate and I think it would be better to colour the text, to be consistent with diff.xml.
>
> I take the point about having something consistent with what we do for diff.xml, and in that case I am OK with it, as long as we do exactly the same (no gray background).
> On the other hand, I also propose: why don't we update diff.xml to use green/red diffs in gitea/github style? @Ray-V @SlavekB what do you think?

Yes, green/red seems to be a frequently used combination, so it would make sense to align diff.xml with the others.


> > The 'tilde' regexes don't work for a ~~~diff ... ~~~ block - it just shows as normal text.
>
> Good catch. Fixed that.
> @Ray-V @SlavekB currently the filenames in diffs are shown with the same color. It would be good to differentiate them in red/green or red/blue as gitea/github do. What do you think?

The file names can be considered part of the diff control headers, which are shown in a single color. Therefore it seems good to me that such headers do not use red/green, so they do not distract from the real changes in content.

Owner

@SlavekB thanks for the feedback. I have now pushed a further commit to align the colors for diff and markdown and to set the green/red scheme as the default.
If users want a different color scheme, they can customize the colors within Kate to their liking.

MicheleC force-pushed feat/markdown_mods from 5b7fdae779 to 4e910c2ad6 2 years ago
MicheleC merged commit 4e910c2ad6 into master 2 years ago
MicheleC deleted branch feat/markdown_mods 2 years ago
Owner

Merged as agreed with @SlavekB on jabber discussions.
@Ray-V thanks for the great work!

MicheleC added this to the R14.0.13 release milestone 2 years ago
MicheleC changed title from improve markdown syntax highlighting in kate to Improve markdown syntax highlighting in kate 2 years ago
The pull request has been merged as 4e910c2ad6.
3 Participants
Reference: TDE/tdelibs#174