TDE incorrectly assigns mimetype application/octet-stream to text file #211

Closed
opened 11 months ago by VinceR · 11 comments
Collaborator

Problem illustration

# xdg-mime query filetype sample.ebuild # version 1.1.3
text/plain

# mimetype --brief --debug sample.ebuild # version 0.30
> Data dirs are: /home/User/.local/share, /home/User/.local/share/flatpak/exports/share, /var/lib/flatpak/exports/share, /usr/local/share, /usr/tqt3/share, /usr/trinity/14/share, /usr/share, /opt/share, /etc/eselect/wine/share, /usr/local/TDEDIRlast/share
> Checking inode type
> Checking globs for basename 'sample.ebuild'
> Checking for extension '.ebuild'
> Value "#" at offset 1 matches at /usr/share/mime/magic line 1369
> Failed nested rules
> Value "01010000" at offset 113 matches at /usr/share/mime/magic line 2109
> Failed nested rules
> File exists, trying default method
text/plain

# file --brief sample.ebuild # version 5.44
Gentoo ebuild, EAPI 8, ASCII text

# file --brief --mime sample.ebuild
application/vnd.gentoo.ebuild; charset=us-ascii

# tdefile --allValues sample.ebuild
sample.ebuild: Unknown (application/octet-stream)
sample.ebuild: Cannot determine metadata

The problem was detected on a system running branch master (commit 6c8b373f34943b76845e0de979d76f9344bf25bb) with file-5.44.

The problem did not manifest on a system running branch master (commit cc4de7a3cc20aab13430bad461a4e1e35cca794d) with file-5.43.

The problem is similar to one I reported in 2016 that was fixed in commit f54496a1f2d99bea12af3db999a53515109f99a3.

The attached sample file will need to be renamed sample.ebuild.txt => sample.ebuild to illustrate the problem. Gitea did not allow me to attach it under the original name, which sounds like another example of not correctly identifying a text file :)

VinceR added the SL/regression label 11 months ago
MicheleC added this to the R14.1.1 release milestone 11 months ago
Owner

Thanks @VinceR, I can replicate the issue here.
Do you want to work on a fix yourself, or should we take a look instead?

Poster
Collaborator

> Thanks @VinceR, I can replicate the issue here.
> Do you want to work on a fix yourself, or should we take a look instead?

Since this is terra incognita for me, I think it would be best if somebody else started on this. At first glance, I didn't really understand the solution to the 2016 problem.

But since I am a stakeholder (I work with this file type a lot), I would be willing to learn from others how to tackle such problems in the future.

For instance, I am now curious: for file type identification, when does TDE rely on its legacy mimelnk files, and when does it turn to xdg-mime or to libmagic.so (from file)? What is the order of the queries?

Owner

It's terra incognita for me too at the moment, but I will have a look between now and the R14.1.1 release and see if I can come up with a fix. I don't currently have answers to your questions above, but once I have gone through the code I should be able to answer some of them.

Owner

I would say that it certainly does not depend on the commits in tdelibs, but on the version of file.

There is a long-standing problem here: identification depends on the mime types defined by TDE in /opt/trinity/share/mimelnk rather than on the mime types defined by the system in /usr/share/mime. If identification by file (libmagic) returns a type that is not defined in TDE, it is displayed as unknown (application/octet-stream). This is exactly what happens with ebuilds.

Therefore, there are two possible (gradual) solutions:

  1. Fast: add ebuild mime type definition to tdelibs.
  2. Long-term: adjust KMimeMagic to use the definitions from the system.

See also:
https://bugs.trinitydesktop.org/show_bug.cgi?id=2392#c7
https://bugs.trinitydesktop.org/show_bug.cgi?id=2564
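
To make the behaviour described above concrete, here is a minimal sketch of the lookup as I understand it from this comment. It is a model only, not actual tdelibs code: the function name `resolve_mime_type`, the sets of known types, and the assumption that the quick fix registers the type under the name reported by file are all hypothetical.

```python
# Hypothetical model of the lookup described in this thread; not tdelibs code.
FALLBACK = "application/octet-stream"


def resolve_mime_type(detected, known_types):
    """Return the detected type only if the desktop has a definition for it."""
    if detected and detected in known_types:
        return detected
    # A type without a local definition is reported as unknown.
    return FALLBACK


# Assumed subset of the types defined under the TDE mimelnk directory.
known_before = {"text/plain", "text/html"}
# Option 1 (the quick fix) adds a definition for the ebuild type.
known_after = known_before | {"application/vnd.gentoo.ebuild"}

print(resolve_mime_type("application/vnd.gentoo.ebuild", known_before))
print(resolve_mime_type("application/vnd.gentoo.ebuild", known_after))
```

Under these assumptions the first call falls back to the unknown type, and the second returns the ebuild type once a definition exists. Option 2 would amount to building the set of known types from the system definitions instead.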

Owner

Thanks for the additional info Slavek.
I think we can do a quick fix for the ebuild type, but we should look at a proper solution for the future. That is likely to require API changes, so it would go into R14.2.0, while the quick ebuild fix can go into R14.1.1.
What do you think?

Owner

Yes, the quick fix can be done quickly and will be applicable to the R14.1.x branch. For the thorough future solution, we will decide on the possibility of backporting to a stable branch depending on how we implement it.

Owner

Issue #213 created to track the long term issue, while this issue will be closed once PR #212 is merged.

Poster
Collaborator

Thank you @SlavekB and @MicheleC for your feedback and fix!

In retrospect, it looks like file-5.43 had identified these files as text/plain but file-5.44 identified them as application/vnd.gentoo.ebuild. In both cases, the identification was based on input from Gentoo developers to file, as evidenced by the un-compiled magic file /usr/share/misc/magic/gentoo.

I am not sure, but I speculate that TDE rejected the result of calling magic_file() because there was no mimelnk file associated with the returned mime type. If that's the case, maybe the code could be changed to accept a non-null result of this call "as is" instead of assuming application/octet-stream.

When I filed this bug, I had a false memory that TDE was already using XDG Mime for name-based file type identification. I later realized that I was using a Perl script that I created to supplement $TDEDIR/share/mimelnk with entries generated from information in /usr/share/mime. Perhaps something like this script can offer a way to automate the generation of TDE mimelnk files from update-mime-database artifacts and un-compiled libmagic magic file definitions.
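
The Perl script itself is not attached here, so the following is only a rough sketch of the idea, assuming the per-type XML layout used by shared-mime-info and the usual keys of a legacy mimelnk desktop file (`Type`, `MimeType`, `Comment`, `Patterns`). The sample XML and the function name `mimelnk_from_xml` are hypothetical; a real run would read the files under /usr/share/mime.

```python
import xml.etree.ElementTree as ET

NS = {"smi": "http://www.freedesktop.org/standards/shared-mime-info"}
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Hypothetical sample entry; real entries would be read from /usr/share/mime.
SAMPLE_XML = """
<mime-type xmlns="http://www.freedesktop.org/standards/shared-mime-info"
           type="application/vnd.gentoo.ebuild">
  <comment>Gentoo ebuild</comment>
  <comment xml:lang="de">Gentoo-Ebuild</comment>
  <glob pattern="*.ebuild"/>
</mime-type>
"""


def mimelnk_from_xml(xml_text):
    """Build the text of a mimelnk-style desktop file from one XML entry."""
    root = ET.fromstring(xml_text)
    mime_type = root.get("type")
    # Use the untranslated comment; fall back to the type name if none exists.
    comment = next(
        (c.text for c in root.findall("smi:comment", NS)
         if XML_LANG not in c.attrib),
        mime_type,
    )
    patterns = [g.get("pattern") for g in root.findall("smi:glob", NS)]
    lines = [
        "[Desktop Entry]",
        "Type=MimeType",
        f"MimeType={mime_type}",
        f"Comment={comment}",
    ]
    if patterns:
        lines.append("Patterns=" + ";".join(patterns) + ";")
    return "\n".join(lines) + "\n"


print(mimelnk_from_xml(SAMPLE_XML), end="")
```

This only covers name-based (glob) information. Content-based rules from the magic files would need separate handling and are not modelled here.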

Owner

Hi @VinceR,
thanks for the extra info. We will see what the best way to fix mime detection is; using system mime types is definitely the way forward. Issue #213 is open as a reminder, and we will definitely fix this for R14.2.0 because it is an important part. Perhaps we can even backport it to R14.1.x if the changes are compatible.

Owner

Leaving the returned result as is is a problem, because TDE uses the known mime type to determine which actions can be offered for a given type of file. If the type is unknown, the result is the same as for application/octet-stream. That is why the type needs to become a known file type, and why it will be useful to load the information from the system's XML files instead of converting it to TDE-specific desktop files.
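
As an illustration of why the type has to be known, here is a small sketch of an action lookup. Everything in it is an assumption made for the example: the `ACTIONS` and `PARENTS` tables are hypothetical data, and the idea of taking a parent type from a `<sub-class-of>` element in the system XML is one possible benefit of loading those files, not something TDE is confirmed to do.

```python
# Hypothetical data; real associations come from the desktop's own files.
ACTIONS = {"text/plain": ["Open in text editor"]}
# Assumed parent relation, as could be read from <sub-class-of> in system XML.
PARENTS = {"application/vnd.gentoo.ebuild": "text/plain"}


def actions_for(mime_type, actions, parents):
    """Return the actions for a type, walking up its parent types if needed."""
    seen = set()
    while mime_type and mime_type not in seen:
        if mime_type in actions:
            return actions[mime_type]
        seen.add(mime_type)  # guard against cyclic parent relations
        mime_type = parents.get(mime_type)
    return []


# Accepting the name "as is" without any definition yields no actions.
print(actions_for("application/vnd.gentoo.ebuild", ACTIONS, {}))
# With the parent relation known, the text/plain actions become available.
print(actions_for("application/vnd.gentoo.ebuild", ACTIONS, PARENTS))
```

In this model, keeping the bare type name changes what is displayed but not what the user can do with the file, which matches the concern above.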

Owner

This was fixed by PR #212. Issue #213 tracks the issue of switching to a better mime detection system.

MicheleC closed this issue 9 months ago

Reference: TDE/tdelibs#211